[jira] [Work logged] (BEAM-8613) Add environment variable support to Docker environment
[ https://issues.apache.org/jira/browse/BEAM-8613?focusedWorklogId=377314=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-377314 ]

ASF GitHub Bot logged work on BEAM-8613:

Author: ASF GitHub Bot
Created on: 26/Jan/20 00:48
Start Date: 26/Jan/20 00:48
Worklog Time Spent: 10m
Work Description: stale[bot] commented on issue #10064: [BEAM-8613] Add environment variable support to Docker environment
URL: https://github.com/apache/beam/pull/10064#issuecomment-578457771

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 377314)
Time Spent: 1h 20m (was: 1h 10m)

> Add environment variable support to Docker environment
> --
>
> Key: BEAM-8613
> URL: https://issues.apache.org/jira/browse/BEAM-8613
> Project: Beam
> Issue Type: Improvement
> Components: java-fn-execution, runner-core, runner-direct
> Reporter: Nathan Rusch
> Assignee: Nathan Rusch
> Priority: Trivial
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> The Process environment allows specifying environment variables via a map
> field on its payload message. The Docker environment should support this same
> pattern, and forward the contents of the map through to the container runtime.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
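The behaviour requested in BEAM-8613, forwarding a map of environment variables through to the container runtime, can be sketched roughly as follows. This is only an illustration and not Beam's implementation: the function name `docker_run_args` and the image name used in the usage note are hypothetical, and the only external fact relied on is that `docker run` accepts repeated `--env KEY=VALUE` flags.

```python
from typing import Dict, List


def docker_run_args(image: str, env: Dict[str, str]) -> List[str]:
    """Build a `docker run` argument list that forwards `env` to the container.

    Hypothetical helper for illustration only; it is not part of Beam.
    """
    args = ["docker", "run"]
    # Sort the keys so the generated command line is deterministic.
    for key in sorted(env):
        args.extend(["--env", "{}={}".format(key, env[key])])
    # The image name comes last, after all options.
    args.append(image)
    return args
```

For example, `docker_run_args("my-sdk-image", {"FOO": "bar"})` yields `["docker", "run", "--env", "FOO=bar", "my-sdk-image"]`, so each entry of the map becomes one `--env` flag.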
[jira] [Work logged] (BEAM-8613) Add environment variable support to Docker environment
[ https://issues.apache.org/jira/browse/BEAM-8613?focusedWorklogId=377315=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-377315 ]

ASF GitHub Bot logged work on BEAM-8613:

Author: ASF GitHub Bot
Created on: 26/Jan/20 00:48
Start Date: 26/Jan/20 00:48
Worklog Time Spent: 10m
Work Description: stale[bot] commented on pull request #10064: [BEAM-8613] Add environment variable support to Docker environment
URL: https://github.com/apache/beam/pull/10064

Issue Time Tracking
---

Worklog Id: (was: 377315)
Time Spent: 1.5h (was: 1h 20m)

> Add environment variable support to Docker environment
> --
>
> Key: BEAM-8613
> URL: https://issues.apache.org/jira/browse/BEAM-8613
> Project: Beam
> Issue Type: Improvement
> Components: java-fn-execution, runner-core, runner-direct
> Reporter: Nathan Rusch
> Assignee: Nathan Rusch
> Priority: Trivial
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> The Process environment allows specifying environment variables via a map
> field on its payload message. The Docker environment should support this same
> pattern, and forward the contents of the map through to the container runtime.
[jira] [Comment Edited] (BEAM-8970) Spark portable runner supports Yarn
[ https://issues.apache.org/jira/browse/BEAM-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023657#comment-17023657 ]

Enis Nazif edited comment on BEAM-8970 at 1/25/20 10:12 PM:

Looking at this issue, to run a pipeline on a YARN-backed Spark cluster, a user should be able to specify runner options of
{code:java}
['--runner=SparkRunner', '--spark_submit_uber_jar', '--spark_rest_url=http://spark-rest-api:6066', '--spark_master_url=yarn']{code}
As it stands, the 'spark_master_url' isn't being passed into the request created in [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/spark_uber_jar_job_server.py#L145]

It seems that this is necessary to support YARN.

Failing this, an alternative way may be to bypass the Spark REST API (which seems like fairly hidden functionality) and instead directly spark-submit the portable jars that are created.

was (Author: enazif):
Looking at this issue, to run a pipeline on YARN backed sparked, a user should be able to specify runner options of
{code:java}
['--runner=SparkRunner', '--spark_submit_uber_jar', '--spark_rest_url=http://spark-rest-api:6066', '--spark_master_url=yarn']{code}
As it stands, the 'spark_master_url' isn't being passed into the request created in [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/spark_uber_jar_job_server.py#L145]

It seems that this is necessary to support YARN.

Failing this, an alternative way may be to bypass the Spark REST API (which seems like fairly hidden functionality) and instead directly spark-submit the portable jars that are created.

> Spark portable runner supports Yarn
> ---
>
> Key: BEAM-8970
> URL: https://issues.apache.org/jira/browse/BEAM-8970
> Project: Beam
> Issue Type: Wish
> Components: runner-spark
> Reporter: Kyle Weaver
> Assignee: Kyle Weaver
> Priority: Major
> Labels: portability-spark
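The gap described in the comment above, an option that is parsed but never forwarded into the submission request, can be illustrated with a small self-contained sketch. Everything here is an assumption made for illustration: `parse_spark_options` is a simplified `argparse` stand-in for Beam's own option handling, `build_spark_properties` is a hypothetical helper, and the only names taken from Spark itself are the `spark.master` and `spark.app.name` configuration keys. Whether the Spark REST endpoint would accept a YARN master at all is not established by this thread.

```python
import argparse
from typing import Dict, List


def parse_spark_options(argv: List[str]) -> argparse.Namespace:
    """Parse the runner options quoted in the comment (simplified stand-in)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--runner")
    parser.add_argument("--spark_submit_uber_jar", action="store_true")
    parser.add_argument("--spark_rest_url")
    parser.add_argument("--spark_master_url")
    return parser.parse_args(argv)


def build_spark_properties(options: argparse.Namespace,
                           app_name: str) -> Dict[str, str]:
    """Hypothetical helper: properties a submission request could carry."""
    properties = {"spark.app.name": app_name}
    # Forward the master URL only when the user supplied one.
    if options.spark_master_url:
        properties["spark.master"] = options.spark_master_url
    return properties


options = parse_spark_options([
    "--runner=SparkRunner",
    "--spark_submit_uber_jar",
    "--spark_rest_url=http://spark-rest-api:6066",
    "--spark_master_url=yarn",
])
properties = build_spark_properties(options, "beam-job")
```

With the option list from the comment, `properties` contains `spark.master` set to `yarn`; dropping `--spark_master_url` from the list leaves the key out entirely, which is the situation the comment reports.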
[jira] [Commented] (BEAM-8970) Spark portable runner supports Yarn
[ https://issues.apache.org/jira/browse/BEAM-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17023657#comment-17023657 ]

Enis Nazif commented on BEAM-8970:
--

Looking at this issue, to run a pipeline on a YARN-backed Spark cluster, a user should be able to specify runner options of
{code:java}
['--runner=SparkRunner', '--spark_submit_uber_jar', '--spark_rest_url=http://spark-rest-api:6066', '--spark_master_url=yarn']{code}
As it stands, the `spark_master_url` isn't being passed into the request created in [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/spark_uber_jar_job_server.py#L145]

It seems that this is necessary to support YARN.

Failing this, an alternative way may be to bypass the Spark REST API (which seems like fairly hidden functionality) and instead directly `spark-submit` the portable jars that are created.

> Spark portable runner supports Yarn
> ---
>
> Key: BEAM-8970
> URL: https://issues.apache.org/jira/browse/BEAM-8970
> Project: Beam
> Issue Type: Wish
> Components: runner-spark
> Reporter: Kyle Weaver
> Assignee: Kyle Weaver
> Priority: Major
> Labels: portability-spark
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=377287=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-377287 ]

ASF GitHub Bot logged work on BEAM-7746:

Author: ASF GitHub Bot
Created on: 25/Jan/20 20:42
Start Date: 25/Jan/20 20:42
Worklog Time Spent: 10m
Work Description: chadrik commented on pull request #10592: [BEAM-7746] Introduce a protocol to handle various types of partitioning buffers
URL: https://github.com/apache/beam/pull/10592#discussion_r370954295

## File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py ##
@@ -1065,24 +1106,37 @@ def append(self, item):
           self._overlay[self._key] = list(self._underlying[self._key])
         self._overlay[self._key].append(item)
 
+  StateType = Union[CopyOnWriteState,
+                    DefaultDict[bytes, CopyOnWriteListProtocol]]
+
   def __init__(self):
+    # type: () -> None
     self._lock = threading.Lock()
-    self._state = collections.defaultdict(list)  # type: DefaultDict[bytes, List[bytes]]
-    self._checkpoint = None
+    self._state = collections.defaultdict(list)  # type: FnApiRunner.StateServicer.StateType

Review comment:
`List[bytes]` implements the `CopyOnWriteListProtocol` protocol (now simply called `Buffer`) by virtue of supporting `__iter__` and `append`.

Issue Time Tracking
---

Worklog Id: (was: 377287)
Time Spent: 56h (was: 55h 50m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Chad Dombrova
> Assignee: Chad Dombrova
> Priority: Major
> Time Spent: 56h
> Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484
> type hints so that I can clearly see what types are required, get completion
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here: [https://github.com/apache/beam/pull/9056]
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=377288=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-377288 ]

ASF GitHub Bot logged work on BEAM-7746:

Author: ASF GitHub Bot
Created on: 25/Jan/20 20:42
Start Date: 25/Jan/20 20:42
Worklog Time Spent: 10m
Work Description: chadrik commented on issue #10592: [BEAM-7746] Introduce a protocol to handle various types of partitioning buffers
URL: https://github.com/apache/beam/pull/10592#issuecomment-578441381

I pushed some changes to address the review notes. Let me know what you think.

Issue Time Tracking
---

Worklog Id: (was: 377288)
Time Spent: 56h 10m (was: 56h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Chad Dombrova
> Assignee: Chad Dombrova
> Priority: Major
> Time Spent: 56h 10m
> Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484
> type hints so that I can clearly see what types are required, get completion
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here: [https://github.com/apache/beam/pull/9056]
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=377285=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-377285 ]

ASF GitHub Bot logged work on BEAM-7746:

Author: ASF GitHub Bot
Created on: 25/Jan/20 20:32
Start Date: 25/Jan/20 20:32
Worklog Time Spent: 10m
Work Description: chadrik commented on pull request #10592: [BEAM-7746] Introduce a protocol to handle various types of partitioning buffers
URL: https://github.com/apache/beam/pull/10592#discussion_r370954328

## File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py ##
@@ -1026,12 +1050,23 @@ def _extract_endpoints(stage,  # type: fn_api_runner_transforms.Stage
   class StateServicer(beam_fn_api_pb2_grpc.BeamFnStateServicer,
                       sdk_worker.StateHandler):
 
+    class CopyOnWriteListProtocol(Protocol):

Review comment:
How about we rename this protocol to `Buffer` and the other to `PartitionableBuffer`? `PartitionableBuffer` can inherit from `Buffer` and add the `partition()` method. We don't want to rename `append()` to `overlay()` because we want to use a `list` as one of our data structures that implements this protocol.

Issue Time Tracking
---

Worklog Id: (was: 377285)
Time Spent: 55h 50m (was: 55h 40m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Chad Dombrova
> Assignee: Chad Dombrova
> Priority: Major
> Time Spent: 55h 50m
> Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484
> type hints so that I can clearly see what types are required, get completion
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here: [https://github.com/apache/beam/pull/9056]
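The renaming proposed in the review comment above, a `Buffer` protocol plus a `PartitionableBuffer` that inherits from it and adds `partition()`, could look roughly like the sketch below. It assumes Python 3.8+ `typing.Protocol`; the method signatures, the `SimpleListBuffer` class and the round-robin partitioning are illustrative assumptions and not the code from the pull request.

```python
from typing import Iterator, List, Protocol, runtime_checkable


@runtime_checkable
class Buffer(Protocol):
    """Anything that can be iterated over and appended to."""

    def __iter__(self) -> Iterator[bytes]: ...

    def append(self, item: bytes) -> None: ...


@runtime_checkable
class PartitionableBuffer(Buffer, Protocol):
    """A Buffer that can also split its contents into n parts."""

    def partition(self, n: int) -> List[List[bytes]]: ...


class SimpleListBuffer:
    """Hypothetical implementation; note it does not subclass the protocols."""

    def __init__(self) -> None:
        self._items: List[bytes] = []

    def __iter__(self) -> Iterator[bytes]:
        return iter(self._items)

    def append(self, item: bytes) -> None:
        self._items.append(item)

    def partition(self, n: int) -> List[List[bytes]]:
        # Round-robin split: element i goes to part i % n.
        return [self._items[k::n] for k in range(n)]


def drain(buffer: Buffer) -> List[bytes]:
    """Accepts any Buffer, including a plain list."""
    return list(buffer)
```

Because protocols are structural, a plain `list` satisfies `Buffer` without any changes (it has `__iter__` and `append`), which is the reason given in the review for keeping the method name `append()`. A `list` does not satisfy `PartitionableBuffer`, since it has no `partition()` method.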
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=377284=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-377284 ]

ASF GitHub Bot logged work on BEAM-7746:

Author: ASF GitHub Bot
Created on: 25/Jan/20 20:31
Start Date: 25/Jan/20 20:31
Worklog Time Spent: 10m
Work Description: chadrik commented on pull request #10592: [BEAM-7746] Introduce a protocol to handle various types of partitioning buffers
URL: https://github.com/apache/beam/pull/10592#discussion_r370954295

## File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py ##
@@ -1065,24 +1106,37 @@ def append(self, item):
           self._overlay[self._key] = list(self._underlying[self._key])
         self._overlay[self._key].append(item)
 
+  StateType = Union[CopyOnWriteState,
+                    DefaultDict[bytes, CopyOnWriteListProtocol]]
+
   def __init__(self):
+    # type: () -> None
     self._lock = threading.Lock()
-    self._state = collections.defaultdict(list)  # type: DefaultDict[bytes, List[bytes]]
-    self._checkpoint = None
+    self._state = collections.defaultdict(list)  # type: FnApiRunner.StateServicer.StateType

Review comment:
`List[bytes]` implements the `CopyOnWriteListProtocol` protocol by virtue of supporting `__iter__` and `append`.

Issue Time Tracking
---

Worklog Id: (was: 377284)
Time Spent: 55h 40m (was: 55.5h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Chad Dombrova
> Assignee: Chad Dombrova
> Priority: Major
> Time Spent: 55h 40m
> Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484
> type hints so that I can clearly see what types are required, get completion
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here: [https://github.com/apache/beam/pull/9056]
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=377283=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-377283 ]

ASF GitHub Bot logged work on BEAM-7746:

Author: ASF GitHub Bot
Created on: 25/Jan/20 20:24
Start Date: 25/Jan/20 20:24
Worklog Time Spent: 10m
Work Description: chadrik commented on pull request #10592: [BEAM-7746] Introduce a protocol to handle various types of partitioning buffers
URL: https://github.com/apache/beam/pull/10592#discussion_r370953947

## File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py ##
@@ -288,6 +303,7 @@ def append(self, elements_data):
     # type: (bytes) -> None
     if self._grouped_output:
       raise RuntimeError('Grouping table append after read.')
+    assert self._table is not None

Review comment:
great suggestion

Issue Time Tracking
---

Worklog Id: (was: 377283)
Time Spent: 55.5h (was: 55h 20m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Chad Dombrova
> Assignee: Chad Dombrova
> Priority: Major
> Time Spent: 55.5h
> Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484
> type hints so that I can clearly see what types are required, get completion
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here: [https://github.com/apache/beam/pull/9056]
[jira] [Work logged] (BEAM-7732) Allow access to SpannerOptions in Beam
[ https://issues.apache.org/jira/browse/BEAM-7732?focusedWorklogId=377184=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-377184 ]

ASF GitHub Bot logged work on BEAM-7732:

Author: ASF GitHub Bot
Created on: 25/Jan/20 12:48
Start Date: 25/Jan/20 12:48
Worklog Time Spent: 10m
Work Description: stale[bot] commented on issue #9048: [BEAM-7732] Enable setting custom SpannerOptions.
URL: https://github.com/apache/beam/pull/9048#issuecomment-578403621

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the d...@beam.apache.org list. Thank you for your contributions.

Issue Time Tracking
---

Worklog Id: (was: 377184)
Time Spent: 3h 50m (was: 3h 40m)

> Allow access to SpannerOptions in Beam
> --
>
> Key: BEAM-7732
> URL: https://issues.apache.org/jira/browse/BEAM-7732
> Project: Beam
> Issue Type: Improvement
> Components: io-java-gcp
> Affects Versions: 2.12.0, 2.13.0
> Reporter: Niel Markwick
> Priority: Minor
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> Beam hides the
> [SpannerOptions|https://github.com/googleapis/google-cloud-java/blob/master/google-cloud-clients/google-cloud-spanner/src/main/java/com/google/cloud/spanner/SpannerOptions.java]
> object behind a
> [SpannerConfig|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerConfig.java]
> object because the SpannerOptions object is not serializable.
> This means that the only options that can be set are those that can be
> specified in SpannerConfig - limited to host, project, instance, database.
> Suggestion: add the possibility to set a SpannerOptionsFactory in
> SpannerConfig:
> {code:java}
> public interface SpannerOptionsFactory extends Serializable {
>   public SpannerOptions create();
> }
> {code}
> This would allow the user to use this factory class to specify custom
> SpannerOptions before they are passed on to the connectToSpanner() method;
> connectToSpanner() would then become:
> {code:java}
> public SpannerAccessor connectToSpanner() {
>
>   SpannerOptions.Builder builder = spannerOptionsFactory.create().toBuilder();
>   // rest of connectToSpanner follows, setting project, host, etc.
> {code}