[jira] [Work logged] (BEAM-8503) Improve TestBigQuery and TestPubsub
[ https://issues.apache.org/jira/browse/BEAM-8503?focusedWorklogId=337654&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337654 ]

ASF GitHub Bot logged work on BEAM-8503:
Author: ASF GitHub Bot
Created on: 02/Nov/19 04:21
Start Date: 02/Nov/19 04:21
Worklog Time Spent: 10m

Work Description: kennknowles commented on pull request #9880: [BEAM-8503] Improve TestBigQuery and TestPubsub
URL: https://github.com/apache/beam/pull/9880

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 337654)
Time Spent: 2h 10m (was: 2h)

> Improve TestBigQuery and TestPubsub
> ---
>
> Key: BEAM-8503
> URL: https://issues.apache.org/jira/browse/BEAM-8503
> Project: Beam
> Issue Type: Improvement
> Components: testing
> Reporter: Brian Hulette
> Assignee: Brian Hulette
> Priority: Minor
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> Add better support for E2E BigQuery and Pubsub testing:
> - TestBigQuery should have the ability to insert data into the underlying table before a test.
> - TestPubsub should have the ability to subscribe to the underlying topic and read messages that were written during a test.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
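The two capabilities the ticket asks for can be sketched outside of Beam. Below is a minimal in-memory Python stand-in; the class names (`FakeTestBigQuery`, `FakeTestPubsub`) are hypothetical, and the real TestBigQuery/TestPubsub are JUnit rules in the Java SDK, so this only illustrates the intended test-utility behavior, not an actual API.

```python
# Hedged sketch of the two requested capabilities: seeding a table before a
# test runs, and reading back messages published during a test. The class
# names here are made up for illustration; they are not Beam APIs.

class FakeTestBigQuery:
    """Stand-in for a BigQuery test rule backed by a temporary table."""

    def __init__(self):
        self.rows = []

    def insert_rows(self, rows):
        # Seed the underlying table before the pipeline under test reads it.
        self.rows.extend(rows)


class FakeTestPubsub:
    """Stand-in for a Pubsub test rule backed by a temporary topic."""

    def __init__(self):
        self._published = []

    def publish(self, message):
        self._published.append(message)

    def read_messages(self):
        # Subscribe to the underlying topic and return the messages that
        # were written while the test was running.
        return list(self._published)
```

A test would seed input via `insert_rows`, run the pipeline, then assert on `read_messages()`.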
[jira] [Updated] (BEAM-8547) Portable Wordcount fails on standalone Flink cluster
[ https://issues.apache.org/jira/browse/BEAM-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Valentyn Tymofieiev updated BEAM-8547:
--
Description:
Repro:
# git checkout origin/release-2.16.0
# ./flink-1.8.2/bin/start-cluster.sh
# gradlew :runners:flink:1.8:job-server:runShadow -PflinkMasterUrl=localhost:8081
# python -m apache_beam.examples.wordcount --input=/etc/profile --output=/tmp/py-wordcount-direct --runner=PortableRunner --experiments=worker_threads=100 --parallelism=1 --shutdown_sources_on_final_watermark --sdk_worker_parallelism=1 --environment_cache_millis=6 --job_endpoint=localhost:8099

This causes the runner to crash with:
{noformat}
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 158, in _execute
    response = task()
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 191, in <lambda>
    self._execute(lambda: worker.do_instruction(work), work)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 343, in do_instruction
    request.instruction_id)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 369, in process_bundle
    bundle_processor.process_bundle(instruction_id))
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py", line 663, in process_bundle
    data.ptransform_id].process_encoded(data.data)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py", line 143, in process_encoded
    self.output(decoded_value)
  File "apache_beam/runners/worker/operations.py", line 255, in apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 256, in apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 143, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 593, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 594, in apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 776, in apache_beam.runners.common.DoFnRunner.receive
  File "apache_beam/runners/common.py", line 782, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 849, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "/usr/local/lib/python3.7/site-packages/future/utils/__init__.py", line 421, in raise_with_traceback
    raise exc.with_traceback(traceback)
  File "apache_beam/runners/common.py", line 780, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 587, in apache_beam.runners.common.PerWindowInvoker.invoke_process
  File "apache_beam/runners/common.py", line 660, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 1042, in process
    self.writer = self.sink.open_writer(init_result, str(uuid.uuid4()))
  File "/usr/local/lib/python3.7/site-packages/apache_beam/options/value_provider.py", line 137, in _f
    return fnc(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", line 186, in open_writer
    return FileBasedSinkWriter(self, writer_path)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", line 390, in __init__
    self.temp_handle = self.sink.open(temp_shard_path)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/textio.py", line 391, in open
    file_handle = super(_TextSink, self).open(temp_path)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/options/value_provider.py", line 137, in _f
    return fnc(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", line 129, in open
    return FileSystems.create(temp_path, self.mime_type, self.compression_type)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/filesystems.py", line 203, in create
    return filesystem.create(path, mime_type, compression_type)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/localfilesystem.py", line 151, in create
    return self._path_open(path, 'wb', mime_type, compression_type)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/localfilesystem.py", line 134, in _path_open
    raw_file = open(path, mode)
RuntimeError: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/beam-temp-py-wordcount-direct-ea951c18fd1211e9ac84a0c589d778c3/d39e13af-277b-437e-89f2-e00249287e1d.py-wordcount-direct' [while running 'write/Write/WriteImpl/WriteBundles']
{noformat}
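The final frame explains the crash: `_path_open` is a plain `open()`, which raises FileNotFoundError when the `/tmp/beam-temp-...` staging directory does not exist on the filesystem the SDK worker sees (for example, because the worker runs in a different container or on a different machine than the one that created the directory). A minimal local reproduction of that failure mode, with made-up paths:

```python
# Hedged sketch of the failure mode in the trace above: the text sink opens a
# shard file under a temp directory that is missing on the worker's local
# filesystem. The directory and shard names here are made up for illustration.
import os
import tempfile
import uuid


def open_shard(temp_dir, shard_name):
    # Mirrors the behavior of a plain open() in 'wb' mode: it raises
    # FileNotFoundError if the parent directory does not exist.
    return open(os.path.join(temp_dir, shard_name), "wb")


# A beam-temp-style directory that was never created on this machine.
missing_dir = os.path.join(tempfile.gettempdir(), "beam-temp-" + uuid.uuid4().hex)

try:
    open_shard(missing_dir, "shard-0000.txt")
except FileNotFoundError as e:
    print("reproduced:", e.errno)  # errno 2 (ENOENT), as in the trace
```

This is why the job only works when the runner and the SDK harness share a filesystem (e.g. Loopback mode).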
[jira] [Work logged] (BEAM-8146) SchemaCoder/RowCoder have no equals() function
[ https://issues.apache.org/jira/browse/BEAM-8146?focusedWorklogId=337647&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337647 ]

ASF GitHub Bot logged work on BEAM-8146:
Author: ASF GitHub Bot
Created on: 02/Nov/19 02:09
Start Date: 02/Nov/19 02:09
Worklog Time Spent: 10m

Work Description: kennknowles commented on pull request #9493: [BEAM-8146,BEAM-8204,BEAM-8205] Add equals and hashCode to SchemaCoder and RowCoder
URL: https://github.com/apache/beam/pull/9493

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 337647)
Time Spent: 3.5h (was: 3h 20m)

> SchemaCoder/RowCoder have no equals() function
> --
>
> Key: BEAM-8146
> URL: https://issues.apache.org/jira/browse/BEAM-8146
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.15.0
> Reporter: Brian Hulette
> Assignee: Brian Hulette
> Priority: Major
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> SchemaCoder has no equals function, so it can't be compared in tests, like CloudComponentsTests$DefaultCoders, which is being re-enabled in https://github.com/apache/beam/pull/9446

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
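The missing `equals()` matters because tests want two independently constructed coders for the same schema to compare equal, rather than by object identity. A hedged Python analogy of the fix (the actual change is to SchemaCoder/RowCoder in the Java SDK; the `RowCoder` class below is illustrative only, keyed on a simplified schema):

```python
# Hedged sketch: a schema-based coder that defines equality and hashing by
# its schema, so two coders built from equal schemas compare equal in tests.
# This mirrors the idea of the Java equals()/hashCode() addition; it is not
# Beam's Python RowCoder.

class RowCoder:
    def __init__(self, schema):
        # schema: an iterable of (field_name, field_type) pairs, stored as a
        # tuple so it is hashable and order-sensitive.
        self.schema = tuple(schema)

    def __eq__(self, other):
        return isinstance(other, RowCoder) and self.schema == other.schema

    def __hash__(self):
        # Equal objects must hash equal, so hash the same schema tuple.
        return hash(self.schema)


# Without __eq__, these two instances would compare by identity and a test
# like CloudComponentsTests$DefaultCoders could never assert coder equality.
assert RowCoder([("id", int)]) == RowCoder([("id", int)])
```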
[jira] [Comment Edited] (BEAM-7859) Portable Wordcount on Spark runner does not work in DOCKER execution mode.
[ https://issues.apache.org/jira/browse/BEAM-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965196#comment-16965196 ]

Valentyn Tymofieiev edited comment on BEAM-7859 at 11/2/19 2:00 AM:
Saw a similar error with Flink, when running against a standalone cluster: BEAM-8547. It happened on 2.16.0 even with --sdk_worker_parallelism=1.

was (Author: tvalentyn):
Saw a similar error with Flink: BEAM-8547. It happened on 2.16.0 even with --sdk_worker_parallelism=1.

> Portable Wordcount on Spark runner does not work in DOCKER execution mode.
> --
>
> Key: BEAM-7859
> URL: https://issues.apache.org/jira/browse/BEAM-7859
> Project: Beam
> Issue Type: Bug
> Components: runner-spark, sdk-py-harness
> Reporter: Valentyn Tymofieiev
> Assignee: Kyle Weaver
> Priority: Major
> Labels: portability-spark
> Fix For: 2.16.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> It seems that portable wordcount on Spark runner works in Loopback mode only. If we use a container environment, the wordcount fails on Spark runner (both on Python 2 and Python 3), but passes on Flink runner.
> ./gradlew :sdks:python:container:py3:docker
> ./gradlew :runners:spark:job-server:runShadow  # replace with :runners:flink:1.5:job-server:runShadow for WC to pass.
> ./gradlew :sdks:python:test-suites:portable:py35:portableWordCountBatch -PjobEndpoint=localhost:8099
> {noformat}
> Task :sdks:python:test-suites:portable:py35:portableWordCountBatch
> /usr/local/google/home/valentyn/projects/beam/cleanflink/beam/sdks/python/apache_beam/__init__.py:84: UserWarning: Some syntactic constructs of Python 3 are not yet fully supported by Apache Beam.
>   'Some syntactic constructs of Python 3 are not yet fully supported by '
> INFO:root:Using latest locally built Python SDK docker image: valentyn-docker-apache.bintray.io/beam/python3:latest.
> INFO:root:
> INFO:root:
> WARNING:root:Discarding unparseable args: ['--parallelism=2', '--shutdown_sources_on_final_watermark']
> INFO:root:Job state changed to RUNNING
> ERROR:root:java.lang.RuntimeException: Error received from SDK harness for instruction 2: Traceback (most recent call last):
>   File "apache_beam/runners/common.py", line 782, in apache_beam.runners.common.DoFnRunner.process
>   File "apache_beam/runners/common.py", line 594, in apache_beam.runners.common.PerWindowInvoker.invoke_process
>   File "apache_beam/runners/common.py", line 666, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", line 1041, in process
>     self.writer = self.sink.open_writer(init_result, str(uuid.uuid4()))
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/options/value_provider.py", line 137, in _f
>     return fnc(self, *args, **kwargs)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", line 186, in open_writer
>     return FileBasedSinkWriter(self, writer_path)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", line 390, in __init__
>     self.temp_handle = self.sink.open(temp_shard_path)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/textio.py", line 391, in open
>     file_handle = super(_TextSink, self).open(temp_path)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/options/value_provider.py", line 137, in _f
>     return fnc(self, *args, **kwargs)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", line 129, in open
>     return FileSystems.create(temp_path, self.mime_type, self.compression_type)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line 203, in create
>     return filesystem.create(path, mime_type, compression_type)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/localfilesystem.py", line 151, in create
>     return self._path_open(path, 'wb', mime_type, compression_type)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/localfilesystem.py", line 134, in _path_open
>     raw_file = open(path, mode)
> FileNotFoundError: [Errno 2] No such file or directory: '/tmp/beam-temp-py-wordcount-direct-1590df80b36011e98bf3f4939fefdbd5/ab0591ee-f8b9-471f-8edd-4b4a89481bc2.py-wordcount-direct'
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker.py", line 157, in _execute
>
[jira] [Comment Edited] (BEAM-7859) Portable Wordcount on Spark runner does not work in DOCKER execution mode.
[ https://issues.apache.org/jira/browse/BEAM-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965196#comment-16965196 ]

Valentyn Tymofieiev edited comment on BEAM-7859 at 11/2/19 2:00 AM:
Saw a similar error with Flink, when running against a standalone Flink cluster: BEAM-8547. It happened on 2.16.0 even with --sdk_worker_parallelism=1.

was (Author: tvalentyn):
Saw a similar error with Flink, when running against a standalone cluster: BEAM-8547. It happened on 2.16.0 even with --sdk_worker_parallelism=1.

> Portable Wordcount on Spark runner does not work in DOCKER execution mode.
> --
>
> Key: BEAM-7859
> URL: https://issues.apache.org/jira/browse/BEAM-7859
> Project: Beam
> Issue Type: Bug
> Components: runner-spark, sdk-py-harness
> Reporter: Valentyn Tymofieiev
> Assignee: Kyle Weaver
> Priority: Major
> Labels: portability-spark
> Fix For: 2.16.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
[jira] [Created] (BEAM-8547) Portable Wordcount fails on standalone Flink cluster
Valentyn Tymofieiev created BEAM-8547:
-
Summary: Portable Wordcount fails on standalone Flink cluster
Key: BEAM-8547
URL: https://issues.apache.org/jira/browse/BEAM-8547
Project: Beam
Issue Type: Bug
Components: runner-flink, sdk-py-harness
Reporter: Valentyn Tymofieiev

Repro:
# git checkout origin/release-2.16.0
# ./flink-1.8.2/bin/start-cluster.sh
# gradlew :runners:flink:1.8:job-server:runShadow -PflinkMasterUrl=localhost:8081
# python -m apache_beam.examples.wordcount --input=/etc/profile --output=/tmp/py-wordcount-direct --runner=PortableRunner --experiments=worker_threads=100 --parallelism=1 --shutdown_sources_on_final_watermark --sdk_worker_parallelism=1 --environment_cache_millis=6 --job_endpoint=localhost:8099

This causes the runner to crash with:
{noformat}
RuntimeError: FileNotFoundError: [Errno 2] No such file or directory: '/t
[jira] [Commented] (BEAM-7859) Portable Wordcount on Spark runner does not work in DOCKER execution mode.
[ https://issues.apache.org/jira/browse/BEAM-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965196#comment-16965196 ]

Valentyn Tymofieiev commented on BEAM-7859:
---
Saw a similar error with Flink: BEAM-8547. It happened on 2.16.0 even with --sdk_worker_parallelism=1.

> Portable Wordcount on Spark runner does not work in DOCKER execution mode.
> --
>
> Key: BEAM-7859
> URL: https://issues.apache.org/jira/browse/BEAM-7859
> Project: Beam
> Issue Type: Bug
> Components: runner-spark, sdk-py-harness
> Reporter: Valentyn Tymofieiev
> Assignee: Kyle Weaver
> Priority: Major
> Labels: portability-spark
> Fix For: 2.16.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
[jira] [Updated] (BEAM-7859) Portable Wordcount on Spark runner does not work in DOCKER execution mode.
[ https://issues.apache.org/jira/browse/BEAM-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Valentyn Tymofieiev updated BEAM-7859:
--
Description:
It seems that portable wordcount on Spark runner works in Loopback mode only. If we use a container environment, the wordcount fails on Spark runner (both on Python 2 and Python 3), but passes on Flink runner.

./gradlew :sdks:python:container:py3:docker
./gradlew :runners:spark:job-server:runShadow  # replace with :runners:flink:1.5:job-server:runShadow for WC to pass.
./gradlew :sdks:python:test-suites:portable:py35:portableWordCountBatch -PjobEndpoint=localhost:8099

{noformat}
Task :sdks:python:test-suites:portable:py35:portableWordCountBatch
/usr/local/google/home/valentyn/projects/beam/cleanflink/beam/sdks/python/apache_beam/__init__.py:84: UserWarning: Some syntactic constructs of Python 3 are not yet fully supported by Apache Beam.
  'Some syntactic constructs of Python 3 are not yet fully supported by '
INFO:root:Using latest locally built Python SDK docker image: valentyn-docker-apache.bintray.io/beam/python3:latest.
INFO:root:
INFO:root:
WARNING:root:Discarding unparseable args: ['--parallelism=2', '--shutdown_sources_on_final_watermark']
INFO:root:Job state changed to RUNNING
ERROR:root:java.lang.RuntimeException: Error received from SDK harness for instruction 2: Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 782, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 594, in apache_beam.runners.common.PerWindowInvoker.invoke_process
  File "apache_beam/runners/common.py", line 666, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
  File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", line 1041, in process
    self.writer = self.sink.open_writer(init_result, str(uuid.uuid4()))
  File "/usr/local/lib/python3.5/site-packages/apache_beam/options/value_provider.py", line 137, in _f
    return fnc(self, *args, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", line 186, in open_writer
    return FileBasedSinkWriter(self, writer_path)
  File "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", line 390, in __init__
    self.temp_handle = self.sink.open(temp_shard_path)
  File "/usr/local/lib/python3.5/site-packages/apache_beam/io/textio.py", line 391, in open
    file_handle = super(_TextSink, self).open(temp_path)
  File "/usr/local/lib/python3.5/site-packages/apache_beam/options/value_provider.py", line 137, in _f
    return fnc(self, *args, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", line 129, in open
    return FileSystems.create(temp_path, self.mime_type, self.compression_type)
  File "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line 203, in create
    return filesystem.create(path, mime_type, compression_type)
  File "/usr/local/lib/python3.5/site-packages/apache_beam/io/localfilesystem.py", line 151, in create
    return self._path_open(path, 'wb', mime_type, compression_type)
  File "/usr/local/lib/python3.5/site-packages/apache_beam/io/localfilesystem.py", line 134, in _path_open
    raw_file = open(path, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/beam-temp-py-wordcount-direct-1590df80b36011e98bf3f4939fefdbd5/ab0591ee-f8b9-471f-8edd-4b4a89481bc2.py-wordcount-direct'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker.py", line 157, in _execute
    response = task()
  File "/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker.py", line 190, in <lambda>
    self._execute(lambda: worker.do_instruction(work), work)
  File "/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker.py", line 342, in do_instruction
    request.instruction_id)
  File "/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker.py", line 368, in process_bundle
    bundle_processor.process_bundle(instruction_id))
  File "/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", line 593, in process_bundle
    data.ptransform_id].process_encoded(data.data)
  File "/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", line 143, in process_encoded
    self.output(decoded_value)
  File "apache_beam/runners/worker/operations.py", line 255, in apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 256, in apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/oper
[jira] [Work logged] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam
[ https://issues.apache.org/jira/browse/BEAM-8368?focusedWorklogId=337645&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337645 ] ASF GitHub Bot logged work on BEAM-8368: Author: ASF GitHub Bot Created on: 02/Nov/19 01:19 Start Date: 02/Nov/19 01:19 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9970: [BEAM-8368] [BEAM-8392] Update pyarrow to the latest version URL: https://github.com/apache/beam/pull/9970#issuecomment-548996697 > Well it's a pretty simple fix, just need to change [this line](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/parquetio_test.py#L451) to `pa.Table.from_arrays([orig.column('name')], names=['name'])` A bit of a side question. It seems like minor versions could be breaking with pyarrow. Should we pin this library, instead of using an upper bound? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337645) Time Spent: 3h (was: 2h 50m) > [Python] libprotobuf-generated exception when importing apache_beam > --- > > Key: BEAM-8368 > URL: https://issues.apache.org/jira/browse/BEAM-8368 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.15.0, 2.17.0 >Reporter: Ubaier Bhat >Assignee: Brian Hulette >Priority: Blocker > Fix For: 2.17.0 > > Attachments: error_log.txt > > Time Spent: 3h > Remaining Estimate: 0h > > Unable to import apache_beam after upgrading to macos 10.15 (Catalina). > Cleared all the pipenvs and but can't get it working again. > {code} > import apache_beam as beam > /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84: > UserWarning: Some syntactic constructs of Python 3 are not yet fully > supported by Apache Beam. 
> 'Some syntactic constructs of Python 3 are not yet fully supported by ' > [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already > exists in database: > [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: > GeneratedDatabase()->Add(encoded_file_descriptor, size): > libc++abi.dylib: terminating with uncaught exception of type > google::protobuf::FatalException: CHECK failed: > GeneratedDatabase()->Add(encoded_file_descriptor, size): > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
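The pinning question raised in the comment above boils down to two requirement-specifier styles. As an illustration only (the version numbers here are hypothetical, not a recommendation):

```text
# Exact pin: immune to surprise breakage from a new pyarrow release,
# but also cut off from bug-fix releases until the pin is bumped.
pyarrow==0.15.1

# Upper bound (the current approach): picks up patch releases in range,
# but trusts pyarrow not to break within that range.
pyarrow>=0.15.1,<0.16.0
```

For a library whose minor releases have broken compatibility before, the trade-off is between missing fixes (the exact pin) and absorbing breakage (the bounded range).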
[jira] [Work logged] (BEAM-8254) (Java SDK) Add workerRegion and workerZone options
[ https://issues.apache.org/jira/browse/BEAM-8254?focusedWorklogId=337643&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337643 ] ASF GitHub Bot logged work on BEAM-8254: Author: ASF GitHub Bot Created on: 02/Nov/19 01:05 Start Date: 02/Nov/19 01:05 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #9961: [BEAM-8254] add workerRegion and workerZone options to Java SDK URL: https://github.com/apache/beam/pull/9961#discussion_r341792444 ## File path: runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java ## @@ -357,6 +361,36 @@ public static DataflowRunner fromOptions(PipelineOptions options) { return new DataflowRunner(dataflowOptions); } + @VisibleForTesting + static void validateWorkerSettings(GcpOptions gcpOptions) { +Preconditions.checkArgument( +gcpOptions.getZone() == null || gcpOptions.getWorkerRegion() == null, +"Cannot use option zone with workerRegion."); Review comment: Done. Issue Time Tracking --- Worklog Id: (was: 337643) Time Spent: 0.5h (was: 20m) > (Java SDK) Add workerRegion and workerZone options > -- > > Key: BEAM-8254 > URL: https://issues.apache.org/jira/browse/BEAM-8254 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow, sdk-java-core >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h >
[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks
[ https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=337642&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337642 ] ASF GitHub Bot logged work on BEAM-8457: Author: ASF GitHub Bot Created on: 02/Nov/19 01:04 Start Date: 02/Nov/19 01:04 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9885: [BEAM-8457] Label Dataflow jobs from Notebook URL: https://github.com/apache/beam/pull/9885#issuecomment-548995354 R: @pabloem could you make a first review pass? Issue Time Tracking --- Worklog Id: (was: 337642) Time Spent: 5h 20m (was: 5h 10m) > Instrument Dataflow jobs that are launched from Notebooks > - > > Key: BEAM-8457 > URL: https://issues.apache.org/jira/browse/BEAM-8457 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Fix For: 2.17.0 > > Time Spent: 5h 20m > Remaining Estimate: 0h > > Dataflow needs the capability to tell how many Dataflow jobs are launched > from the Notebook environment, i.e., the Interactive Runner. > # Change the pipeline.run() API to allow supplying a runner and an option > parameter so that a pipeline initially bundled w/ an interactive runner can > be directly run by other runners from notebook. > # Implicitly add the necessary source information through user labels when > the user does p.run(runner=DataflowRunner()).
[jira] [Work logged] (BEAM-8442) Unify bundle register in Python SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8442?focusedWorklogId=337641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337641 ] ASF GitHub Bot logged work on BEAM-8442: Author: ASF GitHub Bot Created on: 02/Nov/19 01:02 Start Date: 02/Nov/19 01:02 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9842: [BEAM-8442] Remove duplicate code for bundle register in Python SDK harness URL: https://github.com/apache/beam/pull/9842#issuecomment-548995176 For reference, this was reverted in https://github.com/apache/beam/pull/9956 Issue Time Tracking --- Worklog Id: (was: 337641) Time Spent: 1h 50m (was: 1h 40m) > Unify bundle register in Python SDK harness > --- > > Key: BEAM-8442 > URL: https://issues.apache.org/jira/browse/BEAM-8442 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.18.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > There are two methods for bundle register in Python SDK harness: > `SdkHarness._request_register` and `SdkWorker.register.` It should be unified.
[jira] [Commented] (BEAM-8521) beam_PostCommit_XVR_Flink failing
[ https://issues.apache.org/jira/browse/BEAM-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965182#comment-16965182 ] Ahmet Altay commented on BEAM-8521: --- Great, thank you! > beam_PostCommit_XVR_Flink failing > - > > Key: BEAM-8521 > URL: https://issues.apache.org/jira/browse/BEAM-8521 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: 2.18.0 > > Time Spent: 50m > Remaining Estimate: 0h > > https://builds.apache.org/job/beam_PostCommit_XVR_Flink/ > Edit: Made subtasks for what appear to be two separate issues.
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=337640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337640 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 02/Nov/19 01:00 Start Date: 02/Nov/19 01:00 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9920: [BEAM-7389] Add code snippets for CombineGlobally URL: https://github.com/apache/beam/pull/9920#discussion_r341792002 ## File path: sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally.py ## @@ -0,0 +1,214 @@ +# coding=utf-8 Review comment: Consider sharing more code among the other combine PR and GBK PR. Issue Time Tracking --- Worklog Id: (was: 337640) Time Spent: 75h 40m (was: 75.5h) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 75h 40m > Remaining Estimate: 0h >
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=337639&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337639 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 02/Nov/19 00:59 Start Date: 02/Nov/19 00:59 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9924: [BEAM-7389] Add code snippet for Distinct URL: https://github.com/apache/beam/pull/9924 Issue Time Tracking --- Worklog Id: (was: 337639) Time Spent: 75.5h (was: 75h 20m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 75.5h > Remaining Estimate: 0h >
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=337638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337638 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 02/Nov/19 00:59 Start Date: 02/Nov/19 00:59 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9922: [BEAM-7389] Add code snippets for CombineValues URL: https://github.com/apache/beam/pull/9922#discussion_r341791911 ## File path: sdks/python/apache_beam/examples/snippets/transforms/aggregation/combinevalues.py ## @@ -0,0 +1,246 @@ +# coding=utf-8 Review comment: There is a lot of repeated code between this PR and GBK snippets PR. Consider merging them maybe? Issue Time Tracking --- Worklog Id: (was: 337638) Time Spent: 75h 20m (was: 75h 10m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 75h 20m > Remaining Estimate: 0h >
[jira] [Commented] (BEAM-8521) beam_PostCommit_XVR_Flink failing
[ https://issues.apache.org/jira/browse/BEAM-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965180#comment-16965180 ] Kyle Weaver commented on BEAM-8521: --- We reverted the change to the registration code: https://github.com/apache/beam/pull/9956 > beam_PostCommit_XVR_Flink failing > - > > Key: BEAM-8521 > URL: https://issues.apache.org/jira/browse/BEAM-8521 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: 2.18.0 > > Time Spent: 50m > Remaining Estimate: 0h > > https://builds.apache.org/job/beam_PostCommit_XVR_Flink/ > Edit: Made subtasks for what appear to be two separate issues.
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=337637&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337637 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 02/Nov/19 00:57 Start Date: 02/Nov/19 00:57 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9720: [BEAM-8335] Add initial modules for interactive streaming support URL: https://github.com/apache/beam/pull/9720#discussion_r341791789 ## File path: model/interactive/OWNERS ## @@ -0,0 +1,7 @@ +# See the OWNERS docs at https://s.apache.org/beam-owners + +reviewers: Review comment: You might want to limit this list to people who are likely to maintain this path. Issue Time Tracking --- Worklog Id: (was: 337637) Time Spent: 14h 50m (was: 14h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 14h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=337636&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337636 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 02/Nov/19 00:55 Start Date: 02/Nov/19 00:55 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9921: [BEAM-7389] Add code snippets for CombinePerKey URL: https://github.com/apache/beam/pull/9921 Issue Time Tracking --- Worklog Id: (was: 337636) Time Spent: 75h 10m (was: 75h) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 75h 10m > Remaining Estimate: 0h >
[jira] [Comment Edited] (BEAM-8521) beam_PostCommit_XVR_Flink failing
[ https://issues.apache.org/jira/browse/BEAM-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965175#comment-16965175 ] Ahmet Altay edited comment on BEAM-8521 at 11/2/19 12:52 AM: - Does this need to be fixed before 2.18? Should we rollback the related PR or make some change in the registration code? /cc [~angoenka] [~mxm] was (Author: altay): Does this need to be fixed before 2.18? > beam_PostCommit_XVR_Flink failing > - > > Key: BEAM-8521 > URL: https://issues.apache.org/jira/browse/BEAM-8521 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: 2.18.0 > > Time Spent: 50m > Remaining Estimate: 0h > > https://builds.apache.org/job/beam_PostCommit_XVR_Flink/ > Edit: Made subtasks for what appear to be two separate issues.
[jira] [Commented] (BEAM-8521) beam_PostCommit_XVR_Flink failing
[ https://issues.apache.org/jira/browse/BEAM-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965175#comment-16965175 ] Ahmet Altay commented on BEAM-8521: --- Does this need to be fixed before 2.18? > beam_PostCommit_XVR_Flink failing > - > > Key: BEAM-8521 > URL: https://issues.apache.org/jira/browse/BEAM-8521 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: 2.18.0 > > Time Spent: 50m > Remaining Estimate: 0h > > https://builds.apache.org/job/beam_PostCommit_XVR_Flink/ > Edit: Made subtasks for what appear to be two separate issues.
[jira] [Updated] (BEAM-8521) beam_PostCommit_XVR_Flink failing
[ https://issues.apache.org/jira/browse/BEAM-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Altay updated BEAM-8521: -- Fix Version/s: 2.18.0 > beam_PostCommit_XVR_Flink failing > - > > Key: BEAM-8521 > URL: https://issues.apache.org/jira/browse/BEAM-8521 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: 2.18.0 > > Time Spent: 50m > Remaining Estimate: 0h > > https://builds.apache.org/job/beam_PostCommit_XVR_Flink/ > Edit: Made subtasks for what appear to be two separate issues.
[jira] [Commented] (BEAM-6860) WriteToText crash with "GlobalWindow -> ._IntervalWindowBase"
[ https://issues.apache.org/jira/browse/BEAM-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965173#comment-16965173 ] Ahmet Altay commented on BEAM-6860: --- [~chamikara] any idea? > WriteToText crash with "GlobalWindow -> ._IntervalWindowBase" > - > > Key: BEAM-6860 > URL: https://issues.apache.org/jira/browse/BEAM-6860 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.11.0 > Environment: macOS, DirectRunner, python 2.7.15 via > pyenv/pyenv-virtualenv >Reporter: Henrik >Assignee: Pawel Kordek >Priority: Critical > Labels: newbie > Time Spent: 40m > Remaining Estimate: 0h > > Main error: > > Cannot convert GlobalWindow to > > apache_beam.utils.windowed_value._IntervalWindowBase > This is very hard for me to debug. Doing a DoPar call before, printing the > input, gives me just what I want; so the lines of data to serialise are > "alright"; just JSON strings, in fact. > Stacktrace: > {code:java} > Traceback (most recent call last): > File "./okr_end_ride.py", line 254, in > run() > File "./okr_end_ride.py", line 250, in run > run_pipeline(pipeline_options, known_args) > File "./okr_end_ride.py", line 198, in run_pipeline > | 'write_all' >> WriteToText(known_args.output, > file_name_suffix=".txt") > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py", > line 426, in __exit__ > self.run().wait_until_finish() > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py", > line 406, in run > self._options).run(False) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py", > line 419, in run > return self.runner.run_pipeline(self, self._options) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/direct/direct_runner.py", > line 132, in run_pipeline > return 
runner.run_pipeline(pipeline, options) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 275, in run_pipeline > default_environment=self._default_environment)) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 278, in run_via_runner_api > return self.run_stages(*self.create_stages(pipeline_proto)) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 354, in run_stages > stage_context.safe_coders) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 509, in run_stage > data_input, data_output) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 1206, in process_bundle > result_future = self._controller.control_handler.push(process_bundle) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 821, in push > response = self.worker.do_instruction(request) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 265, in do_instruction > request.instruction_id) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 281, in process_bundle > delayed_applications = bundle_processor.process_bundle(instruction_id) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 552, in process_bundle > op.finish() > File "apache_beam/runners/worker/operations.py", line 549, 
in > apache_beam.runners.worker.operations.DoOperation.finish > File "apache_beam/runners/worker/operations.py", line 550, in > apache_beam.runners.worker.operations.DoOperation.finish > File "apache_beam/runners/worker/operations.py", line 551, in > apache_beam.runners.worker.operations.DoOperation.finish > File "apache_beam/runners/common.py", line 758, in > apache_beam.runners.common.DoFnRunner.finish > File "apache_beam/runners/common.py", line 752, in > apache_beam.runners.common.DoFnRunner._invoke_bundle_method > File "apache_beam/runners/common.py", line 777, in > apache_beam.runners.common.DoFnRunner._re
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=337633&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337633 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 02/Nov/19 00:44 Start Date: 02/Nov/19 00:44 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9056: [BEAM-7746] Add python type hints URL: https://github.com/apache/beam/pull/9056#discussion_r341790865 ## File path: sdks/python/tox.ini ## @@ -160,16 +160,17 @@ commands = [testenv:py27-lint] # Checks for py2 syntax errors deps = + -r build-requirements.txt Review comment: `tox` runs outside of gradle. (At least for me). What is the issue there? > I assume the reason for the two step test process (sdist + tox, each with its own virtualenv) is to ensure that tox is using an sdist built in the same way every time, using python2, just as it would be for distribution? sdist step is used for other things (for building artifacts for release). That is the reason for it to exist on its own. tox, I do not believe we spent time on configuring it to reuse existing virtualenv. If that is an option we can try that.
Issue Time Tracking --- Worklog Id: (was: 337633) Time Spent: 17h (was: 16h 50m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 17h > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > >
[jira] [Work logged] (BEAM-8442) Unify bundle register in Python SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8442?focusedWorklogId=337632&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337632 ] ASF GitHub Bot logged work on BEAM-8442: Author: ASF GitHub Bot Created on: 02/Nov/19 00:40 Start Date: 02/Nov/19 00:40 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #9842: [BEAM-8442] Remove duplicate code for bundle register in Python SDK harness URL: https://github.com/apache/beam/pull/9842#issuecomment-548992947 Yes, there was a similar issue with Java in the past. It is best if registration is synchronous since then any ProcessBundleRequest can error out if the descriptor isn't available. Issue Time Tracking --- Worklog Id: (was: 337632) Time Spent: 1h 40m (was: 1.5h) > Unify bundle register in Python SDK harness > --- > > Key: BEAM-8442 > URL: https://issues.apache.org/jira/browse/BEAM-8442 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.18.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > There are two methods for bundle register in Python SDK harness: > `SdkHarness._request_register` and `SdkWorker.register.` It should be unified.
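The rationale for synchronous registration in the comment above can be sketched in plain Python. This is a toy illustration, not the actual SDK harness API; the class and method names only mirror the ones mentioned in the issue:

```python
class SdkWorker:
    """Toy sketch: bundle descriptors must be registered before bundles run."""

    def __init__(self):
        self.bundle_descriptors = {}

    def register(self, descriptors):
        # Synchronous: when this returns, every descriptor is stored, so the
        # runner may immediately send process-bundle requests that use them.
        for descriptor in descriptors:
            self.bundle_descriptors[descriptor["id"]] = descriptor

    def process_bundle(self, descriptor_id):
        # Because registration completed before returning, a missing id here
        # is a real error, not a race with an in-flight async register.
        if descriptor_id not in self.bundle_descriptors:
            raise KeyError("Unknown bundle descriptor: %s" % descriptor_id)
        return "processed %s" % descriptor_id
```

With an asynchronous register, the lookup in `process_bundle` could fail transiently even for a descriptor the runner had already sent, which is the race the comment warns about.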
[jira] [Work logged] (BEAM-8254) (Java SDK) Add workerRegion and workerZone options
[ https://issues.apache.org/jira/browse/BEAM-8254?focusedWorklogId=337630&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337630 ] ASF GitHub Bot logged work on BEAM-8254: Author: ASF GitHub Bot Created on: 02/Nov/19 00:39 Start Date: 02/Nov/19 00:39 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9961: [BEAM-8254] add workerRegion and workerZone options to Java SDK URL: https://github.com/apache/beam/pull/9961#discussion_r341790434 ## File path: runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java ## @@ -357,6 +361,36 @@ public static DataflowRunner fromOptions(PipelineOptions options) { return new DataflowRunner(dataflowOptions); } + @VisibleForTesting + static void validateWorkerSettings(GcpOptions gcpOptions) { +Preconditions.checkArgument( +gcpOptions.getZone() == null || gcpOptions.getWorkerRegion() == null, +"Cannot use option zone with workerRegion."); Review comment: "Cannot use option zone with workerRegion." -> "Cannot use option zone with workerRegion. Prefer workerRegion and workerZone." ? Issue Time Tracking --- Worklog Id: (was: 337630) Time Spent: 20m (was: 10m) > (Java SDK) Add workerRegion and workerZone options > -- > > Key: BEAM-8254 > URL: https://issues.apache.org/jira/browse/BEAM-8254 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow, sdk-java-core >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h >
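The Preconditions check under review enforces mutually exclusive placement options. A hedged Python analogue (option names follow the Java accessors, the helper name is invented, and the second check is an assumption about how workerRegion and workerZone relate; this is a sketch, not Beam's actual validation):

```python
def validate_worker_settings(zone=None, worker_region=None, worker_zone=None):
    """Reject combinations of mutually exclusive worker placement options."""
    if zone is not None and worker_region is not None:
        # Mirrors the checkArgument in the diff above.
        raise ValueError("Cannot use option zone with workerRegion.")
    if zone is not None and worker_zone is not None:
        raise ValueError("Cannot use option zone with workerZone.")
    if worker_region is not None and worker_zone is not None:
        # Assumed: region and zone are alternative, not complementary, settings.
        raise ValueError("Cannot use option workerRegion with workerZone.")
```

The `checkArgument` pattern fails fast at pipeline construction, which is why the review focuses on making the error message actionable.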
[jira] [Work logged] (BEAM-8252) (Python SDK) Add worker_region and worker_zone options
[ https://issues.apache.org/jira/browse/BEAM-8252?focusedWorklogId=337631&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337631 ] ASF GitHub Bot logged work on BEAM-8252: Author: ASF GitHub Bot Created on: 02/Nov/19 00:39 Start Date: 02/Nov/19 00:39 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9594: [BEAM-8252] Python: add worker_region and worker_zone options URL: https://github.com/apache/beam/pull/9594#issuecomment-548992867 Ack. Reviewed #9961 Issue Time Tracking --- Worklog Id: (was: 337631) Time Spent: 1h 10m (was: 1h) > (Python SDK) Add worker_region and worker_zone options > -- > > Key: BEAM-8252 > URL: https://issues.apache.org/jira/browse/BEAM-8252 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow, sdk-py-core >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: 2.18.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h >
[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=337629&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337629 ] ASF GitHub Bot logged work on BEAM-8539: Author: ASF GitHub Bot Created on: 02/Nov/19 00:37 Start Date: 02/Nov/19 00:37 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9969: [BEAM-8539] Provide an initial definition of all job states and the state transition diagram URL: https://github.com/apache/beam/pull/9969#discussion_r341790390 ## File path: model/job-management/src/main/proto/beam_job_api.proto ## @@ -201,6 +201,16 @@ message JobMessagesResponse { } // Enumeration of all JobStates // // The state transition diagram is: // STOPPED -> STARTING -> RUNNING -> DONE // \> FAILED // \> CANCELLING -> CANCELLED // \> UPDATING -> UPDATED // \> DRAINING -> DRAINED Review comment: Is there someone we can loop into this conversation who would know? Issue Time Tracking --- Worklog Id: (was: 337629) Time Spent: 1h 50m (was: 1h 40m) > Clearly define the valid job state transitions > -- > > Key: BEAM-8539 > URL: https://issues.apache.org/jira/browse/BEAM-8539 > Project: Beam > Issue Type: Improvement > Components: beam-model, runner-core, sdk-java-core, sdk-py-core >Reporter: Chad Dombrova >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > The Beam job state transitions are ill-defined, which is a big problem for > anything that relies on the values coming from JobAPI.GetStateStream. > I was hoping to find something like a state transition diagram in the docs so > that I could determine the start state, the terminal states, and the valid > transitions, but I could not find this. 
The code reveals that the SDKs differ > on the fundamentals: > Java InMemoryJobService: > * start state: *STOPPED* > * run - about to submit to executor: STARTING > * run - actually running on executor: RUNNING > * terminal states: DONE, FAILED, CANCELLED, DRAINED > Python AbstractJobServiceServicer / LocalJobServicer: > * start state: STARTING > * terminal states: DONE, FAILED, CANCELLED, *STOPPED* > I think it would be good to make python work like Java, so that there is a > difference in state between a job that has been prepared and one that has > additionally been run. > It's hard to tell how far this problem has spread within the various runners. > I think a simple thing that can be done to help standardize behavior is to > implement the terminal states as an enum in the beam_job_api.proto, or create > a utility function in each language for checking if a state is terminal, so > that it's not left up to each runner to reimplement this logic. >
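The suggestion above, a shared definition of terminal states plus a small utility so each runner does not re-derive the set, could look like the following sketch. The state names follow the transition diagram quoted in the review; this is illustrative, not the actual beam_job_api.proto contents:

```python
import enum


class JobState(enum.Enum):
    STOPPED = "STOPPED"
    STARTING = "STARTING"
    RUNNING = "RUNNING"
    DONE = "DONE"
    FAILED = "FAILED"
    CANCELLING = "CANCELLING"
    CANCELLED = "CANCELLED"
    UPDATING = "UPDATING"
    UPDATED = "UPDATED"
    DRAINING = "DRAINING"
    DRAINED = "DRAINED"


# One shared definition instead of each runner reimplementing this set.
TERMINAL_STATES = frozenset(
    {JobState.DONE, JobState.FAILED, JobState.CANCELLED,
     JobState.UPDATED, JobState.DRAINED})


def is_terminal(state):
    """True if no further transitions are possible from `state`."""
    return state in TERMINAL_STATES
```

Anything waiting on JobAPI.GetStateStream could then stop at `is_terminal(state)` instead of hard-coding its own list of end states.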
[jira] [Work logged] (BEAM-8472) Get default GCP region from gcloud
[ https://issues.apache.org/jira/browse/BEAM-8472?focusedWorklogId=337628&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337628 ] ASF GitHub Bot logged work on BEAM-8472: Author: ASF GitHub Bot Created on: 02/Nov/19 00:34 Start Date: 02/Nov/19 00:34 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9974: [BEAM-8472] Get default GCP region from gcloud (Java) URL: https://github.com/apache/beam/pull/9974#discussion_r341790105 ## File path: runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java ## @@ -357,6 +357,50 @@ public static DataflowRunner fromOptions(PipelineOptions options) { return new DataflowRunner(dataflowOptions); } + /** + * Get a default value for Google Cloud region according to + * https://cloud.google.com/compute/docs/gcloud-compute/#default-properties. If no other default + * can be found, returns "us-central1". + */ + static String getDefaultGcpRegion() { +String environmentRegion = System.getenv("CLOUDSDK_COMPUTE_REGION"); +if (environmentRegion != null && !environmentRegion.isEmpty()) { + LOG.info("Using default GCP region {} from $CLOUDSDK_COMPUTE_REGION", environmentRegion); + return environmentRegion; +} +try { + ProcessBuilder pb = + new ProcessBuilder(Arrays.asList("gcloud", "config", "get-value", "compute/region")); + Process process = pb.start(); + BufferedReader reader = + new BufferedReader( + new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8)); + BufferedReader errorReader = + new BufferedReader( + new InputStreamReader(process.getErrorStream(), StandardCharsets.UTF_8)); + process.waitFor(1, TimeUnit.SECONDS); Review comment: - Is 1 second enough? - Should we check the return value of this call? (IIUC it returns false for timeout.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337628) Time Spent: 1h 50m (was: 1h 40m) > Get default GCP region from gcloud > -- > > Key: BEAM-8472 > URL: https://issues.apache.org/jira/browse/BEAM-8472 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > Currently, we default to us-central1 if --region flag is not set. The Google > Cloud SDK generally tries to get a default value in this case for > convenience, which we should follow. > [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties] -- This message was sent by Atlassian Jira (v8.3.4#803005)
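The review questions above concern the shelling-out fallback itself: is a one-second timeout enough, and should the return value of waitFor be checked? The same lookup order can be sketched in Python, where subprocess.run makes the timeout handling explicit. The function name and the one-second default are ours, mirroring the Java under review:

```python
import os
import subprocess


def get_default_gcp_region(timeout_s: float = 1.0) -> str:
    """Resolve a default GCP region, mirroring gcloud's order of precedence.

    1. $CLOUDSDK_COMPUTE_REGION, 2. `gcloud config get-value compute/region`,
    3. hard-coded fallback "us-central1".
    """
    env_region = os.environ.get("CLOUDSDK_COMPUTE_REGION", "")
    if env_region:
        return env_region
    try:
        # check=False: an unset gcloud property exits non-zero; we just
        # fall through to the hard-coded default in that case.
        result = subprocess.run(
            ["gcloud", "config", "get-value", "compute/region"],
            capture_output=True, text=True, timeout=timeout_s, check=False)
        region = result.stdout.strip()
        if region and region != "(unset)":
            return region
    except (OSError, subprocess.TimeoutExpired):
        pass  # gcloud missing or too slow: use the fallback.
    return "us-central1"
```

Unlike Java's `Process.waitFor(timeout, unit)`, whose boolean result is easy to ignore (the reviewer's point), `subprocess.run` raises `TimeoutExpired`, so the timeout path cannot be silently skipped.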
[jira] [Work logged] (BEAM-8545) don't docker pull before docker run
[ https://issues.apache.org/jira/browse/BEAM-8545?focusedWorklogId=337627&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337627 ] ASF GitHub Bot logged work on BEAM-8545: Author: ASF GitHub Bot Created on: 02/Nov/19 00:29 Start Date: 02/Nov/19 00:29 Worklog Time Spent: 10m Work Description: ihji commented on issue #9972: [BEAM-8545] don't docker pull before docker run URL: https://github.com/apache/beam/pull/9972#issuecomment-548991708 Run XVR_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337627) Time Spent: 50m (was: 40m) > don't docker pull before docker run > > > Key: BEAM-8545 > URL: https://issues.apache.org/jira/browse/BEAM-8545 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Since 'docker run' automatically pulls when the image doesn't exist locally, > I think it's safe to remove the explicit 'docker pull' before 'docker run'. > Without 'docker pull', we won't update the local image with the remote image > (for the same tag), but this shouldn't be a problem in production, since a unique tag > is assumed for each released version. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-8243) Document behavior of FlinkRunner
[ https://issues.apache.org/jira/browse/BEAM-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965162#comment-16965162 ] Kyle Weaver edited comment on BEAM-8243 at 11/2/19 12:12 AM: - Adding BEAM-8512 as a blocker as a way of saying I don't want to advertise features we haven't fully tested yet :) [~thw] Do you mean older versions of FlinkRunner, or the "traditional" process of job submission? was (Author: ibzib): Adding BEAM-8512 as a way of saying I don't want to advertise features we haven't fully tested yet :) [~thw] Do you mean older versions of FlinkRunner, or the "traditional" process of job submission? > Document behavior of FlinkRunner > > > Key: BEAM-8243 > URL: https://issues.apache.org/jira/browse/BEAM-8243 > Project: Beam > Issue Type: Improvement > Components: runner-flink, website >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > > The Flink runner guide should include a couple details > 1) FlinkRunner pulls the job server jar from Maven by default (need to make > this explicit in case of firewall concerns) > 2) how to override in case the above is a problem -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8243) Document behavior of FlinkRunner
[ https://issues.apache.org/jira/browse/BEAM-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965164#comment-16965164 ] Kyle Weaver commented on BEAM-8243: --- Fair enough. > Document behavior of FlinkRunner > > > Key: BEAM-8243 > URL: https://issues.apache.org/jira/browse/BEAM-8243 > Project: Beam > Issue Type: Improvement > Components: runner-flink, website >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > > The Flink runner guide should include a couple details > 1) FlinkRunner pulls the job server jar from Maven by default (need to make > this explicit in case of firewall concerns) > 2) how to override in case the above is a problem -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8243) Document behavior of FlinkRunner
[ https://issues.apache.org/jira/browse/BEAM-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965163#comment-16965163 ] Thomas Weise commented on BEAM-8243: I meant older Beam releases; we should probably only advertise the latest for portability. But the "traditional" process of job submission is also too scary for a first-time user :) > Document behavior of FlinkRunner > > > Key: BEAM-8243 > URL: https://issues.apache.org/jira/browse/BEAM-8243 > Project: Beam > Issue Type: Improvement > Components: runner-flink, website >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > > The Flink runner guide should include a couple details > 1) FlinkRunner pulls the job server jar from Maven by default (need to make > this explicit in case of firewall concerns) > 2) how to override in case the above is a problem -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=337624&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337624 ] ASF GitHub Bot logged work on BEAM-8539: Author: ASF GitHub Bot Created on: 02/Nov/19 00:07 Start Date: 02/Nov/19 00:07 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #9969: [BEAM-8539] Provide an initial definition of all job states and the state transition diagram URL: https://github.com/apache/beam/pull/9969#discussion_r341787418 ## File path: model/job-management/src/main/proto/beam_job_api.proto ## @@ -201,6 +201,16 @@ message JobMessagesResponse { } // Enumeration of all JobStates +// +// The state transition diagram is: +// STOPPED -> STARTING -> RUNNING -> DONE +// \> FAILED +// \> CANCELLING -> CANCELLED +// \> UPDATING -> UPDATED +// \> DRAINING -> DRAINED Review comment: That should either be DONE, FAILED, or CANCELLED This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337624) Time Spent: 1h 40m (was: 1.5h) > Clearly define the valid job state transitions > -- > > Key: BEAM-8539 > URL: https://issues.apache.org/jira/browse/BEAM-8539 > Project: Beam > Issue Type: Improvement > Components: beam-model, runner-core, sdk-java-core, sdk-py-core >Reporter: Chad Dombrova >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > The Beam job state transitions are ill-defined, which is big problem for > anything that relies on the values coming from JobAPI.GetStateStream. > I was hoping to find something like a state transition diagram in the docs so > that I could determine the start state, the terminal states, and the valid > transitions, but I could not find this. 
The code reveals that the SDKs differ > on the fundamentals: > Java InMemoryJobService: > * start state: *STOPPED* > * run - about to submit to executor: STARTING > * run - actually running on executor: RUNNING > * terminal states: DONE, FAILED, CANCELLED, DRAINED > Python AbstractJobServiceServicer / LocalJobServicer: > * start state: STARTING > * terminal states: DONE, FAILED, CANCELLED, *STOPPED* > I think it would be good to make python work like Java, so that there is a > difference in state between a job that has been prepared and one that has > additionally been run. > It's hard to tell how far this problem has spread within the various runners. > I think a simple thing that can be done to help standardize behavior is to > implement the terminal states as an enum in the beam_job_api.proto, or create > a utility function in each language for checking if a state is terminal, so > that it's not left up to each runner to reimplement this logic. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-8243) Document behavior of FlinkRunner
[ https://issues.apache.org/jira/browse/BEAM-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965162#comment-16965162 ] Kyle Weaver edited comment on BEAM-8243 at 11/2/19 12:07 AM: - Adding BEAM-8512 as a way of saying I don't want to advertise features we haven't fully tested yet :) [~thw] Do you mean older versions of FlinkRunner, or the "traditional" process of job submission? was (Author: ibzib): Adding BEAM-8512 as a way of saying I don't want to advertise features we haven't fully tested yet :) [~thw] What do you mean? > Document behavior of FlinkRunner > > > Key: BEAM-8243 > URL: https://issues.apache.org/jira/browse/BEAM-8243 > Project: Beam > Issue Type: Improvement > Components: runner-flink, website >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > > The Flink runner guide should include a couple details > 1) FlinkRunner pulls the job server jar from Maven by default (need to make > this explicit in case of firewall concerns) > 2) how to override in case the above is a problem -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8243) Document behavior of FlinkRunner
[ https://issues.apache.org/jira/browse/BEAM-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965162#comment-16965162 ] Kyle Weaver commented on BEAM-8243: --- Adding BEAM-8512 as a way of saying I don't want to advertise features we haven't fully tested yet :) [~thw] What do you mean? > Document behavior of FlinkRunner > > > Key: BEAM-8243 > URL: https://issues.apache.org/jira/browse/BEAM-8243 > Project: Beam > Issue Type: Improvement > Components: runner-flink, website >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > > The Flink runner guide should include a couple details > 1) FlinkRunner pulls the job server jar from Maven by default (need to make > this explicit in case of firewall concerns) > 2) how to override in case the above is a problem -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8472) Get default GCP region from gcloud
[ https://issues.apache.org/jira/browse/BEAM-8472?focusedWorklogId=337623&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337623 ] ASF GitHub Bot logged work on BEAM-8472: Author: ASF GitHub Bot Created on: 01/Nov/19 23:54 Start Date: 01/Nov/19 23:54 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #9974: [BEAM-8472] Get default GCP region from gcloud (Java) URL: https://github.com/apache/beam/pull/9974 Same as #9868 but more verbose :smile: As with the Python version, I've verified this works locally. This doesn't fit into a unit test because it requires special environment configuration and/or calling an external process, but nor does it seem to warrant an integration test.
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=337620&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337620 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 01/Nov/19 23:52 Start Date: 01/Nov/19 23:52 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #9892: [BEAM-8427] [SQL] buildIOWrite from MongoDb Table URL: https://github.com/apache/beam/pull/9892#discussion_r341785685 ## File path: sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbReadWriteIT.java ## @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.meta.provider.mongodb; + +import static org.apache.beam.sdk.schemas.Schema.FieldType.BOOLEAN; +import static org.apache.beam.sdk.schemas.Schema.FieldType.BYTE; +import static org.apache.beam.sdk.schemas.Schema.FieldType.DOUBLE; +import static org.apache.beam.sdk.schemas.Schema.FieldType.FLOAT; +import static org.apache.beam.sdk.schemas.Schema.FieldType.INT16; +import static org.apache.beam.sdk.schemas.Schema.FieldType.INT32; +import static org.apache.beam.sdk.schemas.Schema.FieldType.INT64; +import static org.apache.beam.sdk.schemas.Schema.FieldType.STRING; +import static org.junit.Assert.assertEquals; + +import com.mongodb.MongoClient; +import java.util.Arrays; +import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils; +import org.apache.beam.sdk.io.mongodb.MongoDBIOIT.MongoDBPipelineOptions; +import org.apache.beam.sdk.options.PipelineOptionsFactory; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; +import org.junit.AfterClass; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** + * A test of {@link org.apache.beam.sdk.extensions.sql.meta.provider.mongodb.MongoDbTable} on an + * independent Mongo instance. + * + * This test requires a running instance of MongoDB. 
Pass in connection information using + * PipelineOptions: + * + * + * ./gradlew integrationTest -p sdks/java/extensions/sql/integrationTest -DintegrationTestPipelineOptions='[ + * "--mongoDBHostName=1.2.3.4", + * "--mongoDBPort=27017", + * "--mongoDBDatabaseName=mypass", + * "--numberOfRecords=1000" ]' + * --tests org.apache.beam.sdk.extensions.sql.meta.provider.mongodb.MongoDbReadWriteIT + * -DintegrationTestRunner=direct + * + * + * A database, specified in the pipeline options, will be created implicitly if it does not exist + * already. And dropped upon completing tests. + * + * Please see 'build_rules.gradle' file for instructions regarding running this test using Beam + * performance testing framework. + */ +@RunWith(JUnit4.class) +public class MongoDbReadWriteIT { + private static final Schema SOURCE_SCHEMA = + Schema.builder() + .addNullableField("_id", STRING) + .addNullableField("c_bigint", INT64) + .addNullableField("c_tinyint", BYTE) + .addNullableField("c_smallint", INT16) + .addNullableField("c_integer", INT32) + .addNullableField("c_float", FLOAT) + .addNullableField("c_double", DOUBLE) + .addNullableField("c_boolean", BOOLEAN) + .addNullableField("c_varchar", STRING) + .addNullableField("c_arr", FieldType.array(STRING)) + .build(); + private static final String collection = "collection"; + private static MongoDBPipelineOptions options; + + @Rule public final TestPipeline writePipeline = TestPipeline.create(); + @Rule public final TestPipeline readPipeline = TestPipeline.create(); + + @BeforeClass + public static void setUp() throws Exception {
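For readers skimming the quoted test: each nullable field in SOURCE_SCHEMA corresponds to one key in a MongoDB document. A sketch of such a document (field names come from the schema above; the values are invented for illustration):

```python
# One MongoDB document shaped like a row of the test's SOURCE_SCHEMA.
# Field names come from the quoted test; the values are made up.
doc = {
    "_id": "object-id-1",              # STRING
    "c_bigint": 9223372036854775807,   # INT64
    "c_tinyint": 127,                  # BYTE
    "c_smallint": 32767,               # INT16
    "c_integer": 2147483647,           # INT32
    "c_float": 1.0,                    # FLOAT
    "c_double": 1.0,                   # DOUBLE
    "c_boolean": True,                 # BOOLEAN
    "c_varchar": "varchar",            # STRING
    "c_arr": ["str1", "str2"],         # ARRAY<STRING>
}

# The full set of row fields the integration test reads and writes.
expected_fields = {
    "_id", "c_bigint", "c_tinyint", "c_smallint", "c_integer",
    "c_float", "c_double", "c_boolean", "c_varchar", "c_arr",
}
```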
[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source
[ https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=337621&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337621 ] ASF GitHub Bot logged work on BEAM-8427: Author: ASF GitHub Bot Created on: 01/Nov/19 23:52 Start Date: 01/Nov/19 23:52 Worklog Time Spent: 10m Work Description: 11moon11 commented on pull request #9892: [BEAM-8427] [SQL] buildIOWrite from MongoDb Table URL: https://github.com/apache/beam/pull/9892#discussion_r341785685 ## File path: sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbReadWriteIT.java ## @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.meta.provider.mongodb; + +import static org.apache.beam.sdk.schemas.Schema.FieldType.BOOLEAN; +import static org.apache.beam.sdk.schemas.Schema.FieldType.BYTE; +import static org.apache.beam.sdk.schemas.Schema.FieldType.DOUBLE; +import static org.apache.beam.sdk.schemas.Schema.FieldType.FLOAT; +import static org.apache.beam.sdk.schemas.Schema.FieldType.INT16; +import static org.apache.beam.sdk.schemas.Schema.FieldType.INT32; +import static org.apache.beam.sdk.schemas.Schema.FieldType.INT64; +import static org.apache.beam.sdk.schemas.Schema.FieldType.STRING; +import static org.junit.Assert.assertEquals; + +import com.mongodb.MongoClient; +import java.util.Arrays; +import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils; +import org.apache.beam.sdk.io.mongodb.MongoDBIOIT.MongoDBPipelineOptions; +import org.apache.beam.sdk.options.PipelineOptionsFactory; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.Row; +import org.junit.AfterClass; +import org.junit.BeforeClass; +import org.junit.Rule; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** + * A test of {@link org.apache.beam.sdk.extensions.sql.meta.provider.mongodb.MongoDbTable} on an + * independent Mongo instance. + * + * This test requires a running instance of MongoDB. 
Pass in connection information using + * PipelineOptions: + * + * + * ./gradlew integrationTest -p sdks/java/extensions/sql/integrationTest -DintegrationTestPipelineOptions='[ + * "--mongoDBHostName=1.2.3.4", + * "--mongoDBPort=27017", + * "--mongoDBDatabaseName=mypass", + * "--numberOfRecords=1000" ]' + * --tests org.apache.beam.sdk.extensions.sql.meta.provider.mongodb.MongoDbReadWriteIT + * -DintegrationTestRunner=direct + * + * + * A database, specified in the pipeline options, will be created implicitly if it does not exist + * already. And dropped upon completing tests. + * + * Please see 'build_rules.gradle' file for instructions regarding running this test using Beam + * performance testing framework. + */ +@RunWith(JUnit4.class) +public class MongoDbReadWriteIT { + private static final Schema SOURCE_SCHEMA = + Schema.builder() + .addNullableField("_id", STRING) + .addNullableField("c_bigint", INT64) + .addNullableField("c_tinyint", BYTE) + .addNullableField("c_smallint", INT16) + .addNullableField("c_integer", INT32) + .addNullableField("c_float", FLOAT) + .addNullableField("c_double", DOUBLE) + .addNullableField("c_boolean", BOOLEAN) + .addNullableField("c_varchar", STRING) + .addNullableField("c_arr", FieldType.array(STRING)) + .build(); + private static final String collection = "collection"; + private static MongoDBPipelineOptions options; + + @Rule public final TestPipeline writePipeline = TestPipeline.create(); + @Rule public final TestPipeline readPipeline = TestPipeline.create(); + + @BeforeClass + public static void setUp() throws Exception {
[jira] [Work logged] (BEAM-8146) SchemaCoder/RowCoder have no equals() function
[ https://issues.apache.org/jira/browse/BEAM-8146?focusedWorklogId=337616&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337616 ] ASF GitHub Bot logged work on BEAM-8146: Author: ASF GitHub Bot Created on: 01/Nov/19 23:23 Start Date: 01/Nov/19 23:23 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #9493: [BEAM-8146,BEAM-8204,BEAM-8205] Add equals and hashCode to SchemaCoder and RowCoder URL: https://github.com/apache/beam/pull/9493#issuecomment-548982144 +1 on merging @kennknowles This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337616) Time Spent: 3h 20m (was: 3h 10m) > SchemaCoder/RowCoder have no equals() function > -- > > Key: BEAM-8146 > URL: https://issues.apache.org/jira/browse/BEAM-8146 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.15.0 >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > SchemaCoder has no equals function, so it can't be compared in tests, like > CloudComponentsTests$DefaultCoders, which is being re-enabled in > https://github.com/apache/beam/pull/9446 -- This message was sent by Atlassian Jira (v8.3.4#803005)
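The underlying problem is identity versus value equality: without equals()/hashCode(), two structurally identical coders compare unequal, so tests like CloudComponentsTests$DefaultCoders can never match them. A toy Python analogue of the fix (class and field names are illustrative; the real change is to the Java SchemaCoder/RowCoder):

```python
class RowCoder:
    """Toy stand-in for a schema-aware coder; the name is illustrative only.

    Two coders over the same schema should be interchangeable, so equality
    and hashing are defined in terms of the schema, not object identity.
    """

    def __init__(self, field_names):
        self.schema = tuple(field_names)  # the schema defines coder identity

    def __eq__(self, other):
        return type(other) is type(self) and self.schema == other.schema

    def __hash__(self):
        # Required alongside __eq__ so equal coders hash identically.
        return hash((type(self), self.schema))
```

Without these two methods, Python (like Java) falls back to identity comparison, and `RowCoder(s) == RowCoder(s)` is False for two separately constructed instances.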
[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=337614&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337614 ] ASF GitHub Bot logged work on BEAM-7926: Author: ASF GitHub Bot Created on: 01/Nov/19 23:19 Start Date: 01/Nov/19 23:19 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9741: [BEAM-7926] Visualize PCollection URL: https://github.com/apache/beam/pull/9741#issuecomment-548981473 Squashed and merged after Ahmet LGTMd, and myself as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337614) Time Spent: 19h 50m (was: 19h 40m) > Visualize PCollection with Interactive Beam > --- > > Key: BEAM-7926 > URL: https://issues.apache.org/jira/browse/BEAM-7926 > Project: Beam > Issue Type: New Feature > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 19h 50m > Remaining Estimate: 0h > > Support auto plotting / charting of materialized data of a given PCollection > with Interactive Beam. > Say an Interactive Beam pipeline is defined as > p = create_pipeline() > pcoll = p | 'Transform' >> transform() > The user can call a single function and get auto-magical charting of the data > as materialized pcoll. > e.g., visualize(pcoll) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=337613&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337613 ] ASF GitHub Bot logged work on BEAM-7926: Author: ASF GitHub Bot Created on: 01/Nov/19 23:18 Start Date: 01/Nov/19 23:18 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9741: [BEAM-7926] Visualize PCollection URL: https://github.com/apache/beam/pull/9741 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337613) Time Spent: 19h 40m (was: 19.5h) > Visualize PCollection with Interactive Beam > --- > > Key: BEAM-7926 > URL: https://issues.apache.org/jira/browse/BEAM-7926 > Project: Beam > Issue Type: New Feature > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 19h 40m > Remaining Estimate: 0h > > Support auto plotting / charting of materialized data of a given PCollection > with Interactive Beam. > Say an Interactive Beam pipeline is defined as > p = create_pipeline() > pcoll = p | 'Transform' >> transform() > The user can call a single function and get auto-magical charting of the data > as materialized pcoll. > e.g., visualize(pcoll) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=337612&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337612 ] ASF GitHub Bot logged work on BEAM-7013: Author: ASF GitHub Bot Created on: 01/Nov/19 23:16 Start Date: 01/Nov/19 23:16 Worklog Time Spent: 10m Work Description: robinyqiu commented on issue #9778: [BEAM-7013] Update BigQueryHllSketchCompatibilityIT to cover empty sketch cases URL: https://github.com/apache/beam/pull/9778#issuecomment-548981025 > is there any way we could make the conversion functors (parseQueryResultToByteArray, and the inlined one for the other direction) available to users as utility objects or methods? I have considered this but unfortunately we cannot do that, because in the same function users might want to parse other fields. However, I did find a way to extract part of the logic into its own function. See the `getSketchFromByteBuffer` function. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337612) Time Spent: 36h 20m (was: 36h 10m) > A new count distinct transform based on BigQuery compatible HyperLogLog++ > implementation > > > Key: BEAM-7013 > URL: https://issues.apache.org/jira/browse/BEAM-7013 > Project: Beam > Issue Type: New Feature > Components: extensions-java-sketching, sdk-java-core >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Fix For: 2.16.0 > > Time Spent: 36h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
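The byte-buffer parsing discussed above (the Java `getSketchFromByteBuffer` helper) can be sketched in Python. One relevant detail is that the BigQuery REST API returns BYTES columns base64-encoded in JSON query results. The `sketch` field name and the convention of representing an empty/NULL sketch as empty bytes below are illustrative assumptions, not the PR's actual code.

```python
import base64

def get_sketch_from_field(field_value):
    """Decode a BYTES column from a BigQuery JSON query result into raw
    sketch bytes; treat a NULL field as an empty sketch (b"")."""
    if field_value is None:
        return b""
    return base64.b64decode(field_value)

# Hypothetical result row; 'sketch' is an illustrative field name, and
# other fields in the same row would be parsed separately by the caller.
row = {"sketch": base64.b64encode(b"\x01\x02hll").decode("ascii")}
sketch = get_sketch_from_field(row["sketch"])
```

Extracting the decode step like this is what lets the surrounding row-parsing function stay free to handle its other fields, which was the reviewer's concern.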
[jira] [Work logged] (BEAM-8467) Enable reading compressed files with Python fileio
[ https://issues.apache.org/jira/browse/BEAM-8467?focusedWorklogId=337611&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337611 ] ASF GitHub Bot logged work on BEAM-8467: Author: ASF GitHub Bot Created on: 01/Nov/19 23:15 Start Date: 01/Nov/19 23:15 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9861: [BEAM-8467] Enabling reading compressed files URL: https://github.com/apache/beam/pull/9861#issuecomment-548980903 Run Python Precommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337611) Time Spent: 1h (was: 50m) > Enable reading compressed files with Python fileio > -- > > Key: BEAM-8467 > URL: https://issues.apache.org/jira/browse/BEAM-8467 > Project: Beam > Issue Type: Improvement > Components: io-py-files >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8467) Enable reading compressed files with Python fileio
[ https://issues.apache.org/jira/browse/BEAM-8467?focusedWorklogId=337610&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337610 ] ASF GitHub Bot logged work on BEAM-8467: Author: ASF GitHub Bot Created on: 01/Nov/19 23:15 Start Date: 01/Nov/19 23:15 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9861: [BEAM-8467] Enabling reading compressed files URL: https://github.com/apache/beam/pull/9861#issuecomment-548980872 Run Python2_PVR_Flink Precommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337610) Time Spent: 50m (was: 40m) > Enable reading compressed files with Python fileio > -- > > Key: BEAM-8467 > URL: https://issues.apache.org/jira/browse/BEAM-8467 > Project: Beam > Issue Type: Improvement > Components: io-py-files >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8435) Allow access to PaneInfo from Python DoFns
[ https://issues.apache.org/jira/browse/BEAM-8435?focusedWorklogId=337597&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337597 ] ASF GitHub Bot logged work on BEAM-8435: Author: ASF GitHub Bot Created on: 01/Nov/19 22:46 Start Date: 01/Nov/19 22:46 Worklog Time Spent: 10m Work Description: robertwb commented on issue #9836: [BEAM-8435] Implement PaneInfo computation for Python. URL: https://github.com/apache/beam/pull/9836#issuecomment-548975390 Thanks. Fixing the lint error and ensuring all tests pass before merging. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337597) Time Spent: 1h 40m (was: 1.5h) > Allow access to PaneInfo from Python DoFns > -- > > Key: BEAM-8435 > URL: https://issues.apache.org/jira/browse/BEAM-8435 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > PaneInfoParam exists, but the plumbing to actually populate it at runtime was > never added. (Nor, clearly, were any tests...) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8544) Install Beam SDK with ccache for faster re-install.
[ https://issues.apache.org/jira/browse/BEAM-8544?focusedWorklogId=337596&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337596 ] ASF GitHub Bot logged work on BEAM-8544: Author: ASF GitHub Bot Created on: 01/Nov/19 22:44 Start Date: 01/Nov/19 22:44 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9966: [BEAM-8544] Use ccache for compiling the Beam Python SDK. URL: https://github.com/apache/beam/pull/9966#discussion_r341774895 ## File path: sdks/python/container/Dockerfile ## @@ -43,9 +45,13 @@ RUN \ # Remove pip cache. rm -rf /root/.cache/pip +# Configure ccache prior to installing the SDK. +RUN ln -s /usr/bin/ccache /usr/local/bin/gcc +# These parameters are needed as pip compiles artifacts in random temporary directories. +RUN ccache --set-config=sloppiness=file_macro && ccache --set-config=hash_dir=false COPY target/apache-beam.tar.gz /opt/apache/beam/tars/ -RUN pip install /opt/apache/beam/tars/apache-beam.tar.gz[gcp] && \ +RUN pip install -v /opt/apache/beam/tars/apache-beam.tar.gz[gcp] && \ # Remove pip cache. rm -rf /root/.cache/pip Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337596) Time Spent: 50m (was: 40m) > Install Beam SDK with ccache for faster re-install. > --- > > Key: BEAM-8544 > URL: https://issues.apache.org/jira/browse/BEAM-8544 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Re-compiling the C modules of the SDK takes 2-3 minutes. This adds to worker > startup time whenever a custom SDK is being used (in particular, during > development and testing). 
We can use ccache to re-use the old compile results > when the Cython files have not changed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8491) Add ability for multiple output PCollections from composites
[ https://issues.apache.org/jira/browse/BEAM-8491?focusedWorklogId=337588&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337588 ] ASF GitHub Bot logged work on BEAM-8491: Author: ASF GitHub Bot Created on: 01/Nov/19 22:30 Start Date: 01/Nov/19 22:30 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #9912: [BEAM-8491] Add ability for replacing transforms with multiple outputs URL: https://github.com/apache/beam/pull/9912#issuecomment-548972093 Thanks Cham! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337588) Time Spent: 1h 20m (was: 1h 10m) > Add ability for multiple output PCollections from composites > > > Key: BEAM-8491 > URL: https://issues.apache.org/jira/browse/BEAM-8491 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > The Python SDK has DoOutputTuples which allows for a single transform to have > multiple outputs. However, this does not include the ability for a composite > transform to have multiple output PCollections from different transforms. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8491) Add ability for multiple output PCollections from composites
[ https://issues.apache.org/jira/browse/BEAM-8491?focusedWorklogId=337586&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337586 ] ASF GitHub Bot logged work on BEAM-8491: Author: ASF GitHub Bot Created on: 01/Nov/19 22:28 Start Date: 01/Nov/19 22:28 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #9912: [BEAM-8491] Add ability for replacing transforms with multiple outputs URL: https://github.com/apache/beam/pull/9912#discussion_r341771799 ## File path: sdks/python/apache_beam/pipeline_test.py ## @@ -438,6 +438,101 @@ def get_replacement_transform(self, ptransform): p.replace_all([override]) self.assertEqual(pcoll.producer.inputs[0].element_type, expected_type) + + def test_ptransform_override_multiple_outputs(self): +class _MultiParDoComposite(PTransform): Review comment: Nope! Thanks for the catch! Deleted. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337586) Time Spent: 1h 10m (was: 1h) > Add ability for multiple output PCollections from composites > > > Key: BEAM-8491 > URL: https://issues.apache.org/jira/browse/BEAM-8491 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > The Python SDK has DoOutputTuples which allows for a single transform to have > multiple outputs. However, this does not include the ability for a composite > transform to have multiple output PCollections from different transforms. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8491) Add ability for multiple output PCollections from composites
[ https://issues.apache.org/jira/browse/BEAM-8491?focusedWorklogId=337585&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337585 ] ASF GitHub Bot logged work on BEAM-8491: Author: ASF GitHub Bot Created on: 01/Nov/19 22:28 Start Date: 01/Nov/19 22:28 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #9912: [BEAM-8491] Add ability for replacing transforms with multiple outputs URL: https://github.com/apache/beam/pull/9912#discussion_r341771747 ## File path: sdks/python/apache_beam/pipeline.py ## @@ -263,31 +262,29 @@ def _replace_if_needed(self, original_transform_node): new_output = replacement_transform.expand(input_node) - new_output.element_type = None - self.pipeline._infer_result_type(replacement_transform, inputs, - new_output) - + if isinstance(new_output, pvalue.PValue): +new_output.element_type = None +self.pipeline._infer_result_type(replacement_transform, inputs, + new_output) replacement_transform_node.add_output(new_output) - if not new_output.producer: -new_output.producer = replacement_transform_node - - # We only support replacing transforms with a single output with - # another transform that produces a single output. - # TODO: Support replacing PTransforms with multiple outputs. - if (len(original_transform_node.outputs) > 1 or - not isinstance(original_transform_node.outputs[None], - (PCollection, PDone)) or - not isinstance(new_output, (PCollection, PDone))): -raise NotImplementedError( -'PTransform overriding is only supported for PTransforms that ' -'have a single output. Tried to replace output of ' -'AppliedPTransform %r with %r.' -% (original_transform_node, new_output)) # Recording updated outputs. This cannot be done in the same visitor # since if we dynamically update output type here, we'll run into # errors when visiting child nodes. 
- output_map[original_transform_node.outputs[None]] = new_output + if isinstance(new_output, pvalue.PValue): +if not new_output.producer: + new_output.producer = replacement_transform_node +output_map[original_transform_node.outputs[None]] = new_output + elif isinstance(new_output, (pvalue.DoOutputsTuple, tuple)): +for pcoll in new_output: + if not pcoll.producer: +pcoll.producer = replacement_transform_node + output_map[original_transform_node.outputs[pcoll.tag]] = pcoll Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337585) Time Spent: 1h (was: 50m) > Add ability for multiple output PCollections from composites > > > Key: BEAM-8491 > URL: https://issues.apache.org/jira/browse/BEAM-8491 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > The Python SDK has DoOutputTuples which allows for a single transform to have > multiple outputs. However, this does not include the ability for a composite > transform to have multiple output PCollections from different transforms. -- This message was sent by Atlassian Jira (v8.3.4#803005)
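The dispatch in the diff above — a replacement transform returning one output versus a tuple of tagged outputs — can be modeled with plain Python. The classes and names below are toy stand-ins, not Beam's actual pvalue/pipeline internals.

```python
class PValue:
    """Toy stand-in for apache_beam.pvalue.PValue."""
    def __init__(self, tag=None, producer=None):
        self.tag = tag
        self.producer = producer

def record_replacement_outputs(original_outputs, new_output,
                               replacement_node, output_map):
    """Map each original output (keyed by tag) to its replacement and
    backfill the producer, whether the replacement transform returned a
    single value or a tuple of tagged values."""
    if isinstance(new_output, PValue):
        if new_output.producer is None:
            new_output.producer = replacement_node
        output_map[original_outputs[None]] = new_output
    elif isinstance(new_output, tuple):
        for pcoll in new_output:
            if pcoll.producer is None:
                pcoll.producer = replacement_node
            output_map[original_outputs[pcoll.tag]] = pcoll
    else:
        raise NotImplementedError(type(new_output))
```

The key design point mirrors the PR: the single-output path keys the map by the `None` tag, while the multi-output path keys it by each output's own tag.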
[jira] [Work logged] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam
[ https://issues.apache.org/jira/browse/BEAM-8368?focusedWorklogId=337584&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337584 ] ASF GitHub Bot logged work on BEAM-8368: Author: ASF GitHub Bot Created on: 01/Nov/19 22:25 Start Date: 01/Nov/19 22:25 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9970: [BEAM-8368] [BEAM-8392] Update pyarrow to the latest version URL: https://github.com/apache/beam/pull/9970#issuecomment-548971157 Thank you @TheNeuralBit! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337584) Time Spent: 2h 50m (was: 2h 40m) > [Python] libprotobuf-generated exception when importing apache_beam > --- > > Key: BEAM-8368 > URL: https://issues.apache.org/jira/browse/BEAM-8368 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.15.0, 2.17.0 >Reporter: Ubaier Bhat >Assignee: Brian Hulette >Priority: Blocker > Fix For: 2.17.0 > > Attachments: error_log.txt > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Unable to import apache_beam after upgrading to macOS 10.15 (Catalina). > Cleared all the pipenvs but can't get it working again. > {code} > import apache_beam as beam > /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84: > UserWarning: Some syntactic constructs of Python 3 are not yet fully > supported by Apache Beam. 
> 'Some syntactic constructs of Python 3 are not yet fully supported by ' > [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already > exists in database: > [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: > GeneratedDatabase()->Add(encoded_file_descriptor, size): > libc++abi.dylib: terminating with uncaught exception of type > google::protobuf::FatalException: CHECK failed: > GeneratedDatabase()->Add(encoded_file_descriptor, size): > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-6747) Adding ExternalTransform in Java SDK
[ https://issues.apache.org/jira/browse/BEAM-6747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heejong Lee resolved BEAM-6747. --- Fix Version/s: 2.13.0 Resolution: Fixed Created a separate issue: BEAM-8546 > Adding ExternalTransform in Java SDK > > > Key: BEAM-6747 > URL: https://issues.apache.org/jira/browse/BEAM-6747 > Project: Beam > Issue Type: Improvement > Components: runner-core, sdk-java-core >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Fix For: 2.13.0 > > Time Spent: 7h > Remaining Estimate: 0h > > Adding Java counterpart of Python ExternalTransform for testing Python > transforms from pipelines in Java SDK. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8546) Moving Java external transform from core-construction-java to core
Heejong Lee created BEAM-8546: - Summary: Moving Java external transform from core-construction-java to core Key: BEAM-8546 URL: https://issues.apache.org/jira/browse/BEAM-8546 Project: Beam Issue Type: Improvement Components: sdk-ideas, sdk-java-core Reporter: Heejong Lee Creating a separate issue from BEAM-6747. Related discussion: [https://lists.apache.org/list.html?d...@beam.apache.org:lte=111M:How%20to%20expose%2Fuse%20the%20External%20transform%20on%20Java%20SDK] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=337582&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337582 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 01/Nov/19 22:08 Start Date: 01/Nov/19 22:08 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #9954: [BEAM-8335] Add the PTuple URL: https://github.com/apache/beam/pull/9954#issuecomment-548967401 I agree that I can use either dicts or tuples but I don't agree that they are a good user experience. Look at how you access the PCollection: dicts are accessed by strings, tuples are accessed by ints, namedtuples are accessed by field names, DoOutputsTuples are accessed by any of the prior. Now imagine that you have an arbitrary PTransform with multiple outputs; because of all the different access patterns, the user won't know how to access the outputs. While it may be nice to have "first class support" in theory, in practice it doesn't work well. That's where the PTuple comes in. This gives the user what they want: a homogeneous access pattern for multiple outputs. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337582) Time Spent: 14h 40m (was: 14.5h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 14h 40m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. 
This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
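The access-pattern mismatch described in the PTuple discussion above (dicts indexed by string, tuples by int, namedtuples by field name) is easy to see with plain containers. The uniform `output()` accessor below is a hypothetical illustration of what a PTuple-style wrapper buys; it is not the PR's implementation.

```python
from collections import namedtuple

# The same two logical outputs, exposed three different ways:
by_string = {"valid": [1, 2], "invalid": [3]}          # dict: str keys
by_index = ([1, 2], [3])                               # tuple: int indices
Outputs = namedtuple("Outputs", ["valid", "invalid"])
by_field = Outputs(valid=[1, 2], invalid=[3])          # namedtuple: attributes

def output(outputs, tag):
    """Hypothetical uniform accessor a PTuple-style wrapper could offer."""
    if isinstance(outputs, dict):
        return outputs[tag]
    if hasattr(outputs, tag):
        return getattr(outputs, tag)
    raise KeyError(tag)  # a bare tuple has no notion of a named tag
```

Note that the bare tuple cannot be reached by tag at all, which is precisely the inconsistency the comment complains about: each container shape forces callers to know how a given transform happened to return its outputs.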
[jira] [Work logged] (BEAM-3713) Consider moving away from nose to nose2 or pytest.
[ https://issues.apache.org/jira/browse/BEAM-3713?focusedWorklogId=337580&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337580 ] ASF GitHub Bot logged work on BEAM-3713: Author: ASF GitHub Bot Created on: 01/Nov/19 22:00 Start Date: 01/Nov/19 22:00 Worklog Time Spent: 10m Work Description: udim commented on issue #9756: [BEAM-3713] Add pytest for unit tests URL: https://github.com/apache/beam/pull/9756#issuecomment-548965339 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337580) Time Spent: 11.5h (was: 11h 20m) > Consider moving away from nose to nose2 or pytest. > -- > > Key: BEAM-3713 > URL: https://issues.apache.org/jira/browse/BEAM-3713 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: Robert Bradshaw >Assignee: Udi Meiri >Priority: Minor > Time Spent: 11.5h > Remaining Estimate: 0h > > Per > https://nose.readthedocs.io/en/latest/, nose is in maintenance mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
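The visible part of a nose-to-pytest migration is small: nose's imported assertion helpers give way to plain `assert` statements, which pytest rewrites to produce rich failure messages. A framework-free sketch of the shape (the `add` function is a made-up example):

```python
# nose style (requires the nose package):
#     from nose.tools import assert_equal
#     assert_equal(add(2, 2), 4)
# pytest style: plain asserts, rewritten by pytest for detailed reports.

def add(a, b):
    return a + b

def test_add():
    assert add(2, 2) == 4

def test_add_commutes():
    assert add(1, 2) == add(2, 1)
```

Both frameworks discover `test_*` functions by name, which is part of why existing suites can usually migrate incrementally.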
[jira] [Work logged] (BEAM-8545) don't docker pull before docker run
[ https://issues.apache.org/jira/browse/BEAM-8545?focusedWorklogId=337579&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337579 ] ASF GitHub Bot logged work on BEAM-8545: Author: ASF GitHub Bot Created on: 01/Nov/19 21:51 Start Date: 01/Nov/19 21:51 Worklog Time Spent: 10m Work Description: ihji commented on issue #9972: [BEAM-8545] don't docker pull before docker run URL: https://github.com/apache/beam/pull/9972#issuecomment-548963123 run xvr_flink postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337579) Time Spent: 40m (was: 0.5h) > don't docker pull before docker run > > > Key: BEAM-8545 > URL: https://issues.apache.org/jira/browse/BEAM-8545 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Since 'docker run' automatically pulls when the image doesn't exist locally, > I think it's safe to remove the explicit 'docker pull' before 'docker run'. > Without 'docker pull', we won't update the local image with the remote image > (for the same tag) but that shouldn't be a problem in prod, where a unique tag > is assumed for each released version. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8545) don't docker pull before docker run
[ https://issues.apache.org/jira/browse/BEAM-8545?focusedWorklogId=337578&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337578 ] ASF GitHub Bot logged work on BEAM-8545: Author: ASF GitHub Bot Created on: 01/Nov/19 21:51 Start Date: 01/Nov/19 21:51 Worklog Time Spent: 10m Work Description: ihji commented on issue #9972: [BEAM-8545] don't docker pull before docker run URL: https://github.com/apache/beam/pull/9972#issuecomment-548963259 Run XVR_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337578) Time Spent: 0.5h (was: 20m) > don't docker pull before docker run > > > Key: BEAM-8545 > URL: https://issues.apache.org/jira/browse/BEAM-8545 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Since 'docker run' automatically pulls when the image doesn't exist locally, > I think it's safe to remove the explicit 'docker pull' before 'docker run'. > Without 'docker pull', we won't update the local image with the remote image > (for the same tag) but that shouldn't be a problem in prod, where a unique tag > is assumed for each released version. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8545) don't docker pull before docker run
[ https://issues.apache.org/jira/browse/BEAM-8545?focusedWorklogId=337577&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337577 ] ASF GitHub Bot logged work on BEAM-8545: Author: ASF GitHub Bot Created on: 01/Nov/19 21:50 Start Date: 01/Nov/19 21:50 Worklog Time Spent: 10m Work Description: ihji commented on issue #9972: [BEAM-8545] don't docker pull before docker run URL: https://github.com/apache/beam/pull/9972#issuecomment-548963123 run xvr_flink postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337577) Time Spent: 20m (was: 10m) > don't docker pull before docker run > > > Key: BEAM-8545 > URL: https://issues.apache.org/jira/browse/BEAM-8545 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Since 'docker run' automatically pulls when the image doesn't exist locally, > I think it's safe to remove the explicit 'docker pull' before 'docker run'. > Without 'docker pull', we won't update the local image with the remote image > (for the same tag) but that shouldn't be a problem in prod, where a unique tag > is assumed for each released version. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8545) don't docker pull before docker run
[ https://issues.apache.org/jira/browse/BEAM-8545?focusedWorklogId=337576&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337576 ] ASF GitHub Bot logged work on BEAM-8545: Author: ASF GitHub Bot Created on: 01/Nov/19 21:49 Start Date: 01/Nov/19 21:49 Worklog Time Spent: 10m Work Description: ihji commented on pull request #9972: [BEAM-8545] don't docker pull before docker run URL: https://github.com/apache/beam/pull/9972 Since 'docker run' automatically pulls when the image doesn't exist locally, I think it's safe to remove the explicit 'docker pull' before 'docker run'. Without 'docker pull', we won't update the local image with the remote image (for the same tag) but that shouldn't be a problem in prod, where a unique tag is assumed for each released version. This will also fix BEAM-8534. 
Post-Commit Tests Status (on master branch): badge table of Go, Java, and Python post-commit builds across the Apex, Dataflow, Flink, Gearpump, Samza, and Spark runners (truncated in source).
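The change argued for in the BEAM-8545 PR above — drop the explicit 'docker pull', since 'docker run' pulls a missing image itself — can be sketched as launcher logic that only builds the command lines. Nothing is executed here, and the image name is a placeholder, not a real Beam container image.

```python
def launch_commands(image, pull_first):
    """Return the docker invocations a launcher would issue (not executed)."""
    cmds = []
    if pull_first:
        cmds.append(["docker", "pull", image])
    cmds.append(["docker", "run", "--rm", image])
    return cmds

# 'example/sdk:2.16.0' is a placeholder image name.
before = launch_commands("example/sdk:2.16.0", pull_first=True)
after = launch_commands("example/sdk:2.16.0", pull_first=False)
# 'docker run' pulls the image itself when it is absent locally, so the
# explicit pull only adds a redundant registry round-trip per release tag.
```

The trade-off named in the issue holds in this sketch too: with no pull step, an already-cached image is never refreshed for the same tag, which is acceptable when each release gets a unique tag.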
[jira] [Work logged] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam
[ https://issues.apache.org/jira/browse/BEAM-8368?focusedWorklogId=337573&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337573 ] ASF GitHub Bot logged work on BEAM-8368: Author: ASF GitHub Bot Created on: 01/Nov/19 21:48 Start Date: 01/Nov/19 21:48 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #9970: [BEAM-8368] [BEAM-8392] Update pyarrow to the latest version URL: https://github.com/apache/beam/pull/9970#issuecomment-548962449 Wrote a [commit](https://github.com/apache/beam/pull/9971/commits/c600ef5898930faf35631e6ebb32179d70f6e46b) for this and put up a [PR](https://github.com/apache/beam/pull/9971/commits) to merge into this branch This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337573) Time Spent: 2h 40m (was: 2.5h) > [Python] libprotobuf-generated exception when importing apache_beam > --- > > Key: BEAM-8368 > URL: https://issues.apache.org/jira/browse/BEAM-8368 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.15.0, 2.17.0 >Reporter: Ubaier Bhat >Assignee: Brian Hulette >Priority: Blocker > Fix For: 2.17.0 > > Attachments: error_log.txt > > Time Spent: 2h 40m > Remaining Estimate: 0h > > Unable to import apache_beam after upgrading to macOS 10.15 (Catalina). > Cleared all the pipenvs but can't get it working again. > {code} > import apache_beam as beam > /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84: > UserWarning: Some syntactic constructs of Python 3 are not yet fully > supported by Apache Beam. 
> 'Some syntactic constructs of Python 3 are not yet fully supported by ' > [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already > exists in database: > [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: > GeneratedDatabase()->Add(encoded_file_descriptor, size): > libc++abi.dylib: terminating with uncaught exception of type > google::protobuf::FatalException: CHECK failed: > GeneratedDatabase()->Add(encoded_file_descriptor, size): > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-7917) Python datastore v1new fails on retry
[ https://issues.apache.org/jira/browse/BEAM-7917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri resolved BEAM-7917. - Resolution: Fixed > Python datastore v1new fails on retry > - > > Key: BEAM-7917 > URL: https://issues.apache.org/jira/browse/BEAM-7917 > Project: Beam > Issue Type: Bug > Components: io-py-gcp, runner-dataflow >Affects Versions: 2.14.0 > Environment: Python 3.7 on Dataflow >Reporter: Dmytro Sadovnychyi >Assignee: Dmytro Sadovnychyi >Priority: Major > Labels: pull-request-available > Fix For: 2.18.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > Traceback (most recent call last): > File "apache_beam/runners/common.py", line 782, in > apache_beam.runners.common.DoFnRunner.process > File "apache_beam/runners/common.py", line 454, in > apache_beam.runners.common.SimpleInvoker.invoke_process > File > "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py", > line 334, in process > self._flush_batch() > File > "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py", > line 349, in _flush_batch > throttle_delay=util.WRITE_BATCH_TARGET_LATENCY_MS // 1000) > File "/usr/local/lib/python3.7/site-packages/apache_beam/utils/retry.py", > line 197, in wrapper > return fun(*args, **kwargs) > File > "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/helper.py", > line 99, in write_mutations > batch.commit() > File > "/usr/local/lib/python3.7/site-packages/google/cloud/datastore/batch.py", > line 271, in commit > raise ValueError("Batch must be in progress to commit()") > ValueError: Batch must be in progress to commit() -- This message was sent by Atlassian Jira (v8.3.4#803005)
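The ValueError in the traceback above comes from the Datastore client's requirement that commit() only be called on a batch that is "in progress" (i.e. after begin()). A stdlib-only sketch of the failure mode; Batch and retry_on_error here are hypothetical stand-ins for google.cloud.datastore.batch.Batch and Beam's retry decorator, not the real APIs:

```python
class Batch:
    """Toy model of the in-progress check in the Datastore client's Batch."""

    def __init__(self):
        self._status = None
        self._fail_next_commit = True  # simulate one transient RPC error

    def begin(self):
        self._status = "in progress"

    def commit(self):
        if self._status != "in progress":
            raise ValueError("Batch must be in progress to commit()")
        self._status = "finished"  # commit() ends the batch even on failure
        if self._fail_next_commit:
            self._fail_next_commit = False
            raise RuntimeError("transient datastore error")


def retry_on_error(fn, attempts=3):
    """Toy model of Beam's retry wrapper: retry transient errors only."""
    for attempt in range(attempts):
        try:
            return fn()
        except RuntimeError:
            if attempt == attempts - 1:
                raise


# Pre-fix pattern: begin() was called once when the batch was initialized,
# so a retried commit() runs against a batch that is no longer in progress.
batch = Batch()
batch.begin()
try:
    retry_on_error(batch.commit)
except ValueError as e:
    print(e)  # Batch must be in progress to commit()
```

The first commit attempt hits a transient error; the retry then calls commit() on the same, already-finished batch, reproducing the ValueError in the traceback.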
[jira] [Work logged] (BEAM-7917) Python datastore v1new fails on retry
[ https://issues.apache.org/jira/browse/BEAM-7917?focusedWorklogId=337570&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337570 ] ASF GitHub Bot logged work on BEAM-7917: Author: ASF GitHub Bot Created on: 01/Nov/19 21:45 Start Date: 01/Nov/19 21:45 Worklog Time Spent: 10m Work Description: udim commented on pull request #9294: [BEAM-7917] Fix datastore writes failing on retry URL: https://github.com/apache/beam/pull/9294#discussion_r341762012 ## File path: sdks/python/apache_beam/io/gcp/datastore/v1new/datastoreio.py ## @@ -340,12 +340,13 @@ def finish_bundle(self): def _init_batch(self): self._batch_bytes_size = 0 self._batch = self._client.batch() - self._batch.begin() + self._batch_mutations = [] def _flush_batch(self): # Flush the current batch of mutations to Cloud Datastore. latency_ms = helper.write_mutations( Review comment: I think your choice is a valid compromise with what to do with the results of `element_to_client_batch_item`. I see 2 options here: 1. Save `client_element` items. 2. Discard `client_element` items after calling `ByteSize()` on them. The first option seems more CPU efficient while the second seems more memory efficient. I don't know which option is faster, but from my limited experience saving CPU at the expense of RAM seems like a good tradeoff. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337570) Time Spent: 3h 40m (was: 3.5h) > Python datastore v1new fails on retry > - > > Key: BEAM-7917 > URL: https://issues.apache.org/jira/browse/BEAM-7917 > Project: Beam > Issue Type: Bug > Components: io-py-gcp, runner-dataflow >Affects Versions: 2.14.0 > Environment: Python 3.7 on Dataflow >Reporter: Dmytro Sadovnychyi >Assignee: Dmytro Sadovnychyi >Priority: Major > Labels: pull-request-available > Time Spent: 3h 40m > Remaining Estimate: 0h > > Traceback (most recent call last): > File "apache_beam/runners/common.py", line 782, in > apache_beam.runners.common.DoFnRunner.process > File "apache_beam/runners/common.py", line 454, in > apache_beam.runners.common.SimpleInvoker.invoke_process > File > "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py", > line 334, in process > self._flush_batch() > File > "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py", > line 349, in _flush_batch > throttle_delay=util.WRITE_BATCH_TARGET_LATENCY_MS // 1000) > File "/usr/local/lib/python3.7/site-packages/apache_beam/utils/retry.py", > line 197, in wrapper > return fun(*args, **kwargs) > File > "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/helper.py", > line 99, in write_mutations > batch.commit() > File > "/usr/local/lib/python3.7/site-packages/google/cloud/datastore/batch.py", > line 271, in commit > raise ValueError("Batch must be in progress to commit()") > ValueError: Batch must be in progress to commit() -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-7917) Python datastore v1new fails on retry
[ https://issues.apache.org/jira/browse/BEAM-7917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri updated BEAM-7917: Fix Version/s: 2.18.0 > Python datastore v1new fails on retry > - > > Key: BEAM-7917 > URL: https://issues.apache.org/jira/browse/BEAM-7917 > Project: Beam > Issue Type: Bug > Components: io-py-gcp, runner-dataflow >Affects Versions: 2.14.0 > Environment: Python 3.7 on Dataflow >Reporter: Dmytro Sadovnychyi >Assignee: Dmytro Sadovnychyi >Priority: Major > Labels: pull-request-available > Fix For: 2.18.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > Traceback (most recent call last): > File "apache_beam/runners/common.py", line 782, in > apache_beam.runners.common.DoFnRunner.process > File "apache_beam/runners/common.py", line 454, in > apache_beam.runners.common.SimpleInvoker.invoke_process > File > "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py", > line 334, in process > self._flush_batch() > File > "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py", > line 349, in _flush_batch > throttle_delay=util.WRITE_BATCH_TARGET_LATENCY_MS // 1000) > File "/usr/local/lib/python3.7/site-packages/apache_beam/utils/retry.py", > line 197, in wrapper > return fun(*args, **kwargs) > File > "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/helper.py", > line 99, in write_mutations > batch.commit() > File > "/usr/local/lib/python3.7/site-packages/google/cloud/datastore/batch.py", > line 271, in commit > raise ValueError("Batch must be in progress to commit()") > ValueError: Batch must be in progress to commit() -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7917) Python datastore v1new fails on retry
[ https://issues.apache.org/jira/browse/BEAM-7917?focusedWorklogId=337572&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337572 ] ASF GitHub Bot logged work on BEAM-7917: Author: ASF GitHub Bot Created on: 01/Nov/19 21:46 Start Date: 01/Nov/19 21:46 Worklog Time Spent: 10m Work Description: udim commented on pull request #9294: [BEAM-7917] Fix datastore writes failing on retry URL: https://github.com/apache/beam/pull/9294 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337572) Time Spent: 3h 50m (was: 3h 40m) > Python datastore v1new fails on retry > - > > Key: BEAM-7917 > URL: https://issues.apache.org/jira/browse/BEAM-7917 > Project: Beam > Issue Type: Bug > Components: io-py-gcp, runner-dataflow >Affects Versions: 2.14.0 > Environment: Python 3.7 on Dataflow >Reporter: Dmytro Sadovnychyi >Assignee: Dmytro Sadovnychyi >Priority: Major > Labels: pull-request-available > Fix For: 2.18.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > Traceback (most recent call last): > File "apache_beam/runners/common.py", line 782, in > apache_beam.runners.common.DoFnRunner.process > File "apache_beam/runners/common.py", line 454, in > apache_beam.runners.common.SimpleInvoker.invoke_process > File > "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py", > line 334, in process > self._flush_batch() > File > "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py", > line 349, in _flush_batch > throttle_delay=util.WRITE_BATCH_TARGET_LATENCY_MS // 1000) > File "/usr/local/lib/python3.7/site-packages/apache_beam/utils/retry.py", > line 197, in wrapper > return fun(*args, **kwargs) > File > 
"/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/helper.py", > line 99, in write_mutations > batch.commit() > File > "/usr/local/lib/python3.7/site-packages/google/cloud/datastore/batch.py", > line 271, in commit > raise ValueError("Batch must be in progress to commit()") > ValueError: Batch must be in progress to commit() -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7917) Python datastore v1new fails on retry
[ https://issues.apache.org/jira/browse/BEAM-7917?focusedWorklogId=337569&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337569 ] ASF GitHub Bot logged work on BEAM-7917: Author: ASF GitHub Bot Created on: 01/Nov/19 21:43 Start Date: 01/Nov/19 21:43 Worklog Time Spent: 10m Work Description: udim commented on pull request #9294: [BEAM-7917] Fix datastore writes failing on retry URL: https://github.com/apache/beam/pull/9294#discussion_r341762012 ## File path: sdks/python/apache_beam/io/gcp/datastore/v1new/datastoreio.py ## @@ -340,12 +340,13 @@ def finish_bundle(self): def _init_batch(self): self._batch_bytes_size = 0 self._batch = self._client.batch() - self._batch.begin() + self._batch_mutations = [] def _flush_batch(self): # Flush the current batch of mutations to Cloud Datastore. latency_ms = helper.write_mutations( Review comment: I think your choice is a valid compromise with what to do with the results of `element_to_client_batch_item`. I see 2 options here: 1. Save the batch items. 2. Discard batch items after calling `ByteSize()` on them. The first option seems more CPU efficient while the second seems more memory efficient. I don't know which option is faster, but from my limited experience saving CPU at the expense of RAM seems like a good tradeoff. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337569) Time Spent: 3.5h (was: 3h 20m) > Python datastore v1new fails on retry > - > > Key: BEAM-7917 > URL: https://issues.apache.org/jira/browse/BEAM-7917 > Project: Beam > Issue Type: Bug > Components: io-py-gcp, runner-dataflow >Affects Versions: 2.14.0 > Environment: Python 3.7 on Dataflow >Reporter: Dmytro Sadovnychyi >Assignee: Dmytro Sadovnychyi >Priority: Major > Labels: pull-request-available > Time Spent: 3.5h > Remaining Estimate: 0h > > Traceback (most recent call last): > File "apache_beam/runners/common.py", line 782, in > apache_beam.runners.common.DoFnRunner.process > File "apache_beam/runners/common.py", line 454, in > apache_beam.runners.common.SimpleInvoker.invoke_process > File > "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py", > line 334, in process > self._flush_batch() > File > "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py", > line 349, in _flush_batch > throttle_delay=util.WRITE_BATCH_TARGET_LATENCY_MS // 1000) > File "/usr/local/lib/python3.7/site-packages/apache_beam/utils/retry.py", > line 197, in wrapper > return fun(*args, **kwargs) > File > "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/helper.py", > line 99, in write_mutations > batch.commit() > File > "/usr/local/lib/python3.7/site-packages/google/cloud/datastore/batch.py", > line 271, in commit > raise ValueError("Batch must be in progress to commit()") > ValueError: Batch must be in progress to commit() -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8523) Add useful timestamp to job servicer GetJobs
[ https://issues.apache.org/jira/browse/BEAM-8523?focusedWorklogId=337568&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337568 ] ASF GitHub Bot logged work on BEAM-8523: Author: ASF GitHub Bot Created on: 01/Nov/19 21:42 Start Date: 01/Nov/19 21:42 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9959: WIP: [BEAM-8523] JobAPI: Give access to timestamped state change history URL: https://github.com/apache/beam/pull/9959#discussion_r341761566 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/JobStateEvent.java ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.fnexecution.jobsubmission; + +import com.google.auto.value.AutoValue; +import java.time.Instant; +import org.apache.beam.model.jobmanagement.v1.JobApi; +import org.apache.beam.vendor.grpc.v1p21p0.com.google.protobuf.Timestamp; + +/** A state transition event. */ +@AutoValue +public abstract class JobStateEvent { Review comment: > use that proto instead of creating this class. 
I've noticed that there is a pattern in the Java and Python SDKs to avoid using protobuf objects directly as part of the Beam SDK, and instead favor "native" types with methods for converting to/from protobuf representations. I was trying to maintain that separation by providing this `JobStateEvent` class, but I'm happy to use a protobuf object directly. Let me know what you think. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337568) Time Spent: 1.5h (was: 1h 20m) > Add useful timestamp to job servicer GetJobs > > > Key: BEAM-8523 > URL: https://issues.apache.org/jira/browse/BEAM-8523 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > As a user querying jobs with JobService.GetJobs, it would be useful if the > JobInfo result contained timestamps indicating various state changes that may > have been missed by a client. Useful timestamps include: > > * submitted (prepared to the job service) > * started (executor enters the RUNNING state) > * completed (executor enters a terminal state) > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
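The "native type with proto converters" convention described above can be sketched in a few lines; JobStateEvent here is an illustrative Python analogue of the Java class under review, and the dict returned by to_proto() is a stand-in for a real JobApi protobuf message:

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class JobStateEvent:
    """Native value type kept separate from its wire (proto) representation."""

    state: str  # e.g. "RUNNING", "DONE"
    timestamp: datetime

    def to_proto(self):
        # Stand-in for populating a generated JobApi message; a real
        # implementation would build the protobuf class instead of a dict.
        return {"state": self.state,
                "timestamp_seconds": int(self.timestamp.timestamp())}

    @classmethod
    def from_proto(cls, msg):
        return cls(state=msg["state"],
                   timestamp=datetime.fromtimestamp(msg["timestamp_seconds"],
                                                    tz=timezone.utc))


event = JobStateEvent("RUNNING", datetime(2019, 11, 1, tzinfo=timezone.utc))
assert JobStateEvent.from_proto(event.to_proto()) == event
```

The round-trip assertion is the point of the pattern: the public API deals only in the native type, and the proto representation stays an implementation detail of the converters.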
[jira] [Work logged] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam
[ https://issues.apache.org/jira/browse/BEAM-8368?focusedWorklogId=337567&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337567 ] ASF GitHub Bot logged work on BEAM-8368: Author: ASF GitHub Bot Created on: 01/Nov/19 21:37 Start Date: 01/Nov/19 21:37 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #9970: [BEAM-8368] [BEAM-8392] Update pyarrow to the latest version URL: https://github.com/apache/beam/pull/9970#issuecomment-548959623 Well it's a pretty simple fix, just need to change [this line](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/parquetio_test.py#L451) to `pa.Table.from_arrays([orig.column('name')], names=['name'])` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337567) Time Spent: 2.5h (was: 2h 20m) > [Python] libprotobuf-generated exception when importing apache_beam > --- > > Key: BEAM-8368 > URL: https://issues.apache.org/jira/browse/BEAM-8368 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.15.0, 2.17.0 >Reporter: Ubaier Bhat >Assignee: Brian Hulette >Priority: Blocker > Fix For: 2.17.0 > > Attachments: error_log.txt > > Time Spent: 2.5h > Remaining Estimate: 0h > > Unable to import apache_beam after upgrading to macOS 10.15 (Catalina). > Cleared all the pipenvs but can't get it working again. > {code} > import apache_beam as beam > /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84: > UserWarning: Some syntactic constructs of Python 3 are not yet fully > supported by Apache Beam. 
> 'Some syntactic constructs of Python 3 are not yet fully supported by ' > [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already > exists in database: > [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: > GeneratedDatabase()->Add(encoded_file_descriptor, size): > libc++abi.dylib: terminating with uncaught exception of type > google::protobuf::FatalException: CHECK failed: > GeneratedDatabase()->Add(encoded_file_descriptor, size): > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8545) don't docker pull before docker run
Heejong Lee created BEAM-8545: - Summary: don't docker pull before docker run Key: BEAM-8545 URL: https://issues.apache.org/jira/browse/BEAM-8545 Project: Beam Issue Type: Bug Components: java-fn-execution Reporter: Heejong Lee Assignee: Heejong Lee Since 'docker run' automatically pulls when the image doesn't exist locally, I think it's safe to remove the explicit 'docker pull' before 'docker run'. Without 'docker pull', we won't update the local image with the remote image (for the same tag), but that shouldn't be a problem in prod, where a unique tag is assumed for each released version. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8243) Document behavior of FlinkRunner
[ https://issues.apache.org/jira/browse/BEAM-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965112#comment-16965112 ] Thomas Weise commented on BEAM-8243: There are IMO enough other reasons to not keep instructions around for old versions (for portability) :) > Document behavior of FlinkRunner > > > Key: BEAM-8243 > URL: https://issues.apache.org/jira/browse/BEAM-8243 > Project: Beam > Issue Type: Improvement > Components: runner-flink, website >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > > The Flink runner guide should include a couple details > 1) FlinkRunner pulls the job server jar from Maven by default (need to make > this explicit in case of firewall concerns) > 2) how to override in case the above is a problem -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8545) don't docker pull before docker run
[ https://issues.apache.org/jira/browse/BEAM-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heejong Lee updated BEAM-8545: -- Status: Open (was: Triage Needed) > don't docker pull before docker run > > > Key: BEAM-8545 > URL: https://issues.apache.org/jira/browse/BEAM-8545 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > > Since 'docker run' automatically pulls when the image doesn't exist locally, > I think it's safe to remove the explicit 'docker pull' before 'docker run'. > Without 'docker pull', we won't update the local image with the remote image > (for the same tag), but that shouldn't be a problem in prod, where a unique tag > is assumed for each released version. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8243) Document behavior of FlinkRunner
[ https://issues.apache.org/jira/browse/BEAM-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965107#comment-16965107 ] Robert Bradshaw commented on BEAM-8243: --- Yes, this can be greatly simplified. I'm not sure if/how we should keep the old ways around for old versions though. Kyle, are you on this? > Document behavior of FlinkRunner > > > Key: BEAM-8243 > URL: https://issues.apache.org/jira/browse/BEAM-8243 > Project: Beam > Issue Type: Improvement > Components: runner-flink, website >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > > The Flink runner guide should include a couple details > 1) FlinkRunner pulls the job server jar from Maven by default (need to make > this explicit in case of firewall concerns) > 2) how to override in case the above is a problem -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam
[ https://issues.apache.org/jira/browse/BEAM-8368?focusedWorklogId=337564&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337564 ] ASF GitHub Bot logged work on BEAM-8368: Author: ASF GitHub Bot Created on: 01/Nov/19 21:28 Start Date: 01/Nov/19 21:28 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #9970: [BEAM-8368] [BEAM-8392] Update pyarrow to the latest version URL: https://github.com/apache/beam/pull/9970#issuecomment-548956963 Looks like maybe there was an API change in `pyarrow.parquet`. I'll take a look This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337564) Time Spent: 2h 20m (was: 2h 10m) > [Python] libprotobuf-generated exception when importing apache_beam > --- > > Key: BEAM-8368 > URL: https://issues.apache.org/jira/browse/BEAM-8368 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.15.0, 2.17.0 >Reporter: Ubaier Bhat >Assignee: Brian Hulette >Priority: Blocker > Fix For: 2.17.0 > > Attachments: error_log.txt > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Unable to import apache_beam after upgrading to macOS 10.15 (Catalina). > Cleared all the pipenvs but can't get it working again. > {code} > import apache_beam as beam > /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84: > UserWarning: Some syntactic constructs of Python 3 are not yet fully > supported by Apache Beam. 
> 'Some syntactic constructs of Python 3 are not yet fully supported by ' > [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already > exists in database: > [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: > GeneratedDatabase()->Add(encoded_file_descriptor, size): > libc++abi.dylib: terminating with uncaught exception of type > google::protobuf::FatalException: CHECK failed: > GeneratedDatabase()->Add(encoded_file_descriptor, size): > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam
[ https://issues.apache.org/jira/browse/BEAM-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965102#comment-16965102 ] Brian Hulette commented on BEAM-8368: - [~ubaierbhat] or [~kamilwu] any chance you could verify that pyarrow 0.15.1 works with beam on macOS 10.15? > [Python] libprotobuf-generated exception when importing apache_beam > --- > > Key: BEAM-8368 > URL: https://issues.apache.org/jira/browse/BEAM-8368 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.15.0, 2.17.0 >Reporter: Ubaier Bhat >Assignee: Brian Hulette >Priority: Blocker > Fix For: 2.17.0 > > Attachments: error_log.txt > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Unable to import apache_beam after upgrading to macOS 10.15 (Catalina). > Cleared all the pipenvs but can't get it working again. > {code} > import apache_beam as beam > /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84: > UserWarning: Some syntactic constructs of Python 3 are not yet fully > supported by Apache Beam. > 'Some syntactic constructs of Python 3 are not yet fully supported by ' > [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already > exists in database: > [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: > GeneratedDatabase()->Add(encoded_file_descriptor, size): > libc++abi.dylib: terminating with uncaught exception of type > google::protobuf::FatalException: CHECK failed: > GeneratedDatabase()->Add(encoded_file_descriptor, size): > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=337557&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337557 ] ASF GitHub Bot logged work on BEAM-8539: Author: ASF GitHub Bot Created on: 01/Nov/19 21:20 Start Date: 01/Nov/19 21:20 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9969: [BEAM-8539] Provide an initial definition of all job states and the state transition diagram URL: https://github.com/apache/beam/pull/9969#discussion_r341755718 ## File path: model/job-management/src/main/proto/beam_job_api.proto ## @@ -201,6 +201,16 @@ message JobMessagesResponse { } // Enumeration of all JobStates +// +// The state transition diagram is: +// STOPPED -> STARTING -> RUNNING -> DONE +// \> FAILED +// \> CANCELLING -> CANCELLED +// \> UPDATING -> UPDATED +// \> DRAINING -> DRAINED Review comment: > Not sure of any RUNNING -> STOPPED scenarios today. This is either a pause-like event or an erroneous use of STOPPED: https://github.com/apache/beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineResult.java#L133 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337557) Time Spent: 1h 20m (was: 1h 10m) > Clearly define the valid job state transitions > -- > > Key: BEAM-8539 > URL: https://issues.apache.org/jira/browse/BEAM-8539 > Project: Beam > Issue Type: Improvement > Components: beam-model, runner-core, sdk-java-core, sdk-py-core >Reporter: Chad Dombrova >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > The Beam job state transitions are ill-defined, which is a big problem for > anything that relies on the values coming from JobAPI.GetStateStream. 
> I was hoping to find something like a state transition diagram in the docs so > that I could determine the start state, the terminal states, and the valid > transitions, but I could not find this. The code reveals that the SDKs differ > on the fundamentals: > Java InMemoryJobService: > * start state: *STOPPED* > * run - about to submit to executor: STARTING > * run - actually running on executor: RUNNING > * terminal states: DONE, FAILED, CANCELLED, DRAINED > Python AbstractJobServiceServicer / LocalJobServicer: > * start state: STARTING > * terminal states: DONE, FAILED, CANCELLED, *STOPPED* > I think it would be good to make python work like Java, so that there is a > difference in state between a job that has been prepared and one that has > additionally been run. > It's hard to tell how far this problem has spread within the various runners. > I think a simple thing that can be done to help standardize behavior is to > implement the terminal states as an enum in the beam_job_api.proto, or create > a utility function in each language for checking if a state is terminal, so > that it's not left up to each runner to reimplement this logic. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=337558&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337558 ] ASF GitHub Bot logged work on BEAM-8539: Author: ASF GitHub Bot Created on: 01/Nov/19 21:20 Start Date: 01/Nov/19 21:20 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9969: [BEAM-8539] Provide an initial definition of all job states and the state transition diagram URL: https://github.com/apache/beam/pull/9969#discussion_r341755718 ## File path: model/job-management/src/main/proto/beam_job_api.proto ## @@ -201,6 +201,16 @@ message JobMessagesResponse { } // Enumeration of all JobStates +// +// The state transition diagram is: +// STOPPED -> STARTING -> RUNNING -> DONE +// \> FAILED +// \> CANCELLING -> CANCELLED +// \> UPDATING -> UPDATED +// \> DRAINING -> DRAINED Review comment: > Not sure of any RUNNING -> STOPPED scenarios today. This is either a pause-like event or an erroneous (terminal) use of STOPPED: https://github.com/apache/beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineResult.java#L133 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337558) Time Spent: 1.5h (was: 1h 20m) > Clearly define the valid job state transitions > -- > > Key: BEAM-8539 > URL: https://issues.apache.org/jira/browse/BEAM-8539 > Project: Beam > Issue Type: Improvement > Components: beam-model, runner-core, sdk-java-core, sdk-py-core >Reporter: Chad Dombrova >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > The Beam job state transitions are ill-defined, which is a big problem for > anything that relies on the values coming from JobAPI.GetStateStream. 
> I was hoping to find something like a state transition diagram in the docs so > that I could determine the start state, the terminal states, and the valid > transitions, but I could not find this. The code reveals that the SDKs differ > on the fundamentals: > Java InMemoryJobService: > * start state: *STOPPED* > * run - about to submit to executor: STARTING > * run - actually running on executor: RUNNING > * terminal states: DONE, FAILED, CANCELLED, DRAINED > Python AbstractJobServiceServicer / LocalJobServicer: > * start state: STARTING > * terminal states: DONE, FAILED, CANCELLED, *STOPPED* > I think it would be good to make python work like Java, so that there is a > difference in state between a job that has been prepared and one that has > additionally been run. > It's hard to tell how far this problem has spread within the various runners. > I think a simple thing that can be done to help standardize behavior is to > implement the terminal states as an enum in the beam_job_api.proto, or create > a utility function in each language for checking if a state is terminal, so > that it's not left up to each runner to reimplement this logic. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
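The issue description above proposes centralizing the terminal-state check, either as an enum in beam_job_api.proto or as a per-language utility, so runners stop reimplementing (and disagreeing on) this logic. A minimal sketch of such a utility follows; the state names mirror beam_job_api.proto, but the class and method names are illustrative, not Beam's actual API.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical utility: one shared answer to "is this job state terminal?".
public class JobStates {
    public enum State {
        UNSPECIFIED, STOPPED, STARTING, RUNNING, DONE, FAILED,
        CANCELLING, CANCELLED, UPDATING, UPDATED, DRAINING, DRAINED
    }

    // Terminal states as listed for the Java InMemoryJobService, plus UPDATED,
    // which ends the current job on runners that treat an update as a new job.
    private static final Set<State> TERMINAL =
        EnumSet.of(State.DONE, State.FAILED, State.CANCELLED, State.DRAINED, State.UPDATED);

    public static boolean isTerminal(State state) {
        return TERMINAL.contains(state);
    }

    public static void main(String[] args) {
        if (!isTerminal(State.DONE) || isTerminal(State.RUNNING)) {
            throw new AssertionError("terminal-state classification is wrong");
        }
    }
}
```

If the Python job servicer exposed the same predicate over the shared proto enum, the two SDKs could not drift apart on which states end a job.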
[jira] [Resolved] (BEAM-8252) (Python SDK) Add worker_region and worker_zone options
[ https://issues.apache.org/jira/browse/BEAM-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver resolved BEAM-8252. --- Fix Version/s: 2.18.0 Resolution: Fixed > (Python SDK) Add worker_region and worker_zone options > -- > > Key: BEAM-8252 > URL: https://issues.apache.org/jira/browse/BEAM-8252 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow, sdk-py-core >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: 2.18.0 > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8534) XlangParquetIOTest failing
[ https://issues.apache.org/jira/browse/BEAM-8534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-8534: -- Status: Open (was: Triage Needed) > XlangParquetIOTest failing > -- > > Key: BEAM-8534 > URL: https://issues.apache.org/jira/browse/BEAM-8534 > Project: Beam > Issue Type: Sub-task > Components: test-failures >Reporter: Kyle Weaver >Assignee: Heejong Lee >Priority: Major > >
> 13:43:05 [grpc-default-executor-1] ERROR org.apache.beam.fn.harness.control.BeamFnControlClient - Exception while trying to handle InstructionRequest 10
> java.lang.IllegalArgumentException: unable to deserialize Custom DoFn With Execution Info
> 	at org.apache.beam.sdk.util.SerializableUtils.deserializeFromByteArray(SerializableUtils.java:74)
> 	at org.apache.beam.runners.core.construction.ParDoTranslation.doFnWithExecutionInformationFromProto(ParDoTranslation.java:609)
> 	at org.apache.beam.runners.core.construction.ParDoTranslation.getDoFn(ParDoTranslation.java:285)
> 	at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory$Context.<init>(DoFnPTransformRunnerFactory.java:197)
> 	at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:96)
> 	at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:64)
> 	at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:194)
> 	at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:163)
> 	at org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:290)
> 	at org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:160)
> 	at org.apache.beam.fn.harness.control.BeamFnControlClient.lambda$processInstructionRequests$0(BeamFnControlClient.java:144)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.InvalidClassException: org.apache.beam.sdk.options.ValueProvider$StaticValueProvider; local class incompatible: stream classdesc serialVersionUID = -7089438576249123133, local class serialVersionUID = -7141898054594373712
> 	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
> 	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
> 	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
> 	at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
> 	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
> 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
> 	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
> 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
> 	at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
> 	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
> 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
> 	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
> 	at java.io.ObjectInputStream.readSe
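The root cause in the trace above is a serialVersionUID mismatch on ValueProvider$StaticValueProvider, which typically means jars built from two different Beam versions ended up on the harness classpath. A minimal, self-contained illustration of the mechanism (not Beam code; the class here is hypothetical): pinning serialVersionUID keeps the serialized form stable, whereas relying on the JVM-derived default makes any class change break deserialization with exactly this InvalidClassException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class VersionedValue implements Serializable {
    // Explicitly pinned: without this the JVM derives an ID from class
    // internals, and deserializing with a differently-compiled copy of the
    // class fails with "local class incompatible" InvalidClassException.
    private static final long serialVersionUID = 1L;

    public final String value;

    public VersionedValue(String value) { this.value = value; }

    public static byte[] toBytes(VersionedValue v) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(v);
        }
        return bytes.toByteArray();
    }

    public static VersionedValue fromBytes(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (VersionedValue) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        VersionedValue copy = fromBytes(toBytes(new VersionedValue("hello")));
        if (!"hello".equals(copy.value)) {
            throw new AssertionError("round trip failed");
        }
    }
}
```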
[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=337544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337544 ] ASF GitHub Bot logged work on BEAM-8539: Author: ASF GitHub Bot Created on: 01/Nov/19 20:28 Start Date: 01/Nov/19 20:28 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #9969: [BEAM-8539] Provide an initial definition of all job states and the state transition diagram URL: https://github.com/apache/beam/pull/9969#discussion_r341739058 ## File path: model/job-management/src/main/proto/beam_job_api.proto ## @@ -201,6 +201,16 @@ message JobMessagesResponse { } // Enumeration of all JobStates +// +// The state transition diagram is: +// STOPPED -> STARTING -> RUNNING -> DONE +// \> FAILED +// \> CANCELLING -> CANCELLED +// \> UPDATING -> UPDATED +// \> DRAINING -> DRAINED Review comment: Yeah, this is the problem with UPDATED since it is dependent on the implementation within the Runner. Is an UPDATED job the same job as the prior one or a new job which continues from the prior state? In Dataflow it is the latter but the former can make sense for other runners. Not sure of any RUNNING -> STOPPED scenarios today. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337544) Time Spent: 1h 10m (was: 1h) > Clearly define the valid job state transitions > -- > > Key: BEAM-8539 > URL: https://issues.apache.org/jira/browse/BEAM-8539 > Project: Beam > Issue Type: Improvement > Components: beam-model, runner-core, sdk-java-core, sdk-py-core >Reporter: Chad Dombrova >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > The Beam job state transitions are ill-defined, which is a big problem for > anything that relies on the values coming from JobAPI.GetStateStream. > I was hoping to find something like a state transition diagram in the docs so > that I could determine the start state, the terminal states, and the valid > transitions, but I could not find this. The code reveals that the SDKs differ > on the fundamentals: > Java InMemoryJobService: > * start state: *STOPPED* > * run - about to submit to executor: STARTING > * run - actually running on executor: RUNNING > * terminal states: DONE, FAILED, CANCELLED, DRAINED > Python AbstractJobServiceServicer / LocalJobServicer: > * start state: STARTING > * terminal states: DONE, FAILED, CANCELLED, *STOPPED* > I think it would be good to make python work like Java, so that there is a > difference in state between a job that has been prepared and one that has > additionally been run. > It's hard to tell how far this problem has spread within the various runners. > I think a simple thing that can be done to help standardize behavior is to > implement the terminal states as an enum in the beam_job_api.proto, or create > a utility function in each language for checking if a state is terminal, so > that it's not left up to each runner to reimplement this logic. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions
[ https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=337543&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337543 ] ASF GitHub Bot logged work on BEAM-8539: Author: ASF GitHub Bot Created on: 01/Nov/19 20:28 Start Date: 01/Nov/19 20:28 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #9969: [BEAM-8539] Provide an initial definition of all job states and the state transition diagram URL: https://github.com/apache/beam/pull/9969#discussion_r341739058 ## File path: model/job-management/src/main/proto/beam_job_api.proto ## @@ -201,6 +201,16 @@ message JobMessagesResponse { } // Enumeration of all JobStates +// +// The state transition diagram is: +// STOPPED -> STARTING -> RUNNING -> DONE +// \> FAILED +// \> CANCELLING -> CANCELLED +// \> UPDATING -> UPDATED +// \> DRAINING -> DRAINED Review comment: Yeah, this is the problem with UPDATED since it is dependent on the implementation within the Runner. Is an UPDATED job the same job as the prior one or a new job which continues from the prior state? In Dataflow it is the latter but the former can make sense for other runners. Not sure of any RUNNING -> STOPPED scenarios today but adding This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337543) Time Spent: 1h (was: 50m) > Clearly define the valid job state transitions > -- > > Key: BEAM-8539 > URL: https://issues.apache.org/jira/browse/BEAM-8539 > Project: Beam > Issue Type: Improvement > Components: beam-model, runner-core, sdk-java-core, sdk-py-core >Reporter: Chad Dombrova >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > The Beam job state transitions are ill-defined, which is a big problem for > anything that relies on the values coming from JobAPI.GetStateStream. > I was hoping to find something like a state transition diagram in the docs so > that I could determine the start state, the terminal states, and the valid > transitions, but I could not find this. The code reveals that the SDKs differ > on the fundamentals: > Java InMemoryJobService: > * start state: *STOPPED* > * run - about to submit to executor: STARTING > * run - actually running on executor: RUNNING > * terminal states: DONE, FAILED, CANCELLED, DRAINED > Python AbstractJobServiceServicer / LocalJobServicer: > * start state: STARTING > * terminal states: DONE, FAILED, CANCELLED, *STOPPED* > I think it would be good to make python work like Java, so that there is a > difference in state between a job that has been prepared and one that has > additionally been run. > It's hard to tell how far this problem has spread within the various runners. > I think a simple thing that can be done to help standardize behavior is to > implement the terminal states as an enum in the beam_job_api.proto, or create > a utility function in each language for checking if a state is terminal, so > that it's not left up to each runner to reimplement this logic. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8521) beam_PostCommit_XVR_Flink failing
[ https://issues.apache.org/jira/browse/BEAM-8521?focusedWorklogId=337542&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337542 ] ASF GitHub Bot logged work on BEAM-8521: Author: ASF GitHub Bot Created on: 01/Nov/19 20:24 Start Date: 01/Nov/19 20:24 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #9939: [BEAM-8521] only build Python 2.7 for xlang test URL: https://github.com/apache/beam/pull/9939 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337542) Time Spent: 50m (was: 40m) > beam_PostCommit_XVR_Flink failing > - > > Key: BEAM-8521 > URL: https://issues.apache.org/jira/browse/BEAM-8521 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > https://builds.apache.org/job/beam_PostCommit_XVR_Flink/ > Edit: Made subtasks for what appear to be two separate issues. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8252) (Python SDK) Add worker_region and worker_zone options
[ https://issues.apache.org/jira/browse/BEAM-8252?focusedWorklogId=337538&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337538 ] ASF GitHub Bot logged work on BEAM-8252: Author: ASF GitHub Bot Created on: 01/Nov/19 20:21 Start Date: 01/Nov/19 20:21 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9594: [BEAM-8252] Python: add worker_region and worker_zone options URL: https://github.com/apache/beam/pull/9594#issuecomment-548937247 > Should this be patched into the release branch? Maybe, but it'd be preferable to introduce the Java and Python options in the same release: #9961. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337538) Time Spent: 1h (was: 50m) > (Python SDK) Add worker_region and worker_zone options > -- > > Key: BEAM-8252 > URL: https://issues.apache.org/jira/browse/BEAM-8252 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow, sdk-py-core >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8544) Install Beam SDK with ccache for faster re-install.
[ https://issues.apache.org/jira/browse/BEAM-8544?focusedWorklogId=337536&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337536 ] ASF GitHub Bot logged work on BEAM-8544: Author: ASF GitHub Bot Created on: 01/Nov/19 20:17 Start Date: 01/Nov/19 20:17 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9966: [BEAM-8544] Use ccache for compiling the Beam Python SDK. URL: https://github.com/apache/beam/pull/9966#discussion_r341735449 ## File path: sdks/python/container/Dockerfile ## @@ -43,9 +45,13 @@ RUN \ # Remove pip cache. rm -rf /root/.cache/pip +# Configure ccache prior to installing the SDK. +RUN ln -s /usr/bin/ccache /usr/local/bin/gcc +# These parameters are needed as pip compiles artifacts in random temporary directories. +RUN ccache --set-config=sloppiness=file_macro && ccache --set-config=hash_dir=false COPY target/apache-beam.tar.gz /opt/apache/beam/tars/ -RUN pip install /opt/apache/beam/tars/apache-beam.tar.gz[gcp] && \ +RUN pip install -v /opt/apache/beam/tars/apache-beam.tar.gz[gcp] && \ # Remove pip cache. rm -rf /root/.cache/pip Review comment: Could we add a `pip freeze --all`? From the logs that will give us a complete list of what exact versions are installed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337536) Time Spent: 40m (was: 0.5h) > Install Beam SDK with ccache for faster re-install. > --- > > Key: BEAM-8544 > URL: https://issues.apache.org/jira/browse/BEAM-8544 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Re-compiling the C modules of the SDK takes 2-3 minutes. 
This adds to worker > startup time whenever a custom SDK is being used (in particular, during > development and testing). We can use ccache to re-use the old compile results > when the Cython files have not changed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8544) Install Beam SDK with ccache for faster re-install.
[ https://issues.apache.org/jira/browse/BEAM-8544?focusedWorklogId=337534&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337534 ] ASF GitHub Bot logged work on BEAM-8544: Author: ASF GitHub Bot Created on: 01/Nov/19 20:15 Start Date: 01/Nov/19 20:15 Worklog Time Spent: 10m Work Description: ananvay commented on issue #9966: [BEAM-8544] Use ccache for compiling the Beam Python SDK. URL: https://github.com/apache/beam/pull/9966#issuecomment-548935180 Thanks Robert, this is great! LGTM. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337534) Time Spent: 0.5h (was: 20m) > Install Beam SDK with ccache for faster re-install. > --- > > Key: BEAM-8544 > URL: https://issues.apache.org/jira/browse/BEAM-8544 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Re-compiling the C modules of the SDK takes 2-3 minutes. This adds to worker > startup time whenever a custom SDK is being used (in particular, during > development and testing). We can use ccache to re-use the old compile results > when the Cython files have not changed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8347) UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
[ https://issues.apache.org/jira/browse/BEAM-8347?focusedWorklogId=337532&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337532 ] ASF GitHub Bot logged work on BEAM-8347: Author: ASF GitHub Bot Created on: 01/Nov/19 20:10 Start Date: 01/Nov/19 20:10 Worklog Time Spent: 10m Work Description: drobert commented on issue #9820: [BEAM-8347]: Consistently advance UnboundedRabbitMqReader watermark URL: https://github.com/apache/beam/pull/9820#issuecomment-548933689 Quick note: I may be seeing a problem with this implementation locally. This isn't mergeable due to a need to rebase anyway, but there may be an issue updating the watermark when no messages come in (DirectRunner reports "Processing stuck in step (read from rabbit) for at least 05m00s without outputting or completing in state process"). Looking. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337532) Time Spent: 2h 40m (was: 2.5h) > UnboundedRabbitMqReader can fail to advance watermark if no new data comes in > - > > Key: BEAM-8347 > URL: https://issues.apache.org/jira/browse/BEAM-8347 > Project: Beam > Issue Type: Bug > Components: io-java-rabbitmq >Affects Versions: 2.15.0 > Environment: testing has been done using the DirectRunner. I also > have DataflowRunner available >Reporter: Daniel Robert >Assignee: Jean-Baptiste Onofré >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > I stumbled upon this and then saw a similar StackOverflow post: > [https://stackoverflow.com/questions/55736593/apache-beam-rabbitmqio-watermark-doesnt-advance] > When calling `advance()` if there are no messages, no state changes, > including no changes to the CheckpointMark or Watermark. 
If there is a > relatively constant rate of new messages coming in, this is not a problem. If > data is bursty, and there are periods of no new messages coming in, the > watermark will never advance. > Contrast this with some of the logic in PubsubIO which will make provisions > for periods of inactivity to advance the watermark (although it, too, is > imperfect: https://issues.apache.org/jira/browse/BEAM-7322 ) > The example given in the StackOverflow post is something like this: > > {code:java} > pipeline > .apply(RabbitMqIO.read() > .withUri("amqp://guest:guest@localhost:5672") > .withQueue("test") > .apply("Windowing", > Window.into( > FixedWindows.of(Duration.standardSeconds(10))) > .triggering(AfterWatermark.pastEndOfWindow()) > .withAllowedLateness(Duration.ZERO) > .accumulatingFiredPanes()){code} > If I push 2 messages into my rabbit queue, I see 2 unack'd messages and a > window that never performs an on time trigger. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
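The PubsubIO-style behavior the report contrasts with amounts to this: hold the watermark at the newest message timestamp while data flows, but once the source has been quiet for longer than some idle bound, advance it on wall-clock time so downstream windows can still fire. A self-contained sketch of that policy, detached from Beam's UnboundedReader API (the class and method names here are hypothetical, not RabbitMqIO's actual code):

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical watermark policy for a bursty source.
public class IdleWatermarkPolicy {
    private final Duration idleTimeout;
    private Instant lastMessageTimestamp = Instant.EPOCH;
    private Instant lastMessageWallTime = Instant.EPOCH;

    public IdleWatermarkPolicy(Duration idleTimeout) {
        this.idleTimeout = idleTimeout;
    }

    // Record each message's event timestamp and the wall clock when it arrived.
    public void onMessage(Instant messageTimestamp, Instant wallNow) {
        lastMessageTimestamp = messageTimestamp;
        lastMessageWallTime = wallNow;
    }

    public Instant currentWatermark(Instant wallNow) {
        if (wallNow.isAfter(lastMessageWallTime.plus(idleTimeout))) {
            // Source has been idle: assume no in-flight data older than the
            // idle bound, and let the watermark catch up to wall time.
            return wallNow.minus(idleTimeout);
        }
        return lastMessageTimestamp;
    }
}
```

With a policy like this, the two unacked messages in the example window would eventually see an on-time trigger even if no further messages arrive.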
[jira] [Updated] (BEAM-6857) Support dynamic timers
[ https://issues.apache.org/jira/browse/BEAM-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chad Dombrova updated BEAM-6857: Description: The Beam timers API currently requires each timer to be statically specified in the DoFn. The user must provide a separate callback method per timer. For example: {code:java} DoFn() { @TimerId("timer1") private final TimerSpec timer1 = TimerSpecs.timer(...); @TimerId("timer2") private final TimerSpec timer2 = TimerSpecs.timer(...); .. set timers in processElement @OnTimer("timer1") public void onTimer1() { .} @OnTimer("timer2") public void onTimer2() {} } {code} However there are many cases where the user does not know the set of timers statically when writing their code. This happens when the timer tag should be based on the data. It also happens when writing a DSL on top of Beam, where the DSL author has to create DoFns but does not know statically which timers their users will want to set (e.g. Scio). The goal is to support dynamic timers. Something as follows; {code:java} DoFn() { @TimerId("timer") private final TimerSpec timer1 = TimerSpecs.dynamicTimer(...); @ProcessElement process(@TimerId("timer") DynamicTimer timer) { timer.set("tag1'", ts); timer.set("tag2", ts); } @OnTimer("timer") public void onTimer1(@TimerTag String tag) { .} } {code} was: The Beam timers API currently requires each timer to be statically specified in the DoFn. The user must provide a separate callback method per timer. For example: {code:java} DoFn() { @TimerId("timer1") private final TimerSpec timer1 = TimerSpecs.timer(...); @TimerId("timer2") private final TimerSpec timer2 = TimerSpecs.timer(...); .. set timers in processElement @OnTimer("timer1") public void onTimer1() \{ .} @OnTimer("timer2") public void onTimer2() {} } {code} However there are many cases where the user does not know the set of timers statically when writing their code. This happens when the timer tag should be based on the data. 
It also happens when writing a DSL on top of Beam, where the DSL author has to create DoFns but does not know statically which timers their users will want to set (e.g. Scio). The goal is to support dynamic timers. Something as follows; {code:java} DoFn() { @TimerId("timer") private final TimerSpec timer1 = TimerSpecs.dynamicTimer(...); @ProcessElement process(@TimerId("timer") DynamicTimer timer) { timer.set("tag1'", ts); timer.set("tag2", ts); } @OnTimer("timer") public void onTimer1(@TimerTag String tag) { .} } {code} > Support dynamic timers > -- > > Key: BEAM-6857 > URL: https://issues.apache.org/jira/browse/BEAM-6857 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > > The Beam timers API currently requires each timer to be statically specified > in the DoFn. The user must provide a separate callback method per timer. For > example: > > {code:java} > DoFn() > { > @TimerId("timer1") > private final TimerSpec timer1 = TimerSpecs.timer(...); > @TimerId("timer2") > private final TimerSpec timer2 = TimerSpecs.timer(...); > .. set timers in processElement > @OnTimer("timer1") > public void onTimer1() { .} > @OnTimer("timer2") > public void onTimer2() {} > } > {code} > > However there are many cases where the user does not know the set of timers > statically when writing their code. This happens when the timer tag should be > based on the data. It also happens when writing a DSL on top of Beam, where > the DSL author has to create DoFns but does not know statically which timers > their users will want to set (e.g. Scio). > > The goal is to support dynamic timers. 
Something as follows; > > {code:java} > DoFn() > { > @TimerId("timer") > private final TimerSpec timer1 = TimerSpecs.dynamicTimer(...); > @ProcessElement process(@TimerId("timer") DynamicTimer timer) > { > timer.set("tag1'", ts); >timer.set("tag2", ts); > } > @OnTimer("timer") > public void onTimer1(@TimerTag String tag) { .} > } > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
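For the DSL use case described above, the value of a single dynamic timer is that tag-to-callback routing can live in DSL code rather than in statically declared @OnTimer methods. A sketch of that dispatch layer, independent of Beam's API (all names here are hypothetical, for illustration only):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical dispatch table a DSL could keep behind one dynamic timer:
// the DoFn declares a single timer callback and routes by tag at runtime.
public class DynamicTimerDispatch {
    private final Map<String, Consumer<String>> callbacks = new HashMap<>();

    public void register(String tag, Consumer<String> callback) {
        callbacks.put(tag, callback);
    }

    // Invoked from the single timer callback with the tag that fired.
    // Returns false when no callback is registered for the tag.
    public boolean onTimer(String tag) {
        Consumer<String> callback = callbacks.get(tag);
        if (callback == null) {
            return false;
        }
        callback.accept(tag);
        return true;
    }
}
```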
[jira] [Updated] (BEAM-6857) Support dynamic timers
[ https://issues.apache.org/jira/browse/BEAM-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chad Dombrova updated BEAM-6857: Description: The Beam timers API currently requires each timer to be statically specified in the DoFn. The user must provide a separate callback method per timer. For example: {code:java} DoFn() { @TimerId("timer1") private final TimerSpec timer1 = TimerSpecs.timer(...); @TimerId("timer2") private final TimerSpec timer2 = TimerSpecs.timer(...); .. set timers in processElement @OnTimer("timer1") public void onTimer1() \{ .} @OnTimer("timer2") public void onTimer2() {} } {code} However there are many cases where the user does not know the set of timers statically when writing their code. This happens when the timer tag should be based on the data. It also happens when writing a DSL on top of Beam, where the DSL author has to create DoFns but does not know statically which timers their users will want to set (e.g. Scio). The goal is to support dynamic timers. Something as follows; {code:java} DoFn() { @TimerId("timer") private final TimerSpec timer1 = TimerSpecs.dynamicTimer(...); @ProcessElement process(@TimerId("timer") DynamicTimer timer) { timer.set("tag1'", ts); timer.set("tag2", ts); } @OnTimer("timer") public void onTimer1(@TimerTag String tag) { .} } {code} was: The Beam timers API currently requires each timer to be statically specified in the DoFn. The user must provide a separate callback method per timer. For example: DoFn() { @TimerId("timer1") private final TimerSpec timer1 = TimerSpecs.timer(...); @TimerId("timer2") private final TimerSpec timer2 = TimerSpecs.timer(...); .. set timers in processElement @OnTimer("timer1") public void onTimer1() \{ .} @OnTimer("timer2") public void onTimer2() \{} } However there are many cases where the user does not know the set of timers statically when writing their code. This happens when the timer tag should be based on the data. 
It also happens when writing a DSL on top of Beam, where the DSL author has to create DoFns but does not know statically which timers their users will want to set (e.g. Scio). The goal is to support dynamic timers. Something as follows; DoFn() { @TimerId("timer") private final TimerSpec timer1 = TimerSpecs.dynamicTimer(...); @ProcessElement process(@TimerId("timer") DynamicTimer timer) { timer.set("tag1'", ts); timer.set("tag2", ts); } @OnTimer("timer") public void onTimer1(@TimerTag String tag) \{ .} } > Support dynamic timers > -- > > Key: BEAM-6857 > URL: https://issues.apache.org/jira/browse/BEAM-6857 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > > The Beam timers API currently requires each timer to be statically specified > in the DoFn. The user must provide a separate callback method per timer. For > example: > > {code:java} > DoFn() > { @TimerId("timer1") private final TimerSpec timer1 = > TimerSpecs.timer(...); @TimerId("timer2") private final TimerSpec timer2 = > TimerSpecs.timer(...); .. set timers in processElement > @OnTimer("timer1") public void onTimer1() \{ .} > @OnTimer("timer2") public void onTimer2() {} > } > {code} > > However there are many cases where the user does not know the set of timers > statically when writing their code. This happens when the timer tag should be > based on the data. It also happens when writing a DSL on top of Beam, where > the DSL author has to create DoFns but does not know statically which timers > their users will want to set (e.g. Scio). > > The goal is to support dynamic timers. 
Something as follows; > > {code:java} > DoFn() { > @TimerId("timer") private final TimerSpec timer1 = > TimerSpecs.dynamicTimer(...); > @ProcessElement process(@TimerId("timer") DynamicTimer timer) > { timer.set("tag1'", ts); timer.set("tag2", ts); } > @OnTimer("timer") public void onTimer1(@TimerTag String tag) { .} > } > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8523) Add useful timestamp to job servicer GetJobs
[ https://issues.apache.org/jira/browse/BEAM-8523?focusedWorklogId=337519&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337519 ] ASF GitHub Bot logged work on BEAM-8523: Author: ASF GitHub Bot Created on: 01/Nov/19 19:39 Start Date: 01/Nov/19 19:39 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9959: WIP: [BEAM-8523] JobAPI: Give access to timestamped state change history URL: https://github.com/apache/beam/pull/9959#discussion_r341722020 ## File path: model/job-management/src/main/proto/beam_job_api.proto ## @@ -153,6 +154,7 @@ message GetJobStateRequest { message GetJobStateResponse { Review comment: ok, saw your comment from Jira: > Typically when you would connect you would want to give some lower bound to filter messages. For example GetJobStateRequest could become: > > ```proto > message GetJobStateRequest { > string job_id = 1; // (required) > // (Optional) If specified, only state transitions after > // the provided timestamp will be provided. > google.protobuf.Timestamp min_message_timestamp; > } > ``` I agree with this conceptually, but was hoping to defer the addition of this until I get to the message stream. The state stream is typically going to be just a few events, so practically speaking, the addition of `min_message_timestamp` here will not be very impactful. The size of the backfill for the message stream on the other hand, with the changes that I'd like to add to it, could be significant, and there could be a number of ways that a user may want to filter (`min_message_timestamp`, `max_num_messages`, `importance`, etc.). I'd like the APIs to be similar between the two, and I think the work done on GetMessageStream will inform our choices for GetStateStream. tl;dr I'd like to defer the addition of this until later, if possible. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337519) Time Spent: 1h 20m (was: 1h 10m) > Add useful timestamp to job servicer GetJobs > > > Key: BEAM-8523 > URL: https://issues.apache.org/jira/browse/BEAM-8523 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > As a user querying jobs with JobService.GetJobs, it would be useful if the > JobInfo result contained timestamps indicating various state changes that may > have been missed by a client. Useful timestamps include: > > * submitted (prepared to the job service) > * started (executor enters the RUNNING state) > * completed (executor enters a terminal state) > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
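The comment above proposes a `min_message_timestamp` lower bound so a reconnecting client only receives state transitions it has not yet seen. A minimal, self-contained sketch of that filtering semantics (the `StateEvent` class and `filter_state_events` helper are illustrative names, not part of the actual Beam Job API):

```python
from dataclasses import dataclass


# Hypothetical stand-in for one timestamped state transition; not an
# actual Beam Job API type.
@dataclass
class StateEvent:
    state: str
    timestamp: float  # seconds since epoch


def filter_state_events(events, min_message_timestamp=None):
    """Return only transitions strictly after the optional lower bound,
    mirroring the proposed semantics: "only state transitions after the
    provided timestamp will be provided"."""
    if min_message_timestamp is None:
        return list(events)
    return [e for e in events if e.timestamp > min_message_timestamp]


history = [
    StateEvent("STOPPED", 100.0),
    StateEvent("RUNNING", 105.0),
    StateEvent("DONE", 230.0),
]

# A client that reconnects having already seen everything up to t=105
# would receive only the later DONE transition.
recent = filter_state_events(history, min_message_timestamp=105.0)
```

With no bound supplied, the full backfill is returned, which is why the comment notes the cost matters more for the message stream than for the short state stream.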
[jira] [Work logged] (BEAM-8523) Add useful timestamp to job servicer GetJobs
[ https://issues.apache.org/jira/browse/BEAM-8523?focusedWorklogId=337517&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337517 ] ASF GitHub Bot logged work on BEAM-8523: Author: ASF GitHub Bot Created on: 01/Nov/19 19:38 Start Date: 01/Nov/19 19:38 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9959: WIP: [BEAM-8523] JobAPI: Give access to timestamped state change history URL: https://github.com/apache/beam/pull/9959#issuecomment-548923940 Note that this PR depends on https://github.com/apache/beam/pull/9965 which depends on https://github.com/apache/beam/pull/9969 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337517) Time Spent: 1h 10m (was: 1h) > Add useful timestamp to job servicer GetJobs > > > Key: BEAM-8523 > URL: https://issues.apache.org/jira/browse/BEAM-8523 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > As a user querying jobs with JobService.GetJobs, it would be useful if the > JobInfo result contained timestamps indicating various state changes that may > have been missed by a client. Useful timestamps include: > > * submitted (prepared to the job service) > * started (executor enters the RUNNING state) > * completed (executor enters a terminal state) > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3713) Consider moving away from nose to nose2 or pytest.
[ https://issues.apache.org/jira/browse/BEAM-3713?focusedWorklogId=337518&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337518 ] ASF GitHub Bot logged work on BEAM-3713: Author: ASF GitHub Bot Created on: 01/Nov/19 19:38 Start Date: 01/Nov/19 19:38 Worklog Time Spent: 10m Work Description: udim commented on issue #9756: [BEAM-3713] Add pytest for unit tests URL: https://github.com/apache/beam/pull/9756#issuecomment-548924069 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337518) Time Spent: 11h 20m (was: 11h 10m) > Consider moving away from nose to nose2 or pytest. > -- > > Key: BEAM-3713 > URL: https://issues.apache.org/jira/browse/BEAM-3713 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: Robert Bradshaw >Assignee: Udi Meiri >Priority: Minor > Time Spent: 11h 20m > Remaining Estimate: 0h > > Per > [https://nose.readthedocs.io/en/latest/], nose is in maintenance mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3713) Consider moving away from nose to nose2 or pytest.
[ https://issues.apache.org/jira/browse/BEAM-3713?focusedWorklogId=337515&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337515 ] ASF GitHub Bot logged work on BEAM-3713: Author: ASF GitHub Bot Created on: 01/Nov/19 19:36 Start Date: 01/Nov/19 19:36 Worklog Time Spent: 10m Work Description: udim commented on issue #9756: [BEAM-3713] Add pytest for unit tests URL: https://github.com/apache/beam/pull/9756#issuecomment-548923380 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337515) Time Spent: 11h (was: 10h 50m) > Consider moving away from nose to nose2 or pytest. > -- > > Key: BEAM-3713 > URL: https://issues.apache.org/jira/browse/BEAM-3713 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: Robert Bradshaw >Assignee: Udi Meiri >Priority: Minor > Time Spent: 11h > Remaining Estimate: 0h > > Per > [https://nose.readthedocs.io/en/latest/], nose is in maintenance mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3713) Consider moving away from nose to nose2 or pytest.
[ https://issues.apache.org/jira/browse/BEAM-3713?focusedWorklogId=337516&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337516 ] ASF GitHub Bot logged work on BEAM-3713: Author: ASF GitHub Bot Created on: 01/Nov/19 19:36 Start Date: 01/Nov/19 19:36 Worklog Time Spent: 10m Work Description: udim commented on issue #9756: [BEAM-3713] Add pytest for unit tests URL: https://github.com/apache/beam/pull/9756#issuecomment-548923507 Run CommunityMetrics PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337516) Time Spent: 11h 10m (was: 11h) > Consider moving away from nose to nose2 or pytest. > -- > > Key: BEAM-3713 > URL: https://issues.apache.org/jira/browse/BEAM-3713 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: Robert Bradshaw >Assignee: Udi Meiri >Priority: Minor > Time Spent: 11h 10m > Remaining Estimate: 0h > > Per > [https://nose.readthedocs.io/en/latest/], nose is in maintenance mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8523) Add useful timestamp to job servicer GetJobs
[ https://issues.apache.org/jira/browse/BEAM-8523?focusedWorklogId=337514&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337514 ] ASF GitHub Bot logged work on BEAM-8523: Author: ASF GitHub Bot Created on: 01/Nov/19 19:35 Start Date: 01/Nov/19 19:35 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9959: WIP: [BEAM-8523] JobAPI: Give access to timestamped state change history URL: https://github.com/apache/beam/pull/9959#discussion_r341722020 ## File path: model/job-management/src/main/proto/beam_job_api.proto ## @@ -153,6 +154,7 @@ message GetJobStateRequest { message GetJobStateResponse { Review comment: ok, saw your comment from Jira: > Typically when you would connect you would want to give some lower bound to filter messages. For example GetJobStateRequest could become: > > ```proto > message GetJobStateRequest { > string job_id = 1; // (required) > // (Optional) If specified, only state transitions after > // the provided timestamp will be provided. > google.protobuf.Timestamp min_message_timestamp = 2; > } > ``` I agree with this conceptually, but was hoping to defer the addition of this until I get to the message stream. The state stream is typically going to be just a few events, so practically speaking, the addition of `min_message_timestamp` here will not be very impactful. The size of the backfill for the message stream on the other hand, with the changes that I'd like to add to it, could be significant, and there could be a number of ways that a user may want to filter (`min_message_timestamp`, `max_num_messages`, `importance`, etc.). I'd like the APIs to be similar between the two, and I think the work on GetMessageStream will inform what we want to do for GetStateStream. tl;dr I'd like to defer the addition of this until later, if possible. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337514) Time Spent: 1h (was: 50m) > Add useful timestamp to job servicer GetJobs > > > Key: BEAM-8523 > URL: https://issues.apache.org/jira/browse/BEAM-8523 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > As a user querying jobs with JobService.GetJobs, it would be useful if the > JobInfo result contained timestamps indicating various state changes that may > have been missed by a client. Useful timestamps include: > > * submitted (prepared to the job service) > * started (executor enters the RUNNING state) > * completed (executor enters a terminal state) > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
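The issue above asks for submitted/started/completed timestamps that a client could recover even after missing state changes. If the job service keeps a timestamped state-change history (as the PR proposes), those three timestamps can be derived from it. A hedged sketch of that derivation (state names follow the Beam JobState enum informally; `derive_job_timestamps` is an illustrative helper, not an actual API):

```python
# Terminal states in the informal sense used by the issue: the executor
# has stopped making progress on the job.
TERMINAL_STATES = {"DONE", "FAILED", "CANCELLED"}


def derive_job_timestamps(history):
    """Derive the timestamps requested in BEAM-8523 from an ordered
    list of (state, timestamp) pairs.

    - submitted: time of the first recorded transition
    - started:   first time the executor entered RUNNING
    - completed: first time the executor entered a terminal state
    """
    info = {"submitted": None, "started": None, "completed": None}
    if history:
        info["submitted"] = history[0][1]
    for state, ts in history:
        if state == "RUNNING" and info["started"] is None:
            info["started"] = ts
        if state in TERMINAL_STATES and info["completed"] is None:
            info["completed"] = ts
    return info


ts = derive_job_timestamps(
    [("STOPPED", 1.0), ("RUNNING", 2.5), ("DONE", 9.0)])
```

A still-running job simply leaves `completed` unset, so the same shape works for both active and finished jobs.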
[jira] [Work logged] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam
[ https://issues.apache.org/jira/browse/BEAM-8368?focusedWorklogId=337513&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337513 ] ASF GitHub Bot logged work on BEAM-8368: Author: ASF GitHub Bot Created on: 01/Nov/19 19:29 Start Date: 01/Nov/19 19:29 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9970: [BEAM-8368] [BEAM-8392] Update pyarrow to the latest version URL: https://github.com/apache/beam/pull/9970#issuecomment-548921330 R: @TheNeuralBit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337513) Time Spent: 2h 10m (was: 2h) > [Python] libprotobuf-generated exception when importing apache_beam > --- > > Key: BEAM-8368 > URL: https://issues.apache.org/jira/browse/BEAM-8368 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.15.0, 2.17.0 >Reporter: Ubaier Bhat >Assignee: Brian Hulette >Priority: Blocker > Fix For: 2.17.0 > > Attachments: error_log.txt > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Unable to import apache_beam after upgrading to macos 10.15 (Catalina). > Cleared all the pipenvs but can't get it working again. > {code} > import apache_beam as beam > /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84: > UserWarning: Some syntactic constructs of Python 3 are not yet fully > supported by Apache Beam. 
> 'Some syntactic constructs of Python 3 are not yet fully supported by ' > [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already > exists in database: > [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: > GeneratedDatabase()->Add(encoded_file_descriptor, size): > libc++abi.dylib: terminating with uncaught exception of type > google::protobuf::FatalException: CHECK failed: > GeneratedDatabase()->Add(encoded_file_descriptor, size): > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam
[ https://issues.apache.org/jira/browse/BEAM-8368?focusedWorklogId=337511&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337511 ] ASF GitHub Bot logged work on BEAM-8368: Author: ASF GitHub Bot Created on: 01/Nov/19 19:28 Start Date: 01/Nov/19 19:28 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9970: [BEAM-8368] [BEAM-8392] Update pyarrow to the latest version URL: https://github.com/apache/beam/pull/9970 Update pyarrow to the latest version. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
Post-Commit Tests Status (on master branch): build badge table for the Go, Java, and Python SDKs across the Apex, Dataflow, Flink, Gearpump, Samza, and Spark runners.
[jira] [Work logged] (BEAM-8523) Add useful timestamp to job servicer GetJobs
[ https://issues.apache.org/jira/browse/BEAM-8523?focusedWorklogId=337510&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337510 ] ASF GitHub Bot logged work on BEAM-8523: Author: ASF GitHub Bot Created on: 01/Nov/19 19:26 Start Date: 01/Nov/19 19:26 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9959: WIP: [BEAM-8523] JobAPI: Give access to timestamped state change history URL: https://github.com/apache/beam/pull/9959#discussion_r341718971 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/JobStateEvent.java ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.fnexecution.jobsubmission; + +import com.google.auto.value.AutoValue; +import java.time.Instant; +import org.apache.beam.model.jobmanagement.v1.JobApi; +import org.apache.beam.vendor.grpc.v1p21p0.com.google.protobuf.Timestamp; + +/** A state transition event. */ +@AutoValue +public abstract class JobStateEvent { Review comment: Do we want a `GetJobStateResponse` to have a `JobStateEvent` ? i.e. 
```proto message JobStateEvent { JobState.Enum state = 1; // (required) google.protobuf.Timestamp timestamp = 2; // (required) } message GetJobStateResponse { JobStateEvent state_event = 1; // (required) } ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337510) Time Spent: 50m (was: 40m) > Add useful timestamp to job servicer GetJobs > > > Key: BEAM-8523 > URL: https://issues.apache.org/jira/browse/BEAM-8523 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > As a user querying jobs with JobService.GetJobs, it would be useful if the > JobInfo result contained timestamps indicating various state changes that may > have been missed by a client. Useful timestamps include: > > * submitted (prepared to the job service) > * started (executor enters the RUNNING state) > * completed (executor enters a terminal state) > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam
[ https://issues.apache.org/jira/browse/BEAM-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Altay updated BEAM-8368: -- Fix Version/s: (was: 2.18.0) 2.17.0 > [Python] libprotobuf-generated exception when importing apache_beam > --- > > Key: BEAM-8368 > URL: https://issues.apache.org/jira/browse/BEAM-8368 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.15.0, 2.17.0 >Reporter: Ubaier Bhat >Assignee: Brian Hulette >Priority: Blocker > Fix For: 2.17.0 > > Attachments: error_log.txt > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Unable to import apache_beam after upgrading to macos 10.15 (Catalina). > Cleared all the pipenvs but can't get it working again. > {code} > import apache_beam as beam > /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84: > UserWarning: Some syntactic constructs of Python 3 are not yet fully > supported by Apache Beam. > 'Some syntactic constructs of Python 3 are not yet fully supported by ' > [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already > exists in database: > [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: > GeneratedDatabase()->Add(encoded_file_descriptor, size): > libc++abi.dylib: terminating with uncaught exception of type > google::protobuf::FatalException: CHECK failed: > GeneratedDatabase()->Add(encoded_file_descriptor, size): > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam
[ https://issues.apache.org/jira/browse/BEAM-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965043#comment-16965043 ] Ahmet Altay commented on BEAM-8368: --- 0.15.1 was released a few hours ago ([https://pypi.org/project/pyarrow/0.15.1/]) Could we get this into the 2.17.0 branch? > [Python] libprotobuf-generated exception when importing apache_beam > --- > > Key: BEAM-8368 > URL: https://issues.apache.org/jira/browse/BEAM-8368 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.15.0, 2.17.0 >Reporter: Ubaier Bhat >Assignee: Brian Hulette >Priority: Blocker > Fix For: 2.17.0 > > Attachments: error_log.txt > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Unable to import apache_beam after upgrading to macos 10.15 (Catalina). > Cleared all the pipenvs but can't get it working again. > {code} > import apache_beam as beam > /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84: > UserWarning: Some syntactic constructs of Python 3 are not yet fully > supported by Apache Beam. > 'Some syntactic constructs of Python 3 are not yet fully supported by ' > [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already > exists in database: > [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: > GeneratedDatabase()->Add(encoded_file_descriptor, size): > libc++abi.dylib: terminating with uncaught exception of type > google::protobuf::FatalException: CHECK failed: > GeneratedDatabase()->Add(encoded_file_descriptor, size): > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8523) Add useful timestamp to job servicer GetJobs
[ https://issues.apache.org/jira/browse/BEAM-8523?focusedWorklogId=337509&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337509 ] ASF GitHub Bot logged work on BEAM-8523: Author: ASF GitHub Bot Created on: 01/Nov/19 19:25 Start Date: 01/Nov/19 19:25 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9959: WIP: [BEAM-8523] JobAPI: Give access to timestamped state change history URL: https://github.com/apache/beam/pull/9959#discussion_r341718971 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/JobStateEvent.java ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.fnexecution.jobsubmission; + +import com.google.auto.value.AutoValue; +import java.time.Instant; +import org.apache.beam.model.jobmanagement.v1.JobApi; +import org.apache.beam.vendor.grpc.v1p21p0.com.google.protobuf.Timestamp; + +/** A state transition event. */ +@AutoValue +public abstract class JobStateEvent { Review comment: Do we want a `GetJobStateResponse` to have a `JobStateEvent` ? i.e. 
```proto message GetJobStateResponse { JobStateEvent state_event = 1; // (required) } ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337509) Time Spent: 40m (was: 0.5h) > Add useful timestamp to job servicer GetJobs > > > Key: BEAM-8523 > URL: https://issues.apache.org/jira/browse/BEAM-8523 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > As a user querying jobs with JobService.GetJobs, it would be useful if the > JobInfo result contained timestamps indicating various state changes that may > have been missed by a client. Useful timestamps include: > > * submitted (prepared to the job service) > * started (executor enters the RUNNING state) > * completed (executor enters a terminal state) > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8523) Add useful timestamp to job servicer GetJobs
[ https://issues.apache.org/jira/browse/BEAM-8523?focusedWorklogId=337507&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337507 ] ASF GitHub Bot logged work on BEAM-8523: Author: ASF GitHub Bot Created on: 01/Nov/19 19:23 Start Date: 01/Nov/19 19:23 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #9959: WIP: [BEAM-8523] JobAPI: Give access to timestamped state change history URL: https://github.com/apache/beam/pull/9959#discussion_r341718342 ## File path: model/job-management/src/main/proto/beam_job_api.proto ## @@ -153,6 +154,7 @@ message GetJobStateRequest { message GetJobStateResponse { Review comment: Why? What would be different about them? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337507) Time Spent: 0.5h (was: 20m) > Add useful timestamp to job servicer GetJobs > > > Key: BEAM-8523 > URL: https://issues.apache.org/jira/browse/BEAM-8523 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > As a user querying jobs with JobService.GetJobs, it would be useful if the > JobInfo result contained timestamps indicating various state changes that may > have been missed by a client. Useful timestamps include: > > * submitted (prepared to the job service) > * started (executor enters the RUNNING state) > * completed (executor enters a terminal state) > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8435) Allow access to PaneInfo from Python DoFns
[ https://issues.apache.org/jira/browse/BEAM-8435?focusedWorklogId=337506&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337506 ] ASF GitHub Bot logged work on BEAM-8435: Author: ASF GitHub Bot Created on: 01/Nov/19 19:22 Start Date: 01/Nov/19 19:22 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #9836: [BEAM-8435] Implement PaneInfo computation for Python. URL: https://github.com/apache/beam/pull/9836#discussion_r341716543 ## File path: sdks/python/apache_beam/transforms/trigger.py ## @@ -1162,12 +1173,33 @@ def process_timer(self, window_id, unused_name, time_domain, timestamp, if self.trigger_fn.should_fire(time_domain, timestamp, window, context): finished = self.trigger_fn.on_fire(timestamp, window, context) - yield self._output(window, finished, state) + yield self._output(window, finished, state, timestamp, + time_domain == TimeDomain.WATERMARK) else: raise Exception('Unexpected time domain: %s' % time_domain) - def _output(self, window, finished, state): + def _output(self, window, finished, state, watermark, maybe_ontime): """Output window and clean up if appropriate.""" +index = state.get_state(window, self.INDEX) +state.add_state(window, self.INDEX, 1) +if watermark <= window.max_timestamp(): + nonspeculative_index = -1 + timing = windowed_value.PaneInfoTiming.EARLY + if state.get_state(window, self.NONSPECULATIVE_INDEX): +logging.warning('Watermark moved backwards in time.') +else: + nonspeculative_index = state.get_state(window, self.NONSPECULATIVE_INDEX) + state.add_state(window, self.NONSPECULATIVE_INDEX, 1) + timing = ( + windowed_value.PaneInfoTiming.ON_TIME + if maybe_ontime and nonspeculative_index == 0 Review comment: Sorry, I misread that line. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 337506) Time Spent: 1.5h (was: 1h 20m) > Allow access to PaneInfo from Python DoFns > -- > > Key: BEAM-8435 > URL: https://issues.apache.org/jira/browse/BEAM-8435 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > PaneInfoParam exists, but the plumbing to actually populate it at runtime was > never added. (Nor, clearly, were any tests...) -- This message was sent by Atlassian Jira (v8.3.4#803005)
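The diff under review classifies each fired pane as EARLY, ON_TIME, or LATE: panes fired while the watermark is still inside the window are EARLY, the first non-speculative firing driven by the watermark timer is ON_TIME, and anything after that is LATE. A simplified, self-contained sketch of that decision (this mirrors the reviewed logic informally; `pane_timing` and its parameters are illustrative, not the actual apache_beam implementation):

```python
EARLY, ON_TIME, LATE = "EARLY", "ON_TIME", "LATE"


def pane_timing(watermark, window_max_timestamp,
                from_watermark_timer, nonspeculative_index):
    """Classify a fired pane.

    watermark: current event-time watermark when the pane fires
    window_max_timestamp: last timestamp belonging to the window
    from_watermark_timer: True if the firing came from an event-time
        (watermark) timer, i.e. the `maybe_ontime` flag in the diff
    nonspeculative_index: count of prior non-speculative firings
    """
    if watermark <= window_max_timestamp:
        # Watermark has not yet passed the end of the window: speculative.
        return EARLY
    if from_watermark_timer and nonspeculative_index == 0:
        # First non-speculative firing, triggered by the watermark itself.
        return ON_TIME
    return LATE


# Pane fired while the watermark is still inside the window:
early = pane_timing(5, 10, False, 0)
# First firing from the watermark timer after the window ends:
on_time = pane_timing(11, 10, True, 0)
# Any subsequent firing:
late = pane_timing(12, 10, True, 1)
```

This is why the diff tracks a separate `NONSPECULATIVE_INDEX` in addition to the overall pane `INDEX`: only the former decides whether a post-watermark firing still counts as the on-time pane.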