[jira] [Work logged] (BEAM-8503) Improve TestBigQuery and TestPubsub

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8503?focusedWorklogId=337654&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337654
 ]

ASF GitHub Bot logged work on BEAM-8503:


Author: ASF GitHub Bot
Created on: 02/Nov/19 04:21
Start Date: 02/Nov/19 04:21
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #9880: 
[BEAM-8503] Improve TestBigQuery and TestPubsub
URL: https://github.com/apache/beam/pull/9880
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337654)
Time Spent: 2h 10m  (was: 2h)

> Improve TestBigQuery and TestPubsub
> ---
>
> Key: BEAM-8503
> URL: https://issues.apache.org/jira/browse/BEAM-8503
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Minor
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Add better support for E2E BigQuery and Pubsub testing:
> - TestBigQuery should have the ability to insert data into the underlying 
> table before a test.
> - TestPubsub should have the ability to subscribe to the underlying topic and 
> read messages that were written during a test.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8547) Portable Wordcount fails on standalone Flink cluster

2019-11-01 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-8547:
--
Description: 
Repro:
 # git checkout origin/release-2.16.0
 # ./flink-1.8.2/bin/start-cluster.sh
 # gradlew :runners:flink:1.8:job-server:runShadow 
-PflinkMasterUrl=localhost:8081
 # python -m apache_beam.examples.wordcount --input=/etc/profile 
--output=/tmp/py-wordcount-direct --runner=PortableRunner 
--experiments=worker_threads=100 --parallelism=1 
--shutdown_sources_on_final_watermark --sdk_worker_parallelism=1 
--environment_cache_millis=6 --job_endpoint=localhost:8099

This causes the runner to crash with:
{noformat}
Traceback (most recent call last):
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 158, in _execute
response = task()
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 191, in <lambda>
self._execute(lambda: worker.do_instruction(work), work)
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 343, in do_instruction
request.instruction_id)
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 369, in process_bundle
bundle_processor.process_bundle(instruction_id))
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
 line 663, in process_bundle
data.ptransform_id].process_encoded(data.data)
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
 line 143, in process_encoded
self.output(decoded_value)
  File "apache_beam/runners/worker/operations.py", line 255, in 
apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 256, in 
apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 143, in 
apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 593, in 
apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 594, in 
apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 776, in 
apache_beam.runners.common.DoFnRunner.receive
  File "apache_beam/runners/common.py", line 782, in 
apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 849, in 
apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "/usr/local/lib/python3.7/site-packages/future/utils/__init__.py", line 
421, in raise_with_traceback
raise exc.with_traceback(traceback)
  File "apache_beam/runners/common.py", line 780, in 
apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 587, in 
apache_beam.runners.common.PerWindowInvoker.invoke_process
  File "apache_beam/runners/common.py", line 660, in 
apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 
1042, in process
self.writer = self.sink.open_writer(init_result, str(uuid.uuid4()))
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/options/value_provider.py", 
line 137, in _f
return fnc(self, *args, **kwargs)
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", line 
186, in open_writer
return FileBasedSinkWriter(self, writer_path)
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", line 
390, in __init__
self.temp_handle = self.sink.open(temp_shard_path)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/textio.py", line 
391, in open
file_handle = super(_TextSink, self).open(temp_path)
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/options/value_provider.py", 
line 137, in _f
return fnc(self, *args, **kwargs)
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", line 
129, in open
return FileSystems.create(temp_path, self.mime_type, self.compression_type)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/filesystems.py", 
line 203, in create
return filesystem.create(path, mime_type, compression_type)
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/io/localfilesystem.py", 
line 151, in create
return self._path_open(path, 'wb', mime_type, compression_type)
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/io/localfilesystem.py", 
line 134, in _path_open
raw_file = open(path, mode)
RuntimeError: FileNotFoundError: [Errno 2] No such file or directory: 
'/tmp/beam-temp-py-wordcount-direct-ea951c18fd1211e9ac84a0c589d778c3/d39e13af-277b-437e-89f2-e00249287e1d.py-wordcount-direct'
 [while running 'write/Write/WriteImpl/WriteBundles'] {noformat}

[jira] [Work logged] (BEAM-8146) SchemaCoder/RowCoder have no equals() function

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8146?focusedWorklogId=337647&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337647
 ]

ASF GitHub Bot logged work on BEAM-8146:


Author: ASF GitHub Bot
Created on: 02/Nov/19 02:09
Start Date: 02/Nov/19 02:09
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #9493: 
[BEAM-8146,BEAM-8204,BEAM-8205] Add equals and hashCode to SchemaCoder and 
RowCoder
URL: https://github.com/apache/beam/pull/9493
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337647)
Time Spent: 3.5h  (was: 3h 20m)

> SchemaCoder/RowCoder have no equals() function
> --
>
> Key: BEAM-8146
> URL: https://issues.apache.org/jira/browse/BEAM-8146
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> SchemaCoder has no equals function, so it can't be compared in tests, like 
> CloudComponentsTests$DefaultCoders, which is being re-enabled in 
> https://github.com/apache/beam/pull/9446



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-7859) Portable Wordcount on Spark runner does not work in DOCKER execution mode.

2019-11-01 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965196#comment-16965196
 ] 

Valentyn Tymofieiev edited comment on BEAM-7859 at 11/2/19 2:00 AM:


Saw a similar error with Flink, when running against a standalone cluster: 
BEAM-8547. It happened on 2.16.0 even with --sdk_worker_parallelism=1.


was (Author: tvalentyn):
Saw a similar error with Flink: BEAM-8547. It happened on 2.16.0 even with 
--sdk_worker_parallelism=1.

> Portable Wordcount on Spark runner does not work in DOCKER execution mode.
> --
>
> Key: BEAM-7859
> URL: https://issues.apache.org/jira/browse/BEAM-7859
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark, sdk-py-harness
>Reporter: Valentyn Tymofieiev
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
> Fix For: 2.16.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> It seems that portable wordcount on Spark runner works in Loopback mode only. 
> If we use a container environment, the wordcount fails on Spark runner (both 
> on Python 2 and Python 3), but passes on Flink runner.
> ./gradlew :sdks:python:container:py3:docker
> ./gradlew :runners:spark:job-server:runShadow  # replace with 
> :runners:flink:1.5:job-server:runShadow for WC to pass.
> ./gradlew :sdks:python:test-suites:portable:py35:portableWordCountBatch  
> -PjobEndpoint=localhost:8099
> {noformat} 
>  Task :sdks:python:test-suites:portable:py35:portableWordCountBatch
> /usr/local/google/home/valentyn/projects/beam/cleanflink/beam/sdks/python/apache_beam/__init__.py:84:
>  UserWarning: Some syntactic constructs of Python 3 are not yet fully 
> supported by Apache Beam.
>   'Some syntactic constructs of Python 3 are not yet fully supported by '
> INFO:root:Using latest locally built Python SDK docker image: 
> valentyn-docker-apache.bintray.io/beam/python3:latest.
> INFO:root:  
> 
> INFO:root:  
> 
> WARNING:root:Discarding unparseable args: ['--parallelism=2', 
> '--shutdown_sources_on_final_watermark']
> INFO:root:Job state changed to RUNNING
> ERROR:root:java.lang.RuntimeException: Error received from SDK harness for 
> instruction 2: Traceback (most recent call last):
>   File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
>   File "apache_beam/runners/common.py", line 594, in 
> apache_beam.runners.common.PerWindowInvoker.invoke_process
>   File "apache_beam/runners/common.py", line 666, in 
> apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", 
> line 1041, in process
> self.writer = self.sink.open_writer(init_result, str(uuid.uuid4()))
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/options/value_provider.py",
>  line 137, in _f
> return fnc(self, *args, **kwargs)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 186, in open_writer
> return FileBasedSinkWriter(self, writer_path)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 390, in __init__
> self.temp_handle = self.sink.open(temp_shard_path)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/textio.py", 
> line 391, in open
> file_handle = super(_TextSink, self).open(temp_path)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/options/value_provider.py",
>  line 137, in _f
> return fnc(self, *args, **kwargs)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 129, in open
> return FileSystems.create(temp_path, self.mime_type, 
> self.compression_type)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line 
> 203, in create
> return filesystem.create(path, mime_type, compression_type)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/localfilesystem.py", 
> line 151, in create
> return self._path_open(path, 'wb', mime_type, compression_type)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/localfilesystem.py", 
> line 134, in _path_open
> raw_file = open(path, mode)
> FileNotFoundError: [Errno 2] No such file or directory: 
> '/tmp/beam-temp-py-wordcount-direct-1590df80b36011e98bf3f4939fefdbd5/ab0591ee-f8b9-471f-8edd-4b4a89481bc2.py-wordcount-direct'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 157, in _execute
> 

[jira] [Comment Edited] (BEAM-7859) Portable Wordcount on Spark runner does not work in DOCKER execution mode.

2019-11-01 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965196#comment-16965196
 ] 

Valentyn Tymofieiev edited comment on BEAM-7859 at 11/2/19 2:00 AM:


Saw a similar error with Flink, when running against a standalone Flink 
cluster: BEAM-8547. It happened on 2.16.0 even with --sdk_worker_parallelism=1.


was (Author: tvalentyn):
Saw a similar error with Flink, when running against a standalone cluster: 
BEAM-8547. It happened on 2.16.0 even with --sdk_worker_parallelism=1.

> Portable Wordcount on Spark runner does not work in DOCKER execution mode.
> --
>
> Key: BEAM-7859
> URL: https://issues.apache.org/jira/browse/BEAM-7859
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark, sdk-py-harness
>Reporter: Valentyn Tymofieiev
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
> Fix For: 2.16.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> It seems that portable wordcount on Spark runner works in Loopback mode only. 
> If we use a container environment, the wordcount fails on Spark runner (both 
> on Python 2 and Python 3), but passes on Flink runner.
> ./gradlew :sdks:python:container:py3:docker
> ./gradlew :runners:spark:job-server:runShadow  # replace with 
> :runners:flink:1.5:job-server:runShadow for WC to pass.
> ./gradlew :sdks:python:test-suites:portable:py35:portableWordCountBatch  
> -PjobEndpoint=localhost:8099
> {noformat} 
>  Task :sdks:python:test-suites:portable:py35:portableWordCountBatch
> /usr/local/google/home/valentyn/projects/beam/cleanflink/beam/sdks/python/apache_beam/__init__.py:84:
>  UserWarning: Some syntactic constructs of Python 3 are not yet fully 
> supported by Apache Beam.
>   'Some syntactic constructs of Python 3 are not yet fully supported by '
> INFO:root:Using latest locally built Python SDK docker image: 
> valentyn-docker-apache.bintray.io/beam/python3:latest.
> INFO:root:  
> 
> INFO:root:  
> 
> WARNING:root:Discarding unparseable args: ['--parallelism=2', 
> '--shutdown_sources_on_final_watermark']
> INFO:root:Job state changed to RUNNING
> ERROR:root:java.lang.RuntimeException: Error received from SDK harness for 
> instruction 2: Traceback (most recent call last):
>   File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
>   File "apache_beam/runners/common.py", line 594, in 
> apache_beam.runners.common.PerWindowInvoker.invoke_process
>   File "apache_beam/runners/common.py", line 666, in 
> apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", 
> line 1041, in process
> self.writer = self.sink.open_writer(init_result, str(uuid.uuid4()))
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/options/value_provider.py",
>  line 137, in _f
> return fnc(self, *args, **kwargs)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 186, in open_writer
> return FileBasedSinkWriter(self, writer_path)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 390, in __init__
> self.temp_handle = self.sink.open(temp_shard_path)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/textio.py", 
> line 391, in open
> file_handle = super(_TextSink, self).open(temp_path)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/options/value_provider.py",
>  line 137, in _f
> return fnc(self, *args, **kwargs)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 129, in open
> return FileSystems.create(temp_path, self.mime_type, 
> self.compression_type)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line 
> 203, in create
> return filesystem.create(path, mime_type, compression_type)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/localfilesystem.py", 
> line 151, in create
> return self._path_open(path, 'wb', mime_type, compression_type)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/localfilesystem.py", 
> line 134, in _path_open
> raw_file = open(path, mode)
> FileNotFoundError: [Errno 2] No such file or directory: 
> '/tmp/beam-temp-py-wordcount-direct-1590df80b36011e98bf3f4939fefdbd5/ab0591ee-f8b9-471f-8edd-4b4a89481bc2.py-wordcount-direct'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/runners/w

[jira] [Created] (BEAM-8547) Portable Wordcount fails on standalone Flink cluster

2019-11-01 Thread Valentyn Tymofieiev (Jira)
Valentyn Tymofieiev created BEAM-8547:
-

 Summary: Portable Wordcount fails on standalone Flink cluster 
 Key: BEAM-8547
 URL: https://issues.apache.org/jira/browse/BEAM-8547
 Project: Beam
  Issue Type: Bug
  Components: runner-flink, sdk-py-harness
Reporter: Valentyn Tymofieiev


Repro:
 # git checkout origin/release-2.16.0
 # ./flink-1.8.2/bin/start-cluster.sh
 # gradlew :runners:flink:1.8:job-server:runShadow 
-PflinkMasterUrl=localhost:8081
 # python -m apache_beam.examples.wordcount --input=/etc/profile 
--output=/tmp/py-wordcount-direct --runner=PortableRunner 
--experiments=worker_threads=100 --parallelism=1 
--shutdown_sources_on_final_watermark --sdk_worker_parallelism=1 
--environment_cache_millis=6 --job_endpoint=localhost:8099

This causes the runner to crash with:
{noformat}
Traceback (most recent call last):
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 158, in _execute
response = task()
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 191, in <lambda>
self._execute(lambda: worker.do_instruction(work), work)
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 343, in do_instruction
request.instruction_id)
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 369, in process_bundle
bundle_processor.process_bundle(instruction_id))
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
 line 663, in process_bundle
data.ptransform_id].process_encoded(data.data)
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
 line 143, in process_encoded
self.output(decoded_value)
  File "apache_beam/runners/worker/operations.py", line 255, in 
apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 256, in 
apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 143, in 
apache_beam.runners.worker.operations.SingletonConsumerSet.receive
  File "apache_beam/runners/worker/operations.py", line 593, in 
apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/worker/operations.py", line 594, in 
apache_beam.runners.worker.operations.DoOperation.process
  File "apache_beam/runners/common.py", line 776, in 
apache_beam.runners.common.DoFnRunner.receive
  File "apache_beam/runners/common.py", line 782, in 
apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 849, in 
apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "/usr/local/lib/python3.7/site-packages/future/utils/__init__.py", line 
421, in raise_with_traceback
raise exc.with_traceback(traceback)
  File "apache_beam/runners/common.py", line 780, in 
apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 587, in 
apache_beam.runners.common.PerWindowInvoker.invoke_process
  File "apache_beam/runners/common.py", line 660, in 
apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/iobase.py", line 
1042, in process
self.writer = self.sink.open_writer(init_result, str(uuid.uuid4()))
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/options/value_provider.py", 
line 137, in _f
return fnc(self, *args, **kwargs)
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", line 
186, in open_writer
return FileBasedSinkWriter(self, writer_path)
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", line 
390, in __init__
self.temp_handle = self.sink.open(temp_shard_path)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/textio.py", line 
391, in open
file_handle = super(_TextSink, self).open(temp_path)
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/options/value_provider.py", 
line 137, in _f
return fnc(self, *args, **kwargs)
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/io/filebasedsink.py", line 
129, in open
return FileSystems.create(temp_path, self.mime_type, self.compression_type)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/filesystems.py", 
line 203, in create
return filesystem.create(path, mime_type, compression_type)
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/io/localfilesystem.py", 
line 151, in create
return self._path_open(path, 'wb', mime_type, compression_type)
  File 
"/usr/local/lib/python3.7/site-packages/apache_beam/io/localfilesystem.py", 
line 134, in _path_open
raw_file = open(path, mode)
RuntimeError: FileNotFoundError: [Errno 2] No such file or directory: 
'/t

[jira] [Commented] (BEAM-7859) Portable Wordcount on Spark runner does not work in DOCKER execution mode.

2019-11-01 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965196#comment-16965196
 ] 

Valentyn Tymofieiev commented on BEAM-7859:
---

Saw a similar error with Flink: BEAM-8547. It happened on 2.16.0 even with 
--sdk_worker_parallelism=1.

> Portable Wordcount on Spark runner does not work in DOCKER execution mode.
> --
>
> Key: BEAM-7859
> URL: https://issues.apache.org/jira/browse/BEAM-7859
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark, sdk-py-harness
>Reporter: Valentyn Tymofieiev
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
> Fix For: 2.16.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> It seems that portable wordcount on Spark runner works in Loopback mode only. 
> If we use a container environment, the wordcount fails on Spark runner (both 
> on Python 2 and Python 3), but passes on Flink runner.
> ./gradlew :sdks:python:container:py3:docker
> ./gradlew :runners:spark:job-server:runShadow  # replace with 
> :runners:flink:1.5:job-server:runShadow for WC to pass.
> ./gradlew :sdks:python:test-suites:portable:py35:portableWordCountBatch  
> -PjobEndpoint=localhost:8099
> {noformat} 
>  Task :sdks:python:test-suites:portable:py35:portableWordCountBatch
> /usr/local/google/home/valentyn/projects/beam/cleanflink/beam/sdks/python/apache_beam/__init__.py:84:
>  UserWarning: Some syntactic constructs of Python 3 are not yet fully 
> supported by Apache Beam.
>   'Some syntactic constructs of Python 3 are not yet fully supported by '
> INFO:root:Using latest locally built Python SDK docker image: 
> valentyn-docker-apache.bintray.io/beam/python3:latest.
> INFO:root:  
> 
> INFO:root:  
> 
> WARNING:root:Discarding unparseable args: ['--parallelism=2', 
> '--shutdown_sources_on_final_watermark']
> INFO:root:Job state changed to RUNNING
> ERROR:root:java.lang.RuntimeException: Error received from SDK harness for 
> instruction 2: Traceback (most recent call last):
>   File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
>   File "apache_beam/runners/common.py", line 594, in 
> apache_beam.runners.common.PerWindowInvoker.invoke_process
>   File "apache_beam/runners/common.py", line 666, in 
> apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", 
> line 1041, in process
> self.writer = self.sink.open_writer(init_result, str(uuid.uuid4()))
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/options/value_provider.py",
>  line 137, in _f
> return fnc(self, *args, **kwargs)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 186, in open_writer
> return FileBasedSinkWriter(self, writer_path)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 390, in __init__
> self.temp_handle = self.sink.open(temp_shard_path)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/io/textio.py", 
> line 391, in open
> file_handle = super(_TextSink, self).open(temp_path)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/options/value_provider.py",
>  line 137, in _f
> return fnc(self, *args, **kwargs)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", 
> line 129, in open
> return FileSystems.create(temp_path, self.mime_type, 
> self.compression_type)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", line 
> 203, in create
> return filesystem.create(path, mime_type, compression_type)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/localfilesystem.py", 
> line 151, in create
> return self._path_open(path, 'wb', mime_type, compression_type)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/io/localfilesystem.py", 
> line 134, in _path_open
> raw_file = open(path, mode)
> FileNotFoundError: [Errno 2] No such file or directory: 
> '/tmp/beam-temp-py-wordcount-direct-1590df80b36011e98bf3f4939fefdbd5/ab0591ee-f8b9-471f-8edd-4b4a89481bc2.py-wordcount-direct'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 157, in _execute
> response = task()
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 190, in <lambda>
> self._execute(lambda: worker.do_instruction(work), work)
>   File 
> "/usr/local/li

[jira] [Updated] (BEAM-7859) Portable Wordcount on Spark runner does not work in DOCKER execution mode.

2019-11-01 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-7859:
--
Description: 
It seems that portable wordcount on Spark runner works in Loopback mode only. 
If we use a container environment, the wordcount fails on Spark runner (both on 
Python 2 and Python 3), but passes on Flink runner.

./gradlew :sdks:python:container:py3:docker
./gradlew :runners:spark:job-server:runShadow  # replace with 
:runners:flink:1.5:job-server:runShadow for WC to pass.
./gradlew :sdks:python:test-suites:portable:py35:portableWordCountBatch  
-PjobEndpoint=localhost:8099
{noformat} 
 Task :sdks:python:test-suites:portable:py35:portableWordCountBatch
/usr/local/google/home/valentyn/projects/beam/cleanflink/beam/sdks/python/apache_beam/__init__.py:84:
 UserWarning: Some syntactic constructs of Python 3 are not yet fully supported 
by Apache Beam.
  'Some syntactic constructs of Python 3 are not yet fully supported by '
INFO:root:Using latest locally built Python SDK docker image: 
valentyn-docker-apache.bintray.io/beam/python3:latest.
INFO:root:  

INFO:root:  

WARNING:root:Discarding unparseable args: ['--parallelism=2', 
'--shutdown_sources_on_final_watermark']
INFO:root:Job state changed to RUNNING
ERROR:root:java.lang.RuntimeException: Error received from SDK harness for 
instruction 2: Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 782, in 
apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 594, in 
apache_beam.runners.common.PerWindowInvoker.invoke_process
  File "apache_beam/runners/common.py", line 666, in 
apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
  File "/usr/local/lib/python3.5/site-packages/apache_beam/io/iobase.py", line 
1041, in process
self.writer = self.sink.open_writer(init_result, str(uuid.uuid4()))
  File 
"/usr/local/lib/python3.5/site-packages/apache_beam/options/value_provider.py", 
line 137, in _f
return fnc(self, *args, **kwargs)
  File 
"/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", line 
186, in open_writer
return FileBasedSinkWriter(self, writer_path)
  File 
"/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", line 
390, in __init__
self.temp_handle = self.sink.open(temp_shard_path)
  File "/usr/local/lib/python3.5/site-packages/apache_beam/io/textio.py", line 
391, in open
file_handle = super(_TextSink, self).open(temp_path)
  File 
"/usr/local/lib/python3.5/site-packages/apache_beam/options/value_provider.py", 
line 137, in _f
return fnc(self, *args, **kwargs)
  File 
"/usr/local/lib/python3.5/site-packages/apache_beam/io/filebasedsink.py", line 
129, in open
return FileSystems.create(temp_path, self.mime_type, self.compression_type)
  File "/usr/local/lib/python3.5/site-packages/apache_beam/io/filesystems.py", 
line 203, in create
return filesystem.create(path, mime_type, compression_type)
  File 
"/usr/local/lib/python3.5/site-packages/apache_beam/io/localfilesystem.py", 
line 151, in create
return self._path_open(path, 'wb', mime_type, compression_type)
  File 
"/usr/local/lib/python3.5/site-packages/apache_beam/io/localfilesystem.py", 
line 134, in _path_open
raw_file = open(path, mode)
FileNotFoundError: [Errno 2] No such file or directory: 
'/tmp/beam-temp-py-wordcount-direct-1590df80b36011e98bf3f4939fefdbd5/ab0591ee-f8b9-471f-8edd-4b4a89481bc2.py-wordcount-direct'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File 
"/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 157, in _execute
response = task()
  File 
"/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 190, in <lambda>
self._execute(lambda: worker.do_instruction(work), work)
  File 
"/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 342, in do_instruction
request.instruction_id)
  File 
"/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker.py",
 line 368, in process_bundle
bundle_processor.process_bundle(instruction_id))
  File 
"/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py",
 line 593, in process_bundle
data.ptransform_id].process_encoded(data.data)
  File 
"/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py",
 line 143, in process_encoded
self.output(decoded_value)
  File "apache_beam/runners/worker/operations.py", line 255, in 
apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/operations.py", line 256, in 
apache_beam.runners.worker.operations.Operation.output
  File "apache_beam/runners/worker/oper

[jira] [Work logged] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8368?focusedWorklogId=337645&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337645
 ]

ASF GitHub Bot logged work on BEAM-8368:


Author: ASF GitHub Bot
Created on: 02/Nov/19 01:19
Start Date: 02/Nov/19 01:19
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9970: [BEAM-8368] 
[BEAM-8392] Update pyarrow to the latest version
URL: https://github.com/apache/beam/pull/9970#issuecomment-548996697
 
 
   > Well it's a pretty simple fix, just need to change [this 
line](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/parquetio_test.py#L451)
 to `pa.Table.from_arrays([orig.column('name')], names=['name'])`
   
   A bit of a side question. It seems like minor versions could be breaking 
with pyarrow. Should we pin this library, instead of using an upper bound?
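
For context, here is the suggested call in a runnable form. This is an editor's 
illustration only: `orig` below is hypothetical toy data, not the table used in 
`parquetio_test.py`, and it assumes a pyarrow version where `Table.column()` can 
be looked up by name.

{code:python}
import pyarrow as pa

# Hypothetical stand-in for the test's original table (illustration only).
orig = pa.Table.from_pydict({'name': ['a', 'b', 'c'], 'number': [1, 2, 3]})

# The change suggested above: rebuild a one-column table and pass an explicit
# names= list, so the schema is stated rather than inferred from the column.
subset = pa.Table.from_arrays([orig.column('name')], names=['name'])
print(subset.schema)
{code}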
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337645)
Time Spent: 3h  (was: 2h 50m)

> [Python] libprotobuf-generated exception when importing apache_beam
> ---
>
> Key: BEAM-8368
> URL: https://issues.apache.org/jira/browse/BEAM-8368
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.15.0, 2.17.0
>Reporter: Ubaier Bhat
>Assignee: Brian Hulette
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: error_log.txt
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Unable to import apache_beam after upgrading to macOS 10.15 (Catalina). 
> Cleared all the pipenvs but can't get it working again.
> {code}
> import apache_beam as beam
> /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84:
>  UserWarning: Some syntactic constructs of Python 3 are not yet fully 
> supported by Apache Beam.
>   'Some syntactic constructs of Python 3 are not yet fully supported by '
> [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already 
> exists in database: 
> [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> libc++abi.dylib: terminating with uncaught exception of type 
> google::protobuf::FatalException: CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8254) (Java SDK) Add workerRegion and workerZone options

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8254?focusedWorklogId=337643&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337643
 ]

ASF GitHub Bot logged work on BEAM-8254:


Author: ASF GitHub Bot
Created on: 02/Nov/19 01:05
Start Date: 02/Nov/19 01:05
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9961: [BEAM-8254] add 
workerRegion and workerZone options to Java SDK
URL: https://github.com/apache/beam/pull/9961#discussion_r341792444
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
 ##
 @@ -357,6 +361,36 @@ public static DataflowRunner fromOptions(PipelineOptions 
options) {
 return new DataflowRunner(dataflowOptions);
   }
 
+  @VisibleForTesting
+  static void validateWorkerSettings(GcpOptions gcpOptions) {
+Preconditions.checkArgument(
+gcpOptions.getZone() == null || gcpOptions.getWorkerRegion() == null,
+"Cannot use option zone with workerRegion.");
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337643)
Time Spent: 0.5h  (was: 20m)

> (Java SDK) Add workerRegion and workerZone options
> --
>
> Key: BEAM-8254
> URL: https://issues.apache.org/jira/browse/BEAM-8254
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow, sdk-java-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=337642&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337642
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 02/Nov/19 01:04
Start Date: 02/Nov/19 01:04
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9885: [BEAM-8457] Label 
Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#issuecomment-548995354
 
 
   R: @pabloem could you make a first review pass?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337642)
Time Spent: 5h 20m  (was: 5h 10m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
> # Change the pipeline.run() API to allow supplying a runner and an options 
> parameter so that a pipeline initially bundled with an interactive runner can 
> be directly run by other runners from a notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).
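
A rough sketch of what the proposed notebook flow described above could look 
like. Editor's illustration only: the `runner=` argument to `run()` is the 
proposal being described, not an existing API, and the label name in the 
comment is hypothetical.

{code:python}
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.runners.dataflow.dataflow_runner import DataflowRunner

# A pipeline assembled in a notebook session.
p = beam.Pipeline()
_ = p | beam.Create(['a', 'b', 'c']) | beam.Map(print)

# Proposed usage: hand the same pipeline to Dataflow directly from the notebook.
# The interactive layer would then implicitly attach source information as user
# labels (e.g. a hypothetical 'launched-from-notebook' label) before submission.
# result = p.run(runner=DataflowRunner(), options=PipelineOptions())
{code}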



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8442) Unify bundle register in Python SDK harness

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8442?focusedWorklogId=337641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337641
 ]

ASF GitHub Bot logged work on BEAM-8442:


Author: ASF GitHub Bot
Created on: 02/Nov/19 01:02
Start Date: 02/Nov/19 01:02
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9842: [BEAM-8442] Remove 
duplicate code for bundle register in Python SDK harness
URL: https://github.com/apache/beam/pull/9842#issuecomment-548995176
 
 
   For reference, this was reverted in https://github.com/apache/beam/pull/9956
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337641)
Time Spent: 1h 50m  (was: 1h 40m)

> Unify bundle register in Python SDK harness
> ---
>
> Key: BEAM-8442
> URL: https://issues.apache.org/jira/browse/BEAM-8442
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> There are two methods for bundle register in Python SDK harness:
> `SdkHarness._request_register` and `SdkWorker.register`. They should be unified.
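
Editor's toy sketch of the unification being asked for. The class and method 
names echo the issue text, but the bodies and signatures are made up for 
illustration and are not Beam's actual harness code.

{code:python}
class SdkWorker(object):
  def __init__(self):
    self.bundle_descriptors = {}

  def register(self, descriptors):
    # The single registration path: remember each descriptor by id.
    for descriptor_id, descriptor in descriptors.items():
      self.bundle_descriptors[descriptor_id] = descriptor

class SdkHarness(object):
  def __init__(self, worker):
    self.worker = worker

  def _request_register(self, descriptors):
    # After unification, delegate to SdkWorker.register instead of keeping a
    # second copy of the registration logic here.
    self.worker.register(descriptors)

harness = SdkHarness(SdkWorker())
harness._request_register({'bundle-1': {'transforms': []}})
{code}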



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8521) beam_PostCommit_XVR_Flink failing

2019-11-01 Thread Ahmet Altay (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965182#comment-16965182
 ] 

Ahmet Altay commented on BEAM-8521:
---

Great, thank you!

> beam_PostCommit_XVR_Flink failing
> -
>
> Key: BEAM-8521
> URL: https://issues.apache.org/jira/browse/BEAM-8521
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> https://builds.apache.org/job/beam_PostCommit_XVR_Flink/
> Edit: Made subtasks for what appear to be two separate issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=337640&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337640
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 02/Nov/19 01:00
Start Date: 02/Nov/19 01:00
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9920: [BEAM-7389] Add 
code snippets for CombineGlobally
URL: https://github.com/apache/beam/pull/9920#discussion_r341792002
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/snippets/transforms/aggregation/combineglobally.py
 ##
 @@ -0,0 +1,214 @@
+# coding=utf-8
 
 Review comment:
   Consider sharing more code among the other combine PR and GBK PR.
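
For readers skimming the thread, a minimal CombineGlobally example of the kind 
of snippet under review (editor's illustration, not the PR's actual code):

{code:python}
import apache_beam as beam

# Combine every element of the PCollection into one global result (here, a sum).
with beam.Pipeline() as pipeline:
  _ = (
      pipeline
      | 'Create' >> beam.Create([1, 2, 3, 4])
      | 'SumGlobally' >> beam.CombineGlobally(sum)
      | 'Print' >> beam.Map(print))
{code}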
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337640)
Time Spent: 75h 40m  (was: 75.5h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 75h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=337639&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337639
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 02/Nov/19 00:59
Start Date: 02/Nov/19 00:59
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9924: [BEAM-7389] Add 
code snippet for Distinct
URL: https://github.com/apache/beam/pull/9924
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337639)
Time Spent: 75.5h  (was: 75h 20m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 75.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=337638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337638
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 02/Nov/19 00:59
Start Date: 02/Nov/19 00:59
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9922: [BEAM-7389] Add 
code snippets for CombineValues
URL: https://github.com/apache/beam/pull/9922#discussion_r341791911
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/snippets/transforms/aggregation/combinevalues.py
 ##
 @@ -0,0 +1,246 @@
+# coding=utf-8
 
 Review comment:
   There is a lot of repeated code between this PR and GBK snippets PR. 
Consider merging them maybe?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337638)
Time Spent: 75h 20m  (was: 75h 10m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 75h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8521) beam_PostCommit_XVR_Flink failing

2019-11-01 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965180#comment-16965180
 ] 

Kyle Weaver commented on BEAM-8521:
---

We reverted the change to the registration code: 
https://github.com/apache/beam/pull/9956

> beam_PostCommit_XVR_Flink failing
> -
>
> Key: BEAM-8521
> URL: https://issues.apache.org/jira/browse/BEAM-8521
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> https://builds.apache.org/job/beam_PostCommit_XVR_Flink/
> Edit: Made subtasks for what appear to be two separate issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=337637&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337637
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 02/Nov/19 00:57
Start Date: 02/Nov/19 00:57
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9720: [BEAM-8335] Add 
initial modules for interactive streaming support
URL: https://github.com/apache/beam/pull/9720#discussion_r341791789
 
 

 ##
 File path: model/interactive/OWNERS
 ##
 @@ -0,0 +1,7 @@
+# See the OWNERS docs at https://s.apache.org/beam-owners
+
+reviewers:
 
 Review comment:
   You might want to limit this list to people who are likely to maintain this 
path.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337637)
Time Spent: 14h 50m  (was: 14h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 14h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=337636&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337636
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 02/Nov/19 00:55
Start Date: 02/Nov/19 00:55
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9921: [BEAM-7389] Add 
code snippets for CombinePerKey
URL: https://github.com/apache/beam/pull/9921
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337636)
Time Spent: 75h 10m  (was: 75h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 75h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8521) beam_PostCommit_XVR_Flink failing

2019-11-01 Thread Ahmet Altay (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965175#comment-16965175
 ] 

Ahmet Altay edited comment on BEAM-8521 at 11/2/19 12:52 AM:
-

Does this need to be fixed before 2.18?

 

Should we rollback the related PR or make some change in the registration code?

 

/cc [~angoenka] [~mxm]


was (Author: altay):
Does this need to be fixed before 2.18?

> beam_PostCommit_XVR_Flink failing
> -
>
> Key: BEAM-8521
> URL: https://issues.apache.org/jira/browse/BEAM-8521
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> https://builds.apache.org/job/beam_PostCommit_XVR_Flink/
> Edit: Made subtasks for what appear to be two separate issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8521) beam_PostCommit_XVR_Flink failing

2019-11-01 Thread Ahmet Altay (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965175#comment-16965175
 ] 

Ahmet Altay commented on BEAM-8521:
---

Does this need to be fixed before 2.18?

> beam_PostCommit_XVR_Flink failing
> -
>
> Key: BEAM-8521
> URL: https://issues.apache.org/jira/browse/BEAM-8521
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> https://builds.apache.org/job/beam_PostCommit_XVR_Flink/
> Edit: Made subtasks for what appear to be two separate issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8521) beam_PostCommit_XVR_Flink failing

2019-11-01 Thread Ahmet Altay (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-8521:
--
Fix Version/s: 2.18.0

> beam_PostCommit_XVR_Flink failing
> -
>
> Key: BEAM-8521
> URL: https://issues.apache.org/jira/browse/BEAM-8521
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> https://builds.apache.org/job/beam_PostCommit_XVR_Flink/
> Edit: Made subtasks for what appear to be two separate issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-6860) WriteToText crash with "GlobalWindow -> ._IntervalWindowBase"

2019-11-01 Thread Ahmet Altay (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965173#comment-16965173
 ] 

Ahmet Altay commented on BEAM-6860:
---

[~chamikara] any idea?

> WriteToText crash with "GlobalWindow -> ._IntervalWindowBase"
> -
>
> Key: BEAM-6860
> URL: https://issues.apache.org/jira/browse/BEAM-6860
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.11.0
> Environment: macOS, DirectRunner, python 2.7.15 via 
> pyenv/pyenv-virtualenv
>Reporter: Henrik
>Assignee: Pawel Kordek
>Priority: Critical
>  Labels: newbie
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Main error:
> > Cannot convert GlobalWindow to 
> > apache_beam.utils.windowed_value._IntervalWindowBase
> This is very hard for me to debug. Doing a ParDo call before, printing the 
> input, gives me just what I want; so the lines of data to serialise are 
> "alright"; just JSON strings, in fact.
> Stacktrace:
> {code:java}
> Traceback (most recent call last):
>   File "./okr_end_ride.py", line 254, in 
>     run()
>   File "./okr_end_ride.py", line 250, in run
>     run_pipeline(pipeline_options, known_args)
>   File "./okr_end_ride.py", line 198, in run_pipeline
>     | 'write_all' >> WriteToText(known_args.output, 
> file_name_suffix=".txt")
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py",
>  line 426, in __exit__
>     self.run().wait_until_finish()
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py",
>  line 406, in run
>     self._options).run(False)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py",
>  line 419, in run
>     return self.runner.run_pipeline(self, self._options)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/direct/direct_runner.py",
>  line 132, in run_pipeline
>     return runner.run_pipeline(pipeline, options)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 275, in run_pipeline
>     default_environment=self._default_environment))
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 278, in run_via_runner_api
>     return self.run_stages(*self.create_stages(pipeline_proto))
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 354, in run_stages
>     stage_context.safe_coders)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 509, in run_stage
>     data_input, data_output)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 1206, in process_bundle
>     result_future = self._controller.control_handler.push(process_bundle)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 821, in push
>     response = self.worker.do_instruction(request)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 265, in do_instruction
>     request.instruction_id)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 281, in process_bundle
>     delayed_applications = bundle_processor.process_bundle(instruction_id)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 552, in process_bundle
>     op.finish()
>   File "apache_beam/runners/worker/operations.py", line 549, in 
> apache_beam.runners.worker.operations.DoOperation.finish
>   File "apache_beam/runners/worker/operations.py", line 550, in 
> apache_beam.runners.worker.operations.DoOperation.finish
>   File "apache_beam/runners/worker/operations.py", line 551, in 
> apache_beam.runners.worker.operations.DoOperation.finish
>   File "apache_beam/runners/common.py", line 758, in 
> apache_beam.runners.common.DoFnRunner.finish
>   File "apache_beam/runners/common.py", line 752, in 
> apache_beam.runners.common.DoFnRunner._invoke_bundle_method
>   File "apache_beam/runners/common.py", line 777, in 
> apache_beam.runners.common.DoFnRunner._re

[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=337633&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337633
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 02/Nov/19 00:44
Start Date: 02/Nov/19 00:44
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9056: [BEAM-7746] Add 
python type hints
URL: https://github.com/apache/beam/pull/9056#discussion_r341790865
 
 

 ##
 File path: sdks/python/tox.ini
 ##
 @@ -160,16 +160,17 @@ commands =
 [testenv:py27-lint]
 # Checks for py2 syntax errors
 deps =
+  -r build-requirements.txt
 
 Review comment:
   `tox` runs outside of gradle. (At least for me). What is the issue there?
   
   > I assume the reason for the two step test process (sdist + tox, each with 
its own virtualenv) is to ensure that tox is using an sdist built in the same 
way every time, using python2, just as it would be for distribution?
   
   The sdist step is used for other things (building artifacts for release). 
That is the reason for it to exist on its own.
   
   As for tox, I do not believe we have spent time on configuring it to reuse an 
existing virtualenv. If that is an option, we can try it.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337633)
Time Spent: 17h  (was: 16h 50m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 17h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  
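To make the benefit concrete, a small hedged illustration (not Beam source code; `split_words` is a made-up function):

{code:python}
from typing import Iterable

def split_words(line: str) -> Iterable[str]:
  # The annotated signature documents itself and enables IDE completion.
  return line.split()

split_words('to be or not to be')  # accepted by mypy
# split_words(42)  # mypy: Argument 1 has incompatible type "int"; expected "str"
{code}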



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8442) Unify bundle register in Python SDK harness

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8442?focusedWorklogId=337632&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337632
 ]

ASF GitHub Bot logged work on BEAM-8442:


Author: ASF GitHub Bot
Created on: 02/Nov/19 00:40
Start Date: 02/Nov/19 00:40
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #9842: [BEAM-8442] Remove 
duplicate code for bundle register in Python SDK harness
URL: https://github.com/apache/beam/pull/9842#issuecomment-548992947
 
 
   Yes, there was a similar issue with Java in the past. It is best if 
registration is synchronous since then any ProcessBundleRequest can error out 
if the descriptor isn't available.
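For context, a hedged sketch (not the actual sdk_worker.py code; the method and proto field names here are assumptions loosely based on the ones mentioned in this issue) of why synchronous registration gives a clean failure mode:

{code:python}
class SdkWorker(object):
  """Illustrative only: descriptors are stored before register() returns."""

  def __init__(self):
    self.bundle_descriptors = {}  # descriptor id -> ProcessBundleDescriptor

  def register(self, register_request):
    # Synchronous: by the time the response is sent, every descriptor in the
    # request is visible to later ProcessBundleRequests.
    for descriptor in register_request.process_bundle_descriptor:
      self.bundle_descriptors[descriptor.id] = descriptor

  def process_bundle(self, request):
    descriptor = self.bundle_descriptors.get(request.process_bundle_descriptor_id)
    if descriptor is None:
      # No race with an in-flight registration is possible, so a missing
      # descriptor is a genuine error and can fail the request immediately.
      raise RuntimeError(
          'Unknown process bundle descriptor: %s'
          % request.process_bundle_descriptor_id)
    return self._execute(descriptor)

  def _execute(self, descriptor):
    raise NotImplementedError  # bundle execution omitted from this sketch
{code}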
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337632)
Time Spent: 1h 40m  (was: 1.5h)

> Unify bundle register in Python SDK harness
> ---
>
> Key: BEAM-8442
> URL: https://issues.apache.org/jira/browse/BEAM-8442
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> There are two methods for bundle registration in the Python SDK harness: 
> `SdkHarness._request_register` and `SdkWorker.register`. They should be unified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8254) (Java SDK) Add workerRegion and workerZone options

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8254?focusedWorklogId=337630&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337630
 ]

ASF GitHub Bot logged work on BEAM-8254:


Author: ASF GitHub Bot
Created on: 02/Nov/19 00:39
Start Date: 02/Nov/19 00:39
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9961: [BEAM-8254] add 
workerRegion and workerZone options to Java SDK
URL: https://github.com/apache/beam/pull/9961#discussion_r341790434
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
 ##
 @@ -357,6 +361,36 @@ public static DataflowRunner fromOptions(PipelineOptions 
options) {
 return new DataflowRunner(dataflowOptions);
   }
 
+  @VisibleForTesting
+  static void validateWorkerSettings(GcpOptions gcpOptions) {
+Preconditions.checkArgument(
+gcpOptions.getZone() == null || gcpOptions.getWorkerRegion() == null,
+"Cannot use option zone with workerRegion.");
 
 Review comment:
   "Cannot use option zone with workerRegion." -> "Cannot use option zone with 
workerRegion. Prefer workerRegion and workerZone." ?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337630)
Time Spent: 20m  (was: 10m)

> (Java SDK) Add workerRegion and workerZone options
> --
>
> Key: BEAM-8254
> URL: https://issues.apache.org/jira/browse/BEAM-8254
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow, sdk-java-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8252) (Python SDK) Add worker_region and worker_zone options

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8252?focusedWorklogId=337631&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337631
 ]

ASF GitHub Bot logged work on BEAM-8252:


Author: ASF GitHub Bot
Created on: 02/Nov/19 00:39
Start Date: 02/Nov/19 00:39
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9594: [BEAM-8252] Python: 
add worker_region and worker_zone options
URL: https://github.com/apache/beam/pull/9594#issuecomment-548992867
 
 
   Ack. Reviewed #9961 
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337631)
Time Spent: 1h 10m  (was: 1h)

> (Python SDK) Add worker_region and worker_zone options
> --
>
> Key: BEAM-8252
> URL: https://issues.apache.org/jira/browse/BEAM-8252
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow, sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=337629&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337629
 ]

ASF GitHub Bot logged work on BEAM-8539:


Author: ASF GitHub Bot
Created on: 02/Nov/19 00:37
Start Date: 02/Nov/19 00:37
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9969: [BEAM-8539] 
Provide an initial definition of all job states and the state transition diagram
URL: https://github.com/apache/beam/pull/9969#discussion_r341790390
 
 

 ##
 File path: model/job-management/src/main/proto/beam_job_api.proto
 ##
 @@ -201,6 +201,16 @@ message JobMessagesResponse {
 }
 
 // Enumeration of all JobStates
+//
+// The state transition diagram is:
+//   STOPPED -> STARTING -> RUNNING -> DONE
+//                                  \> FAILED
+//                                  \> CANCELLING -> CANCELLED
+//                                  \> UPDATING -> UPDATED
+//                                  \> DRAINING -> DRAINED
 
 Review comment:
   Is there someone we can loop into this conversation who would know?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337629)
Time Spent: 1h 50m  (was: 1h 40m)

> Clearly define the valid job state transitions
> --
>
> Key: BEAM-8539
> URL: https://issues.apache.org/jira/browse/BEAM-8539
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, runner-core, sdk-java-core, sdk-py-core
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The Beam job state transitions are ill-defined, which is a big problem for 
> anything that relies on the values coming from JobAPI.GetStateStream.
> I was hoping to find something like a state transition diagram in the docs so 
> that I could determine the start state, the terminal states, and the valid 
> transitions, but I could not find this. The code reveals that the SDKs differ 
> on the fundamentals:
> Java InMemoryJobService:
>  * start state: *STOPPED*
>  * run - about to submit to executor:  STARTING
>  * run - actually running on executor:  RUNNING
>  * terminal states: DONE, FAILED, CANCELLED, DRAINED
> Python AbstractJobServiceServicer / LocalJobServicer:
>  * start state: STARTING
>  * terminal states: DONE, FAILED, CANCELLED, *STOPPED*
> I think it would be good to make python work like Java, so that there is a 
> difference in state between a job that has been prepared and one that has 
> additionally been run.
> It's hard to tell how far this problem has spread within the various runners. 
>  I think a simple thing that can be done to help standardize behavior is to 
> implement the terminal states as an enum in the beam_job_api.proto, or create 
> a utility function in each language for checking if a state is terminal, so 
> that it's not left up to each runner to reimplement this logic.
>  
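A hedged sketch of the per-language helper suggested above, using the generated beam_job_api_pb2 bindings; the exact terminal set below is an assumption drawn from the states listed in this description, not an agreed definition:

{code:python}
from apache_beam.portability.api import beam_job_api_pb2

JobState = beam_job_api_pb2.JobState

# Assumed terminal states; pinning these down across SDKs is what this issue asks for.
_TERMINAL_STATES = frozenset([
    JobState.DONE,
    JobState.FAILED,
    JobState.CANCELLED,
    JobState.UPDATED,
    JobState.DRAINED,
])


def is_terminal(state):
  """Returns True if a job in this state can never transition again."""
  return state in _TERMINAL_STATES
{code}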



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8472) Get default GCP region from gcloud

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8472?focusedWorklogId=337628&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337628
 ]

ASF GitHub Bot logged work on BEAM-8472:


Author: ASF GitHub Bot
Created on: 02/Nov/19 00:34
Start Date: 02/Nov/19 00:34
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9974: [BEAM-8472] Get 
default GCP region from gcloud (Java)
URL: https://github.com/apache/beam/pull/9974#discussion_r341790105
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowRunner.java
 ##
 @@ -357,6 +357,50 @@ public static DataflowRunner fromOptions(PipelineOptions 
options) {
 return new DataflowRunner(dataflowOptions);
   }
 
+  /**
+   * Get a default value for Google Cloud region according to
+   * https://cloud.google.com/compute/docs/gcloud-compute/#default-properties. 
If no other default
+   * can be found, returns "us-central1".
+   */
+  static String getDefaultGcpRegion() {
+String environmentRegion = System.getenv("CLOUDSDK_COMPUTE_REGION");
+if (environmentRegion != null && !environmentRegion.isEmpty()) {
+  LOG.info("Using default GCP region {} from $CLOUDSDK_COMPUTE_REGION", 
environmentRegion);
+  return environmentRegion;
+}
+try {
+  ProcessBuilder pb =
+  new ProcessBuilder(Arrays.asList("gcloud", "config", "get-value", 
"compute/region"));
+  Process process = pb.start();
+  BufferedReader reader =
+  new BufferedReader(
+  new InputStreamReader(process.getInputStream(), 
StandardCharsets.UTF_8));
+  BufferedReader errorReader =
+  new BufferedReader(
+  new InputStreamReader(process.getErrorStream(), 
StandardCharsets.UTF_8));
+  process.waitFor(1, TimeUnit.SECONDS);
 
 Review comment:
   - Is 1 second enough?
   - Should we check the return value of this call? (IIUC it returns false for 
timeout.)
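For comparison, a hedged Python 3 analogue of the same lookup (not the code under review); it handles a missing or timed-out gcloud explicitly, which is the concern raised above:

{code:python}
import os
import subprocess


def get_default_gcp_region():
  """Env var first, then gcloud, then a hard-coded fallback (assumed precedence)."""
  region = os.environ.get('CLOUDSDK_COMPUTE_REGION')
  if region:
    return region
  try:
    region = subprocess.check_output(
        ['gcloud', 'config', 'get-value', 'compute/region'],
        stderr=subprocess.DEVNULL, timeout=2).decode('utf-8').strip()
    if region:
      return region
  except (OSError, subprocess.SubprocessError):
    # gcloud not installed, property unset (non-zero exit), or the call timed out.
    pass
  return 'us-central1'
{code}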
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337628)
Time Spent: 1h 50m  (was: 1h 40m)

> Get default GCP region from gcloud
> --
>
> Key: BEAM-8472
> URL: https://issues.apache.org/jira/browse/BEAM-8472
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently, we default to us-central1 if --region flag is not set. The Google 
> Cloud SDK generally tries to get a default value in this case for 
> convenience, which we should follow. 
> [https://cloud.google.com/compute/docs/gcloud-compute/#order_of_precedence_for_default_properties]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8545) don't docker pull before docker run

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8545?focusedWorklogId=337627&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337627
 ]

ASF GitHub Bot logged work on BEAM-8545:


Author: ASF GitHub Bot
Created on: 02/Nov/19 00:29
Start Date: 02/Nov/19 00:29
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9972: [BEAM-8545] don't docker 
pull before docker run
URL: https://github.com/apache/beam/pull/9972#issuecomment-548991708
 
 
   Run XVR_Flink PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337627)
Time Spent: 50m  (was: 40m)

> don't docker pull before docker run
> 
>
> Key: BEAM-8545
> URL: https://issues.apache.org/jira/browse/BEAM-8545
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Since 'docker run' automatically pulls when the image doesn't exist locally, 
> I think it's safe to remove the explicit 'docker pull' before 'docker run'. 
> Without 'docker pull', we won't update the local image with the remote image 
> (for the same tag), but this shouldn't be a problem in prod, since a unique tag 
> is assumed for each released version.
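A hedged Python illustration of the behavior being relied on (the real change is in the Java fn-execution Docker wrapper; this is not that code):

{code:python}
import subprocess


def start_container(image, command_args):
  # No separate `docker pull image` step: `docker run` pulls the image itself
  # when it is not already available locally.
  container_id = subprocess.check_output(
      ['docker', 'run', '--detach', image] + list(command_args))
  return container_id.decode('utf-8').strip()
{code}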



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8243) Document behavior of FlinkRunner

2019-11-01 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965162#comment-16965162
 ] 

Kyle Weaver edited comment on BEAM-8243 at 11/2/19 12:12 AM:
-

Adding BEAM-8512 as a blocker as a way of saying I don't want to advertise 
features we haven't fully tested yet :)

[~thw] Do you mean older versions of FlinkRunner, or the "traditional" process 
of job submission?


was (Author: ibzib):
Adding BEAM-8512 as a way of saying I don't want to advertise features we 
haven't fully tested yet :)

[~thw] Do you mean older versions of FlinkRunner, or the "traditional" process 
of job submission?

> Document behavior of FlinkRunner
> 
>
> Key: BEAM-8243
> URL: https://issues.apache.org/jira/browse/BEAM-8243
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink, website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>
> The Flink runner guide should include a couple details
> 1) FlinkRunner pulls the job server jar from Maven by default (need to make 
> this explicit in case of firewall concerns)
> 2) how to override in case the above is a problem
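As a hedged illustration of item 2), the guide could show something like the following; the option names are assumptions and may differ between Beam releases:

{code:python}
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Point the runner at a locally available job-server jar instead of letting it
# fetch one from Maven (useful behind a firewall). Paths and option names are
# illustrative only.
options = PipelineOptions([
    '--runner=FlinkRunner',
    '--flink_master=localhost:8081',
    '--flink_job_server_jar=/path/to/beam-runners-flink-job-server.jar',
])

with beam.Pipeline(options=options) as p:
  _ = p | beam.Create(['a', 'b', 'c']) | beam.Map(lambda x: x.upper())
{code}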



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8243) Document behavior of FlinkRunner

2019-11-01 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965164#comment-16965164
 ] 

Kyle Weaver commented on BEAM-8243:
---

Fair enough.

> Document behavior of FlinkRunner
> 
>
> Key: BEAM-8243
> URL: https://issues.apache.org/jira/browse/BEAM-8243
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink, website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>
> The Flink runner guide should include a couple details
> 1) FlinkRunner pulls the job server jar from Maven by default (need to make 
> this explicit in case of firewall concerns)
> 2) how to override in case the above is a problem



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8243) Document behavior of FlinkRunner

2019-11-01 Thread Thomas Weise (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965163#comment-16965163
 ] 

Thomas Weise commented on BEAM-8243:


I meant older Beam releases; we should probably only advertise the latest for 
portability. But the "traditional" process of job submission is also too scary 
for a first-time user :)

> Document behavior of FlinkRunner
> 
>
> Key: BEAM-8243
> URL: https://issues.apache.org/jira/browse/BEAM-8243
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink, website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>
> The Flink runner guide should include a couple details
> 1) FlinkRunner pulls the job server jar from Maven by default (need to make 
> this explicit in case of firewall concerns)
> 2) how to override in case the above is a problem



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=337624&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337624
 ]

ASF GitHub Bot logged work on BEAM-8539:


Author: ASF GitHub Bot
Created on: 02/Nov/19 00:07
Start Date: 02/Nov/19 00:07
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #9969: [BEAM-8539] 
Provide an initial definition of all job states and the state transition diagram
URL: https://github.com/apache/beam/pull/9969#discussion_r341787418
 
 

 ##
 File path: model/job-management/src/main/proto/beam_job_api.proto
 ##
 @@ -201,6 +201,16 @@ message JobMessagesResponse {
 }
 
 // Enumeration of all JobStates
+//
+// The state transition diagram is:
+//   STOPPED -> STARTING -> RUNNING -> DONE
+//                                  \> FAILED
+//                                  \> CANCELLING -> CANCELLED
+//                                  \> UPDATING -> UPDATED
+//                                  \> DRAINING -> DRAINED
 
 Review comment:
   That should either be DONE, FAILED, or CANCELLED
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337624)
Time Spent: 1h 40m  (was: 1.5h)

> Clearly define the valid job state transitions
> --
>
> Key: BEAM-8539
> URL: https://issues.apache.org/jira/browse/BEAM-8539
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, runner-core, sdk-java-core, sdk-py-core
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The Beam job state transitions are ill-defined, which is a big problem for 
> anything that relies on the values coming from JobAPI.GetStateStream.
> I was hoping to find something like a state transition diagram in the docs so 
> that I could determine the start state, the terminal states, and the valid 
> transitions, but I could not find this. The code reveals that the SDKs differ 
> on the fundamentals:
> Java InMemoryJobService:
>  * start state: *STOPPED*
>  * run - about to submit to executor:  STARTING
>  * run - actually running on executor:  RUNNING
>  * terminal states: DONE, FAILED, CANCELLED, DRAINED
> Python AbstractJobServiceServicer / LocalJobServicer:
>  * start state: STARTING
>  * terminal states: DONE, FAILED, CANCELLED, *STOPPED*
> I think it would be good to make python work like Java, so that there is a 
> difference in state between a job that has been prepared and one that has 
> additionally been run.
> It's hard to tell how far this problem has spread within the various runners. 
>  I think a simple thing that can be done to help standardize behavior is to 
> implement the terminal states as an enum in the beam_job_api.proto, or create 
> a utility function in each language for checking if a state is terminal, so 
> that it's not left up to each runner to reimplement this logic.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8243) Document behavior of FlinkRunner

2019-11-01 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965162#comment-16965162
 ] 

Kyle Weaver edited comment on BEAM-8243 at 11/2/19 12:07 AM:
-

Adding BEAM-8512 as a way of saying I don't want to advertise features we 
haven't fully tested yet :)

[~thw] Do you mean older versions of FlinkRunner, or the "traditional" process 
of job submission?


was (Author: ibzib):
Adding BEAM-8512 as a way of saying I don't want to advertise features we 
haven't fully tested yet :)

[~thw] What do you mean?

> Document behavior of FlinkRunner
> 
>
> Key: BEAM-8243
> URL: https://issues.apache.org/jira/browse/BEAM-8243
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink, website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>
> The Flink runner guide should include a couple details
> 1) FlinkRunner pulls the job server jar from Maven by default (need to make 
> this explicit in case of firewall concerns)
> 2) how to override in case the above is a problem



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8243) Document behavior of FlinkRunner

2019-11-01 Thread Kyle Weaver (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965162#comment-16965162
 ] 

Kyle Weaver commented on BEAM-8243:
---

Adding BEAM-8512 as a way of saying I don't want to advertise features we 
haven't fully tested yet :)

[~thw] What do you mean?

> Document behavior of FlinkRunner
> 
>
> Key: BEAM-8243
> URL: https://issues.apache.org/jira/browse/BEAM-8243
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink, website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>
> The Flink runner guide should include a couple details
> 1) FlinkRunner pulls the job server jar from Maven by default (need to make 
> this explicit in case of firewall concerns)
> 2) how to override in case the above is a problem



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8472) Get default GCP region from gcloud

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8472?focusedWorklogId=337623&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337623
 ]

ASF GitHub Bot logged work on BEAM-8472:


Author: ASF GitHub Bot
Created on: 01/Nov/19 23:54
Start Date: 01/Nov/19 23:54
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9974: [BEAM-8472] Get 
default GCP region from gcloud (Java)
URL: https://github.com/apache/beam/pull/9974
 
 
   Same as #9868 but more verbose :smile: 
   
   As with the Python version, I've verified this works locally. This doesn't 
fit into a unit test because it requires special environment configuration 
and/or calling an external process, but it doesn't seem to warrant an 
integration test either.
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)
 | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/)
 | [![Build 
Status](https:

[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=337620&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337620
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 01/Nov/19 23:52
Start Date: 01/Nov/19 23:52
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9892: [BEAM-8427] 
[SQL] buildIOWrite from MongoDb Table
URL: https://github.com/apache/beam/pull/9892#discussion_r341785685
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbReadWriteIT.java
 ##
 @@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta.provider.mongodb;
+
+import static org.apache.beam.sdk.schemas.Schema.FieldType.BOOLEAN;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.BYTE;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.DOUBLE;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.FLOAT;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.INT16;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.INT32;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.INT64;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.STRING;
+import static org.junit.Assert.assertEquals;
+
+import com.mongodb.MongoClient;
+import java.util.Arrays;
+import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils;
+import org.apache.beam.sdk.io.mongodb.MongoDBIOIT.MongoDBPipelineOptions;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.FieldType;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/**
+ * A test of {@link 
org.apache.beam.sdk.extensions.sql.meta.provider.mongodb.MongoDbTable} on an
+ * independent Mongo instance.
+ *
+ * This test requires a running instance of MongoDB. Pass in connection 
information using
+ * PipelineOptions:
+ *
+ * 
+ *  ./gradlew integrationTest -p sdks/java/extensions/sql/integrationTest 
-DintegrationTestPipelineOptions='[
+ *  "--mongoDBHostName=1.2.3.4",
+ *  "--mongoDBPort=27017",
+ *  "--mongoDBDatabaseName=mypass",
+ *  "--numberOfRecords=1000" ]'
+ *  --tests 
org.apache.beam.sdk.extensions.sql.meta.provider.mongodb.MongoDbReadWriteIT
+ *  -DintegrationTestRunner=direct
+ * 
+ *
+ * A database, specified in the pipeline options, will be created implicitly 
if it does not exist
+ * already. And dropped upon completing tests.
+ *
+ * Please see 'build_rules.gradle' file for instructions regarding running 
this test using Beam
+ * performance testing framework.
+ */
+@RunWith(JUnit4.class)
+public class MongoDbReadWriteIT {
+  private static final Schema SOURCE_SCHEMA =
+  Schema.builder()
+  .addNullableField("_id", STRING)
+  .addNullableField("c_bigint", INT64)
+  .addNullableField("c_tinyint", BYTE)
+  .addNullableField("c_smallint", INT16)
+  .addNullableField("c_integer", INT32)
+  .addNullableField("c_float", FLOAT)
+  .addNullableField("c_double", DOUBLE)
+  .addNullableField("c_boolean", BOOLEAN)
+  .addNullableField("c_varchar", STRING)
+  .addNullableField("c_arr", FieldType.array(STRING))
+  .build();
+  private static final String collection = "collection";
+  private static MongoDBPipelineOptions options;
+
+  @Rule public final TestPipeline writePipeline = TestPipeline.create();
+  @Rule public final TestPipeline readPipeline = TestPipeline.create();
+
+  @BeforeClass
+  public static void setUp() throws Exception {

[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=337621&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337621
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 01/Nov/19 23:52
Start Date: 01/Nov/19 23:52
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9892: [BEAM-8427] 
[SQL] buildIOWrite from MongoDb Table
URL: https://github.com/apache/beam/pull/9892#discussion_r341785685
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbReadWriteIT.java
 ##
 @@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta.provider.mongodb;
+
+import static org.apache.beam.sdk.schemas.Schema.FieldType.BOOLEAN;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.BYTE;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.DOUBLE;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.FLOAT;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.INT16;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.INT32;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.INT64;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.STRING;
+import static org.junit.Assert.assertEquals;
+
+import com.mongodb.MongoClient;
+import java.util.Arrays;
+import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils;
+import org.apache.beam.sdk.io.mongodb.MongoDBIOIT.MongoDBPipelineOptions;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.FieldType;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/**
+ * A test of {@link 
org.apache.beam.sdk.extensions.sql.meta.provider.mongodb.MongoDbTable} on an
+ * independent Mongo instance.
+ *
+ * This test requires a running instance of MongoDB. Pass in connection 
information using
+ * PipelineOptions:
+ *
+ * 
+ *  ./gradlew integrationTest -p sdks/java/extensions/sql/integrationTest 
-DintegrationTestPipelineOptions='[
+ *  "--mongoDBHostName=1.2.3.4",
+ *  "--mongoDBPort=27017",
+ *  "--mongoDBDatabaseName=mypass",
+ *  "--numberOfRecords=1000" ]'
+ *  --tests 
org.apache.beam.sdk.extensions.sql.meta.provider.mongodb.MongoDbReadWriteIT
+ *  -DintegrationTestRunner=direct
+ * 
+ *
+ * A database, specified in the pipeline options, will be created implicitly 
if it does not exist
+ * already. And dropped upon completing tests.
+ *
+ * Please see 'build_rules.gradle' file for instructions regarding running 
this test using Beam
+ * performance testing framework.
+ */
+@RunWith(JUnit4.class)
+public class MongoDbReadWriteIT {
+  private static final Schema SOURCE_SCHEMA =
+  Schema.builder()
+  .addNullableField("_id", STRING)
+  .addNullableField("c_bigint", INT64)
+  .addNullableField("c_tinyint", BYTE)
+  .addNullableField("c_smallint", INT16)
+  .addNullableField("c_integer", INT32)
+  .addNullableField("c_float", FLOAT)
+  .addNullableField("c_double", DOUBLE)
+  .addNullableField("c_boolean", BOOLEAN)
+  .addNullableField("c_varchar", STRING)
+  .addNullableField("c_arr", FieldType.array(STRING))
+  .build();
+  private static final String collection = "collection";
+  private static MongoDBPipelineOptions options;
+
+  @Rule public final TestPipeline writePipeline = TestPipeline.create();
+  @Rule public final TestPipeline readPipeline = TestPipeline.create();
+
+  @BeforeClass
+  public static void setUp() throws Exception {

[jira] [Work logged] (BEAM-8146) SchemaCoder/RowCoder have no equals() function

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8146?focusedWorklogId=337616&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337616
 ]

ASF GitHub Bot logged work on BEAM-8146:


Author: ASF GitHub Bot
Created on: 01/Nov/19 23:23
Start Date: 01/Nov/19 23:23
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9493: 
[BEAM-8146,BEAM-8204,BEAM-8205] Add equals and hashCode to SchemaCoder and 
RowCoder
URL: https://github.com/apache/beam/pull/9493#issuecomment-548982144
 
 
   +1 on merging @kennknowles 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337616)
Time Spent: 3h 20m  (was: 3h 10m)

> SchemaCoder/RowCoder have no equals() function
> --
>
> Key: BEAM-8146
> URL: https://issues.apache.org/jira/browse/BEAM-8146
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> SchemaCoder has no equals function, so it can't be compared in tests, like 
> CloudComponentsTests$DefaultCoders, which is being re-enabled in 
> https://github.com/apache/beam/pull/9446



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=337614&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337614
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 01/Nov/19 23:19
Start Date: 01/Nov/19 23:19
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9741: [BEAM-7926] Visualize 
PCollection
URL: https://github.com/apache/beam/pull/9741#issuecomment-548981473
 
 
   Squashed and merged after Ahmet LGTM'd it, and I did as well.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337614)
Time Spent: 19h 50m  (was: 19h 40m)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 19h 50m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline is defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data 
> of the materialized pcoll.
> e.g., visualize(pcoll)
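A purely hypothetical sketch of what such a helper might do once the elements of pcoll have been materialized; pandas and matplotlib are illustrative choices here, not the Interactive Beam implementation:

{code:python}
import pandas as pd
import matplotlib.pyplot as plt


def visualize(materialized_elements):
  """Quick chart of already-materialized PCollection elements (sketch only)."""
  df = pd.DataFrame(materialized_elements)
  df.hist()  # one histogram per numeric column
  plt.show()


# Stand-in for the elements Interactive Beam would have materialized from pcoll.
visualize([{'rides': 3}, {'rides': 5}, {'rides': 2}])
{code}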



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=337613&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337613
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 01/Nov/19 23:18
Start Date: 01/Nov/19 23:18
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337613)
Time Spent: 19h 40m  (was: 19.5h)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 19h 40m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline is defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data 
> of the materialized pcoll.
> e.g., visualize(pcoll)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=337612&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337612
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 01/Nov/19 23:16
Start Date: 01/Nov/19 23:16
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on issue #9778: [BEAM-7013] Update 
BigQueryHllSketchCompatibilityIT to cover empty sketch cases
URL: https://github.com/apache/beam/pull/9778#issuecomment-548981025
 
 
   > is there any way we could make the conversion functors 
(parseQueryResultToByteArray, and the inlined one for the other direction) 
available to users as utility objects or methods?
   
   I have considered this, but unfortunately we cannot do that, because in the 
same function users might want to parse other fields. However, I did find a way 
to extract part of the logic into its own function. See the 
`getSketchFromByteBuffer` function.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337612)
Time Spent: 36h 20m  (was: 36h 10m)

> A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 36h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8467) Enable reading compressed files with Python fileio

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8467?focusedWorklogId=337611&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337611
 ]

ASF GitHub Bot logged work on BEAM-8467:


Author: ASF GitHub Bot
Created on: 01/Nov/19 23:15
Start Date: 01/Nov/19 23:15
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9861: [BEAM-8467] Enabling 
reading compressed files
URL: https://github.com/apache/beam/pull/9861#issuecomment-548980903
 
 
   Run Python Precommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337611)
Time Spent: 1h  (was: 50m)

> Enable reading compressed files with Python fileio
> --
>
> Key: BEAM-8467
> URL: https://issues.apache.org/jira/browse/BEAM-8467
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-files
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8467) Enable reading compressed files with Python fileio

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8467?focusedWorklogId=337610&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337610
 ]

ASF GitHub Bot logged work on BEAM-8467:


Author: ASF GitHub Bot
Created on: 01/Nov/19 23:15
Start Date: 01/Nov/19 23:15
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9861: [BEAM-8467] Enabling 
reading compressed files
URL: https://github.com/apache/beam/pull/9861#issuecomment-548980872
 
 
   Run Python2_PVR_Flink Precommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337610)
Time Spent: 50m  (was: 40m)

> Enable reading compressed files with Python fileio
> --
>
> Key: BEAM-8467
> URL: https://issues.apache.org/jira/browse/BEAM-8467
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-files
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8435) Allow access to PaneInfo from Python DoFns

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8435?focusedWorklogId=337597&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337597
 ]

ASF GitHub Bot logged work on BEAM-8435:


Author: ASF GitHub Bot
Created on: 01/Nov/19 22:46
Start Date: 01/Nov/19 22:46
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9836: [BEAM-8435] 
Implement PaneInfo computation for Python.
URL: https://github.com/apache/beam/pull/9836#issuecomment-548975390
 
 
   Thanks. Fixing the lint error and ensuring all tests pass before merging. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337597)
Time Spent: 1h 40m  (was: 1.5h)

> Allow access to PaneInfo from Python DoFns
> --
>
> Key: BEAM-8435
> URL: https://issues.apache.org/jira/browse/BEAM-8435
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> PaneInfoParam exists, but the plumbing to actually populate it at runtime was 
> never added. (Nor, clearly, were any tests...)
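For reference, a hedged sketch of the DoFn usage this enables once the plumbing exists (the PaneInfo attribute names here are assumptions):

{code:python}
import apache_beam as beam


class TagWithPane(beam.DoFn):
  def process(self, element, pane_info=beam.DoFn.PaneInfoParam):
    # pane_info describes the firing that produced this element: whether it is
    # the first/last pane for the window and whether it fired early, on time, or late.
    yield element, pane_info.is_first, pane_info.is_last, pane_info.timing
{code}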



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8544) Install Beam SDK with ccache for faster re-install.

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8544?focusedWorklogId=337596&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337596
 ]

ASF GitHub Bot logged work on BEAM-8544:


Author: ASF GitHub Bot
Created on: 01/Nov/19 22:44
Start Date: 01/Nov/19 22:44
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9966: [BEAM-8544] 
Use ccache for compiling the Beam Python SDK.
URL: https://github.com/apache/beam/pull/9966#discussion_r341774895
 
 

 ##
 File path: sdks/python/container/Dockerfile
 ##
 @@ -43,9 +45,13 @@ RUN \
 # Remove pip cache.
 rm -rf /root/.cache/pip
 
+# Configure ccache prior to installing the SDK.
+RUN ln -s /usr/bin/ccache /usr/local/bin/gcc
+# These parameters are needed as pip compiles artifacts in random temporary 
directories.
+RUN ccache --set-config=sloppiness=file_macro && ccache 
--set-config=hash_dir=false
 
 COPY target/apache-beam.tar.gz /opt/apache/beam/tars/
-RUN pip install /opt/apache/beam/tars/apache-beam.tar.gz[gcp] && \
+RUN pip install -v /opt/apache/beam/tars/apache-beam.tar.gz[gcp] && \
 # Remove pip cache.
 rm -rf /root/.cache/pip
 
 
 Review comment:
   Done. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337596)
Time Spent: 50m  (was: 40m)

> Install Beam SDK with ccache for faster re-install.
> ---
>
> Key: BEAM-8544
> URL: https://issues.apache.org/jira/browse/BEAM-8544
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Re-compiling the C modules of the SDK takes 2-3 minutes. This adds to worker 
> startup time whenever a custom SDK is being used (in particular, during 
> development and testing). We can use ccache to re-use the old compile results 
> when the Cython files have not changed. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8491) Add ability for multiple output PCollections from composites

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8491?focusedWorklogId=337588&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337588
 ]

ASF GitHub Bot logged work on BEAM-8491:


Author: ASF GitHub Bot
Created on: 01/Nov/19 22:30
Start Date: 01/Nov/19 22:30
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #9912: [BEAM-8491] Add 
ability for replacing transforms with multiple outputs
URL: https://github.com/apache/beam/pull/9912#issuecomment-548972093
 
 
   Thanks Cham!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337588)
Time Spent: 1h 20m  (was: 1h 10m)

> Add ability for multiple output PCollections from composites
> 
>
> Key: BEAM-8491
> URL: https://issues.apache.org/jira/browse/BEAM-8491
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The Python SDK has DoOutputsTuple, which allows a single transform to have 
> multiple outputs. However, this does not include the ability for a composite 
> transform to have multiple output PCollections from different transforms.
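To make the gap concrete, a hedged sketch (not Beam test code) of a composite whose expand() returns more than one PCollection:

{code:python}
import apache_beam as beam


class SplitEvenOdd(beam.PTransform):
  """Illustrative composite with two output PCollections."""

  def expand(self, pcoll):
    def route(n):
      if n % 2:
        yield beam.pvalue.TaggedOutput('odd', n)
      else:
        yield n  # main output

    results = pcoll | 'Route' >> beam.ParDo(route).with_outputs('odd', main='even')
    # Returning several PCollections from one composite is the case transform
    # replacement needs to handle.
    return results.even, results.odd


with beam.Pipeline() as p:
  evens, odds = p | beam.Create([1, 2, 3, 4]) | SplitEvenOdd()
{code}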



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8491) Add ability for multiple output PCollections from composites

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8491?focusedWorklogId=337586&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337586
 ]

ASF GitHub Bot logged work on BEAM-8491:


Author: ASF GitHub Bot
Created on: 01/Nov/19 22:28
Start Date: 01/Nov/19 22:28
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9912: 
[BEAM-8491] Add ability for replacing transforms with multiple outputs
URL: https://github.com/apache/beam/pull/9912#discussion_r341771799
 
 

 ##
 File path: sdks/python/apache_beam/pipeline_test.py
 ##
 @@ -438,6 +438,101 @@ def get_replacement_transform(self, ptransform):
   p.replace_all([override])
   self.assertEqual(pcoll.producer.inputs[0].element_type, expected_type)
 
+
+  def test_ptransform_override_multiple_outputs(self):
+class _MultiParDoComposite(PTransform):
 
 Review comment:
   Nope! Thanks for the catch! Deleted.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337586)
Time Spent: 1h 10m  (was: 1h)

> Add ability for multiple output PCollections from composites
> 
>
> Key: BEAM-8491
> URL: https://issues.apache.org/jira/browse/BEAM-8491
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The Python SDK has DoOutputsTuple, which allows a single transform to have 
> multiple outputs. However, this does not include the ability for a composite 
> transform to have multiple output PCollections from different transforms.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8491) Add ability for multiple output PCollections from composites

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8491?focusedWorklogId=337585&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337585
 ]

ASF GitHub Bot logged work on BEAM-8491:


Author: ASF GitHub Bot
Created on: 01/Nov/19 22:28
Start Date: 01/Nov/19 22:28
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9912: 
[BEAM-8491] Add ability for replacing transforms with multiple outputs
URL: https://github.com/apache/beam/pull/9912#discussion_r341771747
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -263,31 +262,29 @@ def _replace_if_needed(self, original_transform_node):
 
   new_output = replacement_transform.expand(input_node)
 
-  new_output.element_type = None
-  self.pipeline._infer_result_type(replacement_transform, inputs,
-   new_output)
-
+  if isinstance(new_output, pvalue.PValue):
+new_output.element_type = None
+self.pipeline._infer_result_type(replacement_transform, inputs,
+ new_output)
   replacement_transform_node.add_output(new_output)
-  if not new_output.producer:
-new_output.producer = replacement_transform_node
-
-  # We only support replacing transforms with a single output with
-  # another transform that produces a single output.
-  # TODO: Support replacing PTransforms with multiple outputs.
-  if (len(original_transform_node.outputs) > 1 or
-  not isinstance(original_transform_node.outputs[None],
- (PCollection, PDone)) or
-  not isinstance(new_output, (PCollection, PDone))):
-raise NotImplementedError(
-'PTransform overriding is only supported for PTransforms that '
-'have a single output. Tried to replace output of '
-'AppliedPTransform %r with %r.'
-% (original_transform_node, new_output))
 
   # Recording updated outputs. This cannot be done in the same visitor
   # since if we dynamically update output type here, we'll run into
   # errors when visiting child nodes.
-  output_map[original_transform_node.outputs[None]] = new_output
+  if isinstance(new_output, pvalue.PValue):
+if not new_output.producer:
+  new_output.producer = replacement_transform_node
+output_map[original_transform_node.outputs[None]] = new_output
+  elif isinstance(new_output, (pvalue.DoOutputsTuple, tuple)):
+for pcoll in new_output:
+  if not pcoll.producer:
+pcoll.producer = replacement_transform_node
+  output_map[original_transform_node.outputs[pcoll.tag]] = pcoll
 
 Review comment:
   Done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337585)
Time Spent: 1h  (was: 50m)

> Add ability for multiple output PCollections from composites
> 
>
> Key: BEAM-8491
> URL: https://issues.apache.org/jira/browse/BEAM-8491
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The Python SDK has DoOutputsTuple, which allows a single transform to have 
> multiple outputs. However, this does not include the ability for a composite 
> transform to have multiple output PCollections from different transforms.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8368?focusedWorklogId=337584&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337584
 ]

ASF GitHub Bot logged work on BEAM-8368:


Author: ASF GitHub Bot
Created on: 01/Nov/19 22:25
Start Date: 01/Nov/19 22:25
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9970: [BEAM-8368] 
[BEAM-8392] Update pyarrow to the latest version
URL: https://github.com/apache/beam/pull/9970#issuecomment-548971157
 
 
   Thank you @TheNeuralBit!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337584)
Time Spent: 2h 50m  (was: 2h 40m)

> [Python] libprotobuf-generated exception when importing apache_beam
> ---
>
> Key: BEAM-8368
> URL: https://issues.apache.org/jira/browse/BEAM-8368
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.15.0, 2.17.0
>Reporter: Ubaier Bhat
>Assignee: Brian Hulette
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: error_log.txt
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Unable to import apache_beam after upgrading to macos 10.15 (Catalina). 
> Cleared all the pipenvs but can't get it working again.
> {code}
> import apache_beam as beam
> /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84:
>  UserWarning: Some syntactic constructs of Python 3 are not yet fully 
> supported by Apache Beam.
>   'Some syntactic constructs of Python 3 are not yet fully supported by '
> [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already 
> exists in database: 
> [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> libc++abi.dylib: terminating with uncaught exception of type 
> google::protobuf::FatalException: CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-6747) Adding ExternalTransform in Java SDK

2019-11-01 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee resolved BEAM-6747.
---
Fix Version/s: 2.13.0
   Resolution: Fixed

Created a separate issue: BEAM-8546

> Adding ExternalTransform in Java SDK
> 
>
> Key: BEAM-6747
> URL: https://issues.apache.org/jira/browse/BEAM-6747
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core, sdk-java-core
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
> Fix For: 2.13.0
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> Adding Java counterpart of Python ExternalTransform for testing Python 
> transforms from pipelines in Java SDK.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8546) Moving Java external transform from core-construction-java to core

2019-11-01 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-8546:
-

 Summary: Moving Java external transform from 
core-construction-java to core
 Key: BEAM-8546
 URL: https://issues.apache.org/jira/browse/BEAM-8546
 Project: Beam
  Issue Type: Improvement
  Components: sdk-ideas, sdk-java-core
Reporter: Heejong Lee


Creating a separate issue from BEAM-6747.

Related discussion: 
[https://lists.apache.org/list.html?d...@beam.apache.org:lte=111M:How%20to%20expose%2Fuse%20the%20External%20transform%20on%20Java%20SDK]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=337582&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337582
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 01/Nov/19 22:08
Start Date: 01/Nov/19 22:08
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #9954: [BEAM-8335] Add 
the PTuple
URL: https://github.com/apache/beam/pull/9954#issuecomment-548967401
 
 
   I agree that I can use either dicts or tuples, but I don't agree that they 
are a good user experience. Look at how you access the PCollections: dicts are 
accessed by strings, tuples by ints, namedtuples by field names, and 
DoOutputsTuples by any of the prior. Now imagine an arbitrary PTransform with 
multiple outputs: because of all the different access patterns, the user won't 
know how to access the outputs. While it may be nice to have "first class 
support" in theory, in practice it doesn't work well.
   
   That's where the PTuple comes in. This gives the user what they want: a 
homogeneous access pattern for multiple outputs.
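
For readers following the thread, a minimal sketch of the access patterns being compared (the tags and elements are made up, not taken from the PR):

{code}
import apache_beam as beam
from apache_beam import pvalue


class RouteFn(beam.DoFn):
  def process(self, n):
    if n % 2 == 0:
      yield pvalue.TaggedOutput('even', n)
    else:
      yield n


with beam.Pipeline() as p:
  nums = p | beam.Create([1, 2, 3, 4])
  # DoOutputsTuple: reachable by attribute, by tag, or by unpacking.
  results = nums | beam.ParDo(RouteFn()).with_outputs('even', main='odd')
  evens = results.even   # attribute access
  odds = results['odd']  # key access
  # A plain dict returned by a composite supports only key access, and a
  # plain tuple only positional access, hence the mismatch described above.
{code}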
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337582)
Time Spent: 14h 40m  (was: 14.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 14h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3713) Consider moving away from nose to nose2 or pytest.

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3713?focusedWorklogId=337580&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337580
 ]

ASF GitHub Bot logged work on BEAM-3713:


Author: ASF GitHub Bot
Created on: 01/Nov/19 22:00
Start Date: 01/Nov/19 22:00
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9756: [BEAM-3713] Add pytest 
for unit tests
URL: https://github.com/apache/beam/pull/9756#issuecomment-548965339
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337580)
Time Spent: 11.5h  (was: 11h 20m)

> Consider moving away from nose to nose2 or pytest.
> --
>
> Key: BEAM-3713
> URL: https://issues.apache.org/jira/browse/BEAM-3713
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: Robert Bradshaw
>Assignee: Udi Meiri
>Priority: Minor
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> Per [https://nose.readthedocs.io/en/latest/], nose is in maintenance mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8545) don't docker pull before docker run

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8545?focusedWorklogId=337579&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337579
 ]

ASF GitHub Bot logged work on BEAM-8545:


Author: ASF GitHub Bot
Created on: 01/Nov/19 21:51
Start Date: 01/Nov/19 21:51
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9972: [BEAM-8545] don't docker 
pull before docker run
URL: https://github.com/apache/beam/pull/9972#issuecomment-548963123
 
 
   run xvr_flink postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337579)
Time Spent: 40m  (was: 0.5h)

> don't docker pull  before docker run
> 
>
> Key: BEAM-8545
> URL: https://issues.apache.org/jira/browse/BEAM-8545
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Since 'docker run' automatically pulls when the image doesn't exist locally, 
> I think it's safe to remove explicit 'docker pull' before 'docker run'. 
> Without 'docker pull', we won't update the local image with the remote image 
> (for the same tag), but that shouldn't be a problem in prod, where a unique 
> tag is assumed for each released version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8545) don't docker pull before docker run

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8545?focusedWorklogId=337578&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337578
 ]

ASF GitHub Bot logged work on BEAM-8545:


Author: ASF GitHub Bot
Created on: 01/Nov/19 21:51
Start Date: 01/Nov/19 21:51
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9972: [BEAM-8545] don't docker 
pull before docker run
URL: https://github.com/apache/beam/pull/9972#issuecomment-548963259
 
 
   Run XVR_Flink PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337578)
Time Spent: 0.5h  (was: 20m)

> don't docker pull  before docker run
> 
>
> Key: BEAM-8545
> URL: https://issues.apache.org/jira/browse/BEAM-8545
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Since 'docker run' automatically pulls when the image doesn't exist locally, 
> I think it's safe to remove explicit 'docker pull' before 'docker run'. 
> Without 'docker pull', we won't update the local image with the remote image 
> (for the same tag), but that shouldn't be a problem in prod, where a unique 
> tag is assumed for each released version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8545) don't docker pull before docker run

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8545?focusedWorklogId=337577&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337577
 ]

ASF GitHub Bot logged work on BEAM-8545:


Author: ASF GitHub Bot
Created on: 01/Nov/19 21:50
Start Date: 01/Nov/19 21:50
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9972: [BEAM-8545] don't docker 
pull before docker run
URL: https://github.com/apache/beam/pull/9972#issuecomment-548963123
 
 
   run xvr_flink postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337577)
Time Spent: 20m  (was: 10m)

> don't docker pull  before docker run
> 
>
> Key: BEAM-8545
> URL: https://issues.apache.org/jira/browse/BEAM-8545
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Since 'docker run' automatically pulls when the image doesn't exist locally, 
> I think it's safe to remove explicit 'docker pull' before 'docker run'. 
> Without 'docker pull', we won't update the local image with the remote image 
> (for the same tag), but that shouldn't be a problem in prod, where a unique 
> tag is assumed for each released version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8545) don't docker pull before docker run

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8545?focusedWorklogId=337576&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337576
 ]

ASF GitHub Bot logged work on BEAM-8545:


Author: ASF GitHub Bot
Created on: 01/Nov/19 21:49
Start Date: 01/Nov/19 21:49
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #9972: [BEAM-8545] don't 
docker pull before docker run
URL: https://github.com/apache/beam/pull/9972
 
 
   Since 'docker run' automatically pulls when the image doesn't exist
   locally, I think it's safe to remove explicit 'docker pull' before
   'docker run'. Without 'docker pull', we won't update the local image
   with the remote image (for the same tag), but that shouldn't be a problem
   in prod, where a unique tag is assumed for each released version.
   
   This will also fix BEAM-8534.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastComple

[jira] [Work logged] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8368?focusedWorklogId=337573&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337573
 ]

ASF GitHub Bot logged work on BEAM-8368:


Author: ASF GitHub Bot
Created on: 01/Nov/19 21:48
Start Date: 01/Nov/19 21:48
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9970: [BEAM-8368] 
[BEAM-8392] Update pyarrow to the latest version
URL: https://github.com/apache/beam/pull/9970#issuecomment-548962449
 
 
   Wrote a 
[commit](https://github.com/apache/beam/pull/9971/commits/c600ef5898930faf35631e6ebb32179d70f6e46b)
 for this and put up a [PR](https://github.com/apache/beam/pull/9971/commits) 
to merge into this branch
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337573)
Time Spent: 2h 40m  (was: 2.5h)

> [Python] libprotobuf-generated exception when importing apache_beam
> ---
>
> Key: BEAM-8368
> URL: https://issues.apache.org/jira/browse/BEAM-8368
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.15.0, 2.17.0
>Reporter: Ubaier Bhat
>Assignee: Brian Hulette
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: error_log.txt
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Unable to import apache_beam after upgrading to macos 10.15 (Catalina). 
> Cleared all the pipenvs but can't get it working again.
> {code}
> import apache_beam as beam
> /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84:
>  UserWarning: Some syntactic constructs of Python 3 are not yet fully 
> supported by Apache Beam.
>   'Some syntactic constructs of Python 3 are not yet fully supported by '
> [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already 
> exists in database: 
> [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> libc++abi.dylib: terminating with uncaught exception of type 
> google::protobuf::FatalException: CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-7917) Python datastore v1new fails on retry

2019-11-01 Thread Udi Meiri (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri resolved BEAM-7917.
-
Resolution: Fixed

> Python datastore v1new fails on retry
> -
>
> Key: BEAM-7917
> URL: https://issues.apache.org/jira/browse/BEAM-7917
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, runner-dataflow
>Affects Versions: 2.14.0
> Environment: Python 3.7 on Dataflow
>Reporter: Dmytro Sadovnychyi
>Assignee: Dmytro Sadovnychyi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.18.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Traceback (most recent call last):
>   File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
>   File "apache_beam/runners/common.py", line 454, in 
> apache_beam.runners.common.SimpleInvoker.invoke_process
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py",
>  line 334, in process
> self._flush_batch()
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py",
>  line 349, in _flush_batch
> throttle_delay=util.WRITE_BATCH_TARGET_LATENCY_MS // 1000)
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/utils/retry.py", 
> line 197, in wrapper
> return fun(*args, **kwargs)
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/helper.py",
>  line 99, in write_mutations
> batch.commit()
>   File 
> "/usr/local/lib/python3.7/site-packages/google/cloud/datastore/batch.py", 
> line 271, in commit
> raise ValueError("Batch must be in progress to commit()")
> ValueError: Batch must be in progress to commit()



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7917) Python datastore v1new fails on retry

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7917?focusedWorklogId=337570&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337570
 ]

ASF GitHub Bot logged work on BEAM-7917:


Author: ASF GitHub Bot
Created on: 01/Nov/19 21:45
Start Date: 01/Nov/19 21:45
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9294: [BEAM-7917] Fix 
datastore writes failing on retry
URL: https://github.com/apache/beam/pull/9294#discussion_r341762012
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/datastore/v1new/datastoreio.py
 ##
 @@ -340,12 +340,13 @@ def finish_bundle(self):
 def _init_batch(self):
   self._batch_bytes_size = 0
   self._batch = self._client.batch()
-  self._batch.begin()
+  self._batch_mutations = []
 
 def _flush_batch(self):
   # Flush the current batch of mutations to Cloud Datastore.
   latency_ms = helper.write_mutations(
 
 Review comment:
   I think your choice is a valid compromise with what to do with the results 
of `element_to_client_batch_item`. I see 2 options here: 
   1. Save `client_element` items.
   2. Discard `client_element` items after calling `ByteSize()` on them.
   
   The first option seems more CPU efficient while the second seems more memory 
efficient. I don't know which option is faster, but from my limited experience 
saving CPU at the expense of RAM seems like a good tradeoff.
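
   To make the two options concrete, a rough sketch (the names here are 
hypothetical stand-ins, not the actual datastoreio code):

{code}
class BatchSketch(object):
  # Illustrative only: to_item stands in for the element -> client-item
  # conversion discussed above.
  def __init__(self, to_item):
    self._to_item = to_item
    self._bytes = 0
    self._items = []     # option 1: converted items kept around
    self._elements = []  # option 2: only the raw elements kept

  def add_keep_item(self, element):
    item = self._to_item(element)
    self._bytes += item.ByteSize()
    self._items.append(item)  # costs RAM, avoids re-converting on flush/retry

  def add_discard_item(self, element):
    self._bytes += self._to_item(element).ByteSize()
    self._elements.append(element)  # saves RAM, re-converts at flush time
{code}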
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337570)
Time Spent: 3h 40m  (was: 3.5h)

> Python datastore v1new fails on retry
> -
>
> Key: BEAM-7917
> URL: https://issues.apache.org/jira/browse/BEAM-7917
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, runner-dataflow
>Affects Versions: 2.14.0
> Environment: Python 3.7 on Dataflow
>Reporter: Dmytro Sadovnychyi
>Assignee: Dmytro Sadovnychyi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Traceback (most recent call last):
>   File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
>   File "apache_beam/runners/common.py", line 454, in 
> apache_beam.runners.common.SimpleInvoker.invoke_process
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py",
>  line 334, in process
> self._flush_batch()
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py",
>  line 349, in _flush_batch
> throttle_delay=util.WRITE_BATCH_TARGET_LATENCY_MS // 1000)
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/utils/retry.py", 
> line 197, in wrapper
> return fun(*args, **kwargs)
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/helper.py",
>  line 99, in write_mutations
> batch.commit()
>   File 
> "/usr/local/lib/python3.7/site-packages/google/cloud/datastore/batch.py", 
> line 271, in commit
> raise ValueError("Batch must be in progress to commit()")
> ValueError: Batch must be in progress to commit()



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-7917) Python datastore v1new fails on retry

2019-11-01 Thread Udi Meiri (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri updated BEAM-7917:

Fix Version/s: 2.18.0

> Python datastore v1new fails on retry
> -
>
> Key: BEAM-7917
> URL: https://issues.apache.org/jira/browse/BEAM-7917
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, runner-dataflow
>Affects Versions: 2.14.0
> Environment: Python 3.7 on Dataflow
>Reporter: Dmytro Sadovnychyi
>Assignee: Dmytro Sadovnychyi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.18.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Traceback (most recent call last):
>   File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
>   File "apache_beam/runners/common.py", line 454, in 
> apache_beam.runners.common.SimpleInvoker.invoke_process
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py",
>  line 334, in process
> self._flush_batch()
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py",
>  line 349, in _flush_batch
> throttle_delay=util.WRITE_BATCH_TARGET_LATENCY_MS // 1000)
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/utils/retry.py", 
> line 197, in wrapper
> return fun(*args, **kwargs)
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/helper.py",
>  line 99, in write_mutations
> batch.commit()
>   File 
> "/usr/local/lib/python3.7/site-packages/google/cloud/datastore/batch.py", 
> line 271, in commit
> raise ValueError("Batch must be in progress to commit()")
> ValueError: Batch must be in progress to commit()



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7917) Python datastore v1new fails on retry

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7917?focusedWorklogId=337572&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337572
 ]

ASF GitHub Bot logged work on BEAM-7917:


Author: ASF GitHub Bot
Created on: 01/Nov/19 21:46
Start Date: 01/Nov/19 21:46
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9294: [BEAM-7917] Fix 
datastore writes failing on retry
URL: https://github.com/apache/beam/pull/9294
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337572)
Time Spent: 3h 50m  (was: 3h 40m)

> Python datastore v1new fails on retry
> -
>
> Key: BEAM-7917
> URL: https://issues.apache.org/jira/browse/BEAM-7917
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, runner-dataflow
>Affects Versions: 2.14.0
> Environment: Python 3.7 on Dataflow
>Reporter: Dmytro Sadovnychyi
>Assignee: Dmytro Sadovnychyi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.18.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Traceback (most recent call last):
>   File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
>   File "apache_beam/runners/common.py", line 454, in 
> apache_beam.runners.common.SimpleInvoker.invoke_process
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py",
>  line 334, in process
> self._flush_batch()
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py",
>  line 349, in _flush_batch
> throttle_delay=util.WRITE_BATCH_TARGET_LATENCY_MS // 1000)
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/utils/retry.py", 
> line 197, in wrapper
> return fun(*args, **kwargs)
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/helper.py",
>  line 99, in write_mutations
> batch.commit()
>   File 
> "/usr/local/lib/python3.7/site-packages/google/cloud/datastore/batch.py", 
> line 271, in commit
> raise ValueError("Batch must be in progress to commit()")
> ValueError: Batch must be in progress to commit()



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7917) Python datastore v1new fails on retry

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7917?focusedWorklogId=337569&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337569
 ]

ASF GitHub Bot logged work on BEAM-7917:


Author: ASF GitHub Bot
Created on: 01/Nov/19 21:43
Start Date: 01/Nov/19 21:43
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9294: [BEAM-7917] Fix 
datastore writes failing on retry
URL: https://github.com/apache/beam/pull/9294#discussion_r341762012
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/datastore/v1new/datastoreio.py
 ##
 @@ -340,12 +340,13 @@ def finish_bundle(self):
 def _init_batch(self):
   self._batch_bytes_size = 0
   self._batch = self._client.batch()
-  self._batch.begin()
+  self._batch_mutations = []
 
 def _flush_batch(self):
   # Flush the current batch of mutations to Cloud Datastore.
   latency_ms = helper.write_mutations(
 
 Review comment:
   I think your choice is a valid compromise with what to do with the results 
of `element_to_client_batch_item`. I see 2 options here: 
   1. Save the batch items.
   2. Discard batch items after calling `ByteSize()` on them.
   
   The first option seems more CPU efficient while the second seems more memory 
efficient. I don't know which option is faster, but from my limited experience 
saving CPU at the expense of RAM seems like a good tradeoff.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337569)
Time Spent: 3.5h  (was: 3h 20m)

> Python datastore v1new fails on retry
> -
>
> Key: BEAM-7917
> URL: https://issues.apache.org/jira/browse/BEAM-7917
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, runner-dataflow
>Affects Versions: 2.14.0
> Environment: Python 3.7 on Dataflow
>Reporter: Dmytro Sadovnychyi
>Assignee: Dmytro Sadovnychyi
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Traceback (most recent call last):
>   File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
>   File "apache_beam/runners/common.py", line 454, in 
> apache_beam.runners.common.SimpleInvoker.invoke_process
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py",
>  line 334, in process
> self._flush_batch()
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/datastoreio.py",
>  line 349, in _flush_batch
> throttle_delay=util.WRITE_BATCH_TARGET_LATENCY_MS // 1000)
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/utils/retry.py", 
> line 197, in wrapper
> return fun(*args, **kwargs)
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/datastore/v1new/helper.py",
>  line 99, in write_mutations
> batch.commit()
>   File 
> "/usr/local/lib/python3.7/site-packages/google/cloud/datastore/batch.py", 
> line 271, in commit
> raise ValueError("Batch must be in progress to commit()")
> ValueError: Batch must be in progress to commit()



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8523) Add useful timestamp to job servicer GetJobs

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8523?focusedWorklogId=337568&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337568
 ]

ASF GitHub Bot logged work on BEAM-8523:


Author: ASF GitHub Bot
Created on: 01/Nov/19 21:42
Start Date: 01/Nov/19 21:42
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9959: WIP: 
[BEAM-8523] JobAPI: Give access to timestamped state change history
URL: https://github.com/apache/beam/pull/9959#discussion_r341761566
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/JobStateEvent.java
 ##
 @@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.jobsubmission;
+
+import com.google.auto.value.AutoValue;
+import java.time.Instant;
+import org.apache.beam.model.jobmanagement.v1.JobApi;
+import org.apache.beam.vendor.grpc.v1p21p0.com.google.protobuf.Timestamp;
+
+/** A state transition event. */
+@AutoValue
+public abstract class JobStateEvent {
 
 Review comment:
   > use that proto instead of creating this class.
   
   I've noticed that there is a pattern in the Java and Python SDKs to avoid 
using protobuf objects directly as part of the Beam SDK, and instead favor 
"native" types with methods for converting to/from protobuf representations.  I 
was trying to maintain that separation by providing this `JobStateEvent` class, 
but I'm happy to use a protobuf object directly.  Let me know what you think.
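   
   A generic sketch of that converter pattern (the class and fields below are 
illustrative, not Beam's actual types):

{code}
from typing import NamedTuple


class StateEvent(NamedTuple):
  # "Native" value type with explicit converters to/from the proto message.
  state: int             # a JobState enum value
  timestamp_millis: int

  @classmethod
  def from_proto(cls, proto):
    return cls(state=proto.state, timestamp_millis=proto.timestamp_millis)

  def to_proto(self, proto_cls):
    # proto_cls is the generated message class; left abstract on purpose.
    return proto_cls(state=self.state, timestamp_millis=self.timestamp_millis)
{code}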
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337568)
Time Spent: 1.5h  (was: 1h 20m)

> Add useful timestamp to job servicer GetJobs
> 
>
> Key: BEAM-8523
> URL: https://issues.apache.org/jira/browse/BEAM-8523
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> As a user querying jobs with JobService.GetJobs, it would be useful if the 
> JobInfo result contained timestamps indicating various state changes that may 
> have been missed by a client.   Useful timestamps include:
>  
>  * submitted (prepared to the job service)
>  * started (executor enters the RUNNING state)
>  * completed (executor enters a terminal state)
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8368?focusedWorklogId=337567&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337567
 ]

ASF GitHub Bot logged work on BEAM-8368:


Author: ASF GitHub Bot
Created on: 01/Nov/19 21:37
Start Date: 01/Nov/19 21:37
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9970: [BEAM-8368] 
[BEAM-8392] Update pyarrow to the latest version
URL: https://github.com/apache/beam/pull/9970#issuecomment-548959623
 
 
   Well, it's a pretty simple fix: just change [this 
line](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/parquetio_test.py#L451)
 to `pa.Table.from_arrays([orig.column('name')], names=['name'])`.
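
   For context, a small self-contained sketch around that call (the sample 
table is made up; the from_arrays line is the one quoted above):

{code}
import pyarrow as pa

orig = pa.Table.from_pydict({'name': ['a', 'b'], 'age': [10, 20]})
# With newer pyarrow, pass the column names explicitly when rebuilding a
# table from columns pulled out of another table.
subset = pa.Table.from_arrays([orig.column('name')], names=['name'])
{code}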
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337567)
Time Spent: 2.5h  (was: 2h 20m)

> [Python] libprotobuf-generated exception when importing apache_beam
> ---
>
> Key: BEAM-8368
> URL: https://issues.apache.org/jira/browse/BEAM-8368
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.15.0, 2.17.0
>Reporter: Ubaier Bhat
>Assignee: Brian Hulette
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: error_log.txt
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Unable to import apache_beam after upgrading to macos 10.15 (Catalina). 
> Cleared all the pipenvs but can't get it working again.
> {code}
> import apache_beam as beam
> /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84:
>  UserWarning: Some syntactic constructs of Python 3 are not yet fully 
> supported by Apache Beam.
>   'Some syntactic constructs of Python 3 are not yet fully supported by '
> [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already 
> exists in database: 
> [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> libc++abi.dylib: terminating with uncaught exception of type 
> google::protobuf::FatalException: CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8545) don't docker pull before docker run

2019-11-01 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-8545:
-

 Summary: don't docker pull  before docker run
 Key: BEAM-8545
 URL: https://issues.apache.org/jira/browse/BEAM-8545
 Project: Beam
  Issue Type: Bug
  Components: java-fn-execution
Reporter: Heejong Lee
Assignee: Heejong Lee


Since 'docker run' automatically pulls when the image doesn't exist locally, I 
think it's safe to remove explicit 'docker pull' before 'docker run'. Without 
'docker pull', we won't update the local image with the remote image (for the 
same tag), but that shouldn't be a problem in prod, where a unique tag is 
assumed for each released version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8243) Document behavior of FlinkRunner

2019-11-01 Thread Thomas Weise (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965112#comment-16965112
 ] 

Thomas Weise commented on BEAM-8243:


There are IMO enough other reasons to not keep instructions around for old 
versions (for portability) :)

> Document behavior of FlinkRunner
> 
>
> Key: BEAM-8243
> URL: https://issues.apache.org/jira/browse/BEAM-8243
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink, website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>
> The Flink runner guide should include a couple details
> 1) FlinkRunner pulls the job server jar from Maven by default (need to make 
> this explicit in case of firewall concerns)
> 2) how to override in case the above is a problem



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8545) don't docker pull before docker run

2019-11-01 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-8545:
--
Status: Open  (was: Triage Needed)

> don't docker pull  before docker run
> 
>
> Key: BEAM-8545
> URL: https://issues.apache.org/jira/browse/BEAM-8545
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>
> Since 'docker run' automatically pulls when the image doesn't exist locally, 
> I think it's safe to remove explicit 'docker pull' before 'docker run'. 
> Without 'docker pull', we won't update the local image with the remote image 
> (for the same tag), but that shouldn't be a problem in prod, where a unique 
> tag is assumed for each released version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8243) Document behavior of FlinkRunner

2019-11-01 Thread Robert Bradshaw (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965107#comment-16965107
 ] 

Robert Bradshaw commented on BEAM-8243:
---

Yes, this can be greatly simplified. I'm not sure if/how we should keep the old 
ways around for old versions though. Kyle, are you on this? 

> Document behavior of FlinkRunner
> 
>
> Key: BEAM-8243
> URL: https://issues.apache.org/jira/browse/BEAM-8243
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink, website
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>
> The Flink runner guide should include a couple details
> 1) FlinkRunner pulls the job server jar from Maven by default (need to make 
> this explicit in case of firewall concerns)
> 2) how to override in case the above is a problem



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8368?focusedWorklogId=337564&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337564
 ]

ASF GitHub Bot logged work on BEAM-8368:


Author: ASF GitHub Bot
Created on: 01/Nov/19 21:28
Start Date: 01/Nov/19 21:28
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9970: [BEAM-8368] 
[BEAM-8392] Update pyarrow to the latest version
URL: https://github.com/apache/beam/pull/9970#issuecomment-548956963
 
 
   Looks like maybe there was an API change in `pyarrow.parquet`. I'll take a 
look
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337564)
Time Spent: 2h 20m  (was: 2h 10m)

> [Python] libprotobuf-generated exception when importing apache_beam
> ---
>
> Key: BEAM-8368
> URL: https://issues.apache.org/jira/browse/BEAM-8368
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.15.0, 2.17.0
>Reporter: Ubaier Bhat
>Assignee: Brian Hulette
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: error_log.txt
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Unable to import apache_beam after upgrading to macos 10.15 (Catalina). 
> Cleared all the pipenvs but can't get it working again.
> {code}
> import apache_beam as beam
> /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84:
>  UserWarning: Some syntactic constructs of Python 3 are not yet fully 
> supported by Apache Beam.
>   'Some syntactic constructs of Python 3 are not yet fully supported by '
> [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already 
> exists in database: 
> [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> libc++abi.dylib: terminating with uncaught exception of type 
> google::protobuf::FatalException: CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam

2019-11-01 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965102#comment-16965102
 ] 

Brian Hulette commented on BEAM-8368:
-

[~ubaierbhat] or [~kamilwu] any chance you could verify that pyarrow 0.15.1 
works with beam on MacOS 10.15?

> [Python] libprotobuf-generated exception when importing apache_beam
> ---
>
> Key: BEAM-8368
> URL: https://issues.apache.org/jira/browse/BEAM-8368
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.15.0, 2.17.0
>Reporter: Ubaier Bhat
>Assignee: Brian Hulette
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: error_log.txt
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Unable to import apache_beam after upgrading to macos 10.15 (Catalina). 
> Cleared all the pipenvs but can't get it working again.
> {code}
> import apache_beam as beam
> /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84:
>  UserWarning: Some syntactic constructs of Python 3 are not yet fully 
> supported by Apache Beam.
>   'Some syntactic constructs of Python 3 are not yet fully supported by '
> [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already 
> exists in database: 
> [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> libc++abi.dylib: terminating with uncaught exception of type 
> google::protobuf::FatalException: CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=337557&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337557
 ]

ASF GitHub Bot logged work on BEAM-8539:


Author: ASF GitHub Bot
Created on: 01/Nov/19 21:20
Start Date: 01/Nov/19 21:20
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9969: [BEAM-8539] 
Provide an initial definition of all job states and the state transition diagram
URL: https://github.com/apache/beam/pull/9969#discussion_r341755718
 
 

 ##
 File path: model/job-management/src/main/proto/beam_job_api.proto
 ##
 @@ -201,6 +201,16 @@ message JobMessagesResponse {
 }
 
 // Enumeration of all JobStates
+//
+// The state transition diagram is:
+//   STOPPED -> STARTING -> RUNNING -> DONE
+//  \> FAILED
+//  \> CANCELLING -> CANCELLED
+//  \> UPDATING -> UPDATED
+//  \> DRAINING -> DRAINED
 
 Review comment:
   > Not sure of any RUNNING -> STOPPED scenarios today.
   
   This is either a pause-like event or an  erroneous use of STOPPED:
   
   
https://github.com/apache/beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineResult.java#L133
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337557)
Time Spent: 1h 20m  (was: 1h 10m)

> Clearly define the valid job state transitions
> --
>
> Key: BEAM-8539
> URL: https://issues.apache.org/jira/browse/BEAM-8539
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, runner-core, sdk-java-core, sdk-py-core
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The Beam job state transitions are ill-defined, which is a big problem for 
> anything that relies on the values coming from JobAPI.GetStateStream.
> I was hoping to find something like a state transition diagram in the docs so 
> that I could determine the start state, the terminal states, and the valid 
> transitions, but I could not find this. The code reveals that the SDKs differ 
> on the fundamentals:
> Java InMemoryJobService:
>  * start state: *STOPPED*
>  * run - about to submit to executor:  STARTING
>  * run - actually running on executor:  RUNNING
>  * terminal states: DONE, FAILED, CANCELLED, DRAINED
> Python AbstractJobServiceServicer / LocalJobServicer:
>  * start state: STARTING
>  * terminal states: DONE, FAILED, CANCELLED, *STOPPED*
> I think it would be good to make python work like Java, so that there is a 
> difference in state between a job that has been prepared and one that has 
> additionally been run.
> It's hard to tell how far this problem has spread within the various runners. 
>  I think a simple thing that can be done to help standardize behavior is to 
> implement the terminal states as an enum in the beam_job_api.proto, or create 
> a utility function in each language for checking if a state is terminal, so 
> that it's not left up to each runner to reimplement this logic.
>  
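A minimal sketch of such a helper, assuming the state names quoted above (the enum below is a stand-in, not the generated beam_job_api code; the terminal set mirrors the Java InMemoryJobService list):

{code}
import enum


class JobState(enum.Enum):
  # Stand-in for the JobState enum in beam_job_api.proto.
  STOPPED = 1
  STARTING = 2
  RUNNING = 3
  DONE = 4
  FAILED = 5
  CANCELLING = 6
  CANCELLED = 7
  UPDATING = 8
  UPDATED = 9
  DRAINING = 10
  DRAINED = 11


# Terminal set as listed for the Java InMemoryJobService above.
TERMINAL_STATES = frozenset(
    {JobState.DONE, JobState.FAILED, JobState.CANCELLED, JobState.DRAINED})


def is_terminal(state):
  return state in TERMINAL_STATES
{code}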



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=337558&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337558
 ]

ASF GitHub Bot logged work on BEAM-8539:


Author: ASF GitHub Bot
Created on: 01/Nov/19 21:20
Start Date: 01/Nov/19 21:20
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9969: [BEAM-8539] 
Provide an initial definition of all job states and the state transition diagram
URL: https://github.com/apache/beam/pull/9969#discussion_r341755718
 
 

 ##
 File path: model/job-management/src/main/proto/beam_job_api.proto
 ##
 @@ -201,6 +201,16 @@ message JobMessagesResponse {
 }
 
 // Enumeration of all JobStates
+//
+// The state transition diagram is:
+//   STOPPED -> STARTING -> RUNNING -> DONE
+//  \> FAILED
+//  \> CANCELLING -> CANCELLED
+//  \> UPDATING -> UPDATED
+//  \> DRAINING -> DRAINED
 
 Review comment:
   > Not sure of any RUNNING -> STOPPED scenarios today.
   
   This is either a pause-like event or an  erroneous (terminal) use of STOPPED:
   
   
https://github.com/apache/beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineResult.java#L133
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337558)
Time Spent: 1.5h  (was: 1h 20m)

> Clearly define the valid job state transitions
> --
>
> Key: BEAM-8539
> URL: https://issues.apache.org/jira/browse/BEAM-8539
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, runner-core, sdk-java-core, sdk-py-core
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The Beam job state transitions are ill-defined, which is a big problem for 
> anything that relies on the values coming from JobAPI.GetStateStream.
> I was hoping to find something like a state transition diagram in the docs so 
> that I could determine the start state, the terminal states, and the valid 
> transitions, but I could not find this. The code reveals that the SDKs differ 
> on the fundamentals:
> Java InMemoryJobService:
>  * start state: *STOPPED*
>  * run - about to submit to executor:  STARTING
>  * run - actually running on executor:  RUNNING
>  * terminal states: DONE, FAILED, CANCELLED, DRAINED
> Python AbstractJobServiceServicer / LocalJobServicer:
>  * start state: STARTING
>  * terminal states: DONE, FAILED, CANCELLED, *STOPPED*
> I think it would be good to make python work like Java, so that there is a 
> difference in state between a job that has been prepared and one that has 
> additionally been run.
> It's hard to tell how far this problem has spread within the various runners. 
>  I think a simple thing that can be done to help standardize behavior is to 
> implement the terminal states as an enum in the beam_job_api.proto, or create 
> a utility function in each language for checking if a state is terminal, so 
> that it's not left up to each runner to reimplement this logic.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8252) (Python SDK) Add worker_region and worker_zone options

2019-11-01 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver resolved BEAM-8252.
---
Fix Version/s: 2.18.0
   Resolution: Fixed

> (Python SDK) Add worker_region and worker_zone options
> --
>
> Key: BEAM-8252
> URL: https://issues.apache.org/jira/browse/BEAM-8252
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow, sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8534) XlangParquetIOTest failing

2019-11-01 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8534:
--
Status: Open  (was: Triage Needed)

> XlangParquetIOTest failing
> --
>
> Key: BEAM-8534
> URL: https://issues.apache.org/jira/browse/BEAM-8534
> Project: Beam
>  Issue Type: Sub-task
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Heejong Lee
>Priority: Major
>
> 13:43:05 [grpc-default-executor-1] ERROR org.apache.beam.fn.harness.control.BeamFnControlClient - Exception while trying to handle InstructionRequest 10
> java.lang.IllegalArgumentException: unable to deserialize Custom DoFn With Execution Info
>   at org.apache.beam.sdk.util.SerializableUtils.deserializeFromByteArray(SerializableUtils.java:74)
>   at org.apache.beam.runners.core.construction.ParDoTranslation.doFnWithExecutionInformationFromProto(ParDoTranslation.java:609)
>   at org.apache.beam.runners.core.construction.ParDoTranslation.getDoFn(ParDoTranslation.java:285)
>   at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory$Context.<init>(DoFnPTransformRunnerFactory.java:197)
>   at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:96)
>   at org.apache.beam.fn.harness.DoFnPTransformRunnerFactory.createRunnerForPTransform(DoFnPTransformRunnerFactory.java:64)
>   at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:194)
>   at org.apache.beam.fn.harness.control.ProcessBundleHandler.createRunnerAndConsumersForPTransformRecursively(ProcessBundleHandler.java:163)
>   at org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:290)
>   at org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:160)
>   at org.apache.beam.fn.harness.control.BeamFnControlClient.lambda$processInstructionRequests$0(BeamFnControlClient.java:144)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.InvalidClassException: org.apache.beam.sdk.options.ValueProvider$StaticValueProvider; local class incompatible: stream classdesc serialVersionUID = -7089438576249123133, local class serialVersionUID = -7141898054594373712
>   at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
>   at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>   at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>   at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1975)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>   at java.io.ObjectInputStream.readSe
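The root cause in the (truncated) log above is a standard Java serialization failure: the serialized stream carries a serialVersionUID for StaticValueProvider that differs from the class loaded in the harness, which typically means the submitting SDK and the worker container were built from different sources. A minimal, hypothetical illustration of the mechanism (not Beam code) is:

{code:java}
import java.io.Serializable;

// Hypothetical class, only to illustrate serialVersionUID matching.
// Declaring the UID explicitly keeps recompiled copies of the class
// compatible with previously serialized instances, as long as the
// fields themselves remain compatible.
public class ValueHolder implements Serializable {
  private static final long serialVersionUID = 1L;

  private final String value;

  public ValueHolder(String value) {
    this.value = value;
  }

  public String getValue() {
    return value;
  }
}
{code}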

[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=337544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337544
 ]

ASF GitHub Bot logged work on BEAM-8539:


Author: ASF GitHub Bot
Created on: 01/Nov/19 20:28
Start Date: 01/Nov/19 20:28
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #9969: [BEAM-8539] 
Provide an initial definition of all job states and the state transition diagram
URL: https://github.com/apache/beam/pull/9969#discussion_r341739058
 
 

 ##
 File path: model/job-management/src/main/proto/beam_job_api.proto
 ##
 @@ -201,6 +201,16 @@ message JobMessagesResponse {
 }
 
 // Enumeration of all JobStates
+//
+// The state transition diagram is:
+//   STOPPED -> STARTING -> RUNNING -> DONE
+//  \> FAILED
+//  \> CANCELLING -> CANCELLED
+//  \> UPDATING -> UPDATED
+//  \> DRAINING -> DRAINED
 
 Review comment:
   Yeah, this is the problem with UPDATED since it is dependent on the 
implementation within the Runner. Is an UPDATED job the same job as the prior 
one or a new job which continues from the prior state? In Dataflow it is the 
latter but the former can make sense for other runners.
   
   Not sure of any RUNNING -> STOPPED scenarios today.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337544)
Time Spent: 1h 10m  (was: 1h)

> Clearly define the valid job state transitions
> --
>
> Key: BEAM-8539
> URL: https://issues.apache.org/jira/browse/BEAM-8539
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, runner-core, sdk-java-core, sdk-py-core
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The Beam job state transitions are ill-defined, which is a big problem for 
> anything that relies on the values coming from JobAPI.GetStateStream.
> I was hoping to find something like a state transition diagram in the docs so 
> that I could determine the start state, the terminal states, and the valid 
> transitions, but I could not find this. The code reveals that the SDKs differ 
> on the fundamentals:
> Java InMemoryJobService:
>  * start state: *STOPPED*
>  * run - about to submit to executor:  STARTING
>  * run - actually running on executor:  RUNNING
>  * terminal states: DONE, FAILED, CANCELLED, DRAINED
> Python AbstractJobServiceServicer / LocalJobServicer:
>  * start state: STARTING
>  * terminal states: DONE, FAILED, CANCELLED, *STOPPED*
> I think it would be good to make python work like Java, so that there is a 
> difference in state between a job that has been prepared and one that has 
> additionally been run.
> It's hard to tell how far this problem has spread within the various runners. 
>  I think a simple thing that can be done to help standardize behavior is to 
> implement the terminal states as an enum in the beam_job_api.proto, or create 
> a utility function in each language for checking if a state is terminal, so 
> that it's not left up to each runner to reimplement this logic.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8539) Clearly define the valid job state transitions

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8539?focusedWorklogId=337543&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337543
 ]

ASF GitHub Bot logged work on BEAM-8539:


Author: ASF GitHub Bot
Created on: 01/Nov/19 20:28
Start Date: 01/Nov/19 20:28
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #9969: [BEAM-8539] 
Provide an initial definition of all job states and the state transition diagram
URL: https://github.com/apache/beam/pull/9969#discussion_r341739058
 
 

 ##
 File path: model/job-management/src/main/proto/beam_job_api.proto
 ##
 @@ -201,6 +201,16 @@ message JobMessagesResponse {
 }
 
 // Enumeration of all JobStates
+//
+// The state transition diagram is:
+//   STOPPED -> STARTING -> RUNNING -> DONE
+//  \> FAILED
+//  \> CANCELLING -> CANCELLED
+//  \> UPDATING -> UPDATED
+//  \> DRAINING -> DRAINED
 
 Review comment:
   Yeah, this is the problem with UPDATED since it is dependent on the 
implementation within the Runner. Is an UPDATED job the same job as the prior 
one or a new job which continues from the prior state? In Dataflow it is the 
latter but the former can make sense for other runners.
   
   Not sure of any RUNNING -> STOPPED scenarios today but adding 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337543)
Time Spent: 1h  (was: 50m)

> Clearly define the valid job state transitions
> --
>
> Key: BEAM-8539
> URL: https://issues.apache.org/jira/browse/BEAM-8539
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, runner-core, sdk-java-core, sdk-py-core
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The Beam job state transitions are ill-defined, which is a big problem for 
> anything that relies on the values coming from JobAPI.GetStateStream.
> I was hoping to find something like a state transition diagram in the docs so 
> that I could determine the start state, the terminal states, and the valid 
> transitions, but I could not find this. The code reveals that the SDKs differ 
> on the fundamentals:
> Java InMemoryJobService:
>  * start state: *STOPPED*
>  * run - about to submit to executor:  STARTING
>  * run - actually running on executor:  RUNNING
>  * terminal states: DONE, FAILED, CANCELLED, DRAINED
> Python AbstractJobServiceServicer / LocalJobServicer:
>  * start state: STARTING
>  * terminal states: DONE, FAILED, CANCELLED, *STOPPED*
> I think it would be good to make python work like Java, so that there is a 
> difference in state between a job that has been prepared and one that has 
> additionally been run.
> It's hard to tell how far this problem has spread within the various runners. 
>  I think a simple thing that can be done to help standardize behavior is to 
> implement the terminal states as an enum in the beam_job_api.proto, or create 
> a utility function in each language for checking if a state is terminal, so 
> that it's not left up to each runner to reimplement this logic.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8521) beam_PostCommit_XVR_Flink failing

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8521?focusedWorklogId=337542&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337542
 ]

ASF GitHub Bot logged work on BEAM-8521:


Author: ASF GitHub Bot
Created on: 01/Nov/19 20:24
Start Date: 01/Nov/19 20:24
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9939: [BEAM-8521] only 
build Python 2.7 for xlang test
URL: https://github.com/apache/beam/pull/9939
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337542)
Time Spent: 50m  (was: 40m)

> beam_PostCommit_XVR_Flink failing
> -
>
> Key: BEAM-8521
> URL: https://issues.apache.org/jira/browse/BEAM-8521
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> https://builds.apache.org/job/beam_PostCommit_XVR_Flink/
> Edit: Made subtasks for what appear to be two separate issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8252) (Python SDK) Add worker_region and worker_zone options

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8252?focusedWorklogId=337538&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337538
 ]

ASF GitHub Bot logged work on BEAM-8252:


Author: ASF GitHub Bot
Created on: 01/Nov/19 20:21
Start Date: 01/Nov/19 20:21
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9594: [BEAM-8252] Python: add 
worker_region and worker_zone options
URL: https://github.com/apache/beam/pull/9594#issuecomment-548937247
 
 
   > Should this be patched into the release branch?
   
   Maybe, but it'd be preferable to introduce the Java and Python options in 
the same release: #9961.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337538)
Time Spent: 1h  (was: 50m)

> (Python SDK) Add worker_region and worker_zone options
> --
>
> Key: BEAM-8252
> URL: https://issues.apache.org/jira/browse/BEAM-8252
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow, sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8544) Install Beam SDK with ccache for faster re-install.

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8544?focusedWorklogId=337536&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337536
 ]

ASF GitHub Bot logged work on BEAM-8544:


Author: ASF GitHub Bot
Created on: 01/Nov/19 20:17
Start Date: 01/Nov/19 20:17
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9966: [BEAM-8544] Use 
ccache for compiling the Beam Python SDK.
URL: https://github.com/apache/beam/pull/9966#discussion_r341735449
 
 

 ##
 File path: sdks/python/container/Dockerfile
 ##
 @@ -43,9 +45,13 @@ RUN \
 # Remove pip cache.
 rm -rf /root/.cache/pip
 
+# Configure ccache prior to installing the SDK.
+RUN ln -s /usr/bin/ccache /usr/local/bin/gcc
+# These parameters are needed as pip compiles artifacts in random temporary 
directories.
+RUN ccache --set-config=sloppiness=file_macro && ccache 
--set-config=hash_dir=false
 
 COPY target/apache-beam.tar.gz /opt/apache/beam/tars/
-RUN pip install /opt/apache/beam/tars/apache-beam.tar.gz[gcp] && \
+RUN pip install -v /opt/apache/beam/tars/apache-beam.tar.gz[gcp] && \
 # Remove pip cache.
 rm -rf /root/.cache/pip
 
 
 Review comment:
   Could we add a `pip freeze --all`? From the logs that will give us a 
complete list of what exact versions are installed.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337536)
Time Spent: 40m  (was: 0.5h)

> Install Beam SDK with ccache for faster re-install.
> ---
>
> Key: BEAM-8544
> URL: https://issues.apache.org/jira/browse/BEAM-8544
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Re-compiling the C modules of the SDK takes 2-3 minutes. This adds to worker 
> startup time whenever a custom SDK is being used (in particular, during 
> development and testing). We can use ccache to re-use the old compile results 
> when the Cython files have not changed. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8544) Install Beam SDK with ccache for faster re-install.

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8544?focusedWorklogId=337534&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337534
 ]

ASF GitHub Bot logged work on BEAM-8544:


Author: ASF GitHub Bot
Created on: 01/Nov/19 20:15
Start Date: 01/Nov/19 20:15
Worklog Time Spent: 10m 
  Work Description: ananvay commented on issue #9966: [BEAM-8544] Use 
ccache for compiling the Beam Python SDK.
URL: https://github.com/apache/beam/pull/9966#issuecomment-548935180
 
 
   Thanks Robert, this is great! LGTM.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337534)
Time Spent: 0.5h  (was: 20m)

> Install Beam SDK with ccache for faster re-install.
> ---
>
> Key: BEAM-8544
> URL: https://issues.apache.org/jira/browse/BEAM-8544
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Re-compiling the C modules of the SDK takes 2-3 minutes. This adds to worker 
> startup time whenever a custom SDK is being used (in particular, during 
> development and testing). We can use ccache to re-use the old compile results 
> when the Cython files have not changed. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8347) UnboundedRabbitMqReader can fail to advance watermark if no new data comes in

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8347?focusedWorklogId=337532&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337532
 ]

ASF GitHub Bot logged work on BEAM-8347:


Author: ASF GitHub Bot
Created on: 01/Nov/19 20:10
Start Date: 01/Nov/19 20:10
Worklog Time Spent: 10m 
  Work Description: drobert commented on issue #9820: [BEAM-8347]: 
Consistently advance UnboundedRabbitMqReader watermark
URL: https://github.com/apache/beam/pull/9820#issuecomment-548933689
 
 
   Quick note: I may be seeing a problem with this implementation locally. This 
isn't mergeable due to a need to rebase anyway, but there may be an issue 
updating the watermark when no messages come in (DirectRunner reports 
"Processing stuck in step (read from rabbit) for at least 05m00s without 
outputting or completing in state process"). Looking.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337532)
Time Spent: 2h 40m  (was: 2.5h)

> UnboundedRabbitMqReader can fail to advance watermark if no new data comes in
> -
>
> Key: BEAM-8347
> URL: https://issues.apache.org/jira/browse/BEAM-8347
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-rabbitmq
>Affects Versions: 2.15.0
> Environment: testing has been done using the DirectRunner. I also 
> have DataflowRunner available
>Reporter: Daniel Robert
>Assignee: Jean-Baptiste Onofré
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> I stumbled upon this and then saw a similar StackOverflow post: 
> [https://stackoverflow.com/questions/55736593/apache-beam-rabbitmqio-watermark-doesnt-advance]
> When calling `advance()` if there are no messages, no state changes, 
> including no changes to the CheckpointMark or Watermark.  If there is a 
> relatively constant rate of new messages coming in, this is not a problem. If 
> data is bursty, and there are periods of no new messages coming in, the 
> watermark will never advance.
> Contrast this with some of the logic in PubsubIO which will make provisions 
> for periods of inactivity to advance the watermark (although it, too, is 
> imperfect: https://issues.apache.org/jira/browse/BEAM-7322 )
> The example given in the StackOverflow post is something like this:
>  
> {code:java}
> pipeline
>   .apply(RabbitMqIO.read()
>       .withUri("amqp://guest:guest@localhost:5672")
>       .withQueue("test"))
>   .apply("Windowing",
>       Window.into(FixedWindows.of(Duration.standardSeconds(10)))
>           .triggering(AfterWatermark.pastEndOfWindow())
>           .withAllowedLateness(Duration.ZERO)
>           .accumulatingFiredPanes());
> {code}
> If I push 2 messages into my rabbit queue, I see 2 unack'd messages and a 
> window that never performs an on time trigger.
>  
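For context, the usual remedy (and what the PubsubIO logic mentioned above approximates) is to let advance() move the watermark from wall-clock time even when no message arrives, held back by an assumed bound on message delay. A rough sketch of that idea, with hypothetical Queue and Message types standing in for the real client classes; this is not the actual RabbitMqIO code nor the fix in the linked PR:

{code:java}
import org.joda.time.Duration;
import org.joda.time.Instant;

// Sketch only: illustrates advancing the watermark during idle periods.
class IdleAwareReader {
  interface Message { Instant getTimestamp(); }
  interface Queue { Message poll(); }  // assumed non-blocking poll

  private static final Duration MAX_DELAY = Duration.standardSeconds(10);

  private final Queue queue;
  private Instant watermark = new Instant(0);
  private Message current;

  IdleAwareReader(Queue queue) {
    this.queue = queue;
  }

  /** Mirrors UnboundedReader#advance(): returns true if a new record is available. */
  boolean advance() {
    Message message = queue.poll();
    if (message == null) {
      // No new data: still move the watermark forward from the wall clock,
      // held back by MAX_DELAY, so downstream windows can fire while idle.
      Instant candidate = Instant.now().minus(MAX_DELAY);
      if (candidate.isAfter(watermark)) {
        watermark = candidate;
      }
      return false;
    }
    current = message;
    if (message.getTimestamp().isAfter(watermark)) {
      watermark = message.getTimestamp();
    }
    return true;
  }

  Instant getWatermark() {
    return watermark;
  }
}
{code}

The choice of MAX_DELAY is a trade-off: a small value lets windows close quickly during quiet periods, but risks declaring late any message whose broker timestamp lags the wall clock by more than that bound.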



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-6857) Support dynamic timers

2019-11-01 Thread Chad Dombrova (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chad Dombrova updated BEAM-6857:

Description: 
The Beam timers API currently requires each timer to be statically specified in 
the DoFn. The user must provide a separate callback method per timer. For 
example:

 
{code:java}
DoFn()
{
  @TimerId("timer1")
  private final TimerSpec timer1 = TimerSpecs.timer(...);

  @TimerId("timer2")
  private final TimerSpec timer2 = TimerSpecs.timer(...);

  // ... set timers in processElement ...

  @OnTimer("timer1")
  public void onTimer1() { ... }

  @OnTimer("timer2")
  public void onTimer2() {}
}
{code}
 

However there are many cases where the user does not know the set of timers 
statically when writing their code. This happens when the timer tag should be 
based on the data. It also happens when writing a DSL on top of Beam, where the 
DSL author has to create DoFns but does not know statically which timers their 
users will want to set (e.g. Scio).

 

The goal is to support dynamic timers. Something as follows;

 
{code:java}
DoFn()
{
  @TimerId("timer")
  private final TimerSpec timer1 = TimerSpecs.dynamicTimer(...);

  @ProcessElement
  public void process(@TimerId("timer") DynamicTimer timer)
  {
    timer.set("tag1", ts);
    timer.set("tag2", ts);
  }

  @OnTimer("timer")
  public void onTimer1(@TimerTag String tag) { ... }
}
{code}
 

  was:
The Beam timers API currently requires each timer to be statically specified in 
the DoFn. The user must provide a separate callback method per timer. For 
example:

 
{code:java}
DoFn()
{   @TimerId("timer1") private final TimerSpec timer1 = TimerSpecs.timer(...);  
 @TimerId("timer2") private final TimerSpec timer2 = TimerSpecs.timer(...);     
            .. set timers in processElement    @OnTimer("timer1") public 
void onTimer1() \{ .}
   @OnTimer("timer2") public void onTimer2() {}
}
{code}
 

However there are many cases where the user does not know the set of timers 
statically when writing their code. This happens when the timer tag should be 
based on the data. It also happens when writing a DSL on top of Beam, where the 
DSL author has to create DoFns but does not know statically which timers their 
users will want to set (e.g. Scio).

 

The goal is to support dynamic timers. Something as follows;

 
{code:java}
DoFn() {
  @TimerId("timer") private final TimerSpec timer1 = 
TimerSpecs.dynamicTimer(...);
   @ProcessElement process(@TimerId("timer") DynamicTimer timer)
{       timer.set("tag1'", ts);       timer.set("tag2", ts);     }
   @OnTimer("timer") public void onTimer1(@TimerTag String tag) { .}
}
{code}
 


> Support dynamic timers
> --
>
> Key: BEAM-6857
> URL: https://issues.apache.org/jira/browse/BEAM-6857
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> The Beam timers API currently requires each timer to be statically specified 
> in the DoFn. The user must provide a separate callback method per timer. For 
> example:
>  
> {code:java}
> DoFn()
> {   
>   @TimerId("timer1") 
>   private final TimerSpec timer1 = TimerSpecs.timer(...);   
>   @TimerId("timer2") 
>   private final TimerSpec timer2 = TimerSpecs.timer(...);                 
>   .. set timers in processElement    
>   @OnTimer("timer1") 
>   public void onTimer1() { .}
>   @OnTimer("timer2") 
>   public void onTimer2() {}
> }
> {code}
>  
> However there are many cases where the user does not know the set of timers 
> statically when writing their code. This happens when the timer tag should be 
> based on the data. It also happens when writing a DSL on top of Beam, where 
> the DSL author has to create DoFns but does not know statically which timers 
> their users will want to set (e.g. Scio).
>  
> The goal is to support dynamic timers. Something as follows;
>  
> {code:java}
> DoFn() 
> {
>   @TimerId("timer") 
>   private final TimerSpec timer1 = TimerSpecs.dynamicTimer(...);
>   @ProcessElement process(@TimerId("timer") DynamicTimer timer)
>   {
>        timer.set("tag1'", ts);       
>timer.set("tag2", ts);     
>   }
>   @OnTimer("timer") 
>   public void onTimer1(@TimerTag String tag) { .}
> }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-6857) Support dynamic timers

2019-11-01 Thread Chad Dombrova (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chad Dombrova updated BEAM-6857:

Description: 
The Beam timers API currently requires each timer to be statically specified in 
the DoFn. The user must provide a separate callback method per timer. For 
example:

 
{code:java}
DoFn()
{   @TimerId("timer1") private final TimerSpec timer1 = TimerSpecs.timer(...);  
 @TimerId("timer2") private final TimerSpec timer2 = TimerSpecs.timer(...);     
            .. set timers in processElement    @OnTimer("timer1") public 
void onTimer1() \{ .}
   @OnTimer("timer2") public void onTimer2() {}
}
{code}
 

However there are many cases where the user does not know the set of timers 
statically when writing their code. This happens when the timer tag should be 
based on the data. It also happens when writing a DSL on top of Beam, where the 
DSL author has to create DoFns but does not know statically which timers their 
users will want to set (e.g. Scio).

 

The goal is to support dynamic timers. Something as follows;

 
{code:java}
DoFn() {
  @TimerId("timer") private final TimerSpec timer1 = 
TimerSpecs.dynamicTimer(...);
   @ProcessElement process(@TimerId("timer") DynamicTimer timer)
{       timer.set("tag1'", ts);       timer.set("tag2", ts);     }
   @OnTimer("timer") public void onTimer1(@TimerTag String tag) { .}
}
{code}
 

  was:
The Beam timers API currently requires each timer to be statically specified in 
the DoFn. The user must provide a separate callback method per timer. For 
example:

DoFn() {

  @TimerId("timer1") private final TimerSpec timer1 = TimerSpecs.timer(...);

  @TimerId("timer2") private final TimerSpec timer2 = TimerSpecs.timer(...);

                .. set timers in processElement

   @OnTimer("timer1") public void onTimer1() \{ .}

   @OnTimer("timer2") public void onTimer2() \{}

}

However there are many cases where the user does not know the set of timers 
statically when writing their code. This happens when the timer tag should be 
based on the data. It also happens when writing a DSL on top of Beam, where the 
DSL author has to create DoFns but does not know statically which timers their 
users will want to set (e.g. Scio).

 

The goal is to support dynamic timers. Something as follows;

DoFn() {

  @TimerId("timer") private final TimerSpec timer1 = 
TimerSpecs.dynamicTimer(...);

   @ProcessElement process(@TimerId("timer") DynamicTimer timer) {

      timer.set("tag1'", ts);

      timer.set("tag2", ts);

    }

   @OnTimer("timer") public void onTimer1(@TimerTag String tag) \{ .}

}


> Support dynamic timers
> --
>
> Key: BEAM-6857
> URL: https://issues.apache.org/jira/browse/BEAM-6857
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>
> The Beam timers API currently requires each timer to be statically specified 
> in the DoFn. The user must provide a separate callback method per timer. For 
> example:
>  
> {code:java}
> DoFn()
> {   @TimerId("timer1") private final TimerSpec timer1 = 
> TimerSpecs.timer(...);   @TimerId("timer2") private final TimerSpec timer2 = 
> TimerSpecs.timer(...);                 .. set timers in processElement    
> @OnTimer("timer1") public void onTimer1() \{ .}
>    @OnTimer("timer2") public void onTimer2() {}
> }
> {code}
>  
> However there are many cases where the user does not know the set of timers 
> statically when writing their code. This happens when the timer tag should be 
> based on the data. It also happens when writing a DSL on top of Beam, where 
> the DSL author has to create DoFns but does not know statically which timers 
> their users will want to set (e.g. Scio).
>  
> The goal is to support dynamic timers. Something as follows;
>  
> {code:java}
> DoFn() {
>   @TimerId("timer") private final TimerSpec timer1 = 
> TimerSpecs.dynamicTimer(...);
>    @ProcessElement process(@TimerId("timer") DynamicTimer timer)
> {       timer.set("tag1'", ts);       timer.set("tag2", ts);     }
>    @OnTimer("timer") public void onTimer1(@TimerTag String tag) { .}
> }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8523) Add useful timestamp to job servicer GetJobs

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8523?focusedWorklogId=337519&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337519
 ]

ASF GitHub Bot logged work on BEAM-8523:


Author: ASF GitHub Bot
Created on: 01/Nov/19 19:39
Start Date: 01/Nov/19 19:39
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9959: WIP: 
[BEAM-8523] JobAPI: Give access to timestamped state change history
URL: https://github.com/apache/beam/pull/9959#discussion_r341722020
 
 

 ##
 File path: model/job-management/src/main/proto/beam_job_api.proto
 ##
 @@ -153,6 +154,7 @@ message GetJobStateRequest {
 
 message GetJobStateResponse {
 
 Review comment:
   ok, saw your comment from Jira:
   
   > Typically when you would connect you would want to give some lower bound 
to filter messages. For example GetJobStateRequest could become:
   > 
   > ```proto
   > message GetJobStateRequest {
   >   string job_id = 1; // (required)
   >   // (Optional) If specified, only state transitions after
   >   // the provided timestamp will be provided.
   >   google.protobuf.Timestamp min_message_timestamp;
   > }
   > ```
   
   I agree with this conceptually, but was hoping to defer the addition of this 
until I get to the message stream.  The state stream is typically going to be 
just a few events, so practically speaking, the addition of 
`min_message_timestamp` here will not be very impactful.  The size of the 
backfill for the message stream on the other hand, with the changes that I'd 
like to add to it, could be significant, and there could be a number of ways 
that a user may want to filter (`min_message_timestamp`, `max_num_messages`, 
`importance`, etc).  I'd like the APIs to be similar between the two, and I 
think the work donw on GetMessageStream will inform our choices for 
GetStateStream. 
   
   tl;dr I'd like to defer the addition of this until later, if possible. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337519)
Time Spent: 1h 20m  (was: 1h 10m)

> Add useful timestamp to job servicer GetJobs
> 
>
> Key: BEAM-8523
> URL: https://issues.apache.org/jira/browse/BEAM-8523
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> As a user querying jobs with JobService.GetJobs, it would be useful if the 
> JobInfo result contained timestamps indicating various state changes that may 
> have been missed by a client.   Useful timestamps include:
>  
>  * submitted (prepared to the job service)
>  * started (executor enters the RUNNING state)
>  * completed (executor enters a terminal state)
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8523) Add useful timestamp to job servicer GetJobs

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8523?focusedWorklogId=337517&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337517
 ]

ASF GitHub Bot logged work on BEAM-8523:


Author: ASF GitHub Bot
Created on: 01/Nov/19 19:38
Start Date: 01/Nov/19 19:38
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9959: WIP: [BEAM-8523] 
JobAPI: Give access to timestamped state change history
URL: https://github.com/apache/beam/pull/9959#issuecomment-548923940
 
 
   Note that this PR depends on https://github.com/apache/beam/pull/9965 which 
depends on https://github.com/apache/beam/pull/9969
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337517)
Time Spent: 1h 10m  (was: 1h)

> Add useful timestamp to job servicer GetJobs
> 
>
> Key: BEAM-8523
> URL: https://issues.apache.org/jira/browse/BEAM-8523
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> As a user querying jobs with JobService.GetJobs, it would be useful if the 
> JobInfo result contained timestamps indicating various state changes that may 
> have been missed by a client.   Useful timestamps include:
>  
>  * submitted (prepared to the job service)
>  * started (executor enters the RUNNING state)
>  * completed (executor enters a terminal state)
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3713) Consider moving away from nose to nose2 or pytest.

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3713?focusedWorklogId=337518&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337518
 ]

ASF GitHub Bot logged work on BEAM-3713:


Author: ASF GitHub Bot
Created on: 01/Nov/19 19:38
Start Date: 01/Nov/19 19:38
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9756: [BEAM-3713] Add pytest 
for unit tests
URL: https://github.com/apache/beam/pull/9756#issuecomment-548924069
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337518)
Time Spent: 11h 20m  (was: 11h 10m)

> Consider moving away from nose to nose2 or pytest.
> --
>
> Key: BEAM-3713
> URL: https://issues.apache.org/jira/browse/BEAM-3713
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: Robert Bradshaw
>Assignee: Udi Meiri
>Priority: Minor
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> Per [https://nose.readthedocs.io/en/latest/], nose is in maintenance mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3713) Consider moving away from nose to nose2 or pytest.

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3713?focusedWorklogId=337515&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337515
 ]

ASF GitHub Bot logged work on BEAM-3713:


Author: ASF GitHub Bot
Created on: 01/Nov/19 19:36
Start Date: 01/Nov/19 19:36
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9756: [BEAM-3713] Add pytest 
for unit tests
URL: https://github.com/apache/beam/pull/9756#issuecomment-548923380
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337515)
Time Spent: 11h  (was: 10h 50m)

> Consider moving away from nose to nose2 or pytest.
> --
>
> Key: BEAM-3713
> URL: https://issues.apache.org/jira/browse/BEAM-3713
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: Robert Bradshaw
>Assignee: Udi Meiri
>Priority: Minor
>  Time Spent: 11h
>  Remaining Estimate: 0h
>
> Per [https://nose.readthedocs.io/en/latest/], nose is in maintenance mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3713) Consider moving away from nose to nose2 or pytest.

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3713?focusedWorklogId=337516&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337516
 ]

ASF GitHub Bot logged work on BEAM-3713:


Author: ASF GitHub Bot
Created on: 01/Nov/19 19:36
Start Date: 01/Nov/19 19:36
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9756: [BEAM-3713] Add pytest 
for unit tests
URL: https://github.com/apache/beam/pull/9756#issuecomment-548923507
 
 
   Run CommunityMetrics PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337516)
Time Spent: 11h 10m  (was: 11h)

> Consider moving away from nose to nose2 or pytest.
> --
>
> Key: BEAM-3713
> URL: https://issues.apache.org/jira/browse/BEAM-3713
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: Robert Bradshaw
>Assignee: Udi Meiri
>Priority: Minor
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> Per [https://nose.readthedocs.io/en/latest/], nose is in maintenance mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8523) Add useful timestamp to job servicer GetJobs

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8523?focusedWorklogId=337514&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337514
 ]

ASF GitHub Bot logged work on BEAM-8523:


Author: ASF GitHub Bot
Created on: 01/Nov/19 19:35
Start Date: 01/Nov/19 19:35
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9959: WIP: 
[BEAM-8523] JobAPI: Give access to timestamped state change history
URL: https://github.com/apache/beam/pull/9959#discussion_r341722020
 
 

 ##
 File path: model/job-management/src/main/proto/beam_job_api.proto
 ##
 @@ -153,6 +154,7 @@ message GetJobStateRequest {
 
 message GetJobStateResponse {
 
 Review comment:
   ok, saw your comment from Jira:
   
   > Typically when you would connect you would want to give some lower bound 
to filter messages. For example GetJobStateRequest could become:
   > 
   > ```proto
   > message GetJobStateRequest {
   >   string job_id = 1; // (required)
   >   // (Optional) If specified, only state transitions after
   >   // the provided timestamp will be provided.
   >   google.protobuf.Timestamp min_message_timestamp;
   > }
   > ```
   
   I agree with this conceptually, but was hoping to defer the addition of this 
until I get to the message stream.  The state stream is typically going to be 
just a few events, so practically speaking, the addition of 
`min_message_timestamp` here will not be very impactful.  The size of the 
backfill for the message stream on the other hand, with the changes that I'd 
like to add to it, could be significant, and there could be a number of ways 
that a user may want to filter (`min_message_timestamp`, `max_num_messages`, 
`importance`, etc).  I'd like the APIs to be similar between the two, and I 
think the GetMessageStream will inform what we want to do for GetStateStream. 
   
   tl;dr I'd like to defer the addition of this until later, if possible. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337514)
Time Spent: 1h  (was: 50m)

> Add useful timestamp to job servicer GetJobs
> 
>
> Key: BEAM-8523
> URL: https://issues.apache.org/jira/browse/BEAM-8523
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> As a user querying jobs with JobService.GetJobs, it would be useful if the 
> JobInfo result contained timestamps indicating various state changes that may 
> have been missed by a client.   Useful timestamps include:
>  
>  * submitted (prepared to the job service)
>  * started (executor enters the RUNNING state)
>  * completed (executor enters a terminal state)
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8368?focusedWorklogId=337513&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337513
 ]

ASF GitHub Bot logged work on BEAM-8368:


Author: ASF GitHub Bot
Created on: 01/Nov/19 19:29
Start Date: 01/Nov/19 19:29
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9970: [BEAM-8368] 
[BEAM-8392] Update pyarrow to the latest version
URL: https://github.com/apache/beam/pull/9970#issuecomment-548921330
 
 
   R: @TheNeuralBit 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337513)
Time Spent: 2h 10m  (was: 2h)

> [Python] libprotobuf-generated exception when importing apache_beam
> ---
>
> Key: BEAM-8368
> URL: https://issues.apache.org/jira/browse/BEAM-8368
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.15.0, 2.17.0
>Reporter: Ubaier Bhat
>Assignee: Brian Hulette
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: error_log.txt
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Unable to import apache_beam after upgrading to macos 10.15 (Catalina). 
> Cleared all the pipenvs and but can't get it working again.
> {code}
> import apache_beam as beam
> /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84:
>  UserWarning: Some syntactic constructs of Python 3 are not yet fully 
> supported by Apache Beam.
>   'Some syntactic constructs of Python 3 are not yet fully supported by '
> [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already 
> exists in database: 
> [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> libc++abi.dylib: terminating with uncaught exception of type 
> google::protobuf::FatalException: CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8368?focusedWorklogId=337511&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337511
 ]

ASF GitHub Bot logged work on BEAM-8368:


Author: ASF GitHub Bot
Created on: 01/Nov/19 19:28
Start Date: 01/Nov/19 19:28
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9970: [BEAM-8368] 
[BEAM-8392] Update pyarrow to the latest version
URL: https://github.com/apache/beam/pull/9970
 
 
   Update pyarrow to the latest version.
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommi

[jira] [Work logged] (BEAM-8523) Add useful timestamp to job servicer GetJobs

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8523?focusedWorklogId=337510&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337510
 ]

ASF GitHub Bot logged work on BEAM-8523:


Author: ASF GitHub Bot
Created on: 01/Nov/19 19:26
Start Date: 01/Nov/19 19:26
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9959: WIP: 
[BEAM-8523] JobAPI: Give access to timestamped state change history
URL: https://github.com/apache/beam/pull/9959#discussion_r341718971
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/JobStateEvent.java
 ##
 @@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.jobsubmission;
+
+import com.google.auto.value.AutoValue;
+import java.time.Instant;
+import org.apache.beam.model.jobmanagement.v1.JobApi;
+import org.apache.beam.vendor.grpc.v1p21p0.com.google.protobuf.Timestamp;
+
+/** A state transition event. */
+@AutoValue
+public abstract class JobStateEvent {
 
 Review comment:
   Do we want a `GetJobStateResponse` to have a `JobStateEvent` ?  i.e.
   
   ```proto
   message JobStateEvent {
 JobState.Enum state = 1; // (required)
 com.google.Timestamp timestamp = 2; // (required)
   }
   
   message GetJobStateResponse {
 JobStateEvent state_event = 1; // (required)
   }
   ```
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337510)
Time Spent: 50m  (was: 40m)

> Add useful timestamp to job servicer GetJobs
> 
>
> Key: BEAM-8523
> URL: https://issues.apache.org/jira/browse/BEAM-8523
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> As a user querying jobs with JobService.GetJobs, it would be useful if the 
> JobInfo result contained timestamps indicating various state changes that may 
> have been missed by a client.   Useful timestamps include:
>  
>  * submitted (prepared to the job service)
>  * started (executor enters the RUNNING state)
>  * completed (executor enters a terminal state)
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam

2019-11-01 Thread Ahmet Altay (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-8368:
--
Fix Version/s: (was: 2.18.0)
   2.17.0

> [Python] libprotobuf-generated exception when importing apache_beam
> ---
>
> Key: BEAM-8368
> URL: https://issues.apache.org/jira/browse/BEAM-8368
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.15.0, 2.17.0
>Reporter: Ubaier Bhat
>Assignee: Brian Hulette
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: error_log.txt
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Unable to import apache_beam after upgrading to macos 10.15 (Catalina). 
> Cleared all the pipenvs and but can't get it working again.
> {code}
> import apache_beam as beam
> /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84:
>  UserWarning: Some syntactic constructs of Python 3 are not yet fully 
> supported by Apache Beam.
>   'Some syntactic constructs of Python 3 are not yet fully supported by '
> [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already 
> exists in database: 
> [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> libc++abi.dylib: terminating with uncaught exception of type 
> google::protobuf::FatalException: CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam

2019-11-01 Thread Ahmet Altay (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16965043#comment-16965043
 ] 

Ahmet Altay commented on BEAM-8368:
---

pyarrow 0.15.1 was released a few hours ago ([https://pypi.org/project/pyarrow/0.15.1/]).

 

Could we get this into the 2.17.0 branch?
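
If it does land, a quick local check would look like the sketch below. This assumes, as the comment above suggests, that upgrading pyarrow to 0.15.1 is what resolves the duplicate-descriptor crash; it is a verification sketch, not a confirmed fix procedure.

{code}
# Assumption: pyarrow 0.15.1 resolves the libprotobuf "File already exists in
# database" crash on macOS Catalina. Run after: pip install pyarrow==0.15.1
import pyarrow
import apache_beam as beam

print(pyarrow.__version__)  # expect 0.15.1
print(beam.__version__)     # the import above should no longer abort with FatalException
{code}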

> [Python] libprotobuf-generated exception when importing apache_beam
> ---
>
> Key: BEAM-8368
> URL: https://issues.apache.org/jira/browse/BEAM-8368
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.15.0, 2.17.0
>Reporter: Ubaier Bhat
>Assignee: Brian Hulette
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: error_log.txt
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Unable to import apache_beam after upgrading to macOS 10.15 (Catalina). 
> Cleared all the pipenvs but can't get it working again.
> {code}
> import apache_beam as beam
> /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84:
>  UserWarning: Some syntactic constructs of Python 3 are not yet fully 
> supported by Apache Beam.
>   'Some syntactic constructs of Python 3 are not yet fully supported by '
> [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already 
> exists in database: 
> [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> libc++abi.dylib: terminating with uncaught exception of type 
> google::protobuf::FatalException: CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8523) Add useful timestamp to job servicer GetJobs

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8523?focusedWorklogId=337509&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337509
 ]

ASF GitHub Bot logged work on BEAM-8523:


Author: ASF GitHub Bot
Created on: 01/Nov/19 19:25
Start Date: 01/Nov/19 19:25
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9959: WIP: 
[BEAM-8523] JobAPI: Give access to timestamped state change history
URL: https://github.com/apache/beam/pull/9959#discussion_r341718971
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/JobStateEvent.java
 ##
 @@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.jobsubmission;
+
+import com.google.auto.value.AutoValue;
+import java.time.Instant;
+import org.apache.beam.model.jobmanagement.v1.JobApi;
+import org.apache.beam.vendor.grpc.v1p21p0.com.google.protobuf.Timestamp;
+
+/** A state transition event. */
+@AutoValue
+public abstract class JobStateEvent {
 
 Review comment:
   Do we want a `GetJobStateResponse` to have a `JobStateEvent`? I.e.
   
   ```proto
   message GetJobStateResponse {
     JobStateEvent state_event = 1; // (required)
   }
   ```
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337509)
Time Spent: 40m  (was: 0.5h)

> Add useful timestamp to job servicer GetJobs
> 
>
> Key: BEAM-8523
> URL: https://issues.apache.org/jira/browse/BEAM-8523
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> As a user querying jobs with JobService.GetJobs, it would be useful if the 
> JobInfo result contained timestamps indicating various state changes that may 
> have been missed by a client. Useful timestamps include:
>  
>  * submitted (prepared to the job service)
>  * started (executor enters the RUNNING state)
>  * completed (executor enters a terminal state)
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8523) Add useful timestamp to job servicer GetJobs

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8523?focusedWorklogId=337507&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337507
 ]

ASF GitHub Bot logged work on BEAM-8523:


Author: ASF GitHub Bot
Created on: 01/Nov/19 19:23
Start Date: 01/Nov/19 19:23
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9959: WIP: 
[BEAM-8523] JobAPI: Give access to timestamped state change history
URL: https://github.com/apache/beam/pull/9959#discussion_r341718342
 
 

 ##
 File path: model/job-management/src/main/proto/beam_job_api.proto
 ##
 @@ -153,6 +154,7 @@ message GetJobStateRequest {
 
 message GetJobStateResponse {
 
 Review comment:
   Why?  What would be different about them?  
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337507)
Time Spent: 0.5h  (was: 20m)

> Add useful timestamp to job servicer GetJobs
> 
>
> Key: BEAM-8523
> URL: https://issues.apache.org/jira/browse/BEAM-8523
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> As a user querying jobs with JobService.GetJobs, it would be useful if the 
> JobInfo result contained timestamps indicating various state changes that may 
> have been missed by a client. Useful timestamps include:
>  
>  * submitted (prepared to the job service)
>  * started (executor enters the RUNNING state)
>  * completed (executor enters a terminal state)
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8435) Allow access to PaneInfo from Python DoFns

2019-11-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8435?focusedWorklogId=337506&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-337506
 ]

ASF GitHub Bot logged work on BEAM-8435:


Author: ASF GitHub Bot
Created on: 01/Nov/19 19:22
Start Date: 01/Nov/19 19:22
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #9836: [BEAM-8435] 
Implement PaneInfo computation for Python.
URL: https://github.com/apache/beam/pull/9836#discussion_r341716543
 
 

 ##
 File path: sdks/python/apache_beam/transforms/trigger.py
 ##
 @@ -1162,12 +1173,33 @@ def process_timer(self, window_id, unused_name, time_domain, timestamp,
       if self.trigger_fn.should_fire(time_domain, timestamp,
                                      window, context):
         finished = self.trigger_fn.on_fire(timestamp, window, context)
-        yield self._output(window, finished, state)
+        yield self._output(window, finished, state, timestamp,
+                           time_domain == TimeDomain.WATERMARK)
     else:
       raise Exception('Unexpected time domain: %s' % time_domain)
 
-  def _output(self, window, finished, state):
+  def _output(self, window, finished, state, watermark, maybe_ontime):
     """Output window and clean up if appropriate."""
+    index = state.get_state(window, self.INDEX)
+    state.add_state(window, self.INDEX, 1)
+    if watermark <= window.max_timestamp():
+      nonspeculative_index = -1
+      timing = windowed_value.PaneInfoTiming.EARLY
+      if state.get_state(window, self.NONSPECULATIVE_INDEX):
+        logging.warning('Watermark moved backwards in time.')
+    else:
+      nonspeculative_index = state.get_state(window, self.NONSPECULATIVE_INDEX)
+      state.add_state(window, self.NONSPECULATIVE_INDEX, 1)
+      timing = (
+          windowed_value.PaneInfoTiming.ON_TIME
+          if maybe_ontime and nonspeculative_index == 0
 
 Review comment:
   Sorry, I misread that line.
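
For readers following the thread, here is a condensed restatement of the pane-timing classification the diff above implements. It is a simplified sketch, not the PR's exact code; the `classify_timing` helper and its parameters are introduced only for illustration.

```python
# Simplified sketch of the pane-timing logic in the trigger.py diff above.
from apache_beam.utils import windowed_value

def classify_timing(watermark, window_max_timestamp, nonspeculative_index,
                    fired_on_watermark):
  # Panes emitted before the watermark passes the end of the window are EARLY.
  if watermark <= window_max_timestamp:
    return windowed_value.PaneInfoTiming.EARLY
  # The first non-early pane fired by the watermark is ON_TIME; the rest are LATE.
  if fired_on_watermark and nonspeculative_index == 0:
    return windowed_value.PaneInfoTiming.ON_TIME
  return windowed_value.PaneInfoTiming.LATE
```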
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 337506)
Time Spent: 1.5h  (was: 1h 20m)

> Allow access to PaneInfo from Python DoFns
> --
>
> Key: BEAM-8435
> URL: https://issues.apache.org/jira/browse/BEAM-8435
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> PaneInfoParam exists, but the plumbing to actually populate it at runtime was 
> never added. (Nor, clearly, were any tests...)
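
A minimal DoFn-side sketch of the usage this plumbing enables. `beam.DoFn.PaneInfoParam` is the existing parameter named above; how it actually gets populated at runtime is what PR #9836 implements, so treat this as an illustration rather than a tested example.

{code}
import apache_beam as beam

class LogPaneTiming(beam.DoFn):
  def process(self, element, pane_info=beam.DoFn.PaneInfoParam):
    # Once populated, pane_info carries the firing's timing/index metadata.
    yield element, pane_info.timing
{code}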



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

