[jira] [Commented] (BEAM-9994) Cannot create a virtualenv using Python 3.8 on Jenkins machines

2020-05-26 Thread Yifan Zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116981#comment-17116981
 ] 

Yifan Zou commented on BEAM-9994:
-

The upgrade is complete. All executors are now using
*jenkins-slave-boot-image-20200522*.

> Cannot create a virtualenv using Python 3.8 on Jenkins machines
> ---
>
> Key: BEAM-9994
> URL: https://issues.apache.org/jira/browse/BEAM-9994
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: P2
>
> Command: *virtualenv --python /usr/bin/python3.8 env*
> Output:
> {noformat}
> Running virtualenv with interpreter /usr/bin/python3.8
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.5/dist-packages/virtualenv.py", line 22, in <module>
>     import distutils.spawn
> ModuleNotFoundError: No module named 'distutils.spawn'
> {noformat}
> Example test affected: 
> https://builds.apache.org/job/beam_PreCommit_PythonFormatter_Commit/1723/console
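The traceback above comes from virtualenv running under Python 3.5 and failing because the target interpreter, /usr/bin/python3.8, cannot import distutils.spawn. Below is a minimal sketch for checking whether a given interpreter can import a module before creating an environment. The helper name `interpreter_can_import` is hypothetical; it is not part of virtualenv or Beam's build scripts.

```python
import subprocess
import sys


def interpreter_can_import(interpreter: str, module: str) -> bool:
    """Return True if `interpreter` can import `module`.

    The import runs in a subprocess, so the result reflects the target
    interpreter's own stdlib and site-packages, not the caller's.
    """
    try:
        result = subprocess.run(
            [interpreter, "-c", "import " + module],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # Interpreter path missing or not executable on this machine.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical usage with the interpreter path from the report.
    target = "/usr/bin/python3.8"
    print(target, "can import distutils.spawn:",
          interpreter_can_import(target, "distutils.spawn"))
    print(sys.executable, "can import json:",
          interpreter_can_import(sys.executable, "json"))
```

Running the check in a subprocess, rather than importing in the current process, is the point of the design: the caller here could itself be a different Python version, as virtualenv was in this report.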



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9994) Cannot create a virtualenv using Python 3.8 on Jenkins machines

2020-05-26 Thread Yifan Zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116942#comment-17116942
 ] 

Yifan Zou commented on BEAM-9994:
-

I can help renew the VM disk images and reboot Jenkins.

> Cannot create a virtualenv using Python 3.8 on Jenkins machines
> ---
>
> Key: BEAM-9994
> URL: https://issues.apache.org/jira/browse/BEAM-9994
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: P2
>
> Command: *virtualenv --python /usr/bin/python3.8 env*
> Output:
> {noformat}
> Running virtualenv with interpreter /usr/bin/python3.8
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.5/dist-packages/virtualenv.py", line 22, in <module>
>     import distutils.spawn
> ModuleNotFoundError: No module named 'distutils.spawn'
> {noformat}
> Example test affected: 
> https://builds.apache.org/job/beam_PreCommit_PythonFormatter_Commit/1723/console





[jira] [Issue Comment Deleted] (BEAM-9994) Cannot create a virtualenv using Python 3.8 on Jenkins machines

2020-05-20 Thread Yifan Zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Zou updated BEAM-9994:

Comment: was deleted

(was: Yes, we will have to reimage the Jenkins machines to ensure a consistent
environment across the executor fleet.)

> Cannot create a virtualenv using Python 3.8 on Jenkins machines
> ---
>
> Key: BEAM-9994
> URL: https://issues.apache.org/jira/browse/BEAM-9994
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: P2
>
> Command: *virtualenv --python /usr/bin/python3.8 env*
> Output:
> {noformat}
> Running virtualenv with interpreter /usr/bin/python3.8
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.5/dist-packages/virtualenv.py", line 22, in <module>
>     import distutils.spawn
> ModuleNotFoundError: No module named 'distutils.spawn'
> {noformat}
> Example test affected: 
> https://builds.apache.org/job/beam_PreCommit_PythonFormatter_Commit/1723/console





[jira] [Commented] (BEAM-9994) Cannot create a virtualenv using Python 3.8 on Jenkins machines

2020-05-20 Thread Yifan Zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112613#comment-17112613
 ] 

Yifan Zou commented on BEAM-9994:
-

Yes, we will have to reimage the Jenkins machines to ensure a consistent
environment across the executor fleet.

> Cannot create a virtualenv using Python 3.8 on Jenkins machines
> ---
>
> Key: BEAM-9994
> URL: https://issues.apache.org/jira/browse/BEAM-9994
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: P2
>
> Command: *virtualenv --python /usr/bin/python3.8 env*
> Output:
> {noformat}
> Running virtualenv with interpreter /usr/bin/python3.8
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.5/dist-packages/virtualenv.py", line 22, in <module>
>     import distutils.spawn
> ModuleNotFoundError: No module named 'distutils.spawn'
> {noformat}
> Example test affected: 
> https://builds.apache.org/job/beam_PreCommit_PythonFormatter_Commit/1723/console





[jira] [Commented] (BEAM-9507) Beam dependency check failing

2020-03-24 Thread Yifan Zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066377#comment-17066377
 ] 

Yifan Zou commented on BEAM-9507:
-

[https://github.com/apache/beam/pull/11194] fixed the job. Thanks 
[~piotr-szuberski].

> Beam dependency check failing
> -
>
> Key: BEAM-9507
> URL: https://issues.apache.org/jira/browse/BEAM-9507
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Michał Walenia
>Assignee: Piotr Szuberski
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Here are the logs:
> [https://builds.apache.org/job/beam_Dependency_Check/257/console]
>  
> {noformat}
>     from grpc_tools import protoc
> 13:04:25 ImportError: No module named 'grpc_tools'
> 13:04:25
> 13:04:25 During handling of the above exception, another exception occurred:
> 13:04:25
> 13:04:25 Traceback (most recent call last):
> 13:04:25   File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
> 13:04:25     self.run()
> 13:04:25   File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
> 13:04:25     self._target(*self._args, **self._kwargs)
> 13:04:25   File "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/sdks/python/gen_protos.py", line 378, in _install_grpcio_tools_and_generate_proto_files
> 13:04:25     generate_proto_files(force=force)
> 13:04:25   File "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/sdks/python/gen_protos.py", line 315, in generate_proto_files
> 13:04:25     protoc_gen_mypy = _find_protoc_gen_mypy()
> 13:04:25   File "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/sdks/python/gen_protos.py", line 233, in _find_protoc_gen_mypy
> 13:04:25     (fname, ', '.join(search_paths)))
> 13:04:25 RuntimeError: Could not find protoc-gen-mypy in /home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/sdks/python/sdks/python/bin, /home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/sdks/python/sdks/python/bin, /home/jenkins/tools/java/latest1.8/bin, /usr/local/sbin, /usr/local/bin, /usr/sbin, /usr/bin, /sbin, /bin, /usr/games, /usr/local/games
> 13:04:25 Traceback (most recent call last):
> 13:04:25   File "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/sdks/python/gen_protos.py", line 292, in generate_proto_files
> 13:04:25     from grpc_tools import protoc
> 13:04:25 ImportError: No module named 'grpc_tools'
> 13:04:25
> 13:04:25 During handling of the above exception, another exception occurred:
> 13:04:25
> 13:04:25 Traceback (most recent call last):
> 13:04:25   File "<string>", line 1, in <module>
> 13:04:25   File "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/sdks/python/setup.py", line 315, in <module>
> 13:04:25     'mypy': generate_protos_first(mypy),
> 13:04:25   File "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/sdks/python/sdks/python/lib/python3.5/site-packages/setuptools/__init__.py", line 144, in setup
> 13:04:25     return distutils.core.setup(**attrs)
> 13:04:25   File "/usr/lib/python3.5/distutils/core.py", line 148, in setup
> 13:04:25     dist.run_commands()
> 13:04:25   File "/usr/lib/python3.5/distutils/dist.py", line 955, in run_commands
> 13:04:25     self.run_command(cmd)
> 13:04:25   File "/usr/lib/python3.5/distutils/dist.py", line 974, in run_command
> 13:04:25     cmd_obj.run()
> 13:04:25   File "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/sdks/python/setup.py", line 239, in run
> 13:04:25     gen_protos.generate_proto_files()
> 13:04:25   File "/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/sdks/python/gen_protos.py", line 310, in generate_proto_files
> 13:04:25     raise ValueError("Proto generation failed (see log for details).")
> 13:04:25 ValueError: Proto generation failed (see log for details).
> 13:04:25
> 13:04:25 ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
> 13:04:25
> 13:04:25 > Task :sdks:python:dependencyUpdates FAILED
> 13:04:25
> 13:04:25 FAILURE: Build failed with an exception.
> 13:04:25
> 13:04:25 * Where:
> 13:04:25 Build file '/home/jenkins/jenkins-slave/workspace/beam_Dependency_Check/src/sdks/python/build.gradle' line: 94
> 13:04:25
> 13:04:25 * What went wrong:
> 13:04:25 Execution failed for task ':sdks:python:dependencyUpdates'.
> 13:04:25 > Process 'command 'sh'' finished with non-zero exit value 1
> {noformat}
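The RuntimeError in the log lists every directory that was searched for protoc-gen-mypy. Below is a sketch of that kind of lookup using only the standard library, assuming a plain list of directories to search. `find_executable` is a hypothetical helper; it is not Beam's `_find_protoc_gen_mypy`, which may work differently.

```python
import os
import shutil
from typing import Iterable, Optional


def find_executable(name: str, search_paths: Iterable[str]) -> Optional[str]:
    """Return the path of `name` in the first listed directory containing it.

    shutil.which is given an explicit path string, so only these
    directories are searched, not the caller's PATH.
    Returns None if no directory has an executable with that name.
    """
    return shutil.which(name, path=os.pathsep.join(search_paths))


if __name__ == "__main__":
    # Hypothetical search order: a local bin directory first, then PATH.
    local_bin = os.path.join(os.getcwd(), "bin")
    paths = [local_bin] + os.environ.get("PATH", "").split(os.pathsep)
    found = find_executable("protoc-gen-mypy", paths)
    if found is None:
        print("Could not find protoc-gen-mypy in " + ", ".join(paths))
    else:
        print("Found:", found)
```

Printing the searched directories on failure, as the original error message does, is what makes a "not installed in this virtualenv" problem like the one above easy to spot.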





[jira] [Assigned] (BEAM-9244) Bump version of GCloud tools on Jenkins workers to at least 258.0.0

2020-02-28 Thread Yifan Zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Zou reassigned BEAM-9244:
---

Assignee: Mark Liu  (was: Yifan Zou)

> Bump version of GCloud tools on Jenkins workers to at least 258.0.0
> ---
>
> Key: BEAM-9244
> URL: https://issues.apache.org/jira/browse/BEAM-9244
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, testing
>Reporter: Michał Walenia
>Assignee: Mark Liu
>Priority: Major
>
> I recently tried to add a param to a GCloud command to limit lifetime of 
> created Dataproc clusters. This broke the job due to the parameter not being 
> recognized by the tool.
> Can we update gcloud on the workers?
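To confirm whether a worker meets the requested minimum of 258.0.0, the SDK version can be parsed and compared numerically. This is a sketch, assuming `gcloud version` prints a line of the form "Google Cloud SDK 257.0.0". The helper names and the `MIN_GCLOUD_VERSION` constant are hypothetical.

```python
import re
import subprocess
from typing import Optional, Tuple

# Minimum SDK version requested in this issue.
MIN_GCLOUD_VERSION = (258, 0, 0)


def parse_sdk_version(version_output: str) -> Optional[Tuple[int, ...]]:
    """Extract the SDK version from `gcloud version` output.

    Assumes a line like "Google Cloud SDK 257.0.0"; returns None otherwise.
    """
    match = re.search(r"Google Cloud SDK (\d+(?:\.\d+)*)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(version_output: str, minimum=MIN_GCLOUD_VERSION) -> bool:
    version = parse_sdk_version(version_output)
    if version is None:
        return False
    # Pad short versions such as (258,) so they compare like (258, 0, 0).
    padded = version + (0,) * (len(minimum) - len(version))
    # Tuples compare element by element: (257, 0, 0) < (258, 0, 0).
    return padded >= minimum


if __name__ == "__main__":
    try:
        out = subprocess.run(["gcloud", "version"], stdout=subprocess.PIPE,
                             universal_newlines=True).stdout
    except OSError:
        out = ""  # gcloud is not installed on this machine.
    print("meets minimum:", meets_minimum(out))
```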





[jira] [Commented] (BEAM-9244) Bump version of GCloud tools on Jenkins workers to at least 258.0.0

2020-02-28 Thread Yifan Zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048071#comment-17048071
 ] 

Yifan Zou commented on BEAM-9244:
-

Very sorry for getting to this so late. Mark is actively working on Jenkins,
so I reassigned this to Mark.

> Bump version of GCloud tools on Jenkins workers to at least 258.0.0
> ---
>
> Key: BEAM-9244
> URL: https://issues.apache.org/jira/browse/BEAM-9244
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, testing
>Reporter: Michał Walenia
>Assignee: Yifan Zou
>Priority: Major
>
> I recently tried to add a param to a GCloud command to limit lifetime of 
> created Dataproc clusters. This broke the job due to the parameter not being 
> recognized by the tool.
> Can we update gcloud on the workers?





[jira] [Assigned] (BEAM-9319) ResourceExhausted: topics-per-project

2020-02-20 Thread Yifan Zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Zou reassigned BEAM-9319:
---

Assignee: Brian Hulette  (was: Yifan Zou)

> ResourceExhausted: topics-per-project
> -
>
> Key: BEAM-9319
> URL: https://issues.apache.org/jira/browse/BEAM-9319
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures, testing
>Reporter: Ahmet Altay
>Assignee: Brian Hulette
>Priority: Major
>
> Tests are failing due to quota issues. Do we need to clean up topics after 
> tests or set a shorter TTL?
> Log: https://builds.apache.org/job/beam_PreCommit_Python_Commit/11178/
> Error: 
> 08:24:40 
> ==
> 08:24:40 ERROR: test_streaming_wordcount_it 
> (apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT)
> 08:24:40 
> --
> 08:24:40 Traceback (most recent call last):
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py",
>  line 58, in setUp
> 08:24:40 self.pub_client.topic_path(self.project, INPUT_TOPIC + 
> self.uuid))
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/google/cloud/pubsub_v1/_gapic.py",
>  line 40, in 
> 08:24:40 fx = lambda self, *a, **kw: wrapped_fx(self.api, *a, **kw)  # 
> noqa
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/google/cloud/pubsub_v1/gapic/publisher_client.py",
>  line 332, in create_topic
> 08:24:40 request, retry=retry, timeout=timeout, metadata=metadata
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/google/api_core/gapic_v1/method.py",
>  line 143, in __call__
> 08:24:40 return wrapped_func(*args, **kwargs)
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/google/api_core/retry.py",
>  line 286, in retry_wrapped_func
> 08:24:40 on_error=on_error,
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/google/api_core/retry.py",
>  line 184, in retry_target
> 08:24:40 return target()
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/google/api_core/timeout.py",
>  line 214, in func_with_timeout
> 08:24:40 return func(*args, **kwargs)
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/google/api_core/grpc_helpers.py",
>  line 59, in error_remapped_callable
> 08:24:40 six.raise_from(exceptions.from_grpc_error(exc), exc)
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/six.py",
>  line 738, in raise_from
> 08:24:40 raise value
> 08:24:40 ResourceExhausted: 429 Your project has exceeded a limit: 
> (type="topics-per-project", current=1, maximum=1).
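On the question of cleaning up topics after tests, one option is to register the deletion with `unittest.TestCase.addCleanup` immediately after creating the topic. This is a sketch that uses a hypothetical in-memory `FakePublisher` instead of the real Pub/Sub client, so it runs without credentials. A real test would call the client's topic-deletion method instead.

```python
import unittest
import uuid


class FakePublisher:
    """Hypothetical in-memory stand-in for a Pub/Sub publisher client."""

    def __init__(self):
        self.topics = set()

    def create_topic(self, name):
        self.topics.add(name)
        return name

    def delete_topic(self, name):
        self.topics.discard(name)


PUBLISHER = FakePublisher()


class TopicCleanupExample(unittest.TestCase):
    def setUp(self):
        # Unique name per run, similar in spirit to INPUT_TOPIC + self.uuid.
        self.topic = PUBLISHER.create_topic(
            "integ-test-input-" + uuid.uuid4().hex)
        # Cleanups run after tearDown, even when the test fails, so the
        # topic does not keep counting against the project's topic quota.
        self.addCleanup(PUBLISHER.delete_topic, self.topic)

    def test_topic_exists_during_test(self):
        self.assertIn(self.topic, PUBLISHER.topics)
```

Registering the cleanup right after the resource is created (rather than in tearDown) means it runs only for resources that were actually created.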





[jira] [Commented] (BEAM-9319) ResourceExhausted: topics-per-project

2020-02-14 Thread Yifan Zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037395#comment-17037395
 ] 

Yifan Zou commented on BEAM-9319:
-

Done. 5800 old topics deleted.

> ResourceExhausted: topics-per-project
> -
>
> Key: BEAM-9319
> URL: https://issues.apache.org/jira/browse/BEAM-9319
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures, testing
>Reporter: Ahmet Altay
>Assignee: Yifan Zou
>Priority: Major
>
> Tests are failing due to quota issues. Do we need to clean up topics after 
> tests or set a shorter TTL?
> Log: https://builds.apache.org/job/beam_PreCommit_Python_Commit/11178/
> Error: 
> 08:24:40 
> ==
> 08:24:40 ERROR: test_streaming_wordcount_it 
> (apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT)
> 08:24:40 
> --
> 08:24:40 Traceback (most recent call last):
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py",
>  line 58, in setUp
> 08:24:40 self.pub_client.topic_path(self.project, INPUT_TOPIC + 
> self.uuid))
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/google/cloud/pubsub_v1/_gapic.py",
>  line 40, in 
> 08:24:40 fx = lambda self, *a, **kw: wrapped_fx(self.api, *a, **kw)  # 
> noqa
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/google/cloud/pubsub_v1/gapic/publisher_client.py",
>  line 332, in create_topic
> 08:24:40 request, retry=retry, timeout=timeout, metadata=metadata
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/google/api_core/gapic_v1/method.py",
>  line 143, in __call__
> 08:24:40 return wrapped_func(*args, **kwargs)
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/google/api_core/retry.py",
>  line 286, in retry_wrapped_func
> 08:24:40 on_error=on_error,
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/google/api_core/retry.py",
>  line 184, in retry_target
> 08:24:40 return target()
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/google/api_core/timeout.py",
>  line 214, in func_with_timeout
> 08:24:40 return func(*args, **kwargs)
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/google/api_core/grpc_helpers.py",
>  line 59, in error_remapped_callable
> 08:24:40 six.raise_from(exceptions.from_grpc_error(exc), exc)
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/six.py",
>  line 738, in raise_from
> 08:24:40 raise value
> 08:24:40 ResourceExhausted: 429 Your project has exceeded a limit: 
> (type="topics-per-project", current=1, maximum=1).





[jira] [Commented] (BEAM-9319) ResourceExhausted: topics-per-project

2020-02-14 Thread Yifan Zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17037207#comment-17037207
 ] 

Yifan Zou commented on BEAM-9319:
-

I looked into our Pub/Sub topics. It seems none of them has a TTL applied. For
now, I can help delete the stale topics whose names start with
"integ-test-PubsubJsonIT". Longer term, we should consider having the tests
release the resources they use after completion, or having a tool that cleans
up unused resources periodically.
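For a one-off cleanup like the one described above, the first step is selecting topics by name prefix. This is a sketch, assuming topic names are full resource paths of the form projects/<project>/topics/<name>. `topics_with_prefix` is a hypothetical helper. Actually deleting the selected topics would go through the Pub/Sub client (for example `PublisherClient.delete_topic` in google-cloud-pubsub; check the call signature for the installed version).

```python
from typing import Iterable, List


def topics_with_prefix(topic_paths: Iterable[str], prefix: str) -> List[str]:
    """Return the topic paths whose short name starts with `prefix`.

    Expects full resource names like "projects/<project>/topics/<name>";
    only the part after the last "/" is matched against the prefix.
    """
    selected = []
    for path in topic_paths:
        short_name = path.rsplit("/", 1)[-1]
        if short_name.startswith(prefix):
            selected.append(path)
    return selected
```

Matching on the short name rather than the full path avoids accidentally selecting topics from a different project whose path happens to share a prefix.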

> ResourceExhausted: topics-per-project
> -
>
> Key: BEAM-9319
> URL: https://issues.apache.org/jira/browse/BEAM-9319
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures, testing
>Reporter: Ahmet Altay
>Assignee: Yifan Zou
>Priority: Major
>
> Tests are failing due to quota issues. Do we need to clean up topics after 
> tests or set a shorter TTL?
> Log: https://builds.apache.org/job/beam_PreCommit_Python_Commit/11178/
> Error: 
> 08:24:40 
> ==
> 08:24:40 ERROR: test_streaming_wordcount_it 
> (apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT)
> 08:24:40 
> --
> 08:24:40 Traceback (most recent call last):
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py",
>  line 58, in setUp
> 08:24:40 self.pub_client.topic_path(self.project, INPUT_TOPIC + 
> self.uuid))
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/google/cloud/pubsub_v1/_gapic.py",
>  line 40, in 
> 08:24:40 fx = lambda self, *a, **kw: wrapped_fx(self.api, *a, **kw)  # 
> noqa
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/google/cloud/pubsub_v1/gapic/publisher_client.py",
>  line 332, in create_topic
> 08:24:40 request, retry=retry, timeout=timeout, metadata=metadata
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/google/api_core/gapic_v1/method.py",
>  line 143, in __call__
> 08:24:40 return wrapped_func(*args, **kwargs)
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/google/api_core/retry.py",
>  line 286, in retry_wrapped_func
> 08:24:40 on_error=on_error,
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/google/api_core/retry.py",
>  line 184, in retry_target
> 08:24:40 return target()
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/google/api_core/timeout.py",
>  line 214, in func_with_timeout
> 08:24:40 return func(*args, **kwargs)
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/google/api_core/grpc_helpers.py",
>  line 59, in error_remapped_callable
> 08:24:40 six.raise_from(exceptions.from_grpc_error(exc), exc)
> 08:24:40   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/build/gradleenv/-194514014/local/lib/python2.7/site-packages/six.py",
>  line 738, in raise_from
> 08:24:40 raise value
> 08:24:40 ResourceExhausted: 429 Your project has exceeded a limit: 
> (type="topics-per-project", current=1, maximum=1).





[jira] [Commented] (BEAM-9302) No space left on device - apache-beam-jenkins-7

2020-02-12 Thread Yifan Zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035739#comment-17035739
 ] 

Yifan Zou commented on BEAM-9302:
-

Cleaned the workspace and released 147G of disk space.

> No space left on device - apache-beam-jenkins-7
> ---
>
> Key: BEAM-9302
> URL: https://issues.apache.org/jira/browse/BEAM-9302
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Michał Walenia
>Priority: Blocker
>
> [https://builds.apache.org/job/beam_PreCommit_SQL_Commit/543/consoleFull] log 
> of a failed job with this error





[jira] [Comment Edited] (BEAM-9302) No space left on device - apache-beam-jenkins-7

2020-02-12 Thread Yifan Zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035730#comment-17035730
 ] 

Yifan Zou edited comment on BEAM-9302 at 2/12/20 9:44 PM:
--

I ran the du command from HOME; it has been running for several hours and still hasn't finished.

I can clean the Jenkins workspace, which will release 150G of disk space. I
believe this should bring the node back to normal.

 


was (Author: yifanzou):
I ran the du command from HOME; it has been running for several hours and still hasn't finished.

I can clean the Jenkins workspace, which will release 150G of space. I believe
this should bring the node back to normal.

 

> No space left on device - apache-beam-jenkins-7
> ---
>
> Key: BEAM-9302
> URL: https://issues.apache.org/jira/browse/BEAM-9302
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Michał Walenia
>Priority: Blocker
>
> [https://builds.apache.org/job/beam_PreCommit_SQL_Commit/543/consoleFull] log 
> of a failed job with this error





[jira] [Commented] (BEAM-9302) No space left on device - apache-beam-jenkins-7

2020-02-12 Thread Yifan Zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035730#comment-17035730
 ] 

Yifan Zou commented on BEAM-9302:
-

I ran the du command from HOME; it has been running for several hours and still hasn't finished.

I can clean the Jenkins workspace, which will release 150G of space. I believe
this should bring the node back to normal.

 

> No space left on device - apache-beam-jenkins-7
> ---
>
> Key: BEAM-9302
> URL: https://issues.apache.org/jira/browse/BEAM-9302
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Michał Walenia
>Priority: Blocker
>
> [https://builds.apache.org/job/beam_PreCommit_SQL_Commit/543/consoleFull] log 
> of a failed job with this error





[jira] [Commented] (BEAM-9302) No space left on device - apache-beam-jenkins-7

2020-02-12 Thread Yifan Zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035557#comment-17035557
 ] 

Yifan Zou commented on BEAM-9302:
-

Sorry for getting to this late. I checked that machine, and it seems
/dev/sda1 is full. I'm tracking down where the excess usage is being stored.
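When a full `du` from HOME is too slow to be useful, a one-level-at-a-time breakdown can narrow down where the space went. This is a sketch using only the standard library. `tree_size` and `largest_children` are hypothetical helpers. Like `du --apparent-size`, they report file sizes rather than allocated disk blocks. This is not what was run on apache-beam-jenkins-7.

```python
import os
import shutil
from typing import List, Tuple


def tree_size(path: str) -> int:
    """Total size in bytes of regular files under `path`.

    Symlinked files are skipped, and os.walk does not descend into
    symlinked directories by default.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
    return total


def largest_children(path: str, top: int = 5) -> List[Tuple[int, str]]:
    """Return (size, path) for the `top` largest subdirectories of `path`."""
    sizes = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((tree_size(entry.path), entry.path))
    sizes.sort(reverse=True)
    return sizes[:top]


if __name__ == "__main__":
    # Overall free space on the root filesystem.
    usage = shutil.disk_usage("/")
    print(f"free: {usage.free / 2**30:.1f} GiB of {usage.total / 2**30:.1f} GiB")
```

It still reads every file below the directory it is pointed at, so on a very large workspace it is best run on one subdirectory at a time, drilling into whichever entry `largest_children` reports as biggest.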

> No space left on device - apache-beam-jenkins-7
> ---
>
> Key: BEAM-9302
> URL: https://issues.apache.org/jira/browse/BEAM-9302
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Michał Walenia
>Priority: Blocker
>
> [https://builds.apache.org/job/beam_PreCommit_SQL_Commit/543/consoleFull] log 
> of a failed job with this error





[jira] [Assigned] (BEAM-9244) Bump version of GCloud tools on Jenkins workers to at least 258.0.0

2020-02-07 Thread yifan zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou reassigned BEAM-9244:
---

Assignee: (was: yifan zou)

> Bump version of GCloud tools on Jenkins workers to at least 258.0.0
> ---
>
> Key: BEAM-9244
> URL: https://issues.apache.org/jira/browse/BEAM-9244
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, testing
>Reporter: Michal Walenia
>Priority: Major
>
> I recently tried to add a param to a GCloud command to limit lifetime of 
> created Dataproc clusters. This broke the job due to the parameter not being 
> recognized by the tool.
> Can we update gcloud on the workers?





[jira] [Assigned] (BEAM-9244) Bump version of GCloud tools on Jenkins workers to at least 258.0.0

2020-02-07 Thread yifan zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou reassigned BEAM-9244:
---

Assignee: yifan zou

> Bump version of GCloud tools on Jenkins workers to at least 258.0.0
> ---
>
> Key: BEAM-9244
> URL: https://issues.apache.org/jira/browse/BEAM-9244
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, testing
>Reporter: Michal Walenia
>Assignee: yifan zou
>Priority: Major
>
> I recently tried to add a param to a GCloud command to limit lifetime of 
> created Dataproc clusters. This broke the job due to the parameter not being 
> recognized by the tool.
> Can we update gcloud on the workers?





[jira] [Resolved] (BEAM-8195) Quota exceeded for create requests

2019-12-30 Thread Yifan Zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Zou resolved BEAM-8195.
-
Fix Version/s: Not applicable
   Resolution: Fixed

> Quota exceeded for create requests
> --
>
> Key: BEAM-8195
> URL: https://issues.apache.org/jira/browse/BEAM-8195
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core, testing
>Reporter: Ahmet Altay
>Assignee: Yifan Zou
>Priority: Critical
> Fix For: Not applicable
>
>
> Postcommits failed with the following error:
> HttpError accessing 
> :
>  response: <{'server': 'ESF', '-content-encoding': 'gzip', 'content-type': 
> 'application/json; charset=UTF-8', 'content-length': '598', 
> 'transfer-encoding': 'chunked', 'cache-control': 'private', 
> 'x-xss-protection': '0', 'date': 'Tue, 10 Sep 2019 12:02:24 GMT', 'vary': 
> 'Origin, X-Origin, Referer', 'x-frame-options': 'SAMEORIGIN', 'status': 
> '429', 'x-content-type-options': 'nosniff'}>, content <{
>   "error": {
> "code": 429,
> "message": "Quota exceeded for quota metric 
> 'dataflow.googleapis.com/create_requests' and limit 
> 'CreateRequestsPerMinutePerUser' of service 'dataflow.googleapis.com' for 
> consumer 'project_number:844138762903'.",
> "status": "RESOURCE_EXHAUSTED",
> "details": [
>   {
> Could we increase the quota?
> /cc [~alanmyrvold] [~kenn]





[jira] [Commented] (BEAM-8195) Quota exceeded for create requests

2019-12-16 Thread Yifan Zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997709#comment-16997709
 ] 

Yifan Zou commented on BEAM-8195:
-

I'm not sure how to apply a customized user name to Jenkins. I will do some research.

> Quota exceeded for create requests
> --
>
> Key: BEAM-8195
> URL: https://issues.apache.org/jira/browse/BEAM-8195
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core, testing
>Reporter: Ahmet Altay
>Assignee: Yifan Zou
>Priority: Critical
>
> Postcommits failed with the following error:
> HttpError accessing 
> :
>  response: <{'server': 'ESF', '-content-encoding': 'gzip', 'content-type': 
> 'application/json; charset=UTF-8', 'content-length': '598', 
> 'transfer-encoding': 'chunked', 'cache-control': 'private', 
> 'x-xss-protection': '0', 'date': 'Tue, 10 Sep 2019 12:02:24 GMT', 'vary': 
> 'Origin, X-Origin, Referer', 'x-frame-options': 'SAMEORIGIN', 'status': 
> '429', 'x-content-type-options': 'nosniff'}>, content <{
>   "error": {
> "code": 429,
> "message": "Quota exceeded for quota metric 
> 'dataflow.googleapis.com/create_requests' and limit 
> 'CreateRequestsPerMinutePerUser' of service 'dataflow.googleapis.com' for 
> consumer 'project_number:844138762903'.",
> "status": "RESOURCE_EXHAUSTED",
> "details": [
>   {
> Could we increase the quota?
> /cc [~alanmyrvold] [~kenn]





[jira] [Commented] (BEAM-8654) [Java] beam_Dependency_Check's not getting outdated report from Gradle

2019-11-13 Thread Yifan Zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973722#comment-16973722
 ] 

Yifan Zou commented on BEAM-8654:
-

From the gradle-versions-plugin documentation:
 * release: selects the latest release
 * milestone: selects the latest version that is either a milestone or a release
(default)

When we designed and implemented the Beam dependency tool, we wanted the report
to be readable and not spammy. We wanted to highlight the dependencies that
carry a high risk from using an old version; for example, an old version could
lead to a dependency diamond. Thus, using revision=release to check only the
latest release version made sense to us.

As for this specific case, I can't explain why bigtable-client-core 1.12.1
didn't show up in the plugin's results. We also don't understand what a
"milestone" version stands for.

 

> [Java] beam_Dependency_Check's not getting outdated report from Gradle
> --
>
> Key: BEAM-8654
> URL: https://issues.apache.org/jira/browse/BEAM-8654
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Tomo Suzuki
>Priority: Major
>
> Cont. of https://issues.apache.org/jira/browse/BEAM-8621
> https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_Dependency_Check/234/consoleFull
>  says
> {noformat}
> 18:23:12 The following dependencies are using the latest release version:
> ...
> 18:23:12  - com.google.cloud.bigdataoss:util:1.9.16
> 18:23:12  - com.google.cloud.bigtable:bigtable-client-core:1.8.0
> {noformat}
> But they are not the latest release.
> * 
> https://search.maven.org/artifact/com.google.cloud.bigdataoss/util/2.0.0/jar 
> * 
> https://search.maven.org/artifact/com.google.cloud.bigtable/bigtable-client-core/1.12.1/jar
> Why does Gradle think they're the latest release?
> It seems that the "-Drevision=release" flag plays some role here. Without the 
> flag, Gradle reports that these artifacts are not the latest.
> https://gist.github.com/suztomo/1460f2be48025c8ea764e86a2c6e39a8
> Even with the flag, it should report the following
> {noformat}
> The following dependencies have later release versions:
>  - com.google.cloud.bigtable:bigtable-client-core [1.8.0 -> 1.12.1]
>  https://cloud.google.com/bigtable/
> {noformat}
> https://gist.github.com/suztomo/13473e6b9765c0e96c22aeffab18ef66





[jira] [Commented] (BEAM-8621) [Java] beam_Dependency_Check reads smaller number of dependencies than expected

2019-11-13 Thread Yifan Zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973567#comment-16973567
 ] 

Yifan Zou commented on BEAM-8621:
-

The log shows that those dependencies are reported as using the latest release version:

[https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_Dependency_Check/234/consoleFull]
*00:04:18.534*  - com.google.cloud.bigdataoss:util:1.9.16
*00:04:18.534*  - com.google.cloud.bigtable:bigtable-client-core:1.8.0

> [Java] beam_Dependency_Check reads smaller number of dependencies than 
> expected
> ---
>
> Key: BEAM-8621
> URL: https://issues.apache.org/jira/browse/BEAM-8621
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Yifan Zou
>Priority: Major
> Fix For: 2.18.0
>
> Attachments: O0Mv0emfgA3.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> beam_Dependency_Check reads smaller number of dependencies than expected. 
> Email thread: [Completeness of Beam Java Dependency Check Report 
> .|https://lists.apache.org/thread.html/834072b5829317f788d9d02f266bdc4aaa494639f73be2c1e1e72640@%3Cdev.beam.apache.org%3E]
> [https://builds.apache.org/job/beam_Dependency_Check/232/console] shows the 
> execution log when it runs "gradlew runBeamDependencyCheck". The output of 
> Gradle is different from the output in my development environment.
>  
> !O0Mv0emfgA3.png!





[jira] [Closed] (BEAM-8621) [Java] beam_Dependency_Check reads smaller number of dependencies than expected

2019-11-12 Thread Yifan Zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Zou closed BEAM-8621.
---
Fix Version/s: 2.18.0
   Resolution: Fixed

> [Java] beam_Dependency_Check reads smaller number of dependencies than 
> expected
> ---
>
> Key: BEAM-8621
> URL: https://issues.apache.org/jira/browse/BEAM-8621
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Yifan Zou
>Priority: Major
> Fix For: 2.18.0
>
> Attachments: O0Mv0emfgA3.png
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> beam_Dependency_Check reads smaller number of dependencies than expected. 
> Email thread: [Completeness of Beam Java Dependency Check Report 
> .|https://lists.apache.org/thread.html/834072b5829317f788d9d02f266bdc4aaa494639f73be2c1e1e72640@%3Cdev.beam.apache.org%3E]
> [https://builds.apache.org/job/beam_Dependency_Check/232/console] shows the 
> execution log when it runs "gradlew runBeamDependencyCheck". The output of 
> Gradle is different from the output in my development environment.
>  
> !O0Mv0emfgA3.png!





[jira] [Commented] (BEAM-8621) [Java] beam_Dependency_Check reads smaller number of dependencies than expected

2019-11-12 Thread Yifan Zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972907#comment-16972907
 ] 

Yifan Zou commented on BEAM-8621:
-

The problem is that Jenkins runs `gradle :runBeamDependencyCheck`. When Gradle 
runs a task whose name starts with a colon (a qualified path), the path is 
resolved relative to the root project, so only the root project's task runs. In 
our case, it only checks dependencies defined in the root directory. Removing 
the ":" from the task name makes Gradle run the task in every project that 
defines it (a multi-project build). [https://github.com/apache/beam/pull/10079] 
will fix this issue.
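
A simplified Python model of the two invocation styles may help. The `select_tasks()` helper and the sample project map are hypothetical; the model only covers an absolute path (leading colon) and a bare task name, and ignores Gradle features such as abbreviated task names.

```python
from typing import Dict, List, Set


def _qualify(project: str, task: str) -> str:
    # The root project's path is ":", so its tasks are ":task", not "::task".
    return f":{task}" if project == ":" else f"{project}:{task}"


def select_tasks(projects: Dict[str, Set[str]], request: str,
                 current: str = ":") -> List[str]:
    """Toy model of which task paths a command-line request would run."""
    if request.startswith(":"):
        # Absolute path: resolved from the root project, so at most one task.
        project, _, task = request.rpartition(":")
        project = project or ":"
        if task in projects.get(project, set()):
            return [_qualify(project, task)]
        return []
    if ":" in request:
        raise ValueError("relative task paths are not modeled in this sketch")

    def in_scope(path: str) -> bool:
        # The current project and all of its subprojects.
        return current == ":" or path == current or path.startswith(current + ":")

    # Bare name: every project in scope that defines the task.
    return [_qualify(p, request) for p in sorted(projects)
            if in_scope(p) and request in projects[p]]
```

With `projects = {":": {"runBeamDependencyCheck"}, ":sdks:java:core": {"runBeamDependencyCheck"}}`, `select_tasks(projects, ":runBeamDependencyCheck")` returns only `[":runBeamDependencyCheck"]`, while `select_tasks(projects, "runBeamDependencyCheck")` also returns `":sdks:java:core:runBeamDependencyCheck"`.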

> [Java] beam_Dependency_Check reads smaller number of dependencies than 
> expected
> ---
>
> Key: BEAM-8621
> URL: https://issues.apache.org/jira/browse/BEAM-8621
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Yifan Zou
>Priority: Major
> Attachments: O0Mv0emfgA3.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> beam_Dependency_Check reads smaller number of dependencies than expected. 
> Email thread: [Completeness of Beam Java Dependency Check Report 
> .|https://lists.apache.org/thread.html/834072b5829317f788d9d02f266bdc4aaa494639f73be2c1e1e72640@%3Cdev.beam.apache.org%3E]
> [https://builds.apache.org/job/beam_Dependency_Check/232/console] shows the 
> execution log when it runs "gradlew runBeamDependencyCheck". The output of 
> Gradle is different from the output in my development environment.
>  
> !O0Mv0emfgA3.png!





[jira] [Closed] (BEAM-5571) Beam Dependency Update Request: org.slf4j:jcl-over-slf4j

2019-11-12 Thread Yifan Zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Zou closed BEAM-5571.
---
Resolution: Won't Fix

We're not updating this to an alpha version

> Beam Dependency Update Request: org.slf4j:jcl-over-slf4j
> 
>
> Key: BEAM-5571
> URL: https://issues.apache.org/jira/browse/BEAM-5571
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Priority: Major
> Fix For: Not applicable
>
>
>  - 2018-10-01 19:31:41.671404 
> -
> Please consider upgrading the dependency org.slf4j:jcl-over-slf4j. 
> The current version is 1.7.25. The latest version is 1.8.0-beta2 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-08 12:19:22.531634 
> -
> Please consider upgrading the dependency org.slf4j:jcl-over-slf4j. 
> The current version is 1.7.25. The latest version is 1.8.0-beta2 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-11-12 22:54:32.326978 
> -
> Please consider upgrading the dependency org.slf4j:jcl-over-slf4j. 
> The current version is 1.7.25. The latest version is 2.0.0-alpha1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 





[jira] [Closed] (BEAM-5588) Beam Dependency Update Request: org.slf4j:slf4j-jdk14

2019-11-12 Thread Yifan Zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Zou closed BEAM-5588.
---
Resolution: Won't Fix

We're not updating this to an alpha version

> Beam Dependency Update Request: org.slf4j:slf4j-jdk14
> -
>
> Key: BEAM-5588
> URL: https://issues.apache.org/jira/browse/BEAM-5588
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Priority: Major
> Fix For: Not applicable
>
>
>  - 2018-10-01 19:32:27.843922 
> -
> Please consider upgrading the dependency org.slf4j:slf4j-jdk14. 
> The current version is 1.7.25. The latest version is 1.8.0-beta2 
> cc: [~swegner], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-08 12:20:59.649783 
> -
> Please consider upgrading the dependency org.slf4j:slf4j-jdk14. 
> The current version is 1.7.25. The latest version is 1.8.0-beta2 
> cc: [~swegner], 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-11-12 22:54:49.387398 
> -
> Please consider upgrading the dependency org.slf4j:slf4j-jdk14. 
> The current version is 1.7.25. The latest version is 2.0.0-alpha1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 





[jira] [Closed] (BEAM-5587) Beam Dependency Update Request: org.slf4j:slf4j-api

2019-11-12 Thread Yifan Zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Zou closed BEAM-5587.
---
Resolution: Won't Fix

We're not updating this to an alpha version.

> Beam Dependency Update Request: org.slf4j:slf4j-api
> ---
>
> Key: BEAM-5587
> URL: https://issues.apache.org/jira/browse/BEAM-5587
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Priority: Major
> Fix For: Not applicable
>
>
>  - 2018-10-01 19:32:26.011070 
> -
> Please consider upgrading the dependency org.slf4j:slf4j-api. 
> The current version is 1.7.25. The latest version is 1.8.0-beta2 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-08 12:20:54.196725 
> -
> Please consider upgrading the dependency org.slf4j:slf4j-api. 
> The current version is 1.7.25. The latest version is 1.8.0-beta2 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-11-12 22:54:40.801959 
> -
> Please consider upgrading the dependency org.slf4j:slf4j-api. 
> The current version is 1.7.25. The latest version is 2.0.0-alpha1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 





[jira] [Commented] (BEAM-8621) [Java] beam_Dependency_Check reads smaller number of dependencies than expected

2019-11-12 Thread Yifan Zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972670#comment-16972670
 ] 

Yifan Zou commented on BEAM-8621:
-

I ran the plugin locally and got the same results as the Jenkins log. Perhaps 
environment settings cause the plugin to behave differently. 
[~suztomo] Could you provide more info about your local environment, such as the 
Gradle version, the directory you ran the command in, and anything you 
suspect may cause the problem? Thanks. 

> [Java] beam_Dependency_Check reads smaller number of dependencies than 
> expected
> ---
>
> Key: BEAM-8621
> URL: https://issues.apache.org/jira/browse/BEAM-8621
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Yifan Zou
>Priority: Major
> Attachments: O0Mv0emfgA3.png
>
>
> beam_Dependency_Check reads smaller number of dependencies than expected. 
> Email thread: [Completeness of Beam Java Dependency Check Report 
> .|https://lists.apache.org/thread.html/834072b5829317f788d9d02f266bdc4aaa494639f73be2c1e1e72640@%3Cdev.beam.apache.org%3E]
> [https://builds.apache.org/job/beam_Dependency_Check/232/console] shows the 
> execution log when it runs "gradlew runBeamDependencyCheck". The output of 
> Gradle is different from the output in my development environment.
>  
> !O0Mv0emfgA3.png!





[jira] [Assigned] (BEAM-8621) [Java] beam_Dependency_Check reads smaller number of dependencies than expected

2019-11-12 Thread Yifan Zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Zou reassigned BEAM-8621:
---

Assignee: Yifan Zou

> [Java] beam_Dependency_Check reads smaller number of dependencies than 
> expected
> ---
>
> Key: BEAM-8621
> URL: https://issues.apache.org/jira/browse/BEAM-8621
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Yifan Zou
>Priority: Major
> Attachments: O0Mv0emfgA3.png
>
>
> beam_Dependency_Check reads smaller number of dependencies than expected. 
> Email thread: [Completeness of Beam Java Dependency Check Report 
> .|https://lists.apache.org/thread.html/834072b5829317f788d9d02f266bdc4aaa494639f73be2c1e1e72640@%3Cdev.beam.apache.org%3E]
> [https://builds.apache.org/job/beam_Dependency_Check/232/console] shows the 
> execution log when it runs "gradlew runBeamDependencyCheck". The output of 
> Gradle is different from the output in my development environment.
>  
> !O0Mv0emfgA3.png!





[jira] [Commented] (BEAM-8409) docker-credential-gcloud not installed or not available in PATH

2019-10-16 Thread Yifan Zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16953003#comment-16953003
 ] 

Yifan Zou commented on BEAM-8409:
-

The docker-credential-gcloud was installed in 
https://issues.apache.org/jira/browse/BEAM-7381

I guess we are hitting the https://issues.apache.org/jira/browse/BEAM-7405 
again.
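
A quick way to confirm this kind of failure on a worker is to check whether every credential helper named in the Docker config resolves on `PATH`. The sketch below assumes the config has already been loaded (for example with `json.load` from `~/.docker/config.json`) and relies on Docker's convention that a helper called `gcloud` is an executable named `docker-credential-gcloud`; `missing_credential_helpers()` is a hypothetical name.

```python
import shutil
from typing import Any, Dict, List, Optional


def missing_credential_helpers(config: Dict[str, Any],
                               path: Optional[str] = None) -> List[str]:
    """List docker-credential-* programs named in a Docker config but not on PATH."""
    # Per-registry helpers, e.g. {"gcr.io": "gcloud"}.
    suffixes = set(config.get("credHelpers", {}).values())
    # A single default credential store, e.g. "gcloud" or "desktop".
    if config.get("credsStore"):
        suffixes.add(config["credsStore"])
    programs = sorted(f"docker-credential-{suffix}" for suffix in suffixes)
    # shutil.which returns None when the executable cannot be found.
    return [program for program in programs
            if shutil.which(program, path=path) is None]
```

Running it on an affected worker such as *apache-beam-jenkins-15* and on a healthy one would show whether `docker-credential-gcloud` is the helper that fails to resolve.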

> docker-credential-gcloud not installed or not available in PATH
> ---
>
> Key: BEAM-8409
> URL: https://issues.apache.org/jira/browse/BEAM-8409
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Kamil Wasilewski
>Assignee: Yifan Zou
>Priority: Major
>  Labels: currently-failing
>
> _Use this form to file an issue for test failure:_
>  * 
> [beam_PreCommit_CommunityMetrics_Commit|https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PreCommit_CommunityMetrics_Commit/1355/]
>  * 
> [beam_PostCommit_Python2_PR|https://builds.apache.org/job/beam_PostCommit_Python2_PR]
> Initial investigation:
> The Jenkins job fails when executing the docker-compose script.
> It seems the only affected Jenkins worker is *apache-beam-jenkins-15*.
>  
> Relevant logs:
> 1)
>  
> {code:java}
> 11:56:24 Execution failed for task ':beam-test-infra-metrics:composeUp'.
> 11:56:24 > Exit-code 255 when calling docker-compose, stdout: postgresql uses 
> an image, skipping
> 11:56:24   prometheus uses an image, skipping
> 11:56:24   pushgateway uses an image, skipping
> 11:56:24   alertmanager uses an image, skipping
> 11:56:24   Building grafana
> 11:56:24   [17038] Failed to execute script docker-compose
> 11:56:24   Traceback (most recent call last):
> 11:56:24 File "bin/docker-compose", line 6, in 
> 11:56:24 File "compose/cli/main.py", line 71, in main
> 11:56:24 File "compose/cli/main.py", line 127, in perform_command
> 11:56:24 File "compose/cli/main.py", line 287, in build
> 11:56:24 File "compose/project.py", line 386, in build
> 11:56:24 File "compose/project.py", line 368, in build_service
> 11:56:24 File "compose/service.py", line 1084, in build
> 11:56:24 File "site-packages/docker/api/build.py", line 260, in build
> 11:56:24 File "site-packages/docker/api/build.py", line 307, in 
> _set_auth_headers
> 11:56:24 File "site-packages/docker/auth.py", line 310, in 
> get_all_credentials
> 11:56:24 File "site-packages/docker/auth.py", line 262, in 
> _resolve_authconfig_credstore
> 11:56:24 File "site-packages/docker/auth.py", line 287, in 
> _get_store_instance
> 11:56:24 File "site-packages/dockerpycreds/store.py", line 25, in __init__
> 11:56:24   dockerpycreds.errors.InitializationError: docker-credential-gcloud 
> not installed or not available in PATH
> {code}
> 2)
> {code:java}
> 16:26:08 [9316] Failed to execute script docker-compose
> 16:26:08 Traceback (most recent call last):
> 16:26:08   File "bin/docker-compose", line 6, in 
> 16:26:08   File "compose/cli/main.py", line 71, in main
> 16:26:08   File "compose/cli/main.py", line 127, in perform_command
> 16:26:08   File "compose/cli/main.py", line 287, in build
> 16:26:08   File "compose/project.py", line 386, in build
> 16:26:08   File "compose/project.py", line 368, in build_service
> 16:26:08   File "compose/service.py", line 1084, in build
> 16:26:08   File "site-packages/docker/api/build.py", line 260, in build
> 16:26:08   File "site-packages/docker/api/build.py", line 307, in 
> _set_auth_headers
> 16:26:08   File "site-packages/docker/auth.py", line 310, in 
> get_all_credentials
> 16:26:08   File "site-packages/docker/auth.py", line 262, in 
> _resolve_authconfig_credstore
> 16:26:08   File "site-packages/docker/auth.py", line 287, in 
> _get_store_instance
> 16:26:08   File "site-packages/dockerpycreds/store.py", line 25, in __init__
> 16:26:08 dockerpycreds.errors.InitializationError: docker-credential-gcloud 
> not installed or not available in PATH
> {code}
>  
>  
> 
> _After you've filled out the above details, please [assign the issue to an 
> individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist].
>  Assignee should [treat test failures as 
> high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test],
>  helping to fix the issue or find a more appropriate owner. See [Apache Beam 
> Post-Commit 
> Policies|https://beam.apache.org/contribute/postcommits-policies]._





[jira] [Closed] (BEAM-8116) Beam 2.15 Release Retrospective Actions

2019-10-07 Thread Yifan Zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Zou closed BEAM-8116.
---
Fix Version/s: 3.0.0
   Not applicable
   Resolution: Fixed

> Beam 2.15 Release Retrospective Actions
> ---
>
> Key: BEAM-8116
> URL: https://issues.apache.org/jira/browse/BEAM-8116
> Project: Beam
>  Issue Type: Task
>  Components: project-management, testing, website
>Reporter: Yifan Zou
>Assignee: Yifan Zou
>Priority: Major
> Fix For: Not applicable, 3.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We summarized some pain points during the 2.15 release. Log bugs under this 
> task to track the progress.





[jira] [Closed] (BEAM-8119) The python RC validation script need cleanup the pubsub resources after finish.

2019-10-07 Thread Yifan Zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Zou closed BEAM-8119.
---
Fix Version/s: 2.16.0
   Not applicable
   Resolution: Fixed

> The python RC validation script need cleanup the pubsub resources after 
> finish.
> ---
>
> Key: BEAM-8119
> URL: https://issues.apache.org/jira/browse/BEAM-8119
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Yifan Zou
>Priority: Major
> Fix For: Not applicable, 2.16.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> * Python staging artifacts were moved to the ‘python/’ directory. This needs to 
> be fixed in the script; otherwise, the script won’t download the SDKs and hash 
> files correctly.
>  * If the test fails, it sometimes leaves dead Pub/Sub topics behind. These can 
> cause failures on reruns because the resources already exist: 
>  * ERROR: Failed to create topic 
> [projects/apache-beam-testing/topics/wordstream-python-topic-1]: Resource 
> already exists in the project (resource=wordstream-python-topic-1). 
>  * *Workaround*: I manually deleted the topics using cmd: gcloud alpha pubsub 
> topics delete wordstream-python-topic-1
>  * TODO: 
>  ** Change the download directory in the script.
>  ** Add cleanup if the job was interrupted.
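
One way to address the "Add cleanup" TODO is to create topics inside a context manager whose `finally` block deletes them. In this Python sketch, `create` and `delete` are placeholders for whatever Pub/Sub client call or `gcloud` command the validation script uses; `ephemeral_topics()` is a hypothetical helper.

```python
import contextlib
import logging
from typing import Callable, Iterator, List


@contextlib.contextmanager
def ephemeral_topics(create: Callable[[str], None],
                     delete: Callable[[str], None]) -> Iterator[Callable[[str], str]]:
    """Create topics on demand and always delete them on exit, even on failure."""
    created: List[str] = []

    def make(name: str) -> str:
        create(name)
        created.append(name)  # only track topics that were actually created
        return name

    try:
        yield make
    finally:
        # Delete in reverse creation order; keep going if one delete fails.
        for name in reversed(created):
            try:
                delete(name)
            except Exception:
                logging.warning("failed to delete topic %s", name, exc_info=True)
```

The `finally` block also runs when a `KeyboardInterrupt` (Ctrl-C) aborts the run, but not when the process is killed outright, so topics left behind by a hard kill would still need a manual delete.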





[jira] [Updated] (BEAM-6896) Beam Dependency Update Request: PyYAML

2019-09-30 Thread Yifan Zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Zou updated BEAM-6896:

Fix Version/s: (was: Not applicable)
   2.17.0

> Beam Dependency Update Request: PyYAML
> --
>
> Key: BEAM-6896
> URL: https://issues.apache.org/jira/browse/BEAM-6896
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Robert Bradshaw
>Priority: Major
> Fix For: 2.17.0
>
>
>  - 2019-03-25 04:17:47.501359 
> -
> Please consider upgrading the dependency PyYAML. 
> The current version is 3.13. The latest version is 5.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 





[jira] [Reopened] (BEAM-6896) Beam Dependency Update Request: PyYAML

2019-09-30 Thread Yifan Zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Zou reopened BEAM-6896:
-
  Assignee: Robert Bradshaw

[~eblanchi_cni] pointed out that we have {{'pyyaml>=3.12,<4.0.0'}} in the 
Python SDK, which is impacting other libraries (such as googleads). Could we 
bring it up to at least PyYAML 5.1?

cc: [~eblanchi_cni]
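
The Python sketch below shows why the pin blocks 5.1: a version must satisfy both bounds, and 5.1 is not below 4.0.0. It only handles plain numeric versions; real requirement strings (including pre-release rules) are better checked with `packaging.specifiers.SpecifierSet`. `in_range()` is a hypothetical helper.

```python
from typing import Tuple


def _parse(version: str, width: int = 3) -> Tuple[int, ...]:
    # Pad with zeros so that "3.12" and "3.12.0" compare as equal.
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def in_range(version: str, lower: str, upper: str) -> bool:
    """True if lower <= version < upper, mirroring 'pyyaml>=3.12,<4.0.0'."""
    return _parse(lower) <= _parse(version) < _parse(upper)
```

For example, `in_range("3.13", "3.12", "4.0.0")` is `True`, while `in_range("5.1", "3.12", "4.0.0")` is `False`.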

> Beam Dependency Update Request: PyYAML
> --
>
> Key: BEAM-6896
> URL: https://issues.apache.org/jira/browse/BEAM-6896
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Robert Bradshaw
>Priority: Major
> Fix For: Not applicable
>
>
>  - 2019-03-25 04:17:47.501359 
> -
> Please consider upgrading the dependency PyYAML. 
> The current version is 3.13. The latest version is 5.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 





[jira] [Issue Comment Deleted] (BEAM-6896) Beam Dependency Update Request: PyYAML

2019-09-30 Thread Yifan Zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Zou updated BEAM-6896:

Comment: was deleted

(was: This is not used by the SDK, but only for the dependency check itself. )

> Beam Dependency Update Request: PyYAML
> --
>
> Key: BEAM-6896
> URL: https://issues.apache.org/jira/browse/BEAM-6896
> Project: Beam
>  Issue Type: Bug
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Priority: Major
> Fix For: Not applicable
>
>
>  - 2019-03-25 04:17:47.501359 
> -
> Please consider upgrading the dependency PyYAML. 
> The current version is 3.13. The latest version is 5.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 





[jira] [Commented] (BEAM-8195) Quota exceeded for create requests

2019-09-10 Thread yifan zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927112#comment-16927112
 ] 

yifan zou commented on BEAM-8195:
-

Technically it's doable. [~alanmyrvold] Any thoughts?

> Quota exceeded for create requests
> --
>
> Key: BEAM-8195
> URL: https://issues.apache.org/jira/browse/BEAM-8195
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core, testing
>Reporter: Ahmet Altay
>Assignee: yifan zou
>Priority: Critical
>
> Post-commits failed with the following error:
> HttpError accessing 
> :
>  response: <{'server': 'ESF', '-content-encoding': 'gzip', 'content-type': 
> 'application/json; charset=UTF-8', 'content-length': '598', 
> 'transfer-encoding': 'chunked', 'cache-control': 'private', 
> 'x-xss-protection': '0', 'date': 'Tue, 10 Sep 2019 12:02:24 GMT', 'vary': 
> 'Origin, X-Origin, Referer', 'x-frame-options': 'SAMEORIGIN', 'status': 
> '429', 'x-content-type-options': 'nosniff'}>, content <{
>   "error": {
> "code": 429,
> "message": "Quota exceeded for quota metric 
> 'dataflow.googleapis.com/create_requests' and limit 
> 'CreateRequestsPerMinutePerUser' of service 'dataflow.googleapis.com' for 
> consumer 'project_number:844138762903'.",
> "status": "RESOURCE_EXHAUSTED",
> "details": [
>   {
> Could we increase the quota?
> /cc [~alanmyrvold] [~kenn]





[jira] [Commented] (BEAM-8195) Quota exceeded for create requests

2019-09-10 Thread yifan zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926861#comment-16926861
 ] 

yifan zou commented on BEAM-8195:
-

We have already set the CreateRequestsPerMinutePerUser quota to its maximum of 
60, so it cannot be raised further: 
[https://pantheon.corp.google.com/apis/api/dataflow.googleapis.com/quotas?project=apache-beam-testing=PT6H]

I suspect all jobs are created by the user "jenkins". We need to consider either 
using different users when creating jobs (job creation requests per minute are 
unlimited) or reducing the number of concurrent tests running on the workers.
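
If running fewer concurrent jobs is not enough, each test process could also throttle itself on the client side. The Python sketch below is a rolling-window limiter that allows at most `limit` create requests in any `window` seconds; `CreateRequestLimiter` is a hypothetical class, and the defaults simply mirror the 60-per-minute quota mentioned above.

```python
import collections
import time
from typing import Callable, Deque


class CreateRequestLimiter:
    """Allow at most `limit` acquisitions in any rolling `window` seconds."""

    def __init__(self, limit: int = 60, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._limit = limit
        self._window = window
        self._clock = clock
        self._sleep = sleep
        self._stamps: Deque[float] = collections.deque()

    def acquire(self) -> None:
        """Block until one more create request fits in the window."""
        while True:
            now = self._clock()
            # Forget requests that have fallen out of the rolling window.
            while self._stamps and now - self._stamps[0] >= self._window:
                self._stamps.popleft()
            if len(self._stamps) < self._limit:
                self._stamps.append(now)
                return
            # Wait until the oldest request leaves the window, then re-check.
            self._sleep(self._window - (now - self._stamps[0]))
```

This only coordinates calls within one process. Because the quota is per user and the Jenkins jobs appear to share one user, the 60-request budget would have to be divided among concurrent jobs (for example `limit=60 // concurrent_jobs`), which is an assumption about how the workers would be configured.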

> Quota exceeded for create requests
> --
>
> Key: BEAM-8195
> URL: https://issues.apache.org/jira/browse/BEAM-8195
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core, testing
>Reporter: Ahmet Altay
>Assignee: yifan zou
>Priority: Critical
>
> Post-commits failed with the following error:
> HttpError accessing 
> :
>  response: <{'server': 'ESF', '-content-encoding': 'gzip', 'content-type': 
> 'application/json; charset=UTF-8', 'content-length': '598', 
> 'transfer-encoding': 'chunked', 'cache-control': 'private', 
> 'x-xss-protection': '0', 'date': 'Tue, 10 Sep 2019 12:02:24 GMT', 'vary': 
> 'Origin, X-Origin, Referer', 'x-frame-options': 'SAMEORIGIN', 'status': 
> '429', 'x-content-type-options': 'nosniff'}>, content <{
>   "error": {
> "code": 429,
> "message": "Quota exceeded for quota metric 
> 'dataflow.googleapis.com/create_requests' and limit 
> 'CreateRequestsPerMinutePerUser' of service 'dataflow.googleapis.com' for 
> consumer 'project_number:844138762903'.",
> "status": "RESOURCE_EXHAUSTED",
> "details": [
>   {
> Could we increase the quota?
> /cc [~alanmyrvold] [~kenn]





[jira] [Commented] (BEAM-7821) Wheels build on osx fails in Travis

2019-09-09 Thread yifan zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926030#comment-16926030
 ] 

yifan zou commented on BEAM-7821:
-

Fixed in 
[https://github.com/apache/beam-wheels/commit/3c52d43d74851089e46a0ddbef58ef074274a695].
 This can be closed.

> Wheels build on osx fails in Travis 
> 
>
> Key: BEAM-7821
> URL: https://issues.apache.org/jira/browse/BEAM-7821
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, sdk-py-core
>Reporter: Anton Kedin
>Priority: Major
> Fix For: 2.16.0
>
>
> Attempt to build wheels on OSX in Travis for 2.14.0 RC1 failed due to 
> inability to check the certificate of dist.apache.org: 
> {code}
> --2019-07-25 18:28:31--  
> https://dist.apache.org/repos/dist/dev/beam/2.14.0/python/apache-beam-2.14.0.zip
> Resolving dist.apache.org... 209.188.14.144
> Connecting to dist.apache.org|209.188.14.144|:443... connected.
> ERROR: cannot verify dist.apache.org's certificate, issued by ‘CN=Sectigo 
> RSA Domain Validation Secure Server CA,O=Sectigo Limited,L=Salford,ST=Greater 
> Manchester,C=GB’:
>   Unable to locally verify the issuer's authority.
> To connect to dist.apache.org insecurely, use `--no-check-certificate'.
> {code}
> Full Travis Log: 
> https://gist.github.com/akedin/a5b50dbd0ecacff538186cbb9d7f6bca





[jira] [Updated] (BEAM-8116) Beam 2.15 Release Retrospective Improvements

2019-09-03 Thread yifan zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-8116:

Summary: Beam 2.15 Release Retrospective Improvements  (was: Beam Release 
Retrospective Improvements)

> Beam 2.15 Release Retrospective Improvements
> 
>
> Key: BEAM-8116
> URL: https://issues.apache.org/jira/browse/BEAM-8116
> Project: Beam
>  Issue Type: Task
>  Components: project-management, testing, website
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We summarized some pain points during the 2.15 release. Log bugs under this 
> task to track the progress.





[jira] [Updated] (BEAM-8116) Beam 2.15 Release Retrospective Actions

2019-09-03 Thread yifan zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-8116:

Summary: Beam 2.15 Release Retrospective Actions  (was: Beam 2.15 Release 
Retrospective Improvements)

> Beam 2.15 Release Retrospective Actions
> ---
>
> Key: BEAM-8116
> URL: https://issues.apache.org/jira/browse/BEAM-8116
> Project: Beam
>  Issue Type: Task
>  Components: project-management, testing, website
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We summarized some pain points during the 2.15 release. Log bugs under this 
> task to track the progress.





[jira] [Resolved] (BEAM-8117) Improve the preparation_before_release script

2019-09-03 Thread yifan zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou resolved BEAM-8117.
-
Fix Version/s: Not applicable
   Resolution: Fixed

> Improve the preparation_before_release script
> -
>
> Key: BEAM-8117
> URL: https://issues.apache.org/jira/browse/BEAM-8117
> Project: Beam
>  Issue Type: Sub-task
>  Components: project-management
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> * Setup GPG keys: 
>  * The preparation_before_release.sh script was interrupted: a git command 
> failed when configuring the git signing key.
>  * Adding the key to the dev@ list requires a PMC member, so the script 
> doesn’t really help there.
>  * Apache requires a key of at least 4096 bits, but the script generates a 
> 3072-bit key by default. The script offered a few key-size options, but there 
> was no instruction indicating which option the release manager should choose. 
>  * *Solution*: I followed the Apache official [release signing 
> guide|https://www.apache.org/dev/release-signing.html] to generate the RSA 
> keys, then asked a PMC member to add them to the dev and release key lists.
>  * Reference: [GPG Cheat Sheet|http://irtfweb.ifa.hawaii.edu/~lockhart/gpg/]
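A minimal sketch of how a preparation script could request a 4096-bit RSA key explicitly instead of relying on gpg's default size, assuming GnuPG 2.1 or later and its documented unattended key-generation parameter format. The helper names and the example name/email are hypothetical; this is not how preparation_before_release.sh is actually written.

```python
import os
import subprocess
import tempfile


def gpg_batch_params(name, email, key_length=4096):
    """Build a GnuPG unattended key-generation parameter block (hypothetical helper).

    Field names follow GnuPG's "unattended key generation" format. No
    Passphrase line is included, so gpg is expected to prompt via pinentry.
    """
    if key_length < 4096:
        raise ValueError("Apache release signing keys should be at least 4096 bits")
    return "\n".join([
        "Key-Type: RSA",
        f"Key-Length: {key_length}",
        "Subkey-Type: RSA",
        f"Subkey-Length: {key_length}",
        f"Name-Real: {name}",
        f"Name-Email: {email}",
        "Expire-Date: 0",
        "%commit",
    ]) + "\n"


def generate_release_key(name, email):
    """Generate the key with `gpg --batch --generate-key` (needs GnuPG 2.1+ on PATH)."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write(gpg_batch_params(name, email))
        params_path = f.name
    try:
        subprocess.run(["gpg", "--batch", "--generate-key", params_path], check=True)
    finally:
        os.remove(params_path)  # the parameter file is no longer needed
```

Rejecting sizes below 4096 makes an undersized key a hard error rather than a silent default; a real script might instead just hard-code 4096 and skip the prompt entirely.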





[jira] [Resolved] (BEAM-8097) Update the release guide

2019-09-03 Thread yifan zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou resolved BEAM-8097.
-
Fix Version/s: 2.16.0
   Resolution: Fixed

> Update the release guide
> 
>
> Key: BEAM-8097
> URL: https://issues.apache.org/jira/browse/BEAM-8097
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The release guide was modified based on the 2.14 release experience 
> ([https://github.com/apache/beam/pull/9319]). However, the change was reverted 
> because we don't want to split the guide into multiple sections 
> ([https://github.com/apache/beam/pull/9436]). Please review the reverted 
> guide and update the current guide with the up-to-date information.





[jira] [Assigned] (BEAM-8117) Improve the preparation_before_release script

2019-09-03 Thread yifan zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou reassigned BEAM-8117:
---

Assignee: yifan zou

> Improve the preparation_before_release script
> -
>
> Key: BEAM-8117
> URL: https://issues.apache.org/jira/browse/BEAM-8117
> Project: Beam
>  Issue Type: Sub-task
>  Components: project-management
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> * Set up GPG keys: 
>  * The preparation_before_release.sh script was interrupted: the git command 
> failed when configuring the git signing key.
>  * A PMC member still had to add the key to the dev@ list, so the script 
> didn't really help.
>  * Apache requires the key to have at least 4096 bits, but the script 
> generates a 3072-bit key by default. There were a few options for selecting 
> the key size, but no instruction indicated which option the release manager 
> should choose. 
>  * *Solution*: I followed the official Apache [release signing 
> guide|https://www.apache.org/dev/release-signing.html] to generate the RSA 
> keys, then asked a PMC member to add them to the dev and release key lists.
>  * Reference: [GPG Cheat Sheet|http://irtfweb.ifa.hawaii.edu/~lockhart/gpg/]





[jira] [Resolved] (BEAM-8118) Update build_release_candidate script with the beam-site changes.

2019-09-01 Thread yifan zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou resolved BEAM-8118.
-
Fix Version/s: Not applicable
   Resolution: Fixed

> Update build_release_candidate script with the beam-site changes.
> -
>
> Key: BEAM-8118
> URL: https://issues.apache.org/jira/browse/BEAM-8118
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
> Fix For: Not applicable
>
>
> We moved beam-site to beam/website, but the instructions at the end of the 
> release scripts were not updated.





[jira] [Updated] (BEAM-8117) Improve the preparation_before_release script

2019-09-01 Thread yifan zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-8117:

Status: Open  (was: Triage Needed)

> Improve the preparation_before_release script
> -
>
> Key: BEAM-8117
> URL: https://issues.apache.org/jira/browse/BEAM-8117
> Project: Beam
>  Issue Type: Sub-task
>  Components: project-management
>Reporter: yifan zou
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> * Set up GPG keys: 
>  * The preparation_before_release.sh script was interrupted: the git command 
> failed when configuring the git signing key.
>  * A PMC member still had to add the key to the dev@ list, so the script 
> didn't really help.
>  * Apache requires the key to have at least 4096 bits, but the script 
> generates a 3072-bit key by default. There were a few options for selecting 
> the key size, but no instruction indicated which option the release manager 
> should choose. 
>  * *Solution*: I followed the official Apache [release signing 
> guide|https://www.apache.org/dev/release-signing.html] to generate the RSA 
> keys, then asked a PMC member to add them to the dev and release key lists.
>  * Reference: [GPG Cheat Sheet|http://irtfweb.ifa.hawaii.edu/~lockhart/gpg/]





[jira] [Updated] (BEAM-8116) Beam Release Retrospective Improvements

2019-08-30 Thread yifan zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-8116:

Summary: Beam Release Retrospective Improvements  (was: Beam 2.15.0 Release 
Retrospective Improvements)

> Beam Release Retrospective Improvements
> ---
>
> Key: BEAM-8116
> URL: https://issues.apache.org/jira/browse/BEAM-8116
> Project: Beam
>  Issue Type: Task
>  Components: project-management, testing, website
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We summarized some pain points during the 2.15 release. Log bugs under this 
> task to track the progress.





[jira] [Work started] (BEAM-8118) Update build_release_candidate script with the beam-site changes.

2019-08-30 Thread yifan zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-8118 started by yifan zou.
---
> Update build_release_candidate script with the beam-site changes.
> -
>
> Key: BEAM-8118
> URL: https://issues.apache.org/jira/browse/BEAM-8118
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>
> We moved beam-site to beam/website, but the instructions at the end of the 
> release scripts were not updated.





[jira] [Updated] (BEAM-8118) Update build_release_candidate script with the beam-site changes.

2019-08-30 Thread yifan zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-8118:

Status: Open  (was: Triage Needed)

> Update build_release_candidate script with the beam-site changes.
> -
>
> Key: BEAM-8118
> URL: https://issues.apache.org/jira/browse/BEAM-8118
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>
> We moved beam-site to beam/website, but the instructions at the end of the 
> release scripts were not updated.





[jira] [Closed] (BEAM-8120) Update the finalization in the release guide

2019-08-30 Thread yifan zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou closed BEAM-8120.
---
Fix Version/s: Not applicable
   Resolution: Won't Fix

> Update the finalization in the release guide
> 
>
> Key: BEAM-8120
> URL: https://issues.apache.org/jira/browse/BEAM-8120
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
> Fix For: Not applicable
>
>
> * Command failure in "Deploy Python artifacts to PyPI" (third step): 
> twine upload . -> twine upload * 
>  * Typo: upldaed -> updated (in the section "Deploy source release to 
> dist.apache.org")





[jira] [Commented] (BEAM-8120) Update the finalization in the release guide

2019-08-30 Thread yifan zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919751#comment-16919751
 ] 

yifan zou commented on BEAM-8120:
-

The revised release guide was reverted, so this won't be fixed. See 
https://issues.apache.org/jira/browse/BEAM-8097 for the release guide revert.

> Update the finalization in the release guide
> 
>
> Key: BEAM-8120
> URL: https://issues.apache.org/jira/browse/BEAM-8120
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>
> * Command failure in "Deploy Python artifacts to PyPI" (third step): 
> twine upload . -> twine upload * 
>  * Typo: upldaed -> updated (in the section "Deploy source release to 
> dist.apache.org")





[jira] [Assigned] (BEAM-8120) Update the finalization in the release guide

2019-08-30 Thread yifan zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou reassigned BEAM-8120:
---

Assignee: yifan zou

> Update the finalization in the release guide
> 
>
> Key: BEAM-8120
> URL: https://issues.apache.org/jira/browse/BEAM-8120
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>
> * Command failure in "Deploy Python artifacts to PyPI" (third step): 
> twine upload . -> twine upload * 
>  * Typo: upldaed -> updated (in the section "Deploy source release to 
> dist.apache.org")





[jira] [Created] (BEAM-8120) Update the finalization in the release guide

2019-08-30 Thread yifan zou (Jira)
yifan zou created BEAM-8120:
---

 Summary: Update the finalization in the release guide
 Key: BEAM-8120
 URL: https://issues.apache.org/jira/browse/BEAM-8120
 Project: Beam
  Issue Type: Sub-task
  Components: website
Reporter: yifan zou


* Command failure in "Deploy Python artifacts to PyPI" (third step): 
twine upload . -> twine upload * 
 * Typo: upldaed -> updated (in the section "Deploy source release to 
dist.apache.org")





[jira] [Created] (BEAM-8119) The python RC validation script need cleanup the pubsub resources after finish.

2019-08-30 Thread yifan zou (Jira)
yifan zou created BEAM-8119:
---

 Summary: The python RC validation script need cleanup the pubsub 
resources after finish.
 Key: BEAM-8119
 URL: https://issues.apache.org/jira/browse/BEAM-8119
 Project: Beam
  Issue Type: Sub-task
  Components: testing
Reporter: yifan zou


* Python staging artifacts were moved to the ‘python/’ directory. The script 
needs to be updated accordingly; otherwise it won’t download the SDKs and hash 
files correctly.
 * If the test fails, it sometimes leaves stale Pub/Sub topics behind. These can 
cause failures on reruns because the resources already exist: 
 * ERROR: Failed to create topic 
[projects/apache-beam-testing/topics/wordstream-python-topic-1]: Resource 
already exists in the project (resource=wordstream-python-topic-1). 
 * *Workaround*: I manually deleted the topics using the command: gcloud alpha 
pubsub topics delete wordstream-python-topic-1


 * TODO: 
 ** Change the download directory in the script.
 ** Add cleanup if the job was interrupted.
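Both TODO items can be sketched in Python, assuming the staging layout visible in the 2.14.0 URL from BEAM-7821 (`.../dist/dev/beam/<version>/python/apache-beam-<version>.zip`). The `.sha512` hash suffix, the helper names, and the use of the GA `gcloud pubsub topics delete` command (the workaround above used the alpha track) are assumptions for illustration, not what the validation script currently does.

```python
import contextlib
import subprocess

DEV_DIST = "https://dist.apache.org/repos/dist/dev/beam"


def staging_urls(version, hash_suffix=".sha512"):
    """Return (sdk_zip_url, hash_url) under the python/ subdirectory.

    The hash suffix is an assumption; it must match the files actually staged.
    """
    sdk = f"{DEV_DIST}/{version}/python/apache-beam-{version}.zip"
    return sdk, sdk + hash_suffix


def delete_topic(topic):
    """Delete a Pub/Sub topic via the gcloud CLI (must be installed and authenticated)."""
    subprocess.run(["gcloud", "pubsub", "topics", "delete", topic], check=False)


@contextlib.contextmanager
def pubsub_topics(topics, delete=delete_topic):
    """Yield the topic names and delete them on exit, even if the body raises."""
    try:
        yield topics
    finally:
        for topic in topics:
            delete(topic)
```

The `finally` block runs on ordinary exceptions and on Ctrl-C (`KeyboardInterrupt`), so a failed streaming wordcount run would no longer leave `wordstream-python-topic-1` behind; it cannot help if the process is killed outright.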





[jira] [Updated] (BEAM-8097) Update the release guide

2019-08-30 Thread yifan zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-8097:

Parent: BEAM-8116
Issue Type: Sub-task  (was: Improvement)

> Update the release guide
> 
>
> Key: BEAM-8097
> URL: https://issues.apache.org/jira/browse/BEAM-8097
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>
> The release guide was modified based on the 2.14 release experience 
> ([https://github.com/apache/beam/pull/9319]). However, the change was reverted 
> because we don't want to split the guide into multiple sections 
> ([https://github.com/apache/beam/pull/9436]). Please review the reverted 
> guide and update the current guide with the up-to-date information.





[jira] [Assigned] (BEAM-8116) Beam 2.15.0 Release Retrospective Improvements

2019-08-30 Thread yifan zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou reassigned BEAM-8116:
---

Assignee: yifan zou

> Beam 2.15.0 Release Retrospective Improvements
> --
>
> Key: BEAM-8116
> URL: https://issues.apache.org/jira/browse/BEAM-8116
> Project: Beam
>  Issue Type: Task
>  Components: project-management, testing, website
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>
> We summarized some pain points during the 2.15 release. Log bugs under this 
> task to track the progress.





[jira] [Assigned] (BEAM-8116) Beam 2.15.0 Release Retrospective Improvements

2019-08-30 Thread yifan zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou reassigned BEAM-8116:
---

Assignee: (was: yifan zou)

> Beam 2.15.0 Release Retrospective Improvements
> --
>
> Key: BEAM-8116
> URL: https://issues.apache.org/jira/browse/BEAM-8116
> Project: Beam
>  Issue Type: Task
>  Components: project-management, testing, website
>Reporter: yifan zou
>Priority: Major
>
> We summarized some pain points during the 2.15 release. Log bugs under this 
> task to track the progress.





[jira] [Assigned] (BEAM-8118) Update build_release_candidate script with the beam-site changes.

2019-08-30 Thread yifan zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou reassigned BEAM-8118:
---

Assignee: yifan zou

> Update build_release_candidate script with the beam-site changes.
> -
>
> Key: BEAM-8118
> URL: https://issues.apache.org/jira/browse/BEAM-8118
> Project: Beam
>  Issue Type: Sub-task
>  Components: website
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>
> We moved beam-site to beam/website, but the instructions at the end of the 
> release scripts were not updated.





[jira] [Created] (BEAM-8118) Update build_release_candidate script with the beam-site changes.

2019-08-30 Thread yifan zou (Jira)
yifan zou created BEAM-8118:
---

 Summary: Update build_release_candidate script with the beam-site 
changes.
 Key: BEAM-8118
 URL: https://issues.apache.org/jira/browse/BEAM-8118
 Project: Beam
  Issue Type: Sub-task
  Components: website
Reporter: yifan zou


We moved beam-site to beam/website, but the instructions at the end of the 
release scripts were not updated.





[jira] [Created] (BEAM-8117) Improve the preparation_before_release script

2019-08-30 Thread yifan zou (Jira)
yifan zou created BEAM-8117:
---

 Summary: Improve the preparation_before_release script
 Key: BEAM-8117
 URL: https://issues.apache.org/jira/browse/BEAM-8117
 Project: Beam
  Issue Type: Sub-task
  Components: project-management
Reporter: yifan zou


* Set up GPG keys: 
 * The preparation_before_release.sh script was interrupted: the git command 
failed when configuring the git signing key.
 * A PMC member still had to add the key to the dev@ list, so the script didn't 
really help.
 * Apache requires the key to have at least 4096 bits, but the script generates 
a 3072-bit key by default. There were a few options for selecting the key size, 
but no instruction indicated which option the release manager should choose. 
 * *Solution*: I followed the official Apache [release signing 
guide|https://www.apache.org/dev/release-signing.html] to generate the RSA keys, 
then asked a PMC member to add them to the dev and release key lists.
 * Reference: [GPG Cheat Sheet|http://irtfweb.ifa.hawaii.edu/~lockhart/gpg/]





[jira] [Created] (BEAM-8116) Beam 2.15.0 Release Retrospective Improvements

2019-08-30 Thread yifan zou (Jira)
yifan zou created BEAM-8116:
---

 Summary: Beam 2.15.0 Release Retrospective Improvements
 Key: BEAM-8116
 URL: https://issues.apache.org/jira/browse/BEAM-8116
 Project: Beam
  Issue Type: Task
  Components: project-management, testing, website
Reporter: yifan zou
Assignee: yifan zou


We summarized some pain points during the 2.15 release. Log bugs under this 
task to track the progress.





[jira] [Updated] (BEAM-8097) Update the release guide

2019-08-26 Thread yifan zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-8097:

Description: The release guide was modified based on the 2.14 release 
experience ([https://github.com/apache/beam/pull/9319]). However, the change was 
reverted because we don't want to split the guide into multiple sections 
([https://github.com/apache/beam/pull/9436]). Please review the reverted guide 
and update the current guide with the up-to-date information.  (was: The 
release guide was modified based on the 2.14 release experience. However, the 
change was reverted because we don't want to split the guide into multiple 
sections. Please review the reverted guide and update the current guide with 
the up-to-date information.)

> Update the release guide
> 
>
> Key: BEAM-8097
> URL: https://issues.apache.org/jira/browse/BEAM-8097
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>
> The release guide was modified based on the 2.14 release experience 
> ([https://github.com/apache/beam/pull/9319]). However, the change was reverted 
> because we don't want to split the guide into multiple sections 
> ([https://github.com/apache/beam/pull/9436]). Please review the reverted 
> guide and update the current guide with the up-to-date information.





[jira] [Created] (BEAM-8097) Update the release guide

2019-08-26 Thread yifan zou (Jira)
yifan zou created BEAM-8097:
---

 Summary: Update the release guide
 Key: BEAM-8097
 URL: https://issues.apache.org/jira/browse/BEAM-8097
 Project: Beam
  Issue Type: Improvement
  Components: website
Reporter: yifan zou
Assignee: yifan zou


The release guide was modified based on the 2.14 release experience. However, 
the change was reverted because we don't want to split the guide into multiple 
sections. Please review the reverted guide and update the current guide with 
the up-to-date information.





[jira] [Resolved] (BEAM-7866) Python MongoDB IO performance and correctness issues

2019-08-14 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou resolved BEAM-7866.
-
Resolution: Fixed

> Python MongoDB IO performance and correctness issues
> 
>
> Key: BEAM-7866
> URL: https://issues.apache.org/jira/browse/BEAM-7866
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Yichi Zhang
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/mongodbio.py
>  splits the query result by computing the number of results in the 
> constructor, and then, in each reader, re-executing the whole query and taking 
> an index sub-range of those results.
> This is broken in several critical ways:
> - The order of query results returned by find() is not necessarily 
> deterministic, so index ranges over it are meaningless: each shard may get 
> essentially random, possibly overlapping subsets of the total results.
> - Even if you order by `_id`, the database may change concurrently with 
> reading and splitting. E.g. suppose the database contains documents with ids 
> 10 20 30 40 50 and is split into shards 0..2 and 3..5 (assuming these shards 
> contain 10 20 30 and 40 50 respectively). If shard 0..2 (10 20 30) is read and 
> then document 25 is inserted, the 3..5 shard will read 30 40 50, i.e. document 
> 30 is duplicated and document 25 is lost.
> - Every shard re-executes the query and skips the first start_offset items, 
> so the total work is quadratic.
> - The query is first executed in the constructor in order to count results, 
> which means 1) the constructor can be very slow and 2) it won't work at all 
> if the database is unavailable at the time the pipeline is constructed (e.g. 
> if this is a template).
> Unfortunately, none of these issues are caught by SourceTestUtils: this source 
> has extensive test coverage using it, and the tests pass, because the test 
> data comes back with the same results in the same order. I don't know how to 
> catch the ordering problem or the performance issue automatically, but these 
> would all be important follow-up items after the actual fix.
> CC: [~chamikara] as reviewer.
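A small pure-Python simulation, using the ids from the example above, contrasts offset-based shards with a split on `_id` key ranges. A plain list stands in for the collection, the helper names are hypothetical, and key-range splitting is shown only as one common remedy, not necessarily what the eventual fix does.

```python
def read_offset_shard(db, start, stop=None):
    """Offset shard: sort by _id, skip `start` docs, read up to index `stop`."""
    return sorted(db)[start:stop]


def read_key_range_shard(db, lo, hi=None):
    """Key-range shard: read docs with lo <= _id < hi (hi=None means unbounded)."""
    return sorted(d for d in db if d >= lo and (hi is None or d < hi))


def two_shard_read(read_first, read_second):
    """Read shard 1, insert document 25 concurrently, then read shard 2."""
    db = [10, 20, 30, 40, 50]
    first = read_first(db)
    db.append(25)  # concurrent insert between the two shard reads
    second = read_second(db)
    return first, second


def total_skipped(n, k):
    """Docs skipped in total when k offset shards each re-run the query over n docs."""
    return sum(i * n // k for i in range(k))


# Offset split at index 3: document 30 is read twice and 25 is never read.
print(two_shard_read(lambda db: read_offset_shard(db, 0, 3),
                     lambda db: read_offset_shard(db, 3)))
# -> ([10, 20, 30], [30, 40, 50])

# Key-range split at _id 40: no duplicates; 25 lands in an already-read range.
print(two_shard_read(lambda db: read_key_range_shard(db, 10, 40),
                     lambda db: read_key_range_shard(db, 40)))
# -> ([10, 20, 30], [40, 50])
```

`total_skipped(n, k)` grows as roughly n*(k-1)/2 (450 for n=100, k=10), which becomes quadratic in the data size when the number of shards grows with it.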



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-7866) Python MongoDB IO performance and correctness issues

2019-08-14 Thread yifan zou (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907646#comment-16907646
 ] 

yifan zou commented on BEAM-7866:
-

The PR was merged, so I'm marking this ticket as resolved.

> Python MongoDB IO performance and correctness issues
> 
>
> Key: BEAM-7866
> URL: https://issues.apache.org/jira/browse/BEAM-7866
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Yichi Zhang
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/mongodbio.py
>  splits the query result by computing the number of results in the 
> constructor, and then, in each reader, re-executing the whole query and taking 
> an index sub-range of those results.
> This is broken in several critical ways:
> - The order of query results returned by find() is not necessarily 
> deterministic, so index ranges over it are meaningless: each shard may get 
> essentially random, possibly overlapping subsets of the total results.
> - Even if you order by `_id`, the database may change concurrently with 
> reading and splitting. E.g. suppose the database contains documents with ids 
> 10 20 30 40 50 and is split into shards 0..2 and 3..5 (assuming these shards 
> contain 10 20 30 and 40 50 respectively). If shard 0..2 (10 20 30) is read and 
> then document 25 is inserted, the 3..5 shard will read 30 40 50, i.e. document 
> 30 is duplicated and document 25 is lost.
> - Every shard re-executes the query and skips the first start_offset items, 
> so the total work is quadratic.
> - The query is first executed in the constructor in order to count results, 
> which means 1) the constructor can be very slow and 2) it won't work at all 
> if the database is unavailable at the time the pipeline is constructed (e.g. 
> if this is a template).
> Unfortunately, none of these issues are caught by SourceTestUtils: this source 
> has extensive test coverage using it, and the tests pass, because the test 
> data comes back with the same results in the same order. I don't know how to 
> catch the ordering problem or the performance issue automatically, but these 
> would all be important follow-up items after the actual fix.
> CC: [~chamikara] as reviewer.





[jira] [Updated] (BEAM-7866) Python MongoDB IO performance and correctness issues

2019-08-14 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-7866:

Fix Version/s: (was: 2.16.0)
   2.15.0

> Python MongoDB IO performance and correctness issues
> 
>
> Key: BEAM-7866
> URL: https://issues.apache.org/jira/browse/BEAM-7866
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Yichi Zhang
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/mongodbio.py
>  splits the query result by computing the number of results in the 
> constructor, and then, in each reader, re-executing the whole query and taking 
> an index sub-range of those results.
> This is broken in several critical ways:
> - The order of query results returned by find() is not necessarily 
> deterministic, so index ranges over it are meaningless: each shard may get 
> essentially random, possibly overlapping subsets of the total results.
> - Even if you order by `_id`, the database may change concurrently with 
> reading and splitting. E.g. suppose the database contains documents with ids 
> 10 20 30 40 50 and is split into shards 0..2 and 3..5 (assuming these shards 
> contain 10 20 30 and 40 50 respectively). If shard 0..2 (10 20 30) is read and 
> then document 25 is inserted, the 3..5 shard will read 30 40 50, i.e. document 
> 30 is duplicated and document 25 is lost.
> - Every shard re-executes the query and skips the first start_offset items, 
> so the total work is quadratic.
> - The query is first executed in the constructor in order to count results, 
> which means 1) the constructor can be very slow and 2) it won't work at all 
> if the database is unavailable at the time the pipeline is constructed (e.g. 
> if this is a template).
> Unfortunately, none of these issues are caught by SourceTestUtils: this source 
> has extensive test coverage using it, and the tests pass, because the test 
> data comes back with the same results in the same order. I don't know how to 
> catch the ordering problem or the performance issue automatically, but these 
> would all be important follow-up items after the actual fix.
> CC: [~chamikara] as reviewer.





[jira] [Commented] (BEAM-7821) Wheels build on osx fails in Travis

2019-08-13 Thread yifan zou (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906905#comment-16906905
 ] 

yifan zou commented on BEAM-7821:
-

Got the same error when building the 2.15 wheels. Changing the Xcode version to 
9.4 solved the problem.

> Wheels build on osx fails in Travis 
> 
>
> Key: BEAM-7821
> URL: https://issues.apache.org/jira/browse/BEAM-7821
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, sdk-py-core
>Reporter: Anton Kedin
>Priority: Major
> Fix For: 2.16.0
>
>
> An attempt to build wheels on OSX in Travis for 2.14.0 RC1 failed because the 
> certificate of dist.apache.org could not be verified: 
> {code}
> --2019-07-25 18:28:31--  
> https://dist.apache.org/repos/dist/dev/beam/2.14.0/python/apache-beam-2.14.0.zip
> Resolving dist.apache.org... 209.188.14.144
> Connecting to dist.apache.org|209.188.14.144|:443... connected.
> ERROR: cannot verify dist.apache.org's certificate, issued by ‘CN=Sectigo 
> RSA Domain Validation Secure Server CA,O=Sectigo Limited,L=Salford,ST=Greater 
> Manchester,C=GB’:
>   Unable to locally verify the issuer's authority.
> To connect to dist.apache.org insecurely, use `--no-check-certificate'.
> {code}
> Full Travis Log: 
> https://gist.github.com/akedin/a5b50dbd0ecacff538186cbb9d7f6bca





[jira] [Resolved] (BEAM-7860) v1new ReadFromDatastore returns duplicates if keys are of mixed types

2019-08-13 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou resolved BEAM-7860.
-
Resolution: Fixed

> v1new ReadFromDatastore returns duplicates if keys are of mixed types
> -
>
> Key: BEAM-7860
> URL: https://issues.apache.org/jira/browse/BEAM-7860
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Affects Versions: 2.13.0
> Environment: Python 2.7
> Python 3.7
>Reporter: Niels Stender
>Assignee: Udi Meiri
>Priority: Blocker
> Fix For: 2.15.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> In the presence of mixed-type keys (numeric IDs and string names), v1new 
> ReadFromDatastore may return duplicate items. The example below returns 4 
> records, not the expected 3.
>  
> {code:python}
> from __future__ import unicode_literals
> import apache_beam as beam
> from apache_beam.io.gcp.datastore.v1new.types import Key, Entity, Query
> from apache_beam.io.gcp.datastore.v1new import datastoreio
> config = dict(project='your-google-project', namespace='test')
> def test_mixed():
>     # Three keys of kind 'mixed': two string names and one numeric ID.
>     keys = [
>         Key(['mixed', '10038260-iperm_eservice'], **config),
>         Key(['mixed', 4812224868188160], **config),
>         Key(['mixed', '99152975-pointshop'], **config)
>     ]
>     entities = [Entity(key=key) for key in keys]
>     # Write the three entities.
>     with beam.Pipeline() as p:
>         (p
>          | beam.Create(entities)
>          | datastoreio.WriteToDatastore(project=config['project'])
>         )
>     # Read them back using several query splits and write them to one file.
>     query = Query(kind='mixed', **config)
>     with beam.Pipeline() as p:
>         (p
>          | datastoreio.ReadFromDatastore(query=query, num_splits=4)
>          | beam.io.WriteToText('tmp.txt', num_shards=1,
>                                shard_name_template='')
>         )
>     # Exactly 3 records are expected; the bug produces 4.
>     items = open('tmp.txt').read().strip().split('\n')
>     assert len(items) == 3, 'incorrect number of items'
> {code}
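The repro mixes a numeric ID (4812224868188160) with string names under one kind. In Python 3, ordering such identifiers directly raises `TypeError`, so any code that sorts or compares split boundaries needs an explicit rule. A minimal sketch, assuming Datastore's key ordering in which numeric IDs sort before key names; `id_or_name_sort_key` is a hypothetical helper, not Beam's actual fix.

```python
def id_or_name_sort_key(id_or_name):
    """Sort key for one Datastore path identifier: numeric IDs before names.

    Assumes Datastore's ordering (IDs first, then names). The leading type tag
    means an int is never compared directly with a str.
    """
    if isinstance(id_or_name, int):
        return (0, id_or_name, "")
    return (1, 0, id_or_name)


ids_or_names = ['10038260-iperm_eservice', 4812224868188160, '99152975-pointshop']

try:
    sorted(ids_or_names)  # Python 3 refuses to order int against str
except TypeError as err:
    print("plain sort fails:", err)

print(sorted(ids_or_names, key=id_or_name_sort_key))
# -> [4812224868188160, '10038260-iperm_eservice', '99152975-pointshop']
```

A real splitter would have to apply such a rule across the whole key path (project, namespace, and every kind/identifier pair), not just a single identifier.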





[jira] [Resolved] (BEAM-7608) v1new ReadFromDatastore skips entities

2019-08-13 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou resolved BEAM-7608.
-
Resolution: Fixed

> v1new ReadFromDatastore skips entities
> --
>
> Key: BEAM-7608
> URL: https://issues.apache.org/jira/browse/BEAM-7608
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Affects Versions: 2.13.0
> Environment: MacOS 10.14.5, Python 2.7
>Reporter: Jacob Gur
>Assignee: Udi Meiri
>Priority: Blocker
> Fix For: 2.15.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> A simple map over a Datastore kind in the local emulator using the new 
> v1new.datastoreio.ReadFromDatastore skips entities.
> The kind has 1516 entities, and when I map over it using the old 
> ReadFromDatastore transform, it maps all of them, i.e., I can map each entity 
> to its id and write the ids to a text file.
> But the new transform only maps 365 entities. There is no error. The tail of 
> the standard output is:
> {code:java}
> INFO:root:Latest stats timestamp for kind face_apilog is 2019-06-18 
> 08:15:21+00:00
>  INFO:root:Estimated size bytes for query: 116188
>  INFO:root:Splitting the query into 12 splits
>  INFO:root:Running 
> (((GetEntities/Reshuffle/ReshufflePerKey/GroupByKey/Read)(ref_AppliedPTransform_GetEntities/Reshuffle/ReshufflePerKey/FlatMap(restore_timestamps)_14))((ref_AppliedPTransform_GetEntities/Reshuffle/RemoveRandomKeys_15)(ref_AppliedPTransform_GetEntities/Read_16)))((ref_AppliedPTransform_MapToId_17)((ref_AppliedPTransform_WriteToFile/Write/WriteImpl/WriteBundles_24)((ref_AppliedPTransform_WriteToFile/Write/WriteImpl/Pair_25)((ref_AppliedPTransform_WriteToFile/Write/WriteImpl/WindowInto(WindowIntoFn)_26)(WriteToFile/Write/WriteImpl/GroupByKey/Write)
>  INFO:root:Running 
> (WriteToFile/Write/WriteImpl/GroupByKey/Read)((ref_AppliedPTransform_WriteToFile/Write/WriteImpl/Extract_31)(ref_PCollection_PCollection_20/Write))
>  INFO:root:Running 
> (ref_PCollection_PCollection_12/Read)((ref_AppliedPTransform_WriteToFile/Write/WriteImpl/PreFinalize_32)(ref_PCollection_PCollection_21/Write))
>  INFO:root:Running 
> (ref_PCollection_PCollection_12/Read)+(ref_AppliedPTransform_WriteToFile/Write/WriteImpl/FinalizeWrite_33)
>  INFO:root:Starting finalize_write threads with num_shards: 1 (skipped: 0), 
> batches: 1, num_threads: 1
>  INFO:root:Renamed 1 shards in 0.12 seconds.{code}
>  
> The code for the job on the new transform is:
>  
>  
> {code:python}
> from __future__ import absolute_import
> import logging
> import os
> import sys
> import apache_beam as beam
> from apache_beam.io.gcp.datastore.v1new.datastoreio import ReadFromDatastore
> from apache_beam.io.gcp.datastore.v1new.types import Query
> # TODO: should be set outside of python process
> os.environ['DATASTORE_EMULATOR_HOST'] = 'localhost:8085'
> def map_to_id(element):
>     face_log_id = element.to_client_entity().id
>     return face_log_id
> def run(argv=None):
>     p = beam.Pipeline(argv=argv)
>     project = 'dev'
>     (p
>      | 'GetEntities' >> ReadFromDatastore(Query(kind='face_apilog',
>                                                 project=project))
>      | 'MapToId' >> beam.Map(map_to_id)
>      | 'WriteToFile' >> beam.io.WriteToText('result')
>     )
>     p.run().wait_until_finish()
> if __name__ == '__main__':
>     logging.getLogger().setLevel(logging.INFO)
>     run(sys.argv){code}
>  
> For comparison, the code for the job on the old transform is:
>  
> {code:python}
> from __future__ import absolute_import
> import logging
> import os
> import sys
> import apache_beam as beam
> from apache_beam.io.gcp.datastore.v1.datastoreio import ReadFromDatastore
> from google.cloud.proto.datastore.v1 import query_pb2
> # TODO: should be set outside of python process
> os.environ['DATASTORE_EMULATOR_HOST'] = 'localhost:8085'
> def map_to_id(element):
>     face_log_id = element.key.path[-1].id
>     return face_log_id
> def run(argv=None):
>     p = beam.Pipeline(argv=argv)
>     project = 'dev'
>     query = query_pb2.Query()
>     query.kind.add().name = 'face_apilog'
>     (p
>      | 'GetEntities' >> ReadFromDatastore(project=project, query=query)
>      # TODO: ParDo???
>      | 'MapToId' >> beam.Map(map_to_id)
>      | 'WriteToFile' >> beam.io.WriteToText('result')
>     )
>     p.run().wait_until_finish()
> if __name__ == '__main__':
>     logging.getLogger().setLevel(logging.INFO)
>     run(sys.argv){code}
>  





[jira] [Resolved] (BEAM-7874) FnApi only supports up to 10 workers

2019-08-12 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou resolved BEAM-7874.
-
Resolution: Fixed

> FnApi only supports up to 10 workers
> 
>
> Key: BEAM-7874
> URL: https://issues.apache.org/jira/browse/BEAM-7874
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Blocker
> Fix For: 2.15.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Because max_workers of the gRPC servers is hardcoded to 10, only up to 10 
> workers are supported: if we pass a direct_num_workers value greater than 10, 
> the pipeline hangs, because not all workers can connect to the runner.
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/fn_api_runner.py#L1141]
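A minimal toy model of why a fixed-size server thread pool can hang, assuming (as a simplification, not a description of the FnApi runner internals) that each connected worker holds one server thread for its whole session and that nothing progresses until every worker is connected. `ThreadPoolExecutor` stands in for the gRPC server's executor and a `threading.Barrier` for "all workers connected"; `all_workers_connect` is a hypothetical helper, and the timeout exists only so the demo returns instead of hanging.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def all_workers_connect(num_workers, max_workers, timeout=1.0):
    """Return True if every simulated worker is served before the timeout.

    Toy model (not Beam code): each worker occupies one pool thread for its
    whole session and waits until all workers are connected.
    """
    barrier = threading.Barrier(num_workers, timeout=timeout)

    def worker_session():
        try:
            barrier.wait()  # blocks while holding a pool thread
            return True
        except threading.BrokenBarrierError:
            return False  # timed out: some workers never got a thread

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(worker_session) for _ in range(num_workers)]
        return all(f.result() for f in futures)


# A pool of 10 threads can serve 10 workers, but not 12.
print(all_workers_connect(10, max_workers=10))               # True
print(all_workers_connect(12, max_workers=10, timeout=0.5))  # False
```

In this model the pool must be at least as large as the number of workers, e.g. by deriving `max_workers` from `direct_num_workers`; that is one plausible reading of the fix, not necessarily the patch that was merged.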





[jira] [Resolved] (BEAM-7873) FnApi with Subprocess runner hangs frequently when running with multi workers with py2

2019-08-12 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou resolved BEAM-7873.
-
Resolution: Fixed

> FnApi with Subprocess runner hangs frequently when running with multi workers 
> with py2
> --
>
> Key: BEAM-7873
> URL: https://issues.apache.org/jira/browse/BEAM-7873
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Blocker
> Fix For: 2.15.0
>
>
> The pipeline hangs at 
> [subprocess.Popen()|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/local_job_service.py#L203]
>  when shutting it down. I looked into the source code of the subprocess library. 
> [py27|https://github.com/enthought/Python-2.7.3/blob/master/Lib/subprocess.py#L1286]
>  doesn't take any lock, while 
> [py3|https://github.com/python/cpython/blob/3.7/Lib/subprocess.py#L1592] 
> locks when waiting. Py3 also added locks in other places in Popen(), and any of 
> the places left unlocked in py2 may contribute to the problem. We can add a 
> lock around calls to Popen() to prevent the deadlock. 
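A minimal sketch of the proposed workaround, serializing process creation behind one shared lock. `_POPEN_LOCK` and `start_worker` are hypothetical names; this illustrates the idea and is not the change that landed in Beam.

```python
import subprocess
import sys
import threading

# Hypothetical module-level lock shared by every thread that spawns workers.
_POPEN_LOCK = threading.Lock()


def start_worker(cmd):
    """Start a subprocess while holding a process-wide lock.

    Serializing Popen() calls means two threads never run the process
    creation code concurrently, which is the idea of the workaround.
    """
    with _POPEN_LOCK:
        return subprocess.Popen(cmd)


# Usage: several threads start short-lived child processes concurrently.
procs = []
threads = [
    threading.Thread(
        target=lambda: procs.append(start_worker([sys.executable, '-c', 'pass'])))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print([p.wait() for p in procs])  # [0, 0, 0, 0]
```

Only process creation is locked here; holding the same lock across `wait()` would block other threads for the child's whole lifetime.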





[jira] [Updated] (BEAM-7866) Python MongoDB IO performance and correctness issues

2019-08-12 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-7866:

Fix Version/s: (was: 2.15.0)
   2.16.0

> Python MongoDB IO performance and correctness issues
> 
>
> Key: BEAM-7866
> URL: https://issues.apache.org/jira/browse/BEAM-7866
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Yichi Zhang
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/mongodbio.py
>  splits the query result by computing the number of results in the constructor, 
> and then, in each reader, re-executing the whole query and taking an index 
> sub-range of those results.
> This is broken in several critical ways:
> - The order of query results returned by find() is not necessarily 
> deterministic, so the idea of index ranges on it is meaningless: each shard 
> may get essentially random, possibly overlapping subsets of the total results.
> - Even if you add order by `_id`, the database may be changing concurrently 
> with reading and splitting. E.g. if the database contained documents with ids 
> 10 20 30 40 50, and this was split into shards 0..2 and 3..5 (under the 
> assumption that these shards would contain 10 20 30 and 40 50, respectively), 
> and then shard 10 20 30 is read and document 25 is inserted, 
> then the 3..5 shard will read 30 40 50, i.e. document 30 is duplicated and 
> document 25 is lost.
> - Every shard re-executes the query and skips the first start_offset items, 
> which has quadratic total complexity.
> - The query is first executed in the constructor in order to count results, 
> which 1) means the constructor can be very slow and 2) means it won't work at 
> all if the database is unavailable at the time the pipeline is constructed 
> (e.g. if this is a template).
> Unfortunately, none of these issues are caught by SourceTestUtils: this source 
> has extensive test coverage with it, and the tests pass. This is because the 
> tests return the same results in the same order. I don't know how to catch the 
> correctness issues automatically, and I don't know how to catch the performance 
> issue automatically, but these would all be important follow-up items after the 
> actual fix.
> CC: [~chamikara] as reviewer.
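A small in-memory simulation of the second bullet, using a Python list of `_id` values as a stand-in for the collection. The helpers are hypothetical, not the mongodbio code. It contrasts index-range splits with splits on precomputed `_id` key ranges; the key-range variant is one common alternative, offered here as an assumption rather than as the fix that was adopted. It still misses document 25, which lands in an already-read range, but no document is read twice.

```python
def read_index_range(collection, start, end):
    """Re-run the query (sorted by _id) and keep index sub-range [start, end)."""
    return sorted(collection)[start:end]


def read_key_range(collection, low, high):
    """Re-run the query with an _id filter: low <= _id < high."""
    return sorted(doc_id for doc_id in collection if low <= doc_id < high)


# Index-range splitting: shard A reads indices 0..2, then 25 is inserted.
db = [10, 20, 30, 40, 50]
shard_a = read_index_range(db, 0, 3)
db.append(25)
shard_b = read_index_range(db, 3, 6)
print(shard_a + shard_b)  # [10, 20, 30, 30, 40, 50]: 30 duplicated, 25 lost

# Key-range splitting on an _id boundary chosen up front (here 35).
db = [10, 20, 30, 40, 50]
shard_a = read_key_range(db, float('-inf'), 35)
db.append(25)
shard_b = read_key_range(db, 35, float('inf'))
print(shard_a + shard_b)  # [10, 20, 30, 40, 50]: no duplicates
```

With an index on `_id`, each key-range reader can also seek directly to its range instead of skipping `start_offset` items, which avoids the quadratic cost from the third bullet.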





[jira] [Updated] (BEAM-7866) Python MongoDB IO performance and correctness issues

2019-08-12 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-7866:

Priority: Major  (was: Blocker)

> Python MongoDB IO performance and correctness issues
> 
>
> Key: BEAM-7866
> URL: https://issues.apache.org/jira/browse/BEAM-7866
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Yichi Zhang
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/mongodbio.py
>  splits the query result by computing the number of results in the constructor, 
> and then, in each reader, re-executing the whole query and taking an index 
> sub-range of those results.
> This is broken in several critical ways:
> - The order of query results returned by find() is not necessarily 
> deterministic, so the idea of index ranges on it is meaningless: each shard 
> may get essentially random, possibly overlapping subsets of the total results.
> - Even if you add order by `_id`, the database may be changing concurrently 
> with reading and splitting. E.g. if the database contained documents with ids 
> 10 20 30 40 50, and this was split into shards 0..2 and 3..5 (under the 
> assumption that these shards would contain 10 20 30 and 40 50, respectively), 
> and then shard 10 20 30 is read and document 25 is inserted, 
> then the 3..5 shard will read 30 40 50, i.e. document 30 is duplicated and 
> document 25 is lost.
> - Every shard re-executes the query and skips the first start_offset items, 
> which has quadratic total complexity.
> - The query is first executed in the constructor in order to count results, 
> which 1) means the constructor can be very slow and 2) means it won't work at 
> all if the database is unavailable at the time the pipeline is constructed 
> (e.g. if this is a template).
> Unfortunately, none of these issues are caught by SourceTestUtils: this source 
> has extensive test coverage with it, and the tests pass. This is because the 
> tests return the same results in the same order. I don't know how to catch the 
> correctness issues automatically, and I don't know how to catch the performance 
> issue automatically, but these would all be important follow-up items after the 
> actual fix.
> CC: [~chamikara] as reviewer.





[jira] [Updated] (BEAM-7952) Make the input queue of the input buffer in Python SDK Harness size limited.

2019-08-12 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-7952:

Fix Version/s: (was: 2.15.0)
   2.16.0

> Make the input queue of the input buffer in Python SDK Harness size limited.
> 
>
> Key: BEAM-7952
> URL: https://issues.apache.org/jira/browse/BEAM-7952
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>
> In the Python SDK harness, the input queue of the input buffer is neither 
> size-limited nor configurable. This may become a problem if data is produced 
> faster than it is consumed.
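A minimal sketch of what a size-limited input queue buys, using the standard library's `queue.Queue(maxsize=...)`, whose `put()` blocks while the queue is full. The producer/consumer pair stands in for the data channel and the bundle processor; `run_bounded` and its `max_size` parameter (in place of a hypothetical config option) are illustrative and do not reflect the harness's actual classes.

```python
import queue
import threading
import time


def run_bounded(num_items, max_size, consume_delay=0.001):
    """Feed a slow consumer through a bounded queue; return the peak backlog.

    put() blocks while the queue already holds max_size items, so a fast
    producer is throttled to the consumer's pace instead of buffering
    without limit.
    """
    q = queue.Queue(maxsize=max_size)
    peak = [0]

    def producer():
        for i in range(num_items):
            q.put(i)  # blocks when the queue is full (backpressure)
            peak[0] = max(peak[0], q.qsize())
        q.put(None)  # sentinel: end of input

    def consumer():
        while q.get() is not None:
            time.sleep(consume_delay)  # simulate slow processing

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak[0]


print(run_bounded(100, max_size=5))  # prints a value no greater than 5
```

With an unbounded queue (`maxsize=0`) the same producer would race ahead and the backlog, and therefore memory use, would grow with the input size.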





[jira] [Updated] (BEAM-7950) Remove the Python 3 warning as it has already been supported

2019-08-12 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-7950:

Fix Version/s: (was: 2.15.0)
   2.16.0

> Remove the Python 3 warning as it has already been supported
> 
>
> Key: BEAM-7950
> URL: https://issues.apache.org/jira/browse/BEAM-7950
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>
> There are warnings that Python 3 is not fully supported in Beam 
> (beam/sdks/python/setup.py). As mentioned on the mailing list, we should remove 
> the Python 3 warning, since Python 3 is now supported as a result of the effort 
> in https://issues.apache.org/jira/browse/BEAM-1251.





[jira] [Updated] (BEAM-7954) Beam should only use guava imports coming from vendored guava

2019-08-12 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-7954:

Fix Version/s: (was: 2.15.0)
   2.16.0

> Beam should only use guava imports coming from vendored guava
> -
>
> Key: BEAM-7954
> URL: https://issues.apache.org/jira/browse/BEAM-7954
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.16.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> gRPC vendors Guava too, and some Beam classes have mistakenly started to use 
> the Guava that comes with the vendored gRPC dependency. We should probably 
> restrict Guava usage to Beam's own vendored Guava dependency.
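A rough, illustrative checker for the rule above: it flags any Java import of `com.google.common.*` classes that does not come from Beam's own vendored Guava package. `find_bad_guava_imports` is hypothetical, and the package names in the example (`org.apache.beam.vendor.guava.v26_0_jre`, `org.apache.beam.vendor.grpc.v1p21p0`) follow Beam's vendoring naming scheme but should be treated as assumptions. A real build would more likely enforce this with a lint or Checkstyle rule.

```python
import re

# Beam's own vendored Guava lives under this prefix (assumed naming scheme).
ALLOWED_PREFIX = 'org.apache.beam.vendor.guava.'
_IMPORT = re.compile(r'^\s*import\s+(?:static\s+)?([\w.]+)')


def find_bad_guava_imports(java_source):
    """Return 1-based line numbers of Guava imports not from vendored Guava."""
    bad = []
    for lineno, line in enumerate(java_source.splitlines(), start=1):
        match = _IMPORT.match(line)
        if not match:
            continue
        name = match.group(1)
        if 'com.google.common.' in name and not name.startswith(ALLOWED_PREFIX):
            bad.append(lineno)
    return bad


example = """package org.example;
import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
import org.apache.beam.vendor.grpc.v1p21p0.com.google.common.base.Preconditions;
import com.google.common.base.Strings;
import java.util.List;
"""
print(find_bad_guava_imports(example))  # [3, 4]
```

As a heuristic, this would also flag non-Guava packages under `com.google.common`, such as Truth's `com.google.common.truth`, which a real rule would need to exclude.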





[jira] [Updated] (BEAM-7951) Allow runner to configure customization WindowedValue coder such as ValueOnlyWindowedValueCoder

2019-08-12 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-7951:

Fix Version/s: (was: 2.15.0)
   2.16.0

> Allow runner to configure customization WindowedValue coder such as 
> ValueOnlyWindowedValueCoder
> ---
>
> Key: BEAM-7951
> URL: https://issues.apache.org/jira/browse/BEAM-7951
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>
> The WindowedValue coder cannot be configured; it is always 
> FullWindowedValueCoder. In Flink we don't need to serialize the timestamp, 
> window, and pane properties, so it would be better to make the coder 
> configurable (e.g. to allow using ValueOnlyWindowedValueCoder).
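A toy illustration of the per-element overhead a value-only coder avoids. The byte layout below (8-byte timestamp, 8-byte window id, 1-byte pane flags) is an assumption chosen for simplicity and is NOT Beam's wire format; `make_encoder` stands in for a hypothetical runner-side setting that selects the coder.

```python
import struct

# Toy layout (NOT Beam's actual wire format): 8-byte timestamp,
# 8-byte window id and 1-byte pane flags in front of the payload.
_METADATA = struct.Struct('>qqB')


def encode_full(payload, timestamp_micros, window_id, pane_flags):
    """Stand-in for a full windowed-value coder: metadata + payload."""
    return _METADATA.pack(timestamp_micros, window_id, pane_flags) + payload


def encode_value_only(payload):
    """Stand-in for a value-only coder: the payload alone."""
    return payload


def make_encoder(value_only):
    """Pick an encoder based on a (hypothetical) runner-side setting."""
    if value_only:
        return encode_value_only
    # Placeholder metadata; a real coder would encode each element's own values.
    return lambda payload: encode_full(payload, 0, 0, 0)


elements = [b'abc'] * 1000
full = sum(len(make_encoder(False)(e)) for e in elements)
lean = sum(len(make_encoder(True)(e)) for e in elements)
print(full, lean)  # 20000 3000
```

The saving only applies when, as described for Flink, the runner genuinely does not need the timestamp, window, and pane information downstream.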





[jira] [Updated] (BEAM-7949) Add time-based cache threshold support in the data service of the Python SDK harness

2019-08-12 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-7949:

Fix Version/s: (was: 2.15.0)
   2.16.0

> Add time-based cache threshold support in the data service of the Python SDK 
> harness
> 
>
> Key: BEAM-7949
> URL: https://issues.apache.org/jira/browse/BEAM-7949
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>
> Currently only a size-based cache threshold is supported in the data service 
> of the Python SDK harness. It should also support a time-based cache threshold. 
> This is very important, especially for streaming jobs, which are sensitive to 
> latency. 
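A minimal sketch of a buffer that flushes on whichever threshold is hit first: element count or age of the oldest buffered element. `ThresholdBuffer` and its parameters are hypothetical and not the harness's data-service API. It is single-threaded for brevity; a real data service would call `maybe_flush_on_time()` from a timer thread and guard the buffer with a lock.

```python
import time


class ThresholdBuffer(object):
    """Buffer elements; flush on a size threshold OR a time threshold."""

    def __init__(self, flush_fn, size_limit, time_limit_secs,
                 clock=time.monotonic):
        self._flush_fn = flush_fn
        self._size_limit = size_limit
        self._time_limit = time_limit_secs
        self._clock = clock
        self._buffer = []
        self._oldest = None  # time the oldest buffered element arrived

    def add(self, element):
        if not self._buffer:
            self._oldest = self._clock()
        self._buffer.append(element)
        if len(self._buffer) >= self._size_limit:
            self.flush()  # size-based threshold

    def maybe_flush_on_time(self):
        if self._buffer and self._clock() - self._oldest >= self._time_limit:
            self.flush()  # time-based threshold

    def flush(self):
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._flush_fn(batch)


# Usage with a fake clock so the example is deterministic.
now = [0.0]
sent = []
buf = ThresholdBuffer(sent.append, size_limit=3, time_limit_secs=1.0,
                      clock=lambda: now[0])
for element in ['a', 'b', 'c', 'd']:
    buf.add(element)  # 'a', 'b', 'c' flush on size; 'd' stays buffered
now[0] = 1.5
buf.maybe_flush_on_time()  # 'd' has waited 1.5s >= 1.0s, so it is flushed
print(sent)  # [['a', 'b', 'c'], ['d']]
```

Without the time threshold, `'d'` would sit in the buffer until two more elements arrived, which is the latency problem described for streaming jobs.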





[jira] [Updated] (BEAM-7947) Improves the interfaces of classes such as FnDataService, BundleProcessor, ActiveBundle, etc to change the parameter type from WindowedValue to T

2019-08-12 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-7947:

Fix Version/s: (was: 2.15.0)
   2.16.0

> Improves the interfaces of classes such as FnDataService, BundleProcessor, 
> ActiveBundle, etc to change the parameter type from WindowedValue to T
> 
>
> Key: BEAM-7947
> URL: https://issues.apache.org/jira/browse/BEAM-7947
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>
> Both `Coder<WindowedValue<T>>` and `FnDataReceiver<WindowedValue<T>>` use 
> `WindowedValue<T>` as the data structure that both the Runner and the SDK 
> Harness understand. The Control Plane, Data Plane, State Plane, and Logging form 
> a high-level abstraction, and some of them, such as the Control Plane and 
> Logging, are common requirements for all multi-language platforms. For example, 
> the Flink community is also discussing how to support Python UDFs, as well as 
> how to deal with the Docker environment, data transfer, state access, and 
> logging. If Beam can further abstract these service interfaces, i.e., make the 
> interface definitions compatible with multiple engines, and finally provide 
> them to other projects in the form of class libraries, it will definitely help 
> other platforms that want to support multiple languages. As a starting point 
> for discussion, take the FnDataService#receive interface as an example, and 
> turn `WindowedValue<T>` into `T` so that other platforms can extend it 
> arbitrarily, as follows:
> {code}
>  InboundDataClient receive(LogicalEndpoint inputLocation, Coder coder, 
> FnDataReceiver> listener);
> {code}





[jira] [Updated] (BEAM-7948) Add time-based cache threshold support in the Java data service

2019-08-12 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-7948:

Fix Version/s: (was: 2.15.0)
   2.16.0

> Add time-based cache threshold support in the Java data service
> ---
>
> Key: BEAM-7948
> URL: https://issues.apache.org/jira/browse/BEAM-7948
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>
> Currently only a size-based cache threshold is supported in the data service. 
> It should also support a time-based cache threshold. This is very important, 
> especially for streaming jobs, which are sensitive to latency.





[jira] [Updated] (BEAM-7945) Allow runner to configure "semi_persist_dir" which is used in the SDK harness

2019-08-12 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-7945:

Fix Version/s: (was: 2.15.0)
   2.16.0

> Allow runner to configure "semi_persist_dir" which is used in the SDK harness
> -
>
> Key: BEAM-7945
> URL: https://issues.apache.org/jira/browse/BEAM-7945
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution, sdk-go, sdk-java-core, sdk-py-core
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>
> Currently "semi_persist_dir" is not configurable. This may become a problem 
> in certain scenarios. For example, the default value of "semi_persist_dir" is 
> "/tmp" 
> ([https://github.com/apache/beam/blob/master/sdks/python/container/boot.go#L48])
>  in the Python SDK harness. When the environment type is "PROCESS", the 
> "/tmp" disk may fill up, causing unexpected issues in production 
> environments. We should provide a way to configure "semi_persist_dir" in 
> EnvironmentFactory on the runner side. 
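A minimal Python analogue of the requested behavior, assuming the harness reads a `--semi_persist_dir` flag that defaults to "/tmp" and the runner side passes it through when launching a PROCESS environment. Both helpers and the boot path `/opt/beam/boot` are hypothetical; the real boot program is the Go file linked above, and the real runner-side hook would live in EnvironmentFactory.

```python
import argparse


def parse_semi_persist_dir(argv):
    """Harness side: read --semi_persist_dir, defaulting to /tmp."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--semi_persist_dir', default='/tmp')
    args, _ = parser.parse_known_args(argv)
    return args.semi_persist_dir


def build_process_command(boot_binary, semi_persist_dir=None):
    """Runner side: pass the directory through only when one is configured."""
    cmd = [boot_binary]
    if semi_persist_dir:
        cmd.append('--semi_persist_dir=' + semi_persist_dir)
    return cmd


cmd = build_process_command('/opt/beam/boot', '/mnt/beam-scratch')
print(parse_semi_persist_dir(cmd[1:]))  # /mnt/beam-scratch
print(parse_semi_persist_dir([]))       # /tmp
```

Keeping "/tmp" as the default preserves current behavior for users who do not set the option.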





[jira] [Updated] (BEAM-7944) Improvements of portability framework to make it usable in other projects

2019-08-12 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-7944:

Fix Version/s: (was: 2.15.0)
   2.16.0

> Improvements of portability framework to make it usable in other projects
> -
>
> Key: BEAM-7944
> URL: https://issues.apache.org/jira/browse/BEAM-7944
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core, sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Labels: portability
> Fix For: 2.16.0
>
>
> The Flink community will use Beam's portability framework for its 
> multi-language support, such as Python UDF execution. This is an umbrella 
> JIRA which tracks all the improvement requirements collected from the Flink 
> community.
> Details of the discussion can be found in [1].
> [1] 
> [https://lists.apache.org/list.html?d...@beam.apache.org:lte=1M:%5BDISCUSS%5D%20Turn%20%60WindowedValue]





[jira] [Updated] (BEAM-7476) Datastore write failures with "[Errno 32] Broken pipe"

2019-08-09 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-7476:

Fix Version/s: (was: 2.15.0)
   2.16.0

> Datastore write failures with "[Errno 32] Broken pipe"
> --
>
> Key: BEAM-7476
> URL: https://issues.apache.org/jira/browse/BEAM-7476
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Affects Versions: 2.11.0
> Environment: dataflow python 2.7
>Reporter: Dmytro Sadovnychyi
>Assignee: Dmytro Sadovnychyi
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.16.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We are getting lots of Broken pipe errors, and it's only a matter of luck for a 
> write to succeed. It's been happening for months.
> Partial stack trace:
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/helper.py",
>  line 225, in commit
> response = datastore.commit(request)
>   File 
> "/usr/local/lib/python2.7/dist-packages/googledatastore/connection.py", line 
> 140, in commit
> datastore_pb2.CommitResponse)
>   File 
> "/usr/local/lib/python2.7/dist-packages/googledatastore/connection.py", line 
> 199, in _call_method
> method='POST', body=payload, headers=headers)
>   File "/usr/local/lib/python2.7/dist-packages/oauth2client/transport.py", 
> line 169, in new_request
> redirections, connection_type)
>   File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 
> 1609, in request
> (response, content) = self._request(conn, authority, uri, request_uri, 
> method, body, headers, redirections, cachekey)
>   File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 
> 1351, in _request
> (response, content) = self._conn_request(conn, request_uri, method, body, 
> headers)
>   File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 
> 1273, in _conn_request
> conn.request(method, request_uri, body, headers)
>   File "/usr/lib/python2.7/httplib.py", line 1042, in request
> self._send_request(method, url, body, headers)
>   File "/usr/lib/python2.7/httplib.py", line 1082, in _send_request
> self.endheaders(body)
>   File "/usr/lib/python2.7/httplib.py", line 1038, in endheaders
> self._send_output(message_body)
>   File "/usr/lib/python2.7/httplib.py", line 882, in _send_output
> self.send(msg)
>   File "/usr/lib/python2.7/httplib.py", line 858, in send
> self.sock.sendall(data)
>   File "/usr/lib/python2.7/ssl.py", line 753, in sendall
> v = self.send(data[count:])
>   File "/usr/lib/python2.7/ssl.py", line 719, in send
> v = self._sslobj.write(data)
> RuntimeError: error: [Errno 32] Broken pipe [while running 'Groups to 
> datastore/Write Mutation to Datastore']
> Workaround: https://github.com/apache/beam/pull/8346
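The linked workaround is not described in this thread, so the following is not its code. It is a generic sketch of retrying a commit with exponential backoff when it fails with EPIPE (errno 32, as in the trace above). `call_with_retries` and `flaky_commit` are hypothetical names; the sketch assumes `fn` opens a fresh connection on each call, since a broken pipe means the old socket is unusable.

```python
import errno
import socket
import time


def call_with_retries(fn, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff on EPIPE (broken pipe).

    Other errors, and the final failed attempt, are re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except socket.error as e:  # alias of OSError on Python 3
            if e.errno != errno.EPIPE or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Usage with a fake commit that fails twice with a broken pipe.
calls = []


def flaky_commit():
    calls.append(1)
    if len(calls) < 3:
        raise socket.error(errno.EPIPE, 'Broken pipe')
    return 'committed'


print(call_with_retries(flaky_commit, sleep=lambda seconds: None))  # committed
```

Retrying is only safe if the commit is idempotent or the caller can tolerate a mutation being applied twice.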





[jira] [Commented] (BEAM-7476) Datastore write failures with "[Errno 32] Broken pipe"

2019-08-09 Thread yifan zou (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904116#comment-16904116
 ] 

yifan zou commented on BEAM-7476:
-

moved to 2.16.0

> Datastore write failures with "[Errno 32] Broken pipe"
> --
>
> Key: BEAM-7476
> URL: https://issues.apache.org/jira/browse/BEAM-7476
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Affects Versions: 2.11.0
> Environment: dataflow python 2.7
>Reporter: Dmytro Sadovnychyi
>Assignee: Dmytro Sadovnychyi
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 2.16.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We are getting lots of Broken pipe errors, and it's only a matter of luck for a 
> write to succeed. It's been happening for months.
> Partial stack trace:
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/helper.py",
>  line 225, in commit
> response = datastore.commit(request)
>   File 
> "/usr/local/lib/python2.7/dist-packages/googledatastore/connection.py", line 
> 140, in commit
> datastore_pb2.CommitResponse)
>   File 
> "/usr/local/lib/python2.7/dist-packages/googledatastore/connection.py", line 
> 199, in _call_method
> method='POST', body=payload, headers=headers)
>   File "/usr/local/lib/python2.7/dist-packages/oauth2client/transport.py", 
> line 169, in new_request
> redirections, connection_type)
>   File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 
> 1609, in request
> (response, content) = self._request(conn, authority, uri, request_uri, 
> method, body, headers, redirections, cachekey)
>   File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 
> 1351, in _request
> (response, content) = self._conn_request(conn, request_uri, method, body, 
> headers)
>   File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 
> 1273, in _conn_request
> conn.request(method, request_uri, body, headers)
>   File "/usr/lib/python2.7/httplib.py", line 1042, in request
> self._send_request(method, url, body, headers)
>   File "/usr/lib/python2.7/httplib.py", line 1082, in _send_request
> self.endheaders(body)
>   File "/usr/lib/python2.7/httplib.py", line 1038, in endheaders
> self._send_output(message_body)
>   File "/usr/lib/python2.7/httplib.py", line 882, in _send_output
> self.send(msg)
>   File "/usr/lib/python2.7/httplib.py", line 858, in send
> self.sock.sendall(data)
>   File "/usr/lib/python2.7/ssl.py", line 753, in sendall
> v = self.send(data[count:])
>   File "/usr/lib/python2.7/ssl.py", line 719, in send
> v = self._sslobj.write(data)
> RuntimeError: error: [Errno 32] Broken pipe [while running 'Groups to 
> datastore/Write Mutation to Datastore']
> Workaround: https://github.com/apache/beam/pull/8346





[jira] [Updated] (BEAM-7906) Perf regression in SQL Query3 in Dataflow

2019-08-09 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-7906:

Fix Version/s: (was: 2.15.0)
   2.16.0

> Perf regression in SQL Query3 in Dataflow
> -
>
> Key: BEAM-7906
> URL: https://issues.apache.org/jira/browse/BEAM-7906
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql, runner-dataflow
>Reporter: Anton Kedin
>Priority: Major
> Fix For: 2.16.0
>
> Attachments: dataflow.png, direct.png
>
>
> Nexmark shows a perf regression in SQL Query3 starting on July 30, 2019: 
> https://apache-beam-testing.appspot.com/explore?dashboard=5670405876482048
> There don't seem to be many changes to SQL around that date, and the one 
> that was there doesn't seem relevant to the query: 
> https://github.com/apache/beam/commits/master/sdks/java/extensions/sql
> Direct runner shows a slight perf decrease as well: 
> https://apache-beam-testing.appspot.com/explore?dashboard=5084698770407424 
> while Spark runner doesn't: 
> https://apache-beam-testing.appspot.com/explore?dashboard=5138380291571712
> The query in question is a join with a simple filter condition: 
> https://github.com/apache/beam/blob/b8aa8486f336df6fc9cf581f29040194edad3b87/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/sql/SqlQuery3.java#L69
> Other queries don't seem to be affected.





[jira] [Updated] (BEAM-7906) Perf regression in SQL Query3 in Dataflow

2019-08-09 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-7906:

Priority: Major  (was: Blocker)

> Perf regression in SQL Query3 in Dataflow
> -
>
> Key: BEAM-7906
> URL: https://issues.apache.org/jira/browse/BEAM-7906
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql, runner-dataflow
>Reporter: Anton Kedin
>Priority: Major
> Fix For: 2.15.0
>
> Attachments: dataflow.png, direct.png
>
>
> Nexmark shows a perf regression in SQL Query3 starting on July 30, 2019: 
> https://apache-beam-testing.appspot.com/explore?dashboard=5670405876482048
> There don't seem to be many changes to SQL around that date, and the one 
> that was there doesn't seem relevant to the query: 
> https://github.com/apache/beam/commits/master/sdks/java/extensions/sql
> Direct runner shows a slight perf decrease as well: 
> https://apache-beam-testing.appspot.com/explore?dashboard=5084698770407424 
> while Spark runner doesn't: 
> https://apache-beam-testing.appspot.com/explore?dashboard=5138380291571712
> The query in question is a join with a simple filter condition: 
> https://github.com/apache/beam/blob/b8aa8486f336df6fc9cf581f29040194edad3b87/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/sql/SqlQuery3.java#L69
> Other queries don't seem to be affected.





[jira] [Commented] (BEAM-7906) Perf regression in SQL Query3 in Dataflow

2019-08-09 Thread yifan zou (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904042#comment-16904042
 ] 

yifan zou commented on BEAM-7906:
-

Downgrade to major and move it to 2.16.0.

> Perf regression in SQL Query3 in Dataflow
> -
>
> Key: BEAM-7906
> URL: https://issues.apache.org/jira/browse/BEAM-7906
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql, runner-dataflow
>Reporter: Anton Kedin
>Priority: Blocker
> Fix For: 2.15.0
>
> Attachments: dataflow.png, direct.png
>
>
> Nexmark shows a perf regression in SQL Query3 starting on July 30, 2019: 
> https://apache-beam-testing.appspot.com/explore?dashboard=5670405876482048
> There don't seem to be many changes to SQL around that date, and the one 
> that was there doesn't seem relevant to the query: 
> https://github.com/apache/beam/commits/master/sdks/java/extensions/sql
> Direct runner shows a slight perf decrease as well: 
> https://apache-beam-testing.appspot.com/explore?dashboard=5084698770407424 
> while Spark runner doesn't: 
> https://apache-beam-testing.appspot.com/explore?dashboard=5138380291571712
> The query in question is a join with a simple filter condition: 
> https://github.com/apache/beam/blob/b8aa8486f336df6fc9cf581f29040194edad3b87/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/sql/SqlQuery3.java#L69
> Other queries don't seem to be affected.





[jira] [Commented] (BEAM-7860) v1new ReadFromDatastore returns duplicates if keys are of mixed types

2019-08-07 Thread yifan zou (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902396#comment-16902396
 ] 

yifan zou commented on BEAM-7860:
-

Great, thanks!

> v1new ReadFromDatastore returns duplicates if keys are of mixed types
> -
>
> Key: BEAM-7860
> URL: https://issues.apache.org/jira/browse/BEAM-7860
> Project: Beam
>  Issue Type: Bug
>  Components: io-python-gcp
>Affects Versions: 2.13.0
> Environment: Python 2.7
> Python 3.7
>Reporter: Niels Stender
>Assignee: Udi Meiri
>Priority: Blocker
> Fix For: 2.15.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In the presence of mixed type keys, v1new ReadFromDatastore may return 
> duplicate items. The attached example returns 4 records, not the expected 3.
>  
> {code:python}
> from __future__ import unicode_literals
> import apache_beam as beam
> from apache_beam.io.gcp.datastore.v1new.types import Key, Entity, Query
> from apache_beam.io.gcp.datastore.v1new import datastoreio
> config = dict(project='your-google-project', namespace='test')
> def test_mixed():
>     keys = [
>         Key(['mixed', '10038260-iperm_eservice'], **config),
>         Key(['mixed', 4812224868188160], **config),
>         Key(['mixed', '99152975-pointshop'], **config)
>     ]
>     entities = map(lambda key: Entity(key=key), keys)
>     with beam.Pipeline() as p:
>         (p
>          | beam.Create(entities)
>          | datastoreio.WriteToDatastore(project=config['project'])
>         )
>     query = Query(kind='mixed', **config)
>     with beam.Pipeline() as p:
>         (p
>          | datastoreio.ReadFromDatastore(query=query, num_splits=4)
>          | beam.io.WriteToText('tmp.txt', num_shards=1,
>                                shard_name_template='')
>         )
>     items = open('tmp.txt').read().strip().split('\n')
>     assert len(items) == 3, 'incorrect number of items'
> {code}



[jira] [Commented] (BEAM-7860) v1new ReadFromDatastore returns duplicates if keys are of mixed types

2019-08-06 Thread yifan zou (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901244#comment-16901244
 ] 

yifan zou commented on BEAM-7860:
-

Any ETA on this?

> v1new ReadFromDatastore returns duplicates if keys are of mixed types
> -
>
> Key: BEAM-7860
> URL: https://issues.apache.org/jira/browse/BEAM-7860
> Project: Beam
>  Issue Type: Bug
>  Components: io-python-gcp
>Affects Versions: 2.13.0
> Environment: Python 2.7
> Python 3.7
>Reporter: Niels Stender
>Assignee: Udi Meiri
>Priority: Blocker
> Fix For: 2.15.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> In the presence of mixed type keys, v1new ReadFromDatastore may return 
> duplicate items. The attached example returns 4 records, not the expected 3.
>  
> {code:python}
> from __future__ import unicode_literals
> import apache_beam as beam
> from apache_beam.io.gcp.datastore.v1new.types import Key, Entity, Query
> from apache_beam.io.gcp.datastore.v1new import datastoreio
>
> config = dict(project='your-google-project', namespace='test')
>
>
> def test_mixed():
>     # Three keys of the same kind; the last path element mixes string names and an integer ID.
>     keys = [
>         Key(['mixed', '10038260-iperm_eservice'], **config),
>         Key(['mixed', 4812224868188160], **config),
>         Key(['mixed', '99152975-pointshop'], **config),
>     ]
>     entities = [Entity(key=key) for key in keys]
>     with beam.Pipeline() as p:
>         (p
>          | beam.Create(entities)
>          | datastoreio.WriteToDatastore(project=config['project']))
>
>     query = Query(kind='mixed', **config)
>     with beam.Pipeline() as p:
>         (p
>          | datastoreio.ReadFromDatastore(query=query, num_splits=4)
>          | beam.io.WriteToText('tmp.txt', num_shards=1,
>                                shard_name_template=''))
>
>     # Three entities were written, so exactly three lines should be read back.
>     with open('tmp.txt') as f:
>         items = f.read().strip().split('\n')
>     assert len(items) == 3, 'incorrect number of items'
> {code}



[jira] [Commented] (BEAM-7866) Python MongoDB IO performance and correctness issues

2019-08-06 Thread yifan zou (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901243#comment-16901243
 ] 

yifan zou commented on BEAM-7866:
-

Any ETA on this? [~yichi]

> Python MongoDB IO performance and correctness issues
> 
>
> Key: BEAM-7866
> URL: https://issues.apache.org/jira/browse/BEAM-7866
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Yichi Zhang
>Priority: Blocker
> Fix For: 2.15.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/mongodbio.py
>  splits the query result by computing the number of results in the constructor, and 
> then in each reader re-executing the whole query and getting an index 
> sub-range of those results.
> This is broken in several critical ways:
> - The order of query results returned by find() is not necessarily 
> deterministic, so the idea of index ranges on it is meaningless: each shard 
> may basically get random, possibly overlapping subsets of the total results
> - Even if you add order by `_id`, the database may be changing concurrently 
> with reading and splitting. E.g. if the database contained documents with ids 
> 10 20 30 40 50, and this was split into shards 0..2 and 3..5 (under the 
> assumption that these shards would contain respectively 10 20 30, and 40 50), 
> and then suppose shard 10 20 30 is read and then document 25 is inserted - 
> then the 3..5 shard will read 30 40 50, i.e. document 30 is duplicated and 
> document 25 is lost.
> - Every shard re-executes the query and skips the first start_offset items, 
> which in total is quadratic complexity
> - The query is first executed in the constructor in order to count results, 
> which 1) means the constructor can be super slow and 2) it won't work at all 
> if the database is unavailable at the time the pipeline is constructed (e.g. 
> if this is a template).
> Unfortunately, none of these issues are caught by SourceTestUtils: the source 
> is extensively tested with it, and the tests pass. This is because in the tests 
> the query returns the same results in the same order. I don't know how to catch this 
> automatically, and I don't know how to catch the performance issue 
> automatically, but these would all be important follow-up items after the 
> actual fix.
> CC: [~chamikara] as reviewer.
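The duplication/loss scenario and the repeated-skip cost described above can be reproduced with a toy in-memory model. This is not the `mongodbio` code and uses no MongoDB client: documents are plain integer `_id`s, and `read_by_offset`, `read_by_id_range`, and `total_skipped` are hypothetical helpers. Splitting by `_id` ranges is shown only as one common alternative to index ranges, not necessarily the fix adopted for this issue.

```python
def read_by_offset(docs, start, stop=None):
    """Offset-based shard: re-run the whole query (sorted by _id) and slice it."""
    return sorted(docs)[start:stop]


def read_by_id_range(docs, lo, hi=None):
    """Key-range shard: documents with lo <= _id < hi; hi=None means unbounded."""
    return sorted(d for d in docs if d >= lo and (hi is None or d < hi))


def total_skipped(num_docs, num_shards):
    """Documents skipped in total when every shard re-runs the query from offset 0."""
    shard_size = num_docs // num_shards
    return sum(i * shard_size for i in range(num_shards))


def simulate():
    docs = {10, 20, 30, 40, 50}
    # Splits are fixed up front: first shard holds 10 20 30, second holds 40 50.
    by_offset = read_by_offset(docs, 0, 3)
    by_range = read_by_id_range(docs, 10, 40)
    docs.add(25)  # concurrent insert after the first shard was read
    by_offset += read_by_offset(docs, 3)
    by_range += read_by_id_range(docs, 40)
    return by_offset, by_range


by_offset, by_range = simulate()
print(by_offset)  # [10, 20, 30, 30, 40, 50]: 30 duplicated, 25 lost
print(by_range)   # [10, 20, 30, 40, 50]: no duplicates
print(total_skipped(1000, 10))  # 4500
```

With key ranges, document 25 is still not seen, because it was inserted into a range that had already been read, but no document is read twice. The skip count grows as `shard_size * k * (k - 1) / 2` for `k` shards, which is the quadratic cost the issue mentions.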



[jira] [Commented] (BEAM-7833) warn user when --region flag is not explicitly set

2019-08-06 Thread yifan zou (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16901241#comment-16901241
 ] 

yifan zou commented on BEAM-7833:
-

The PR has been approved. Do you have an ETA for checking it in? [~ibzib]

> warn user when --region flag is not explicitly set
> --
>
> Key: BEAM-7833
> URL: https://issues.apache.org/jira/browse/BEAM-7833
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
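The issue has no description beyond its title. A minimal sketch of the behaviour the title asks for, assuming argparse-style options (Beam's own `PipelineOptions` handling is not shown) and assuming `us-central1` as the fallback region; `resolve_region` and `DEFAULT_REGION` are hypothetical names.

```python
import argparse
import logging

# Assumption for illustration: the region used when --region is not given.
DEFAULT_REGION = "us-central1"


def resolve_region(argv):
    """Return the region, logging a warning when --region was not passed explicitly."""
    parser = argparse.ArgumentParser()
    # default=None (instead of the fallback value) lets us tell "unset" from "set".
    parser.add_argument("--region", default=None)
    args, _ = parser.parse_known_args(argv)
    if args.region is None:
        logging.warning(
            "--region not set; defaulting to %s. Set --region explicitly to "
            "choose where the job runs.", DEFAULT_REGION)
        return DEFAULT_REGION
    return args.region
```

The key design choice is defaulting the flag to `None`, so the code can distinguish a user who chose the default region from one who never set the flag.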




[jira] [Updated] (BEAM-7874) FnApi only supports up to 10 workers

2019-08-02 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-7874:

Priority: Blocker  (was: Major)

> FnApi only supports up to 10 workers
> 
>
> Key: BEAM-7874
> URL: https://issues.apache.org/jira/browse/BEAM-7874
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Blocker
> Fix For: 2.15.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Because max_workers of the gRPC servers is hardcoded to 10, it only supports up 
> to 10 workers: if we pass a direct_num_workers value greater than 10, the 
> pipeline hangs, because not all workers get connected to the runner.
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/fn_api_runner.py#L1141]
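Why a fixed pool of 10 threads cannot serve more than 10 long-lived worker connections can be shown with a toy simulation using only `threading` and `concurrent.futures` (no gRPC). Each simulated connection holds its handler thread until every worker has connected, a stand-in for the runner waiting on the full set of workers. `all_workers_connect` is a hypothetical helper. In gRPC Python the thread limit is whatever executor is passed to `grpc.server(futures.ThreadPoolExecutor(max_workers=...))`, so sizing it from `direct_num_workers` is one plausible direction, stated here as an assumption.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def all_workers_connect(num_workers, max_threads, timeout=2.0):
    """Simulate num_workers long-lived connections served by a pool of max_threads.

    Returns True only if every simulated worker managed to connect.
    """
    # Each connection holds its thread until all workers have arrived.
    barrier = threading.Barrier(num_workers)

    def handle_connection(_):
        try:
            barrier.wait(timeout=timeout)
            return True
        except threading.BrokenBarrierError:
            # Timed out, or the barrier was already broken: not every worker got a thread.
            return False

    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        results = list(pool.map(handle_connection, range(num_workers)))
    return all(results)


print(all_workers_connect(11, max_threads=11))               # True
print(all_workers_connect(11, max_threads=10, timeout=0.2))  # False
```

In the real runner there is no timeout, so the 11th worker simply waits forever, which matches the hang described above.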



[jira] [Updated] (BEAM-7873) FnApi with Subprocess runner hangs frequently when running with multi workers with py2

2019-08-02 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-7873:

Priority: Blocker  (was: Major)

> FnApi with Subprocess runner hangs frequently when running with multi workers 
> with py2
> --
>
> Key: BEAM-7873
> URL: https://issues.apache.org/jira/browse/BEAM-7873
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Blocker
> Fix For: 2.15.0
>
>
> Pipeline hangs at 
> [subprocess.Popen()|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/local_job_service.py#L203]
>  when shutting it down. I looked into the source code of the subprocess lib. 
> [py27|https://github.com/enthought/Python-2.7.3/blob/master/Lib/subprocess.py#L1286]
>  doesn't do any lock while 
> [py3|https://github.com/python/cpython/blob/3.7/Lib/subprocess.py#L1592] 
> locks when waiting. Py3 also added locks at other places in Popen(), and any of 
> the places left unlocked in py2 may contribute to the problem. We can add a lock 
> when calling Popen() to prevent the deadlock. 
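The proposed fix, holding a lock around `Popen()`, can be sketched with the standard library alone. `_POPEN_LOCK`, `start_worker_process`, and `run_concurrently` are hypothetical names, not code from `local_job_service.py`, and whether this fully removes the py2 hang is not established here. The lock serializes only process creation; `communicate()` runs outside it so a slow child does not block other threads.

```python
import subprocess
import sys
import threading

# Hypothetical module-level lock that serializes process creation across threads.
_POPEN_LOCK = threading.Lock()


def start_worker_process(cmd):
    """Start a subprocess while holding the lock, so Popen() calls never overlap."""
    with _POPEN_LOCK:
        return subprocess.Popen(cmd, stdout=subprocess.PIPE)


def run_concurrently(num_threads):
    """Start num_threads child processes from separate threads and collect their output."""
    outputs = []
    outputs_lock = threading.Lock()

    def target():
        proc = start_worker_process([sys.executable, "-c", "print('ok')"])
        out, _ = proc.communicate()  # waiting happens outside the Popen lock
        with outputs_lock:
            outputs.append(out.strip())

    threads = [threading.Thread(target=target) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return outputs


print(run_concurrently(4))  # [b'ok', b'ok', b'ok', b'ok']
```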



[jira] [Commented] (BEAM-7789) :beam-test-tools project fails to build locally

2019-08-02 Thread yifan zou (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898965#comment-16898965
 ] 

yifan zou commented on BEAM-7789:
-

It failed to build on Jenkins build workers as well. 
[https://scans.gradle.com/s/na72ytsc2r65u/failure#top=1]

> :beam-test-tools project fails to build locally
> ---
>
> Key: BEAM-7789
> URL: https://issues.apache.org/jira/browse/BEAM-7789
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Anton Kedin
>Priority: Major
>
> Running the release-verification build (global build of everything) in turn 
> triggers the build of the `beam-test-tools` project, which has some test 
> infrastructure scripts that we run on Jenkins. It seems to work fine on 
> Jenkins. However, running the build of the project locally fails: 
> https://scans.gradle.com/s/kqhkzyozbpiua/console-log#L6
> What seems to happen is the gradle vendoring plugin caches the dependencies 
> locally, but fails to cache simplelru.
> One workaround (based on ./gradlew :beam-test-tools:showGopathGoroot):
> {code}
> export GOPATH=$PWD/.test-infra/tools/.gogradle/project_gopath
> go get github.com/hashicorp/golang-lru/simplelru
> ./gradlew :beam-test-tools:build
> {code}
> It is able to find `lrumap` and `simplelru` during the dependency 
> resolution step, and I can see them mentioned in a couple of artifacts produced 
> by the `gogradle` plugin. But when it runs `:installDepedencies` to actually 
> copy them to the `vendor` directory, this specific package is missing. This 
> reproduces for me on a couple of different machines I tried, on both the release 
> and master branches.



[jira] [Commented] (BEAM-7833) warn user when --region flag is not explicitly set

2019-08-01 Thread yifan zou (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898279#comment-16898279
 ] 

yifan zou commented on BEAM-7833:
-

ACK

> warn user when --region flag is not explicitly set
> --
>
> Key: BEAM-7833
> URL: https://issues.apache.org/jira/browse/BEAM-7833
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




[jira] [Resolved] (BEAM-7797) DataflowRunner's upload_graph feature doesn't reduce template file size

2019-07-31 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou resolved BEAM-7797.
-
Resolution: Fixed

> DataflowRunner's upload_graph feature doesn't reduce template file size
> ---
>
> Key: BEAM-7797
> URL: https://issues.apache.org/jira/browse/BEAM-7797
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Yunqing Zhou
>Assignee: Yunqing Zhou
>Priority: Minor
> Fix For: 2.15.0
>
>   Original Estimate: 48h
>  Time Spent: 50m
>  Remaining Estimate: 47h 10m
>
> Dataflow Runner has a feature to save the job graph separately, reducing the size 
> of the request to Dataflow API.
> Since templates are simply dumps of requests to the API, sizes of template 
> files should be reduced as well.
>  
> However, the logic that removes parts from the request ran after template 
> file generation, so template files were not reduced.
>  
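A minimal sketch of the ordering problem described above, using plain dictionaries in place of the real Dataflow request objects. `build_request`, the `graph_location` field, and the GCS path are hypothetical; the point is only that the template has to be serialized from the request after the graph has been removed.

```python
import json

# Hypothetical field name used only for this sketch.
GRAPH_LOCATION_FIELD = "graph_location"


def build_request(job, upload_graph):
    """Return the API request body; with upload_graph, steps are replaced by a reference."""
    request = dict(job)  # shallow copy so the caller's job is not modified
    if upload_graph:
        request.pop("steps", None)
        request[GRAPH_LOCATION_FIELD] = "gs://example-bucket/graph.json"  # hypothetical path
    return request


def write_template_buggy(job, upload_graph):
    """Ordering from the issue: the template is dumped before the request is reduced."""
    template = json.dumps(job)
    build_request(job, upload_graph)  # reduction happens too late to affect the template
    return template


def write_template(job, upload_graph):
    """Reduce the request first, then dump it, so the template shrinks as well."""
    return json.dumps(build_request(job, upload_graph))


job = {"name": "example", "steps": [{"name": "s%d" % i} for i in range(100)]}
print(len(write_template_buggy(job, True)), len(write_template(job, True)))
```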



[jira] [Updated] (BEAM-7303) Move Portable Runner and other of reference runner.

2019-07-31 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-7303:

Fix Version/s: (was: 2.15.0)
   2.16.0

> Move Portable Runner and other of reference runner.
> ---
>
> Key: BEAM-7303
> URL: https://issues.apache.org/jira/browse/BEAM-7303
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.16.0
>
>
> PortableRunner is used by all of Flink, Spark, ... . 
> We should move it out of the Reference Runner package to streamline the 
> dependencies.



[jira] [Commented] (BEAM-7840) Create MapTuple and FlatMapTuple to ease migration to Python 3.

2019-07-30 Thread yifan zou (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896576#comment-16896576
 ] 

yifan zou commented on BEAM-7840:
-

Is this still planned for the 2.15 release?

> Create MapTuple and FlatMapTuple to ease migration to Python 3.
> ---
>
> Key: BEAM-7840
> URL: https://issues.apache.org/jira/browse/BEAM-7840
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> These are like Map and FlatMap but expand out tuple input elements across
> several arguments. This will be useful as tuple argument unpacking has been
> removed in Python 3. Instead of having to convert
> Map(lambda (k, v): expression(k, v))
> into
> Map(lambda k_v: expression(k_v[0], k_v[1]))
> one can now write
> MapTuple(lambda k, v: expression(k, v))
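The semantics of the proposed transforms can be sketched in plain Python, outside Beam: each tuple element is spread across the function's positional arguments. `map_tuple` and `flat_map_tuple` are hypothetical helpers that illustrate the idea on an in-memory list; they are not Beam's implementation of `MapTuple` / `FlatMapTuple`.

```python
def map_tuple(fn, elements):
    """Apply fn to each tuple element, unpacking the tuple into fn's arguments."""
    return [fn(*element) for element in elements]


def flat_map_tuple(fn, elements):
    """Like map_tuple, but fn returns an iterable whose items are flattened."""
    return [out for element in elements for out in fn(*element)]


pairs = [("a", 1), ("b", 2)]
print(map_tuple(lambda k, v: k * v, pairs))           # ['a', 'bb']
print(flat_map_tuple(lambda k, v: [k] * v, pairs))    # ['a', 'b', 'b']
```

This is exactly what `lambda (k, v): ...` used to do implicitly in Python 2, without the `k_v[0]` / `k_v[1]` indexing.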



[jira] [Updated] (BEAM-7223) Add ValidatesRunner test suite for Flink on Python 3.

2019-07-30 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-7223:

Fix Version/s: (was: 2.15.0)
   2.16.0

> Add ValidatesRunner test suite for Flink on Python 3.
> -
>
> Key: BEAM-7223
> URL: https://issues.apache.org/jira/browse/BEAM-7223
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Ankur Goenka
>Assignee: Frederik Bode
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Add py3 integration tests for Flink



[jira] [Commented] (BEAM-7223) Add ValidatesRunner test suite for Flink on Python 3.

2019-07-30 Thread yifan zou (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896574#comment-16896574
 ] 

yifan zou commented on BEAM-7223:
-

This does not seem to be a blocker for the upcoming 2.15 release. I'll change the 
fix version to 2.16. 

> Add ValidatesRunner test suite for Flink on Python 3.
> -
>
> Key: BEAM-7223
> URL: https://issues.apache.org/jira/browse/BEAM-7223
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-flink
>Reporter: Ankur Goenka
>Assignee: Frederik Bode
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Add py3 integration tests for Flink



[jira] [Commented] (BEAM-7303) Move Portable Runner and other of reference runner.

2019-07-30 Thread yifan zou (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896572#comment-16896572
 ] 

yifan zou commented on BEAM-7303:
-

Will this be checked in for the 2.15 release?

> Move Portable Runner and other of reference runner.
> ---
>
> Key: BEAM-7303
> URL: https://issues.apache.org/jira/browse/BEAM-7303
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.15.0
>
>
> PortableRunner is used by all of Flink, Spark, ... . 
> We should move it out of the Reference Runner package to streamline the 
> dependencies.



[jira] [Commented] (BEAM-7730) Add Flink 1.9 build target and Make FlinkRunner compatible with Flink 1.9

2019-07-30 Thread yifan zou (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896569#comment-16896569
 ] 

yifan zou commented on BEAM-7730:
-

Changed the fix version to 2.16 since this is not a blocker for the upcoming release.

> Add Flink 1.9 build target and Make FlinkRunner compatible with Flink 1.9
> -
>
> Key: BEAM-7730
> URL: https://issues.apache.org/jira/browse/BEAM-7730
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.15.0
>
>
> Apache Flink 1.9 is coming, and it would be good to add a Flink 1.9 build target 
> and make the Flink Runner compatible with Flink 1.9.
> I will add a brief description of the changes after Flink 1.9.0 is released. 
> I'd appreciate any suggestions or comments!



[jira] [Updated] (BEAM-7730) Add Flink 1.9 build target and Make FlinkRunner compatible with Flink 1.9

2019-07-30 Thread yifan zou (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-7730:

Fix Version/s: (was: 2.15.0)
   2.16.0

> Add Flink 1.9 build target and Make FlinkRunner compatible with Flink 1.9
> -
>
> Key: BEAM-7730
> URL: https://issues.apache.org/jira/browse/BEAM-7730
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.16.0
>
>
> Apache Flink 1.9 is coming, and it would be good to add a Flink 1.9 build target 
> and make the Flink Runner compatible with Flink 1.9.
> I will add a brief description of the changes after Flink 1.9.0 is released. 
> I'd appreciate any suggestions or comments!


