[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=300067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300067 ]

ASF GitHub Bot logged work on BEAM-7013:
Author: ASF GitHub Bot
Created on: 23/Aug/19 06:09
Start Date: 23/Aug/19 06:09
Worklog Time Spent: 10m

Work Description: robinyqiu commented on issue #9144: [BEAM-7013] Integrating ZetaSketch's HLL++ algorithm with Beam
URL: https://github.com/apache/beam/pull/9144#issuecomment-524185713

    Run Java PostCommit

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 300067)
Time Spent: 27h 10m (was: 27h)

> A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
> Issue Type: New Feature
> Components: extensions-java-sketching, sdk-java-core
> Reporter: Yueyang Qiu
> Assignee: Yueyang Qiu
> Priority: Major
> Fix For: 2.16.0
> Time Spent: 27h 10m
> Remaining Estimate: 0h

--
This message was sent by Atlassian Jira (v8.3.2#803003)
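For context on the feature being tracked above: HyperLogLog-family sketches estimate the number of distinct elements in fixed memory. The toy Python estimator below illustrates the core idea only; it is not Beam's HllCount transform nor the BigQuery-compatible ZetaSketch HLL++ implementation the issue integrates, and the class and parameter names are invented for illustration.

```python
import hashlib
import math

class HyperLogLog:
    """Toy HyperLogLog estimator (illustrative only)."""

    def __init__(self, p=14):
        self.p = p                  # use 2**p registers
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant (valid for m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value):
        # 64-bit hash of the element.
        h = int.from_bytes(hashlib.sha1(str(value).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                 # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        z = sum(2.0 ** -r for r in self.registers)
        e = self.alpha * self.m * self.m / z
        # Small-range correction: fall back to linear counting.
        zeros = self.registers.count(0)
        if e <= 2.5 * self.m and zeros:
            e = self.m * math.log(self.m / zeros)
        return int(e)
```

With p=14 (16384 registers, a few KB of state), estimates for tens of thousands of distinct items typically land within a couple of percent of the true count, which is why such sketches can replace exact `Count.perKey`-style aggregations at scale.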
[jira] [Commented] (BEAM-6114) SQL join selection should be done in planner, not in expansion to PTransform
[ https://issues.apache.org/jira/browse/BEAM-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913924#comment-16913924 ]

Rui Wang commented on BEAM-6114:

[~rahul8383] Regarding:

// Should we throw Exception when joinType is LEFT (or) RIGHT (or) FULL?

My perspective: for the sake of simplicity, we could allow only a single triggering (as CoGBK does). That way we can still allow LEFT/RIGHT/FULL OUTER joins. The problem with multiple triggerings is how to refine already-emitted data: an outer join could emit results at the first triggering and then have to emit again later to refine them, which would require retractions.

It also sounds like a good idea to split the javadoc of BeamJoinRel. Thanks for bringing it up.

> SQL join selection should be done in planner, not in expansion to PTransform
>
> Key: BEAM-6114
> URL: https://issues.apache.org/jira/browse/BEAM-6114
> Project: Beam
> Issue Type: Improvement
> Components: dsl-sql
> Reporter: Kenneth Knowles
> Assignee: Rahul Patwari
> Priority: Major
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> Currently Beam SQL joins all go through a single physical operator which has a single PTransform that does all join algorithms based on properties of its input PCollections as well as the relational algebra.
> A first step is to make the needed information part of the relational algebra, so it can choose a PTransform based on that, and the PTransforms can be simpler.
> Second step is to have separate (physical) relational operators for different join algorithms.
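The retraction problem described in the comment above can be made concrete with a toy, non-Beam simulation. Once a LEFT OUTER join fires before the right side has arrived, it emits a null-padded row; when the match arrives, a later firing must retract that row before emitting the refined one. Everything below (single key, event list, `+`/`-` markers) is an illustrative assumption, not Beam's join implementation.

```python
# Toy event-at-a-time LEFT OUTER join over a single join key, showing
# why firing more than once forces retractions ("-" rows).

def left_outer_join_stream(events):
    """events: (side, value) pairs, side in {'left', 'right', 'fire'}.
    Yields ('+', row) for new results and ('-', row) for retractions."""
    lefts, rights, emitted = [], [], set()
    for side, value in events:
        if side == "fire":
            # Current correct result: matches, or null-padded lefts.
            rows = ({(l, r) for l in lefts for r in rights}
                    or {(l, None) for l in lefts})
            for row in rows - emitted:
                yield ("+", row)      # newly produced result
            for row in emitted - rows:
                yield ("-", row)      # stale early result must be retracted
            emitted = rows
        elif side == "left":
            lefts.append(value)
        else:
            rights.append(value)

out = list(left_outer_join_stream([
    ("left", "a"), ("fire", None),    # right side not seen yet
    ("right", "b"), ("fire", None),   # refined result arrives
]))
```

The first firing emits `('a', None)`; the second must retract it before the pipeline's downstream state is correct, which is exactly the capability (retractions) Beam lacked at the time. Allowing only a single triggering sidesteps the problem entirely.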
[jira] [Commented] (BEAM-6114) SQL join selection should be done in planner, not in expansion to PTransform
[ https://issues.apache.org/jira/browse/BEAM-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913871#comment-16913871 ]

Rahul Patwari commented on BEAM-6114:

Hi [~amaliujia]

What are your thoughts about https://github.com/apache/beam/blob/cacb9310b0223683ae6bea0637d2e0077ebee1de/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSideInputLookupJoinRel.java#L52 ?

I am planning to move the Javadoc in https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamJoinRel.java to the respective JoinRels.
[jira] [Closed] (BEAM-8038) Python Precommit fail: 'BeamFnExternalWorkerPoolServicer' has no attribute '_worker_processes'
[ https://issues.apache.org/jira/browse/BEAM-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Weise closed BEAM-8038.
Fix Version/s: Not applicable
Resolution: Fixed

> Python Precommit fail: 'BeamFnExternalWorkerPoolServicer' has no attribute '_worker_processes'
>
> Key: BEAM-8038
> URL: https://issues.apache.org/jira/browse/BEAM-8038
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-harness, test-failures
> Reporter: Ahmet Altay
> Assignee: Thomas Weise
> Priority: Critical
> Fix For: Not applicable
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Logs: https://builds.apache.org/job/beam_PreCommit_Python_Commit/8246/console
>
> 10:14:09 XML: /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/nosetests.xml
> 10:14:09 Ran 2594 tests in 629.438s
> 10:14:09 OK (SKIP=520)
> 10:14:09 Error in atexit._run_exitfuncs:
> 10:14:09 Traceback (most recent call last):
> 10:14:09   File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
> 10:14:09     func(*targs, **kargs)
> 10:14:09   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/worker_pool_main.py", line 72, in kill_worker_processes
> 10:14:09     for worker_process in cls._worker_processes.values():
> 10:14:09 AttributeError: type object 'BeamFnExternalWorkerPoolServicer' has no attribute '_worker_processes'
> 10:14:09 Error in sys.exitfunc:
> 10:14:09 Traceback (most recent call last):
> 10:14:09   File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
> 10:14:09     func(*targs, **kargs)
> 10:14:09   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/worker_pool_main.py", line 72, in kill_worker_processes
> 10:14:09     for worker_process in cls._worker_processes.values():
> 10:14:09 AttributeError: type object 'BeamFnExternalWorkerPoolServicer' has no attribute '_worker_processes'
> 10:14:10 py27-cython run-test-post: commands[0] | /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/scripts/run_tox_cleanup.sh
> 10:14:10 ___ summary
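The traceback above is an atexit hook iterating over a class attribute that was never initialized in that process. One defensive pattern for this shape of bug is to tolerate the attribute being absent or unset in the cleanup hook. The sketch below is illustrative only; the class and method names mirror the traceback but this is not the actual Beam fix.

```python
import atexit

class WorkerPoolServicer:
    """Illustrative stand-in for BeamFnExternalWorkerPoolServicer."""

    # May never be populated if no worker pool was started in this process.
    _worker_processes = None

    @classmethod
    def start_worker(cls, worker_id, process):
        if cls._worker_processes is None:
            cls._worker_processes = {}
        cls._worker_processes[worker_id] = process

    @classmethod
    def kill_worker_processes(cls):
        # getattr with a default tolerates the attribute being absent or
        # None, which is exactly what the traceback above tripped over.
        for process in (getattr(cls, "_worker_processes", None) or {}).values():
            process.terminate()

# Registering the guarded hook is now safe even if no worker ever starts.
atexit.register(WorkerPoolServicer.kill_worker_processes)
```

Calling `kill_worker_processes()` with nothing registered is a no-op instead of an AttributeError at interpreter shutdown.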
[jira] [Work logged] (BEAM-8079) Move verify_release_build.sh to Jenkins job
[ https://issues.apache.org/jira/browse/BEAM-8079?focusedWorklogId=299870&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299870 ]

ASF GitHub Bot logged work on BEAM-8079:
Author: ASF GitHub Bot
Created on: 23/Aug/19 02:19
Start Date: 23/Aug/19 02:19
Worklog Time Spent: 10m

Work Description: markflyhigh commented on pull request #9411: [BEAM-8079] Move release Gradle build to a Jenkins job (Part - 1)
URL: https://github.com/apache/beam/pull/9411

Reusing an existing Jenkins machine to verify the release Gradle build gets rid of the painful environment setup in `verify_release_build.sh`. Making it a Jenkins job also removes the platform restriction: the original environment setup was specific to Linux-like systems.

+R: @yifanzou

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

- [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
- [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
[jira] [Work logged] (BEAM-7616) urlopen calls could get stuck without a timeout
[ https://issues.apache.org/jira/browse/BEAM-7616?focusedWorklogId=299867&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299867 ]

ASF GitHub Bot logged work on BEAM-7616:
Author: ASF GitHub Bot
Created on: 23/Aug/19 02:11
Start Date: 23/Aug/19 02:11
Worklog Time Spent: 10m

Work Description: aaltay commented on issue #9401: [BEAM-7616] apitools use urllib with the global timeout. Set it to 60 seconds to prevent network related stuckness issues.
URL: https://github.com/apache/beam/pull/9401#issuecomment-524144582

    Run Python 2 PostCommit

Issue Time Tracking
-------------------
Worklog Id: (was: 299867)
Time Spent: 3h 40m (was: 3.5h)

> urlopen calls could get stuck without a timeout
>
> Key: BEAM-7616
> URL: https://issues.apache.org/jira/browse/BEAM-7616
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Ahmet Altay
> Assignee: Udi Meiri
> Priority: Blocker
> Fix For: 2.14.0, 2.16.0
> Time Spent: 3h 40m
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-7616) urlopen calls could get stuck without a timeout
[ https://issues.apache.org/jira/browse/BEAM-7616?focusedWorklogId=299866&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299866 ]

ASF GitHub Bot logged work on BEAM-7616:
Author: ASF GitHub Bot
Created on: 23/Aug/19 02:10
Start Date: 23/Aug/19 02:10
Worklog Time Spent: 10m

Work Description: aaltay commented on issue #9401: [BEAM-7616] apitools use urllib with the global timeout. Set it to 60 seconds to prevent network related stuckness issues.
URL: https://github.com/apache/beam/pull/9401#issuecomment-524144776

    Run Python 2 PostCommit

Worklog Id: (was: 299866)
Time Spent: 3.5h (was: 3h 20m)
[jira] [Work logged] (BEAM-7616) urlopen calls could get stuck without a timeout
[ https://issues.apache.org/jira/browse/BEAM-7616?focusedWorklogId=299865&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299865 ]

ASF GitHub Bot logged work on BEAM-7616:
Author: ASF GitHub Bot
Created on: 23/Aug/19 02:10
Start Date: 23/Aug/19 02:10
Worklog Time Spent: 10m

Work Description: aaltay commented on issue #9401: [BEAM-7616] apitools use urllib with the global timeout. Set it to 60 seconds to prevent network related stuckness issues.
URL: https://github.com/apache/beam/pull/9401#issuecomment-524144582

    Run Python 2 PostCommit

Worklog Id: (was: 299865)
Time Spent: 3h 20m (was: 3h 10m)
[jira] [Work logged] (BEAM-7616) urlopen calls could get stuck without a timeout
[ https://issues.apache.org/jira/browse/BEAM-7616?focusedWorklogId=299864&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299864 ]

ASF GitHub Bot logged work on BEAM-7616:
Author: ASF GitHub Bot
Created on: 23/Aug/19 02:09
Start Date: 23/Aug/19 02:09
Worklog Time Spent: 10m

Work Description: aaltay commented on issue #9401: [BEAM-7616] apitools use urllib with the global timeout. Set it to 60 seconds to prevent network related stuckness issues.
URL: https://github.com/apache/beam/pull/9401#issuecomment-524144582

    Run Python PostCommit

Worklog Id: (was: 299864)
Time Spent: 3h 10m (was: 3h)
[jira] [Work logged] (BEAM-7616) urlopen calls could get stuck without a timeout
[ https://issues.apache.org/jira/browse/BEAM-7616?focusedWorklogId=299863&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299863 ]

ASF GitHub Bot logged work on BEAM-7616:
Author: ASF GitHub Bot
Created on: 23/Aug/19 02:09
Start Date: 23/Aug/19 02:09
Worklog Time Spent: 10m

Work Description: aaltay commented on issue #9401: [BEAM-7616] apitools use urllib with the global timeout. Set it to 60 seconds to prevent network related stuckness issues.
URL: https://github.com/apache/beam/pull/9401#issuecomment-524144557

    Run Python PreCommit

Worklog Id: (was: 299863)
Time Spent: 3h (was: 2h 50m)
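The fix referenced in the PR title above relies on Python's process-wide default socket timeout, which sockets created without an explicit timeout (including those opened by `urlopen` deep inside a library such as apitools) pick up automatically. A minimal sketch of that approach follows; 60 seconds is the value mentioned in the PR, the `fetch` helper is a hypothetical illustration, and the Python 3 `urllib.request` module stands in for the Python 2 `urllib`/`urllib2` the Beam SDK used at the time.

```python
import socket

# Any subsequently created socket that has no explicit timeout will now
# raise socket.timeout after 60 seconds instead of blocking forever.
socket.setdefaulttimeout(60)

import urllib.request

def fetch(url):
    # No timeout argument: the 60-second global default applies, so a
    # stuck server can no longer hang the process indefinitely.
    with urllib.request.urlopen(url) as resp:
        return resp.read()
```

The trade-off of a global default is that it affects every socket in the process, not just the library's; passing an explicit `timeout=` per call is more surgical when the call sites are under your control.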
[jira] [Work logged] (BEAM-7864) Portable spark Reshuffle coder cast exception
[ https://issues.apache.org/jira/browse/BEAM-7864?focusedWorklogId=299849&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299849 ]

ASF GitHub Bot logged work on BEAM-7864:
Author: ASF GitHub Bot
Created on: 23/Aug/19 01:38
Start Date: 23/Aug/19 01:38
Worklog Time Spent: 10m

Work Description: ibzib commented on issue #9410: [BEAM-7864] fix Spark reshuffle translation with Python SDK
URL: https://github.com/apache/beam/pull/9410#issuecomment-524138861

    Run Java Spark PortableValidatesRunner Batch

Worklog Id: (was: 299849)
Time Spent: 0.5h (was: 20m)

> Portable spark Reshuffle coder cast exception
>
> Key: BEAM-7864
> URL: https://issues.apache.org/jira/browse/BEAM-7864
> Project: Beam
> Issue Type: Bug
> Components: runner-spark
> Reporter: Kyle Weaver
> Assignee: Kyle Weaver
> Priority: Major
> Labels: portability-spark
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Running :sdks:python:test-suites:portable:py35:portableWordCountBatch in either loopback or docker mode on master fails with exception:
>
> java.lang.ClassCastException: org.apache.beam.sdk.coders.LengthPrefixCoder cannot be cast to org.apache.beam.sdk.coders.KvCoder
>   at org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translateReshuffle(SparkBatchPortablePipelineTranslator.java:400)
>   at org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translate(SparkBatchPortablePipelineTranslator.java:147)
>   at org.apache.beam.runners.spark.SparkPipelineRunner.lambda$run$1(SparkPipelineRunner.java:96)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
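The ClassCastException above comes from assuming an element coder is a KV coder when, for cross-language pipelines, it can arrive wrapped (here, in a length-prefix coder). A general defensive pattern is to unwrap known wrappers and check the type before committing to the cast. The sketch below uses hypothetical Python classes standing in for Beam's Java coders; it illustrates the pattern, not the actual fix merged in PR #9410.

```python
# Hypothetical coder hierarchy mirroring the Java names in the stack trace.

class Coder:
    pass

class KvCoder(Coder):
    def __init__(self, key_coder, value_coder):
        self.key_coder = key_coder
        self.value_coder = value_coder

class LengthPrefixCoder(Coder):
    def __init__(self, inner):
        self.inner = inner

def as_kv_coder(coder):
    """Unwrap length-prefix wrappers, then verify the type, instead of
    blindly casting as the failing translation did."""
    while isinstance(coder, LengthPrefixCoder):
        coder = coder.inner
    if not isinstance(coder, KvCoder):
        raise TypeError(f"expected a KV coder, got {type(coder).__name__}")
    return coder
```

Raising a descriptive TypeError at translation time is still a failure, but a diagnosable one; the wrapped-coder unwrap handles the case the Python SDK actually produces.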
[jira] [Work logged] (BEAM-7864) Portable spark Reshuffle coder cast exception
[ https://issues.apache.org/jira/browse/BEAM-7864?focusedWorklogId=299846&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299846 ]

ASF GitHub Bot logged work on BEAM-7864:
Author: ASF GitHub Bot
Created on: 23/Aug/19 01:30
Start Date: 23/Aug/19 01:30
Worklog Time Spent: 10m

Work Description: ibzib commented on pull request #9410: [BEAM-7864] fix Spark reshuffle translation with Python SDK
URL: https://github.com/apache/beam/pull/9410

The previous implementation of reshuffle on the portable Spark runner made assumptions about its inputs that proved false when running some Python pipelines. The new translation is more general, which fixes that.
[jira] [Work logged] (BEAM-7864) Portable spark Reshuffle coder cast exception
[ https://issues.apache.org/jira/browse/BEAM-7864?focusedWorklogId=299847&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299847 ]

ASF GitHub Bot logged work on BEAM-7864:
Author: ASF GitHub Bot
Created on: 23/Aug/19 01:30
Start Date: 23/Aug/19 01:30
Worklog Time Spent: 10m

Work Description: ibzib commented on issue #9410: [BEAM-7864] fix Spark reshuffle translation with Python SDK
URL: https://github.com/apache/beam/pull/9410#issuecomment-524137681

    Run Java Spark PortableValidatesRunner Batch

Worklog Id: (was: 299847)
Time Spent: 20m (was: 10m)
[jira] [Created] (BEAM-8079) Move verify_release_build.sh to Jenkins job
Mark Liu created BEAM-8079:

Summary: Move verify_release_build.sh to Jenkins job
Key: BEAM-8079
URL: https://issues.apache.org/jira/browse/BEAM-8079
Project: Beam
Issue Type: Sub-task
Components: build-system
Reporter: Mark Liu
Assignee: Mark Liu

verify_release_build.sh is used for validation after the release branch is cut. Basically it does two things:
1. Verify the Gradle build with -PisRelease turned on.
2. Create a PR and run all PostCommit jobs against the release branch.

However, release managers have hit many pain points when running this script:
1. The extensive environment setup and tooling installation easily break the script.
2. Running the Gradle build locally takes an extremely long time.
3. Auto PR creation (using hub) doesn't work.

We can move the Gradle build to Jenkins in order to get rid of the environment setup work.
[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python
[ https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=299826&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299826 ]

ASF GitHub Bot logged work on BEAM-7886:
Author: ASF GitHub Bot
Created on: 23/Aug/19 00:13
Start Date: 23/Aug/19 00:13
Worklog Time Spent: 10m

Work Description: reuvenlax commented on issue #9188: [BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-524124310

    Trying to think of a better name than PortableSchemaCoder, but I guess this is fine for now.

Worklog Id: (was: 299826)
Time Spent: 8.5h (was: 8h 20m)

> Make row coder a standard coder and implement in python
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
> Issue Type: Improvement
> Components: beam-model, sdk-java-core, sdk-py-core
> Reporter: Brian Hulette
> Assignee: Brian Hulette
> Priority: Major
> Time Spent: 8.5h
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=299822&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299822 ]

ASF GitHub Bot logged work on BEAM-7013:
Author: ASF GitHub Bot
Created on: 22/Aug/19 23:33
Start Date: 22/Aug/19 23:33
Worklog Time Spent: 10m

Work Description: robinyqiu commented on issue #9144: [BEAM-7013] Integrating ZetaSketch's HLL++ algorithm with Beam
URL: https://github.com/apache/beam/pull/9144#issuecomment-524116749

    Run Java PostCommit

Worklog Id: (was: 299822)
Time Spent: 27h (was: 26h 50m)
[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=299820&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299820 ] ASF GitHub Bot logged work on BEAM-7013: Author: ASF GitHub Bot Created on: 22/Aug/19 23:28 Start Date: 22/Aug/19 23:28 Worklog Time Spent: 10m Work Description: robinyqiu commented on issue #9144: [BEAM-7013] Integrating ZetaSketch's HLL++ algorithm with Beam URL: https://github.com/apache/beam/pull/9144#issuecomment-524115723 I have made the change such that the BQ tables needed for testing are now created before the tests and deleted after them. PTAL. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299820) Time Spent: 26h 50m (was: 26h 40m) > A new count distinct transform based on BigQuery compatible HyperLogLog++ > implementation > > > Key: BEAM-7013 > URL: https://issues.apache.org/jira/browse/BEAM-7013 > Project: Beam > Issue Type: New Feature > Components: extensions-java-sketching, sdk-java-core >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Fix For: 2.16.0 > > Time Spent: 26h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=299819&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299819 ] ASF GitHub Bot logged work on BEAM-7013: Author: ASF GitHub Bot Created on: 22/Aug/19 23:27 Start Date: 22/Aug/19 23:27 Worklog Time Spent: 10m Work Description: robinyqiu commented on pull request #9144: [BEAM-7013] Integrating ZetaSketch's HLL++ algorithm with Beam URL: https://github.com/apache/beam/pull/9144#discussion_r316924297 ## File path: sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java ## @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.zetasketch; + +import com.google.api.services.bigquery.model.TableFieldSchema; +import com.google.api.services.bigquery.model.TableRow; +import com.google.api.services.bigquery.model.TableSchema; +import java.nio.ByteBuffer; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.coders.ByteArrayCoder; +import org.apache.beam.sdk.extensions.gcp.options.GcpOptions; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.TypedRead.Method; +import org.apache.beam.sdk.io.gcp.bigquery.SchemaAndRecord; +import org.apache.beam.sdk.io.gcp.testing.BigqueryMatcher; +import org.apache.beam.sdk.options.ApplicationNameOptions; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.testing.TestPipelineOptions; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.values.PCollection; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** + * Integration tests for HLL++ sketch compatibility between Beam and BigQuery. The tests verify + * that HLL++ sketches created in Beam can be processed by BigQuery, and vice versa. + */ +@RunWith(JUnit4.class) +public class BigQueryHllSketchCompatibilityIT { + + private static final String DATASET_NAME = "zetasketch_compatibility_test"; + + // Table for testReadSketchFromBigQuery() + // Schema: only one STRING field named "data". 
+ // Content: prepopulated with 4 rows: "Apple", "Orange", "Banana", "Orange" + private static final String DATA_TABLE_NAME = "hll_data"; + private static final String DATA_FIELD_NAME = "data"; + private static final String QUERY_RESULT_FIELD_NAME = "sketch"; + private static final Long EXPECTED_COUNT = 3L; + + // Table for testWriteSketchToBigQuery() + // Schema: only one BYTES field named "sketch". + // Content: will be overridden by the sketch computed by the test pipeline each time the test runs + private static final String SKETCH_TABLE_NAME = "hll_sketch"; + private static final String SKETCH_FIELD_NAME = "sketch"; + private static final List<String> TEST_DATA = + Arrays.asList("Apple", "Orange", "Banana", "Orange"); + // SHA-1 hash of string "[3]", the string representation of a row that has only one field 3 in it + private static final String EXPECTED_CHECKSUM = "f1e31df9806ce94c5bdbbfff9608324930f4d3f1"; + + /** + * Tests that an HLL++ sketch computed in BigQuery can be processed by Beam. The HLL sketch is computed by + * {@code HLL_COUNT.INIT} in BigQuery and read into Beam; the test verifies that we can run {@link + * HllCount.MergePartial} and {@link HllCount.Extract} on the sketch in Beam to get the correct + * estimated count. + */ + @Test + public void testReadSketchFromBigQuery() { Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking ---
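The compatibility test quoted above exercises the two properties that make HLL++ useful in Beam: a sketch built over data yields an estimated distinct count, and partial sketches can be merged (`HllCount.MergePartial`) without changing the result. A minimal, self-contained HyperLogLog in Java illustrates both properties. This is an illustration only, not ZetaSketch's HLL++ (no sparse representation, no bias-correction tables, no BigQuery-compatible serialization), and the splitmix64-style hash finalizer is an arbitrary choice:

```java
// Minimal HyperLogLog for illustration only: 2^P registers, a splitmix64-style
// hash finalizer, and the standard small-range (linear counting) correction.
// NOT ZetaSketch's HLL++; its sketches are not byte-compatible with BigQuery's.
class MiniHll {
    static final int P = 10;                 // precision: 2^10 = 1024 registers
    static final int M = 1 << P;
    final byte[] registers = new byte[M];

    static long hash(String s) {             // splitmix64 finalizer over hashCode()
        long z = s.hashCode() * 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    void add(String item) {
        long h = hash(item);
        int idx = (int) (h >>> (64 - P));    // top P bits select a register
        byte rank = (byte) (Long.numberOfLeadingZeros(h << P) + 1);
        if (rank > registers[idx]) registers[idx] = rank;
    }

    void merge(MiniHll other) {              // MergePartial analogue: max per register
        for (int i = 0; i < M; i++) {
            registers[i] = (byte) Math.max(registers[i], other.registers[i]);
        }
    }

    double estimate() {                      // Extract analogue
        double sum = 0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2, -r);
            if (r == 0) zeros++;
        }
        double e = (0.7213 / (1 + 1.079 / M)) * M * M / sum;
        if (e <= 2.5 * M && zeros > 0) {     // small cardinalities: linear counting
            e = M * Math.log((double) M / zeros);
        }
        return e;
    }

    public static void main(String[] args) {
        MiniHll whole = new MiniHll();
        MiniHll left = new MiniHll();
        MiniHll right = new MiniHll();
        for (int i = 0; i < 1000; i++) {
            String item = "item" + i;
            whole.add(item);
            (i < 500 ? left : right).add(item);
        }
        left.merge(right);                   // merging halves reproduces the full sketch
        System.out.println("whole estimate:  " + Math.round(whole.estimate()));
        System.out.println("merged estimate: " + Math.round(left.estimate()));
    }
}
```

Because merging takes a per-register maximum, the merged sketch over two halves of the data has exactly the same registers as one sketch over all of it, which is why `MergePartial` followed by `Extract` gives the same count the test expects.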
[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=299818&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299818 ] ASF GitHub Bot logged work on BEAM-7013: Author: ASF GitHub Bot Created on: 22/Aug/19 23:27 Start Date: 22/Aug/19 23:27 Worklog Time Spent: 10m Work Description: robinyqiu commented on pull request #9144: [BEAM-7013] Integrating ZetaSketch's HLL++ algorithm with Beam URL: https://github.com/apache/beam/pull/9144#discussion_r316924234 ## File path: sdks/java/extensions/zetasketch/build.gradle ## @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import groovy.json.JsonOutput + +plugins { id 'org.apache.beam.module' } +applyJavaNature() + +description = "Apache Beam :: SDKs :: Java :: Extensions :: ZetaSketch" + +def zetasketch_version = "0.1.0" + +dependencies { +compile library.java.vendored_guava_26_0_jre +compile project(path: ":sdks:java:core", configuration: "shadow") +compile "com.google.zetasketch:zetasketch:$zetasketch_version" +testCompile library.java.junit +testCompile project(":sdks:java:io:google-cloud-platform") +testRuntimeOnly project(":runners:direct-java") +testRuntimeOnly project(":runners:google-cloud-dataflow-java") +} + +/** + * Integration tests running on Dataflow with BigQuery. + */ +task integrationTest(type: Test) { +group = "Verification" +def gcpProject = project.findProperty('gcpProject') ?: 'apache-beam-testing' +def gcpTempRoot = project.findProperty('gcpTempRoot') ?: 'gs://temp-storage-for-end-to-end-tests' +systemProperty "beamTestPipelineOptions", JsonOutput.toJson([ +"--runner=TestDataflowRunner", +"--project=${gcpProject}", +"--tempRoot=${gcpTempRoot}", +]) Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299818) Time Spent: 26.5h (was: 26h 20m) > A new count distinct transform based on BigQuery compatible HyperLogLog++ > implementation > > > Key: BEAM-7013 > URL: https://issues.apache.org/jira/browse/BEAM-7013 > Project: Beam > Issue Type: New Feature > Components: extensions-java-sketching, sdk-java-core >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Fix For: 2.16.0 > > Time Spent: 26.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-7864) Portable spark Reshuffle coder cast exception
[ https://issues.apache.org/jira/browse/BEAM-7864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913794#comment-16913794 ] Kyle Weaver commented on BEAM-7864: --- The underlying coder to the LengthPrefixCoder is ByteArrayCoder, which is the fallback because we have unknown coder URN "beam:coder:pickled_python:v1". The reshuffle transform is just receiving an array of bytes, which have been presumably pickled somehow. We will need to unpickle them if we want to separate keys and values. I'm not sure if that's possible. > Portable spark Reshuffle coder cast exception > - > > Key: BEAM-7864 > URL: https://issues.apache.org/jira/browse/BEAM-7864 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > > running :sdks:python:test-suites:portable:py35:portableWordCountBatch in > either loopback or docker mode on master fails with exception: > > java.lang.ClassCastException: org.apache.beam.sdk.coders.LengthPrefixCoder > cannot be cast to org.apache.beam.sdk.coders.KvCoder > at > org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translateReshuffle(SparkBatchPortablePipelineTranslator.java:400) > at > org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translate(SparkBatchPortablePipelineTranslator.java:147) > at > org.apache.beam.runners.spark.SparkPipelineRunner.lambda$run$1(SparkPipelineRunner.java:96) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian Jira (v8.3.2#803003)
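The failure above comes from the translator unconditionally casting the input coder to `KvCoder` when the runner has actually substituted a length-prefixed byte fallback for the unknown `beam:coder:pickled_python:v1` URN. The pattern can be sketched with simplified stand-in classes; the names mirror Beam's coder classes, but these are not the real implementations, and `requireKvCoder` is a hypothetical helper:

```java
// Illustration only: simplified stand-ins for Beam's coder classes, showing why
// the translator's unchecked cast fails and what a defensive check looks like.
class CoderCastDemo {
    public static void main(String[] args) {
        // What the Spark translator received for the Python pipeline: an
        // unknown-URN payload wrapped as length-prefixed raw bytes.
        Coder received = new LengthPrefixCoder(new ByteArrayCoder());
        try {
            requireKvCoder(received);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }

    // A blind `(KvCoder) coder` cast throws ClassCastException here; checking
    // first turns it into an actionable error. The keys and values inside the
    // byte payload can only be separated by the SDK harness that pickled them.
    static KvCoder requireKvCoder(Coder coder) {
        if (coder instanceof KvCoder) {
            return (KvCoder) coder;
        }
        throw new IllegalArgumentException(
            "Reshuffle requires a KvCoder but got "
                + coder.getClass().getSimpleName()
                + "; elements are opaque bytes to the runner");
    }
}

interface Coder {}

class ByteArrayCoder implements Coder {}

class LengthPrefixCoder implements Coder {
    final Coder valueCoder;
    LengthPrefixCoder(Coder valueCoder) { this.valueCoder = valueCoder; }
}

class KvCoder implements Coder {
    final Coder keyCoder;
    final Coder valueCoder;
    KvCoder(Coder keyCoder, Coder valueCoder) {
        this.keyCoder = keyCoder;
        this.valueCoder = valueCoder;
    }
}
```

The check does not fix the underlying issue (the runner still cannot split pickled bytes into keys and values), but it replaces the opaque `ClassCastException` with a message that names the actual mismatch.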
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299812&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299812 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 23:21 Start Date: 22/Aug/19 23:21 Worklog Time Spent: 10m Work Description: kmjung commented on issue #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#issuecomment-524114002 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299812) Time Spent: 2h 20m (was: 2h 10m) > Allow specifying BigQuery Storage API readOptions at runtime > > > Key: BEAM-8023 > URL: https://issues.apache.org/jira/browse/BEAM-8023 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Jeff Klukas >Assignee: Kenneth Jung >Priority: Minor > Time Spent: 2h 20m > Remaining Estimate: 0h > > We have support in the Java SDK for using the BigQuery Storage API for reads, > but only the target query or table is supported as a ValueProvider to be > specified at runtime. AFAICT, there is no reason we can't delay specifying > readOptions until runtime as well. > The readOptions are accessed by BigQueryStorageTableSource in getTargetTable; > I believe that's occurring at runtime, but I'd love for someone with deeper > BoundedSource knowledge to confirm that. > I'd advocate for adding new methods > `TypedRead.withSelectedFields(ValueProvider<List<String>> value)` and > `TypedRead.withRowRestriction(ValueProvider<String> value)`. The existing > `withReadOptions` method would then populate the other two as > StaticValueProviders. 
Perhaps we'd want to deprecate `withReadOptions` in favor of specifying individual read options as separate parameters. -- This message was sent by Atlassian Jira (v8.3.2#803003)
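The proposal above — keep the eager `withReadOptions` but have it populate the new deferred fields as StaticValueProviders — can be sketched with simplified stand-in types. These are not Beam's real `ValueProvider` or `TypedRead`; `TypedReadSketch` and its fields are hypothetical, mirroring only the shape of the Jira proposal:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Sketch under simplified stand-in types (NOT Beam's real classes): the eager
// setter wraps its arguments in static providers, while new setters accept
// values that only become readable at pipeline run time.
class ValueProviderSketch {
    interface ValueProvider<T> {
        T get();
        boolean isAccessible();              // known at construction time?
    }

    static class StaticValueProvider<T> implements ValueProvider<T> {
        private final T value;
        StaticValueProvider(T value) { this.value = value; }
        public T get() { return value; }
        public boolean isAccessible() { return true; }
    }

    static class RuntimeValueProvider<T> implements ValueProvider<T> {
        private final Supplier<T> runtimeLookup;   // resolved when the pipeline runs
        RuntimeValueProvider(Supplier<T> runtimeLookup) { this.runtimeLookup = runtimeLookup; }
        public T get() { return runtimeLookup.get(); }
        public boolean isAccessible() { return false; }
    }

    // Hypothetical read transform holding only deferred fields; both entry
    // points converge on them, so the source can read the options at runtime.
    static class TypedReadSketch {
        ValueProvider<List<String>> selectedFields;
        ValueProvider<String> rowRestriction;

        TypedReadSketch withSelectedFields(ValueProvider<List<String>> fields) {
            this.selectedFields = fields;
            return this;
        }

        TypedReadSketch withRowRestriction(ValueProvider<String> restriction) {
            this.rowRestriction = restriction;
            return this;
        }

        // Existing-style eager API: just wraps in StaticValueProviders.
        TypedReadSketch withReadOptions(List<String> fields, String restriction) {
            return withSelectedFields(new StaticValueProvider<>(fields))
                .withRowRestriction(new StaticValueProvider<>(restriction));
        }
    }

    public static void main(String[] args) {
        TypedReadSketch eager =
            new TypedReadSketch().withReadOptions(Arrays.asList("a", "b"), "x > 0");
        TypedReadSketch deferred =
            new TypedReadSketch().withSelectedFields(
                new RuntimeValueProvider<>(() -> Arrays.asList("c")));
        System.out.println("eager accessible at construction: "
            + eager.selectedFields.isAccessible());
        System.out.println("deferred accessible at construction: "
            + deferred.selectedFields.isAccessible());
    }
}
```

Converging both APIs on the same deferred fields is what lets the eager method remain (or be deprecated later) without the source having two code paths.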
[jira] [Resolved] (BEAM-8036) [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method
[ https://issues.apache.org/jira/browse/BEAM-8036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang resolved BEAM-8036. Fix Version/s: Not applicable Resolution: Fixed > [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method > > > Key: BEAM-8036 > URL: https://issues.apache.org/jira/browse/BEAM-8036 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: Rui Wang >Priority: Major > Labels: currently-failing > Fix For: Not applicable > > Time Spent: 2h > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|[https://builds.apache.org/job/beam_PostCommit_SQL/2417/console]] > * [Gradle Build Scan|TODO] > * [Test source code|TODO] > Initial investigation: > *09:03:27* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > *09:03:27* *09:03:27* > org.apache.beam.sdk.extensions.sql.meta.provider.datacatalog.DataCatalogBigQueryIT > > testReadWrite FAILED*09:03:27* java.lang.NoSuchMethodError at > DataCatalogBigQueryIT.java:69*09:03:27* *09:03:27* 1 test completed, 1 > failed*09:03:28* *09:03:28* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > FAILED*09:03:28* *09:03:28* FAILURE: Build failed with an exception. > > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8036) [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method
[ https://issues.apache.org/jira/browse/BEAM-8036?focusedWorklogId=299806&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299806 ] ASF GitHub Bot logged work on BEAM-8036: Author: ASF GitHub Bot Created on: 22/Aug/19 23:04 Start Date: 22/Aug/19 23:04 Worklog Time Spent: 10m Work Description: Ardagan commented on pull request #9409: [BEAM-8036] fix failed postcommit URL: https://github.com/apache/beam/pull/9409 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299806) Time Spent: 2h (was: 1h 50m) > [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method > > > Key: BEAM-8036 > URL: https://issues.apache.org/jira/browse/BEAM-8036 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: Rui Wang >Priority: Major > Labels: currently-failing > Time Spent: 2h > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|[https://builds.apache.org/job/beam_PostCommit_SQL/2417/console]] > * [Gradle Build Scan|TODO] > * [Test source code|TODO] > Initial investigation: > *09:03:27* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > *09:03:27* *09:03:27* > org.apache.beam.sdk.extensions.sql.meta.provider.datacatalog.DataCatalogBigQueryIT > > testReadWrite FAILED*09:03:27* java.lang.NoSuchMethodError at > DataCatalogBigQueryIT.java:69*09:03:27* *09:03:27* 1 test completed, 1 > failed*09:03:28* *09:03:28* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > FAILED*09:03:28* *09:03:28* FAILURE: Build failed with an exception. 
> > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8078) streaming_wordcount_debugging.py is missing a test
Udi Meiri created BEAM-8078: --- Summary: streaming_wordcount_debugging.py is missing a test Key: BEAM-8078 URL: https://issues.apache.org/jira/browse/BEAM-8078 Project: Beam Issue Type: Improvement Components: sdk-py-core Reporter: Udi Meiri It's example code and should have a basic_test (like the other wordcount variants in [1]) to at least verify that it runs in the latest Beam release. [1] https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8036) [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method
[ https://issues.apache.org/jira/browse/BEAM-8036?focusedWorklogId=299802&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299802 ] ASF GitHub Bot logged work on BEAM-8036: Author: ASF GitHub Bot Created on: 22/Aug/19 22:47 Start Date: 22/Aug/19 22:47 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #9409: [BEAM-8036] fix failed postcommit URL: https://github.com/apache/beam/pull/9409#issuecomment-524106188 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299802) Time Spent: 1h 50m (was: 1h 40m) > [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method > > > Key: BEAM-8036 > URL: https://issues.apache.org/jira/browse/BEAM-8036 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: Rui Wang >Priority: Major > Labels: currently-failing > Time Spent: 1h 50m > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|[https://builds.apache.org/job/beam_PostCommit_SQL/2417/console]] > * [Gradle Build Scan|TODO] > * [Test source code|TODO] > Initial investigation: > *09:03:27* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > *09:03:27* *09:03:27* > org.apache.beam.sdk.extensions.sql.meta.provider.datacatalog.DataCatalogBigQueryIT > > testReadWrite FAILED*09:03:27* java.lang.NoSuchMethodError at > DataCatalogBigQueryIT.java:69*09:03:27* *09:03:27* 1 test completed, 1 > failed*09:03:28* *09:03:28* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > FAILED*09:03:28* *09:03:28* FAILURE: Build failed with an exception. 
> > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8036) [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method
[ https://issues.apache.org/jira/browse/BEAM-8036?focusedWorklogId=299795&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299795 ] ASF GitHub Bot logged work on BEAM-8036: Author: ASF GitHub Bot Created on: 22/Aug/19 22:26 Start Date: 22/Aug/19 22:26 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #9409: [BEAM-8036] fix failed postcommit URL: https://github.com/apache/beam/pull/9409#issuecomment-524101331 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299795) Time Spent: 1h 40m (was: 1.5h) > [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method > > > Key: BEAM-8036 > URL: https://issues.apache.org/jira/browse/BEAM-8036 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: Rui Wang >Priority: Major > Labels: currently-failing > Time Spent: 1h 40m > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|[https://builds.apache.org/job/beam_PostCommit_SQL/2417/console]] > * [Gradle Build Scan|TODO] > * [Test source code|TODO] > Initial investigation: > *09:03:27* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > *09:03:27* *09:03:27* > org.apache.beam.sdk.extensions.sql.meta.provider.datacatalog.DataCatalogBigQueryIT > > testReadWrite FAILED*09:03:27* java.lang.NoSuchMethodError at > DataCatalogBigQueryIT.java:69*09:03:27* *09:03:27* 1 test completed, 1 > failed*09:03:28* *09:03:28* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > FAILED*09:03:28* *09:03:28* FAILURE: Build failed with an exception. 
> > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (BEAM-7993) portable python precommit is flaky
[ https://issues.apache.org/jira/browse/BEAM-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada updated BEAM-7993: Fix Version/s: (was: 2.15.0) 2.16.0 > portable python precommit is flaky > -- > > Key: BEAM-7993 > URL: https://issues.apache.org/jira/browse/BEAM-7993 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures, testing >Affects Versions: 2.15.0 >Reporter: Udi Meiri >Assignee: Kyle Weaver >Priority: Major > Labels: currently-failing > Fix For: 2.16.0 > > Time Spent: 40m > Remaining Estimate: 0h > > I'm not sure what the root cause is here. > Example log where > :sdks:python:test-suites:portable:py35:portableWordCountBatch failed: > {code} > 11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap > (FlatMap at ExtractOutput[0]) (2/2)] ERROR > org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN > MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at > ExtractOutput[0]) (2/2) > 11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap > (FlatMap at ExtractOutput[0]) (1/2)] ERROR > org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN > MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at > ExtractOutput[0]) (1/2) > 11:51:22 [CHAIN MapPartition (MapPartition at > [2]write/Write/WriteImpl/DoOnce/{FlatMap(), > Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)] ERROR > org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN > MapPartition (MapPartition at > [2]write/Write/WriteImpl/DoOnce/{FlatMap(), > Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2) > 11:51:22 [CHAIN MapPartition (MapPartition at > [2]write/Write/WriteImpl/DoOnce/{FlatMap(), > Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)] ERROR > org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN > MapPartition (MapPartition at > 
[2]write/Write/WriteImpl/DoOnce/{FlatMap(), > Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2) > 11:51:22 java.lang.Exception: The user defined 'open()' method caused an > exception: java.io.IOException: Received exit code 1 for command 'docker > inspect -f {{.State.Running}} > 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. stderr: > Error: No such object: > 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1 > 11:51:22 at > org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498) > 11:51:22 at > org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368) > 11:51:22 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712) > 11:51:22 at java.lang.Thread.run(Thread.java:748) > 11:51:22 Caused by: > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: > java.io.IOException: Received exit code 1 for command 'docker inspect -f > {{.State.Running}} > 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. 
stderr: > Error: No such object: > 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1 > 11:51:22 at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4966) > 11:51:22 at > org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.(DefaultJobBundleFactory.java:211) > 11:51:22 at > org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.(DefaultJobBundleFactory.java:202) > 11:51:22 at > org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:185) > 11:51:22 at > org.apache.beam.runners.flink.translation.functions.FlinkDefaultExecutableStageContext.getStageBundleFactory(FlinkDefaultExecutableStageContext.java:49) > 11:51:22 at > org.apache.beam.runners.flink.translation.functions.ReferenceCountingFlinkExecutableStageContextFactory$WrappedContext.getStageBundleFactory(ReferenceCountingFlinkExecutableStageContextFactory.java:203) > 11:51:22 at > org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.open(FlinkExecutableStageFunction.java:129) > 11:51:22 at > org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36) > 11:51:22 at > org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:494) > 11:51:22 ... 3 more > {code} > https://builds.apache.org/job/beam_PreCommit_Portable_Python_Commit/5512/consoleFull -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8036) [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method
[ https://issues.apache.org/jira/browse/BEAM-8036?focusedWorklogId=299788&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299788 ] ASF GitHub Bot logged work on BEAM-8036: Author: ASF GitHub Bot Created on: 22/Aug/19 22:11 Start Date: 22/Aug/19 22:11 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #9407: [BEAM-8036] disable failed Postcommit Test URL: https://github.com/apache/beam/pull/9407 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299788) Time Spent: 1.5h (was: 1h 20m) > [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method > > > Key: BEAM-8036 > URL: https://issues.apache.org/jira/browse/BEAM-8036 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: Rui Wang >Priority: Major > Labels: currently-failing > Time Spent: 1.5h > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|[https://builds.apache.org/job/beam_PostCommit_SQL/2417/console]] > * [Gradle Build Scan|TODO] > * [Test source code|TODO] > Initial investigation: > *09:03:27* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > *09:03:27* *09:03:27* > org.apache.beam.sdk.extensions.sql.meta.provider.datacatalog.DataCatalogBigQueryIT > > testReadWrite FAILED*09:03:27* java.lang.NoSuchMethodError at > DataCatalogBigQueryIT.java:69*09:03:27* *09:03:27* 1 test completed, 1 > failed*09:03:28* *09:03:28* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > FAILED*09:03:28* *09:03:28* FAILURE: Build failed with an exception. 
> > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8036) [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method
[ https://issues.apache.org/jira/browse/BEAM-8036?focusedWorklogId=299787&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299787 ] ASF GitHub Bot logged work on BEAM-8036: Author: ASF GitHub Bot Created on: 22/Aug/19 22:11 Start Date: 22/Aug/19 22:11 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #9407: [BEAM-8036] disable failed Postcommit Test URL: https://github.com/apache/beam/pull/9407#issuecomment-524097654 https://github.com/apache/beam/pull/9409 is supposed to fix this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299787) Time Spent: 1h 20m (was: 1h 10m) > [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method > > > Key: BEAM-8036 > URL: https://issues.apache.org/jira/browse/BEAM-8036 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: Rui Wang >Priority: Major > Labels: currently-failing > Time Spent: 1h 20m > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|[https://builds.apache.org/job/beam_PostCommit_SQL/2417/console]] > * [Gradle Build Scan|TODO] > * [Test source code|TODO] > Initial investigation: > *09:03:27* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > *09:03:27* *09:03:27* > org.apache.beam.sdk.extensions.sql.meta.provider.datacatalog.DataCatalogBigQueryIT > > testReadWrite FAILED*09:03:27* java.lang.NoSuchMethodError at > DataCatalogBigQueryIT.java:69*09:03:27* *09:03:27* 1 test completed, 1 > failed*09:03:28* *09:03:28* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > FAILED*09:03:28* *09:03:28* FAILURE: Build failed with an exception. 
> > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8038) Python Precommit fail: 'BeamFnExternalWorkerPoolServicer' has no attribute '_worker_processes'
[ https://issues.apache.org/jira/browse/BEAM-8038?focusedWorklogId=299785&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299785 ] ASF GitHub Bot logged work on BEAM-8038: Author: ASF GitHub Bot Created on: 22/Aug/19 22:05 Start Date: 22/Aug/19 22:05 Worklog Time Spent: 10m Work Description: tweise commented on pull request #9403: [BEAM-8038] Fix worker pool exit hook URL: https://github.com/apache/beam/pull/9403 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299785) Time Spent: 1h (was: 50m) > Python Precommit fail: 'BeamFnExternalWorkerPoolServicer' has no attribute > '_worker_processes' > -- > > Key: BEAM-8038 > URL: https://issues.apache.org/jira/browse/BEAM-8038 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness, test-failures >Reporter: Ahmet Altay >Assignee: Thomas Weise >Priority: Critical > Time Spent: 1h > Remaining Estimate: 0h > > Logs: https://builds.apache.org/job/beam_PreCommit_Python_Commit/8246/console > 10:14:09 > -- > 10:14:09 XML: > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/nosetests.xml > 10:14:09 > -- > 10:14:09 Ran 2594 tests in 629.438s > 10:14:09 > 10:14:09 OK (SKIP=520) > 10:14:09 Error in atexit._run_exitfuncs: > 10:14:09 Traceback (most recent call last): > 10:14:09 File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs > 10:14:09 func(*targs, **kargs) > 10:14:09 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/worker_pool_main.py", > line 72, in kill_worker_processes > 10:14:09 for worker_process in cls._worker_processes.values(): > 10:14:09 
AttributeError: type object 'BeamFnExternalWorkerPoolServicer' has > no attribute '_worker_processes' > 10:14:09 Error in sys.exitfunc: > 10:14:09 Traceback (most recent call last): > 10:14:09 File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs > 10:14:09 func(*targs, **kargs) > 10:14:09 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/worker_pool_main.py", > line 72, in kill_worker_processes > 10:14:09 for worker_process in cls._worker_processes.values(): > 10:14:09 AttributeError: type object 'BeamFnExternalWorkerPoolServicer' has > no attribute '_worker_processes' > 10:14:10 py27-cython run-test-post: commands[0] | > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/scripts/run_tox_cleanup.sh > 10:14:10 ___ summary > -- This message was sent by Atlassian Jira (v8.3.2#803003)
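The root cause of this failure is easy to reproduce outside Beam: an exit hook declared as a classmethod reads a class attribute that is only ever assigned when a worker is actually started, so on a process that never launched a worker the hook hits an `AttributeError`. A minimal sketch (class names hypothetical, not the actual `worker_pool_main.py` code):

```python
class WorkerPool:
    """Sketch of the bug: _worker_processes is never declared on the class,
    so an atexit hook that runs when no worker was started raises."""

    @classmethod
    def kill_worker_processes(cls):
        # AttributeError: type object 'WorkerPool' has no attribute '_worker_processes'
        for worker_process in cls._worker_processes.values():
            worker_process.kill()


class FixedWorkerPool:
    """Sketch of the fix: declare the attribute on the class, so the exit
    hook is a harmless no-op when nothing was launched."""

    _worker_processes = {}

    @classmethod
    def kill_worker_processes(cls):
        for worker_process in cls._worker_processes.values():
            worker_process.kill()
```

With the attribute declared up front, registering the hook via `atexit.register(FixedWorkerPool.kill_worker_processes)` is safe whether or not any worker process was ever created.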
[jira] [Updated] (BEAM-8036) [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method
[ https://issues.apache.org/jira/browse/BEAM-8036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated BEAM-8036: --- Status: Open (was: Triage Needed) > [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method > > > Key: BEAM-8036 > URL: https://issues.apache.org/jira/browse/BEAM-8036 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: Rui Wang >Priority: Major > Labels: currently-failing > Time Spent: 1h 10m > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|[https://builds.apache.org/job/beam_PostCommit_SQL/2417/console]] > * [Gradle Build Scan|TODO] > * [Test source code|TODO] > Initial investigation: > *09:03:27* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > *09:03:27* *09:03:27* > org.apache.beam.sdk.extensions.sql.meta.provider.datacatalog.DataCatalogBigQueryIT > > testReadWrite FAILED*09:03:27* java.lang.NoSuchMethodError at > DataCatalogBigQueryIT.java:69*09:03:27* *09:03:27* 1 test completed, 1 > failed*09:03:28* *09:03:28* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > FAILED*09:03:28* *09:03:28* FAILURE: Build failed with an exception. > > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8036) [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method
[ https://issues.apache.org/jira/browse/BEAM-8036?focusedWorklogId=299783&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299783 ] ASF GitHub Bot logged work on BEAM-8036: Author: ASF GitHub Bot Created on: 22/Aug/19 22:03 Start Date: 22/Aug/19 22:03 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #9409: [BEAM-8036] fix failed postcommit URL: https://github.com/apache/beam/pull/9409#issuecomment-524095590 Run SQL PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299783) Time Spent: 1h 10m (was: 1h) > [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method > > > Key: BEAM-8036 > URL: https://issues.apache.org/jira/browse/BEAM-8036 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: Rui Wang >Priority: Major > Labels: currently-failing > Time Spent: 1h 10m > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|[https://builds.apache.org/job/beam_PostCommit_SQL/2417/console]] > * [Gradle Build Scan|TODO] > * [Test source code|TODO] > Initial investigation: > *09:03:27* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > *09:03:27* *09:03:27* > org.apache.beam.sdk.extensions.sql.meta.provider.datacatalog.DataCatalogBigQueryIT > > testReadWrite FAILED*09:03:27* java.lang.NoSuchMethodError at > DataCatalogBigQueryIT.java:69*09:03:27* *09:03:27* 1 test completed, 1 > failed*09:03:28* *09:03:28* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > FAILED*09:03:28* *09:03:28* FAILURE: Build failed with an exception. 
[jira] [Updated] (BEAM-8077) CONCAT function is broken
[ https://issues.apache.org/jira/browse/BEAM-8077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated BEAM-8077: --- Summary: CONCAT function is broken (was: CONCAT function breaks) > CONCAT function is broken > - > > Key: BEAM-8077 > URL: https://issues.apache.org/jira/browse/BEAM-8077 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (BEAM-8076) FieldAccess in Join is broken
[ https://issues.apache.org/jira/browse/BEAM-8076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated BEAM-8076: --- Summary: FieldAccess in Join is broken (was: FieldAccess in Join breaks) > FieldAccess in Join is broken > -- > > Key: BEAM-8076 > URL: https://issues.apache.org/jira/browse/BEAM-8076 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8077) CONCAT function breaks
Rui Wang created BEAM-8077: -- Summary: CONCAT function breaks Key: BEAM-8077 URL: https://issues.apache.org/jira/browse/BEAM-8077 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-6114) SQL join selection should be done in planner, not in expansion to PTransform
[ https://issues.apache.org/jira/browse/BEAM-6114?focusedWorklogId=299777&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299777 ] ASF GitHub Bot logged work on BEAM-6114: Author: ASF GitHub Bot Created on: 22/Aug/19 21:44 Start Date: 22/Aug/19 21:44 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #9395: [BEAM-6114] Calcite Rules to Select Type of Join in BeamSQL URL: https://github.com/apache/beam/pull/9395 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299777) Time Spent: 3.5h (was: 3h 20m) > SQL join selection should be done in planner, not in expansion to PTransform > > > Key: BEAM-6114 > URL: https://issues.apache.org/jira/browse/BEAM-6114 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Rahul Patwari >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > Currently Beam SQL joins all go through a single physical operator which has > a single PTransform that does all join algorithms based on properties of its > input PCollections as well as the relational algebra. > A first step is to make the needed information part of the relational > algebra, so it can choose a PTransform based on that, and the PTransforms can > be simpler. > Second step is to have separate (physical) relational operators for different > join algorithms. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8076) FieldAccess in Join breaks
Rui Wang created BEAM-8076: -- Summary: FieldAccess in Join breaks Key: BEAM-8076 URL: https://issues.apache.org/jira/browse/BEAM-8076 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8036) [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method
[ https://issues.apache.org/jira/browse/BEAM-8036?focusedWorklogId=299775&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299775 ] ASF GitHub Bot logged work on BEAM-8036: Author: ASF GitHub Bot Created on: 22/Aug/19 21:38 Start Date: 22/Aug/19 21:38 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #9409: [BEAM-8036] fix failed postcommit URL: https://github.com/apache/beam/pull/9409#issuecomment-524087952 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299775) Time Spent: 1h (was: 50m) > [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method > > > Key: BEAM-8036 > URL: https://issues.apache.org/jira/browse/BEAM-8036 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: Rui Wang >Priority: Major > Labels: currently-failing > Time Spent: 1h > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|[https://builds.apache.org/job/beam_PostCommit_SQL/2417/console]] > * [Gradle Build Scan|TODO] > * [Test source code|TODO] > Initial investigation: > *09:03:27* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > *09:03:27* *09:03:27* > org.apache.beam.sdk.extensions.sql.meta.provider.datacatalog.DataCatalogBigQueryIT > > testReadWrite FAILED*09:03:27* java.lang.NoSuchMethodError at > DataCatalogBigQueryIT.java:69*09:03:27* *09:03:27* 1 test completed, 1 > failed*09:03:28* *09:03:28* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > FAILED*09:03:28* *09:03:28* FAILURE: Build failed with an exception. 
[jira] [Work logged] (BEAM-8038) Python Precommit fail: 'BeamFnExternalWorkerPoolServicer' has no attribute '_worker_processes'
[ https://issues.apache.org/jira/browse/BEAM-8038?focusedWorklogId=299771&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299771 ] ASF GitHub Bot logged work on BEAM-8038: Author: ASF GitHub Bot Created on: 22/Aug/19 21:34 Start Date: 22/Aug/19 21:34 Worklog Time Spent: 10m Work Description: tweise commented on issue #9403: [BEAM-8038] Fix worker pool exit hook URL: https://github.com/apache/beam/pull/9403#issuecomment-524086550 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299771) Time Spent: 50m (was: 40m) > Python Precommit fail: 'BeamFnExternalWorkerPoolServicer' has no attribute > '_worker_processes' > -- > > Key: BEAM-8038 > URL: https://issues.apache.org/jira/browse/BEAM-8038 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness, test-failures >Reporter: Ahmet Altay >Assignee: Thomas Weise >Priority: Critical > Time Spent: 50m > Remaining Estimate: 0h > > Logs: https://builds.apache.org/job/beam_PreCommit_Python_Commit/8246/console > 10:14:09 > -- > 10:14:09 XML: > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/nosetests.xml > 10:14:09 > -- > 10:14:09 Ran 2594 tests in 629.438s > 10:14:09 > 10:14:09 OK (SKIP=520) > 10:14:09 Error in atexit._run_exitfuncs: > 10:14:09 Traceback (most recent call last): > 10:14:09 File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs > 10:14:09 func(*targs, **kargs) > 10:14:09 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/worker_pool_main.py", > line 72, in kill_worker_processes > 10:14:09 for worker_process in 
cls._worker_processes.values(): > 10:14:09 AttributeError: type object 'BeamFnExternalWorkerPoolServicer' has > no attribute '_worker_processes' > 10:14:09 Error in sys.exitfunc: > 10:14:09 Traceback (most recent call last): > 10:14:09 File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs > 10:14:09 func(*targs, **kargs) > 10:14:09 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/worker_pool_main.py", > line 72, in kill_worker_processes > 10:14:09 for worker_process in cls._worker_processes.values(): > 10:14:09 AttributeError: type object 'BeamFnExternalWorkerPoolServicer' has > no attribute '_worker_processes' > 10:14:10 py27-cython run-test-post: commands[0] | > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/scripts/run_tox_cleanup.sh > 10:14:10 ___ summary > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8075) IndexOutOfBounds in LogicalProject
Rui Wang created BEAM-8075: -- Summary: IndexOutOfBounds in LogicalProject Key: BEAM-8075 URL: https://issues.apache.org/jira/browse/BEAM-8075 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang SELECT payload.bankId, SUM(payload.purchaseAmountCents) / 100 AS totalPurchase FROM pubsub.topic.`instant-insights`.`retaildemo-online-transactions-json` GROUP BY payload.bankId Causes the workers to fail with: Exception in thread "main" java.lang.RuntimeException: Error while applying rule ProjectToCalcRule, args [rel#9:LogicalProject.NONE(input=RelSubset#8,bankId=$0,totalPurchase=/(CAST($3):DOUBLE NOT NULL, 1E2))] at org.apache -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (BEAM-8074) Update error message when reading from table with unsupported data types
[ https://issues.apache.org/jira/browse/BEAM-8074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated BEAM-8074: --- Description: When reading a NUMERIC column from a BQ table, the query will fail with the error message "Does not support DATE, TIME and DATETIME types in source tables". We should include NUMERIC in this error message. > Update error message when reading from table with unsupported data types > - > > Key: BEAM-8074 > URL: https://issues.apache.org/jira/browse/BEAM-8074 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Priority: Major > > When reading a NUMERIC column from a BQ table, the query will fail with the error message "Does not support DATE, TIME and DATETIME types in source tables". We should include NUMERIC in this error message. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8074) Update error message when reading from table with unsupported data types
Rui Wang created BEAM-8074: -- Summary: Update error message when reading from table with unsupported data types Key: BEAM-8074 URL: https://issues.apache.org/jira/browse/BEAM-8074 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8073) CAST Timestamp -> String doesn't properly handle timezones with sub-minute offsets
Rui Wang created BEAM-8073: -- Summary: CAST Timestamp -> String doesn't properly handle timezones with sub-minute offsets Key: BEAM-8073 URL: https://issues.apache.org/jira/browse/BEAM-8073 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang One of the timestamp -> string test cases is -621355968 microseconds from the unix epoch, or 01/01/0001 00:00:00 GMT Technically the timezone offset at this time in America/Los_Angeles is -07:52:58. This causes the following error: Expected: ARRAY>[{"-12-31 16:08:00-07:52"}] Actual: ARRAY>[{"-12-31 16:07:02-07:52"}] Note that ZetaSQL expects us to completely truncate the second part of the offset. It's not used when subtracting from the origin datetime, and it's not included in the offset string. However when we perform this conversion, joda time uses the second part of the offset, and thus our time string is off by 58 seconds. -- This message was sent by Atlassian Jira (v8.3.2#803003)
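The 58-second discrepancy described above follows from plain offset arithmetic. A sketch using Python's `datetime` (anchored at 0001-01-02 rather than 0001-01-01, since `datetime` cannot represent year 0; the -07:52:58 offset value comes from the report above):

```python
from datetime import datetime, timedelta

# Full historical LMT offset for America/Los_Angeles, per the report above.
full_offset = timedelta(hours=7, minutes=52, seconds=58)

# ZetaSQL truncates the offset to whole minutes before applying it.
truncated_offset = timedelta(minutes=full_offset // timedelta(minutes=1))  # 7:52:00

utc = datetime(1, 1, 2, 0, 0, 0)            # a nearby UTC instant within datetime's range
zetasql_local = utc - truncated_offset      # ...16:08:00 (expected)
joda_local = utc - full_offset              # ...16:07:02 (actual, full offset applied)
```

The two conversions disagree by exactly the 58 seconds that ZetaSQL drops from the offset, matching the expected/actual strings in the test failure.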
[jira] [Created] (BEAM-8071) LIMIT with negative OFFSET should throw an error
Rui Wang created BEAM-8071: -- Summary: LIMIT with negative OFFSET should throw an error Key: BEAM-8071 URL: https://issues.apache.org/jira/browse/BEAM-8071 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang Currently BeamSQL just returns data as if the OFFSET were 0. -- This message was sent by Atlassian Jira (v8.3.2#803003)
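The missing validation amounts to a simple range check before the limit/offset values reach execution; a sketch (function name hypothetical, not Beam's actual API):

```python
def validate_limit_offset(limit, offset):
    """Reject negative values instead of silently treating OFFSET as 0 (sketch)."""
    if offset < 0:
        raise ValueError("out of range: OFFSET must be non-negative, got %d" % offset)
    if limit < 0:
        raise ValueError("out of range: LIMIT must be non-negative, got %d" % limit)
    return limit, offset
```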
[jira] [Created] (BEAM-8072) Allow non-ColumnRef nodes in aggregation functions
Rui Wang created BEAM-8072: -- Summary: Allow non-ColumnRef nodes in aggregation functions Key: BEAM-8072 URL: https://issues.apache.org/jira/browse/BEAM-8072 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang Currently we throw an error if the node is not a ColumnRef or CAST(ColumnRef). -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8070) Support empty array literal
Rui Wang created BEAM-8070: -- Summary: Support empty array literal Key: BEAM-8070 URL: https://issues.apache.org/jira/browse/BEAM-8070 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang Currently BeamSQL throws an IndexOutOfBoundsException when given a query with an empty array literal. This happens because Calcite attempts to infer the element types [1,2] from an empty element list. -- This message was sent by Atlassian Jira (v8.3.2#803003)
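The failure mode described above can be sketched in a few lines: inferring an element type from the literal elements necessarily fails on an empty list, so the empty case needs either an explicit guard or an explicit type annotation (e.g. `ARRAY<INT64>[]`). This is an illustrative sketch, not Calcite's actual inference code:

```python
def infer_element_type(elements):
    """Infer a common element type from an array literal's elements (sketch).

    Indexing elements[0] on an empty list is exactly the kind of access
    that produces the IndexOutOfBoundsException described above.
    """
    if not elements:
        raise ValueError("cannot infer element type of an empty array literal; "
                         "an explicit type annotation is required")
    first = type(elements[0])
    if any(type(e) is not first for e in elements[1:]):
        raise TypeError("array literal elements have mixed types")
    return first
```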
[jira] [Created] (BEAM-8068) Throw expected error when LIKE pattern ends with backslash
Rui Wang created BEAM-8068: -- Summary: Throw expected error when LIKE pattern ends with backslash Key: BEAM-8068 URL: https://issues.apache.org/jira/browse/BEAM-8068 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang ZetaSQL expect returning a status code out_of_range with message "LIKE pattern ends with a backslash" in that situation. We do throw an error when this happens (a RuntimeException with that message), but when it gets returned over gRPC to the framework for some reason it is mapped to status code unknown with no message. -- This message was sent by Atlassian Jira (v8.3.2#803003)
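The check itself is straightforward when translating a LIKE pattern: a backslash that has nothing left to escape is a dangling escape and must be rejected up front. A sketch of the intended behavior (illustrative only, not the actual BeamSQL implementation; in ZetaSQL this surfaces as an `OUT_OF_RANGE` status with the quoted message):

```python
import re

def like_to_regex(pattern):
    """Translate a SQL LIKE pattern to an anchored regex (sketch)."""
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            if i + 1 >= len(pattern):
                raise ValueError("LIKE pattern ends with a backslash")
            out.append(re.escape(pattern[i + 1]))  # escaped literal character
            i += 2
            continue
        if c == "%":
            out.append(".*")   # % matches any sequence of characters
        elif c == "_":
            out.append(".")    # _ matches exactly one character
        else:
            out.append(re.escape(c))
        i += 1
    return "^" + "".join(out) + "$"
```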
[jira] [Created] (BEAM-8069) OFFSET in LIMIT clause only accepts literal or parameter
Rui Wang created BEAM-8069: -- Summary: OFFSET in LIMIT clause only accepts literal or parameter Key: BEAM-8069 URL: https://issues.apache.org/jira/browse/BEAM-8069 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang Should verify what is inside the parameter, e.g. the parameter might contain a string or other unaccepted types. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8067) Throw exception when truncating nano/micro to millis when creating time literals
Rui Wang created BEAM-8067: -- Summary: Throw exception when truncating nano/micro to millis when creating time literals Key: BEAM-8067 URL: https://issues.apache.org/jira/browse/BEAM-8067 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang Time values in GoogleSQL are encoded in a special form. We need a function to extract the sub-millisecond part from time values and decide whether rejection is needed. -- This message was sent by Atlassian Jira (v8.3.2#803003)
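Assuming the value arrives as microseconds (an assumption for illustration; GoogleSQL's actual encoding is a packed form that would first need decoding), the required check is a remainder test:

```python
def sub_millis_micros(timestamp_micros):
    """Return the sub-millisecond remainder of a microsecond value (sketch)."""
    return timestamp_micros % 1000

def to_millis_or_reject(timestamp_micros):
    """Convert micros to millis, rejecting values that would lose precision.

    Per the issue above, silently truncating sub-millisecond precision is
    wrong; a non-zero remainder should raise instead.
    """
    if sub_millis_micros(timestamp_micros) != 0:
        raise ValueError("value has sub-millisecond precision: %d" % timestamp_micros)
    return timestamp_micros // 1000
```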
[jira] [Created] (BEAM-8066) Implement correct nullability for the return type of AggregateCall
Rui Wang created BEAM-8066: -- Summary: Implement correct nullability for the return type of AggregateCall Key: BEAM-8066 URL: https://issues.apache.org/jira/browse/BEAM-8066 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang Implement correct nullability for the return type of AggregateCall. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8064) Throw exceptions when overflow or division by 0 in arithmetic operators
Rui Wang created BEAM-8064: -- Summary: Throw exceptions when overflow or division by 0 in arithmetic operators Key: BEAM-8064 URL: https://issues.apache.org/jira/browse/BEAM-8064 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang Should throw an out-of-range exception on arithmetic overflow; division by 0 should also throw an out-of-range exception. -- This message was sent by Atlassian Jira (v8.3.2#803003)
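A sketch of checked INT64 arithmetic (Python integers are unbounded, which makes the overflow check easy to express; a Java implementation would use `Math.addExact` or equivalent manual checks):

```python
INT64_MAX = 2**63 - 1
INT64_MIN = -2**63

def checked_add(a, b):
    """INT64 addition that raises an out-of-range error on overflow (sketch)."""
    result = a + b
    if not (INT64_MIN <= result <= INT64_MAX):
        raise OverflowError("out of range: %d + %d" % (a, b))
    return result

def checked_div(a, b):
    """INT64 division that raises an out-of-range error on division by zero.

    (Floor division here; SQL's truncation-toward-zero semantics for
    negative operands are ignored in this sketch.)
    """
    if b == 0:
        raise ZeroDivisionError("out of range: division by zero")
    if a == INT64_MIN and b == -1:
        raise OverflowError("out of range: %d / %d" % (a, b))
    return a // b
```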
[jira] [Created] (BEAM-8065) Select * FROM pubsub table should not throw exception
Rui Wang created BEAM-8065: -- Summary: Select * FROM pubsub table should not throw exception Key: BEAM-8065 URL: https://issues.apache.org/jira/browse/BEAM-8065 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8062) Support array member accessor
Rui Wang created BEAM-8062: -- Summary: Support array member accessor Key: BEAM-8062 URL: https://issues.apache.org/jira/browse/BEAM-8062 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang array[] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8063) Support STRUCT member field access operator
Rui Wang created BEAM-8063: -- Summary: Support STRUCT member field access operator Key: BEAM-8063 URL: https://issues.apache.org/jira/browse/BEAM-8063 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8060) Support DATE type
Rui Wang created BEAM-8060: -- Summary: Support DATE type Key: BEAM-8060 URL: https://issues.apache.org/jira/browse/BEAM-8060 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8061) Support TIME type
Rui Wang created BEAM-8061: -- Summary: Support TIME type Key: BEAM-8061 URL: https://issues.apache.org/jira/browse/BEAM-8061 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8059) Support struct type
Rui Wang created BEAM-8059: -- Summary: Support struct type Key: BEAM-8059 URL: https://issues.apache.org/jira/browse/BEAM-8059 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8058) Support ARRAY type
Rui Wang created BEAM-8058: -- Summary: Support ARRAY type Key: BEAM-8058 URL: https://issues.apache.org/jira/browse/BEAM-8058 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8056) Support window offset for TUMBLE, HOP and SESSION
Rui Wang created BEAM-8056: -- Summary: Support window offset for TUMBLE, HOP and SESSION Key: BEAM-8056 URL: https://issues.apache.org/jira/browse/BEAM-8056 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8057) Support NAN, INF, and -INF
Rui Wang created BEAM-8057: -- Summary: Support NAN, INF, and -INF Key: BEAM-8057 URL: https://issues.apache.org/jira/browse/BEAM-8057 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8055) Support STRUCT constructor
Rui Wang created BEAM-8055: -- Summary: Support STRUCT constructor Key: BEAM-8055 URL: https://issues.apache.org/jira/browse/BEAM-8055 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang For example, SELECT STRUCT(1, "test_string") -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8054) Windowing functions should only accept watermarked timestamp column
Rui Wang created BEAM-8054: -- Summary: Windowing functions should only accept watermarked timestamp column Key: BEAM-8054 URL: https://issues.apache.org/jira/browse/BEAM-8054 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7760) Interactive Beam Caching PCollections bound to user defined vars in notebook
[ https://issues.apache.org/jira/browse/BEAM-7760?focusedWorklogId=299769&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299769 ] ASF GitHub Bot logged work on BEAM-7760: Author: ASF GitHub Bot Created on: 22/Aug/19 21:15 Start Date: 22/Aug/19 21:15 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9278: [BEAM-7760] Added Interactive Beam module URL: https://github.com/apache/beam/pull/9278 **Please** add a meaningful description for your change here 1. Added interactive_beam module that will serve sugar syntax and shorthand functions to apply interactivity, create iBeam pipeline, visualize PCollection data and execute iBeam pipeline as normal pipeline with selected Beam runners without interactivity. 2. This commit implemented the implicitly managed Interactive Beam environment to track definition of user pipelines. It exposed a watch() interface for users to explicitly instruct Interactive Beam the whereabout of their pipeline definition when it's not in __main__. 3. This commit implemented a shorthand function create_pipeline() to create a pipeline that is backed by direct runner with interactivity when running. 4. This commit also implemented a shorthand function run_pipeline() to run a pipeline created with interactivity on a different runner and pipeline options without interactivity. It's useful when interactivity is not needed and a one-shot in production-like environment is desired. 5. This commit exposed a PCollection data exploration interface visualize(). Implementation is yet to be added. 6. Added interactive_environment module for internal usage without backward-compatibility. It holds the cache manager and watchable metadata for current interactive environment/session/context. Interfaces are provided to interact with the environment and its components. 7. Unit tests included. Thank you for your contribution! 
Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
[jira] [Created] (BEAM-8053) Throw error for 1/0 or floating point overflow
Rui Wang created BEAM-8053: -- Summary: Throw error for 1/0 or floating point overflow Key: BEAM-8053 URL: https://issues.apache.org/jira/browse/BEAM-8053 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang Currently BeamSQL returns infinity rather than throwing an error in these cases -- This message was sent by Atlassian Jira (v8.3.2#803003)
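For context, the silent infinity falls out of Java's floating-point semantics: double division by zero yields positive or negative infinity (or NaN for 0.0/0.0) rather than throwing, while integer division by zero already throws ArithmeticException. A minimal sketch of a guard the planner could apply; `checkedDiv` is a hypothetical illustrative name, not existing BeamSQL code:

```java
public class DivisionSemantics {
    // Java floating-point division by zero yields +/-Infinity (or NaN for
    // 0.0/0.0) instead of throwing, which is why a naive SQL evaluator
    // silently propagates infinity.
    static double floatDiv(double a, double b) {
        return a / b;
    }

    // Hypothetical guard: surface an ArithmeticException whenever the result
    // is non-finite, matching the behavior the ticket asks for.
    static double checkedDiv(double a, double b) {
        double result = a / b;
        if (!Double.isFinite(result)) {
            throw new ArithmeticException("division by zero or floating point overflow");
        }
        return result;
    }
}
```

The same non-finite check also catches overflow of in-range operands (e.g. multiplying two large doubles), not just literal division by zero.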
[jira] [Created] (BEAM-8052) Should validate if String literal is valid UTF-8
Rui Wang created BEAM-8052: -- Summary: Should validate if String literal is valid UTF-8 Key: BEAM-8052 URL: https://issues.apache.org/jira/browse/BEAM-8052 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8051) Convert FLOAT64 to NUMERIC in UNION ALL
Rui Wang created BEAM-8051: -- Summary: Convert FLOAT64 to NUMERIC in UNION ALL Key: BEAM-8051 URL: https://issues.apache.org/jira/browse/BEAM-8051 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang The analyzer does not reject UNION ALL on DOUBLE, for example `2.3 UNION ALL 2.1`. BeamSQL fails to execute when a DOUBLE is used in GBK (UNION ALL is implemented on top of GBK). Investigate why DOUBLE appears in GBK in the UNION ALL implementation, try to fix it, and if that's not feasible, at least throw an exception in the planner. -- This message was sent by Atlassian Jira (v8.3.2#803003)
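Part of the reason DOUBLE is problematic in GBK (which compares keys by their encoded bytes) is that floating-point equality and bit patterns disagree: 0.0 and -0.0 are numerically equal but encode differently, and NaN is the reverse case. This is likely why Beam treats floating-point coders as non-deterministic for grouping. A self-contained illustration of the mismatch:

```java
public class DoubleKeyPitfalls {
    // GBK-style grouping compares encoded key bytes, so two numerically
    // equal doubles with different bit patterns would land in different
    // groups, while all NaNs (which are != each other) would collapse
    // into one group.
    static boolean sameBits(double a, double b) {
        return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
    }
}
```

Either direction of disagreement silently violates SQL equality semantics, which argues for fixing the plan or rejecting the query in the planner.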
[jira] [Created] (BEAM-8050) Remove "$" from auto-generated field names
Rui Wang created BEAM-8050: -- Summary: Remove "$" from auto-generated field names Key: BEAM-8050 URL: https://issues.apache.org/jira/browse/BEAM-8050 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang ZetaSQL generates column names starting with "$", but "$" is not accepted by BigQuery as a field name, so we either force users to add an alias or we post-process column names to remove the "$". -- This message was sent by Atlassian Jira (v8.3.2#803003)
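A sketch of the post-processing option; `sanitizeFieldName` is a hypothetical helper (not existing Beam code), assuming BigQuery's rule that field names contain only letters, digits, and underscores and do not start with a digit:

```java
public class FieldNames {
    // Hypothetical cleanup for auto-generated ZetaSQL names such as "$col1":
    // drop characters BigQuery rejects, then ensure a legal leading character.
    static String sanitizeFieldName(String name) {
        String cleaned = name.replaceAll("[^A-Za-z0-9_]", "");
        if (cleaned.isEmpty() || Character.isDigit(cleaned.charAt(0))) {
            cleaned = "_" + cleaned;
        }
        return cleaned;
    }
}
```

Note that any such rewrite can introduce collisions (e.g. "$col1" and "col1" map to the same name), so a real implementation would also need to deduplicate.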
[jira] [Created] (BEAM-8049) Throw clear exception when handling unsupported interval time units
Rui Wang created BEAM-8049: -- Summary: Throw clear exception when handling unsupported interval time units Key: BEAM-8049 URL: https://issues.apache.org/jira/browse/BEAM-8049 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang E.g., WEEK and QUARTER. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8048) Support TIMESTAMP Sub function/operator
Rui Wang created BEAM-8048: -- Summary: Support TIMESTAMP Sub function/operator Key: BEAM-8048 URL: https://issues.apache.org/jira/browse/BEAM-8048 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8047) Handle overflow when converting from millis to micros
Rui Wang created BEAM-8047: -- Summary: Handle overflow when converting from millis to micros Key: BEAM-8047 URL: https://issues.apache.org/jira/browse/BEAM-8047 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang When converting from a Joda Instant/DateTime, only epoch millis are available, but the conversions require epoch micros, so a multiplication by 1000L is applied. This multiplication can overflow and needs to be handled appropriately. This issue exists in the ZetaSQL planner. -- This message was sent by Atlassian Jira (v8.3.2#803003)
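One way to handle it is to fail loudly instead of wrapping: `Math.multiplyExact` throws `ArithmeticException` on long overflow, whereas a plain `* 1000L` silently wraps around. A sketch with `millisToMicros` as an illustrative name, not the planner's actual method:

```java
public class TimeConversions {
    // Plain (epochMillis * 1000L) wraps around on overflow; multiplyExact
    // raises ArithmeticException so the caller can report a clear error
    // instead of producing a garbage timestamp.
    static long millisToMicros(long epochMillis) {
        return Math.multiplyExact(epochMillis, 1000L);
    }
}
```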
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299767&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299767 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 21:07 Start Date: 22/Aug/19 21:07 Worklog Time Spent: 10m Work Description: kmjung commented on issue #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#issuecomment-524078069 @chamikaramj I think this is ready to go -- please take another look when you can. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299767) Time Spent: 2h 10m (was: 2h) > Allow specifying BigQuery Storage API readOptions at runtime > > > Key: BEAM-8023 > URL: https://issues.apache.org/jira/browse/BEAM-8023 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Jeff Klukas >Assignee: Kenneth Jung >Priority: Minor > Time Spent: 2h 10m > Remaining Estimate: 0h > > We have support in the Java SDK for using the BigQuery Storage API for reads, > but only the target query or table is supported as a ValueProvider to be > specified at runtime. AFAICT, there is no reason we can't delay specifying > readOptions until runtime as well. > The readOptions are accessed by BigQueryStorageTableSource in getTargetTable; > I believe that's occurring at runtime, but I'd love for someone with deeper > BoundedSource knowledge to confirm that. > I'd advocate for adding new methods > `TypedRead.withSelectedFields(ValueProvider<List<String>> value)` and > `TypedRead.withRowRestriction(ValueProvider<String> value)`. The existing > `withReadOptions` method would then populate the other two as > StaticValueProviders. 
Perhaps we'd want to deprecate `withReadOptions` in > favor of specifying individual read options as separate parameters. -- This message was sent by Atlassian Jira (v8.3.2#803003)
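The proposal builds on Beam's ValueProvider pattern: the pipeline graph stores a provider, and the concrete value is demanded only at execution time. A minimal self-contained sketch of the idea; these are simplified stand-ins, not Beam's actual org.apache.beam.sdk.options classes:

```java
import java.util.function.Supplier;

// Simplified stand-in for Beam's ValueProvider: a value supplied either
// statically at graph-construction time or lazily at execution time.
interface Provider<T> {
    T get();
}

class StaticProvider<T> implements Provider<T> {
    private final T value;
    StaticProvider(T value) { this.value = value; }
    public T get() { return value; }
}

class RuntimeProvider<T> implements Provider<T> {
    // Resolved (e.g. from pipeline options) only when get() is called at run time.
    private final Supplier<T> lookup;
    RuntimeProvider(Supplier<T> lookup) { this.lookup = lookup; }
    public T get() { return lookup.get(); }
}
```

Under this model the deprecated `withReadOptions` would wrap its fields as static providers, while the proposed `withSelectedFields`/`withRowRestriction` overloads could accept runtime providers directly.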
[jira] [Work logged] (BEAM-6114) SQL join selection should be done in planner, not in expansion to PTransform
[ https://issues.apache.org/jira/browse/BEAM-6114?focusedWorklogId=299765&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299765 ] ASF GitHub Bot logged work on BEAM-6114: Author: ASF GitHub Bot Created on: 22/Aug/19 21:05 Start Date: 22/Aug/19 21:05 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #9395: [BEAM-6114] Calcite Rules to Select Type of Join in BeamSQL URL: https://github.com/apache/beam/pull/9395#issuecomment-524077304 LGTM. I will merge this PR once every test passes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299765) Time Spent: 3h 20m (was: 3h 10m) > SQL join selection should be done in planner, not in expansion to PTransform > > > Key: BEAM-6114 > URL: https://issues.apache.org/jira/browse/BEAM-6114 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Rahul Patwari >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299764&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299764 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 21:01 Start Date: 22/Aug/19 21:01 Worklog Time Spent: 10m Work Description: kmjung commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#discussion_r316883401 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java ## @@ -1251,6 +1294,16 @@ private void ensureFromNotCalledYet() { getJsonTableRef() == null && getQuery() == null, "from() or fromQuery() already called"); } +private void ensureReadOptionsNotSet() { + checkState(getReadOptions() == null, "withReadOptions() already called"); +} + +private void ensureSelectedFieldsAndRowRestrictionNotSet() { Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299764) Time Spent: 2h (was: 1h 50m) > Allow specifying BigQuery Storage API readOptions at runtime > > > Key: BEAM-8023 > URL: https://issues.apache.org/jira/browse/BEAM-8023 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Jeff Klukas >Assignee: Kenneth Jung >Priority: Minor > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299762&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299762 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 20:59 Start Date: 22/Aug/19 20:59 Worklog Time Spent: 10m Work Description: kmjung commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#discussion_r316882807 ## File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageReadTest.java ## @@ -465,6 +500,71 @@ public void testTableSourceInitialSplit_WithTableReadOptions() throws Throwable BigQueryStorageTableSource.create( ValueProvider.StaticValueProvider.of(tableRef), readOptions, +null, +null, +new TableRowParser(), +TableRowJsonCoder.of(), +new FakeBigQueryServices() +.withDatasetService(fakeDatasetService) +.withStorageClient(fakeStorageClient)); + +List> sources = tableSource.split(10L, options); +assertEquals(10L, sources.size()); + } + + @Test + public void testTableSourceInitialSplit_WithSelectedFieldsAndRowRestriction() throws Exception { +fakeDatasetService.createDataset("foo.com:project", "dataset", "", "", null); +TableReference tableRef = BigQueryHelpers.parseTableSpec("foo.com:project:dataset.table"); + +Table table = +new Table() +.setTableReference(tableRef) +.setNumBytes(100L) +.setSchema( +new TableSchema() +.setFields( +ImmutableList.of( +new TableFieldSchema().setName("name").setType("STRING"), +new TableFieldSchema().setName("number").setType("INTEGER"; + +fakeDatasetService.createTable(table); + +TableReadOptions readOptions = +TableReadOptions.newBuilder() +.addSelectedFields("name") +.addSelectedFields("number") +.setRowRestriction("number > 5") +.build(); + +CreateReadSessionRequest expectedRequest = +CreateReadSessionRequest.newBuilder() +.setParent("projects/project-id") 
+.setTableReference(BigQueryHelpers.toTableRefProto(tableRef)) +.setRequestedStreams(10) +.setReadOptions(readOptions) +// TODO(aryann): Once we rebuild the generated client code, we should change this to +// use setShardingStrategy(). +.setUnknownFields( +UnknownFieldSet.newBuilder() +.addField(7, UnknownFieldSet.Field.newBuilder().addVarint(2).build()) +.build()) +.build(); + +ReadSession.Builder builder = ReadSession.newBuilder(); +for (int i = 0; i < 10; i++) { + builder.addStreams(Stream.newBuilder().setName("stream-" + i)); +} + +StorageClient fakeStorageClient = mock(StorageClient.class); + when(fakeStorageClient.createReadSession(expectedRequest)).thenReturn(builder.build()); + +BigQueryStorageTableSource tableSource = +BigQueryStorageTableSource.create( +ValueProvider.StaticValueProvider.of(tableRef), +null, +StaticValueProvider.of(Lists.newArrayList("name", "number")), +StaticValueProvider.of("number > 5"), Review comment: Good suggestion. `p.newProvider` doesn't work here -- we manually call `split` on the source object in this test rather than executing the pipeline, which (happily) fails since we're accessing the provider value outside of the pipeline context -- but I've updated `testReadFromBigQueryIO` below to cover this case. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299762) Time Spent: 1h 50m (was: 1h 40m) > Allow specifying BigQuery Storage API readOptions at runtime > > > Key: BEAM-8023 > URL: https://issues.apache.org/jira/browse/BEAM-8023 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Jeff Klukas >Assignee: Kenneth Jung >Priority: Minor > Time Spent: 1h 50m > Remaining Estimate: 0h >
[jira] [Created] (BEAM-8046) Unable to read from bigquery and publish to pubsub using dataflow runner (python SDK)
James Hutchison created BEAM-8046: - Summary: Unable to read from bigquery and publish to pubsub using dataflow runner (python SDK) Key: BEAM-8046 URL: https://issues.apache.org/jira/browse/BEAM-8046 Project: Beam Issue Type: Improvement Components: runner-dataflow Affects Versions: 2.14.0, 2.13.0 Reporter: James Hutchison With the Python SDK, the Dataflow runner does not allow the use of BigQuery in streaming pipelines, and Pub/Sub is not allowed in batch pipelines. Thus, there is no way to create a pipeline on the Dataflow runner that reads from BigQuery and publishes to Pub/Sub. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Closed] (BEAM-8037) Python FlinkRunner does not override reads
[ https://issues.apache.org/jira/browse/BEAM-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver closed BEAM-8037. - Fix Version/s: 2.16.0 Resolution: Fixed > Python FlinkRunner does not override reads > -- > > Key: BEAM-8037 > URL: https://issues.apache.org/jira/browse/BEAM-8037 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Fix For: 2.16.0 > > Time Spent: 20m > Remaining Estimate: 0h > > When using the Python FlinkRunner [1], my example pipeline (beginning with a > Create transform) failed with exception: > java.lang.IllegalArgumentException: GreedyPipelineFuser requires all root > nodes to be runner-implemented beam:transform:impulse:v1 or > beam:transform:read:v1 primitives, but transform > ref_AppliedPTransform_Create/Read_3 executes in environment Optional[urn: > "beam:env:docker:v1" -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8038) Python Precommit fail: 'BeamFnExternalWorkerPoolServicer' has no attribute '_worker_processes'
[ https://issues.apache.org/jira/browse/BEAM-8038?focusedWorklogId=299753&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299753 ] ASF GitHub Bot logged work on BEAM-8038: Author: ASF GitHub Bot Created on: 22/Aug/19 20:52 Start Date: 22/Aug/19 20:52 Worklog Time Spent: 10m Work Description: tweise commented on issue #9403: [BEAM-8038] Fix worker pool exit hook URL: https://github.com/apache/beam/pull/9403#issuecomment-524072782 needed lint fix This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299753) Time Spent: 40m (was: 0.5h) > Python Precommit fail: 'BeamFnExternalWorkerPoolServicer' has no attribute > '_worker_processes' > -- > > Key: BEAM-8038 > URL: https://issues.apache.org/jira/browse/BEAM-8038 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness, test-failures >Reporter: Ahmet Altay >Assignee: Thomas Weise >Priority: Critical > Time Spent: 40m > Remaining Estimate: 0h > > Logs: https://builds.apache.org/job/beam_PreCommit_Python_Commit/8246/console > 10:14:09 > -- > 10:14:09 XML: > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/nosetests.xml > 10:14:09 > -- > 10:14:09 Ran 2594 tests in 629.438s > 10:14:09 > 10:14:09 OK (SKIP=520) > 10:14:09 Error in atexit._run_exitfuncs: > 10:14:09 Traceback (most recent call last): > 10:14:09 File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs > 10:14:09 func(*targs, **kargs) > 10:14:09 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/worker_pool_main.py", > line 72, in kill_worker_processes > 10:14:09 for worker_process in 
cls._worker_processes.values(): > 10:14:09 AttributeError: type object 'BeamFnExternalWorkerPoolServicer' has > no attribute '_worker_processes' > 10:14:09 Error in sys.exitfunc: > 10:14:09 Traceback (most recent call last): > 10:14:09 File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs > 10:14:09 func(*targs, **kargs) > 10:14:09 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/worker_pool_main.py", > line 72, in kill_worker_processes > 10:14:09 for worker_process in cls._worker_processes.values(): > 10:14:09 AttributeError: type object 'BeamFnExternalWorkerPoolServicer' has > no attribute '_worker_processes' > 10:14:10 py27-cython run-test-post: commands[0] | > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/scripts/run_tox_cleanup.sh > 10:14:10 ___ summary > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299747&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299747 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 20:46 Start Date: 22/Aug/19 20:46 Worklog Time Spent: 10m Work Description: kmjung commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#discussion_r31681 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java ## @@ -1360,12 +1413,39 @@ public TableReference getTable() { return toBuilder().setMethod(method).build(); } -/** Read options, including a list of selected columns and push-down SQL filter text. */ +/** + * @deprecated Use {@link #withSelectedFields(List)} and {@link #withRowRestriction(String)} + * instead. + */ +@Deprecated @Experimental(Experimental.Kind.SOURCE_SINK) public TypedRead withReadOptions(TableReadOptions readOptions) { + ensureSelectedFieldsAndRowRestrictionNotSet(); return toBuilder().setReadOptions(readOptions).build(); } +@Experimental(Experimental.Kind.SOURCE_SINK) +public TypedRead withSelectedFields(List selectedFields) { + return withSelectedFields(StaticValueProvider.of(selectedFields)); +} + +@Experimental(Experimental.Kind.SOURCE_SINK) Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299747) Time Spent: 1.5h (was: 1h 20m) > Allow specifying BigQuery Storage API readOptions at runtime > > > Key: BEAM-8023 > URL: https://issues.apache.org/jira/browse/BEAM-8023 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Jeff Klukas >Assignee: Kenneth Jung >Priority: Minor > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299748&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299748 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 20:46 Start Date: 22/Aug/19 20:46 Worklog Time Spent: 10m Work Description: kmjung commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#discussion_r316877815 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java ## @@ -1360,12 +1413,39 @@ public TableReference getTable() { return toBuilder().setMethod(method).build(); } -/** Read options, including a list of selected columns and push-down SQL filter text. */ +/** + * @deprecated Use {@link #withSelectedFields(List)} and {@link #withRowRestriction(String)} + * instead. + */ +@Deprecated @Experimental(Experimental.Kind.SOURCE_SINK) public TypedRead withReadOptions(TableReadOptions readOptions) { + ensureSelectedFieldsAndRowRestrictionNotSet(); return toBuilder().setReadOptions(readOptions).build(); } +@Experimental(Experimental.Kind.SOURCE_SINK) +public TypedRead withSelectedFields(List selectedFields) { + return withSelectedFields(StaticValueProvider.of(selectedFields)); +} + +@Experimental(Experimental.Kind.SOURCE_SINK) +public TypedRead withSelectedFields(ValueProvider> selectedFields) { + ensureReadOptionsNotSet(); + return toBuilder().setSelectedFields(selectedFields).build(); +} + +@Experimental(Experimental.Kind.SOURCE_SINK) Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299748) Time Spent: 1h 40m (was: 1.5h) > Allow specifying BigQuery Storage API readOptions at runtime > > > Key: BEAM-8023 > URL: https://issues.apache.org/jira/browse/BEAM-8023 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Jeff Klukas >Assignee: Kenneth Jung >Priority: Minor > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7760) Interactive Beam Caching PCollections bound to user defined vars in notebook
[ https://issues.apache.org/jira/browse/BEAM-7760?focusedWorklogId=299749&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299749 ] ASF GitHub Bot logged work on BEAM-7760: Author: ASF GitHub Bot Created on: 22/Aug/19 20:46 Start Date: 22/Aug/19 20:46 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9278: [BEAM-7760] Added Interactive Beam module URL: https://github.com/apache/beam/pull/9278 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299749) Time Spent: 3h (was: 2h 50m) > Interactive Beam Caching PCollections bound to user defined vars in notebook > > > Key: BEAM-7760 > URL: https://issues.apache.org/jira/browse/BEAM-7760 > Project: Beam > Issue Type: New Feature > Components: examples-python >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > > Cache only PCollections bound to user defined variables in a pipeline when > running pipeline with interactive runner in jupyter notebooks. > [Interactive > Beam|[https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive]] > has been caching and using caches of "leaf" PCollections for interactive > execution in jupyter notebooks. > The interactive execution is currently supported so that when appending new > transforms to existing pipeline for a new run, executed part of the pipeline > doesn't need to be re-executed. > A PCollection is "leaf" when it is never used as input in any PTransform in > the pipeline. > The problem with building caches and pipeline to execute around "leaf" is > that when a PCollection is consumed by a sink with no output, the pipeline to > execute built will miss the subgraph generating and consuming that > PCollection. 
> An example, "ReadFromPubSub --> WriteToPubSub" will result in an empty > pipeline. > Caching around PCollections bound to user defined variables and replacing > transforms with source and sink of caches could resolve the pipeline to > execute properly under the interactive execution scenario. Also, cached > PCollection now can trace back to user code and can be used for user data > visualization if user wants to do it. > E.g., > {code:java} > // ... > p = beam.Pipeline(interactive_runner.InteractiveRunner(), > options=pipeline_options) > messages = p | "Read" >> beam.io.ReadFromPubSub(subscription='...') > messages | "Write" >> beam.io.WriteToPubSub(topic_path) > result = p.run() > // ... > visualize(messages){code} > The interactive runner automatically figures out that PCollection > {code:java} > messages{code} > created by > {code:java} > p | "Read" >> beam.io.ReadFromPubSub(subscription='...'){code} > should be cached and reused if the notebook user appends more transforms. > And once the pipeline gets executed, the user could use any > visualize(PCollection) module to visualize the data statically (batch) or > dynamically (stream) -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299745&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299745 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 20:45 Start Date: 22/Aug/19 20:45 Worklog Time Spent: 10m Work Description: kmjung commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#discussion_r316877647 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java ## @@ -1360,12 +1413,39 @@ public TableReference getTable() { return toBuilder().setMethod(method).build(); } -/** Read options, including a list of selected columns and push-down SQL filter text. */ +/** + * @deprecated Use {@link #withSelectedFields(List)} and {@link #withRowRestriction(String)} + * instead. + */ +@Deprecated @Experimental(Experimental.Kind.SOURCE_SINK) public TypedRead withReadOptions(TableReadOptions readOptions) { + ensureSelectedFieldsAndRowRestrictionNotSet(); return toBuilder().setReadOptions(readOptions).build(); } +@Experimental(Experimental.Kind.SOURCE_SINK) Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299745) Time Spent: 1h 10m (was: 1h) > Allow specifying BigQuery Storage API readOptions at runtime > > > Key: BEAM-8023 > URL: https://issues.apache.org/jira/browse/BEAM-8023 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Jeff Klukas >Assignee: Kenneth Jung >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299746&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299746 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 20:45 Start Date: 22/Aug/19 20:45 Worklog Time Spent: 10m Work Description: kmjung commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#discussion_r316877715 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java ## @@ -1360,12 +1413,39 @@ public TableReference getTable() { return toBuilder().setMethod(method).build(); } -/** Read options, including a list of selected columns and push-down SQL filter text. */ +/** + * @deprecated Use {@link #withSelectedFields(List)} and {@link #withRowRestriction(String)} + * instead. + */ +@Deprecated @Experimental(Experimental.Kind.SOURCE_SINK) public TypedRead withReadOptions(TableReadOptions readOptions) { + ensureSelectedFieldsAndRowRestrictionNotSet(); return toBuilder().setReadOptions(readOptions).build(); } +@Experimental(Experimental.Kind.SOURCE_SINK) Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 299746) Time Spent: 1h 20m (was: 1h 10m)
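The proposal above rests on Beam's ValueProvider pattern: wrapping a configuration value so its evaluation can be deferred from pipeline-construction time to execution time. As a rough, self-contained sketch of the idea in plain Python (hypothetical classes, not Beam's actual implementation):

```python
# Minimal sketch of the ValueProvider idea: defer a value's evaluation
# until runtime. Hypothetical classes, not Beam's actual implementation.

class StaticValueProvider:
    """Wraps a value already known at pipeline-construction time."""
    def __init__(self, value):
        self._value = value

    def is_accessible(self):
        return True

    def get(self):
        return self._value


class RuntimeValueProvider:
    """Wraps a value that only becomes available once the runner
    injects the pipeline options at execution time."""
    _runtime_options = None  # set by the "runner" before execution

    def __init__(self, option_name, default=None):
        self._option_name = option_name
        self._default = default

    def is_accessible(self):
        return RuntimeValueProvider._runtime_options is not None

    def get(self):
        if not self.is_accessible():
            raise RuntimeError(
                "%s not accessible before runtime" % self._option_name)
        return RuntimeValueProvider._runtime_options.get(
            self._option_name, self._default)


# Construction time: the filter text is not known yet.
row_restriction = RuntimeValueProvider("row_restriction")

# Execution time: the "runner" supplies the options, and the source
# (e.g. in getTargetTable) can now resolve the read options.
RuntimeValueProvider._runtime_options = {"row_restriction": "number > 5"}
print(row_restriction.get())  # -> number > 5
```

Calling `get()` before the runner injects options raises, which is exactly the misuse the `p.newProvider` test style discussed later in this thread is meant to surface before runtime.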
[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python
[ https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=299742&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299742 ] ASF GitHub Bot logged work on BEAM-7886: Author: ASF GitHub Bot Created on: 22/Aug/19 20:44 Start Date: 22/Aug/19 20:44 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #9188: [BEAM-7886] Make row coder a standard coder and implement in Python URL: https://github.com/apache/beam/pull/9188#discussion_r316877069 ## File path: sdks/python/apache_beam/coders/row_coder_test.py ## @@ -0,0 +1,129 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# +from __future__ import absolute_import + +import logging +import typing +import unittest +from itertools import chain + +import numpy as np +from past.builtins import unicode + +from apache_beam.coders import RowCoder +from apache_beam.coders.typecoders import registry as coders_registry +from apache_beam.portability.api import schema_pb2 +from apache_beam.typehints.schemas import typing_to_runner_api + +Person = typing.NamedTuple("Person", [ +("name", unicode), +("age", np.int32), +("address", typing.Optional[unicode]), +("aliases", typing.List[unicode]), +]) + +coders_registry.register_coder(Person, RowCoder) + + +class CodersTest(unittest.TestCase): + TEST_CASES = [ + Person("Jon Snow", 23, None, ["crow", "wildling"]), + Person("Daenerys Targaryen", 25, "Westeros", ["Mother of Dragons"]), + Person("Michael Bluth", 30, None, []) + ] + + def test_create_row_coder_from_named_tuple(self): +expected_coder = RowCoder(typing_to_runner_api(Person).row_type.schema) +real_coder = coders_registry.get_coder(Person) + +for test_case in self.TEST_CASES: + self.assertEqual( + expected_coder.encode(test_case), real_coder.encode(test_case)) + + self.assertEqual(test_case, + real_coder.decode(real_coder.encode(test_case))) + + def test_create_row_coder_from_schema(self): +schema = schema_pb2.Schema( +id="person", +fields=[ +schema_pb2.Field( +name="name", +type=schema_pb2.FieldType( +atomic_type=schema_pb2.AtomicType.STRING)), +schema_pb2.Field( +name="age", +type=schema_pb2.FieldType( +atomic_type=schema_pb2.AtomicType.INT32)), +schema_pb2.Field( +name="address", +type=schema_pb2.FieldType( +atomic_type=schema_pb2.AtomicType.STRING, nullable=True)), +schema_pb2.Field( +name="aliases", +type=schema_pb2.FieldType( +array_type=schema_pb2.ArrayType( +element_type=schema_pb2.FieldType( +atomic_type=schema_pb2.AtomicType.STRING)))), +]) +coder = RowCoder(schema) + +for test_case in self.TEST_CASES: + self.assertEqual(test_case, coder.decode(coder.encode(test_case))) + +
@unittest.skip( + "Need to decide whether to defer to the stream writer for these checks " + "or add explicit checks" + ) + def test_overflows(self): +IntTester = typing.NamedTuple('IntTester', [ +#('i8', typing.Optional[np.int8]), Review comment: Added. Also added a reference to [BEAM-8030](https://issues.apache.org/jira/browse/BEAM-8030) in the skip message. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299742) Time Spent: 8h 10m (was: 8h) > Make row coder a standard coder and implement in python > --- > > Key: BEAM-7886 > URL: https://issues.apache.org/jira/browse/BEAM-7886 >
[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python
[ https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=299744&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299744 ] ASF GitHub Bot logged work on BEAM-7886: Author: ASF GitHub Bot Created on: 22/Aug/19 20:44 Start Date: 22/Aug/19 20:44 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #9188: [BEAM-7886] Make row coder a standard coder and implement in Python URL: https://github.com/apache/beam/pull/9188#discussion_r316877261 ## File path: sdks/python/apache_beam/coders/row_coder_test.py ## @@ -0,0 +1,126 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# +from __future__ import absolute_import + +import logging +import typing +import unittest + +import numpy as np +from itertools import chain +from past.builtins import unicode + +from apache_beam.coders import RowCoder +from apache_beam.coders.typecoders import registry as coders_registry +from apache_beam.portability.api import schema_pb2 +from apache_beam.typehints.schemas import typing_to_runner_api + +Person = typing.NamedTuple("Person", [ +("name", unicode), +("age", np.int32), +("address", typing.Optional[unicode]), +("aliases", typing.List[unicode]), +]) + +coders_registry.register_coder(Person, RowCoder) + + +class CodersTest(unittest.TestCase): + TEST_CASES = [ + Person("Jon Snow", 23, None, ["crow", "wildling"]), + Person("Daenerys Targaryen", 25, "Westeros", ["Mother of Dragons"]), + Person("Michael Bluth", 30, None, []) + ] + + def test_create_row_coder_from_named_tuple(self): +expected_coder = RowCoder(typing_to_runner_api(Person).row_type.schema) +real_coder = coders_registry.get_coder(Person) + +for test_case in self.TEST_CASES: + self.assertEqual( + expected_coder.encode(test_case), real_coder.encode(test_case)) + + self.assertEqual(test_case, + real_coder.decode(real_coder.encode(test_case))) + + def test_create_row_coder_from_schema(self): +schema = schema_pb2.Schema( +id="person", +fields=[ +schema_pb2.Field( +name="name", +type=schema_pb2.FieldType( +atomic_type=schema_pb2.AtomicType.STRING)), +schema_pb2.Field( +name="age", +type=schema_pb2.FieldType( +atomic_type=schema_pb2.AtomicType.INT32)), +schema_pb2.Field( +name="address", +type=schema_pb2.FieldType( +atomic_type=schema_pb2.AtomicType.STRING, nullable=True)), +schema_pb2.Field( +name="aliases", +type=schema_pb2.FieldType( +array_type=schema_pb2.ArrayType( +element_type=schema_pb2.FieldType( +atomic_type=schema_pb2.AtomicType.STRING)))), +]) +coder = RowCoder(schema) + +for test_case in self.TEST_CASES: + self.assertEqual(test_case, coder.decode(coder.encode(test_case))) + +
@unittest.skip("Need to decide whether to defer to the stream writer for these checks or add explicit checks") Review comment: Filed [BEAM-8030](https://issues.apache.org/jira/browse/BEAM-8030) to reconcile this. Issue Time Tracking --- Worklog Id: (was: 299744) Time Spent: 8h 20m (was: 8h 10m)
[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python
[ https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=299736&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299736 ] ASF GitHub Bot logged work on BEAM-7886: Author: ASF GitHub Bot Created on: 22/Aug/19 20:39 Start Date: 22/Aug/19 20:39 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #9188: [BEAM-7886] Make row coder a standard coder and implement in Python URL: https://github.com/apache/beam/pull/9188#discussion_r316875134 ## File path: sdks/python/setup.py ## @@ -115,8 +115,7 @@ def get_version(): 'mock>=1.0.1,<3.0.0', 'pymongo>=3.8.0,<4.0.0', 'oauth2client>=2.0.1,<4', -# grpcio 1.8.1 and above requires protobuf 3.5.0.post1. -'protobuf>=3.5.0.post1,<4', +'protobuf>=3.8.0.post1,<4', Review comment: Sounds good, I just moved the numpy dependency from test to required with the same range. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299736) Time Spent: 8h (was: 7h 50m) > Make row coder a standard coder and implement in python > --- > > Key: BEAM-7886 > URL: https://issues.apache.org/jira/browse/BEAM-7886 > Project: Beam > Issue Type: Improvement > Components: beam-model, sdk-java-core, sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 8h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
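The row coder under review turns a schema'd NamedTuple into bytes field by field, with a marker for nullable fields. A simplified, self-contained sketch of that core idea follows; the real Beam row coder wire format (varints, a packed null bitmap, per-type component coders) is considerably more involved, so treat this purely as an illustration:

```python
# Illustrative only: encode a NamedTuple's fields in schema order,
# prefixing each with a present/null byte. NOT Beam's actual wire format.
import struct
import typing

Person = typing.NamedTuple("Person", [("name", str),
                                      ("age", int),
                                      ("address", typing.Optional[str])])

def encode_row(row):
    out = bytearray()
    for value in row:
        if value is None:
            out.append(0)          # null marker
            continue
        out.append(1)              # present marker
        if isinstance(value, int):
            out += struct.pack(">q", value)          # fixed 8-byte int
        else:
            data = value.encode("utf-8")
            out += struct.pack(">I", len(data)) + data  # length-prefixed str

    return bytes(out)

def decode_row(data, field_types):
    values, pos = [], 0
    for ftype in field_types:
        present, pos = data[pos], pos + 1
        if not present:
            values.append(None)
        elif ftype is int:
            values.append(struct.unpack_from(">q", data, pos)[0])
            pos += 8
        else:
            (length,) = struct.unpack_from(">I", data, pos)
            pos += 4
            values.append(data[pos:pos + length].decode("utf-8"))
            pos += length
    return Person(*values)

p = Person("Jon Snow", 23, None)
assert decode_row(encode_row(p), [str, int, str]) == p
```

Because both SDKs would derive the byte layout from the same schema proto, a row encoded by the Java coder can be decoded by the Python one; that schema-driven symmetry is what makes it a candidate standard coder.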
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299723&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299723 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 20:33 Start Date: 22/Aug/19 20:33 Worklog Time Spent: 10m Work Description: jklukas commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#discussion_r316870272 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java ## @@ -1360,12 +1413,39 @@ public TableReference getTable() { return toBuilder().setMethod(method).build(); } -/** Read options, including a list of selected columns and push-down SQL filter text. */ +/** + * @deprecated Use {@link #withSelectedFields(List)} and {@link #withRowRestriction(String)} + * instead. + */ +@Deprecated @Experimental(Experimental.Kind.SOURCE_SINK) public TypedRead withReadOptions(TableReadOptions readOptions) { + ensureSelectedFieldsAndRowRestrictionNotSet(); return toBuilder().setReadOptions(readOptions).build(); } +@Experimental(Experimental.Kind.SOURCE_SINK) Review comment: ```suggestion /** Names of the fields in the table that should be read; valid only for direct reads. */ @Experimental(Experimental.Kind.SOURCE_SINK) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 299723) Time Spent: 50m (was: 40m)
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299726&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299726 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 20:33 Start Date: 22/Aug/19 20:33 Worklog Time Spent: 10m Work Description: jklukas commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#discussion_r316872643 ## File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageReadTest.java ## @@ -465,6 +500,71 @@ public void testTableSourceInitialSplit_WithTableReadOptions() throws Throwable BigQueryStorageTableSource.create( ValueProvider.StaticValueProvider.of(tableRef), readOptions, +null, +null, +new TableRowParser(), +TableRowJsonCoder.of(), +new FakeBigQueryServices() +.withDatasetService(fakeDatasetService) +.withStorageClient(fakeStorageClient)); + +List> sources = tableSource.split(10L, options); +assertEquals(10L, sources.size()); + } + + @Test + public void testTableSourceInitialSplit_WithSelectedFieldsAndRowRestriction() throws Exception { +fakeDatasetService.createDataset("foo.com:project", "dataset", "", "", null); +TableReference tableRef = BigQueryHelpers.parseTableSpec("foo.com:project:dataset.table"); + +Table table = +new Table() +.setTableReference(tableRef) +.setNumBytes(100L) +.setSchema( +new TableSchema() +.setFields( +ImmutableList.of( +new TableFieldSchema().setName("name").setType("STRING"), +new TableFieldSchema().setName("number").setType("INTEGER")))); + +fakeDatasetService.createTable(table); + +TableReadOptions readOptions = +TableReadOptions.newBuilder() +.addSelectedFields("name") +.addSelectedFields("number") +.setRowRestriction("number > 5") +.build(); + +CreateReadSessionRequest expectedRequest = +CreateReadSessionRequest.newBuilder() +.setParent("projects/project-id") 
+.setTableReference(BigQueryHelpers.toTableRefProto(tableRef)) +.setRequestedStreams(10) +.setReadOptions(readOptions) +// TODO(aryann): Once we rebuild the generated client code, we should change this to +// use setShardingStrategy(). +.setUnknownFields( +UnknownFieldSet.newBuilder() +.addField(7, UnknownFieldSet.Field.newBuilder().addVarint(2).build()) +.build()) +.build(); + +ReadSession.Builder builder = ReadSession.newBuilder(); +for (int i = 0; i < 10; i++) { + builder.addStreams(Stream.newBuilder().setName("stream-" + i)); +} + +StorageClient fakeStorageClient = mock(StorageClient.class); + when(fakeStorageClient.createReadSession(expectedRequest)).thenReturn(builder.build()); + +BigQueryStorageTableSource tableSource = +BigQueryStorageTableSource.create( +ValueProvider.StaticValueProvider.of(tableRef), +null, +StaticValueProvider.of(Lists.newArrayList("name", "number")), +StaticValueProvider.of("number > 5"), Review comment: Would it be more appropriate to use `p.newProvider` here rather than `StaticValueProvider.of` to catch potential misuses of the valueprovider before we hit runtime? If the StaticValueProvider style is already predominant in this file, I'm fine with keeping it as-is. Issue Time Tracking --- Worklog Id: (was: 299726) Time Spent: 1h (was: 50m)
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299725&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299725 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 20:33 Start Date: 22/Aug/19 20:33 Worklog Time Spent: 10m Work Description: jklukas commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#discussion_r316870665 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java ## @@ -1360,12 +1413,39 @@ public TableReference getTable() { return toBuilder().setMethod(method).build(); } -/** Read options, including a list of selected columns and push-down SQL filter text. */ +/** + * @deprecated Use {@link #withSelectedFields(List)} and {@link #withRowRestriction(String)} + * instead. + */ +@Deprecated @Experimental(Experimental.Kind.SOURCE_SINK) public TypedRead withReadOptions(TableReadOptions readOptions) { + ensureSelectedFieldsAndRowRestrictionNotSet(); return toBuilder().setReadOptions(readOptions).build(); } +@Experimental(Experimental.Kind.SOURCE_SINK) +public TypedRead withSelectedFields(List<String> selectedFields) { + return withSelectedFields(StaticValueProvider.of(selectedFields)); +} + +@Experimental(Experimental.Kind.SOURCE_SINK) Review comment: ```suggestion /** SQL text filtering statement; valid only for direct reads. */ @Experimental(Experimental.Kind.SOURCE_SINK) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 299725) Time Spent: 1h (was: 50m)
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299724&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299724 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 20:33 Start Date: 22/Aug/19 20:33 Worklog Time Spent: 10m Work Description: jklukas commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#discussion_r316868145 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java ## @@ -1251,6 +1294,16 @@ private void ensureFromNotCalledYet() { getJsonTableRef() == null && getQuery() == null, "from() or fromQuery() already called"); } +private void ensureReadOptionsNotSet() { + checkState(getReadOptions() == null, "withReadOptions() already called"); +} + +private void ensureSelectedFieldsAndRowRestrictionNotSet() { Review comment: For a little future-proofing in case additional read options are added to the BQ Storage API in the future, these methods could be named `ensureReadOptionsObjectNotSet` and `ensureIndividualReadOptionsNotSet`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 299724) Time Spent: 50m (was: 40m)
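The guard methods being renamed here make the deprecated aggregate setter and the new per-option setters mutually exclusive, so a mixed configuration fails fast at pipeline-construction time instead of producing an ambiguous read session. The pattern, sketched with hypothetical names in Python rather than Beam's actual Java builder:

```python
# Sketch of mutually exclusive builder option groups. Hypothetical
# names mirroring the guard pattern under review, not Beam's code.

class TypedRead:
    def __init__(self):
        self._read_options = None      # deprecated aggregate object
        self._selected_fields = None   # individual option
        self._row_restriction = None   # individual option

    def _ensure_read_options_object_not_set(self):
        if self._read_options is not None:
            raise ValueError("withReadOptions() already called")

    def _ensure_individual_read_options_not_set(self):
        if self._selected_fields is not None or self._row_restriction is not None:
            raise ValueError(
                "withSelectedFields() or withRowRestriction() already called")

    def with_read_options(self, read_options):  # deprecated path
        self._ensure_individual_read_options_not_set()
        self._read_options = read_options
        return self

    def with_selected_fields(self, fields):
        self._ensure_read_options_object_not_set()
        self._selected_fields = fields
        return self

    def with_row_restriction(self, restriction):
        self._ensure_read_options_object_not_set()
        self._row_restriction = restriction
        return self
```

Each setter checks the *other* group before mutating, which is why generic names like `ensureReadOptionsObjectNotSet` / `ensureIndividualReadOptionsNotSet` age better if more per-option setters are added later.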
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299727&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299727 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 20:33 Start Date: 22/Aug/19 20:33 Worklog Time Spent: 10m Work Description: jklukas commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#discussion_r316870997 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java ## @@ -1360,12 +1413,39 @@ public TableReference getTable() { return toBuilder().setMethod(method).build(); } -/** Read options, including a list of selected columns and push-down SQL filter text. */ +/** + * @deprecated Use {@link #withSelectedFields(List)} and {@link #withRowRestriction(String)} + * instead. + */ +@Deprecated @Experimental(Experimental.Kind.SOURCE_SINK) public TypedRead withReadOptions(TableReadOptions readOptions) { + ensureSelectedFieldsAndRowRestrictionNotSet(); return toBuilder().setReadOptions(readOptions).build(); } +@Experimental(Experimental.Kind.SOURCE_SINK) +public TypedRead withSelectedFields(List<String> selectedFields) { + return withSelectedFields(StaticValueProvider.of(selectedFields)); +} + +@Experimental(Experimental.Kind.SOURCE_SINK) +public TypedRead withSelectedFields(ValueProvider<List<String>> selectedFields) { + ensureReadOptionsNotSet(); + return toBuilder().setSelectedFields(selectedFields).build(); +} + +@Experimental(Experimental.Kind.SOURCE_SINK) Review comment: Can we add docstrings on these variants too? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 299727) Time Spent: 1h (was: 50m)
[jira] [Updated] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated BEAM-8042: --- Component/s: (was: dsl-sql) dsl-sql-zetasql > Parsing of aggregate query fails > > > Key: BEAM-8042 > URL: https://issues.apache.org/jira/browse/BEAM-8042 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Priority: Major > > SELECT > key, > COUNT(*) as f1, > SUM(has_f2) AS f2, > SUM(has_f3) AS f3, > SUM(has_f4) AS f4, > SUM(has_f5) AS f5, > SUM(has_f6) AS f6, > SUM(has_f7) AS f7 > FROM xxx > GROUP BY key > Caused by: java.lang.RuntimeException: Error while applying rule > AggregateProjectMergeRule, args > [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)), > > rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)] > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232) > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637) > at > org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104) > at > ... 
39 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58) > at > org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96) > at > org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73) > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205) > ... 48 more -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (BEAM-8043) Support AVG(long)
[ https://issues.apache.org/jira/browse/BEAM-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Wang updated BEAM-8043:
---------------------------
    Component/s:     (was: dsl-sql)
                     dsl-sql-zetasql

> Support AVG(long)
> -----------------
>
>                 Key: BEAM-8043
>                 URL: https://issues.apache.org/jira/browse/BEAM-8043
>             Project: Beam
>          Issue Type: Sub-task
>          Components: dsl-sql-zetasql
>            Reporter: Rui Wang
>            Priority: Major
>
> Currently AVG(long) is not supported, and users have to use AVG(CAST(long AS float64)) as a workaround.
> We should support AVG(long).
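The workaround mentioned in the ticket can be sketched concretely. This is an illustrative fragment only, assuming the ZetaSQL dialect; the table `orders` and the INT64 column `amount` are hypothetical names, not from the ticket:

```sql
-- Currently fails in the Beam ZetaSQL dialect: AVG over an INT64 column.
SELECT key, AVG(amount) AS avg_amount
FROM orders
GROUP BY key;

-- Workaround from the ticket: cast the column to FLOAT64 before aggregating.
SELECT key, AVG(CAST(amount AS FLOAT64)) AS avg_amount
FROM orders
GROUP BY key;
```

Once AVG(long) is supported natively, the explicit CAST should no longer be necessary.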
[jira] [Updated] (BEAM-8039) SUM(CASE WHEN xxx THEN 1 ELSE 0)
[ https://issues.apache.org/jira/browse/BEAM-8039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Wang updated BEAM-8039:
---------------------------
    Component/s:     (was: dsl-sql)
                     dsl-sql-zetasql

> SUM(CASE WHEN xxx THEN 1 ELSE 0)
> --------------------------------
>
>                 Key: BEAM-8039
>                 URL: https://issues.apache.org/jira/browse/BEAM-8039
>             Project: Beam
>          Issue Type: Sub-task
>          Components: dsl-sql-zetasql
>            Reporter: Rui Wang
>            Priority: Major
>
> java.lang.RuntimeException: Aggregate function only accepts Column Reference or CAST(Column Reference) as its input.
>
> I was able to rewrite the SQL using a WITH statement, and it seemed to work, but this requires rewriting a lot of queries and makes them pretty much unreadable.
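The WITH-statement rewrite referred to above might look like the following sketch. This is illustrative only: the table `xxx` follows the ticket's placeholder naming, and the `f2 IS NOT NULL` predicate is an assumed stand-in for whatever condition the real queries test:

```sql
-- Fails: the argument of SUM is a CASE expression, not a column
-- reference or CAST(column reference).
SELECT key, SUM(CASE WHEN f2 IS NOT NULL THEN 1 ELSE 0 END) AS f2
FROM xxx
GROUP BY key;

-- Workaround: materialize the CASE expression as a named column in a
-- WITH clause, then aggregate over the plain column reference.
WITH flattened AS (
  SELECT key, CASE WHEN f2 IS NOT NULL THEN 1 ELSE 0 END AS has_f2
  FROM xxx
)
SELECT key, SUM(has_f2) AS f2
FROM flattened
GROUP BY key;
```

The cost noted in the ticket is visible here: every aggregate over an expression forces an extra named subquery, which quickly bloats larger queries.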
[jira] [Updated] (BEAM-8041) Support Insert statements
[ https://issues.apache.org/jira/browse/BEAM-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Wang updated BEAM-8041:
---------------------------
    Component/s:     (was: dsl-sql)
                     dsl-sql-zetasql

> Support Insert statements
> -------------------------
>
>                 Key: BEAM-8041
>                 URL: https://issues.apache.org/jira/browse/BEAM-8041
>             Project: Beam
>          Issue Type: Sub-task
>          Components: dsl-sql-zetasql
>            Reporter: Rui Wang
>            Priority: Major
>
> Caused by: org.apache.beam.repackaged.sql.com.google.zetasql.io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Statement not supported: InsertStatement [at
[jira] [Updated] (BEAM-8044) Investigate SUM(long)
[ https://issues.apache.org/jira/browse/BEAM-8044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Wang updated BEAM-8044:
---------------------------
    Component/s:     (was: dsl-sql)
                     dsl-sql-zetasql

> Investigate SUM(long)
> ---------------------
>
>                 Key: BEAM-8044
>                 URL: https://issues.apache.org/jira/browse/BEAM-8044
>             Project: Beam
>          Issue Type: Sub-task
>          Components: dsl-sql-zetasql
>            Reporter: Rui Wang
>            Priority: Major
>
> A user reports that SUM(long) is not supported. This needs further investigation.
[jira] [Updated] (BEAM-8040) NPE in table name resolver when selecting from a table that doesn't exist
[ https://issues.apache.org/jira/browse/BEAM-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Wang updated BEAM-8040:
---------------------------
    Component/s:     (was: dsl-sql)
                     dsl-sql-zetasql

> NPE in table name resolver when selecting from a table that doesn't exist
> -------------------------------------------------------------------------
>
>                 Key: BEAM-8040
>                 URL: https://issues.apache.org/jira/browse/BEAM-8040
>             Project: Beam
>          Issue Type: Sub-task
>          Components: dsl-sql-zetasql
>            Reporter: Rui Wang
>            Priority: Major
>
> NullPointerException when selecting from a table that doesn't exist.
>
> Caused by: java.lang.NullPointerException
>   at org.apache.beam.sdk.extensions.sql.zetasql.TableResolverImpl.assumeLeafIsTable(TableResolverImpl.java:42)
>   at org.apache.beam.sdk.extensions.sql.zetasql.TableResolution.resolveCalciteTable(TableResolution.java:48)
>   at org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addTableToLeafCatalog(SqlAnalyzer.java:174)
>   at org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.lambda$createPopulatedCatalog$0(SqlAnalyzer.java:132)
[jira] [Updated] (BEAM-7832) ZetaSQL Dialect
[ https://issues.apache.org/jira/browse/BEAM-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Wang updated BEAM-7832:
---------------------------
    Component/s: dsl-sql-zetasql

> ZetaSQL Dialect
> ---------------
>
>                 Key: BEAM-7832
>                 URL: https://issues.apache.org/jira/browse/BEAM-7832
>             Project: Beam
>          Issue Type: New Feature
>          Components: dsl-sql, dsl-sql-zetasql
>            Reporter: Rui Wang
>            Assignee: Rui Wang
>            Priority: Major
>          Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> We can support the ZetaSQL (https://github.com/google/zetasql) dialect in BeamSQL.
[jira] [Work logged] (BEAM-6114) SQL join selection should be done in planner, not in expansion to PTransform
[ https://issues.apache.org/jira/browse/BEAM-6114?focusedWorklogId=299716&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299716 ]

ASF GitHub Bot logged work on BEAM-6114:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 22/Aug/19 20:26
            Start Date: 22/Aug/19 20:26
    Worklog Time Spent: 10m

Work Description: amaliujia commented on issue #9395: [BEAM-6114] Calcite Rules to Select Type of Join in BeamSQL
URL: https://github.com/apache/beam/pull/9395#issuecomment-524064006

   Run SQL PostCommit

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id:     (was: 299716)
    Time Spent: 3h 10m  (was: 3h)

> SQL join selection should be done in planner, not in expansion to PTransform
> ----------------------------------------------------------------------------
>
>                 Key: BEAM-6114
>                 URL: https://issues.apache.org/jira/browse/BEAM-6114
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-sql
>            Reporter: Kenneth Knowles
>            Assignee: Rahul Patwari
>            Priority: Major
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Currently Beam SQL joins all go through a single physical operator which has a single PTransform that does all join algorithms based on properties of its input PCollections as well as the relational algebra.
> A first step is to make the needed information part of the relational algebra, so it can choose a PTransform based on that, and the PTransforms can be simpler.
> Second step is to have separate (physical) relational operators for different join algorithms.
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299710&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299710 ]

ASF GitHub Bot logged work on BEAM-8023:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 22/Aug/19 20:14
            Start Date: 22/Aug/19 20:14
    Worklog Time Spent: 10m

Work Description: chamikaramj commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ
URL: https://github.com/apache/beam/pull/9405#discussion_r316865488

## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java

 @@ -1360,12 +1413,39 @@ public TableReference getTable() {
       return toBuilder().setMethod(method).build();
     }

-    /** Read options, including a list of selected columns and push-down SQL filter text. */
+    /**
+     * @deprecated Use {@link #withSelectedFields(List)} and {@link #withRowRestriction(String)}
+     *     instead.
+     */
+    @Deprecated
     @Experimental(Experimental.Kind.SOURCE_SINK)
     public TypedRead<T> withReadOptions(TableReadOptions readOptions) {
+      ensureSelectedFieldsAndRowRestrictionNotSet();
       return toBuilder().setReadOptions(readOptions).build();
     }

+    @Experimental(Experimental.Kind.SOURCE_SINK)

Review comment: Please document the new public methods.

Issue Time Tracking
-------------------
    Worklog Id:     (was: 299710)
    Time Spent: 40m  (was: 0.5h)

> Allow specifying BigQuery Storage API readOptions at runtime
> ------------------------------------------------------------
>
>                 Key: BEAM-8023
>                 URL: https://issues.apache.org/jira/browse/BEAM-8023
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-gcp
>            Reporter: Jeff Klukas
>            Assignee: Kenneth Jung
>            Priority: Minor
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> We have support in the Java SDK for using the BigQuery Storage API for reads, but only the target query or table is supported as a ValueProvider to be specified at runtime. AFAICT, there is no reason we can't delay specifying readOptions until runtime as well.
> The readOptions are accessed by BigQueryStorageTableSource in getTargetTable; I believe that's occurring at runtime, but I'd love for someone with deeper BoundedSource knowledge to confirm that.
> I'd advocate for adding new methods `TypedRead.withSelectedFields(ValueProvider<List<String>> value)` and `TypedRead.withRowRestriction(ValueProvider<String> value)`. The existing `withReadOptions` method would then populate the other two as StaticValueProviders. Perhaps we'd want to deprecate `withReadOptions` in favor of specifying individual read options as separate parameters.
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299708&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299708 ]

ASF GitHub Bot logged work on BEAM-8023:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 22/Aug/19 20:13
            Start Date: 22/Aug/19 20:13
    Worklog Time Spent: 10m

Work Description: chamikaramj commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ
URL: https://github.com/apache/beam/pull/9405#discussion_r316865488

## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java

 @@ -1360,12 +1413,39 @@ public TableReference getTable() {
       return toBuilder().setMethod(method).build();
     }

-    /** Read options, including a list of selected columns and push-down SQL filter text. */
+    /**
+     * @deprecated Use {@link #withSelectedFields(List)} and {@link #withRowRestriction(String)}
+     *     instead.
+     */
+    @Deprecated
     @Experimental(Experimental.Kind.SOURCE_SINK)
     public TypedRead<T> withReadOptions(TableReadOptions readOptions) {
+      ensureSelectedFieldsAndRowRestrictionNotSet();
       return toBuilder().setReadOptions(readOptions).build();
     }

+    @Experimental(Experimental.Kind.SOURCE_SINK)

Review comment: Please document the new public fields.

Issue Time Tracking
-------------------
    Worklog Id:     (was: 299708)
    Time Spent: 0.5h  (was: 20m)

> Allow specifying BigQuery Storage API readOptions at runtime
> ------------------------------------------------------------
>
>                 Key: BEAM-8023
>                 URL: https://issues.apache.org/jira/browse/BEAM-8023
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-gcp
>            Reporter: Jeff Klukas
>            Assignee: Kenneth Jung
>            Priority: Minor
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We have support in the Java SDK for using the BigQuery Storage API for reads, but only the target query or table is supported as a ValueProvider to be specified at runtime. AFAICT, there is no reason we can't delay specifying readOptions until runtime as well.
> The readOptions are accessed by BigQueryStorageTableSource in getTargetTable; I believe that's occurring at runtime, but I'd love for someone with deeper BoundedSource knowledge to confirm that.
> I'd advocate for adding new methods `TypedRead.withSelectedFields(ValueProvider<List<String>> value)` and `TypedRead.withRowRestriction(ValueProvider<String> value)`. The existing `withReadOptions` method would then populate the other two as StaticValueProviders. Perhaps we'd want to deprecate `withReadOptions` in favor of specifying individual read options as separate parameters.