[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=300067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-300067 ]

ASF GitHub Bot logged work on BEAM-7013:
Author: ASF GitHub Bot
Created on: 23/Aug/19 06:09
Start Date: 23/Aug/19 06:09
Worklog Time Spent: 10m

Work Description: robinyqiu commented on issue #9144: [BEAM-7013] Integrating ZetaSketch's HLL++ algorithm with Beam
URL: https://github.com/apache/beam/pull/9144#issuecomment-524185713

    Run Java PostCommit

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 300067)
Time Spent: 27h 10m (was: 27h)

> A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
> Issue Type: New Feature
> Components: extensions-java-sketching, sdk-java-core
> Reporter: Yueyang Qiu
> Assignee: Yueyang Qiu
> Priority: Major
> Fix For: 2.16.0
> Time Spent: 27h 10m
> Remaining Estimate: 0h

--
This message was sent by Atlassian Jira (v8.3.2#803003)
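For context on the feature being tracked above: HyperLogLog-family sketches estimate the number of distinct elements in fixed memory. The toy Python estimator below illustrates the core idea only; it is not Beam's HllCount transform nor the BigQuery-compatible ZetaSketch HLL++ implementation the issue integrates, and the class and parameter names are invented for illustration.

```python
import hashlib
import math

class HyperLogLog:
    """Toy HyperLogLog estimator (illustrative only)."""

    def __init__(self, p=14):
        self.p = p                  # use 2**p registers
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant (valid for m >= 128).
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value):
        # 64-bit hash of the element.
        h = int.from_bytes(hashlib.sha1(str(value).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                 # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        z = sum(2.0 ** -r for r in self.registers)
        e = self.alpha * self.m * self.m / z
        # Small-range correction: fall back to linear counting.
        zeros = self.registers.count(0)
        if e <= 2.5 * self.m and zeros:
            e = self.m * math.log(self.m / zeros)
        return int(e)
```

With p=14 (16384 registers, a few KB of state), estimates for tens of thousands of distinct items typically land within a couple of percent of the true count, which is why such sketches can replace exact `Count.perKey`-style aggregations at scale.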
[jira] [Commented] (BEAM-6114) SQL join selection should be done in planner, not in expansion to PTransform
[ https://issues.apache.org/jira/browse/BEAM-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913924#comment-16913924 ]

Rui Wang commented on BEAM-6114:

[~rahul8383] Regarding:

// Should we throw Exception when joinType is LEFT (or) RIGHT (or) FULL?

My perspective: for the sake of simplicity, we could allow only a single triggering (as CoGBK does). That way we can still allow LEFT/RIGHT/FULL OUTER joins. The problem with multiple triggerings is how to refine already-emitted data: an outer join could emit results at the first triggering and then have to emit again later to refine them, which would require retractions.

It also sounds like a good idea to split the javadoc of BeamJoinRel. Thanks for bringing it up.

> SQL join selection should be done in planner, not in expansion to PTransform
>
> Key: BEAM-6114
> URL: https://issues.apache.org/jira/browse/BEAM-6114
> Project: Beam
> Issue Type: Improvement
> Components: dsl-sql
> Reporter: Kenneth Knowles
> Assignee: Rahul Patwari
> Priority: Major
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> Currently Beam SQL joins all go through a single physical operator which has a single PTransform that does all join algorithms based on properties of its input PCollections as well as the relational algebra.
> A first step is to make the needed information part of the relational algebra, so it can choose a PTransform based on that, and the PTransforms can be simpler.
> Second step is to have separate (physical) relational operators for different join algorithms.
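The retraction problem described in the comment above can be made concrete with a toy, non-Beam simulation. Once a LEFT OUTER join fires before the right side has arrived, it emits a null-padded row; when the match arrives, a later firing must retract that row before emitting the refined one. Everything below (single key, event list, `+`/`-` markers) is an illustrative assumption, not Beam's join implementation.

```python
# Toy event-at-a-time LEFT OUTER join over a single join key, showing
# why firing more than once forces retractions ("-" rows).

def left_outer_join_stream(events):
    """events: (side, value) pairs, side in {'left', 'right', 'fire'}.
    Yields ('+', row) for new results and ('-', row) for retractions."""
    lefts, rights, emitted = [], [], set()
    for side, value in events:
        if side == "fire":
            # Current correct result: matches, or null-padded lefts.
            rows = ({(l, r) for l in lefts for r in rights}
                    or {(l, None) for l in lefts})
            for row in rows - emitted:
                yield ("+", row)      # newly produced result
            for row in emitted - rows:
                yield ("-", row)      # stale early result must be retracted
            emitted = rows
        elif side == "left":
            lefts.append(value)
        else:
            rights.append(value)

out = list(left_outer_join_stream([
    ("left", "a"), ("fire", None),    # right side not seen yet
    ("right", "b"), ("fire", None),   # refined result arrives
]))
```

The first firing emits `('a', None)`; the second must retract it before the pipeline's downstream state is correct, which is exactly the capability (retractions) Beam lacked at the time. Allowing only a single triggering sidesteps the problem entirely.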
[jira] [Commented] (BEAM-6114) SQL join selection should be done in planner, not in expansion to PTransform
[ https://issues.apache.org/jira/browse/BEAM-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913871#comment-16913871 ]

Rahul Patwari commented on BEAM-6114:

Hi [~amaliujia]

What are your thoughts about https://github.com/apache/beam/blob/cacb9310b0223683ae6bea0637d2e0077ebee1de/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSideInputLookupJoinRel.java#L52 ?

I am planning to move the Javadoc in https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamJoinRel.java to the respective JoinRels.
[jira] [Closed] (BEAM-8038) Python Precommit fail: 'BeamFnExternalWorkerPoolServicer' has no attribute '_worker_processes'
[ https://issues.apache.org/jira/browse/BEAM-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Weise closed BEAM-8038.
Fix Version/s: Not applicable
Resolution: Fixed

> Python Precommit fail: 'BeamFnExternalWorkerPoolServicer' has no attribute '_worker_processes'
>
> Key: BEAM-8038
> URL: https://issues.apache.org/jira/browse/BEAM-8038
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-harness, test-failures
> Reporter: Ahmet Altay
> Assignee: Thomas Weise
> Priority: Critical
> Fix For: Not applicable
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Logs: https://builds.apache.org/job/beam_PreCommit_Python_Commit/8246/console
>
> 10:14:09 XML: /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/nosetests.xml
> 10:14:09 Ran 2594 tests in 629.438s
> 10:14:09 OK (SKIP=520)
> 10:14:09 Error in atexit._run_exitfuncs:
> 10:14:09 Traceback (most recent call last):
> 10:14:09   File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
> 10:14:09     func(*targs, **kargs)
> 10:14:09   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/worker_pool_main.py", line 72, in kill_worker_processes
> 10:14:09     for worker_process in cls._worker_processes.values():
> 10:14:09 AttributeError: type object 'BeamFnExternalWorkerPoolServicer' has no attribute '_worker_processes'
> 10:14:09 Error in sys.exitfunc:
> 10:14:09 Traceback (most recent call last):
> 10:14:09   File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
> 10:14:09     func(*targs, **kargs)
> 10:14:09   File "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/worker_pool_main.py", line 72, in kill_worker_processes
> 10:14:09     for worker_process in cls._worker_processes.values():
> 10:14:09 AttributeError: type object 'BeamFnExternalWorkerPoolServicer' has no attribute '_worker_processes'
> 10:14:10 py27-cython run-test-post: commands[0] | /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/scripts/run_tox_cleanup.sh
> 10:14:10 ___ summary
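The traceback above is an atexit hook iterating over a class attribute that was never initialized in that process. One defensive pattern for this shape of bug is to tolerate the attribute being absent or unset in the cleanup hook. The sketch below is illustrative only; the class and method names mirror the traceback but this is not the actual Beam fix.

```python
import atexit

class WorkerPoolServicer:
    """Illustrative stand-in for BeamFnExternalWorkerPoolServicer."""

    # May never be populated if no worker pool was started in this process.
    _worker_processes = None

    @classmethod
    def start_worker(cls, worker_id, process):
        if cls._worker_processes is None:
            cls._worker_processes = {}
        cls._worker_processes[worker_id] = process

    @classmethod
    def kill_worker_processes(cls):
        # getattr with a default tolerates the attribute being absent or
        # None, which is exactly what the traceback above tripped over.
        for process in (getattr(cls, "_worker_processes", None) or {}).values():
            process.terminate()

# Registering the guarded hook is now safe even if no worker ever starts.
atexit.register(WorkerPoolServicer.kill_worker_processes)
```

Calling `kill_worker_processes()` with nothing registered is a no-op instead of an AttributeError at interpreter shutdown.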
[jira] [Work logged] (BEAM-8079) Move verify_release_build.sh to Jenkins job
[ https://issues.apache.org/jira/browse/BEAM-8079?focusedWorklogId=299870&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299870 ]

ASF GitHub Bot logged work on BEAM-8079:
Author: ASF GitHub Bot
Created on: 23/Aug/19 02:19
Start Date: 23/Aug/19 02:19
Worklog Time Spent: 10m

Work Description: markflyhigh commented on pull request #9411: [BEAM-8079] Move release Gradle build to a Jenkins job (Part - 1)
URL: https://github.com/apache/beam/pull/9411

Reusing an existing Jenkins machine to verify the release Gradle build gets rid of the painful environment setup in `verify_release_build.sh`. Making it a Jenkins job also removes the platform restriction: the original environment setup was specific to Linux-like systems.

+R: @yifanzou

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

- [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
- [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
[jira] [Work logged] (BEAM-7616) urlopen calls could get stuck without a timeout
[ https://issues.apache.org/jira/browse/BEAM-7616?focusedWorklogId=299867&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299867 ]

ASF GitHub Bot logged work on BEAM-7616:
Author: ASF GitHub Bot
Created on: 23/Aug/19 02:11
Start Date: 23/Aug/19 02:11
Worklog Time Spent: 10m

Work Description: aaltay commented on issue #9401: [BEAM-7616] apitools use urllib with the global timeout. Set it to 60 seconds to prevent network related stuckness issues.
URL: https://github.com/apache/beam/pull/9401#issuecomment-524144582

    Run Python 2 PostCommit

Issue Time Tracking
-------------------
Worklog Id: (was: 299867)
Time Spent: 3h 40m (was: 3.5h)

> urlopen calls could get stuck without a timeout
>
> Key: BEAM-7616
> URL: https://issues.apache.org/jira/browse/BEAM-7616
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Ahmet Altay
> Assignee: Udi Meiri
> Priority: Blocker
> Fix For: 2.14.0, 2.16.0
> Time Spent: 3h 40m
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-7616) urlopen calls could get stuck without a timeout
[ https://issues.apache.org/jira/browse/BEAM-7616?focusedWorklogId=299866&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299866 ]

ASF GitHub Bot logged work on BEAM-7616:
Author: ASF GitHub Bot
Created on: 23/Aug/19 02:10
Start Date: 23/Aug/19 02:10
Worklog Time Spent: 10m

Work Description: aaltay commented on issue #9401: [BEAM-7616] apitools use urllib with the global timeout. Set it to 60 seconds to prevent network related stuckness issues.
URL: https://github.com/apache/beam/pull/9401#issuecomment-524144776

    Run Python 2 PostCommit

Worklog Id: (was: 299866)
Time Spent: 3.5h (was: 3h 20m)
[jira] [Work logged] (BEAM-7616) urlopen calls could get stuck without a timeout
[ https://issues.apache.org/jira/browse/BEAM-7616?focusedWorklogId=299865&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299865 ]

ASF GitHub Bot logged work on BEAM-7616:
Author: ASF GitHub Bot
Created on: 23/Aug/19 02:10
Start Date: 23/Aug/19 02:10
Worklog Time Spent: 10m

Work Description: aaltay commented on issue #9401: [BEAM-7616] apitools use urllib with the global timeout. Set it to 60 seconds to prevent network related stuckness issues.
URL: https://github.com/apache/beam/pull/9401#issuecomment-524144582

    Run Python 2 PostCommit

Worklog Id: (was: 299865)
Time Spent: 3h 20m (was: 3h 10m)
[jira] [Work logged] (BEAM-7616) urlopen calls could get stuck without a timeout
[ https://issues.apache.org/jira/browse/BEAM-7616?focusedWorklogId=299864&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299864 ]

ASF GitHub Bot logged work on BEAM-7616:
Author: ASF GitHub Bot
Created on: 23/Aug/19 02:09
Start Date: 23/Aug/19 02:09
Worklog Time Spent: 10m

Work Description: aaltay commented on issue #9401: [BEAM-7616] apitools use urllib with the global timeout. Set it to 60 seconds to prevent network related stuckness issues.
URL: https://github.com/apache/beam/pull/9401#issuecomment-524144582

    Run Python PostCommit

Worklog Id: (was: 299864)
Time Spent: 3h 10m (was: 3h)
[jira] [Work logged] (BEAM-7616) urlopen calls could get stuck without a timeout
[ https://issues.apache.org/jira/browse/BEAM-7616?focusedWorklogId=299863&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299863 ]

ASF GitHub Bot logged work on BEAM-7616:
Author: ASF GitHub Bot
Created on: 23/Aug/19 02:09
Start Date: 23/Aug/19 02:09
Worklog Time Spent: 10m

Work Description: aaltay commented on issue #9401: [BEAM-7616] apitools use urllib with the global timeout. Set it to 60 seconds to prevent network related stuckness issues.
URL: https://github.com/apache/beam/pull/9401#issuecomment-524144557

    Run Python PreCommit

Worklog Id: (was: 299863)
Time Spent: 3h (was: 2h 50m)
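The fix referenced in the PR title above relies on Python's process-wide default socket timeout, which sockets created without an explicit timeout (including those opened by `urlopen` deep inside a library such as apitools) pick up automatically. A minimal sketch of that approach follows; 60 seconds is the value mentioned in the PR, the `fetch` helper is a hypothetical illustration, and the Python 3 `urllib.request` module stands in for the Python 2 `urllib`/`urllib2` the Beam SDK used at the time.

```python
import socket

# Any subsequently created socket that has no explicit timeout will now
# raise socket.timeout after 60 seconds instead of blocking forever.
socket.setdefaulttimeout(60)

import urllib.request

def fetch(url):
    # No timeout argument: the 60-second global default applies, so a
    # stuck server can no longer hang the process indefinitely.
    with urllib.request.urlopen(url) as resp:
        return resp.read()
```

The trade-off of a global default is that it affects every socket in the process, not just the library's; passing an explicit `timeout=` per call is more surgical when the call sites are under your control.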
[jira] [Work logged] (BEAM-7864) Portable spark Reshuffle coder cast exception
[ https://issues.apache.org/jira/browse/BEAM-7864?focusedWorklogId=299849&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299849 ]

ASF GitHub Bot logged work on BEAM-7864:
Author: ASF GitHub Bot
Created on: 23/Aug/19 01:38
Start Date: 23/Aug/19 01:38
Worklog Time Spent: 10m

Work Description: ibzib commented on issue #9410: [BEAM-7864] fix Spark reshuffle translation with Python SDK
URL: https://github.com/apache/beam/pull/9410#issuecomment-524138861

    Run Java Spark PortableValidatesRunner Batch

Worklog Id: (was: 299849)
Time Spent: 0.5h (was: 20m)

> Portable spark Reshuffle coder cast exception
>
> Key: BEAM-7864
> URL: https://issues.apache.org/jira/browse/BEAM-7864
> Project: Beam
> Issue Type: Bug
> Components: runner-spark
> Reporter: Kyle Weaver
> Assignee: Kyle Weaver
> Priority: Major
> Labels: portability-spark
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Running :sdks:python:test-suites:portable:py35:portableWordCountBatch in either loopback or docker mode on master fails with exception:
>
> java.lang.ClassCastException: org.apache.beam.sdk.coders.LengthPrefixCoder cannot be cast to org.apache.beam.sdk.coders.KvCoder
>   at org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translateReshuffle(SparkBatchPortablePipelineTranslator.java:400)
>   at org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translate(SparkBatchPortablePipelineTranslator.java:147)
>   at org.apache.beam.runners.spark.SparkPipelineRunner.lambda$run$1(SparkPipelineRunner.java:96)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
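The ClassCastException above comes from assuming an element coder is a KV coder when, for cross-language pipelines, it can arrive wrapped (here, in a length-prefix coder). A general defensive pattern is to unwrap known wrappers and check the type before committing to the cast. The sketch below uses hypothetical Python classes standing in for Beam's Java coders; it illustrates the pattern, not the actual fix merged in PR #9410.

```python
# Hypothetical coder hierarchy mirroring the Java names in the stack trace.

class Coder:
    pass

class KvCoder(Coder):
    def __init__(self, key_coder, value_coder):
        self.key_coder = key_coder
        self.value_coder = value_coder

class LengthPrefixCoder(Coder):
    def __init__(self, inner):
        self.inner = inner

def as_kv_coder(coder):
    """Unwrap length-prefix wrappers, then verify the type, instead of
    blindly casting as the failing translation did."""
    while isinstance(coder, LengthPrefixCoder):
        coder = coder.inner
    if not isinstance(coder, KvCoder):
        raise TypeError(f"expected a KV coder, got {type(coder).__name__}")
    return coder
```

Raising a descriptive TypeError at translation time is still a failure, but a diagnosable one; the wrapped-coder unwrap handles the case the Python SDK actually produces.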
[jira] [Work logged] (BEAM-7864) Portable spark Reshuffle coder cast exception
[ https://issues.apache.org/jira/browse/BEAM-7864?focusedWorklogId=299846&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299846 ]

ASF GitHub Bot logged work on BEAM-7864:
Author: ASF GitHub Bot
Created on: 23/Aug/19 01:30
Start Date: 23/Aug/19 01:30
Worklog Time Spent: 10m

Work Description: ibzib commented on pull request #9410: [BEAM-7864] fix Spark reshuffle translation with Python SDK
URL: https://github.com/apache/beam/pull/9410

The previous implementation of reshuffle on the portable Spark runner made assumptions about its inputs that proved false when running some Python pipelines. The new translation is more general, which fixes that.
[jira] [Work logged] (BEAM-7864) Portable spark Reshuffle coder cast exception
[ https://issues.apache.org/jira/browse/BEAM-7864?focusedWorklogId=299847&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299847 ]

ASF GitHub Bot logged work on BEAM-7864:
Author: ASF GitHub Bot
Created on: 23/Aug/19 01:30
Start Date: 23/Aug/19 01:30
Worklog Time Spent: 10m

Work Description: ibzib commented on issue #9410: [BEAM-7864] fix Spark reshuffle translation with Python SDK
URL: https://github.com/apache/beam/pull/9410#issuecomment-524137681

    Run Java Spark PortableValidatesRunner Batch

Worklog Id: (was: 299847)
Time Spent: 20m (was: 10m)
[jira] [Created] (BEAM-8079) Move verify_release_build.sh to Jenkins job
Mark Liu created BEAM-8079:

Summary: Move verify_release_build.sh to Jenkins job
Key: BEAM-8079
URL: https://issues.apache.org/jira/browse/BEAM-8079
Project: Beam
Issue Type: Sub-task
Components: build-system
Reporter: Mark Liu
Assignee: Mark Liu

verify_release_build.sh is used for validation after the release branch is cut. Basically it does two things:
1. Verify the Gradle build with -PisRelease turned on.
2. Create a PR and run all PostCommit jobs against the release branch.

However, release managers have hit many pain points when running this script:
1. The extensive environment setup and tooling installation easily break the script.
2. Running the Gradle build locally takes an extremely long time.
3. Auto PR creation (using hub) doesn't work.

We can move the Gradle build to Jenkins in order to get rid of the environment setup work.
[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python
[ https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=299826&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299826 ]

ASF GitHub Bot logged work on BEAM-7886:
Author: ASF GitHub Bot
Created on: 23/Aug/19 00:13
Start Date: 23/Aug/19 00:13
Worklog Time Spent: 10m

Work Description: reuvenlax commented on issue #9188: [BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-524124310

    Trying to think of a better name than PortableSchemaCoder, but I guess this is fine for now.

Worklog Id: (was: 299826)
Time Spent: 8.5h (was: 8h 20m)

> Make row coder a standard coder and implement in python
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
> Issue Type: Improvement
> Components: beam-model, sdk-java-core, sdk-py-core
> Reporter: Brian Hulette
> Assignee: Brian Hulette
> Priority: Major
> Time Spent: 8.5h
> Remaining Estimate: 0h
[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=299822&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299822 ]

ASF GitHub Bot logged work on BEAM-7013:
Author: ASF GitHub Bot
Created on: 22/Aug/19 23:33
Start Date: 22/Aug/19 23:33
Worklog Time Spent: 10m

Work Description: robinyqiu commented on issue #9144: [BEAM-7013] Integrating ZetaSketch's HLL++ algorithm with Beam
URL: https://github.com/apache/beam/pull/9144#issuecomment-524116749

    Run Java PostCommit

Worklog Id: (was: 299822)
Time Spent: 27h (was: 26h 50m)
[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=299820&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299820 ] ASF GitHub Bot logged work on BEAM-7013: Author: ASF GitHub Bot Created on: 22/Aug/19 23:28 Start Date: 22/Aug/19 23:28 Worklog Time Spent: 10m Work Description: robinyqiu commented on issue #9144: [BEAM-7013] Integrating ZetaSketch's HLL++ algorithm with Beam URL: https://github.com/apache/beam/pull/9144#issuecomment-524115723 I have made the change such that the BQ tables needed for testing are now created before the tests and deleted after them. PTAL. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299820) Time Spent: 26h 50m (was: 26h 40m) > A new count distinct transform based on BigQuery compatible HyperLogLog++ > implementation > > > Key: BEAM-7013 > URL: https://issues.apache.org/jira/browse/BEAM-7013 > Project: Beam > Issue Type: New Feature > Components: extensions-java-sketching, sdk-java-core >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Fix For: 2.16.0 > > Time Spent: 26h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=299819&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299819 ] ASF GitHub Bot logged work on BEAM-7013: Author: ASF GitHub Bot Created on: 22/Aug/19 23:27 Start Date: 22/Aug/19 23:27 Worklog Time Spent: 10m Work Description: robinyqiu commented on pull request #9144: [BEAM-7013] Integrating ZetaSketch's HLL++ algorithm with Beam URL: https://github.com/apache/beam/pull/9144#discussion_r316924297 ## File path: sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java ## @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.zetasketch; + +import com.google.api.services.bigquery.model.TableFieldSchema; +import com.google.api.services.bigquery.model.TableRow; +import com.google.api.services.bigquery.model.TableSchema; +import java.nio.ByteBuffer; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import org.apache.beam.sdk.Pipeline; +import org.apache.beam.sdk.coders.ByteArrayCoder; +import org.apache.beam.sdk.extensions.gcp.options.GcpOptions; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO; +import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.TypedRead.Method; +import org.apache.beam.sdk.io.gcp.bigquery.SchemaAndRecord; +import org.apache.beam.sdk.io.gcp.testing.BigqueryMatcher; +import org.apache.beam.sdk.options.ApplicationNameOptions; +import org.apache.beam.sdk.testing.PAssert; +import org.apache.beam.sdk.testing.TestPipeline; +import org.apache.beam.sdk.testing.TestPipelineOptions; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.values.PCollection; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** + * Integration tests for HLL++ sketch compatibility between Beam and BigQuery. The tests verify + * that HLL++ sketches created in Beam can be processed by BigQuery, and vice versa. + */ +@RunWith(JUnit4.class) +public class BigQueryHllSketchCompatibilityIT { + + private static final String DATASET_NAME = "zetasketch_compatibility_test"; + + // Table for testReadSketchFromBigQuery() + // Schema: only one STRING field named "data". 
+ // Content: prepopulated with 4 rows: "Apple", "Orange", "Banana", "Orange" + private static final String DATA_TABLE_NAME = "hll_data"; + private static final String DATA_FIELD_NAME = "data"; + private static final String QUERY_RESULT_FIELD_NAME = "sketch"; + private static final Long EXPECTED_COUNT = 3L; + + // Table for testWriteSketchToBigQuery() + // Schema: only one BYTES field named "sketch". + // Content: will be overridden by the sketch computed by the test pipeline each time the test runs + private static final String SKETCH_TABLE_NAME = "hll_sketch"; + private static final String SKETCH_FIELD_NAME = "sketch"; + private static final List<String> TEST_DATA = + Arrays.asList("Apple", "Orange", "Banana", "Orange"); + // SHA-1 hash of string "[3]", the string representation of a row that has only one field 3 in it + private static final String EXPECTED_CHECKSUM = "f1e31df9806ce94c5bdbbfff9608324930f4d3f1"; + + /** + * Tests that an HLL++ sketch computed in BigQuery can be processed by Beam. The HLL sketch is computed by + * {@code HLL_COUNT.INIT} in BigQuery and read into Beam; the test verifies that we can run {@link + * HllCount.MergePartial} and {@link HllCount.Extract} on the sketch in Beam to get the correct + * estimated count. + */ + @Test + public void testReadSketchFromBigQuery() { Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking ---
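The compatibility test quoted above exercises the two properties that make HLL++ useful in Beam: a sketch built over data yields an estimated distinct count, and partial sketches can be merged (`HllCount.MergePartial`) without changing the result. A minimal, self-contained HyperLogLog in Java illustrates both properties. This is an illustration only, not ZetaSketch's HLL++ (no sparse representation, no bias-correction tables, no BigQuery-compatible serialization), and the splitmix64-style hash finalizer is an arbitrary choice:

```java
// Minimal HyperLogLog for illustration only: 2^P registers, a splitmix64-style
// hash finalizer, and the standard small-range (linear counting) correction.
// NOT ZetaSketch's HLL++; its sketches are not byte-compatible with BigQuery's.
class MiniHll {
    static final int P = 10;                 // precision: 2^10 = 1024 registers
    static final int M = 1 << P;
    final byte[] registers = new byte[M];

    static long hash(String s) {             // splitmix64 finalizer over hashCode()
        long z = s.hashCode() * 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    void add(String item) {
        long h = hash(item);
        int idx = (int) (h >>> (64 - P));    // top P bits select a register
        byte rank = (byte) (Long.numberOfLeadingZeros(h << P) + 1);
        if (rank > registers[idx]) registers[idx] = rank;
    }

    void merge(MiniHll other) {              // MergePartial analogue: max per register
        for (int i = 0; i < M; i++) {
            registers[i] = (byte) Math.max(registers[i], other.registers[i]);
        }
    }

    double estimate() {                      // Extract analogue
        double sum = 0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2, -r);
            if (r == 0) zeros++;
        }
        double e = (0.7213 / (1 + 1.079 / M)) * M * M / sum;
        if (e <= 2.5 * M && zeros > 0) {     // small cardinalities: linear counting
            e = M * Math.log((double) M / zeros);
        }
        return e;
    }

    public static void main(String[] args) {
        MiniHll whole = new MiniHll();
        MiniHll left = new MiniHll();
        MiniHll right = new MiniHll();
        for (int i = 0; i < 1000; i++) {
            String item = "item" + i;
            whole.add(item);
            (i < 500 ? left : right).add(item);
        }
        left.merge(right);                   // merging halves reproduces the full sketch
        System.out.println("whole estimate:  " + Math.round(whole.estimate()));
        System.out.println("merged estimate: " + Math.round(left.estimate()));
    }
}
```

Because merging takes a per-register maximum, the merged sketch over two halves of the data has exactly the same registers as one sketch over all of it, which is why `MergePartial` followed by `Extract` gives the same count the test expects.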
[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=299818&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299818 ] ASF GitHub Bot logged work on BEAM-7013: Author: ASF GitHub Bot Created on: 22/Aug/19 23:27 Start Date: 22/Aug/19 23:27 Worklog Time Spent: 10m Work Description: robinyqiu commented on pull request #9144: [BEAM-7013] Integrating ZetaSketch's HLL++ algorithm with Beam URL: https://github.com/apache/beam/pull/9144#discussion_r316924234 ## File path: sdks/java/extensions/zetasketch/build.gradle ## @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * License); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an AS IS BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +import groovy.json.JsonOutput + +plugins { id 'org.apache.beam.module' } +applyJavaNature() + +description = "Apache Beam :: SDKs :: Java :: Extensions :: ZetaSketch" + +def zetasketch_version = "0.1.0" + +dependencies { +compile library.java.vendored_guava_26_0_jre +compile project(path: ":sdks:java:core", configuration: "shadow") +compile "com.google.zetasketch:zetasketch:$zetasketch_version" +testCompile library.java.junit +testCompile project(":sdks:java:io:google-cloud-platform") +testRuntimeOnly project(":runners:direct-java") +testRuntimeOnly project(":runners:google-cloud-dataflow-java") +} + +/** + * Integration tests running on Dataflow with BigQuery. + */ +task integrationTest(type: Test) { +group = "Verification" +def gcpProject = project.findProperty('gcpProject') ?: 'apache-beam-testing' +def gcpTempRoot = project.findProperty('gcpTempRoot') ?: 'gs://temp-storage-for-end-to-end-tests' +systemProperty "beamTestPipelineOptions", JsonOutput.toJson([ +"--runner=TestDataflowRunner", +"--project=${gcpProject}", +"--tempRoot=${gcpTempRoot}", +]) Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299818) Time Spent: 26.5h (was: 26h 20m) > A new count distinct transform based on BigQuery compatible HyperLogLog++ > implementation > > > Key: BEAM-7013 > URL: https://issues.apache.org/jira/browse/BEAM-7013 > Project: Beam > Issue Type: New Feature > Components: extensions-java-sketching, sdk-java-core >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Fix For: 2.16.0 > > Time Spent: 26.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-7864) Portable spark Reshuffle coder cast exception
[ https://issues.apache.org/jira/browse/BEAM-7864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16913794#comment-16913794 ] Kyle Weaver commented on BEAM-7864: --- The underlying coder to the LengthPrefixCoder is ByteArrayCoder, which is the fallback because we have unknown coder URN "beam:coder:pickled_python:v1". The reshuffle transform is just receiving an array of bytes, which have been presumably pickled somehow. We will need to unpickle them if we want to separate keys and values. I'm not sure if that's possible. > Portable spark Reshuffle coder cast exception > - > > Key: BEAM-7864 > URL: https://issues.apache.org/jira/browse/BEAM-7864 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > > running :sdks:python:test-suites:portable:py35:portableWordCountBatch in > either loopback or docker mode on master fails with exception: > > java.lang.ClassCastException: org.apache.beam.sdk.coders.LengthPrefixCoder > cannot be cast to org.apache.beam.sdk.coders.KvCoder > at > org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translateReshuffle(SparkBatchPortablePipelineTranslator.java:400) > at > org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translate(SparkBatchPortablePipelineTranslator.java:147) > at > org.apache.beam.runners.spark.SparkPipelineRunner.lambda$run$1(SparkPipelineRunner.java:96) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian Jira (v8.3.2#803003)
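The failure above comes from the translator unconditionally casting the input coder to `KvCoder` when the runner has actually substituted a length-prefixed byte fallback for the unknown `beam:coder:pickled_python:v1` URN. The pattern can be sketched with simplified stand-in classes; the names mirror Beam's coder classes, but these are not the real implementations, and `requireKvCoder` is a hypothetical helper:

```java
// Illustration only: simplified stand-ins for Beam's coder classes, showing why
// the translator's unchecked cast fails and what a defensive check looks like.
class CoderCastDemo {
    public static void main(String[] args) {
        // What the Spark translator received for the Python pipeline: an
        // unknown-URN payload wrapped as length-prefixed raw bytes.
        Coder received = new LengthPrefixCoder(new ByteArrayCoder());
        try {
            requireKvCoder(received);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }

    // A blind `(KvCoder) coder` cast throws ClassCastException here; checking
    // first turns it into an actionable error. The keys and values inside the
    // byte payload can only be separated by the SDK harness that pickled them.
    static KvCoder requireKvCoder(Coder coder) {
        if (coder instanceof KvCoder) {
            return (KvCoder) coder;
        }
        throw new IllegalArgumentException(
            "Reshuffle requires a KvCoder but got "
                + coder.getClass().getSimpleName()
                + "; elements are opaque bytes to the runner");
    }
}

interface Coder {}

class ByteArrayCoder implements Coder {}

class LengthPrefixCoder implements Coder {
    final Coder valueCoder;
    LengthPrefixCoder(Coder valueCoder) { this.valueCoder = valueCoder; }
}

class KvCoder implements Coder {
    final Coder keyCoder;
    final Coder valueCoder;
    KvCoder(Coder keyCoder, Coder valueCoder) {
        this.keyCoder = keyCoder;
        this.valueCoder = valueCoder;
    }
}
```

The check does not fix the underlying issue (the runner still cannot split pickled bytes into keys and values), but it replaces the opaque `ClassCastException` with a message that names the actual mismatch.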
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299812&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299812 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 23:21 Start Date: 22/Aug/19 23:21 Worklog Time Spent: 10m Work Description: kmjung commented on issue #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#issuecomment-524114002 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299812) Time Spent: 2h 20m (was: 2h 10m) > Allow specifying BigQuery Storage API readOptions at runtime > > > Key: BEAM-8023 > URL: https://issues.apache.org/jira/browse/BEAM-8023 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Jeff Klukas >Assignee: Kenneth Jung >Priority: Minor > Time Spent: 2h 20m > Remaining Estimate: 0h > > We have support in the Java SDK for using the BigQuery Storage API for reads, > but only the target query or table is supported as a ValueProvider to be > specified at runtime. AFAICT, there is no reason we can't delay specifying > readOptions until runtime as well. > The readOptions are accessed by BigQueryStorageTableSource in getTargetTable; > I believe that's occurring at runtime, but I'd love for someone with deeper > BoundedSource knowledge to confirm that. > I'd advocate for adding new methods > `TypedRead.withSelectedFields(ValueProvider<List<String>> value)` and > `TypedRead.withRowRestriction(ValueProvider<String> value)`. The existing > `withReadOptions` method would then populate the other two as > StaticValueProviders. 
Perhaps we'd want to deprecate `withReadOptions` in favor of specifying individual read options as separate parameters. -- This message was sent by Atlassian Jira (v8.3.2#803003)
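The proposal above — keep the eager `withReadOptions` but have it populate the new deferred fields as StaticValueProviders — can be sketched with simplified stand-in types. These are not Beam's real `ValueProvider` or `TypedRead`; `TypedReadSketch` and its fields are hypothetical, mirroring only the shape of the Jira proposal:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Sketch under simplified stand-in types (NOT Beam's real classes): the eager
// setter wraps its arguments in static providers, while new setters accept
// values that only become readable at pipeline run time.
class ValueProviderSketch {
    interface ValueProvider<T> {
        T get();
        boolean isAccessible();              // known at construction time?
    }

    static class StaticValueProvider<T> implements ValueProvider<T> {
        private final T value;
        StaticValueProvider(T value) { this.value = value; }
        public T get() { return value; }
        public boolean isAccessible() { return true; }
    }

    static class RuntimeValueProvider<T> implements ValueProvider<T> {
        private final Supplier<T> runtimeLookup;   // resolved when the pipeline runs
        RuntimeValueProvider(Supplier<T> runtimeLookup) { this.runtimeLookup = runtimeLookup; }
        public T get() { return runtimeLookup.get(); }
        public boolean isAccessible() { return false; }
    }

    // Hypothetical read transform holding only deferred fields; both entry
    // points converge on them, so the source can read the options at runtime.
    static class TypedReadSketch {
        ValueProvider<List<String>> selectedFields;
        ValueProvider<String> rowRestriction;

        TypedReadSketch withSelectedFields(ValueProvider<List<String>> fields) {
            this.selectedFields = fields;
            return this;
        }

        TypedReadSketch withRowRestriction(ValueProvider<String> restriction) {
            this.rowRestriction = restriction;
            return this;
        }

        // Existing-style eager API: just wraps in StaticValueProviders.
        TypedReadSketch withReadOptions(List<String> fields, String restriction) {
            return withSelectedFields(new StaticValueProvider<>(fields))
                .withRowRestriction(new StaticValueProvider<>(restriction));
        }
    }

    public static void main(String[] args) {
        TypedReadSketch eager =
            new TypedReadSketch().withReadOptions(Arrays.asList("a", "b"), "x > 0");
        TypedReadSketch deferred =
            new TypedReadSketch().withSelectedFields(
                new RuntimeValueProvider<>(() -> Arrays.asList("c")));
        System.out.println("eager accessible at construction: "
            + eager.selectedFields.isAccessible());
        System.out.println("deferred accessible at construction: "
            + deferred.selectedFields.isAccessible());
    }
}
```

Converging both APIs on the same deferred fields is what lets the eager method remain (or be deprecated later) without the source having two code paths.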
[jira] [Resolved] (BEAM-8036) [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method
[ https://issues.apache.org/jira/browse/BEAM-8036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang resolved BEAM-8036. Fix Version/s: Not applicable Resolution: Fixed > [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method > > > Key: BEAM-8036 > URL: https://issues.apache.org/jira/browse/BEAM-8036 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: Rui Wang >Priority: Major > Labels: currently-failing > Fix For: Not applicable > > Time Spent: 2h > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|[https://builds.apache.org/job/beam_PostCommit_SQL/2417/console]] > * [Gradle Build Scan|TODO] > * [Test source code|TODO] > Initial investigation: > *09:03:27* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > *09:03:27* *09:03:27* > org.apache.beam.sdk.extensions.sql.meta.provider.datacatalog.DataCatalogBigQueryIT > > testReadWrite FAILED*09:03:27* java.lang.NoSuchMethodError at > DataCatalogBigQueryIT.java:69*09:03:27* *09:03:27* 1 test completed, 1 > failed*09:03:28* *09:03:28* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > FAILED*09:03:28* *09:03:28* FAILURE: Build failed with an exception. > > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8036) [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method
[ https://issues.apache.org/jira/browse/BEAM-8036?focusedWorklogId=299806&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299806 ] ASF GitHub Bot logged work on BEAM-8036: Author: ASF GitHub Bot Created on: 22/Aug/19 23:04 Start Date: 22/Aug/19 23:04 Worklog Time Spent: 10m Work Description: Ardagan commented on pull request #9409: [BEAM-8036] fix failed postcommit URL: https://github.com/apache/beam/pull/9409 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299806) Time Spent: 2h (was: 1h 50m) > [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method > > > Key: BEAM-8036 > URL: https://issues.apache.org/jira/browse/BEAM-8036 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: Rui Wang >Priority: Major > Labels: currently-failing > Time Spent: 2h > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|[https://builds.apache.org/job/beam_PostCommit_SQL/2417/console]] > * [Gradle Build Scan|TODO] > * [Test source code|TODO] > Initial investigation: > *09:03:27* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > *09:03:27* *09:03:27* > org.apache.beam.sdk.extensions.sql.meta.provider.datacatalog.DataCatalogBigQueryIT > > testReadWrite FAILED*09:03:27* java.lang.NoSuchMethodError at > DataCatalogBigQueryIT.java:69*09:03:27* *09:03:27* 1 test completed, 1 > failed*09:03:28* *09:03:28* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > FAILED*09:03:28* *09:03:28* FAILURE: Build failed with an exception. 
> > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8078) streaming_wordcount_debugging.py is missing a test
Udi Meiri created BEAM-8078: --- Summary: streaming_wordcount_debugging.py is missing a test Key: BEAM-8078 URL: https://issues.apache.org/jira/browse/BEAM-8078 Project: Beam Issue Type: Improvement Components: sdk-py-core Reporter: Udi Meiri It's example code and should have a basic_test (like the other wordcount variants in [1]) to at least verify that it runs in the latest Beam release. [1] https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8036) [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method
[ https://issues.apache.org/jira/browse/BEAM-8036?focusedWorklogId=299802&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299802 ] ASF GitHub Bot logged work on BEAM-8036: Author: ASF GitHub Bot Created on: 22/Aug/19 22:47 Start Date: 22/Aug/19 22:47 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #9409: [BEAM-8036] fix failed postcommit URL: https://github.com/apache/beam/pull/9409#issuecomment-524106188 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299802) Time Spent: 1h 50m (was: 1h 40m) > [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method > > > Key: BEAM-8036 > URL: https://issues.apache.org/jira/browse/BEAM-8036 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: Rui Wang >Priority: Major > Labels: currently-failing > Time Spent: 1h 50m > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|[https://builds.apache.org/job/beam_PostCommit_SQL/2417/console]] > * [Gradle Build Scan|TODO] > * [Test source code|TODO] > Initial investigation: > *09:03:27* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > *09:03:27* *09:03:27* > org.apache.beam.sdk.extensions.sql.meta.provider.datacatalog.DataCatalogBigQueryIT > > testReadWrite FAILED*09:03:27* java.lang.NoSuchMethodError at > DataCatalogBigQueryIT.java:69*09:03:27* *09:03:27* 1 test completed, 1 > failed*09:03:28* *09:03:28* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > FAILED*09:03:28* *09:03:28* FAILURE: Build failed with an exception. 
> > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8036) [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method
[ https://issues.apache.org/jira/browse/BEAM-8036?focusedWorklogId=299795&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299795 ] ASF GitHub Bot logged work on BEAM-8036: Author: ASF GitHub Bot Created on: 22/Aug/19 22:26 Start Date: 22/Aug/19 22:26 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #9409: [BEAM-8036] fix failed postcommit URL: https://github.com/apache/beam/pull/9409#issuecomment-524101331 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299795) Time Spent: 1h 40m (was: 1.5h) > [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method > > > Key: BEAM-8036 > URL: https://issues.apache.org/jira/browse/BEAM-8036 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: Rui Wang >Priority: Major > Labels: currently-failing > Time Spent: 1h 40m > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|[https://builds.apache.org/job/beam_PostCommit_SQL/2417/console]] > * [Gradle Build Scan|TODO] > * [Test source code|TODO] > Initial investigation: > *09:03:27* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > *09:03:27* *09:03:27* > org.apache.beam.sdk.extensions.sql.meta.provider.datacatalog.DataCatalogBigQueryIT > > testReadWrite FAILED*09:03:27* java.lang.NoSuchMethodError at > DataCatalogBigQueryIT.java:69*09:03:27* *09:03:27* 1 test completed, 1 > failed*09:03:28* *09:03:28* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > FAILED*09:03:28* *09:03:28* FAILURE: Build failed with an exception. 
> > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (BEAM-7993) portable python precommit is flaky
[ https://issues.apache.org/jira/browse/BEAM-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pablo Estrada updated BEAM-7993: Fix Version/s: (was: 2.15.0) 2.16.0 > portable python precommit is flaky > -- > > Key: BEAM-7993 > URL: https://issues.apache.org/jira/browse/BEAM-7993 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures, testing >Affects Versions: 2.15.0 >Reporter: Udi Meiri >Assignee: Kyle Weaver >Priority: Major > Labels: currently-failing > Fix For: 2.16.0 > > Time Spent: 40m > Remaining Estimate: 0h > > I'm not sure what the root cause is here. > Example log where > :sdks:python:test-suites:portable:py35:portableWordCountBatch failed: > {code} > 11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap > (FlatMap at ExtractOutput[0]) (2/2)] ERROR > org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN > MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at > ExtractOutput[0]) (2/2) > 11:51:22 [CHAIN MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap > (FlatMap at ExtractOutput[0]) (1/2)] ERROR > org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN > MapPartition (MapPartition at [1]read/Read/Split) -> FlatMap (FlatMap at > ExtractOutput[0]) (1/2) > 11:51:22 [CHAIN MapPartition (MapPartition at > [2]write/Write/WriteImpl/DoOnce/{FlatMap(), > Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2)] ERROR > org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN > MapPartition (MapPartition at > [2]write/Write/WriteImpl/DoOnce/{FlatMap(), > Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (2/2) > 11:51:22 [CHAIN MapPartition (MapPartition at > [2]write/Write/WriteImpl/DoOnce/{FlatMap(), > Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2)] ERROR > org.apache.flink.runtime.operators.BatchTask - Error in task code: CHAIN > MapPartition (MapPartition at > 
[2]write/Write/WriteImpl/DoOnce/{FlatMap(), > Map(decode)}) -> FlatMap (FlatMap at ExtractOutput[0]) (1/2) > 11:51:22 java.lang.Exception: The user defined 'open()' method caused an > exception: java.io.IOException: Received exit code 1 for command 'docker > inspect -f {{.State.Running}} > 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. stderr: > Error: No such object: > 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1 > 11:51:22 at > org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:498) > 11:51:22 at > org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368) > 11:51:22 at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712) > 11:51:22 at java.lang.Thread.run(Thread.java:748) > 11:51:22 Caused by: > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: > java.io.IOException: Received exit code 1 for command 'docker inspect -f > {{.State.Running}} > 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1'. 
stderr: > Error: No such object: > 642c312c335d3881b885873c66917b536e79cff07503fdceaddee5fbeb10bfd1 > 11:51:22 at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4966) > 11:51:22 at > org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.(DefaultJobBundleFactory.java:211) > 11:51:22 at > org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory$SimpleStageBundleFactory.(DefaultJobBundleFactory.java:202) > 11:51:22 at > org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactory.forStage(DefaultJobBundleFactory.java:185) > 11:51:22 at > org.apache.beam.runners.flink.translation.functions.FlinkDefaultExecutableStageContext.getStageBundleFactory(FlinkDefaultExecutableStageContext.java:49) > 11:51:22 at > org.apache.beam.runners.flink.translation.functions.ReferenceCountingFlinkExecutableStageContextFactory$WrappedContext.getStageBundleFactory(ReferenceCountingFlinkExecutableStageContextFactory.java:203) > 11:51:22 at > org.apache.beam.runners.flink.translation.functions.FlinkExecutableStageFunction.open(FlinkExecutableStageFunction.java:129) > 11:51:22 at > org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36) > 11:51:22 at > org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:494) > 11:51:22 ... 3 more > {code} > https://builds.apache.org/job/beam_PreCommit_Portable_Python_Commit/5512/consoleFull -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8036) [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method
[ https://issues.apache.org/jira/browse/BEAM-8036?focusedWorklogId=299788&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299788 ] ASF GitHub Bot logged work on BEAM-8036: Author: ASF GitHub Bot Created on: 22/Aug/19 22:11 Start Date: 22/Aug/19 22:11 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #9407: [BEAM-8036] disable failed Postcommit Test URL: https://github.com/apache/beam/pull/9407 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299788) Time Spent: 1.5h (was: 1h 20m) > [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method > > > Key: BEAM-8036 > URL: https://issues.apache.org/jira/browse/BEAM-8036 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: Rui Wang >Priority: Major > Labels: currently-failing > Time Spent: 1.5h > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|[https://builds.apache.org/job/beam_PostCommit_SQL/2417/console]] > * [Gradle Build Scan|TODO] > * [Test source code|TODO] > Initial investigation: > *09:03:27* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > *09:03:27* *09:03:27* > org.apache.beam.sdk.extensions.sql.meta.provider.datacatalog.DataCatalogBigQueryIT > > testReadWrite FAILED*09:03:27* java.lang.NoSuchMethodError at > DataCatalogBigQueryIT.java:69*09:03:27* *09:03:27* 1 test completed, 1 > failed*09:03:28* *09:03:28* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > FAILED*09:03:28* *09:03:28* FAILURE: Build failed with an exception. 
> > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8036) [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method
[ https://issues.apache.org/jira/browse/BEAM-8036?focusedWorklogId=299787&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299787 ] ASF GitHub Bot logged work on BEAM-8036: Author: ASF GitHub Bot Created on: 22/Aug/19 22:11 Start Date: 22/Aug/19 22:11 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #9407: [BEAM-8036] disable failed Postcommit Test URL: https://github.com/apache/beam/pull/9407#issuecomment-524097654 https://github.com/apache/beam/pull/9409 is supposed to fix this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299787) Time Spent: 1h 20m (was: 1h 10m) > [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method > > > Key: BEAM-8036 > URL: https://issues.apache.org/jira/browse/BEAM-8036 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: Rui Wang >Priority: Major > Labels: currently-failing > Time Spent: 1h 20m > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|[https://builds.apache.org/job/beam_PostCommit_SQL/2417/console]] > * [Gradle Build Scan|TODO] > * [Test source code|TODO] > Initial investigation: > *09:03:27* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > *09:03:27* *09:03:27* > org.apache.beam.sdk.extensions.sql.meta.provider.datacatalog.DataCatalogBigQueryIT > > testReadWrite FAILED*09:03:27* java.lang.NoSuchMethodError at > DataCatalogBigQueryIT.java:69*09:03:27* *09:03:27* 1 test completed, 1 > failed*09:03:28* *09:03:28* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > FAILED*09:03:28* *09:03:28* FAILURE: Build failed with an exception. 
> > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8038) Python Precommit fail: 'BeamFnExternalWorkerPoolServicer' has no attribute '_worker_processes'
[ https://issues.apache.org/jira/browse/BEAM-8038?focusedWorklogId=299785&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299785 ] ASF GitHub Bot logged work on BEAM-8038: Author: ASF GitHub Bot Created on: 22/Aug/19 22:05 Start Date: 22/Aug/19 22:05 Worklog Time Spent: 10m Work Description: tweise commented on pull request #9403: [BEAM-8038] Fix worker pool exit hook URL: https://github.com/apache/beam/pull/9403 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299785) Time Spent: 1h (was: 50m) > Python Precommit fail: 'BeamFnExternalWorkerPoolServicer' has no attribute > '_worker_processes' > -- > > Key: BEAM-8038 > URL: https://issues.apache.org/jira/browse/BEAM-8038 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness, test-failures >Reporter: Ahmet Altay >Assignee: Thomas Weise >Priority: Critical > Time Spent: 1h > Remaining Estimate: 0h > > Logs: https://builds.apache.org/job/beam_PreCommit_Python_Commit/8246/console > 10:14:09 > -- > 10:14:09 XML: > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/nosetests.xml > 10:14:09 > -- > 10:14:09 Ran 2594 tests in 629.438s > 10:14:09 > 10:14:09 OK (SKIP=520) > 10:14:09 Error in atexit._run_exitfuncs: > 10:14:09 Traceback (most recent call last): > 10:14:09 File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs > 10:14:09 func(*targs, **kargs) > 10:14:09 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/worker_pool_main.py", > line 72, in kill_worker_processes > 10:14:09 for worker_process in cls._worker_processes.values(): > 10:14:09 
AttributeError: type object 'BeamFnExternalWorkerPoolServicer' has > no attribute '_worker_processes' > 10:14:09 Error in sys.exitfunc: > 10:14:09 Traceback (most recent call last): > 10:14:09 File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs > 10:14:09 func(*targs, **kargs) > 10:14:09 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/worker_pool_main.py", > line 72, in kill_worker_processes > 10:14:09 for worker_process in cls._worker_processes.values(): > 10:14:09 AttributeError: type object 'BeamFnExternalWorkerPoolServicer' has > no attribute '_worker_processes' > 10:14:10 py27-cython run-test-post: commands[0] | > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/scripts/run_tox_cleanup.sh > 10:14:10 ___ summary > -- This message was sent by Atlassian Jira (v8.3.2#803003)
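The root cause of this failure is easy to reproduce outside Beam: an exit hook declared as a classmethod reads a class attribute that is only ever assigned when a worker is actually started, so on a process that never launched a worker the hook hits an `AttributeError`. A minimal sketch (class names hypothetical, not the actual `worker_pool_main.py` code):

```python
class WorkerPool:
    """Sketch of the bug: _worker_processes is never declared on the class,
    so an atexit hook that runs when no worker was started raises."""

    @classmethod
    def kill_worker_processes(cls):
        # AttributeError: type object 'WorkerPool' has no attribute '_worker_processes'
        for worker_process in cls._worker_processes.values():
            worker_process.kill()


class FixedWorkerPool:
    """Sketch of the fix: declare the attribute on the class, so the exit
    hook is a harmless no-op when nothing was launched."""

    _worker_processes = {}

    @classmethod
    def kill_worker_processes(cls):
        for worker_process in cls._worker_processes.values():
            worker_process.kill()
```

With the attribute declared up front, registering the hook via `atexit.register(FixedWorkerPool.kill_worker_processes)` is safe whether or not any worker process was ever created.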
[jira] [Updated] (BEAM-8036) [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method
[ https://issues.apache.org/jira/browse/BEAM-8036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated BEAM-8036: --- Status: Open (was: Triage Needed) > [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method > > > Key: BEAM-8036 > URL: https://issues.apache.org/jira/browse/BEAM-8036 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: Rui Wang >Priority: Major > Labels: currently-failing > Time Spent: 1h 10m > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|[https://builds.apache.org/job/beam_PostCommit_SQL/2417/console]] > * [Gradle Build Scan|TODO] > * [Test source code|TODO] > Initial investigation: > *09:03:27* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > *09:03:27* *09:03:27* > org.apache.beam.sdk.extensions.sql.meta.provider.datacatalog.DataCatalogBigQueryIT > > testReadWrite FAILED*09:03:27* java.lang.NoSuchMethodError at > DataCatalogBigQueryIT.java:69*09:03:27* *09:03:27* 1 test completed, 1 > failed*09:03:28* *09:03:28* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > FAILED*09:03:28* *09:03:28* FAILURE: Build failed with an exception. > > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8036) [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method
[ https://issues.apache.org/jira/browse/BEAM-8036?focusedWorklogId=299783&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299783 ] ASF GitHub Bot logged work on BEAM-8036: Author: ASF GitHub Bot Created on: 22/Aug/19 22:03 Start Date: 22/Aug/19 22:03 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #9409: [BEAM-8036] fix failed postcommit URL: https://github.com/apache/beam/pull/9409#issuecomment-524095590 Run SQL PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299783) Time Spent: 1h 10m (was: 1h) > [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method > > > Key: BEAM-8036 > URL: https://issues.apache.org/jira/browse/BEAM-8036 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: Rui Wang >Priority: Major > Labels: currently-failing > Time Spent: 1h 10m > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|[https://builds.apache.org/job/beam_PostCommit_SQL/2417/console]] > * [Gradle Build Scan|TODO] > * [Test source code|TODO] > Initial investigation: > *09:03:27* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > *09:03:27* *09:03:27* > org.apache.beam.sdk.extensions.sql.meta.provider.datacatalog.DataCatalogBigQueryIT > > testReadWrite FAILED*09:03:27* java.lang.NoSuchMethodError at > DataCatalogBigQueryIT.java:69*09:03:27* *09:03:27* 1 test completed, 1 > failed*09:03:28* *09:03:28* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > FAILED*09:03:28* *09:03:28* FAILURE: Build failed with an exception. 
[jira] [Updated] (BEAM-8077) CONCAT function is broken
[ https://issues.apache.org/jira/browse/BEAM-8077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated BEAM-8077: --- Summary: CONCAT function is broken (was: CONCAT function breaks) > CONCAT function is broken > - > > Key: BEAM-8077 > URL: https://issues.apache.org/jira/browse/BEAM-8077 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (BEAM-8076) FieldAccess in Join is broken
[ https://issues.apache.org/jira/browse/BEAM-8076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated BEAM-8076: --- Summary: FieldAccess in Join is broken (was: FieldAccess in Join breaks) > FieldAccess in Join is broken > -- > > Key: BEAM-8076 > URL: https://issues.apache.org/jira/browse/BEAM-8076 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8077) CONCAT function breaks
Rui Wang created BEAM-8077: -- Summary: CONCAT function breaks Key: BEAM-8077 URL: https://issues.apache.org/jira/browse/BEAM-8077 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-6114) SQL join selection should be done in planner, not in expansion to PTransform
[ https://issues.apache.org/jira/browse/BEAM-6114?focusedWorklogId=299777&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299777 ] ASF GitHub Bot logged work on BEAM-6114: Author: ASF GitHub Bot Created on: 22/Aug/19 21:44 Start Date: 22/Aug/19 21:44 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #9395: [BEAM-6114] Calcite Rules to Select Type of Join in BeamSQL URL: https://github.com/apache/beam/pull/9395 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299777) Time Spent: 3.5h (was: 3h 20m) > SQL join selection should be done in planner, not in expansion to PTransform > > > Key: BEAM-6114 > URL: https://issues.apache.org/jira/browse/BEAM-6114 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Rahul Patwari >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > Currently Beam SQL joins all go through a single physical operator which has > a single PTransform that does all join algorithms based on properties of its > input PCollections as well as the relational algebra. > A first step is to make the needed information part of the relational > algebra, so it can choose a PTransform based on that, and the PTransforms can > be simpler. > Second step is to have separate (physical) relational operators for different > join algorithms. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8076) FieldAccess in Join breaks
Rui Wang created BEAM-8076: -- Summary: FieldAccess in Join breaks Key: BEAM-8076 URL: https://issues.apache.org/jira/browse/BEAM-8076 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8036) [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method
[ https://issues.apache.org/jira/browse/BEAM-8036?focusedWorklogId=299775&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299775 ] ASF GitHub Bot logged work on BEAM-8036: Author: ASF GitHub Bot Created on: 22/Aug/19 21:38 Start Date: 22/Aug/19 21:38 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #9409: [BEAM-8036] fix failed postcommit URL: https://github.com/apache/beam/pull/9409#issuecomment-524087952 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299775) Time Spent: 1h (was: 50m) > [beam_PostCommit_SQL] [DataCatalogBigQueryIT > testReadWrite] No such method > > > Key: BEAM-8036 > URL: https://issues.apache.org/jira/browse/BEAM-8036 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: Rui Wang >Priority: Major > Labels: currently-failing > Time Spent: 1h > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|[https://builds.apache.org/job/beam_PostCommit_SQL/2417/console]] > * [Gradle Build Scan|TODO] > * [Test source code|TODO] > Initial investigation: > *09:03:27* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > *09:03:27* *09:03:27* > org.apache.beam.sdk.extensions.sql.meta.provider.datacatalog.DataCatalogBigQueryIT > > testReadWrite FAILED*09:03:27* java.lang.NoSuchMethodError at > DataCatalogBigQueryIT.java:69*09:03:27* *09:03:27* 1 test completed, 1 > failed*09:03:28* *09:03:28* > > *Task :sdks:java:extensions:sql:datacatalog:integrationTest* > FAILED*09:03:28* *09:03:28* FAILURE: Build failed with an exception. 
[jira] [Work logged] (BEAM-8038) Python Precommit fail: 'BeamFnExternalWorkerPoolServicer' has no attribute '_worker_processes'
[ https://issues.apache.org/jira/browse/BEAM-8038?focusedWorklogId=299771&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299771 ] ASF GitHub Bot logged work on BEAM-8038: Author: ASF GitHub Bot Created on: 22/Aug/19 21:34 Start Date: 22/Aug/19 21:34 Worklog Time Spent: 10m Work Description: tweise commented on issue #9403: [BEAM-8038] Fix worker pool exit hook URL: https://github.com/apache/beam/pull/9403#issuecomment-524086550 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299771) Time Spent: 50m (was: 40m) > Python Precommit fail: 'BeamFnExternalWorkerPoolServicer' has no attribute > '_worker_processes' > -- > > Key: BEAM-8038 > URL: https://issues.apache.org/jira/browse/BEAM-8038 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness, test-failures >Reporter: Ahmet Altay >Assignee: Thomas Weise >Priority: Critical > Time Spent: 50m > Remaining Estimate: 0h > > Logs: https://builds.apache.org/job/beam_PreCommit_Python_Commit/8246/console > 10:14:09 > -- > 10:14:09 XML: > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/nosetests.xml > 10:14:09 > -- > 10:14:09 Ran 2594 tests in 629.438s > 10:14:09 > 10:14:09 OK (SKIP=520) > 10:14:09 Error in atexit._run_exitfuncs: > 10:14:09 Traceback (most recent call last): > 10:14:09 File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs > 10:14:09 func(*targs, **kargs) > 10:14:09 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/worker_pool_main.py", > line 72, in kill_worker_processes > 10:14:09 for worker_process in 
cls._worker_processes.values(): > 10:14:09 AttributeError: type object 'BeamFnExternalWorkerPoolServicer' has > no attribute '_worker_processes' > 10:14:09 Error in sys.exitfunc: > 10:14:09 Traceback (most recent call last): > 10:14:09 File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs > 10:14:09 func(*targs, **kargs) > 10:14:09 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/worker_pool_main.py", > line 72, in kill_worker_processes > 10:14:09 for worker_process in cls._worker_processes.values(): > 10:14:09 AttributeError: type object 'BeamFnExternalWorkerPoolServicer' has > no attribute '_worker_processes' > 10:14:10 py27-cython run-test-post: commands[0] | > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/scripts/run_tox_cleanup.sh > 10:14:10 ___ summary > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8075) IndexOutOfBounds in LogicalProject
Rui Wang created BEAM-8075: -- Summary: IndexOutOfBounds in LogicalProject Key: BEAM-8075 URL: https://issues.apache.org/jira/browse/BEAM-8075 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang SELECT payload.bankId, SUM(payload.purchaseAmountCents) / 100 AS totalPurchase FROM pubsub.topic.`instant-insights`.`retaildemo-online-transactions-json` GROUP BY payload.bankId Causes the workers to fail with: Exception in thread "main" java.lang.RuntimeException: Error while applying rule ProjectToCalcRule, args [rel#9:LogicalProject.NONE(input=RelSubset#8,bankId=$0,totalPurchase=/(CAST($3):DOUBLE NOT NULL, 1E2))] at org.apache -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (BEAM-8074) Update error message when reading from table with unsupported data types
[ https://issues.apache.org/jira/browse/BEAM-8074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated BEAM-8074: --- Description: When reading a NUMERIC column from a BQ table, the query will fail with the error message "Does not support DATE, TIME and DATETIME types in source tables". We should include NUMERIC in this error message. > Update error message when reading from table with unsupported data types > - > > Key: BEAM-8074 > URL: https://issues.apache.org/jira/browse/BEAM-8074 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Priority: Major > > When reading a NUMERIC column from a BQ table, the query will fail with the error message "Does not support DATE, TIME and DATETIME types in source tables". We should include NUMERIC in this error message. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8074) Update error message when reading from table with unsupported data types
Rui Wang created BEAM-8074: -- Summary: Update error message when reading from table with unsupported data types Key: BEAM-8074 URL: https://issues.apache.org/jira/browse/BEAM-8074 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8073) CAST Timestamp -> String doesn't properly handle timezones with sub-minute offsets
Rui Wang created BEAM-8073: -- Summary: CAST Timestamp -> String doesn't properly handle timezones with sub-minute offsets Key: BEAM-8073 URL: https://issues.apache.org/jira/browse/BEAM-8073 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang One of the timestamp -> string test cases is -621355968 microseconds from the unix epoch, or 01/01/0001 00:00:00 GMT Technically the timezone offset at this time in America/Los_Angeles is -07:52:58. This causes the following error: Expected: ARRAY>[{"-12-31 16:08:00-07:52"}] Actual: ARRAY>[{"-12-31 16:07:02-07:52"}] Note that ZetaSQL expects us to completely truncate the second part of the offset. It's not used when subtracting from the origin datetime, and it's not included in the offset string. However when we perform this conversion, joda time uses the second part of the offset, and thus our time string is off by 58 seconds. -- This message was sent by Atlassian Jira (v8.3.2#803003)
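The 58-second discrepancy described above follows from plain offset arithmetic. A sketch using Python's `datetime` (anchored at 0001-01-02 rather than 0001-01-01, since `datetime` cannot represent year 0; the -07:52:58 offset value comes from the report above):

```python
from datetime import datetime, timedelta

# Full historical LMT offset for America/Los_Angeles, per the report above.
full_offset = timedelta(hours=7, minutes=52, seconds=58)

# ZetaSQL truncates the offset to whole minutes before applying it.
truncated_offset = timedelta(minutes=full_offset // timedelta(minutes=1))  # 7:52:00

utc = datetime(1, 1, 2, 0, 0, 0)            # a nearby UTC instant within datetime's range
zetasql_local = utc - truncated_offset      # ...16:08:00 (expected)
joda_local = utc - full_offset              # ...16:07:02 (actual, full offset applied)
```

The two conversions disagree by exactly the 58 seconds that ZetaSQL drops from the offset, matching the expected/actual strings in the test failure.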
[jira] [Created] (BEAM-8071) LIMIT with negative OFFSET should throw an error
Rui Wang created BEAM-8071: -- Summary: LIMIT with negative OFFSET should throw an error Key: BEAM-8071 URL: https://issues.apache.org/jira/browse/BEAM-8071 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang Currently BeamSQL just returns data as if the OFFSET were 0. -- This message was sent by Atlassian Jira (v8.3.2#803003)
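The missing validation amounts to a simple range check before the limit/offset values reach execution; a sketch (function name hypothetical, not Beam's actual API):

```python
def validate_limit_offset(limit, offset):
    """Reject negative values instead of silently treating OFFSET as 0 (sketch)."""
    if offset < 0:
        raise ValueError("out of range: OFFSET must be non-negative, got %d" % offset)
    if limit < 0:
        raise ValueError("out of range: LIMIT must be non-negative, got %d" % limit)
    return limit, offset
```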
[jira] [Created] (BEAM-8072) Allow non-ColumnRef nodes in aggregation functions
Rui Wang created BEAM-8072: -- Summary: Allow non-ColumnRef nodes in aggregation functions Key: BEAM-8072 URL: https://issues.apache.org/jira/browse/BEAM-8072 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang Currently we throw an error if the node is not a ColumnRef or CAST(ColumnRef). -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8070) Support empty array literal
Rui Wang created BEAM-8070: -- Summary: Support empty array literal Key: BEAM-8070 URL: https://issues.apache.org/jira/browse/BEAM-8070 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang Currently BeamSQL throws an IndexOutOfBoundsException when given a query with an empty array literal. This happens because Calcite attempts to infer the element types [1,2] from an empty element list. -- This message was sent by Atlassian Jira (v8.3.2#803003)
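The failure mode described above can be sketched in a few lines: inferring an element type from the literal elements necessarily fails on an empty list, so the empty case needs either an explicit guard or an explicit type annotation (e.g. `ARRAY<INT64>[]`). This is an illustrative sketch, not Calcite's actual inference code:

```python
def infer_element_type(elements):
    """Infer a common element type from an array literal's elements (sketch).

    Indexing elements[0] on an empty list is exactly the kind of access
    that produces the IndexOutOfBoundsException described above.
    """
    if not elements:
        raise ValueError("cannot infer element type of an empty array literal; "
                         "an explicit type annotation is required")
    first = type(elements[0])
    if any(type(e) is not first for e in elements[1:]):
        raise TypeError("array literal elements have mixed types")
    return first
```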
[jira] [Created] (BEAM-8068) Throw expected error when LIKE pattern ends with backslash
Rui Wang created BEAM-8068: -- Summary: Throw expected error when LIKE pattern ends with backslash Key: BEAM-8068 URL: https://issues.apache.org/jira/browse/BEAM-8068 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang ZetaSQL expect returning a status code out_of_range with message "LIKE pattern ends with a backslash" in that situation. We do throw an error when this happens (a RuntimeException with that message), but when it gets returned over gRPC to the framework for some reason it is mapped to status code unknown with no message. -- This message was sent by Atlassian Jira (v8.3.2#803003)
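The check itself is straightforward when translating a LIKE pattern: a backslash that has nothing left to escape is a dangling escape and must be rejected up front. A sketch of the intended behavior (illustrative only, not the actual BeamSQL implementation; in ZetaSQL this surfaces as an `OUT_OF_RANGE` status with the quoted message):

```python
import re

def like_to_regex(pattern):
    """Translate a SQL LIKE pattern to an anchored regex (sketch)."""
    out = []
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            if i + 1 >= len(pattern):
                raise ValueError("LIKE pattern ends with a backslash")
            out.append(re.escape(pattern[i + 1]))  # escaped literal character
            i += 2
            continue
        if c == "%":
            out.append(".*")   # % matches any sequence of characters
        elif c == "_":
            out.append(".")    # _ matches exactly one character
        else:
            out.append(re.escape(c))
        i += 1
    return "^" + "".join(out) + "$"
```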
[jira] [Created] (BEAM-8069) OFFSET in LIMIT clause only accepts literal or parameter
Rui Wang created BEAM-8069: -- Summary: OFFSET in LIMIT clause only accepts literal or parameter Key: BEAM-8069 URL: https://issues.apache.org/jira/browse/BEAM-8069 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang Should verify what is inside the parameter, e.g. the parameter might contain a string or other unaccepted types. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8067) Throw exception when truncating nano/micro to millis when creating time literals
Rui Wang created BEAM-8067: -- Summary: Throw exception when truncating nano/micro to millis when creating time literals Key: BEAM-8067 URL: https://issues.apache.org/jira/browse/BEAM-8067 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang Time values in GoogleSQL are encoded in a special form. We need a function to extract the sub-millisecond part from time values and decide whether rejection is needed. -- This message was sent by Atlassian Jira (v8.3.2#803003)
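Assuming the value arrives as microseconds (an assumption for illustration; GoogleSQL's actual encoding is a packed form that would first need decoding), the required check is a remainder test:

```python
def sub_millis_micros(timestamp_micros):
    """Return the sub-millisecond remainder of a microsecond value (sketch)."""
    return timestamp_micros % 1000

def to_millis_or_reject(timestamp_micros):
    """Convert micros to millis, rejecting values that would lose precision.

    Per the issue above, silently truncating sub-millisecond precision is
    wrong; a non-zero remainder should raise instead.
    """
    if sub_millis_micros(timestamp_micros) != 0:
        raise ValueError("value has sub-millisecond precision: %d" % timestamp_micros)
    return timestamp_micros // 1000
```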
[jira] [Created] (BEAM-8066) Implement correct nullability for the return type of AggregateCall
Rui Wang created BEAM-8066: -- Summary: Implement correct nullability for the return type of AggregateCall Key: BEAM-8066 URL: https://issues.apache.org/jira/browse/BEAM-8066 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang Implement correct nullability for the return type of AggregateCall. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8064) Throw exceptions when overflow or division by 0 in arithmetic operators
Rui Wang created BEAM-8064: -- Summary: Throw exceptions when overflow or division by 0 in arithmetic operators Key: BEAM-8064 URL: https://issues.apache.org/jira/browse/BEAM-8064 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang Should throw an out-of-range exception on arithmetic overflow; division by 0 should also throw an out-of-range exception. -- This message was sent by Atlassian Jira (v8.3.2#803003)
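A sketch of checked INT64 arithmetic (Python integers are unbounded, which makes the overflow check easy to express; a Java implementation would use `Math.addExact` or equivalent manual checks):

```python
INT64_MAX = 2**63 - 1
INT64_MIN = -2**63

def checked_add(a, b):
    """INT64 addition that raises an out-of-range error on overflow (sketch)."""
    result = a + b
    if not (INT64_MIN <= result <= INT64_MAX):
        raise OverflowError("out of range: %d + %d" % (a, b))
    return result

def checked_div(a, b):
    """INT64 division that raises an out-of-range error on division by zero.

    (Floor division here; SQL's truncation-toward-zero semantics for
    negative operands are ignored in this sketch.)
    """
    if b == 0:
        raise ZeroDivisionError("out of range: division by zero")
    if a == INT64_MIN and b == -1:
        raise OverflowError("out of range: %d / %d" % (a, b))
    return a // b
```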
[jira] [Created] (BEAM-8065) Select * FROM pubsub table should not throw exception
Rui Wang created BEAM-8065: -- Summary: Select * FROM pubsub table should not throw exception Key: BEAM-8065 URL: https://issues.apache.org/jira/browse/BEAM-8065 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8062) Support array member accessor
Rui Wang created BEAM-8062: -- Summary: Support array member accessor Key: BEAM-8062 URL: https://issues.apache.org/jira/browse/BEAM-8062 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang array[] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8063) Support STRUCT member field access operator
Rui Wang created BEAM-8063: -- Summary: Support STRUCT member field access operator Key: BEAM-8063 URL: https://issues.apache.org/jira/browse/BEAM-8063 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8060) Support DATE type
Rui Wang created BEAM-8060: -- Summary: Support DATE type Key: BEAM-8060 URL: https://issues.apache.org/jira/browse/BEAM-8060 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8061) Support TIME type
Rui Wang created BEAM-8061: -- Summary: Support TIME type Key: BEAM-8061 URL: https://issues.apache.org/jira/browse/BEAM-8061 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8059) Support struct type
Rui Wang created BEAM-8059: -- Summary: Support struct type Key: BEAM-8059 URL: https://issues.apache.org/jira/browse/BEAM-8059 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8058) Support ARRAY type
Rui Wang created BEAM-8058: -- Summary: Support ARRAY type Key: BEAM-8058 URL: https://issues.apache.org/jira/browse/BEAM-8058 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8056) Support window offset for TUMBLE, HOP and SESSION
Rui Wang created BEAM-8056: -- Summary: Support window offset for TUMBLE, HOP and SESSION Key: BEAM-8056 URL: https://issues.apache.org/jira/browse/BEAM-8056 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8057) Support NAN, INF, and -INF
Rui Wang created BEAM-8057: -- Summary: Support NAN, INF, and -INF Key: BEAM-8057 URL: https://issues.apache.org/jira/browse/BEAM-8057 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8055) Support STRUCT constructor
Rui Wang created BEAM-8055: -- Summary: Support STRUCT constructor Key: BEAM-8055 URL: https://issues.apache.org/jira/browse/BEAM-8055 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang For example, SELECT STRUCT(1, "test_string") -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8054) Windowing functions should only accept watermarked timestamp column
Rui Wang created BEAM-8054: -- Summary: Windowing functions should only accept watermarked timestamp column Key: BEAM-8054 URL: https://issues.apache.org/jira/browse/BEAM-8054 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7760) Interactive Beam Caching PCollections bound to user defined vars in notebook
[ https://issues.apache.org/jira/browse/BEAM-7760?focusedWorklogId=299769&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299769 ] ASF GitHub Bot logged work on BEAM-7760: Author: ASF GitHub Bot Created on: 22/Aug/19 21:15 Start Date: 22/Aug/19 21:15 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9278: [BEAM-7760] Added Interactive Beam module URL: https://github.com/apache/beam/pull/9278 **Please** add a meaningful description for your change here 1. Added interactive_beam module that will serve sugar syntax and shorthand functions to apply interactivity, create iBeam pipeline, visualize PCollection data and execute iBeam pipeline as normal pipeline with selected Beam runners without interactivity. 2. This commit implemented the implicitly managed Interactive Beam environment to track definition of user pipelines. It exposed a watch() interface for users to explicitly instruct Interactive Beam the whereabout of their pipeline definition when it's not in __main__. 3. This commit implemented a shorthand function create_pipeline() to create a pipeline that is backed by direct runner with interactivity when running. 4. This commit also implemented a shorthand function run_pipeline() to run a pipeline created with interactivity on a different runner and pipeline options without interactivity. It's useful when interactivity is not needed and a one-shot in production-like environment is desired. 5. This commit exposed a PCollection data exploration interface visualize(). Implementation is yet to be added. 6. Added interactive_environment module for internal usage without backward-compatibility. It holds the cache manager and watchable metadata for current interactive environment/session/context. Interfaces are provided to interact with the environment and its components. 7. Unit tests included. Thank you for your contribution! 
Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
[jira] [Created] (BEAM-8053) Throw error for 1/0 or floating point overflow
Rui Wang created BEAM-8053: -- Summary: Throw error for 1/0 or floating point overflow Key: BEAM-8053 URL: https://issues.apache.org/jira/browse/BEAM-8053 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang Currently BeamSQL returns infinity rather than throwing an error in these cases -- This message was sent by Atlassian Jira (v8.3.2#803003)
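For context, the silent infinity falls out of Java's floating-point semantics: double division by zero yields positive or negative infinity (or NaN for 0.0/0.0) rather than throwing, while integer division by zero already throws ArithmeticException. A minimal sketch of a guard the planner could apply; `checkedDiv` is a hypothetical illustrative name, not existing BeamSQL code:

```java
public class DivisionSemantics {
    // Java floating-point division by zero yields +/-Infinity (or NaN for
    // 0.0/0.0) instead of throwing, which is why a naive SQL evaluator
    // silently propagates infinity.
    static double floatDiv(double a, double b) {
        return a / b;
    }

    // Hypothetical guard: surface an ArithmeticException whenever the result
    // is non-finite, matching the behavior the ticket asks for.
    static double checkedDiv(double a, double b) {
        double result = a / b;
        if (!Double.isFinite(result)) {
            throw new ArithmeticException("division by zero or floating point overflow");
        }
        return result;
    }
}
```

The same non-finite check also catches overflow of in-range operands (e.g. multiplying two large doubles), not just literal division by zero.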
[jira] [Created] (BEAM-8052) Should validate if String literal is valid UTF-8
Rui Wang created BEAM-8052: -- Summary: Should validate if String literal is valid UTF-8 Key: BEAM-8052 URL: https://issues.apache.org/jira/browse/BEAM-8052 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8051) Convert FLOAT64 to NUMERIC in UNION ALL
Rui Wang created BEAM-8051: -- Summary: Convert FLOAT64 to NUMERIC in UNION ALL Key: BEAM-8051 URL: https://issues.apache.org/jira/browse/BEAM-8051 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang The analyzer does not reject UNION ALL on DOUBLE, for example `2.3 UNION ALL 2.1`. BeamSQL fails to execute when a DOUBLE is used in GBK (UNION ALL is implemented on top of GBK). Investigate why DOUBLE appears in GBK in the UNION ALL implementation, try to fix it, and if that's not feasible, at least throw an exception in the planner. -- This message was sent by Atlassian Jira (v8.3.2#803003)
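Part of the reason DOUBLE is problematic in GBK (which compares keys by their encoded bytes) is that floating-point equality and bit patterns disagree: 0.0 and -0.0 are numerically equal but encode differently, and NaN is the reverse case. This is likely why Beam treats floating-point coders as non-deterministic for grouping. A self-contained illustration of the mismatch:

```java
public class DoubleKeyPitfalls {
    // GBK-style grouping compares encoded key bytes, so two numerically
    // equal doubles with different bit patterns would land in different
    // groups, while all NaNs (which are != each other) would collapse
    // into one group.
    static boolean sameBits(double a, double b) {
        return Double.doubleToLongBits(a) == Double.doubleToLongBits(b);
    }
}
```

Either direction of disagreement silently violates SQL equality semantics, which argues for fixing the plan or rejecting the query in the planner.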
[jira] [Created] (BEAM-8050) Remove "$" from auto-generated field names
Rui Wang created BEAM-8050: -- Summary: Remove "$" from auto-generated field names Key: BEAM-8050 URL: https://issues.apache.org/jira/browse/BEAM-8050 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang ZetaSQL generates column names starting with "$", but "$" is not accepted by BigQuery as a field name, so we either force users to add an alias or we post-process column names to remove the "$". -- This message was sent by Atlassian Jira (v8.3.2#803003)
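A sketch of the post-processing option; `sanitizeFieldName` is a hypothetical helper (not existing Beam code), assuming BigQuery's rule that field names contain only letters, digits, and underscores and do not start with a digit:

```java
public class FieldNames {
    // Hypothetical cleanup for auto-generated ZetaSQL names such as "$col1":
    // drop characters BigQuery rejects, then ensure a legal leading character.
    static String sanitizeFieldName(String name) {
        String cleaned = name.replaceAll("[^A-Za-z0-9_]", "");
        if (cleaned.isEmpty() || Character.isDigit(cleaned.charAt(0))) {
            cleaned = "_" + cleaned;
        }
        return cleaned;
    }
}
```

Note that any such rewrite can introduce collisions (e.g. "$col1" and "col1" map to the same name), so a real implementation would also need to deduplicate.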
[jira] [Created] (BEAM-8049) Throw clear exception when handling unsupported interval time units
Rui Wang created BEAM-8049: -- Summary: Throw clear exception when handling unsupported interval time units Key: BEAM-8049 URL: https://issues.apache.org/jira/browse/BEAM-8049 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang E.g., WEEK and QUARTER. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8048) Support TIMESTAMP Sub function/operator
Rui Wang created BEAM-8048: -- Summary: Support TIMESTAMP Sub function/operator Key: BEAM-8048 URL: https://issues.apache.org/jira/browse/BEAM-8048 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8047) Handle overflow when converting from millis to micros
Rui Wang created BEAM-8047: -- Summary: Handle overflow when converting from millis to micros Key: BEAM-8047 URL: https://issues.apache.org/jira/browse/BEAM-8047 Project: Beam Issue Type: Sub-task Components: dsl-sql-zetasql Reporter: Rui Wang When converting from a Joda Instant/DateTime, only epoch millis are available, but the conversions require epoch micros, so a multiplication by 1000L is applied. This multiplication can overflow and needs to be handled appropriately. This issue exists in the ZetaSQL planner. -- This message was sent by Atlassian Jira (v8.3.2#803003)
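One way to handle it is to fail loudly instead of wrapping: `Math.multiplyExact` throws `ArithmeticException` on long overflow, whereas a plain `* 1000L` silently wraps around. A sketch with `millisToMicros` as an illustrative name, not the planner's actual method:

```java
public class TimeConversions {
    // Plain (epochMillis * 1000L) wraps around on overflow; multiplyExact
    // raises ArithmeticException so the caller can report a clear error
    // instead of producing a garbage timestamp.
    static long millisToMicros(long epochMillis) {
        return Math.multiplyExact(epochMillis, 1000L);
    }
}
```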
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299767&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299767 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 21:07 Start Date: 22/Aug/19 21:07 Worklog Time Spent: 10m Work Description: kmjung commented on issue #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#issuecomment-524078069 @chamikaramj I think this is ready to go -- please take another look when you can. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299767) Time Spent: 2h 10m (was: 2h) > Allow specifying BigQuery Storage API readOptions at runtime > > > Key: BEAM-8023 > URL: https://issues.apache.org/jira/browse/BEAM-8023 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Jeff Klukas >Assignee: Kenneth Jung >Priority: Minor > Time Spent: 2h 10m > Remaining Estimate: 0h > > We have support in the Java SDK for using the BigQuery Storage API for reads, > but only the target query or table is supported as a ValueProvider to be > specified at runtime. AFAICT, there is no reason we can't delay specifying > readOptions until runtime as well. > The readOptions are accessed by BigQueryStorageTableSource in getTargetTable; > I believe that's occurring at runtime, but I'd love for someone with deeper > BoundedSource knowledge to confirm that. > I'd advocate for adding new methods > `TypedRead.withSelectedFields(ValueProvider<List<String>> value)` and > `TypedRead.withRowRestriction(ValueProvider<String> value)`. The existing > `withReadOptions` method would then populate the other two as > StaticValueProviders. 
Perhaps we'd want to deprecate `withReadOptions` in > favor of specifying individual read options as separate parameters. -- This message was sent by Atlassian Jira (v8.3.2#803003)
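The proposal builds on Beam's ValueProvider pattern: the pipeline graph stores a provider, and the concrete value is demanded only at execution time. A minimal self-contained sketch of the idea; these are simplified stand-ins, not Beam's actual org.apache.beam.sdk.options classes:

```java
import java.util.function.Supplier;

// Simplified stand-in for Beam's ValueProvider: a value supplied either
// statically at graph-construction time or lazily at execution time.
interface Provider<T> {
    T get();
}

class StaticProvider<T> implements Provider<T> {
    private final T value;
    StaticProvider(T value) { this.value = value; }
    public T get() { return value; }
}

class RuntimeProvider<T> implements Provider<T> {
    // Resolved (e.g. from pipeline options) only when get() is called at run time.
    private final Supplier<T> lookup;
    RuntimeProvider(Supplier<T> lookup) { this.lookup = lookup; }
    public T get() { return lookup.get(); }
}
```

Under this model the deprecated `withReadOptions` would wrap its fields as static providers, while the proposed `withSelectedFields`/`withRowRestriction` overloads could accept runtime providers directly.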
[jira] [Work logged] (BEAM-6114) SQL join selection should be done in planner, not in expansion to PTransform
[ https://issues.apache.org/jira/browse/BEAM-6114?focusedWorklogId=299765&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299765 ] ASF GitHub Bot logged work on BEAM-6114: Author: ASF GitHub Bot Created on: 22/Aug/19 21:05 Start Date: 22/Aug/19 21:05 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #9395: [BEAM-6114] Calcite Rules to Select Type of Join in BeamSQL URL: https://github.com/apache/beam/pull/9395#issuecomment-524077304 LGTM. I will merge this PR once every test passes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299765) Time Spent: 3h 20m (was: 3h 10m) > SQL join selection should be done in planner, not in expansion to PTransform > > > Key: BEAM-6114 > URL: https://issues.apache.org/jira/browse/BEAM-6114 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kenneth Knowles >Assignee: Rahul Patwari >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299764&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299764 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 21:01 Start Date: 22/Aug/19 21:01 Worklog Time Spent: 10m Work Description: kmjung commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#discussion_r316883401 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java ## @@ -1251,6 +1294,16 @@ private void ensureFromNotCalledYet() { getJsonTableRef() == null && getQuery() == null, "from() or fromQuery() already called"); } +private void ensureReadOptionsNotSet() { + checkState(getReadOptions() == null, "withReadOptions() already called"); +} + +private void ensureSelectedFieldsAndRowRestrictionNotSet() { Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299764) Time Spent: 2h (was: 1h 50m) > Allow specifying BigQuery Storage API readOptions at runtime > > > Key: BEAM-8023 > URL: https://issues.apache.org/jira/browse/BEAM-8023 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Jeff Klukas >Assignee: Kenneth Jung >Priority: Minor > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299762&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299762 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 20:59 Start Date: 22/Aug/19 20:59 Worklog Time Spent: 10m Work Description: kmjung commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#discussion_r316882807 ## File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageReadTest.java ## @@ -465,6 +500,71 @@ public void testTableSourceInitialSplit_WithTableReadOptions() throws Throwable BigQueryStorageTableSource.create( ValueProvider.StaticValueProvider.of(tableRef), readOptions, +null, +null, +new TableRowParser(), +TableRowJsonCoder.of(), +new FakeBigQueryServices() +.withDatasetService(fakeDatasetService) +.withStorageClient(fakeStorageClient)); + +List> sources = tableSource.split(10L, options); +assertEquals(10L, sources.size()); + } + + @Test + public void testTableSourceInitialSplit_WithSelectedFieldsAndRowRestriction() throws Exception { +fakeDatasetService.createDataset("foo.com:project", "dataset", "", "", null); +TableReference tableRef = BigQueryHelpers.parseTableSpec("foo.com:project:dataset.table"); + +Table table = +new Table() +.setTableReference(tableRef) +.setNumBytes(100L) +.setSchema( +new TableSchema() +.setFields( +ImmutableList.of( +new TableFieldSchema().setName("name").setType("STRING"), +new TableFieldSchema().setName("number").setType("INTEGER"; + +fakeDatasetService.createTable(table); + +TableReadOptions readOptions = +TableReadOptions.newBuilder() +.addSelectedFields("name") +.addSelectedFields("number") +.setRowRestriction("number > 5") +.build(); + +CreateReadSessionRequest expectedRequest = +CreateReadSessionRequest.newBuilder() +.setParent("projects/project-id") 
+.setTableReference(BigQueryHelpers.toTableRefProto(tableRef)) +.setRequestedStreams(10) +.setReadOptions(readOptions) +// TODO(aryann): Once we rebuild the generated client code, we should change this to +// use setShardingStrategy(). +.setUnknownFields( +UnknownFieldSet.newBuilder() +.addField(7, UnknownFieldSet.Field.newBuilder().addVarint(2).build()) +.build()) +.build(); + +ReadSession.Builder builder = ReadSession.newBuilder(); +for (int i = 0; i < 10; i++) { + builder.addStreams(Stream.newBuilder().setName("stream-" + i)); +} + +StorageClient fakeStorageClient = mock(StorageClient.class); + when(fakeStorageClient.createReadSession(expectedRequest)).thenReturn(builder.build()); + +BigQueryStorageTableSource tableSource = +BigQueryStorageTableSource.create( +ValueProvider.StaticValueProvider.of(tableRef), +null, +StaticValueProvider.of(Lists.newArrayList("name", "number")), +StaticValueProvider.of("number > 5"), Review comment: Good suggestion. `p.newProvider` doesn't work here -- we manually call `split` on the source object in this test rather than executing the pipeline, which (happily) fails since we're accessing the provider value outside of the pipeline context -- but I've updated `testReadFromBigQueryIO` below to cover this case. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299762) Time Spent: 1h 50m (was: 1h 40m) > Allow specifying BigQuery Storage API readOptions at runtime > > > Key: BEAM-8023 > URL: https://issues.apache.org/jira/browse/BEAM-8023 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Jeff Klukas >Assignee: Kenneth Jung >Priority: Minor > Time Spent: 1h 50m > Remaining Estimate: 0h >
[jira] [Created] (BEAM-8046) Unable to read from bigquery and publish to pubsub using dataflow runner (python SDK)
James Hutchison created BEAM-8046: - Summary: Unable to read from bigquery and publish to pubsub using dataflow runner (python SDK) Key: BEAM-8046 URL: https://issues.apache.org/jira/browse/BEAM-8046 Project: Beam Issue Type: Improvement Components: runner-dataflow Affects Versions: 2.14.0, 2.13.0 Reporter: James Hutchison With the Python SDK, the Dataflow runner does not allow the use of BigQuery in streaming pipelines, and Pub/Sub is not allowed in batch pipelines. Thus, there is no way to create a pipeline on the Dataflow runner that reads from BigQuery and publishes to Pub/Sub. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Closed] (BEAM-8037) Python FlinkRunner does not override reads
[ https://issues.apache.org/jira/browse/BEAM-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver closed BEAM-8037. - Fix Version/s: 2.16.0 Resolution: Fixed > Python FlinkRunner does not override reads > -- > > Key: BEAM-8037 > URL: https://issues.apache.org/jira/browse/BEAM-8037 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Fix For: 2.16.0 > > Time Spent: 20m > Remaining Estimate: 0h > > When using the Python FlinkRunner [1], my example pipeline (beginning with a > Create transform) failed with exception: > java.lang.IllegalArgumentException: GreedyPipelineFuser requires all root > nodes to be runner-implemented beam:transform:impulse:v1 or > beam:transform:read:v1 primitives, but transform > ref_AppliedPTransform_Create/Read_3 executes in environment Optional[urn: > "beam:env:docker:v1" -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8038) Python Precommit fail: 'BeamFnExternalWorkerPoolServicer' has no attribute '_worker_processes'
[ https://issues.apache.org/jira/browse/BEAM-8038?focusedWorklogId=299753&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299753 ] ASF GitHub Bot logged work on BEAM-8038: Author: ASF GitHub Bot Created on: 22/Aug/19 20:52 Start Date: 22/Aug/19 20:52 Worklog Time Spent: 10m Work Description: tweise commented on issue #9403: [BEAM-8038] Fix worker pool exit hook URL: https://github.com/apache/beam/pull/9403#issuecomment-524072782 needed lint fix This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299753) Time Spent: 40m (was: 0.5h) > Python Precommit fail: 'BeamFnExternalWorkerPoolServicer' has no attribute > '_worker_processes' > -- > > Key: BEAM-8038 > URL: https://issues.apache.org/jira/browse/BEAM-8038 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness, test-failures >Reporter: Ahmet Altay >Assignee: Thomas Weise >Priority: Critical > Time Spent: 40m > Remaining Estimate: 0h > > Logs: https://builds.apache.org/job/beam_PreCommit_Python_Commit/8246/console > 10:14:09 > -- > 10:14:09 XML: > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/nosetests.xml > 10:14:09 > -- > 10:14:09 Ran 2594 tests in 629.438s > 10:14:09 > 10:14:09 OK (SKIP=520) > 10:14:09 Error in atexit._run_exitfuncs: > 10:14:09 Traceback (most recent call last): > 10:14:09 File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs > 10:14:09 func(*targs, **kargs) > 10:14:09 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/worker_pool_main.py", > line 72, in kill_worker_processes > 10:14:09 for worker_process in 
cls._worker_processes.values(): > 10:14:09 AttributeError: type object 'BeamFnExternalWorkerPoolServicer' has > no attribute '_worker_processes' > 10:14:09 Error in sys.exitfunc: > 10:14:09 Traceback (most recent call last): > 10:14:09 File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs > 10:14:09 func(*targs, **kargs) > 10:14:09 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/worker_pool_main.py", > line 72, in kill_worker_processes > 10:14:09 for worker_process in cls._worker_processes.values(): > 10:14:09 AttributeError: type object 'BeamFnExternalWorkerPoolServicer' has > no attribute '_worker_processes' > 10:14:10 py27-cython run-test-post: commands[0] | > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/scripts/run_tox_cleanup.sh > 10:14:10 ___ summary > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299747&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299747 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 20:46 Start Date: 22/Aug/19 20:46 Worklog Time Spent: 10m Work Description: kmjung commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#discussion_r31681 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java ## @@ -1360,12 +1413,39 @@ public TableReference getTable() { return toBuilder().setMethod(method).build(); } -/** Read options, including a list of selected columns and push-down SQL filter text. */ +/** + * @deprecated Use {@link #withSelectedFields(List)} and {@link #withRowRestriction(String)} + * instead. + */ +@Deprecated @Experimental(Experimental.Kind.SOURCE_SINK) public TypedRead withReadOptions(TableReadOptions readOptions) { + ensureSelectedFieldsAndRowRestrictionNotSet(); return toBuilder().setReadOptions(readOptions).build(); } +@Experimental(Experimental.Kind.SOURCE_SINK) +public TypedRead withSelectedFields(List selectedFields) { + return withSelectedFields(StaticValueProvider.of(selectedFields)); +} + +@Experimental(Experimental.Kind.SOURCE_SINK) Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299747) Time Spent: 1.5h (was: 1h 20m) > Allow specifying BigQuery Storage API readOptions at runtime > > > Key: BEAM-8023 > URL: https://issues.apache.org/jira/browse/BEAM-8023 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Jeff Klukas >Assignee: Kenneth Jung >Priority: Minor > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299748&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299748 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 20:46 Start Date: 22/Aug/19 20:46 Worklog Time Spent: 10m Work Description: kmjung commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#discussion_r316877815 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java ## @@ -1360,12 +1413,39 @@ public TableReference getTable() { return toBuilder().setMethod(method).build(); } -/** Read options, including a list of selected columns and push-down SQL filter text. */ +/** + * @deprecated Use {@link #withSelectedFields(List)} and {@link #withRowRestriction(String)} + * instead. + */ +@Deprecated @Experimental(Experimental.Kind.SOURCE_SINK) public TypedRead withReadOptions(TableReadOptions readOptions) { + ensureSelectedFieldsAndRowRestrictionNotSet(); return toBuilder().setReadOptions(readOptions).build(); } +@Experimental(Experimental.Kind.SOURCE_SINK) +public TypedRead withSelectedFields(List selectedFields) { + return withSelectedFields(StaticValueProvider.of(selectedFields)); +} + +@Experimental(Experimental.Kind.SOURCE_SINK) +public TypedRead withSelectedFields(ValueProvider> selectedFields) { + ensureReadOptionsNotSet(); + return toBuilder().setSelectedFields(selectedFields).build(); +} + +@Experimental(Experimental.Kind.SOURCE_SINK) Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299748) Time Spent: 1h 40m (was: 1.5h) > Allow specifying BigQuery Storage API readOptions at runtime > > > Key: BEAM-8023 > URL: https://issues.apache.org/jira/browse/BEAM-8023 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Jeff Klukas >Assignee: Kenneth Jung >Priority: Minor > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7760) Interactive Beam Caching PCollections bound to user defined vars in notebook
[ https://issues.apache.org/jira/browse/BEAM-7760?focusedWorklogId=299749&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299749 ] ASF GitHub Bot logged work on BEAM-7760: Author: ASF GitHub Bot Created on: 22/Aug/19 20:46 Start Date: 22/Aug/19 20:46 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #9278: [BEAM-7760] Added Interactive Beam module URL: https://github.com/apache/beam/pull/9278 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299749) Time Spent: 3h (was: 2h 50m) > Interactive Beam Caching PCollections bound to user defined vars in notebook > > > Key: BEAM-7760 > URL: https://issues.apache.org/jira/browse/BEAM-7760 > Project: Beam > Issue Type: New Feature > Components: examples-python >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > > Cache only PCollections bound to user defined variables in a pipeline when > running pipeline with interactive runner in jupyter notebooks. > [Interactive > Beam|[https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive]] > has been caching and using caches of "leaf" PCollections for interactive > execution in jupyter notebooks. > The interactive execution is currently supported so that when appending new > transforms to existing pipeline for a new run, executed part of the pipeline > doesn't need to be re-executed. > A PCollection is "leaf" when it is never used as input in any PTransform in > the pipeline. > The problem with building caches and pipeline to execute around "leaf" is > that when a PCollection is consumed by a sink with no output, the pipeline to > execute built will miss the subgraph generating and consuming that > PCollection. 
> An example, "ReadFromPubSub --> WriteToPubSub" will result in an empty > pipeline. > Caching around PCollections bound to user defined variables and replacing > transforms with source and sink of caches could resolve the pipeline to > execute properly under the interactive execution scenario. Also, cached > PCollection now can trace back to user code and can be used for user data > visualization if user wants to do it. > E.g., > {code:java} > // ... > p = beam.Pipeline(interactive_runner.InteractiveRunner(), > options=pipeline_options) > messages = p | "Read" >> beam.io.ReadFromPubSub(subscription='...') > messages | "Write" >> beam.io.WriteToPubSub(topic_path) > result = p.run() > // ... > visualize(messages){code} > The interactive runner automatically figures out that PCollection > {code:java} > messages{code} > created by > {code:java} > p | "Read" >> beam.io.ReadFromPubSub(subscription='...'){code} > should be cached and reused if the notebook user appends more transforms. > And once the pipeline gets executed, the user could use any > visualize(PCollection) module to visualize the data statically (batch) or > dynamically (stream) -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299745&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299745 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 20:45 Start Date: 22/Aug/19 20:45 Worklog Time Spent: 10m Work Description: kmjung commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#discussion_r316877647 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java ## @@ -1360,12 +1413,39 @@ public TableReference getTable() { return toBuilder().setMethod(method).build(); } -/** Read options, including a list of selected columns and push-down SQL filter text. */ +/** + * @deprecated Use {@link #withSelectedFields(List)} and {@link #withRowRestriction(String)} + * instead. + */ +@Deprecated @Experimental(Experimental.Kind.SOURCE_SINK) public TypedRead withReadOptions(TableReadOptions readOptions) { + ensureSelectedFieldsAndRowRestrictionNotSet(); return toBuilder().setReadOptions(readOptions).build(); } +@Experimental(Experimental.Kind.SOURCE_SINK) Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299745) Time Spent: 1h 10m (was: 1h) > Allow specifying BigQuery Storage API readOptions at runtime > > > Key: BEAM-8023 > URL: https://issues.apache.org/jira/browse/BEAM-8023 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Reporter: Jeff Klukas >Assignee: Kenneth Jung >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299746&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299746 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 20:45 Start Date: 22/Aug/19 20:45 Worklog Time Spent: 10m Work Description: kmjung commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#discussion_r316877715 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java ## @@ -1360,12 +1413,39 @@ public TableReference getTable() { return toBuilder().setMethod(method).build(); } -/** Read options, including a list of selected columns and push-down SQL filter text. */ +/** + * @deprecated Use {@link #withSelectedFields(List)} and {@link #withRowRestriction(String)} + * instead. + */ +@Deprecated @Experimental(Experimental.Kind.SOURCE_SINK) public TypedRead withReadOptions(TableReadOptions readOptions) { + ensureSelectedFieldsAndRowRestrictionNotSet(); return toBuilder().setReadOptions(readOptions).build(); } +@Experimental(Experimental.Kind.SOURCE_SINK) Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 299746) Time Spent: 1h 20m (was: 1h 10m)
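The proposal above rests on Beam's ValueProvider pattern: wrapping a configuration value so its evaluation can be deferred from pipeline-construction time to execution time. As a rough, self-contained sketch of the idea in plain Python (hypothetical classes, not Beam's actual implementation):

```python
# Minimal sketch of the ValueProvider idea: defer a value's evaluation
# until runtime. Hypothetical classes, not Beam's actual implementation.

class StaticValueProvider:
    """Wraps a value already known at pipeline-construction time."""
    def __init__(self, value):
        self._value = value

    def is_accessible(self):
        return True

    def get(self):
        return self._value


class RuntimeValueProvider:
    """Wraps a value that only becomes available once the runner
    injects the pipeline options at execution time."""
    _runtime_options = None  # set by the "runner" before execution

    def __init__(self, option_name, default=None):
        self._option_name = option_name
        self._default = default

    def is_accessible(self):
        return RuntimeValueProvider._runtime_options is not None

    def get(self):
        if not self.is_accessible():
            raise RuntimeError(
                "%s not accessible before runtime" % self._option_name)
        return RuntimeValueProvider._runtime_options.get(
            self._option_name, self._default)


# Construction time: the filter text is not known yet.
row_restriction = RuntimeValueProvider("row_restriction")

# Execution time: the "runner" supplies the options, and the source
# (e.g. in getTargetTable) can now resolve the read options.
RuntimeValueProvider._runtime_options = {"row_restriction": "number > 5"}
print(row_restriction.get())  # -> number > 5
```

Calling `get()` before the runner injects options raises, which is exactly the misuse the `p.newProvider` test style discussed later in this thread is meant to surface before runtime.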
[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python
[ https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=299742&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299742 ] ASF GitHub Bot logged work on BEAM-7886: Author: ASF GitHub Bot Created on: 22/Aug/19 20:44 Start Date: 22/Aug/19 20:44 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #9188: [BEAM-7886] Make row coder a standard coder and implement in Python URL: https://github.com/apache/beam/pull/9188#discussion_r316877069 ## File path: sdks/python/apache_beam/coders/row_coder_test.py ## @@ -0,0 +1,129 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# +from __future__ import absolute_import + +import logging +import typing +import unittest +from itertools import chain + +import numpy as np +from past.builtins import unicode + +from apache_beam.coders import RowCoder +from apache_beam.coders.typecoders import registry as coders_registry +from apache_beam.portability.api import schema_pb2 +from apache_beam.typehints.schemas import typing_to_runner_api + +Person = typing.NamedTuple("Person", [ +("name", unicode), +("age", np.int32), +("address", typing.Optional[unicode]), +("aliases", typing.List[unicode]), +]) + +coders_registry.register_coder(Person, RowCoder) + + +class CodersTest(unittest.TestCase): + TEST_CASES = [ + Person("Jon Snow", 23, None, ["crow", "wildling"]), + Person("Daenerys Targaryen", 25, "Westeros", ["Mother of Dragons"]), + Person("Michael Bluth", 30, None, []) + ] + + def test_create_row_coder_from_named_tuple(self): +expected_coder = RowCoder(typing_to_runner_api(Person).row_type.schema) +real_coder = coders_registry.get_coder(Person) + +for test_case in self.TEST_CASES: + self.assertEqual( + expected_coder.encode(test_case), real_coder.encode(test_case)) + + self.assertEqual(test_case, + real_coder.decode(real_coder.encode(test_case))) + + def test_create_row_coder_from_schema(self): +schema = schema_pb2.Schema( +id="person", +fields=[ +schema_pb2.Field( +name="name", +type=schema_pb2.FieldType( +atomic_type=schema_pb2.AtomicType.STRING)), +schema_pb2.Field( +name="age", +type=schema_pb2.FieldType( +atomic_type=schema_pb2.AtomicType.INT32)), +schema_pb2.Field( +name="address", +type=schema_pb2.FieldType( +atomic_type=schema_pb2.AtomicType.STRING, nullable=True)), +schema_pb2.Field( +name="aliases", +type=schema_pb2.FieldType( +array_type=schema_pb2.ArrayType( +element_type=schema_pb2.FieldType( +atomic_type=schema_pb2.AtomicType.STRING)))), +]) +coder = RowCoder(schema) + +for test_case in self.TEST_CASES: + self.assertEqual(test_case, coder.decode(coder.encode(test_case))) + +
@unittest.skip( + "Need to decide whether to defer to the stream writer for these checks " + "or add explicit checks" + ) + def test_overflows(self): +IntTester = typing.NamedTuple('IntTester', [ +#('i8', typing.Optional[np.int8]), Review comment: Added. Also added a reference to [BEAM-8030](https://issues.apache.org/jira/browse/BEAM-8030) in the skip message. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299742) Time Spent: 8h 10m (was: 8h) > Make row coder a standard coder and implement in python > --- > > Key: BEAM-7886 > URL: https://issues.apache.org/jira/browse/BEAM-7886 >
[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python
[ https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=299744&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299744 ] ASF GitHub Bot logged work on BEAM-7886: Author: ASF GitHub Bot Created on: 22/Aug/19 20:44 Start Date: 22/Aug/19 20:44 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #9188: [BEAM-7886] Make row coder a standard coder and implement in Python URL: https://github.com/apache/beam/pull/9188#discussion_r316877261 ## File path: sdks/python/apache_beam/coders/row_coder_test.py ## @@ -0,0 +1,126 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# +from __future__ import absolute_import + +import logging +import typing +import unittest + +import numpy as np +from itertools import chain +from past.builtins import unicode + +from apache_beam.coders import RowCoder +from apache_beam.coders.typecoders import registry as coders_registry +from apache_beam.portability.api import schema_pb2 +from apache_beam.typehints.schemas import typing_to_runner_api + +Person = typing.NamedTuple("Person", [ +("name", unicode), +("age", np.int32), +("address", typing.Optional[unicode]), +("aliases", typing.List[unicode]), +]) + +coders_registry.register_coder(Person, RowCoder) + + +class CodersTest(unittest.TestCase): + TEST_CASES = [ + Person("Jon Snow", 23, None, ["crow", "wildling"]), + Person("Daenerys Targaryen", 25, "Westeros", ["Mother of Dragons"]), + Person("Michael Bluth", 30, None, []) + ] + + def test_create_row_coder_from_named_tuple(self): +expected_coder = RowCoder(typing_to_runner_api(Person).row_type.schema) +real_coder = coders_registry.get_coder(Person) + +for test_case in self.TEST_CASES: + self.assertEqual( + expected_coder.encode(test_case), real_coder.encode(test_case)) + + self.assertEqual(test_case, + real_coder.decode(real_coder.encode(test_case))) + + def test_create_row_coder_from_schema(self): +schema = schema_pb2.Schema( +id="person", +fields=[ +schema_pb2.Field( +name="name", +type=schema_pb2.FieldType( +atomic_type=schema_pb2.AtomicType.STRING)), +schema_pb2.Field( +name="age", +type=schema_pb2.FieldType( +atomic_type=schema_pb2.AtomicType.INT32)), +schema_pb2.Field( +name="address", +type=schema_pb2.FieldType( +atomic_type=schema_pb2.AtomicType.STRING, nullable=True)), +schema_pb2.Field( +name="aliases", +type=schema_pb2.FieldType( +array_type=schema_pb2.ArrayType( +element_type=schema_pb2.FieldType( +atomic_type=schema_pb2.AtomicType.STRING)))), +]) +coder = RowCoder(schema) + +for test_case in self.TEST_CASES: + self.assertEqual(test_case, coder.decode(coder.encode(test_case))) + +
@unittest.skip("Need to decide whether to defer to the stream writer for these checks or add explicit checks") Review comment: Filed [BEAM-8030](https://issues.apache.org/jira/browse/BEAM-8030) to reconcile this. Issue Time Tracking --- Worklog Id: (was: 299744) Time Spent: 8h 20m (was: 8h 10m)
[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python
[ https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=299736&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299736 ] ASF GitHub Bot logged work on BEAM-7886: Author: ASF GitHub Bot Created on: 22/Aug/19 20:39 Start Date: 22/Aug/19 20:39 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #9188: [BEAM-7886] Make row coder a standard coder and implement in Python URL: https://github.com/apache/beam/pull/9188#discussion_r316875134 ## File path: sdks/python/setup.py ## @@ -115,8 +115,7 @@ def get_version(): 'mock>=1.0.1,<3.0.0', 'pymongo>=3.8.0,<4.0.0', 'oauth2client>=2.0.1,<4', -# grpcio 1.8.1 and above requires protobuf 3.5.0.post1. -'protobuf>=3.5.0.post1,<4', +'protobuf>=3.8.0.post1,<4', Review comment: Sounds good, I just moved the numpy dependency from test to required with the same range. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 299736) Time Spent: 8h (was: 7h 50m) > Make row coder a standard coder and implement in python > --- > > Key: BEAM-7886 > URL: https://issues.apache.org/jira/browse/BEAM-7886 > Project: Beam > Issue Type: Improvement > Components: beam-model, sdk-java-core, sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 8h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
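The row coder under review turns a schema'd NamedTuple into bytes field by field, with a marker for nullable fields. A simplified, self-contained sketch of that core idea follows; the real Beam row coder wire format (varints, a packed null bitmap, per-type component coders) is considerably more involved, so treat this purely as an illustration:

```python
# Illustrative only: encode a NamedTuple's fields in schema order,
# prefixing each with a present/null byte. NOT Beam's actual wire format.
import struct
import typing

Person = typing.NamedTuple("Person", [("name", str),
                                      ("age", int),
                                      ("address", typing.Optional[str])])

def encode_row(row):
    out = bytearray()
    for value in row:
        if value is None:
            out.append(0)          # null marker
            continue
        out.append(1)              # present marker
        if isinstance(value, int):
            out += struct.pack(">q", value)          # fixed 8-byte int
        else:
            data = value.encode("utf-8")
            out += struct.pack(">I", len(data)) + data  # length-prefixed str

    return bytes(out)

def decode_row(data, field_types):
    values, pos = [], 0
    for ftype in field_types:
        present, pos = data[pos], pos + 1
        if not present:
            values.append(None)
        elif ftype is int:
            values.append(struct.unpack_from(">q", data, pos)[0])
            pos += 8
        else:
            (length,) = struct.unpack_from(">I", data, pos)
            pos += 4
            values.append(data[pos:pos + length].decode("utf-8"))
            pos += length
    return Person(*values)

p = Person("Jon Snow", 23, None)
assert decode_row(encode_row(p), [str, int, str]) == p
```

Because both SDKs would derive the byte layout from the same schema proto, a row encoded by the Java coder can be decoded by the Python one; that schema-driven symmetry is what makes it a candidate standard coder.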
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299723&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299723 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 20:33 Start Date: 22/Aug/19 20:33 Worklog Time Spent: 10m Work Description: jklukas commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#discussion_r316870272 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java ## @@ -1360,12 +1413,39 @@ public TableReference getTable() { return toBuilder().setMethod(method).build(); } -/** Read options, including a list of selected columns and push-down SQL filter text. */ +/** + * @deprecated Use {@link #withSelectedFields(List)} and {@link #withRowRestriction(String)} + * instead. + */ +@Deprecated @Experimental(Experimental.Kind.SOURCE_SINK) public TypedRead withReadOptions(TableReadOptions readOptions) { + ensureSelectedFieldsAndRowRestrictionNotSet(); return toBuilder().setReadOptions(readOptions).build(); } +@Experimental(Experimental.Kind.SOURCE_SINK) Review comment: ```suggestion /** Names of the fields in the table that should be read; valid only for direct reads. */ @Experimental(Experimental.Kind.SOURCE_SINK) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 299723) Time Spent: 50m (was: 40m)
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299726&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299726 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 20:33 Start Date: 22/Aug/19 20:33 Worklog Time Spent: 10m Work Description: jklukas commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#discussion_r316872643 ## File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIOStorageReadTest.java ## @@ -465,6 +500,71 @@ public void testTableSourceInitialSplit_WithTableReadOptions() throws Throwable BigQueryStorageTableSource.create( ValueProvider.StaticValueProvider.of(tableRef), readOptions, +null, +null, +new TableRowParser(), +TableRowJsonCoder.of(), +new FakeBigQueryServices() +.withDatasetService(fakeDatasetService) +.withStorageClient(fakeStorageClient)); + +List> sources = tableSource.split(10L, options); +assertEquals(10L, sources.size()); + } + + @Test + public void testTableSourceInitialSplit_WithSelectedFieldsAndRowRestriction() throws Exception { +fakeDatasetService.createDataset("foo.com:project", "dataset", "", "", null); +TableReference tableRef = BigQueryHelpers.parseTableSpec("foo.com:project:dataset.table"); + +Table table = +new Table() +.setTableReference(tableRef) +.setNumBytes(100L) +.setSchema( +new TableSchema() +.setFields( +ImmutableList.of( +new TableFieldSchema().setName("name").setType("STRING"), +new TableFieldSchema().setName("number").setType("INTEGER")))); + +fakeDatasetService.createTable(table); + +TableReadOptions readOptions = +TableReadOptions.newBuilder() +.addSelectedFields("name") +.addSelectedFields("number") +.setRowRestriction("number > 5") +.build(); + +CreateReadSessionRequest expectedRequest = +CreateReadSessionRequest.newBuilder() +.setParent("projects/project-id") 
+.setTableReference(BigQueryHelpers.toTableRefProto(tableRef)) +.setRequestedStreams(10) +.setReadOptions(readOptions) +// TODO(aryann): Once we rebuild the generated client code, we should change this to +// use setShardingStrategy(). +.setUnknownFields( +UnknownFieldSet.newBuilder() +.addField(7, UnknownFieldSet.Field.newBuilder().addVarint(2).build()) +.build()) +.build(); + +ReadSession.Builder builder = ReadSession.newBuilder(); +for (int i = 0; i < 10; i++) { + builder.addStreams(Stream.newBuilder().setName("stream-" + i)); +} + +StorageClient fakeStorageClient = mock(StorageClient.class); + when(fakeStorageClient.createReadSession(expectedRequest)).thenReturn(builder.build()); + +BigQueryStorageTableSource tableSource = +BigQueryStorageTableSource.create( +ValueProvider.StaticValueProvider.of(tableRef), +null, +StaticValueProvider.of(Lists.newArrayList("name", "number")), +StaticValueProvider.of("number > 5"), Review comment: Would it be more appropriate to use `p.newProvider` here rather than `StaticValueProvider.of` to catch potential misuses of the valueprovider before we hit runtime? If the StaticValueProvider style is already predominant in this file, I'm fine with keeping it as-is. Issue Time Tracking --- Worklog Id: (was: 299726) Time Spent: 1h (was: 50m)
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299725&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299725 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 20:33 Start Date: 22/Aug/19 20:33 Worklog Time Spent: 10m Work Description: jklukas commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#discussion_r316870665 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java ## @@ -1360,12 +1413,39 @@ public TableReference getTable() { return toBuilder().setMethod(method).build(); } -/** Read options, including a list of selected columns and push-down SQL filter text. */ +/** + * @deprecated Use {@link #withSelectedFields(List)} and {@link #withRowRestriction(String)} + * instead. + */ +@Deprecated @Experimental(Experimental.Kind.SOURCE_SINK) public TypedRead withReadOptions(TableReadOptions readOptions) { + ensureSelectedFieldsAndRowRestrictionNotSet(); return toBuilder().setReadOptions(readOptions).build(); } +@Experimental(Experimental.Kind.SOURCE_SINK) +public TypedRead withSelectedFields(List<String> selectedFields) { + return withSelectedFields(StaticValueProvider.of(selectedFields)); +} + +@Experimental(Experimental.Kind.SOURCE_SINK) Review comment: ```suggestion /** SQL text filtering statement; valid only for direct reads. */ @Experimental(Experimental.Kind.SOURCE_SINK) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 299725) Time Spent: 1h (was: 50m)
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299724&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299724 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 20:33 Start Date: 22/Aug/19 20:33 Worklog Time Spent: 10m Work Description: jklukas commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#discussion_r316868145 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java ## @@ -1251,6 +1294,16 @@ private void ensureFromNotCalledYet() { getJsonTableRef() == null && getQuery() == null, "from() or fromQuery() already called"); } +private void ensureReadOptionsNotSet() { + checkState(getReadOptions() == null, "withReadOptions() already called"); +} + +private void ensureSelectedFieldsAndRowRestrictionNotSet() { Review comment: For a little future-proofing in case additional read options are added to the BQ Storage API in the future, these methods could be named `ensureReadOptionsObjectNotSet` and `ensureIndividualReadOptionsNotSet`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 299724) Time Spent: 50m (was: 40m)
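The guard methods being renamed here make the deprecated aggregate setter and the new per-option setters mutually exclusive, so a mixed configuration fails fast at pipeline-construction time instead of producing an ambiguous read session. The pattern, sketched with hypothetical names in Python rather than Beam's actual Java builder:

```python
# Sketch of mutually exclusive builder option groups. Hypothetical
# names mirroring the guard pattern under review, not Beam's code.

class TypedRead:
    def __init__(self):
        self._read_options = None      # deprecated aggregate object
        self._selected_fields = None   # individual option
        self._row_restriction = None   # individual option

    def _ensure_read_options_object_not_set(self):
        if self._read_options is not None:
            raise ValueError("withReadOptions() already called")

    def _ensure_individual_read_options_not_set(self):
        if self._selected_fields is not None or self._row_restriction is not None:
            raise ValueError(
                "withSelectedFields() or withRowRestriction() already called")

    def with_read_options(self, read_options):  # deprecated path
        self._ensure_individual_read_options_not_set()
        self._read_options = read_options
        return self

    def with_selected_fields(self, fields):
        self._ensure_read_options_object_not_set()
        self._selected_fields = fields
        return self

    def with_row_restriction(self, restriction):
        self._ensure_read_options_object_not_set()
        self._row_restriction = restriction
        return self
```

Each setter checks the *other* group before mutating, which is why generic names like `ensureReadOptionsObjectNotSet` / `ensureIndividualReadOptionsNotSet` age better if more per-option setters are added later.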
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299727&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299727 ] ASF GitHub Bot logged work on BEAM-8023: Author: ASF GitHub Bot Created on: 22/Aug/19 20:33 Start Date: 22/Aug/19 20:33 Worklog Time Spent: 10m Work Description: jklukas commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ URL: https://github.com/apache/beam/pull/9405#discussion_r316870997 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java ## @@ -1360,12 +1413,39 @@ public TableReference getTable() { return toBuilder().setMethod(method).build(); } -/** Read options, including a list of selected columns and push-down SQL filter text. */ +/** + * @deprecated Use {@link #withSelectedFields(List)} and {@link #withRowRestriction(String)} + * instead. + */ +@Deprecated @Experimental(Experimental.Kind.SOURCE_SINK) public TypedRead withReadOptions(TableReadOptions readOptions) { + ensureSelectedFieldsAndRowRestrictionNotSet(); return toBuilder().setReadOptions(readOptions).build(); } +@Experimental(Experimental.Kind.SOURCE_SINK) +public TypedRead withSelectedFields(List<String> selectedFields) { + return withSelectedFields(StaticValueProvider.of(selectedFields)); +} + +@Experimental(Experimental.Kind.SOURCE_SINK) +public TypedRead withSelectedFields(ValueProvider<List<String>> selectedFields) { + ensureReadOptionsNotSet(); + return toBuilder().setSelectedFields(selectedFields).build(); +} + +@Experimental(Experimental.Kind.SOURCE_SINK) Review comment: Can we add docstrings on these variants too? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 299727) Time Spent: 1h (was: 50m)
[jira] [Updated] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated BEAM-8042: --- Component/s: (was: dsl-sql) dsl-sql-zetasql > Parsing of aggregate query fails > > > Key: BEAM-8042 > URL: https://issues.apache.org/jira/browse/BEAM-8042 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Priority: Major > > SELECT > key, > COUNT(*) as f1, > SUM(has_f2) AS f2, > SUM(has_f3) AS f3, > SUM(has_f4) AS f4, > SUM(has_f5) AS f5, > SUM(has_f6) AS f6, > SUM(has_f7) AS f7 > FROM xxx > GROUP BY key > Caused by: java.lang.RuntimeException: Error while applying rule > AggregateProjectMergeRule, args > [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)), > > rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)] > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232) > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637) > at > org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104) > at > ... 
39 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58) > at > org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96) > at > org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73) > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205) > ... 48 more -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (BEAM-8043) Support AVG(long)
[ https://issues.apache.org/jira/browse/BEAM-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Wang updated BEAM-8043:
---------------------------
    Component/s:     (was: dsl-sql)
                     dsl-sql-zetasql

> Support AVG(long)
> -----------------
>
>                 Key: BEAM-8043
>                 URL: https://issues.apache.org/jira/browse/BEAM-8043
>             Project: Beam
>          Issue Type: Sub-task
>          Components: dsl-sql-zetasql
>            Reporter: Rui Wang
>            Priority: Major
>
> Currently AVG(long) is not supported, and users have to use AVG(CAST(long AS float64)) as a workaround.
> We should support AVG(long).
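The workaround mentioned in the ticket can be sketched concretely. This is an illustrative fragment only, assuming the ZetaSQL dialect; the table `orders` and the INT64 column `amount` are hypothetical names, not from the ticket:

```sql
-- Currently fails in the Beam ZetaSQL dialect: AVG over an INT64 column.
SELECT key, AVG(amount) AS avg_amount
FROM orders
GROUP BY key;

-- Workaround from the ticket: cast the column to FLOAT64 before aggregating.
SELECT key, AVG(CAST(amount AS FLOAT64)) AS avg_amount
FROM orders
GROUP BY key;
```

Once AVG(long) is supported natively, the explicit CAST should no longer be necessary.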
[jira] [Updated] (BEAM-8039) SUM(CASE WHEN xxx THEN 1 ELSE 0)
[ https://issues.apache.org/jira/browse/BEAM-8039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Wang updated BEAM-8039:
---------------------------
    Component/s:     (was: dsl-sql)
                     dsl-sql-zetasql

> SUM(CASE WHEN xxx THEN 1 ELSE 0)
> --------------------------------
>
>                 Key: BEAM-8039
>                 URL: https://issues.apache.org/jira/browse/BEAM-8039
>             Project: Beam
>          Issue Type: Sub-task
>          Components: dsl-sql-zetasql
>            Reporter: Rui Wang
>            Priority: Major
>
> java.lang.RuntimeException: Aggregate function only accepts Column Reference or CAST(Column Reference) as its input.
>
> I was able to rewrite the SQL using a WITH statement, and it seemed to work, but this requires rewriting a lot of queries and makes them pretty much unreadable.
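The WITH-statement rewrite referred to above might look like the following sketch. This is illustrative only: the table `xxx` follows the ticket's placeholder naming, and the `f2 IS NOT NULL` predicate is an assumed stand-in for whatever condition the real queries test:

```sql
-- Fails: the argument of SUM is a CASE expression, not a column
-- reference or CAST(column reference).
SELECT key, SUM(CASE WHEN f2 IS NOT NULL THEN 1 ELSE 0 END) AS f2
FROM xxx
GROUP BY key;

-- Workaround: materialize the CASE expression as a named column in a
-- WITH clause, then aggregate over the plain column reference.
WITH flattened AS (
  SELECT key, CASE WHEN f2 IS NOT NULL THEN 1 ELSE 0 END AS has_f2
  FROM xxx
)
SELECT key, SUM(has_f2) AS f2
FROM flattened
GROUP BY key;
```

The cost noted in the ticket is visible here: every aggregate over an expression forces an extra named subquery, which quickly bloats larger queries.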
[jira] [Updated] (BEAM-8041) Support Insert statements
[ https://issues.apache.org/jira/browse/BEAM-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Wang updated BEAM-8041:
---------------------------
    Component/s:     (was: dsl-sql)
                     dsl-sql-zetasql

> Support Insert statements
> -------------------------
>
>                 Key: BEAM-8041
>                 URL: https://issues.apache.org/jira/browse/BEAM-8041
>             Project: Beam
>          Issue Type: Sub-task
>          Components: dsl-sql-zetasql
>            Reporter: Rui Wang
>            Priority: Major
>
> Caused by: org.apache.beam.repackaged.sql.com.google.zetasql.io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Statement not supported: InsertStatement [at
[jira] [Updated] (BEAM-8044) Investigate SUM(long)
[ https://issues.apache.org/jira/browse/BEAM-8044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Wang updated BEAM-8044:
---------------------------
    Component/s:     (was: dsl-sql)
                     dsl-sql-zetasql

> Investigate SUM(long)
> ---------------------
>
>                 Key: BEAM-8044
>                 URL: https://issues.apache.org/jira/browse/BEAM-8044
>             Project: Beam
>          Issue Type: Sub-task
>          Components: dsl-sql-zetasql
>            Reporter: Rui Wang
>            Priority: Major
>
> A user reports that SUM(long) is not supported. This needs further investigation.
[jira] [Updated] (BEAM-8040) NPE in table name resolver when selecting from a table that doesn't exist
[ https://issues.apache.org/jira/browse/BEAM-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Wang updated BEAM-8040:
---------------------------
    Component/s:     (was: dsl-sql)
                     dsl-sql-zetasql

> NPE in table name resolver when selecting from a table that doesn't exist
> -------------------------------------------------------------------------
>
>                 Key: BEAM-8040
>                 URL: https://issues.apache.org/jira/browse/BEAM-8040
>             Project: Beam
>          Issue Type: Sub-task
>          Components: dsl-sql-zetasql
>            Reporter: Rui Wang
>            Priority: Major
>
> NullPointerException when selecting from a table that doesn't exist.
>
> Caused by: java.lang.NullPointerException
>   at org.apache.beam.sdk.extensions.sql.zetasql.TableResolverImpl.assumeLeafIsTable(TableResolverImpl.java:42)
>   at org.apache.beam.sdk.extensions.sql.zetasql.TableResolution.resolveCalciteTable(TableResolution.java:48)
>   at org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addTableToLeafCatalog(SqlAnalyzer.java:174)
>   at org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.lambda$createPopulatedCatalog$0(SqlAnalyzer.java:132)
[jira] [Updated] (BEAM-7832) ZetaSQL Dialect
[ https://issues.apache.org/jira/browse/BEAM-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Wang updated BEAM-7832:
---------------------------
    Component/s: dsl-sql-zetasql

> ZetaSQL Dialect
> ---------------
>
>                 Key: BEAM-7832
>                 URL: https://issues.apache.org/jira/browse/BEAM-7832
>             Project: Beam
>          Issue Type: New Feature
>          Components: dsl-sql, dsl-sql-zetasql
>            Reporter: Rui Wang
>            Assignee: Rui Wang
>            Priority: Major
>          Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> We can support the ZetaSQL (https://github.com/google/zetasql) dialect in BeamSQL.
[jira] [Work logged] (BEAM-6114) SQL join selection should be done in planner, not in expansion to PTransform
[ https://issues.apache.org/jira/browse/BEAM-6114?focusedWorklogId=299716&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299716 ]

ASF GitHub Bot logged work on BEAM-6114:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 22/Aug/19 20:26
            Start Date: 22/Aug/19 20:26
    Worklog Time Spent: 10m

Work Description: amaliujia commented on issue #9395: [BEAM-6114] Calcite Rules to Select Type of Join in BeamSQL
URL: https://github.com/apache/beam/pull/9395#issuecomment-524064006

   Run SQL PostCommit

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
    Worklog Id:     (was: 299716)
    Time Spent: 3h 10m  (was: 3h)

> SQL join selection should be done in planner, not in expansion to PTransform
> ----------------------------------------------------------------------------
>
>                 Key: BEAM-6114
>                 URL: https://issues.apache.org/jira/browse/BEAM-6114
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-sql
>            Reporter: Kenneth Knowles
>            Assignee: Rahul Patwari
>            Priority: Major
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Currently Beam SQL joins all go through a single physical operator which has a single PTransform that does all join algorithms based on properties of its input PCollections as well as the relational algebra.
> A first step is to make the needed information part of the relational algebra, so it can choose a PTransform based on that, and the PTransforms can be simpler.
> Second step is to have separate (physical) relational operators for different join algorithms.
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299710&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299710 ]

ASF GitHub Bot logged work on BEAM-8023:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 22/Aug/19 20:14
            Start Date: 22/Aug/19 20:14
    Worklog Time Spent: 10m

Work Description: chamikaramj commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ
URL: https://github.com/apache/beam/pull/9405#discussion_r316865488

## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java

 @@ -1360,12 +1413,39 @@ public TableReference getTable() {
       return toBuilder().setMethod(method).build();
     }

-    /** Read options, including a list of selected columns and push-down SQL filter text. */
+    /**
+     * @deprecated Use {@link #withSelectedFields(List)} and {@link #withRowRestriction(String)}
+     *     instead.
+     */
+    @Deprecated
     @Experimental(Experimental.Kind.SOURCE_SINK)
     public TypedRead<T> withReadOptions(TableReadOptions readOptions) {
+      ensureSelectedFieldsAndRowRestrictionNotSet();
       return toBuilder().setReadOptions(readOptions).build();
     }

+    @Experimental(Experimental.Kind.SOURCE_SINK)

Review comment: Please document the new public methods.

Issue Time Tracking
-------------------
    Worklog Id:     (was: 299710)
    Time Spent: 40m  (was: 0.5h)

> Allow specifying BigQuery Storage API readOptions at runtime
> ------------------------------------------------------------
>
>                 Key: BEAM-8023
>                 URL: https://issues.apache.org/jira/browse/BEAM-8023
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-gcp
>            Reporter: Jeff Klukas
>            Assignee: Kenneth Jung
>            Priority: Minor
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> We have support in the Java SDK for using the BigQuery Storage API for reads, but only the target query or table is supported as a ValueProvider to be specified at runtime. AFAICT, there is no reason we can't delay specifying readOptions until runtime as well.
> The readOptions are accessed by BigQueryStorageTableSource in getTargetTable; I believe that's occurring at runtime, but I'd love for someone with deeper BoundedSource knowledge to confirm that.
> I'd advocate for adding new methods `TypedRead.withSelectedFields(ValueProvider<List<String>> value)` and `TypedRead.withRowRestriction(ValueProvider<String> value)`. The existing `withReadOptions` method would then populate the other two as StaticValueProviders. Perhaps we'd want to deprecate `withReadOptions` in favor of specifying individual read options as separate parameters.
[jira] [Work logged] (BEAM-8023) Allow specifying BigQuery Storage API readOptions at runtime
[ https://issues.apache.org/jira/browse/BEAM-8023?focusedWorklogId=299708&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-299708 ]

ASF GitHub Bot logged work on BEAM-8023:
----------------------------------------
                Author: ASF GitHub Bot
            Created on: 22/Aug/19 20:13
            Start Date: 22/Aug/19 20:13
    Worklog Time Spent: 10m

Work Description: chamikaramj commented on pull request #9405: [BEAM-8023] Add value provider interfaces for BigQueryIO.Read using Method.DIRECT_READ
URL: https://github.com/apache/beam/pull/9405#discussion_r316865488

## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java

 @@ -1360,12 +1413,39 @@ public TableReference getTable() {
       return toBuilder().setMethod(method).build();
     }

-    /** Read options, including a list of selected columns and push-down SQL filter text. */
+    /**
+     * @deprecated Use {@link #withSelectedFields(List)} and {@link #withRowRestriction(String)}
+     *     instead.
+     */
+    @Deprecated
     @Experimental(Experimental.Kind.SOURCE_SINK)
     public TypedRead<T> withReadOptions(TableReadOptions readOptions) {
+      ensureSelectedFieldsAndRowRestrictionNotSet();
       return toBuilder().setReadOptions(readOptions).build();
     }

+    @Experimental(Experimental.Kind.SOURCE_SINK)

Review comment: Please document the new public fields.

Issue Time Tracking
-------------------
    Worklog Id:     (was: 299708)
    Time Spent: 0.5h  (was: 20m)

> Allow specifying BigQuery Storage API readOptions at runtime
> ------------------------------------------------------------
>
>                 Key: BEAM-8023
>                 URL: https://issues.apache.org/jira/browse/BEAM-8023
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-gcp
>            Reporter: Jeff Klukas
>            Assignee: Kenneth Jung
>            Priority: Minor
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We have support in the Java SDK for using the BigQuery Storage API for reads, but only the target query or table is supported as a ValueProvider to be specified at runtime. AFAICT, there is no reason we can't delay specifying readOptions until runtime as well.
> The readOptions are accessed by BigQueryStorageTableSource in getTargetTable; I believe that's occurring at runtime, but I'd love for someone with deeper BoundedSource knowledge to confirm that.
> I'd advocate for adding new methods `TypedRead.withSelectedFields(ValueProvider<List<String>> value)` and `TypedRead.withRowRestriction(ValueProvider<String> value)`. The existing `withReadOptions` method would then populate the other two as StaticValueProviders. Perhaps we'd want to deprecate `withReadOptions` in favor of specifying individual read options as separate parameters.