[jira] [Work logged] (BEAM-7948) Add time-based cache threshold support in the Java data service

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7948?focusedWorklogId=347238&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347238
 ]

ASF GitHub Bot logged work on BEAM-7948:


Author: ASF GitHub Bot
Created on: 21/Nov/19 07:31
Start Date: 21/Nov/19 07:31
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #9949: [BEAM-7948] 
Add time-based cache threshold support in the Java data s…
URL: https://github.com/apache/beam/pull/9949#issuecomment-556957491
 
 
   I have updated the PR accordingly, except for one comment that I am not sure 
about; I left a reply for you there. :) I'd appreciate it if you could take another look :) 
@lukecwik 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347238)
Time Spent: 2h 50m  (was: 2h 40m)

> Add time-based cache threshold support in the Java data service
> ---
>
> Key: BEAM-7948
> URL: https://issues.apache.org/jira/browse/BEAM-7948
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Currently, only a size-based cache threshold is supported in the data service. It 
> should also support a time-based cache threshold. This is very important, 
> especially for streaming jobs, which are sensitive to delay.
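
A rough illustration of how a size threshold and a time threshold could coexist in one buffer. It is written in Python for brevity, although the change discussed here targets the Java data service. The class name `BufferingSender`, the `send` callback, and counting elements instead of bytes are all assumptions made for this sketch, not Beam's API.

```python
import threading


class BufferingSender:
    """Hypothetical buffer that flushes on a size limit or on a timer."""

    def __init__(self, send, size_limit):
        self._send = send              # callback that receives one batch (a list)
        self._size_limit = size_limit  # element count here; the real limit is in bytes
        self._buffer = []
        self._flush_lock = threading.Lock()

    def accept(self, element):
        # Size-based threshold: flush as soon as the buffer is full.
        with self._flush_lock:
            self._buffer.append(element)
            if len(self._buffer) >= self._size_limit:
                self._flush_locked()

    def flush(self):
        with self._flush_lock:
            self._flush_locked()

    def _flush_locked(self):
        # The caller must hold _flush_lock. Empty buffers are not sent.
        if self._buffer:
            self._send(list(self._buffer))
            self._buffer.clear()

    def start_periodic_flush(self, interval_seconds):
        # Time-based threshold: a daemon thread flushes at a fixed interval.
        stop = threading.Event()

        def run():
            # Event.wait returns False on timeout and True once stop is set.
            while not stop.wait(interval_seconds):
                self.flush()

        threading.Thread(target=run, daemon=True).start()
        return stop  # call stop.set() to end the periodic flushing
```

The lock matters because the periodic thread and the producer thread both touch the buffer. Without it, a timed flush could interleave with an `accept` call and send a batch twice or drop an element.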



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7948) Add time-based cache threshold support in the Java data service

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7948?focusedWorklogId=347237&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347237
 ]

ASF GitHub Bot logged work on BEAM-7948:


Author: ASF GitHub Bot
Created on: 21/Nov/19 07:27
Start Date: 21/Nov/19 07:27
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on pull request #9949: 
[BEAM-7948] Add time-based cache threshold support in the Java data s…
URL: https://github.com/apache/beam/pull/9949#discussion_r348928070
 
 

 ##
 File path: 
sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataTimeBasedBufferingOutboundObserver.java
 ##
 @@ -48,25 +46,27 @@
   Coder coder,
   StreamObserver outboundObserver) {
 super(sizeLimit, outputLocation, coder, outboundObserver);
-this.lock = new Object();
+this.flushLock = new Object();
 this.flushFuture =
 Executors.newSingleThreadScheduledExecutor(
 new ThreadFactoryBuilder()
 .setDaemon(true)
 .setNameFormat("DataBufferOutboundFlusher-thread")
 .build())
 .scheduleAtFixedRate(this::periodicFlush, timeLimit, timeLimit, 
TimeUnit.MILLISECONDS);
 
 Review comment:
   I found that the callable version of `schedule` is not a periodic action. So 
we have to create a new method that wraps `flush`. What do you think?
 



Issue Time Tracking
---

Worklog Id: (was: 347237)
Time Spent: 2h 40m  (was: 2.5h)

> Add time-based cache threshold support in the Java data service
> ---
>
> Key: BEAM-7948
> URL: https://issues.apache.org/jira/browse/BEAM-7948
> Project: Beam
>  Issue Type: Sub-task
>  Components: java-fn-execution
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently, only a size-based cache threshold is supported in the data service. It 
> should also support a time-based cache threshold. This is very important, 
> especially for streaming jobs, which are sensitive to delay.





[jira] [Work logged] (BEAM-8619) Tear down the DoFns upon the control service termination in Java SDK harness

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8619?focusedWorklogId=347221&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347221
 ]

ASF GitHub Bot logged work on BEAM-8619:


Author: ASF GitHub Bot
Created on: 21/Nov/19 06:54
Start Date: 21/Nov/19 06:54
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on issue #10126: [BEAM-8619] 
Tear down the DoFns upon the control service termination …
URL: https://github.com/apache/beam/pull/10126#issuecomment-556947219
 
 
   Thanks for the review and valuable comments. @lukecwik 
   
   I divided the change into 4 commits. Does that make sense to you? Feel free 
to tell me if you want me to split the changes into new PRs. :)
   
   Best, Jincheng
 



Issue Time Tracking
---

Worklog Id: (was: 347221)
Time Spent: 1h  (was: 50m)

> Tear down the DoFns upon the control service termination in Java SDK harness
> 
>
> Key: BEAM-8619
> URL: https://issues.apache.org/jira/browse/BEAM-8619
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness
>Affects Versions: 2.18.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Per the discussion on the mailing list (details in [1]), the teardown of 
> DoFns should be supported in the portability framework. It happens in two 
> places:
> 1) Upon the control service termination
> 2) Tear down the unused DoFns periodically
> The aim of this JIRA is to add support for tearing down the DoFns upon 
> control service termination in the Java SDK harness.
> [1] 
> https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E





[jira] [Work logged] (BEAM-8619) Tear down the DoFns upon the control service termination in Java SDK harness

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8619?focusedWorklogId=347210&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347210
 ]

ASF GitHub Bot logged work on BEAM-8619:


Author: ASF GitHub Bot
Created on: 21/Nov/19 06:39
Start Date: 21/Nov/19 06:39
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on pull request #10126: 
[BEAM-8619] Tear down the DoFns upon the control service termination …
URL: https://github.com/apache/beam/pull/10126#discussion_r348915951
 
 

 ##
 File path: 
sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java
 ##
 @@ -226,6 +226,7 @@ public void testUsingUserState() throws Exception {
 consumers,
 startFunctionRegistry,
 finishFunctionRegistry,
+new ArrayList<>()::add,
 
 Review comment:
   Sorry, I don't think I fully understand what you mean. Can you explain it 
more? :)
 



Issue Time Tracking
---

Worklog Id: (was: 347210)
Time Spent: 50m  (was: 40m)

> Tear down the DoFns upon the control service termination in Java SDK harness
> 
>
> Key: BEAM-8619
> URL: https://issues.apache.org/jira/browse/BEAM-8619
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-harness
>Affects Versions: 2.18.0
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Per the discussion on the mailing list (details in [1]), the teardown of 
> DoFns should be supported in the portability framework. It happens in two 
> places:
> 1) Upon the control service termination
> 2) Tear down the unused DoFns periodically
> The aim of this JIRA is to add support for tearing down the DoFns upon 
> control service termination in the Java SDK harness.
> [1] 
> https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E





[jira] [Work logged] (BEAM-7854) Reading files from local file system does not fully support glob

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7854?focusedWorklogId=347165&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347165
 ]

ASF GitHub Bot logged work on BEAM-7854:


Author: ASF GitHub Bot
Created on: 21/Nov/19 05:03
Start Date: 21/Nov/19 05:03
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #9197: [BEAM-7854] 
Resolve parent folder recursively in LocalFileSystem matc…
URL: https://github.com/apache/beam/pull/9197#issuecomment-556921396
 
 
   @lukecwik this one probably should have been squashed (I just happened to come 
across these commits in the history while debugging #10028)
 



Issue Time Tracking
---

Worklog Id: (was: 347165)
Time Spent: 4h 50m  (was: 4h 40m)

> Reading files from local file system does not fully support glob
> 
>
> Key: BEAM-7854
> URL: https://issues.apache.org/jira/browse/BEAM-7854
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Tomer Zeltzer
>Assignee: Tomer Zeltzer
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Folder structure:   
> {code:java}
> A
>   B
>     a=100
>       data1
>         file1.zst
>         file2.zst
>     a=999
>       data2
>         file6.zst
>     a=397
>       data3
>         file7.zst{code}
>  
> Glob:
>  
> {code:java}
> /A/B/a=[0-9][0-9][0-9]/*/*{code}
> Code:  
>  
> {code:java}
> input.apply(Create.of(patterns))
>  .apply("Matching patterns", FileIO.matchAll())
>  .apply(FileIO.readMatches());
> {code}
>  
> input is of type PBegin.
> The above code matches 0 files even though, from the glob, it's clear it 
> should match all files. I suspect it's because of line 227, where only the 
> first parent folder is checked, while it could be an asterisk in a glob. I 
> believe the right behaviour would be to check all parent folders and use the 
> first one that exists.
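
The proposed behaviour (walk up the parent folders of the glob and use the first one that exists) can be sketched as follows. This is only an illustration in Python, not the Java `LocalFileSystem` code. The helper name `first_existing_parent` and the temporary directory layout are assumptions made for this example.

```python
import glob
import os
import tempfile


def first_existing_parent(pattern):
    """Walk up from a glob pattern until an existing directory is found."""
    # abspath also covers relative patterns such as "src/test/resources/input/*".
    candidate = os.path.dirname(os.path.abspath(pattern))
    while not os.path.isdir(candidate):
        parent = os.path.dirname(candidate)
        if parent == candidate:  # reached the filesystem root
            break
        candidate = parent
    return candidate


# Recreate a small part of the folder structure from the report.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "A", "B", "a=100", "data1"))
open(os.path.join(root, "A", "B", "a=100", "data1", "file1.zst"), "w").close()

pattern = os.path.join(root, "A", "B", "a=[0-9][0-9][0-9]", "*", "*")
search_root = first_existing_parent(pattern)  # .../A/B, not .../a=[0-9][0-9][0-9]/*
matches = glob.glob(pattern)
```

Checking only the immediate parent fails here because that parent is itself a wildcard segment (`*`), so it never exists on disk. Walking upward skips every segment that still contains glob syntax.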





[jira] [Work logged] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8651?focusedWorklogId=347162&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347162
 ]

ASF GitHub Bot logged work on BEAM-8651:


Author: ASF GitHub Bot
Created on: 21/Nov/19 04:58
Start Date: 21/Nov/19 04:58
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10185: [BEAM-8651] 
Cherrypick PR #10167 to the release branch. 
URL: https://github.com/apache/beam/pull/10185#issuecomment-556920352
 
 
   R: @Ardagan 
   cc: @robertwb 
 



Issue Time Tracking
---

Worklog Id: (was: 347162)
Time Spent: 2.5h  (was: 2h 20m)

> Python 3 portable pipelines sometimes fail with errors in 
> StockUnpickler.find_class()
> -
>
> Key: BEAM-8651
> URL: https://issues.apache.org/jira/browse/BEAM-8651
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: beam8651.py
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Several Beam users [1,2] reported an error which happens on Python 3 in 
> StockUnpickler.find_class.
> So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on Flink 
> and Dataflow runners. On Dataflow runner so far I have seen this in streaming 
> pipelines only, which use portable SDK worker.
> Typical stack trace:
> {noformat}
>   File "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1148, in _create_pardo_operation
>     dofn_data = pickler.loads(serialized_fn)
>   File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, in loads
>     return dill.loads(s)
>   File "python3.5/site-packages/dill/_dill.py", line 317, in loads
>     return load(file, ignore)
>   File "python3.5/site-packages/dill/_dill.py", line 305, in load
>     obj = pik.load()
>   File "python3.5/site-packages/dill/_dill.py", line 474, in find_class
>     return StockUnpickler.find_class(self, module, name)
> AttributeError: Can't get attribute 'ClassName' on  'python3.5/site-packages/filename.py'>
> {noformat}
> According to Guenther from [1]:
> {quote}
> This looks exactly like a race condition that we've encountered on Python
> 3.7.1: There's a bug in some older 3.7.x releases that breaks the
> thread-safety of the unpickler, as concurrent unpickle threads can access a
> module before it has been fully imported. See
> https://bugs.python.org/issue34572 for more information.
> The traceback shows a Python 3.6 venv so this could be a different issue
> (the unpickle bug was introduced in version 3.7). If it's the same bug then
> upgrading to Python 3.7.3 or higher should fix that issue. One potential
> workaround is to ensure that all of the modules get imported during the
> initialization of the sdk_worker, as this bug only affects imports done by
> the unpickler.
> {quote}
> Opening this for visibility. Current open questions are:
> 1. Find a minimal example to reproduce this issue.
> 2. Figure out whether users are still affected by this issue on Python 3.7.3.
> 3. Communicate workarounds for 3.5, 3.6 users affected by this.
> [1] 
> https://lists.apache.org/thread.html/5581ddfcf6d2ae10d25b834b8a61ebee265ffbcf650c6ec8d1e69408@%3Cdev.beam.apache.org%3E
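
The workaround mentioned in the quote (import every needed module up front, before any concurrent unpickling starts) might look roughly like this. It is a minimal sketch using the standard `pickle` module rather than `dill`. The helper name `preimport_modules` and the module list are hypothetical; a real worker would have to decide which modules its pickled DoFns need.

```python
import importlib
import pickle
import sys
from concurrent.futures import ThreadPoolExecutor
from fractions import Fraction  # only used to build a sample payload


def preimport_modules(module_names):
    """Import every named module once, on the calling thread."""
    return [importlib.import_module(name) for name in module_names]


# Hypothetical list of modules that pickled objects will refer to.
MODULES_NEEDED_BY_PICKLES = ["fractions", "decimal"]

# Do the imports during initialization, before any worker thread exists,
# so the unpickler never triggers an import from several threads at once.
preimport_modules(MODULES_NEEDED_BY_PICKLES)

payload = pickle.dumps(Fraction(3, 4))
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(pickle.loads, [payload] * 8))
```

This does not fix the interpreter bug. It only moves the imports out of the code path where the race is reported to occur.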





[jira] [Work logged] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8651?focusedWorklogId=347161&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347161
 ]

ASF GitHub Bot logged work on BEAM-8651:


Author: ASF GitHub Bot
Created on: 21/Nov/19 04:57
Start Date: 21/Nov/19 04:57
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #10185: [BEAM-8651] 
Cherrypick PR #10167 to the release branch. 
URL: https://github.com/apache/beam/pull/10185
 
 
   This is a cherrypick of #10167 to 2.17.0 release branch.
   

[jira] [Work logged] (BEAM-8568) Local file system does not match relative path with wildcards

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8568?focusedWorklogId=347160&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347160
 ]

ASF GitHub Bot logged work on BEAM-8568:


Author: ASF GitHub Bot
Created on: 21/Nov/19 04:57
Start Date: 21/Nov/19 04:57
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #10028: [BEAM-8568] 
Fixed problem that LocalFileSystem no longer supports wil…
URL: https://github.com/apache/beam/pull/10028#issuecomment-556917287
 
 
   Tests pass on the release branch. https://gradle.com/s/5jl76y2tkiwmc
   
   So something about this change is causing the error deterministically, as 
you say. Since it is healthy on `master`, perhaps there are other coupled 
commits that need to be cherrypicked.
 



Issue Time Tracking
---

Worklog Id: (was: 347160)
Time Spent: 4.5h  (was: 4h 20m)

> Local file system does not match relative path with wildcards
> -
>
> Key: BEAM-8568
> URL: https://issues.apache.org/jira/browse/BEAM-8568
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.16.0
>Reporter: Ondrej Cerny
>Assignee: David Moravek
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> CWD structure:
> {code}
> src/test/resources/input/sometestfile.txt
> {code}
>  
> Code:
> {code:java}
> input
>     .apply(Create.of("src/test/resources/input/*"))
>     .apply(FileIO.matchAll())
>     .apply(FileIO.readMatches());
> {code}
> The code above doesn't match any file starting with Beam 2.16.0. The regression 
> was introduced in BEAM-7854.





[jira] [Work logged] (BEAM-8568) Local file system does not match relative path with wildcards

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8568?focusedWorklogId=347156&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347156
 ]

ASF GitHub Bot logged work on BEAM-8568:


Author: ASF GitHub Bot
Created on: 21/Nov/19 04:42
Start Date: 21/Nov/19 04:42
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #10028: [BEAM-8568] 
Fixed problem that LocalFileSystem no longer supports wil…
URL: https://github.com/apache/beam/pull/10028#issuecomment-556917287
 
 
   Tests pass on the release branch...
 



Issue Time Tracking
---

Worklog Id: (was: 347156)
Time Spent: 4h 20m  (was: 4h 10m)

> Local file system does not match relative path with wildcards
> -
>
> Key: BEAM-8568
> URL: https://issues.apache.org/jira/browse/BEAM-8568
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.16.0
>Reporter: Ondrej Cerny
>Assignee: David Moravek
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> CWD structure:
> {code}
> src/test/resources/input/sometestfile.txt
> {code}
>  
> Code:
> {code:java}
> input
>     .apply(Create.of("src/test/resources/input/*"))
>     .apply(FileIO.matchAll())
>     .apply(FileIO.readMatches());
> {code}
> The code above doesn't match any file starting with Beam 2.16.0. The regression 
> was introduced in BEAM-7854.





[jira] [Work logged] (BEAM-8568) Local file system does not match relative path with wildcards

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8568?focusedWorklogId=347148&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347148
 ]

ASF GitHub Bot logged work on BEAM-8568:


Author: ASF GitHub Bot
Created on: 21/Nov/19 03:54
Start Date: 21/Nov/19 03:54
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #10028: [BEAM-8568] 
Fixed problem that LocalFileSystem no longer supports wil…
URL: https://github.com/apache/beam/pull/10028#issuecomment-556908586
 
 
   I could not identify a stuck job, even though the tests were stuck.
 



Issue Time Tracking
---

Worklog Id: (was: 347148)
Time Spent: 4h 10m  (was: 4h)

> Local file system does not match relative path with wildcards
> -
>
> Key: BEAM-8568
> URL: https://issues.apache.org/jira/browse/BEAM-8568
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.16.0
>Reporter: Ondrej Cerny
>Assignee: David Moravek
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> CWD structure:
> {code}
> src/test/resources/input/sometestfile.txt
> {code}
>  
> Code:
> {code:java}
> input
>     .apply(Create.of("src/test/resources/input/*"))
>     .apply(FileIO.matchAll())
>     .apply(FileIO.readMatches());
> {code}
> The code above doesn't match any file starting with Beam 2.16.0. The regression 
> was introduced in BEAM-7854.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8568) Local file system does not match relative path with wildcards

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8568?focusedWorklogId=347143&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347143
 ]

ASF GitHub Bot logged work on BEAM-8568:


Author: ASF GitHub Bot
Created on: 21/Nov/19 03:21
Start Date: 21/Nov/19 03:21
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #10028: [BEAM-8568] 
Fixed problem that LocalFileSystem no longer supports wil…
URL: https://github.com/apache/beam/pull/10028#issuecomment-556902409
 
 
   Hmm, confirmed that it times out. Running locally for me it is up to 50+ 
minutes. Presumably just stuck. I'll check.
 



Issue Time Tracking
---

Worklog Id: (was: 347143)
Time Spent: 4h  (was: 3h 50m)

> Local file system does not match relative path with wildcards
> -
>
> Key: BEAM-8568
> URL: https://issues.apache.org/jira/browse/BEAM-8568
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.16.0
>Reporter: Ondrej Cerny
>Assignee: David Moravek
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> CWD structure:
> {code}
> src/test/resources/input/sometestfile.txt
> {code}
>  
> Code:
> {code:java}
> input
>     .apply(Create.of("src/test/resources/input/*"))
>     .apply(FileIO.matchAll())
>     .apply(FileIO.readMatches());
> {code}
> The code above doesn't match any file starting with Beam 2.16.0. The regression 
> was introduced in BEAM-7854.





[jira] [Work logged] (BEAM-8624) Implement FnService for status api in Dataflow runner

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8624?focusedWorklogId=347142&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347142
 ]

ASF GitHub Bot logged work on BEAM-8624:


Author: ASF GitHub Bot
Created on: 21/Nov/19 03:19
Start Date: 21/Nov/19 03:19
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #10115: [BEAM-8624] Implement 
Worker Status FnService in Dataflow runner
URL: https://github.com/apache/beam/pull/10115#issuecomment-556901957
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 347142)
Time Spent: 3h 10m  (was: 3h)

> Implement FnService for status api in Dataflow runner
> -
>
> Key: BEAM-8624
> URL: https://issues.apache.org/jira/browse/BEAM-8624
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8624) Implement FnService for status api in Dataflow runner

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8624?focusedWorklogId=347141&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347141
 ]

ASF GitHub Bot logged work on BEAM-8624:


Author: ASF GitHub Bot
Created on: 21/Nov/19 03:19
Start Date: 21/Nov/19 03:19
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #10115: [BEAM-8624] Implement 
Worker Status FnService in Dataflow runner
URL: https://github.com/apache/beam/pull/10115#issuecomment-556277599
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 347141)
Time Spent: 3h  (was: 2h 50m)

> Implement FnService for status api in Dataflow runner
> -
>
> Key: BEAM-8624
> URL: https://issues.apache.org/jira/browse/BEAM-8624
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8568) Local file system does not match relative path with wildcards

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8568?focusedWorklogId=347133&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347133
 ]

ASF GitHub Bot logged work on BEAM-8568:


Author: ASF GitHub Bot
Created on: 21/Nov/19 02:25
Start Date: 21/Nov/19 02:25
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #10028: [BEAM-8568] 
Fixed problem that LocalFileSystem no longer supports wil…
URL: https://github.com/apache/beam/pull/10028#issuecomment-556854842
 
 
   Something to do with Jenkins. That gradle scan I posted was a run against 
this PR's head (not the merge commit, so maybe that is the problem)
 



Issue Time Tracking
---

Worklog Id: (was: 347133)
Time Spent: 3h 50m  (was: 3h 40m)

> Local file system does not match relative path with wildcards
> -
>
> Key: BEAM-8568
> URL: https://issues.apache.org/jira/browse/BEAM-8568
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.16.0
>Reporter: Ondrej Cerny
>Assignee: David Moravek
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> CWD structure:
> {code}
> src/test/resources/input/sometestfile.txt
> {code}
>  
> Code:
> {code:java}
> input
>     .apply(Create.of("src/test/resources/input/*"))
>     .apply(FileIO.matchAll())
>     .apply(FileIO.readMatches());
> {code}
> The code above doesn't match any file starting with Beam 2.16.0. The regression 
> was introduced in BEAM-7854.





[jira] [Work logged] (BEAM-8746) Allow the local job service to work from inside docker

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8746?focusedWorklogId=347126&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347126
 ]

ASF GitHub Bot logged work on BEAM-8746:


Author: ASF GitHub Bot
Created on: 21/Nov/19 02:10
Start Date: 21/Nov/19 02:10
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10161: [BEAM-8746] 
Make local job service accessible from external machines
URL: https://github.com/apache/beam/pull/10161#discussion_r348866533
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/local_job_service.py
 ##
 @@ -95,7 +95,7 @@ def create_beam_job(self, preparation_id, job_name, pipeline, options):
 
   def start_grpc_server(self, port=0):
     self._server = grpc.server(UnboundedThreadPoolExecutor())
-    port = self._server.add_insecure_port('localhost:%d' % port)
+    port = self._server.add_insecure_port('[::]:%d' % port)
 
 Review comment:
   I think it'd make sense for this to be parameterized, likely with localhost 
as a default. 
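
One way the suggested parameterization might look, as a minimal sketch: the bind host becomes an argument that defaults to localhost, and callers running inside Docker opt in to `[::]`. `bind_address` is a hypothetical helper introduced here; it only builds the address string that would be handed to `add_insecure_port` and does not touch gRPC itself.

```python
def bind_address(port=0, host="localhost"):
    """Build the 'host:port' string for a gRPC add_insecure_port call.

    host defaults to localhost so the service stays loopback-only unless
    the caller explicitly asks for a wider bind.
    """
    return "%s:%d" % (host, port)


# Default: reachable from the local machine only, port chosen by the OS.
default_address = bind_address()

# Opt-in: listen on all interfaces, e.g. when running inside a container.
docker_address = bind_address(port=8099, host="[::]")
```

Keeping localhost as the default preserves the current behaviour for existing users, while the wider bind becomes an explicit choice.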
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347126)
Time Spent: 1h 20m  (was: 1h 10m)

> Allow the local job service to work from inside docker
> --
>
> Key: BEAM-8746
> URL: https://issues.apache.org/jira/browse/BEAM-8746
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently the connection is refused.  It's a simple fix. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7850) Make Environment a top level attribute of PTransform

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7850?focusedWorklogId=347124&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347124
 ]

ASF GitHub Bot logged work on BEAM-7850:


Author: ASF GitHub Bot
Created on: 21/Nov/19 02:01
Start Date: 21/Nov/19 02:01
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #10183: [BEAM-7850] 
Makes environment ID a top level attribute of PTransform.
URL: https://github.com/apache/beam/pull/10183#issuecomment-556829893
 
 
   Thanks. Yeah, design for cross-language UDFs should be done separately and 
SdkFunctionSpec is inadequate for this.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347124)
Time Spent: 50m  (was: 40m)

> Make Environment a top level attribute of PTransform
> 
>
> Key: BEAM-7850
> URL: https://issues.apache.org/jira/browse/BEAM-7850
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently Environment is not a top level attribute of the PTransform (of 
> runner API proto).
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
> Instead it is hidden inside various payload objects. For example, for ParDo, 
> environment will be inside SdkFunctionSpec of ParDoPayload.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
>  
> This makes tracking environment of different types of PTransforms harder and 
> we have to fork code (on the type of PTransform) to extract the Environment 
> where the PTransform should be executed. It will probably be simpler to just 
> make Environment a top level attribute of PTransform.
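
The difference described above can be sketched with a few stand-in classes. These dataclasses are simplified, hypothetical mirrors of the runner API messages (the field names `do_fn`, `payload`, and `environment_id` are assumptions for illustration), not the generated proto classes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SdkFunctionSpec:
    environment_id: str


@dataclass
class ParDoPayload:
    do_fn: SdkFunctionSpec


@dataclass
class PTransform:
    urn: str
    payload: object = None
    environment_id: Optional[str] = None  # the proposed top-level attribute


def environment_before(transform):
    """Current situation: branch on the payload type to find the environment."""
    if isinstance(transform.payload, ParDoPayload):
        return transform.payload.do_fn.environment_id
    # ...one more branch for every payload type that embeds an environment...
    return None


def environment_after(transform):
    """Proposed situation: read the attribute directly, whatever the payload."""
    return transform.environment_id


pardo = PTransform(
    urn="pardo",
    payload=ParDoPayload(do_fn=SdkFunctionSpec(environment_id="py_env")),
    environment_id="py_env",
)
```

With the top-level attribute, code that only needs to know where a transform runs no longer has to understand each payload type.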



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347123&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347123
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 21/Nov/19 01:59
Start Date: 21/Nov/19 01:59
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10173: [BEAM-8575] 
Added two unit tests in CombineTest class to test AccumulatingCombine
URL: https://github.com/apache/beam/pull/10173#discussion_r348864276
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -393,6 +395,54 @@ def test_global_fanout(self):
   | beam.CombineGlobally(combine.MeanCombineFn()).with_fanout(11))
   assert_that(result, equal_to([49.5]))
 
+  @attr('ValidatesRunner')
+  def test_accumulating_combine(self):
 
 Review comment:
   This seems mostly redundant with 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py#L562
 , other than the fact that it does globally as well. (Globally is just built 
on top of per-key, so there's little value in making it validates runner.)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347123)
Time Spent: 15h 20m  (was: 15h 10m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 15h 20m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7850) Make Environment a top level attribute of PTransform

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7850?focusedWorklogId=347121&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347121
 ]

ASF GitHub Bot logged work on BEAM-7850:


Author: ASF GitHub Bot
Created on: 21/Nov/19 01:59
Start Date: 21/Nov/19 01:59
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #10183: [BEAM-7850] 
Makes environment ID a top level attribute of PTransform.
URL: https://github.com/apache/beam/pull/10183#issuecomment-556829893
 
 
   Thanks. Yeah, design for cross-language UDFs should be done separately and 
SdkFunctionSpec is inadequate for this.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347121)
Time Spent: 40m  (was: 0.5h)

> Make Environment a top level attribute of PTransform
> 
>
> Key: BEAM-7850
> URL: https://issues.apache.org/jira/browse/BEAM-7850
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Currently Environment is not a top level attribute of the PTransform (of 
> runner API proto).
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
> Instead it is hidden inside various payload objects. For example, for ParDo, 
> environment will be inside SdkFunctionSpec of ParDoPayload.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
>  
> This makes tracking environment of different types of PTransforms harder and 
> we have to fork code (on the type of PTransform) to extract the Environment 
> where the PTransform should be executed. It will probably be simpler to just 
> make Environment a top level attribute of PTransform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347122&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347122
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 21/Nov/19 01:59
Start Date: 21/Nov/19 01:59
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10173: [BEAM-8575] 
Added two unit tests in CombineTest class to test AccumulatingCombine
URL: https://github.com/apache/beam/pull/10173#discussion_r348863302
 
 

 ##
 File path: sdks/python/apache_beam/transforms/combiners_test.py
 ##
 @@ -393,6 +395,54 @@ def test_global_fanout(self):
   | beam.CombineGlobally(combine.MeanCombineFn()).with_fanout(11))
   assert_that(result, equal_to([49.5]))
 
+  @attr('ValidatesRunner')
+  def test_accumulating_combine(self):
+    with TestPipeline() as p:
+      input = (p
+               | beam.Create([('a', 1),
+                              ('a', 1),
+                              ('a', 4),
+                              ('b', 1),
+                              ('b', 13)]))
+      # The mean of all values regardless of key.
+      global_mean = (input
+                     | beam.Values()
+                     | beam.CombineGlobally(combine.MeanCombineFn()))
+
+      # The (key, mean) pairs for all keys.
+      mean_per_key = (input | beam.CombinePerKey(combine.MeanCombineFn()))
+
+      expected_mean_per_key = [('a', 2), ('b', 7)]
+      assert_that(global_mean, equal_to([4]), label='global mean')
+      assert_that(mean_per_key, equal_to(expected_mean_per_key),
+                  label='mean per key')
+
+  @attr('ValidatesRunner')
+  def test_accumulating_combine_empty(self):
+    # For each element in a PCollection, if it is float('NaN'), then emits
+    # a string 'NaN', otherwise emits str(element).
+    class FormatNaNDoFn(beam.DoFn):
+      def process(self, element):
+        return ([str(element)], ['NaN'])[math.isnan(element)]
+
+    with TestPipeline() as p:
+      input = (p | beam.Create([]))
+
+      # Compute the mean of all values in the PCollection,
+      # then format the mean. Since the Pcollection is empty,
+      # the mean is float('NaN'), and is formatted to be a string 'NaN'.
+      global_mean = (input
+                     | beam.Values()
+                     | beam.CombineGlobally(combine.MeanCombineFn())
+                     | beam.ParDo(FormatNaNDoFn()))
 
 Review comment:
   What about just doing beam.Map(str)?
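
For reference, a small plain-Python comparison of the two formatting approaches (Beam is deliberately not imported). `format_nan` mirrors the tuple-indexing expression from the diff; the only observable difference from `str` is the casing of the result, so a test switched to `beam.Map(str)` would presumably compare against 'nan' rather than 'NaN'.

```python
import math


def format_nan(element):
    """Mirror the tuple-indexing trick used by FormatNaNDoFn in the diff."""
    # math.isnan returns a bool, which indexes the tuple as 0 or 1.
    return ([str(element)], ['NaN'])[math.isnan(element)]


nan = float('nan')
from_dofn = format_nan(nan)   # custom formatting
from_str = [str(nan)]         # what a plain str mapping would yield
from_number = format_nan(4.0)
```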
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347122)
Time Spent: 15h 10m  (was: 15h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 15h 10m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7850) Make Environment a top level attribute of PTransform

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7850?focusedWorklogId=347119&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347119
 ]

ASF GitHub Bot logged work on BEAM-7850:


Author: ASF GitHub Bot
Created on: 21/Nov/19 01:51
Start Date: 21/Nov/19 01:51
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10183: [BEAM-7850] Makes 
environment ID a top level attribute of PTransform.
URL: https://github.com/apache/beam/pull/10183#issuecomment-556821932
 
 
   The proto changes make sense to me. The one bit I'm not sure of is how this 
will look for cross-language UDFs, but I think that will still look very 
different from the current SdkFunctionSpecs, and will possibly be modeled as 
"side" PTransforms, which this would be in line with. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347119)
Time Spent: 0.5h  (was: 20m)

> Make Environment a top level attribute of PTransform
> 
>
> Key: BEAM-7850
> URL: https://issues.apache.org/jira/browse/BEAM-7850
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently Environment is not a top level attribute of the PTransform (of 
> runner API proto).
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
> Instead it is hidden inside various payload objects. For example, for ParDo, 
> environment will be inside SdkFunctionSpec of ParDoPayload.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
>  
> This makes tracking environment of different types of PTransforms harder and 
> we have to fork code (on the type of PTransform) to extract the Environment 
> where the PTransform should be executed. It will probably be simpler to just 
> make Environment a top level attribute of PTransform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8658) Optionally set artifact staging port in FlinkUberJarJobServer

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8658?focusedWorklogId=347113&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347113
 ]

ASF GitHub Bot logged work on BEAM-8658:


Author: ASF GitHub Bot
Created on: 21/Nov/19 01:17
Start Date: 21/Nov/19 01:17
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #10163: [BEAM-8658] 
[BEAM-8781] Optionally set jar and artifact staging port …
URL: https://github.com/apache/beam/pull/10163#issuecomment-556752216
 
 
   I expanded this PR a bit to include the job and expansion ports as well. PTAL
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347113)
Time Spent: 50m  (was: 40m)

> Optionally set artifact staging port in FlinkUberJarJobServer
> -
>
> Key: BEAM-8658
> URL: https://issues.apache.org/jira/browse/BEAM-8658
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In certain network environments, port forwarding is necessary for our GRPC 
> servers, such as the artifact staging server. Currently, the port for 
> FlinkUberJarJobServer's artifact staging server is chosen randomly (0). We 
> will need to let the user choose it if they are to forward that port.
> https://github.com/apache/beam/blob/802e7cd86024c21d7b2eeb45f0e7c8e370661610/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server.py#L129
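
The difference between a randomly chosen port and a user-chosen one can be shown with a plain socket. This is only a sketch of the idea, assuming a hypothetical `bind_server_socket` helper; it does not use the artifact staging server or gRPC.

```python
import socket


def bind_server_socket(port=0, host="127.0.0.1"):
    """Bind a TCP socket; port 0 asks the OS to pick any free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    # getsockname reports the port that was actually assigned.
    return sock, sock.getsockname()[1]


# Port 0: the OS picks, so the value cannot be known ahead of time,
# which makes it impossible to set up forwarding in advance.
random_sock, random_port = bind_server_socket()
random_sock.close()

# Explicit port: the caller decides, so the port can be forwarded.
# The port just released is reused here purely to have a known free one.
fixed_sock, fixed_port = bind_server_socket(port=random_port)
fixed_sock.close()
```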
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=347112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347112
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 21/Nov/19 01:15
Start Date: 21/Nov/19 01:15
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #10132: [BEAM-8016] Pipeline 
Graph
URL: https://github.com/apache/beam/pull/10132#issuecomment-556749010
 
 
   R: @pabloem 
   Hi Pablo, could you please take a last round of review for this PR? Thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347112)
Time Spent: 6h 20m  (was: 6h 10m)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline 
> converted to DOT then rendered should mark user defined variables on edges.
> With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be 
> redundant or confusing to render arbitrary random sample PCollection data on 
> edges.
> We'll also make sure edges in the graph correspond to the output -> input 
> relationship in the user defined pipeline. Each edge is one output. If 
> multiple downstream inputs take the same output, it should be rendered as 
> one edge diverging into two instead of two edges.
> For advanced interactivity, where each execution highlights the part of the 
> original pipeline that was actually executed, we'll also provide support in 
> beta.
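
A minimal sketch of the "one edge diverging into two" idea in DOT, assuming a hypothetical `pipeline_to_dot` helper and input shape; this is not the Interactive Beam implementation. An output with several consumers gets a small junction node, so the labelled edge from the producer is drawn only once.

```python
def pipeline_to_dot(outputs):
    """Render a pipeline as DOT text.

    outputs maps (producer, output_name) to the list of consumer transforms.
    """
    lines = ["digraph pipeline {"]
    for (producer, name), consumers in outputs.items():
        if len(consumers) == 1:
            # Single consumer: one ordinary labelled edge.
            lines.append('  "%s" -> "%s" [label="%s"];'
                         % (producer, consumers[0], name))
        else:
            # Shared output: one labelled edge into a junction point,
            # then unlabelled branches out to each consumer.
            junction = "%s_out" % name
            lines.append('  "%s" [shape=point];' % junction)
            lines.append('  "%s" -> "%s" [label="%s", arrowhead=none];'
                         % (producer, junction, name))
            for consumer in consumers:
                lines.append('  "%s" -> "%s";' % (junction, consumer))
    lines.append("}")
    return "\n".join(lines)


dot = pipeline_to_dot({
    ("Read", "lines"): ["Split"],
    ("Split", "words"): ["Count", "Write"],
})
```

Labelling only the shared segment keeps the user defined variable name attached to the output itself rather than repeating it on every branch.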



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8691) Beam Dependency Update Request: com.google.cloud.bigtable:bigtable-client-core

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8691?focusedWorklogId=347106&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347106
 ]

ASF GitHub Bot logged work on BEAM-8691:


Author: ASF GitHub Bot
Created on: 21/Nov/19 00:55
Start Date: 21/Nov/19 00:55
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10144: [BEAM-8691] 
Upgrading bigtable-client-core to latest 1.12.1
URL: https://github.com/apache/beam/pull/10144#issuecomment-556703155
 
 
   Thanks to @elharo for telling me about the linkage checker; this may help you 
perform the analysis faster: https://github.com/apache/beam/pull/10184
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347106)
Time Spent: 3h 20m  (was: 3h 10m)

> Beam Dependency Update Request: com.google.cloud.bigtable:bigtable-client-core
> --
>
> Key: BEAM-8691
> URL: https://issues.apache.org/jira/browse/BEAM-8691
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
>  - 2019-11-15 19:39:51.523448 
> -
> Please consider upgrading the dependency 
> com.google.cloud.bigtable:bigtable-client-core. 
> The current version is 1.8.0. The latest version is 1.12.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-11-19 21:05:43.901882 
> -
> Please consider upgrading the dependency 
> com.google.cloud.bigtable:bigtable-client-core. 
> The current version is 1.8.0. The latest version is 1.12.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7278) Upgrade some Beam dependencies

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7278?focusedWorklogId=347105&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347105
 ]

ASF GitHub Bot logged work on BEAM-7278:


Author: ASF GitHub Bot
Created on: 21/Nov/19 00:53
Start Date: 21/Nov/19 00:53
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10184: [BEAM-7278, 
BEAM-2530] Add support for using a Java linkage testing tool to aid upgrading 
dependencies.
URL: https://github.com/apache/beam/pull/10184#issuecomment-556695677
 
 
   R: @elharo @suztomo 
   CC: @kennknowles @iemejia 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347105)
Time Spent: 20m  (was: 10m)

> Upgrade some Beam dependencies
> --
>
> Key: BEAM-7278
> URL: https://issues.apache.org/jira/browse/BEAM-7278
> Project: Beam
>  Issue Type: Task
>  Components: dependencies
>Reporter: Etienne Chauchot
>Assignee: Mujuzi Moses
>Priority: Critical
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Some dependencies need to be upgraded.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7278) Upgrade some Beam dependencies

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7278?focusedWorklogId=347101&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347101
 ]

ASF GitHub Bot logged work on BEAM-7278:


Author: ASF GitHub Bot
Created on: 21/Nov/19 00:51
Start Date: 21/Nov/19 00:51
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10184: [BEAM-7278, 
BEAM-2530] Add support for using a Java linkage testing tool to aid upgrading 
dependencies.
URL: https://github.com/apache/beam/pull/10184
 
 
   For example:
   ```
   ./gradlew -Ppublishing 
-PjavaLinkageArtifacts=beam-sdks-java-core,beam-sdks-java-io-jdbc 
:checkJavaLinkage
   ```
   
   More details in 
https://lists.apache.org/thread.html/eb5d95b9a33d7e32dc9bcd0f7d48ba8711d42bd7ed03b9cf0f1103f1@%3Cdev.beam.apache.org%3E
   
   
   

[jira] [Work logged] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8651?focusedWorklogId=347097&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347097
 ]

ASF GitHub Bot logged work on BEAM-8651:


Author: ASF GitHub Bot
Created on: 21/Nov/19 00:41
Start Date: 21/Nov/19 00:41
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #10167: [BEAM-8651] 
Guard pickling operations with a lock to prevent race condition in module 
imports.
URL: https://github.com/apache/beam/pull/10167
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347097)
Time Spent: 2h 10m  (was: 2h)

> Python 3 portable pipelines sometimes fail with errors in 
> StockUnpickler.find_class()
> -
>
> Key: BEAM-8651
> URL: https://issues.apache.org/jira/browse/BEAM-8651
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: beam8651.py
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Several Beam users [1,2] reported an error which happens on Python 3 in 
> StockUnpickler.find_class.
> So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on Flink 
> and Dataflow runners. On Dataflow runner so far I have seen this in streaming 
> pipelines only, which use portable SDK worker.
> Typical stack trace:
> {noformat}
> File 
> "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", 
> line 1148, in _create_pardo_operation
>     dofn_data = pickler.loads(serialized_fn)  
>  
>   File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, 
> in loads
>     return dill.loads(s)  
>  
>   File "python3.5/site-packages/dill/_dill.py", line 317, in loads
>  
>     return load(file, ignore) 
>  
>   File "python3.5/site-packages/dill/_dill.py", line 305, in load 
>  
>     obj = pik.load()  
>  
>   File "python3.5/site-packages/dill/_dill.py", line 474, in find_class   
>  
>     return StockUnpickler.find_class(self, module, name)  
>  
> AttributeError: Can't get attribute 'ClassName' on  'python3.5/site-packages/filename.py'>
> {noformat}
> According to Guenther from [1]:
> {quote}
> This looks exactly like a race condition that we've encountered on Python
> 3.7.1: There's a bug in some older 3.7.x releases that breaks the
> thread-safety of the unpickler, as concurrent unpickle threads can access a
> module before it has been fully imported. See
> https://bugs.python.org/issue34572 for more information.
> The traceback shows a Python 3.6 venv so this could be a different issue
> (the unpickle bug was introduced in version 3.7). If it's the same bug then
> upgrading to Python 3.7.3 or higher should fix that issue. One potential
> workaround is to ensure that all of the modules get imported during the
> initialization of the sdk_worker, as this bug only affects imports done by
> the unpickler.
> {quote}
> Opening this for visibility. Current open questions are:
> 1. Find a minimal example to reproduce this issue.
> 2. Figure out whether users are still affected by this issue on Python 3.7.3.
> 3. Communicate workarounds for 3.5, 3.6 users affected by this.
> [1] 
> https://lists.apache.org/thread.html/5581ddfcf6d2ae10d25b834b8a61ebee265ffbcf650c6ec8d1e69408@%3Cdev.beam.apache.org%3E
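
The PR title above describes guarding pickling operations with a lock. A minimal sketch of that general idea follows, using the standard `pickle` module instead of dill and hypothetical wrapper names; it is not the code from the PR. A single re-entrant lock serializes all load and dump calls, so no thread can observe a module that another unpickling thread is still importing.

```python
import pickle
import threading
from concurrent.futures import ThreadPoolExecutor

# Re-entrant so that nested pickling on the same thread cannot deadlock.
_pickle_lock = threading.RLock()


def dumps(obj):
    """Serialize obj while holding the shared pickling lock."""
    with _pickle_lock:
        return pickle.dumps(obj)


def loads(data):
    """Deserialize data while holding the shared pickling lock."""
    with _pickle_lock:
        return pickle.loads(data)


payload = dumps({"key": [1, 2, 3]})

# Several worker threads unpickle concurrently; the lock makes them take turns.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(loads, [payload] * 8))
```

The trade-off is that unpickling no longer runs in parallel, which is usually acceptable when it happens mainly during worker setup.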



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8794) Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8794?focusedWorklogId=347088&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347088
 ]

ASF GitHub Bot logged work on BEAM-8794:


Author: ASF GitHub Bot
Created on: 21/Nov/19 00:25
Start Date: 21/Nov/19 00:25
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #10180: [BEAM-8794] 
Conditional aggregate project merge
URL: https://github.com/apache/beam/pull/10180#issuecomment-556512353
 
 
   Run sql postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347088)
Time Spent: 1h  (was: 50m)

> Projects should be handled by an IOPushDownRule before applying 
> AggregateProjectMergeRule
> -
>
> Key: BEAM-8794
> URL: https://issues.apache.org/jira/browse/BEAM-8794
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> It is more efficient to push-down projected fields at an IO level (vs merging 
> with an Aggregate), when supported.
> When running queries like:
> {code:java}
> select SUM(score) as total_score from  group by name{code}
> Projects get merged with the aggregate. As a result, the Calc (after an 
> IOSourceRel) projects all fields, and the BeamIOPushDown rule does not know 
> which fields can be dropped, so it drops none.
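
The effect described above can be sketched in a few lines. This is only an illustration, not Beam or Calcite code: the class `PushDownSketch`, the method `fieldsToRead`, and the table fields `id`, `name`, `score`, `ts` are all hypothetical. The point is that a source can only drop fields when it is told which ones the query uses; if that information is lost because the Project was merged away first, every field has to be read.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch of field push-down; not the actual BeamIOPushDown rule. */
final class PushDownSketch {

  /**
   * Returns the fields the source has to read. When the set of used fields is
   * unknown (empty Optional), nothing can be dropped and all fields are read.
   */
  static List<String> fieldsToRead(List<String> tableFields, Optional<Set<String>> usedFields) {
    if (!usedFields.isPresent()) {
      return tableFields;
    }
    Set<String> used = usedFields.get();
    // Keep table order, retain only the fields the query references.
    return tableFields.stream().filter(used::contains).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Assumed table layout, for illustration only.
    List<String> table = Arrays.asList("id", "name", "score", "ts");
    Set<String> used = new HashSet<>(Arrays.asList("name", "score"));

    // Project handled before the merge: only the referenced fields are read.
    System.out.println(fieldsToRead(table, Optional.of(used)));
    // Project merged into the aggregate first: nothing is known, nothing is dropped.
    System.out.println(fieldsToRead(table, Optional.empty()));
  }
}
```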





[jira] [Work logged] (BEAM-8794) Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8794?focusedWorklogId=347089&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347089
 ]

ASF GitHub Bot logged work on BEAM-8794:


Author: ASF GitHub Bot
Created on: 21/Nov/19 00:25
Start Date: 21/Nov/19 00:25
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #10180: [BEAM-8794] 
Conditional aggregate project merge
URL: https://github.com/apache/beam/pull/10180#issuecomment-556609585
 
 
   Run sql postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 347089)
Time Spent: 1h 10m  (was: 1h)

> Projects should be handled by an IOPushDownRule before applying 
> AggregateProjectMergeRule
> -
>
> Key: BEAM-8794
> URL: https://issues.apache.org/jira/browse/BEAM-8794
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> It is more efficient to push-down projected fields at an IO level (vs merging 
> with an Aggregate), when supported.
> When running queries like:
> {code:java}
> select SUM(score) as total_score from  group by name{code}
> Projects get merged with the aggregate. As a result, the Calc (after an 
> IOSourceRel) projects all fields, and the BeamIOPushDown rule does not know 
> which fields can be dropped, so it drops none.





[jira] [Work logged] (BEAM-8603) Add Python SqlTransform example script

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8603?focusedWorklogId=347087&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347087
 ]

ASF GitHub Bot logged work on BEAM-8603:


Author: ASF GitHub Bot
Created on: 21/Nov/19 00:19
Start Date: 21/Nov/19 00:19
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #10055: 
[BEAM-8603] Add Python SqlTransform example script
URL: https://github.com/apache/beam/pull/10055#discussion_r348841220
 
 

 ##
 File path: sdks/java/extensions/sql/build.gradle
 ##
 @@ -24,6 +24,7 @@ plugins {
 }
 applyJavaNature(
   automaticModuleName: 'org.apache.beam.sdk.extensions.sql',
+  shadowClosure: {},
 
 Review comment:
   We need two jars.
   
   (1) Jar for expansion service that contains the transform classes that need 
to be expanded (this can be multiple jars as well, IO transforms just have to 
be in the class path to be picked up by the AutoService).
   (2) Jar to be passed to the Java worker. For me, building a shadow Jar of 
java-harness (:sdks:java:harness:shadowJar) worked.
   
   It probably makes sense to move the expansion service to its own Gradle 
module, support building a shadow jar with all built-in cross-language 
transforms in Beam, and release that.
   
 



Issue Time Tracking
---

Worklog Id: (was: 347087)
Time Spent: 1h  (was: 50m)

> Add Python SqlTransform example script
> --
>
> Key: BEAM-8603
> URL: https://issues.apache.org/jira/browse/BEAM-8603
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7850) Make Environment a top level attribute of PTransform

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7850?focusedWorklogId=347084&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347084
 ]

ASF GitHub Bot logged work on BEAM-7850:


Author: ASF GitHub Bot
Created on: 21/Nov/19 00:11
Start Date: 21/Nov/19 00:11
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #10183: [BEAM-7850] 
Makes environment ID a top level attribute of PTransform.
URL: https://github.com/apache/beam/pull/10183#issuecomment-556563438
 
 
   cc: @robertwb and @lukecwik 
   
   I'm adding the rest of the refactoring needed for this but can you take a 
quick look to see if the proto changes look good ?
   
   I did not preserve tags since I think we do not worry about backwards 
compatibility at this point but lemme know if I should. Thanks.
 



Issue Time Tracking
---

Worklog Id: (was: 347084)
Time Spent: 20m  (was: 10m)

> Make Environment a top level attribute of PTransform
> 
>
> Key: BEAM-7850
> URL: https://issues.apache.org/jira/browse/BEAM-7850
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently Environment is not a top level attribute of the PTransform (of 
> runner API proto).
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
> Instead it is hidden inside various payload objects. For example, for ParDo, 
> environment will be inside SdkFunctionSpec of ParDoPayload.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
>  
> This makes tracking environment of different types of PTransforms harder and 
> we have to fork code (on the type of PTransform) to extract the Environment 
> where the PTransform should be executed. It will probably be simpler to just 
> make Environment a top level attribute of PTransform.





[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347083&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347083
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 21/Nov/19 00:10
Start Date: 21/Nov/19 00:10
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on issue #10173: [BEAM-8575] 
Added two unit tests in CombineTest class to test simple …
URL: https://github.com/apache/beam/pull/10173#issuecomment-556201858
 
 
   Although the names of the tests contain "accumulating", those tests are not 
related to ACCUMULATING or DISCARDING mode. They are testing simple combine 
cases. Since the Java tests have these names, Python tests follow them.
   
   Note that the "simple combine cases" I mentioned above have no special 
meaning. They are different from "SimpleCombine" in the Java tests, which has a 
special meaning.
 



Issue Time Tracking
---

Worklog Id: (was: 347083)
Time Spent: 15h  (was: 14h 50m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 15h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.





[jira] [Work logged] (BEAM-7850) Make Environment a top level attribute of PTransform

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7850?focusedWorklogId=347082&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347082
 ]

ASF GitHub Bot logged work on BEAM-7850:


Author: ASF GitHub Bot
Created on: 21/Nov/19 00:09
Start Date: 21/Nov/19 00:09
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #10183: 
[BEAM-7850] Makes environment ID a top level attribute of PTransform.
URL: https://github.com/apache/beam/pull/10183
 
 
   Removes SDKFunctionSpec and replaces all usages of it with FunctionSpec.
   
   
   
   

[jira] [Work logged] (BEAM-8592) DataCatalogTableProvider should not squash table components together into a string

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8592?focusedWorklogId=347081&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347081
 ]

ASF GitHub Bot logged work on BEAM-8592:


Author: ASF GitHub Bot
Created on: 20/Nov/19 23:58
Start Date: 20/Nov/19 23:58
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #10021: [BEAM-8592] 
Adjusting ZetaSQL table resolution to standard
URL: https://github.com/apache/beam/pull/10021#issuecomment-556555460
 
 
   Please take another look. I have restored unit testing to ensure that 
`TableResolution` works as expected.
 



Issue Time Tracking
---

Worklog Id: (was: 347081)
Time Spent: 1h 20m  (was: 1h 10m)

> DataCatalogTableProvider should not squash table components together into a 
> string
> --
>
> Key: BEAM-8592
> URL: https://issues.apache.org/jira/browse/BEAM-8592
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql, dsl-sql-zetasql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, if a user writes a table name like \{{foo.`baz.bar`.bizzle}} 
> representing the components \{{"foo", "baz.bar", "bizzle"}} the 
> DataCatalogTableProvider will concatenate the components into a string and 
> resolve the identifier as if it represented \{{"foo", "baz", "bar", 
> "bizzle"}}.
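
The difference between the two interpretations can be shown with a small sketch. It is not the DataCatalogTableProvider or ZetaSQL code: `TablePathSplitter` is a hypothetical helper, and it assumes the simplest quoting rule (a backtick toggles quoting, with no escape sequences). Splitting while honoring the backticks keeps `baz.bar` as one component, whereas joining the components into a string and splitting again on dots yields four.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a table path on dots while honoring backtick quoting. */
final class TablePathSplitter {

  static List<String> split(String path) {
    List<String> components = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    boolean quoted = false;
    for (char c : path.toCharArray()) {
      if (c == '`') {
        quoted = !quoted; // toggle quoting; the backtick itself is dropped
      } else if (c == '.' && !quoted) {
        components.add(current.toString()); // an unquoted dot ends a component
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    components.add(current.toString());
    return components;
  }

  public static void main(String[] args) {
    List<String> components = split("foo.`baz.bar`.bizzle");
    System.out.println(components); // [foo, baz.bar, bizzle]

    // Squashing the components into one string loses the quoting,
    // so re-splitting produces four components instead of three.
    String squashed = String.join(".", components);
    System.out.println(split(squashed)); // [foo, baz, bar, bizzle]
  }
}
```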





[jira] [Created] (BEAM-8797) Add artifactEndpoint to PortablePipelineOptions.java

2019-11-20 Thread Kyle Weaver (Jira)
Kyle Weaver created BEAM-8797:
-

 Summary: Add artifactEndpoint to PortablePipelineOptions.java
 Key: BEAM-8797
 URL: https://issues.apache.org/jira/browse/BEAM-8797
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Reporter: Kyle Weaver
Assignee: Kyle Weaver


Same as BEAM-8660 but for Java.





[jira] [Updated] (BEAM-8797) Add artifactEndpoint to PortablePipelineOptions.java

2019-11-20 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8797:
--
Status: Open  (was: Triage Needed)

> Add artifactEndpoint to PortablePipelineOptions.java
> 
>
> Key: BEAM-8797
> URL: https://issues.apache.org/jira/browse/BEAM-8797
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>
> Same as BEAM-8660 but for Java.





[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=347079&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347079
 ]

ASF GitHub Bot logged work on BEAM-8016:


Author: ASF GitHub Bot
Created on: 20/Nov/19 23:50
Start Date: 20/Nov/19 23:50
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #10132: [BEAM-8016] Pipeline 
Graph
URL: https://github.com/apache/beam/pull/10132#issuecomment-556553496
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 347079)
Time Spent: 6h 10m  (was: 6h)

> Render Beam Pipeline as DOT with Interactive Beam  
> ---
>
> Key: BEAM-8016
> URL: https://issues.apache.org/jira/browse/BEAM-8016
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> With the work in https://issues.apache.org/jira/browse/BEAM-7760, a Beam 
> pipeline converted to DOT and then rendered should mark user-defined 
> variables on edges.
> With the work in https://issues.apache.org/jira/browse/BEAM-7926, it might be 
> redundant or confusing to render arbitrary random sample PCollection data on 
> edges.
> We'll also make sure edges in the graph correspond to the output -> input 
> relationships in the user-defined pipeline. Each edge is one output. If 
> multiple downstream inputs take the same output, it should be rendered as 
> one edge diverging into two instead of two edges.
> For advanced interactivity, where each execution highlights the part of the 
> original pipeline that was actually executed, we'll also provide support in 
> beta.





[jira] [Work logged] (BEAM-8496) remove SDF translators in flink streaming transform translator

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8496?focusedWorklogId=347078&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347078
 ]

ASF GitHub Bot logged work on BEAM-8496:


Author: ASF GitHub Bot
Created on: 20/Nov/19 23:50
Start Date: 20/Nov/19 23:50
Worklog Time Spent: 10m 
  Work Description: vectorijk commented on issue #9903: [BEAM-8496] remove 
SDF translators from flink translator
URL: https://github.com/apache/beam/pull/9903#issuecomment-55655
 
 
   Run Flink ValidatesRunner
 



Issue Time Tracking
---

Worklog Id: (was: 347078)
Time Spent: 1h  (was: 50m)

> remove SDF translators in flink streaming transform translator
> --
>
> Key: BEAM-8496
> URL: https://issues.apache.org/jira/browse/BEAM-8496
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kai Jiang
>Assignee: Kai Jiang
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Since the URN of SDF has been moved to runners-core-construction-java, we 
> need to remove the SDF translators from the Flink streaming transform 
> translator.
> Otherwise, as seen in the failed Nexmark Jenkins 
> [job|https://builds.apache.org/job/beam_PostCommit_Java_Nexmark_Flink/4128/console],
>  a duplicated transformer gets registered in 
> [PTransformTranslation.KnownTransformPayloadTranslator()|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java#L290]





[jira] [Updated] (BEAM-8796) Optionally configure static job port for JavaJarJobServer

2019-11-20 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8796:
--
Issue Type: Improvement  (was: Bug)

> Optionally configure static job port for JavaJarJobServer
> -
>
> Key: BEAM-8796
> URL: https://issues.apache.org/jira/browse/BEAM-8796
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink, runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>
> Right now, ports are always dynamically assigned.
> https://github.com/apache/beam/blob/10243dc78d5472a5c312a316f03c6d4c622840ea/sdks/python/apache_beam/runners/portability/job_server.py#L144





[jira] [Work logged] (BEAM-8743) Add support for flat schemas in pubsub

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8743?focusedWorklogId=347075&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347075
 ]

ASF GitHub Bot logged work on BEAM-8743:


Author: ASF GitHub Bot
Created on: 20/Nov/19 23:44
Start Date: 20/Nov/19 23:44
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #10158: 
[BEAM-8743] Add support for flat schemas in pubsub
URL: https://github.com/apache/beam/pull/10158#discussion_r348805530
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubMessageToRow.java
 ##
 @@ -95,28 +109,40 @@ public void processElement(ProcessContext context) {
* payload, and attributes.
*/
   private List getFieldValues(ProcessContext context) {
+Row payload = parsePayloadJsonRow(context.element());
 return messageSchema().getFields().stream()
-.map(field -> getValueForField(field, context.timestamp(), 
context.element()))
+.map(
+field ->
+getValueForField(
+field, context.timestamp(), 
context.element().getAttributeMap(), payload))
 .collect(toList());
   }
 
   private Object getValueForField(
-  Schema.Field field, Instant timestamp, PubsubMessage pubsubMessage) {
-
-switch (field.getName()) {
-  case TIMESTAMP_FIELD:
+  Schema.Field field, Instant timestamp, Map attributeMap, 
Row payload) {
+// TODO: do this check once at construction time, rather than for every 
element.
 
 Review comment:
   I imagine you just fork the DoFn and share utility code?
 



Issue Time Tracking
---

Worklog Id: (was: 347075)
Time Spent: 1h  (was: 50m)

> Add support for flat schemas in pubsub
> --
>
> Key: BEAM-8743
> URL: https://issues.apache.org/jira/browse/BEAM-8743
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> See 
> https://lists.apache.org/thread.html/bf4c37f21bda194d7f8c40f6e7b9a776262415755cc1658412af3c76@%3Cdev.beam.apache.org%3E





[jira] [Work logged] (BEAM-8743) Add support for flat schemas in pubsub

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8743?focusedWorklogId=347076&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347076
 ]

ASF GitHub Bot logged work on BEAM-8743:


Author: ASF GitHub Bot
Created on: 20/Nov/19 23:44
Start Date: 20/Nov/19 23:44
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #10158: [BEAM-8743] Add 
support for flat schemas in pubsub
URL: https://github.com/apache/beam/pull/10158#issuecomment-556552084
 
 
   (be sure to `rebase -i` the fixup commits)
 



Issue Time Tracking
---

Worklog Id: (was: 347076)
Time Spent: 1h 10m  (was: 1h)

> Add support for flat schemas in pubsub
> --
>
> Key: BEAM-8743
> URL: https://issues.apache.org/jira/browse/BEAM-8743
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> See 
> https://lists.apache.org/thread.html/bf4c37f21bda194d7f8c40f6e7b9a776262415755cc1658412af3c76@%3Cdev.beam.apache.org%3E





[jira] [Work logged] (BEAM-8743) Add support for flat schemas in pubsub

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8743?focusedWorklogId=347074&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347074
 ]

ASF GitHub Bot logged work on BEAM-8743:


Author: ASF GitHub Bot
Created on: 20/Nov/19 23:44
Start Date: 20/Nov/19 23:44
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #10158: 
[BEAM-8743] Add support for flat schemas in pubsub
URL: https://github.com/apache/beam/pull/10158#discussion_r348802643
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubMessageToRow.java
 ##
 @@ -68,8 +69,21 @@
 
   public abstract boolean useDlq();
 
+  public abstract boolean useFlatSchema();
+
   private Schema payloadSchema() {
-return messageSchema().getField(PAYLOAD_FIELD).getType().getRowSchema();
+if (useFlatSchema()) {
+  Schema.Builder builder = Schema.builder();
+  for (Schema.Field field : messageSchema().getFields()) {
+if (field.getName().equals(TIMESTAMP_FIELD)) {
+  continue;
+}
+builder.addField(field);
+  }
+  return builder.build();
+} else {
+  return messageSchema().getField(PAYLOAD_FIELD).getType().getRowSchema();
 
 Review comment:
   nit: shorter branch first is slightly easier to read
 



Issue Time Tracking
---

Worklog Id: (was: 347074)
Time Spent: 50m  (was: 40m)

> Add support for flat schemas in pubsub
> --
>
> Key: BEAM-8743
> URL: https://issues.apache.org/jira/browse/BEAM-8743
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> See 
> https://lists.apache.org/thread.html/bf4c37f21bda194d7f8c40f6e7b9a776262415755cc1658412af3c76@%3Cdev.beam.apache.org%3E





[jira] [Work logged] (BEAM-8743) Add support for flat schemas in pubsub

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8743?focusedWorklogId=347073&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347073
 ]

ASF GitHub Bot logged work on BEAM-8743:


Author: ASF GitHub Bot
Created on: 20/Nov/19 23:44
Start Date: 20/Nov/19 23:44
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #10158: 
[BEAM-8743] Add support for flat schemas in pubsub
URL: https://github.com/apache/beam/pull/10158#discussion_r348803919
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubMessageToRow.java
 ##
 @@ -68,8 +69,21 @@
 
   public abstract boolean useDlq();
 
+  public abstract boolean useFlatSchema();
+
   private Schema payloadSchema() {
-return messageSchema().getField(PAYLOAD_FIELD).getType().getRowSchema();
+if (useFlatSchema()) {
+  Schema.Builder builder = Schema.builder();
+  for (Schema.Field field : messageSchema().getFields()) {
 
 Review comment:
   nit: there might be a pithy way to do this, e.g. 
`messageSchema().getFields().stream().filter(f -> 
!f.getName().equals(TIMESTAMP_FIELD))`, but it might just get crufty anyhow
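
For comparison with the loop in the diff, a stream-based version might look roughly like the sketch below. It deliberately avoids the Beam `Schema` API so that it stays self-contained: fields are modelled as plain strings, `FlatSchemaFields` and `payloadFieldNames` are hypothetical names, and the value of `TIMESTAMP_FIELD` is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical stand-in for the payload-schema logic, using field names only. */
final class FlatSchemaFields {

  // Assumed constant value, for illustration only.
  static final String TIMESTAMP_FIELD = "event_timestamp";

  /** Returns every message field name except the timestamp field, in the original order. */
  static List<String> payloadFieldNames(List<String> messageFieldNames) {
    return messageFieldNames.stream()
        .filter(name -> !name.equals(TIMESTAMP_FIELD))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(payloadFieldNames(Arrays.asList("event_timestamp", "id", "name")));
  }
}
```

In the real code the filtered fields would still have to be fed into a schema builder, which is probably where the "crufty" part of the review comment comes from.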
 



Issue Time Tracking
---

Worklog Id: (was: 347073)
Time Spent: 40m  (was: 0.5h)

> Add support for flat schemas in pubsub
> --
>
> Key: BEAM-8743
> URL: https://issues.apache.org/jira/browse/BEAM-8743
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> See 
> https://lists.apache.org/thread.html/bf4c37f21bda194d7f8c40f6e7b9a776262415755cc1658412af3c76@%3Cdev.beam.apache.org%3E





[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347072&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347072
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 20/Nov/19 23:42
Start Date: 20/Nov/19 23:42
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on issue #10173: [BEAM-8575] 
Added two unit tests in CombineTest class to test simple …
URL: https://github.com/apache/beam/pull/10173#issuecomment-556551619
 
 
   R: @robertwb 
 



Issue Time Tracking
---

Worklog Id: (was: 347072)
Time Spent: 14h 50m  (was: 14h 40m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 14h 50m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.





[jira] [Resolved] (BEAM-8795) runners:spark:compileJava broken on master

2019-11-20 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver resolved BEAM-8795.
---
Fix Version/s: Not applicable
   Resolution: Fixed

> runners:spark:compileJava broken on master
> --
>
> Key: BEAM-8795
> URL: https://issues.apache.org/jira/browse/BEAM-8795
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/pull/10147
> beam/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/functions/SparkSideInputReader.java:49:
>  error: incompatible types: MultimapView is not a functional interface
>   o -> Collections.EMPTY_LIST;
>   ^





[jira] [Work logged] (BEAM-8795) runners:spark:compileJava broken on master

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8795?focusedWorklogId=347071&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347071
 ]

ASF GitHub Bot logged work on BEAM-8795:


Author: ASF GitHub Bot
Created on: 20/Nov/19 23:34
Start Date: 20/Nov/19 23:34
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #10182: [BEAM-8795] fix 
Spark runner build
URL: https://github.com/apache/beam/pull/10182
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 347071)
Time Spent: 0.5h  (was: 20m)

> runners:spark:compileJava broken on master
> --
>
> Key: BEAM-8795
> URL: https://issues.apache.org/jira/browse/BEAM-8795
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/pull/10147
> beam/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/functions/SparkSideInputReader.java:49:
>  error: incompatible types: MultimapView is not a functional interface
>   o -> Collections.EMPTY_LIST;
>   ^





[jira] [Updated] (BEAM-8796) Optionally configure static job port for JavaJarJobServer

2019-11-20 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8796:
--
Status: Open  (was: Triage Needed)

> Optionally configure static job port for JavaJarJobServer
> -
>
> Key: BEAM-8796
> URL: https://issues.apache.org/jira/browse/BEAM-8796
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>
> Right now, ports are always dynamically assigned.
> https://github.com/apache/beam/blob/10243dc78d5472a5c312a316f03c6d4c622840ea/sdks/python/apache_beam/runners/portability/job_server.py#L144





[jira] [Created] (BEAM-8796) Optionally configure static job port for JavaJarJobServer

2019-11-20 Thread Kyle Weaver (Jira)
Kyle Weaver created BEAM-8796:
-

 Summary: Optionally configure static job port for JavaJarJobServer
 Key: BEAM-8796
 URL: https://issues.apache.org/jira/browse/BEAM-8796
 Project: Beam
  Issue Type: Bug
  Components: runner-flink, runner-spark
Reporter: Kyle Weaver
Assignee: Kyle Weaver


Right now, ports are always dynamically assigned.

https://github.com/apache/beam/blob/10243dc78d5472a5c312a316f03c6d4c622840ea/sdks/python/apache_beam/runners/portability/job_server.py#L144
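
For illustration only, a minimal sketch of the difference between a dynamically assigned port and an optional static one. The helper name `pick_job_port` and its `requested_port` parameter are hypothetical and are not taken from `job_server.py`; the sketch uses only the Python standard library.

```python
import socket


def pick_job_port(requested_port=0):
    """Return a TCP port for a job server to listen on.

    Hypothetical helper: requested_port=0 asks the OS for any free port
    (dynamic assignment); a non-zero value is returned unchanged (static).
    """
    if requested_port:
        return requested_port
    # Binding to port 0 lets the OS choose a currently free port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("localhost", 0))
        return sock.getsockname()[1]
```

With a static port, callers such as firewall rules or port-forwarding setups can know the address in advance. The dynamic branch releases the socket before returning, so another process could in principle claim the port before the server starts.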





[jira] [Work logged] (BEAM-8629) WithTypeHints._get_or_create_type_hints may return a mutable copy of the class type hints.

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8629?focusedWorklogId=347069&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347069
 ]

ASF GitHub Bot logged work on BEAM-8629:


Author: ASF GitHub Bot
Created on: 20/Nov/19 23:28
Start Date: 20/Nov/19 23:28
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10080: [BEAM-8629] 
Don't return mutable class type hints.
URL: https://github.com/apache/beam/pull/10080
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 347069)
Time Spent: 1h 20m  (was: 1h 10m)

> WithTypeHints._get_or_create_type_hints may return a mutable copy of the 
> class type hints.
> --
>
> Key: BEAM-8629
> URL: https://issues.apache.org/jira/browse/BEAM-8629
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-8793) installGcpTest task flakes

2019-11-20 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8793:
--
Description: 
I've also seen this happen with 
:sdks:python:test-suites:portable:py37:installGcpTest.

11:01:38 > Task :sdks:python:test-suites:direct:py35:installGcpTest FAILED
11:01:38 Obtaining 
file:///home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python
11:01:38 ERROR: Command errored out with exit status 1:
11:01:38  command: 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/bin/python3.5
 -c 'import sys, setuptools, tokenize; sys.argv[0] = 
'"'"'/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py'"'"';
 
__file__='"'"'/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py'"'"';f=getattr(tokenize,
 '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', 
'"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info
11:01:38  cwd: 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/
11:01:38 Complete output (37 lines):
11:01:38 Traceback (most recent call last):
11:01:38   File "<string>", line 1, in <module>
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py",
 line 264, in <module>
11:01:38 'test': generate_protos_first(test),
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/__init__.py",
 line 144, in setup
11:01:38 _install_setup_requires(attrs)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/__init__.py",
 line 139, in _install_setup_requires
11:01:38 dist.fetch_build_eggs(dist.setup_requires)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/dist.py",
 line 720, in fetch_build_eggs
11:01:38 replace_conflicting=True,
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py",
 line 782, in resolve
11:01:38 replace_conflicting=replace_conflicting
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py",
 line 1065, in best_match
11:01:38 return self.obtain(req, installer)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py",
 line 1077, in obtain
11:01:38 return installer(requirement)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/dist.py",
 line 787, in fetch_build_egg
11:01:38 return cmd.easy_install(req)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py",
 line 679, in easy_install
11:01:38 return self.install_item(spec, dist.location, tmpdir, deps)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py",
 line 705, in install_item
11:01:38 dists = self.install_eggs(spec, download, tmpdir)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py",
 line 855, in install_eggs
11:01:38 return [self.install_wheel(dist_filename, tmpdir)]
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py",
 line 1073, in install_wheel
11:01:38 os.path.dirname(destination)
11:01:38   File "/usr/lib/python3.5/distutils/cmd.py", line 336, in execute
11:01:38 util.execute(func, args, msg, dry_run=self.dry_run)
11:01:38   File "/usr/lib/python3.5/distutils/util.py", line 301, in execute
11:01:38 func(*args)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/wheel.py",
 line 101, in install_as_egg
11:01:38 self._install_as_egg(destination_eggdir, zf)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build

[jira] [Assigned] (BEAM-8504) BigQueryIO DIRECT_READ is broken

2019-11-20 Thread Kenneth Knowles (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenneth Knowles reassigned BEAM-8504:
-

Assignee: Gleb Kanterov  (was: Aryan Naraghi)

> BigQueryIO DIRECT_READ is broken
> 
>
> Key: BEAM-8504
> URL: https://issues.apache.org/jira/browse/BEAM-8504
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0
>Reporter: Gleb Kanterov
>Assignee: Gleb Kanterov
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT 
> (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with 
> 2.15.0.
> {code}
> java.io.IOException: Failed to start reading from source: name: 
> "projects//locations/eu/streams/"
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Fraction consumed from 
> previous response (0.0) is not less than fraction consumed from current 
> response (0.0).
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206)
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601)
>   ... 14 more
> {code}
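
The `Caused by` line above reports both fractions as 0.0, so a check that requires strictly increasing progress cannot pass. The sketch below only illustrates that comparison. The function names are hypothetical; this is not the BigQueryIO code, and whether relaxing the check is the right fix is an assumption.

```python
def check_strictly_increasing(previous, current):
    """Raise unless progress strictly increased (mirrors the reported message)."""
    if not previous < current:
        raise ValueError(
            "Fraction consumed from previous response (%s) is not less than "
            "fraction consumed from current response (%s)." % (previous, current))


def check_non_decreasing(previous, current):
    """Looser variant (an assumption): tolerate two responses with equal progress."""
    if previous > current:
        raise ValueError(
            "Fraction consumed from previous response (%s) is greater than "
            "fraction consumed from current response (%s)." % (previous, current))
```

Calling `check_strictly_increasing(0.0, 0.0)` raises, while `check_non_decreasing(0.0, 0.0)` does not.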





[jira] [Commented] (BEAM-8504) BigQueryIO DIRECT_READ is broken

2019-11-20 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978813#comment-16978813
 ] 

Kenneth Knowles commented on BEAM-8504:
---

LGTM. Thanks! Just close this out when green & merged.

> BigQueryIO DIRECT_READ is broken
> 
>
> Key: BEAM-8504
> URL: https://issues.apache.org/jira/browse/BEAM-8504
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0
>Reporter: Gleb Kanterov
>Assignee: Aryan Naraghi
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-8504) BigQueryIO DIRECT_READ is broken

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8504?focusedWorklogId=347066&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347066
 ]

ASF GitHub Bot logged work on BEAM-8504:


Author: ASF GitHub Bot
Created on: 20/Nov/19 23:11
Start Date: 20/Nov/19 23:11
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #10168: [BEAM-8504] 
Cherry-pick into release-2.17.0
URL: https://github.com/apache/beam/pull/10168#issuecomment-556530499
 
 
   Looks good to merge to release branch when green.
 



Issue Time Tracking
---

Worklog Id: (was: 347066)
Time Spent: 2.5h  (was: 2h 20m)

> BigQueryIO DIRECT_READ is broken
> 
>
> Key: BEAM-8504
> URL: https://issues.apache.org/jira/browse/BEAM-8504
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0
>Reporter: Gleb Kanterov
>Assignee: Aryan Naraghi
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>





[jira] [Work logged] (BEAM-8489) Python typehints: filter callable output type hint should not be used

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8489?focusedWorklogId=347065&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347065
 ]

ASF GitHub Bot logged work on BEAM-8489:


Author: ASF GitHub Bot
Created on: 20/Nov/19 23:07
Start Date: 20/Nov/19 23:07
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9890: [BEAM-8489] 
Filter: don't use callable's output type
URL: https://github.com/apache/beam/pull/9890#discussion_r348791496
 
 

 ##
 File path: sdks/python/apache_beam/transforms/core.py
 ##
 @@ -1544,10 +1544,16 @@ def Filter(fn, *args, **kwargs):  # pylint: disable=invalid-name
   # TODO: What about callable classes?
   if hasattr(fn, '__name__'):
     wrapper.__name__ = fn.__name__
+
+  # Get type hints from this instance or the callable. Do not use output type
+  # hints from the callable (which should be bool if set).
+  fn_type_hints = typehints.decorators.IOTypeHints.from_callable(fn)
+  if fn_type_hints is not None:
+    fn_type_hints.output_types = None
 
 Review comment:
   With this change, do we still need both branches in line 1559, 1563? Perhaps 
we can make the evaluation more deterministic as in:
   ```
   if (get_type_hints(wrapper).input_types
       and get_type_hints(wrapper).input_types[0]):
     output_hint = get_type_hints(wrapper).input_types[0][0]
     get_type_hints(wrapper).set_output_types(typehints.Iterable[output_hint])
   ```
 



Issue Time Tracking
---

Worklog Id: (was: 347065)
Time Spent: 0.5h  (was: 20m)

> Python typehints: filter callable output type hint should not be used
> -
>
> Key: BEAM-8489
> URL: https://issues.apache.org/jira/browse/BEAM-8489
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> A filter function returns bool, while the Filter() transform outputs the same 
> element type as the input.
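
A minimal sketch of the point made in the description, using only the standard `typing` and `inspect` modules rather than Beam's `typehints` package. The helper `filter_output_hint` is hypothetical: it derives a filter's output hint from the predicate's first parameter and ignores the predicate's `bool` return annotation.

```python
import inspect
from typing import Iterable, get_type_hints


def is_even(x: int) -> bool:
    return x % 2 == 0


def filter_output_hint(predicate):
    """Return Iterable[T] where T is the hint on the predicate's first parameter.

    The predicate's return annotation (bool) says nothing about what a filter
    emits, so it is never consulted. Returns None if there is no usable hint.
    """
    hints = get_type_hints(predicate)
    params = list(inspect.signature(predicate).parameters)
    if not params or params[0] not in hints:
        return None
    return Iterable[hints[params[0]]]
```

For `is_even`, the helper yields `Iterable[int]`, not `bool`; for an unannotated lambda it yields `None`.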





[jira] [Updated] (BEAM-8795) runners:spark:compileJava broken on master

2019-11-20 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8795:
--
Status: Open  (was: Triage Needed)

> runners:spark:compileJava broken on master
> --
>
> Key: BEAM-8795
> URL: https://issues.apache.org/jira/browse/BEAM-8795
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/pull/10147
> beam/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/functions/SparkSideInputReader.java:49:
>  error: incompatible types: MultimapView is not a functional interface
>   o -> Collections.EMPTY_LIST;
>   ^





[jira] [Work logged] (BEAM-8794) Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8794?focusedWorklogId=347053&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347053
 ]

ASF GitHub Bot logged work on BEAM-8794:


Author: ASF GitHub Bot
Created on: 20/Nov/19 22:53
Start Date: 20/Nov/19 22:53
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #10180: [BEAM-8794] 
Conditional aggregate project merge
URL: https://github.com/apache/beam/pull/10180#issuecomment-556512353
 
 
   Run sql postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 347053)
Time Spent: 50m  (was: 40m)

> Projects should be handled by an IOPushDownRule before applying 
> AggregateProjectMergeRule
> -
>
> Key: BEAM-8794
> URL: https://issues.apache.org/jira/browse/BEAM-8794
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> It is more efficient to push-down projected fields at an IO level (vs merging 
> with an Aggregate), when supported.
> When running queries like:
> {code:java}
> select SUM(score) as total_score from  group by name{code}
> Projects get merged with an aggregate. As a result, Calc (after an 
> IOSourceRel) projects all fields and the BeamIOPushDown rule does not know 
> which fields can be dropped, so it drops none.
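
A toy sketch of why pushing the projection down to the IO helps for a query like the one above. The in-memory `ROWS` table and the helpers `read_table` and `sum_score_by_name` are hypothetical stand-ins, not the Beam SQL or Calcite API.

```python
from collections import defaultdict

# Hypothetical source table with columns the query never touches.
ROWS = [
    {"name": "ann", "score": 3, "team": "red", "comment": "unused"},
    {"name": "bob", "score": 5, "team": "blue", "comment": "unused"},
    {"name": "ann", "score": 4, "team": "red", "comment": "unused"},
]


def read_table(rows, fields=None):
    """Hypothetical IO read; when fields is given, only those columns are produced."""
    for row in rows:
        yield dict(row) if fields is None else {f: row[f] for f in fields}


def sum_score_by_name(rows):
    """SUM(score) ... GROUP BY name over already-read rows."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["name"]] += row["score"]
    return dict(totals)


# Without push-down every column is read; with it only name and score are.
all_columns = list(read_table(ROWS))
pushed_down = list(read_table(ROWS, fields=["name", "score"]))
```

Both inputs give the same aggregate, but the pushed-down read carries two columns per row instead of four.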





[jira] [Work logged] (BEAM-8794) Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8794?focusedWorklogId=347051&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347051
 ]

ASF GitHub Bot logged work on BEAM-8794:


Author: ASF GitHub Bot
Created on: 20/Nov/19 22:53
Start Date: 20/Nov/19 22:53
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #10180: [BEAM-8794] 
Conditional aggregate project merge
URL: https://github.com/apache/beam/pull/10180#issuecomment-556466893
 
 
   Run sql postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 347051)
Time Spent: 40m  (was: 0.5h)

> Projects should be handled by an IOPushDownRule before applying 
> AggregateProjectMergeRule
> -
>
> Key: BEAM-8794
> URL: https://issues.apache.org/jira/browse/BEAM-8794
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> It is more efficient to push-down projected fields at an IO level (vs merging 
> with an Aggregate), when supported.
> When running queries like:
> {code:java}
> select SUM(score) as total_score from  group by name{code}
> Projects get merged with an aggregate. As a result, Calc (after an 
> IOSourceRel) projects all fields and the BeamIOPushDown rule does not know 
> which fields can be dropped, so it drops none.





[jira] [Work logged] (BEAM-8795) runners:spark:compileJava broken on master

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8795?focusedWorklogId=347047&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347047
 ]

ASF GitHub Bot logged work on BEAM-8795:


Author: ASF GitHub Bot
Created on: 20/Nov/19 22:51
Start Date: 20/Nov/19 22:51
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #10182: [BEAM-8795] fix Spark 
runner build
URL: https://github.com/apache/beam/pull/10182#issuecomment-556509852
 
 
   Run Spark ValidatesRunner
 



Issue Time Tracking
---

Worklog Id: (was: 347047)
Time Spent: 20m  (was: 10m)

> runners:spark:compileJava broken on master
> --
>
> Key: BEAM-8795
> URL: https://issues.apache.org/jira/browse/BEAM-8795
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/pull/10147
> beam/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/functions/SparkSideInputReader.java:49:
>  error: incompatible types: MultimapView is not a functional interface
>   o -> Collections.EMPTY_LIST;
>   ^





[jira] [Work logged] (BEAM-3419) Enable iterable side input for beam runners.

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3419?focusedWorklogId=347045&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347045
 ]

ASF GitHub Bot logged work on BEAM-3419:


Author: ASF GitHub Bot
Created on: 20/Nov/19 22:47
Start Date: 20/Nov/19 22:47
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #10147: [BEAM-3419] Flesh out 
iterable side inputs and key enumeration for multimaps in shared libraries
URL: https://github.com/apache/beam/pull/10147#issuecomment-556506562
 
 
   #10182
 



Issue Time Tracking
---

Worklog Id: (was: 347045)
Time Spent: 3h 50m  (was: 3h 40m)

> Enable iterable side input for beam runners.
> 
>
> Key: BEAM-3419
> URL: https://issues.apache.org/jira/browse/BEAM-3419
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8795) runners:spark:compileJava broken on master

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8795?focusedWorklogId=347044&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347044
 ]

ASF GitHub Bot logged work on BEAM-8795:


Author: ASF GitHub Bot
Created on: 20/Nov/19 22:46
Start Date: 20/Nov/19 22:46
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #10182: [BEAM-8795] fix 
Spark runner build
URL: https://github.com/apache/beam/pull/10182
 
 
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/

[jira] [Work logged] (BEAM-3419) Enable iterable side input for beam runners.

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3419?focusedWorklogId=347043&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347043
 ]

ASF GitHub Bot logged work on BEAM-3419:


Author: ASF GitHub Bot
Created on: 20/Nov/19 22:45
Start Date: 20/Nov/19 22:45
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10147: [BEAM-3419] Flesh 
out iterable side inputs and key enumeration for multimaps in shared libraries
URL: https://github.com/apache/beam/pull/10147#issuecomment-556503702
 
 
   @ibzib has a fix in flight for this.
 



Issue Time Tracking
---

Worklog Id: (was: 347043)
Time Spent: 3h 40m  (was: 3.5h)

> Enable iterable side input for beam runners.
> 
>
> Key: BEAM-3419
> URL: https://issues.apache.org/jira/browse/BEAM-3419
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>






[jira] [Created] (BEAM-8795) runners:spark:compileJava broken on master

2019-11-20 Thread Kyle Weaver (Jira)
Kyle Weaver created BEAM-8795:
-

 Summary: runners:spark:compileJava broken on master
 Key: BEAM-8795
 URL: https://issues.apache.org/jira/browse/BEAM-8795
 Project: Beam
  Issue Type: Bug
  Components: runner-spark
Reporter: Kyle Weaver
Assignee: Kyle Weaver


https://github.com/apache/beam/pull/10147

beam/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/functions/SparkSideInputReader.java:49:
 error: incompatible types: MultimapView is not a functional interface
  o -> Collections.EMPTY_LIST;
  ^
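
javac reports "is not a functional interface" when a lambda targets an interface with more than one abstract method. The following is a minimal sketch, assuming the view type gained a second abstract method in the linked PR. KeyedView, keys() and EmptyViews are hypothetical names for illustration, not Beam's API. Replacing the lambda with an anonymous class that implements every method compiles again.

```java
import java.util.Collections;

// Hypothetical stand-in for a view interface with two abstract methods.
interface KeyedView<K, V> {
  // Assumed second method (key enumeration); its presence rules out lambdas.
  Iterable<K> keys();

  // Lookup of all values stored under one key.
  Iterable<V> get(K key);
}

final class EmptyViews {
  private EmptyViews() {}

  // A lambda such as `k -> Collections.emptyList()` would not compile here,
  // because KeyedView declares two abstract methods.
  static <K, V> KeyedView<K, V> empty() {
    return new KeyedView<K, V>() {
      @Override
      public Iterable<K> keys() {
        return Collections.emptyList();
      }

      @Override
      public Iterable<V> get(K key) {
        return Collections.emptyList();
      }
    };
  }
}
```

Using Collections.emptyList() instead of the raw Collections.EMPTY_LIST constant also avoids an unchecked conversion warning.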






[jira] [Work logged] (BEAM-3419) Enable iterable side input for beam runners.

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3419?focusedWorklogId=347041&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347041
 ]

ASF GitHub Bot logged work on BEAM-3419:


Author: ASF GitHub Bot
Created on: 20/Nov/19 22:42
Start Date: 20/Nov/19 22:42
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10147: [BEAM-3419] Flesh 
out iterable side inputs and key enumeration for multimaps in shared libraries
URL: https://github.com/apache/beam/pull/10147#issuecomment-556500524
 
 
   ```
   
beam/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/functions/SparkSideInputReader.java:49:
 error: incompatible types: MultimapView is not a functional interface
 o -> Collections.EMPTY_LIST;
 ^
   ```
 



Issue Time Tracking
---

Worklog Id: (was: 347041)
Time Spent: 3.5h  (was: 3h 20m)

> Enable iterable side input for beam runners.
> 
>
> Key: BEAM-3419
> URL: https://issues.apache.org/jira/browse/BEAM-3419
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8504) BigQueryIO DIRECT_READ is broken

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8504?focusedWorklogId=347039&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347039
 ]

ASF GitHub Bot logged work on BEAM-8504:


Author: ASF GitHub Bot
Created on: 20/Nov/19 22:41
Start Date: 20/Nov/19 22:41
Worklog Time Spent: 10m 
  Work Description: kanterov commented on issue #10168: [BEAM-8504] 
Cherry-pick into release-2.17.0
URL: https://github.com/apache/beam/pull/10168#issuecomment-556499571
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 347039)
Time Spent: 2h 10m  (was: 2h)

> BigQueryIO DIRECT_READ is broken
> 
>
> Key: BEAM-8504
> URL: https://issues.apache.org/jira/browse/BEAM-8504
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0
>Reporter: Gleb Kanterov
>Assignee: Aryan Naraghi
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT 
> (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with 
> 2.15.0.
> {code}
> java.io.IOException: Failed to start reading from source: name: 
> "projects//locations/eu/streams/"
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Fraction consumed from 
> previous response (0.0) is not less than fraction consumed from current 
> response (0.0).
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206)
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601)
>   ... 14 more
> {code}
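
The "Caused by" line in the trace shows a strict precondition failing because two consecutive responses report the same fraction (0.0 and 0.0). The following is a minimal sketch of that kind of check, not the Beam source. FractionConsumedCheck and both method names are hypothetical, and the relaxed variant is only one plausible alternative, not necessarily the fix adopted for this issue.

```java
final class FractionConsumedCheck {
  private FractionConsumedCheck() {}

  // Strict form, matching the wording of the exception: equal fractions are rejected.
  static void requireIncreasing(double previous, double current) {
    if (!(previous < current)) {
      throw new IllegalArgumentException(
          String.format(
              "Fraction consumed from previous response (%s) is not less than "
                  + "fraction consumed from current response (%s).",
              previous, current));
    }
  }

  // Hypothetical relaxed form: only a decrease in progress is rejected.
  static void requireNonDecreasing(double previous, double current) {
    if (previous > current) {
      throw new IllegalArgumentException(
          String.format("Fraction consumed went backwards: %s -> %s.", previous, current));
    }
  }
}
```

With the strict form, a stream whose first two responses both report 0.0 fails immediately, which is consistent with the failure appearing in start().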





[jira] [Work logged] (BEAM-8504) BigQueryIO DIRECT_READ is broken

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8504?focusedWorklogId=347040&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347040
 ]

ASF GitHub Bot logged work on BEAM-8504:


Author: ASF GitHub Bot
Created on: 20/Nov/19 22:41
Start Date: 20/Nov/19 22:41
Worklog Time Spent: 10m 
  Work Description: kanterov commented on issue #10168: [BEAM-8504] 
Cherry-pick into release-2.17.0
URL: https://github.com/apache/beam/pull/10168#issuecomment-556499732
 
 
   Run Java_Examples_Dataflow PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 347040)
Time Spent: 2h 20m  (was: 2h 10m)

> BigQueryIO DIRECT_READ is broken
> 
>
> Key: BEAM-8504
> URL: https://issues.apache.org/jira/browse/BEAM-8504
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0
>Reporter: Gleb Kanterov
>Assignee: Aryan Naraghi
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT 
> (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with 
> 2.15.0.
> {code}
> java.io.IOException: Failed to start reading from source: name: 
> "projects//locations/eu/streams/"
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
>   at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120)
>   at 
> org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Fraction consumed from 
> previous response (0.0) is not less than fraction consumed from current 
> response (0.0).
>   at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243)
>   at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206)
>   at 
> org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601)
>   ... 14 more
> {code}





[jira] [Work logged] (BEAM-3419) Enable iterable side input for beam runners.

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3419?focusedWorklogId=347038&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347038
 ]

ASF GitHub Bot logged work on BEAM-3419:


Author: ASF GitHub Bot
Created on: 20/Nov/19 22:41
Start Date: 20/Nov/19 22:41
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10147: [BEAM-3419] Flesh 
out iterable side inputs and key enumeration for multimaps in shared libraries
URL: https://github.com/apache/beam/pull/10147#issuecomment-556499485
 
 
   I think this breaks :runners:spark:compileJava on master. @lukecwik can you 
please take a look?
 



Issue Time Tracking
---

Worklog Id: (was: 347038)
Time Spent: 3h 20m  (was: 3h 10m)

> Enable iterable side input for beam runners.
> 
>
> Key: BEAM-3419
> URL: https://issues.apache.org/jira/browse/BEAM-3419
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-core
>Reporter: Robert Bradshaw
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (BEAM-8794) Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule

2019-11-20 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8794:

Status: Open  (was: Triage Needed)

> Projects should be handled by an IOPushDownRule before applying 
> AggregateProjectMergeRule
> -
>
> Key: BEAM-8794
> URL: https://issues.apache.org/jira/browse/BEAM-8794
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> It is more efficient to push down projected fields at the IO level (vs merging 
> with an Aggregate), when supported.
> When running queries like:
> {code:java}
> select SUM(score) as total_score from  group by name{code}
> Projects get merged with an Aggregate. As a result, the Calc (after an 
> IOSourceRel) projects all fields, so the BeamIOPushDown rule does not know 
> which fields can be dropped and drops none.





[jira] [Work logged] (BEAM-4776) Java PortableRunner should support metrics

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4776?focusedWorklogId=347035&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347035
 ]

ASF GitHub Bot logged work on BEAM-4776:


Author: ASF GitHub Bot
Created on: 20/Nov/19 22:35
Start Date: 20/Nov/19 22:35
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #10105: [BEAM-4776] Add 
metrics support to Java PortableRunner
URL: https://github.com/apache/beam/pull/10105#issuecomment-556493149
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 347035)
Time Spent: 5h 10m  (was: 5h)

> Java PortableRunner should support metrics
> --
>
> Key: BEAM-4776
> URL: https://issues.apache.org/jira/browse/BEAM-4776
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Eugene Kirpichov
>Assignee: Michal Walenia
>Priority: Major
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> BEAM-4775 concerns adding metrics to the JobService API; the current issue is 
> about making PortableRunner understand them.





[jira] [Work logged] (BEAM-4776) Java PortableRunner should support metrics

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4776?focusedWorklogId=347037&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347037
 ]

ASF GitHub Bot logged work on BEAM-4776:


Author: ASF GitHub Bot
Created on: 20/Nov/19 22:35
Start Date: 20/Nov/19 22:35
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #10105: [BEAM-4776] Add 
metrics support to Java PortableRunner
URL: https://github.com/apache/beam/pull/10105#issuecomment-556493528
 
 
   Run Java Flink PortableValidatesRunner Streaming
 



Issue Time Tracking
---

Worklog Id: (was: 347037)
Time Spent: 5.5h  (was: 5h 20m)

> Java PortableRunner should support metrics
> --
>
> Key: BEAM-4776
> URL: https://issues.apache.org/jira/browse/BEAM-4776
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Eugene Kirpichov
>Assignee: Michal Walenia
>Priority: Major
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> BEAM-4775 concerns adding metrics to the JobService API; the current issue is 
> about making PortableRunner understand them.





[jira] [Work logged] (BEAM-4776) Java PortableRunner should support metrics

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4776?focusedWorklogId=347036&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347036
 ]

ASF GitHub Bot logged work on BEAM-4776:


Author: ASF GitHub Bot
Created on: 20/Nov/19 22:35
Start Date: 20/Nov/19 22:35
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #10105: [BEAM-4776] Add 
metrics support to Java PortableRunner
URL: https://github.com/apache/beam/pull/10105#issuecomment-556493435
 
 
   Run Java Flink PortableValidatesRunner Batch
 



Issue Time Tracking
---

Worklog Id: (was: 347036)
Time Spent: 5h 20m  (was: 5h 10m)

> Java PortableRunner should support metrics
> --
>
> Key: BEAM-4776
> URL: https://issues.apache.org/jira/browse/BEAM-4776
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Eugene Kirpichov
>Assignee: Michal Walenia
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> BEAM-4775 concerns adding metrics to the JobService API; the current issue is 
> about making PortableRunner understand them.





[jira] [Work logged] (BEAM-8794) Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8794?focusedWorklogId=347029&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347029
 ]

ASF GitHub Bot logged work on BEAM-8794:


Author: ASF GitHub Bot
Created on: 20/Nov/19 22:11
Start Date: 20/Nov/19 22:11
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #10180: [BEAM-8794] 
Conditional aggregate project merge
URL: https://github.com/apache/beam/pull/10180#issuecomment-556466893
 
 
   Run sql postcommit
 



Issue Time Tracking
---

Worklog Id: (was: 347029)
Time Spent: 0.5h  (was: 20m)

> Projects should be handled by an IOPushDownRule before applying 
> AggregateProjectMergeRule
> -
>
> Key: BEAM-8794
> URL: https://issues.apache.org/jira/browse/BEAM-8794
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> It is more efficient to push down projected fields at the IO level (vs merging 
> with an Aggregate), when supported.
> When running queries like:
> {code:java}
> select SUM(score) as total_score from  group by name{code}
> Projects get merged with an Aggregate. As a result, the Calc (after an 
> IOSourceRel) projects all fields, so the BeamIOPushDown rule does not know 
> which fields can be dropped and drops none.





[jira] [Resolved] (BEAM-4663) Implement Cost calculations for Cost-Based Optimization (CBO)

2019-11-20 Thread Kai Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Jiang resolved BEAM-4663.
-
Fix Version/s: Not applicable
   Resolution: Invalid

> Implement Cost calculations for Cost-Based Optimization (CBO) 
> --
>
> Key: BEAM-4663
> URL: https://issues.apache.org/jira/browse/BEAM-4663
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kai Jiang
>Assignee: Kai Jiang
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> To support CBO, we should implement cost methods in each Beam*Rel.java, 
> starting with computeSelfCost(...) as our first step.





[jira] [Work logged] (BEAM-8746) Allow the local job service to work from inside docker

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8746?focusedWorklogId=347032&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347032
 ]

ASF GitHub Bot logged work on BEAM-8746:


Author: ASF GitHub Bot
Created on: 20/Nov/19 22:14
Start Date: 20/Nov/19 22:14
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #10161: [BEAM-8746] 
Make local job service accessible from external machines
URL: https://github.com/apache/beam/pull/10161#discussion_r348773235
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/local_job_service.py
 ##
 @@ -95,7 +95,7 @@ def create_beam_job(self, preparation_id, job_name, 
pipeline, options):
 
   def start_grpc_server(self, port=0):
 self._server = grpc.server(UnboundedThreadPoolExecutor())
-port = self._server.add_insecure_port('localhost:%d' % port)
+port = self._server.add_insecure_port('[::]:%d' % port)
 
 Review comment:
   > Could this not be handled in the subclass?
   
   Yeah, let me look into the best design for this.  It'd be nice if we took 
this opportunity to make more than the hostname configurable (i.e. provide a 
way to further configure the server before starting).  I'll propose an 
alternative later today. 
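
The one-line diff above swaps a loopback-only bind ('localhost') for the wildcard address ('[::]'), which is what makes the server reachable from outside the container. The following is a minimal Java sketch of the same distinction using only java.net. BindAddressSketch and its externallyReachable flag are hypothetical and only illustrate making the bind host configurable rather than hard-coded.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

final class BindAddressSketch {
  private BindAddressSketch() {}

  // Returns the address a server socket would be bound to.
  static InetSocketAddress bindAddress(boolean externallyReachable, int port) {
    if (externallyReachable) {
      // Wildcard address: accepts connections on every local interface.
      return new InetSocketAddress(port);
    }
    // Loopback only: connections from other hosts or containers are refused.
    return new InetSocketAddress(InetAddress.getLoopbackAddress(), port);
  }
}
```

Passing port 0 asks the operating system for a free ephemeral port once a socket is actually bound, mirroring the port=0 default in the Python method.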
   
 



Issue Time Tracking
---

Worklog Id: (was: 347032)
Time Spent: 1h 10m  (was: 1h)

> Allow the local job service to work from inside docker
> --
>
> Key: BEAM-8746
> URL: https://issues.apache.org/jira/browse/BEAM-8746
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently the connection is refused.  It's a simple fix. 





[jira] [Work logged] (BEAM-8691) Beam Dependency Update Request: com.google.cloud.bigtable:bigtable-client-core

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8691?focusedWorklogId=347031&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347031
 ]

ASF GitHub Bot logged work on BEAM-8691:


Author: ASF GitHub Bot
Created on: 20/Nov/19 22:12
Start Date: 20/Nov/19 22:12
Worklog Time Spent: 10m 
  Work Description: suztomo commented on pull request #10144: [BEAM-8691] 
Upgrading bigtable-client-core to latest 1.12.1
URL: https://github.com/apache/beam/pull/10144
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 347031)
Time Spent: 3h 10m  (was: 3h)

> Beam Dependency Update Request: com.google.cloud.bigtable:bigtable-client-core
> --
>
> Key: BEAM-8691
> URL: https://issues.apache.org/jira/browse/BEAM-8691
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
>  - 2019-11-15 19:39:51.523448 
> -
> Please consider upgrading the dependency 
> com.google.cloud.bigtable:bigtable-client-core. 
> The current version is 1.8.0. The latest version is 1.12.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-11-19 21:05:43.901882 
> -
> Please consider upgrading the dependency 
> com.google.cloud.bigtable:bigtable-client-core. 
> The current version is 1.8.0. The latest version is 1.12.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 





[jira] [Work logged] (BEAM-8691) Beam Dependency Update Request: com.google.cloud.bigtable:bigtable-client-core

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8691?focusedWorklogId=347030&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347030
 ]

ASF GitHub Bot logged work on BEAM-8691:


Author: ASF GitHub Bot
Created on: 20/Nov/19 22:12
Start Date: 20/Nov/19 22:12
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10144: [BEAM-8691] 
Upgrading bigtable-client-core to latest 1.12.1
URL: https://github.com/apache/beam/pull/10144#issuecomment-556468925
 
 
   Closing this for now while investigating the errors.
 



Issue Time Tracking
---

Worklog Id: (was: 347030)
Time Spent: 3h  (was: 2h 50m)

> Beam Dependency Update Request: com.google.cloud.bigtable:bigtable-client-core
> --
>
> Key: BEAM-8691
> URL: https://issues.apache.org/jira/browse/BEAM-8691
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
>  - 2019-11-15 19:39:51.523448 
> -
> Please consider upgrading the dependency 
> com.google.cloud.bigtable:bigtable-client-core. 
> The current version is 1.8.0. The latest version is 1.12.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-11-19 21:05:43.901882 
> -
> Please consider upgrading the dependency 
> com.google.cloud.bigtable:bigtable-client-core. 
> The current version is 1.8.0. The latest version is 1.12.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 





[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=347027&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347027
 ]

ASF GitHub Bot logged work on BEAM-8343:


Author: ASF GitHub Bot
Created on: 20/Nov/19 22:02
Start Date: 20/Nov/19 22:02
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #10060: [BEAM-8343] [SQL] 
Updated the cost model to favor IO with push-down.
URL: https://github.com/apache/beam/pull/10060#issuecomment-556457627
 
 
   CC: @TheNeuralBit 
 



Issue Time Tracking
---

Worklog Id: (was: 347027)
Time Spent: 7h  (was: 6h 50m)

> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
>  A proposed way to achieve that is by introducing an interface responsible 
> for identifying what portion(s) of a Calc can be moved down to the IO layer. 
> Also, add the following methods to the BeamSqlTable interface to pass the 
> necessary parameters to IO APIs:
>  - BeamSqlTableFilter constructFilter(List filter)
>  - ProjectSupport supportsProjects()
>  - PCollection buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
> List fieldNames)
>   
> ProjectSupport is an enum with the following options:
>  * NONE
>  * WITHOUT_FIELD_REORDERING
>  * WITH_FIELD_REORDERING
>  
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].
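
To make the three ProjectSupport options concrete, here is a rough Python sketch. The real interface is Java and lives in Beam SQL; everything below other than the enum option names is hypothetical (the plan_read helper, its arguments, and the exact semantics assigned to each option are assumptions for illustration, not taken from the design doc).

```python
from enum import Enum
from typing import List, Tuple


class ProjectSupport(Enum):
    # Option names come from the issue description above.
    NONE = 0
    WITHOUT_FIELD_REORDERING = 1
    WITH_FIELD_REORDERING = 2


def plan_read(schema: List[str], wanted: List[str],
              support: ProjectSupport) -> Tuple[List[str], bool]:
    """Hypothetical helper, not a Beam API.

    Returns (fields the IO would emit, whether a Calc must still project).
    """
    if support is ProjectSupport.NONE:
        # Assumption: the IO cannot drop anything, so it emits the full schema.
        return list(schema), list(schema) != list(wanted)
    if support is ProjectSupport.WITHOUT_FIELD_REORDERING:
        # Assumption: unused fields are dropped, but schema order is kept.
        kept = [f for f in schema if f in wanted]
        return kept, kept != list(wanted)
    # WITH_FIELD_REORDERING: assumption: the IO emits exactly what was asked.
    return list(wanted), False
```

Under these assumptions, a table with schema [id, name, score] asked for [score, name] only avoids the follow-up Calc when the IO reports WITH_FIELD_REORDERING.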



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8794) Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8794?focusedWorklogId=347026&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347026
 ]

ASF GitHub Bot logged work on BEAM-8794:


Author: ASF GitHub Bot
Created on: 20/Nov/19 22:01
Start Date: 20/Nov/19 22:01
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #10180: [BEAM-8794] 
Conditional aggregate project merge
URL: https://github.com/apache/beam/pull/10180#issuecomment-556456864
 
 
   CC: @TheNeuralBit 
 



Issue Time Tracking
---

Worklog Id: (was: 347026)
Time Spent: 20m  (was: 10m)

> Projects should be handled by an IOPushDownRule before applying 
> AggregateProjectMergeRule
> -
>
> Key: BEAM-8794
> URL: https://issues.apache.org/jira/browse/BEAM-8794
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> It is more efficient to push-down projected fields at an IO level (vs merging 
> with an Aggregate), when supported.
> When running queries like:
> {code:java}
> select SUM(score) as total_score from  group by name{code}
> Projects get merged with an aggregate; as a result, Calc (after an 
> IOSourceRel) projects all fields and the BeamIOPushDown rule does not know 
> what fields can be dropped, so it drops none.
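
The effect described above can be illustrated with a small plain-Python sketch. It is not Beam or Calcite code: the rows, the extra "id" and "comment" columns, and all helper names are made up for the example. It only shows that a SUM(score) ... GROUP BY name query needs two fields, so a source that is told about the projection can drop the rest before aggregating.

```python
from collections import defaultdict


def fields_needed(group_by, aggregated):
    """Fields the query actually touches, in first-seen order (hypothetical helper)."""
    seen = []
    for field in list(group_by) + list(aggregated):
        if field not in seen:
            seen.append(field)
    return seen


def read_with_projection(rows, fields):
    """Stand-in for an IO that supports project push-down."""
    return [{f: row[f] for f in fields} for row in rows]


def sum_by(rows, key, value):
    """Stand-in for the aggregate: SUM(value) GROUP BY key."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


# Made-up table contents; only "name" and "score" appear in the query.
ROWS = [
    {"id": 1, "name": "a", "score": 10, "comment": "x"},
    {"id": 2, "name": "b", "score": 5, "comment": "y"},
    {"id": 3, "name": "a", "score": 7, "comment": "z"},
]

needed = fields_needed(["name"], ["score"])
projected = read_with_projection(ROWS, needed)
total_score = sum_by(projected, "name", "score")
```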



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8691) Beam Dependency Update Request: com.google.cloud.bigtable:bigtable-client-core

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8691?focusedWorklogId=347022&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347022
 ]

ASF GitHub Bot logged work on BEAM-8691:


Author: ASF GitHub Bot
Created on: 20/Nov/19 21:50
Start Date: 20/Nov/19 21:50
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10144: [BEAM-8691] 
Upgrading bigtable-client-core to latest 1.12.1
URL: https://github.com/apache/beam/pull/10144#issuecomment-556444847
 
 
   Run Java PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 347022)
Time Spent: 2h 50m  (was: 2h 40m)

> Beam Dependency Update Request: com.google.cloud.bigtable:bigtable-client-core
> --
>
> Key: BEAM-8691
> URL: https://issues.apache.org/jira/browse/BEAM-8691
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
>  - 2019-11-15 19:39:51.523448 
> -
> Please consider upgrading the dependency 
> com.google.cloud.bigtable:bigtable-client-core. 
> The current version is 1.8.0. The latest version is 1.12.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-11-19 21:05:43.901882 
> -
> Please consider upgrading the dependency 
> com.google.cloud.bigtable:bigtable-client-core. 
> The current version is 1.8.0. The latest version is 1.12.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8691) Beam Dependency Update Request: com.google.cloud.bigtable:bigtable-client-core

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8691?focusedWorklogId=347021&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347021
 ]

ASF GitHub Bot logged work on BEAM-8691:


Author: ASF GitHub Bot
Created on: 20/Nov/19 21:50
Start Date: 20/Nov/19 21:50
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10144: [BEAM-8691] 
Upgrading bigtable-client-core to latest 1.12.1
URL: https://github.com/apache/beam/pull/10144#issuecomment-556444783
 
 
   @lukecwik The post commit seems to have detected an issue. Thank you for 
the advice.
   
   ```
Failure message was: java.lang.AbstractMethodError: 
com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.needsCredentials()Z
at com.google.api.gax.rpc.ClientContext.create(ClientContext.java:157)
at 
com.google.cloud.bigquery.storage.v1beta1.stub.EnhancedBigQueryStorageStub.create(EnhancedBigQueryStorageStub.java:89)
   ```
 



Issue Time Tracking
---

Worklog Id: (was: 347021)
Time Spent: 2h 40m  (was: 2.5h)

> Beam Dependency Update Request: com.google.cloud.bigtable:bigtable-client-core
> --
>
> Key: BEAM-8691
> URL: https://issues.apache.org/jira/browse/BEAM-8691
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
>  - 2019-11-15 19:39:51.523448 
> -
> Please consider upgrading the dependency 
> com.google.cloud.bigtable:bigtable-client-core. 
> The current version is 1.8.0. The latest version is 1.12.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-11-19 21:05:43.901882 
> -
> Please consider upgrading the dependency 
> com.google.cloud.bigtable:bigtable-client-core. 
> The current version is 1.8.0. The latest version is 1.12.1 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8492) Python typehints: don't try to strip_iterable from None

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8492?focusedWorklogId=347019&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347019
 ]

ASF GitHub Bot logged work on BEAM-8492:


Author: ASF GitHub Bot
Created on: 20/Nov/19 21:46
Start Date: 20/Nov/19 21:46
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9895: [BEAM-8492] Allow None, 
Optional return hints for DoFn.process and friends
URL: https://github.com/apache/beam/pull/9895#issuecomment-556440849
 
 
   I've fixed some linter errors. Should be okay to review now.
 



Issue Time Tracking
---

Worklog Id: (was: 347019)
Time Spent: 50m  (was: 40m)

> Python typehints: don't try to strip_iterable from None
> ---
>
> Key: BEAM-8492
> URL: https://issues.apache.org/jira/browse/BEAM-8492
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The return value of DoFn.process can be an iterable of elements or None.
> Handle the case when the output type hint of process is None.
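
As a rough illustration of the case described above, the sketch below uses only the standard typing module (Python 3.8+). The helper name element_type_of is hypothetical and this is not Beam's strip_iterable; in particular, mapping a None return hint to Any is an assumption made for the example.

```python
import collections.abc
from typing import Any, Iterable, List, Optional, Union, get_args, get_origin


def element_type_of(return_hint):
    """Hypothetical helper: element type implied by a process() return hint."""
    if return_hint is None or return_hint is type(None):
        # process() emits nothing. Assumption: report Any instead of failing.
        return Any
    origin = get_origin(return_hint)
    if origin is Union:
        # Optional[X] is Union[X, None]; look through it when X is unique.
        non_none = [a for a in get_args(return_hint) if a is not type(None)]
        if len(non_none) == 1:
            return element_type_of(non_none[0])
        raise TypeError("ambiguous union: %r" % (return_hint,))
    if isinstance(origin, type) and issubclass(origin, collections.abc.Iterable):
        args = get_args(return_hint)
        # An unparameterized iterable carries no element information.
        return args[0] if args else Any
    raise TypeError("%r is not an iterable hint" % (return_hint,))
```

With this shape, a hint of None no longer reaches the "strip the iterable" branch at all, which is the situation the issue title describes.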



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8658) Optionally set artifact staging port in FlinkUberJarJobServer

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8658?focusedWorklogId=347017&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347017
 ]

ASF GitHub Bot logged work on BEAM-8658:


Author: ASF GitHub Bot
Created on: 20/Nov/19 21:38
Start Date: 20/Nov/19 21:38
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #10163: [BEAM-8658] 
[BEAM-8781] Optionally set jar and artifact staging port …
URL: https://github.com/apache/beam/pull/10163#issuecomment-556403050
 
 
   Does anyone know if we publish nightly snapshots of the Flink job server jar 
anywhere?
 



Issue Time Tracking
---

Worklog Id: (was: 347017)
Time Spent: 40m  (was: 0.5h)

> Optionally set artifact staging port in FlinkUberJarJobServer
> -
>
> Key: BEAM-8658
> URL: https://issues.apache.org/jira/browse/BEAM-8658
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> In certain network environments, port forwarding is necessary for our GRPC 
> servers, such as the artifact staging server. Currently, the port for 
> FlinkUberJarJobServer's artifact staging server is chosen randomly (0). We 
> will need to let the user choose it if they are to forward that port.
> https://github.com/apache/beam/blob/802e7cd86024c21d7b2eeb45f0e7c8e370661610/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server.py#L129
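
For readers unfamiliar with the "(0)" above: asking the operating system to bind port 0 yields an arbitrary free port. The sketch below shows that behavior with the standard socket module only; it is not the gRPC artifact staging server, and the open_listener helper and its parameters are hypothetical.

```python
import socket


def open_listener(port=0, host="127.0.0.1"):
    """Hypothetical helper: bind a TCP listener and report the port in use.

    port=0 lets the OS pick any free port; a non-zero value pins the port,
    which is what a user would need in order to set up port forwarding.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(1)
    return sock, sock.getsockname()[1]
```

Calling open_listener() returns a different port on most runs, while open_listener(8098) either binds exactly 8098 or raises OSError if that port is taken (8098 is an arbitrary example value).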
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7390) Colab examples for aggregation transforms (Python)

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=347016&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347016
 ]

ASF GitHub Bot logged work on BEAM-7390:


Author: ASF GitHub Bot
Created on: 20/Nov/19 21:36
Start Date: 20/Nov/19 21:36
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on issue #10174: [BEAM-7390] Add 
code snippet for Sample
URL: https://github.com/apache/beam/pull/10174#issuecomment-556429558
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 347016)
Time Spent: 3h 20m  (was: 3h 10m)

> Colab examples for aggregation transforms (Python)
> --
>
> Key: BEAM-7390
> URL: https://issues.apache.org/jira/browse/BEAM-7390
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Merge aggregation Colabs into the transform catalog



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7390) Colab examples for aggregation transforms (Python)

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=347015&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347015
 ]

ASF GitHub Bot logged work on BEAM-7390:


Author: ASF GitHub Bot
Created on: 20/Nov/19 21:36
Start Date: 20/Nov/19 21:36
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on issue #10174: [BEAM-7390] Add 
code snippet for Sample
URL: https://github.com/apache/beam/pull/10174#issuecomment-556429558
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 347015)
Time Spent: 3h 10m  (was: 3h)

> Colab examples for aggregation transforms (Python)
> --
>
> Key: BEAM-7390
> URL: https://issues.apache.org/jira/browse/BEAM-7390
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Merge aggregation Colabs into the transform catalog



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8787) Python setup issues

2019-11-20 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978745#comment-16978745
 ] 

Tomo Suzuki commented on BEAM-8787:
---

I suspect our python3.6 installation is broken:


{noformat}
suztomo@suxtomo24:~$ which python3.6
/usr/bin/python3.6
suztomo@suxtomo24:~$ python3.6 --version
Python 3.6.8
suztomo@suxtomo24:~$ python3.6 -m pip install foo
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3/dist-packages/pip/__main__.py", line 16, in <module>
    from pip._internal import main as _main  # isort:skip # noqa
  File "/usr/lib/python3/dist-packages/pip/_internal/__init__.py", line 40, in <module>
    from pip._internal.cli.autocompletion import autocomplete
  File "/usr/lib/python3/dist-packages/pip/_internal/cli/autocompletion.py", line 8, in <module>
    from pip._internal.cli.main_parser import create_main_parser
  File "/usr/lib/python3/dist-packages/pip/_internal/cli/main_parser.py", line 8, in <module>
    from pip._internal.cli import cmdoptions
  File "/usr/lib/python3/dist-packages/pip/_internal/cli/cmdoptions.py", line 17, in <module>
    from pip._internal.locations import USER_CACHE_DIR, src_prefix
  File "/usr/lib/python3/dist-packages/pip/_internal/locations.py", line 10, in <module>
    from distutils import sysconfig as distutils_sysconfig
ImportError: cannot import name 'sysconfig'
{noformat}

Found http://b/119097564


> Python setup issues
> ---
>
> Key: BEAM-8787
> URL: https://issues.apache.org/jira/browse/BEAM-8787
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Affects Versions: 2.16.0
> Environment: debian x86 (gLinux)
>Reporter: Elliotte Rusty Harold
>Priority: Major
>
> This could be an issue with incomplete or inaccurate contributing docs. tl;dr: 
> `./gradlew check` fails on Debian after initial checkout.
> The docs say that one should first run:
> sudo apt-get install \
> openjdk-8-jdk \
> python-setuptools \
> python-pip \
> virtualenv
> but even after running this, pieces are missing. I'm still debugging exactly 
> what's missing, but the symptoms look like this:
> > Task :sdks:python:test-suites:tox:py35:setupVirtualenv FAILED
> The path python3.5 (from --python=python3.5) does not exist
> > Task :sdks:python:test-suites:tox:py36:setupVirtualenv FAILED
> [ant:fmpp] Traceback (most recent call last):
> [ant:fmpp]   File "/usr/lib/python3/dist-packages/virtualenv.py", line 25, in <module>
> [ant:fmpp]     import distutils.sysconfig
> [ant:fmpp] ModuleNotFoundError: No module named 'distutils.sysconfig'
> ...
> FAILURE: Build completed with 2 failures.
> 1: Task failed with an exception.
> ---
> * What went wrong:
> Execution failed for task ':sdks:python:test-suites:tox:py35:setupVirtualenv'.
> > Process 'command 'virtualenv'' finished with non-zero exit value 3
> Indeed there is no Python 3.5 on this system:
> gnome-user-share       python2.6
> gnome-vfs-2.0          python2.7
> gnupg                  python3
> gnupg2                 python3.6
> gold-ld                python3.7
> goobuntu-config-tools  python3.8
> But nowhere in the setup docs do we say that Python 3.5 is required to build 
> this. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8490) Python typehints: properly resolve empty dict type

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8490?focusedWorklogId=347014&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347014
 ]

ASF GitHub Bot logged work on BEAM-8490:


Author: ASF GitHub Bot
Created on: 20/Nov/19 21:30
Start Date: 20/Nov/19 21:30
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9894: [BEAM-8490] Fix 
instance_to_type for empty containers
URL: https://github.com/apache/beam/pull/9894#issuecomment-556422501
 
 
   R: @kennknowles 
   CC: @robertwb 
 



Issue Time Tracking
---

Worklog Id: (was: 347014)
Time Spent: 20m  (was: 10m)

> Python typehints: properly resolve empty dict type
> --
>
> Key: BEAM-8490
> URL: https://issues.apache.org/jira/browse/BEAM-8490
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently:
> {code}
> trivial_inference.instance_to_type({})
> {code}
> returns
> {code}
> Dict[Union[], Union[]]
> {code}
> instead of
> {code}
> Dict[Any,Any]
> {code}
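
The intended behavior can be sketched with the standard typing module. This is not Beam's trivial_inference.instance_to_type: the function name infer_type is hypothetical, it handles only dicts and lists, and it uses typing types instead of Beam's own typehints.

```python
from typing import Any, Dict, List, Union


def infer_type(obj):
    """Hypothetical sketch: infer a typing hint from a concrete instance."""
    if isinstance(obj, dict):
        if not obj:
            # Nothing to inspect: fall back to Any rather than an empty Union.
            return Dict[Any, Any]
        key_type = Union[tuple(infer_type(k) for k in obj)]
        value_type = Union[tuple(infer_type(v) for v in obj.values())]
        return Dict[key_type, value_type]
    if isinstance(obj, list):
        if not obj:
            return List[Any]
        return List[Union[tuple(infer_type(e) for e in obj)]]
    return type(obj)
```

The empty-container checks come first because a union built from zero member types carries no information (the standard typing.Union rejects it outright), which is why the fallback to Any is needed.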



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8794) Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8794?focusedWorklogId=347013&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347013
 ]

ASF GitHub Bot logged work on BEAM-8794:


Author: ASF GitHub Bot
Created on: 20/Nov/19 21:29
Start Date: 20/Nov/19 21:29
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #10180: [BEAM-8794] 
Conditional aggregate project merge
URL: https://github.com/apache/beam/pull/10180#issuecomment-556421814
 
 
   R: @apilloud 
   CC: @amaliujia 
 



Issue Time Tracking
---

Worklog Id: (was: 347013)
Remaining Estimate: 0h
Time Spent: 10m

> Projects should be handled by an IOPushDownRule before applying 
> AggregateProjectMergeRule
> -
>
> Key: BEAM-8794
> URL: https://issues.apache.org/jira/browse/BEAM-8794
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It is more efficient to push-down projected fields at an IO level (vs merging 
> with an Aggregate), when supported.
> When running queries like:
> {code:java}
> select SUM(score) as total_score from  group by name{code}
> Projects get merged with an aggregate; as a result, Calc (after an 
> IOSourceRel) projects all fields and the BeamIOPushDown rule does not know 
> what fields can be dropped, so it drops none.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8269) IOTypehints.from_callable doesn't convert native type hints to Beam

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8269?focusedWorklogId=347012&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347012
 ]

ASF GitHub Bot logged work on BEAM-8269:


Author: ASF GitHub Bot
Created on: 20/Nov/19 21:29
Start Date: 20/Nov/19 21:29
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9602: [BEAM-8269] Convert Py3 
type hints to Beam types
URL: https://github.com/apache/beam/pull/9602#issuecomment-556421486
 
 
   @robertwb PTAL, changes:
   - Commented out one test case with a TODO(BEAM-8492) comment.
   - Added a _LOGGER.info message when converting an unknown typing module type 
to typehints.Any.
 



Issue Time Tracking
---

Worklog Id: (was: 347012)
Time Spent: 1h 10m  (was: 1h)

> IOTypehints.from_callable doesn't convert native type hints to Beam
> ---
>
> Key: BEAM-8269
> URL: https://issues.apache.org/jira/browse/BEAM-8269
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Users typically write type hints using typing module types. We should allow 
> that, but internally convert these types to Beam module types for now.
> In the future, Beam should stop using these internal types (BEAM-8156).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8794) Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule

2019-11-20 Thread Kirill Kozlov (Jira)
Kirill Kozlov created BEAM-8794:
---

 Summary: Projects should be handled by an IOPushDownRule before 
applying AggregateProjectMergeRule
 Key: BEAM-8794
 URL: https://issues.apache.org/jira/browse/BEAM-8794
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Kirill Kozlov
Assignee: Kirill Kozlov


It is more efficient to push-down projected fields at an IO level (vs merging 
with an Aggregate), when supported.

When running queries like:
{code:java}
select SUM(score) as total_score from  group by name{code}
Projects get merged with an aggregate; as a result, Calc (after an IOSourceRel) 
projects all fields and the BeamIOPushDown rule does not know what fields can be 
dropped, so it drops none.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-4132) Element type inference doesn't work for multi-output DoFns

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4132?focusedWorklogId=347010&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347010
 ]

ASF GitHub Bot logged work on BEAM-4132:


Author: ASF GitHub Bot
Created on: 20/Nov/19 21:21
Start Date: 20/Nov/19 21:21
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10142: [BEAM-4132] Set 
multi-output PCollections types to Any
URL: https://github.com/apache/beam/pull/10142#issuecomment-556412724
 
 
   R: @kennknowles 
   CC: @robertwb 
 



Issue Time Tracking
---

Worklog Id: (was: 347010)
Time Spent: 2h 40m  (was: 2.5h)

> Element type inference doesn't work for multi-output DoFns
> --
>
> Key: BEAM-4132
> URL: https://issues.apache.org/jira/browse/BEAM-4132
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.4.0
>Reporter: Chuan Yu Foo
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> TLDR: if you have a multi-output DoFn, then the non-main PCollections will 
> incorrectly have their element types set to None. This affects type checking 
> for pipelines involving these PCollections.
> Minimal example:
> {code}
> import apache_beam as beam
> class TripleDoFn(beam.DoFn):
>   def process(self, elem):
>     yield elem
>     if elem % 2 == 0:
>       yield beam.pvalue.TaggedOutput('ten_times', elem * 10)
>     if elem % 3 == 0:
>       yield beam.pvalue.TaggedOutput('hundred_times', elem * 100)
>   
> @beam.typehints.with_input_types(int)
> @beam.typehints.with_output_types(int)
> class MultiplyBy(beam.DoFn):
>   def __init__(self, multiplier):
>     self._multiplier = multiplier
>   def process(self, elem):
>     yield elem * self._multiplier
>   
> def main():
>   with beam.Pipeline() as p:
>     x, a, b = (
>       p
>       | 'Create' >> beam.Create([1, 2, 3])
>       | 'TripleDo' >> beam.ParDo(TripleDoFn()).with_outputs(
>           'ten_times', 'hundred_times', main='main_output'))
>     _ = a | 'MultiplyBy2' >> beam.ParDo(MultiplyBy(2))
> if __name__ == '__main__':
>   main()
> {code}
> Running this yields the following error:
> {noformat}
> apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 
> 'MultiplyBy2': requires  but got None for elem
> {noformat}
> Replacing {{a}} with {{b}} yields the same error. Replacing {{a}} with {{x}} 
> instead yields the following error:
> {noformat}
> apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 
> 'MultiplyBy2': requires  but got Union[TaggedOutput, int] for elem
> {noformat}
> I would expect Beam to correctly infer that {{a}} and {{b}} have element 
> types of {{int}} rather than {{None}}, and I would also expect Beam to 
> correctly figure out that the element types of {{x}} are compatible with 
> {{int}}.
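
To see which values actually land in each output of the example above, here is a plain-Python simulation that does not import Beam. The TaggedOutput dataclass and the route helper are stand-ins written for this sketch, not Beam classes; only the tag names and the process logic mirror the reporter's code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any


@dataclass
class TaggedOutput:
    """Stand-in for beam.pvalue.TaggedOutput (not the real class)."""
    tag: str
    value: Any


def triple_process(elem):
    # Same logic as TripleDoFn.process in the report.
    yield elem
    if elem % 2 == 0:
        yield TaggedOutput('ten_times', elem * 10)
    if elem % 3 == 0:
        yield TaggedOutput('hundred_times', elem * 100)


def route(elements, process, main='main_output'):
    """Hypothetical helper: split process() results by tag, as ParDo would."""
    outputs = defaultdict(list)
    for elem in elements:
        for out in process(elem):
            if isinstance(out, TaggedOutput):
                outputs[out.tag].append(out.value)
            else:
                outputs[main].append(out)
    return dict(outputs)


result = route([1, 2, 3], triple_process)
```

Every routed value here is an int, which is the inference the reporter expects; the process generator itself yields a mix of ints and TaggedOutput wrappers, matching the Union[TaggedOutput, int] seen in the second error.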



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines

2019-11-20 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov resolved BEAM-8343.
-
Fix Version/s: 2.18.0
   Resolution: Fixed

> Add means for IO APIs to support predicate and/or project push-down when 
> running SQL pipelines
> --
>
> Key: BEAM-8343
> URL: https://issues.apache.org/jira/browse/BEAM-8343
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> The objective is to create a universal way for Beam SQL IO APIs to support 
> predicate/project push-down.
>  A proposed way to achieve that is by introducing an interface responsible 
> for identifying what portion(s) of a Calc can be moved down to the IO layer. 
> Also, add the following methods to the BeamSqlTable interface to pass the 
> necessary parameters to IO APIs:
>  - BeamSqlTableFilter constructFilter(List filter)
>  - ProjectSupport supportsProjects()
>  - PCollection buildIOReader(PBegin begin, BeamSqlTableFilter filters, 
> List fieldNames)
>   
> ProjectSupport is an enum with the following options:
>  * NONE
>  * WITHOUT_FIELD_REORDERING
>  * WITH_FIELD_REORDERING
>  
> Design doc 
> [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8658) Optionally set artifact staging port in FlinkUberJarJobServer

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8658?focusedWorklogId=347008&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347008
 ]

ASF GitHub Bot logged work on BEAM-8658:


Author: ASF GitHub Bot
Created on: 20/Nov/19 21:12
Start Date: 20/Nov/19 21:12
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #10163: [BEAM-8658] 
[BEAM-8781] Optionally set jar and artifact staging port …
URL: https://github.com/apache/beam/pull/10163#issuecomment-556403050
 
 
   For my reference, does anyone know if we publish nightly snapshots of the 
Flink job server jar anywhere?
 



Issue Time Tracking
---

Worklog Id: (was: 347008)
Time Spent: 0.5h  (was: 20m)

> Optionally set artifact staging port in FlinkUberJarJobServer
> -
>
> Key: BEAM-8658
> URL: https://issues.apache.org/jira/browse/BEAM-8658
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In certain network environments, port forwarding is necessary for our GRPC 
> servers, such as the artifact staging server. Currently, the port for 
> FlinkUberJarJobServer's artifact staging server is chosen randomly (0). We 
> will need to let the user choose it if they are to forward that port.
> https://github.com/apache/beam/blob/802e7cd86024c21d7b2eeb45f0e7c8e370661610/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server.py#L129
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-4776) Java PortableRunner should support metrics

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4776?focusedWorklogId=347007&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347007
 ]

ASF GitHub Bot logged work on BEAM-4776:


Author: ASF GitHub Bot
Created on: 20/Nov/19 21:12
Start Date: 20/Nov/19 21:12
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #10105: [BEAM-4776] Add 
metrics support to Java PortableRunner
URL: https://github.com/apache/beam/pull/10105#issuecomment-556402486
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 347007)
Time Spent: 5h  (was: 4h 50m)

> Java PortableRunner should support metrics
> --
>
> Key: BEAM-4776
> URL: https://issues.apache.org/jira/browse/BEAM-4776
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Eugene Kirpichov
>Assignee: Michal Walenia
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> BEAM-4775 concerns adding metrics to the JobService API; the current issue is 
> about making PortableRunner understand them.





[jira] [Work logged] (BEAM-8747) Remove Unused non-vendored Guava compile dependencies

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8747?focusedWorklogId=347006&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347006
 ]

ASF GitHub Bot logged work on BEAM-8747:


Author: ASF GitHub Bot
Created on: 20/Nov/19 21:10
Start Date: 20/Nov/19 21:10
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10172: [BEAM-8747] Guava 
dependency cleanup
URL: https://github.com/apache/beam/pull/10172#issuecomment-556400195
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 347006)
Time Spent: 50m  (was: 40m)

> Remove Unused non-vendored Guava compile dependencies
> -
>
> Key: BEAM-8747
> URL: https://issues.apache.org/jira/browse/BEAM-8747
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Attachments: Guava used as fully-qualified class name.png
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> [~kenn] says:
> BeamModulePlugin just contains lists of versions to ease coordination across 
> Beam modules, but mostly does not create dependencies. Most of Beam's modules 
> only depend on a few things there. For example Guava is not a core 
> dependency, but here is where it is actually depended upon:
> $ find . -name build.gradle | xargs grep library.java.guava
> ./sdks/java/core/build.gradle:  shadowTest library.java.guava_testlib
> ./sdks/java/extensions/sql/jdbc/build.gradle:  compile library.java.guava
> ./sdks/java/io/google-cloud-platform/build.gradle:  compile library.java.guava
> ./sdks/java/io/kinesis/build.gradle:  testCompile library.java.guava_testlib
> These results appear to be misleading. Grepping for 'import 
> com.google.common', I see this as the actual state of things:
>  - GCP connector does not appear to actually depend on Guava in compile scope
>  - The Beam SQL JDBC driver does not appear to actually depend on Guava in 
> compile scope
>  - The Dataflow Java worker does depend on Guava at compile scope but has 
> incorrect dependencies (and it probably shouldn't)
>  - KinesisIO does depend on Guava at compile scope but has incorrect 
> dependencies (Kinesis libs have Guava on API surface so it is OK here, but 
> should be correctly declared)
>  - ZetaSQL translator does depend on Guava at compile scope but has incorrect 
> dependencies (ZetaSQL has it on API surface so it is OK here, but should be 
> correctly declared)
> We used to have an analysis that prevented this class of error.
> Once the errors are fixed, the guava_version is simply a version that we have 
> discovered that seems to work for both Kinesis and ZetaSQL, libraries we do 
> not control. Kinesis producer is built against 18.0. Kinesis client against 
> 26.0-jre. ZetaSQL against 26.0-android.
> (or maybe I messed up in my analysis)
> Kenn
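The "grep for imports" check quoted above can be sketched in Python as follows. This is only an illustration of the technique, not the analysis Beam used; the function name and the regular expression are assumptions made for this example.

```python
import os
import re

# Matches "import com.google.common..." (optionally static) at the start of a
# line. Vendored Guava under org.apache.beam.vendor does not match.
GUAVA_IMPORT = re.compile(
    r'^\s*import\s+(static\s+)?com\.google\.common\.', re.MULTILINE)


def modules_importing_guava(root):
    """Returns the sorted directories under root whose .java files import Guava."""
    hits = set()
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith('.java'):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding='utf-8', errors='replace') as f:
                if GUAVA_IMPORT.search(f.read()):
                    hits.add(dirpath)
    return sorted(hits)
```

Like the grep, this only sees import statements, so it misses classes that are referenced by their fully-qualified name.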





[jira] [Updated] (BEAM-8793) installGcpTest task flakes

2019-11-20 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8793:
--
Summary: installGcpTest task flakes  (was: 
:sdks:python:test-suites:direct:py3x:installGcpTest flakes)

> installGcpTest task flakes
> --
>
> Key: BEAM-8793
> URL: https://issues.apache.org/jira/browse/BEAM-8793
> Project: Beam
>  Issue Type: Improvement
>  Components: test-failures
>Reporter: Kyle Weaver
>Priority: Major
>
> I've also seen this happen with 
> :sdks:python:test-suites:portable:py37:installGcpTest.
> 11:01:38 > Task :sdks:python:test-suites:direct:py35:installGcpTest FAILED
> 11:01:38 Obtaining 
> file:///home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python
> 11:01:38 ERROR: Command errored out with exit status 1:
> 11:01:38  command: 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/bin/python3.5
>  -c 'import sys, setuptools, tokenize; sys.argv[0] = 
> '"'"'/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py'"'"';
>  
> __file__='"'"'/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py'"'"';f=getattr(tokenize,
>  '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', 
> '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' 
> egg_info
> 11:01:38  cwd: 
> /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/
> 11:01:38 Complete output (37 lines):
> 11:01:38 Traceback (most recent call last):
> 11:01:38   File "<string>", line 1, in <module>
> 11:01:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py",
>  line 264, in <module>
> 11:01:38 'test': generate_protos_first(test),
> 11:01:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/__init__.py",
>  line 144, in setup
> 11:01:38 _install_setup_requires(attrs)
> 11:01:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/__init__.py",
>  line 139, in _install_setup_requires
> 11:01:38 dist.fetch_build_eggs(dist.setup_requires)
> 11:01:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/dist.py",
>  line 720, in fetch_build_eggs
> 11:01:38 replace_conflicting=True,
> 11:01:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py",
>  line 782, in resolve
> 11:01:38 replace_conflicting=replace_conflicting
> 11:01:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py",
>  line 1065, in best_match
> 11:01:38 return self.obtain(req, installer)
> 11:01:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py",
>  line 1077, in obtain
> 11:01:38 return installer(requirement)
> 11:01:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/dist.py",
>  line 787, in fetch_build_egg
> 11:01:38 return cmd.easy_install(req)
> 11:01:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py",
>  line 679, in easy_install
> 11:01:38 return self.install_item(spec, dist.location, tmpdir, deps)
> 11:01:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py",
>  line 705, in install_item
> 11:01:38 dists = self.install_eggs(spec, download, tmpdir)
> 11:01:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py",
>  line 855, in install_eggs
> 11:01:38 return [self.install_wheel(dist_filename, tmpdir)]
> 11:01:38   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py",
>  line 1073, in install_wheel
> 11:01:38 os.path.dirname(destination)
> 11:01:38   File "/usr/lib/python3.5/distutils/cmd.py"

[jira] [Work logged] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8651?focusedWorklogId=347005&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347005
 ]

ASF GitHub Bot logged work on BEAM-8651:


Author: ASF GitHub Bot
Created on: 20/Nov/19 21:03
Start Date: 20/Nov/19 21:03
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10167: [BEAM-8651] Guard 
pickling operations with a lock to prevent race condition in module imports.
URL: https://github.com/apache/beam/pull/10167#issuecomment-556392796
 
 
   Thanks, @ibzib ! All tests besides Direct Runner tests passed in the 
previous run: https://scans.gradle.com/s/eptpl337kz4ck. 
 



Issue Time Tracking
---

Worklog Id: (was: 347005)
Time Spent: 2h  (was: 1h 50m)

> Python 3 portable pipelines sometimes fail with errors in 
> StockUnpickler.find_class()
> -
>
> Key: BEAM-8651
> URL: https://issues.apache.org/jira/browse/BEAM-8651
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: beam8651.py
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Several Beam users [1,2] reported an error which happens on Python 3 in 
> StockUnpickler.find_class.
> So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on Flink 
> and Dataflow runners. On Dataflow runner so far I have seen this in streaming 
> pipelines only, which use portable SDK worker.
> Typical stack trace:
> {noformat}
>   File "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1148, in _create_pardo_operation
>     dofn_data = pickler.loads(serialized_fn)
>   File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, in loads
>     return dill.loads(s)
>   File "python3.5/site-packages/dill/_dill.py", line 317, in loads
>     return load(file, ignore)
>   File "python3.5/site-packages/dill/_dill.py", line 305, in load
>     obj = pik.load()
>   File "python3.5/site-packages/dill/_dill.py", line 474, in find_class
>     return StockUnpickler.find_class(self, module, name)
> AttributeError: Can't get attribute 'ClassName' on  'python3.5/site-packages/filename.py'>
> {noformat}
> According to Guenther from [1]:
> {quote}
> This looks exactly like a race condition that we've encountered on Python
> 3.7.1: There's a bug in some older 3.7.x releases that breaks the
> thread-safety of the unpickler, as concurrent unpickle threads can access a
> module before it has been fully imported. See
> https://bugs.python.org/issue34572 for more information.
> The traceback shows a Python 3.6 venv so this could be a different issue
> (the unpickle bug was introduced in version 3.7). If it's the same bug then
> upgrading to Python 3.7.3 or higher should fix that issue. One potential
> workaround is to ensure that all of the modules get imported during the
> initialization of the sdk_worker, as this bug only affects imports done by
> the unpickler.
> {quote}
> Opening this for visibility. Current open questions are:
> 1. Find a minimal example to reproduce this issue.
> 2. Figure out whether users are still affected by this issue on Python 3.7.3.
> 3. Communicate a workaround for 3.5 and 3.6 users affected by this.
> [1] 
> https://lists.apache.org/thread.html/5581ddfcf6d2ae10d25b834b8a61ebee265ffbcf650c6ec8d1e69408@%3Cdev.beam.apache.org%3E
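The pull request title above describes guarding pickling operations with a lock. A minimal sketch of that idea, using the standard-library `pickle` module rather than `dill` and not taken from the actual PR, might look like this. The wrapper names `dumps` and `loads` and the module-level lock are assumptions for illustration.

```python
import pickle
import threading

# Re-entrant, so a nested loads() on the same thread does not deadlock.
_pickle_lock = threading.RLock()


def dumps(obj):
    """Serializes obj while holding the shared lock."""
    with _pickle_lock:
        return pickle.dumps(obj)


def loads(data):
    """Deserializes data while holding the shared lock.

    Only one thread unpickles at a time, so imports triggered by the
    unpickler cannot interleave across threads.
    """
    with _pickle_lock:
        return pickle.loads(data)
```

The trade-off is that all (de)serialization is serialized across threads, which costs some parallelism in exchange for avoiding the race.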





[jira] [Updated] (BEAM-8793) :sdks:python:test-suites:direct:py3x:installGcpTest flakes

2019-11-20 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8793:
--
Description: 
I've also seen this happen with 
:sdks:python:test-suites:portable:py37:installGcpTest.

11:01:38 > Task :sdks:python:test-suites:direct:py35:installGcpTest FAILED
11:01:38 Obtaining 
file:///home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python
11:01:38 ERROR: Command errored out with exit status 1:
11:01:38  command: 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/bin/python3.5
 -c 'import sys, setuptools, tokenize; sys.argv[0] = 
'"'"'/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py'"'"';
 
__file__='"'"'/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py'"'"';f=getattr(tokenize,
 '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', 
'"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info
11:01:38  cwd: 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/
11:01:38 Complete output (37 lines):
11:01:38 Traceback (most recent call last):
11:01:38   File "<string>", line 1, in <module>
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py",
 line 264, in <module>
11:01:38 'test': generate_protos_first(test),
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/__init__.py",
 line 144, in setup
11:01:38 _install_setup_requires(attrs)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/__init__.py",
 line 139, in _install_setup_requires
11:01:38 dist.fetch_build_eggs(dist.setup_requires)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/dist.py",
 line 720, in fetch_build_eggs
11:01:38 replace_conflicting=True,
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py",
 line 782, in resolve
11:01:38 replace_conflicting=replace_conflicting
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py",
 line 1065, in best_match
11:01:38 return self.obtain(req, installer)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py",
 line 1077, in obtain
11:01:38 return installer(requirement)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/dist.py",
 line 787, in fetch_build_egg
11:01:38 return cmd.easy_install(req)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py",
 line 679, in easy_install
11:01:38 return self.install_item(spec, dist.location, tmpdir, deps)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py",
 line 705, in install_item
11:01:38 dists = self.install_eggs(spec, download, tmpdir)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py",
 line 855, in install_eggs
11:01:38 return [self.install_wheel(dist_filename, tmpdir)]
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py",
 line 1073, in install_wheel
11:01:38 os.path.dirname(destination)
11:01:38   File "/usr/lib/python3.5/distutils/cmd.py", line 336, in execute
11:01:38 util.execute(func, args, msg, dry_run=self.dry_run)
11:01:38   File "/usr/lib/python3.5/distutils/util.py", line 301, in execute
11:01:38 func(*args)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/wheel.py",
 line 101, in install_as_egg
11:01:38 self._install_as_egg(destination_eggdir, zf)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build

[jira] [Work logged] (BEAM-8746) Allow the local job service to work from inside docker

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8746?focusedWorklogId=347003&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347003
 ]

ASF GitHub Bot logged work on BEAM-8746:


Author: ASF GitHub Bot
Created on: 20/Nov/19 21:01
Start Date: 20/Nov/19 21:01
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #10161: [BEAM-8746] Make 
local job service accessible from external machines
URL: https://github.com/apache/beam/pull/10161#discussion_r348742690
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/local_job_service.py
 ##
 @@ -95,7 +95,7 @@ def create_beam_job(self, preparation_id, job_name, pipeline, options):
 
   def start_grpc_server(self, port=0):
     self._server = grpc.server(UnboundedThreadPoolExecutor())
-    port = self._server.add_insecure_port('localhost:%d' % port)
+    port = self._server.add_insecure_port('[::]:%d' % port)
 
 Review comment:
   Could this not be handled in the subclass? I think the notion of the 
`LocalJobServer` is not to listen on all interfaces. We could make the bind 
address configurable.
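
One way to read the suggestion above is to keep `localhost` as the default and let callers or subclasses override the bind host. The sketch below only builds the address string; `JobServerSketch` and its `bind_host` parameter are hypothetical and are not Beam's actual classes.

```python
class JobServerSketch(object):
    """Hypothetical stand-in for a job server; not part of Beam."""

    def __init__(self, bind_host='localhost'):
        # 'localhost' keeps the server local; '[::]' listens on all interfaces.
        self._bind_host = bind_host

    def bind_address(self, port=0):
        # Same '%s:%d' style as the diff, with the host made configurable.
        return '%s:%d' % (self._bind_host, port)
```

The resulting string is what would be passed to `add_insecure_port` in the diff, so a server that must be reachable from outside a container could be constructed with `JobServerSketch('[::]')` while the default stays local.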
 



Issue Time Tracking
---

Worklog Id: (was: 347003)
Time Spent: 1h  (was: 50m)

> Allow the local job service to work from inside docker
> --
>
> Key: BEAM-8746
> URL: https://issues.apache.org/jira/browse/BEAM-8746
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently the connection is refused.  It's a simple fix. 





[jira] [Updated] (BEAM-8792) Bring back the names of the runtime metrics to "runtime"

2019-11-20 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-8792:

Status: Open  (was: Triage Needed)

> Bring back the names of the runtime metrics to "runtime"
> 
>
> Key: BEAM-8792
> URL: https://issues.apache.org/jira/browse/BEAM-8792
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Priority: Major
>
> Since this PR (https://github.com/apache/beam/pull/8941), the names of the 
> runtime metrics defined in Python load test pipelines have changed to a 
> combination of the metrics namespace and "runtime". This made querying the 
> BigQuery table containing the results more difficult. The goal is to bring 
> the metric names back to "runtime" to stay consistent with the previous records.





[jira] [Work logged] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8651?focusedWorklogId=347002&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347002
 ]

ASF GitHub Bot logged work on BEAM-8651:


Author: ASF GitHub Bot
Created on: 20/Nov/19 21:01
Start Date: 20/Nov/19 21:01
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #10167: [BEAM-8651] Guard 
pickling operations with a lock to prevent race condition in module imports.
URL: https://github.com/apache/beam/pull/10167#issuecomment-556390364
 
 
   Filed https://issues.apache.org/jira/browse/BEAM-8793 for the test flake.
 



Issue Time Tracking
---

Worklog Id: (was: 347002)
Time Spent: 1h 40m  (was: 1.5h)

> Python 3 portable pipelines sometimes fail with errors in 
> StockUnpickler.find_class()
> -
>
> Key: BEAM-8651
> URL: https://issues.apache.org/jira/browse/BEAM-8651
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: beam8651.py
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Several Beam users [1,2] reported an error which happens on Python 3 in 
> StockUnpickler.find_class.
> So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on Flink 
> and Dataflow runners. On Dataflow runner so far I have seen this in streaming 
> pipelines only, which use portable SDK worker.
> Typical stack trace:
> {noformat}
>   File "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1148, in _create_pardo_operation
>     dofn_data = pickler.loads(serialized_fn)
>   File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, in loads
>     return dill.loads(s)
>   File "python3.5/site-packages/dill/_dill.py", line 317, in loads
>     return load(file, ignore)
>   File "python3.5/site-packages/dill/_dill.py", line 305, in load
>     obj = pik.load()
>   File "python3.5/site-packages/dill/_dill.py", line 474, in find_class
>     return StockUnpickler.find_class(self, module, name)
> AttributeError: Can't get attribute 'ClassName' on  'python3.5/site-packages/filename.py'>
> {noformat}
> According to Guenther from [1]:
> {quote}
> This looks exactly like a race condition that we've encountered on Python
> 3.7.1: There's a bug in some older 3.7.x releases that breaks the
> thread-safety of the unpickler, as concurrent unpickle threads can access a
> module before it has been fully imported. See
> https://bugs.python.org/issue34572 for more information.
> The traceback shows a Python 3.6 venv so this could be a different issue
> (the unpickle bug was introduced in version 3.7). If it's the same bug then
> upgrading to Python 3.7.3 or higher should fix that issue. One potential
> workaround is to ensure that all of the modules get imported during the
> initialization of the sdk_worker, as this bug only affects imports done by
> the unpickler.
> {quote}
> Opening this for visibility. Current open questions are:
> 1. Find a minimal example to reproduce this issue.
> 2. Figure out whether users are still affected by this issue on Python 3.7.3.
> 3. Communicate a workaround for 3.5 and 3.6 users affected by this.
> [1] 
> https://lists.apache.org/thread.html/5581ddfcf6d2ae10d25b834b8a61ebee265ffbcf650c6ec8d1e69408@%3Cdev.beam.apache.org%3E
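The workaround mentioned in the quote above, importing all needed modules during worker initialization, could be sketched as follows. The helper name `preimport` is hypothetical, and how it would be wired into the SDK worker's startup is not shown here.

```python
import importlib


def preimport(module_names):
    """Imports each named module up front and returns the names that failed.

    Once a module is fully imported on the main thread, later unpickling in
    worker threads finds it already present in sys.modules.
    """
    failed = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            failed.append(name)
    return failed
```

This assumes the set of modules referenced by pickled functions is known ahead of time, which may not hold for every pipeline.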





[jira] [Work logged] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()

2019-11-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8651?focusedWorklogId=347004&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347004
 ]

ASF GitHub Bot logged work on BEAM-8651:


Author: ASF GitHub Bot
Created on: 20/Nov/19 21:02
Start Date: 20/Nov/19 21:02
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #10167: [BEAM-8651] Guard 
pickling operations with a lock to prevent race condition in module imports.
URL: https://github.com/apache/beam/pull/10167#issuecomment-556391540
 
 
   Run Python 3.5 PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 347004)
Time Spent: 1h 50m  (was: 1h 40m)

> Python 3 portable pipelines sometimes fail with errors in 
> StockUnpickler.find_class()
> -
>
> Key: BEAM-8651
> URL: https://issues.apache.org/jira/browse/BEAM-8651
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: beam8651.py
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Several Beam users [1,2] reported an error which happens on Python 3 in 
> StockUnpickler.find_class.
> So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on Flink 
> and Dataflow runners. On Dataflow runner so far I have seen this in streaming 
> pipelines only, which use portable SDK worker.
> Typical stack trace:
> {noformat}
>   File "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1148, in _create_pardo_operation
>     dofn_data = pickler.loads(serialized_fn)
>   File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, in loads
>     return dill.loads(s)
>   File "python3.5/site-packages/dill/_dill.py", line 317, in loads
>     return load(file, ignore)
>   File "python3.5/site-packages/dill/_dill.py", line 305, in load
>     obj = pik.load()
>   File "python3.5/site-packages/dill/_dill.py", line 474, in find_class
>     return StockUnpickler.find_class(self, module, name)
> AttributeError: Can't get attribute 'ClassName' on  'python3.5/site-packages/filename.py'>
> {noformat}
> According to Guenther from [1]:
> {quote}
> This looks exactly like a race condition that we've encountered on Python
> 3.7.1: There's a bug in some older 3.7.x releases that breaks the
> thread-safety of the unpickler, as concurrent unpickle threads can access a
> module before it has been fully imported. See
> https://bugs.python.org/issue34572 for more information.
> The traceback shows a Python 3.6 venv so this could be a different issue
> (the unpickle bug was introduced in version 3.7). If it's the same bug then
> upgrading to Python 3.7.3 or higher should fix that issue. One potential
> workaround is to ensure that all of the modules get imported during the
> initialization of the sdk_worker, as this bug only affects imports done by
> the unpickler.
> {quote}
> Opening this for visibility. Current open questions are:
> 1. Find a minimal example to reproduce this issue.
> 2. Figure out whether users are still affected by this issue on Python 3.7.3.
> 3. Communicate a workaround for 3.5 and 3.6 users affected by this.
> [1] 
> https://lists.apache.org/thread.html/5581ddfcf6d2ae10d25b834b8a61ebee265ffbcf650c6ec8d1e69408@%3Cdev.beam.apache.org%3E





[jira] [Created] (BEAM-8793) :sdks:python:test-suites:direct:py3x:installGcpTest flakes

2019-11-20 Thread Kyle Weaver (Jira)
Kyle Weaver created BEAM-8793:
-

 Summary: :sdks:python:test-suites:direct:py3x:installGcpTest flakes
 Key: BEAM-8793
 URL: https://issues.apache.org/jira/browse/BEAM-8793
 Project: Beam
  Issue Type: Improvement
  Components: test-failures
Reporter: Kyle Weaver


11:01:38 > Task :sdks:python:test-suites:direct:py35:installGcpTest FAILED
11:01:38 Obtaining 
file:///home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python
11:01:38 ERROR: Command errored out with exit status 1:
11:01:38  command: 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/bin/python3.5
 -c 'import sys, setuptools, tokenize; sys.argv[0] = 
'"'"'/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py'"'"';
 
__file__='"'"'/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py'"'"';f=getattr(tokenize,
 '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', 
'"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info
11:01:38  cwd: 
/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/
11:01:38 Complete output (37 lines):
11:01:38 Traceback (most recent call last):
11:01:38   File "<string>", line 1, in <module>
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py",
 line 264, in <module>
11:01:38 'test': generate_protos_first(test),
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/__init__.py",
 line 144, in setup
11:01:38 _install_setup_requires(attrs)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/__init__.py",
 line 139, in _install_setup_requires
11:01:38 dist.fetch_build_eggs(dist.setup_requires)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/dist.py",
 line 720, in fetch_build_eggs
11:01:38 replace_conflicting=True,
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py",
 line 782, in resolve
11:01:38 replace_conflicting=replace_conflicting
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py",
 line 1065, in best_match
11:01:38 return self.obtain(req, installer)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py",
 line 1077, in obtain
11:01:38 return installer(requirement)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/dist.py",
 line 787, in fetch_build_egg
11:01:38 return cmd.easy_install(req)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py",
 line 679, in easy_install
11:01:38 return self.install_item(spec, dist.location, tmpdir, deps)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py",
 line 705, in install_item
11:01:38 dists = self.install_eggs(spec, download, tmpdir)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py",
 line 855, in install_eggs
11:01:38 return [self.install_wheel(dist_filename, tmpdir)]
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py",
 line 1073, in install_wheel
11:01:38 os.path.dirname(destination)
11:01:38   File "/usr/lib/python3.5/distutils/cmd.py", line 336, in execute
11:01:38 util.execute(func, args, msg, dry_run=self.dry_run)
11:01:38   File "/usr/lib/python3.5/distutils/util.py", line 301, in execute
11:01:38 func(*args)
11:01:38   File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/wheel.py",
 line 101, in install_as_egg
11:01:38
