[jira] [Commented] (BEAM-6860) WriteToText crash with "GlobalWindow -> ._IntervalWindowBase"

2020-04-09 Thread Josh Peng (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080225#comment-17080225
 ] 

Josh Peng commented on BEAM-6860:
-

[~bhulette] In the example code the input just came from a text file, but 
in my real-world scenario I am reading from PubSub, which is why I wanted 
to window the data before WriteToText. It errors in both scenarios, regardless 
of the input source, when WriteToText follows a fixed window. Going into the 
global window from a streaming source becomes unwieldy.
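
For readers following the thread, here is a plain-Python sketch (not Beam code) of what windowing a stream into fixed intervals before writing amounts to: each timestamped line is assigned to one interval, and each interval can then be written out on its own instead of waiting on a single global window that never closes for an unbounded source. The helper names fixed_window and group_by_window, the 60-second size, and the sample events are assumptions made up for illustration.

```python
from collections import defaultdict


def fixed_window(timestamp, size, offset=0):
    """Return the [start, end) interval of width `size` that contains timestamp."""
    start = timestamp - (timestamp - offset) % size
    return (start, start + size)


def group_by_window(timestamped_lines, size):
    """Group (timestamp, line) pairs by fixed window, keeping arrival order."""
    windows = defaultdict(list)
    for ts, line in timestamped_lines:
        windows[fixed_window(ts, size)].append(line)
    return dict(windows)


# Hypothetical JSON lines with event timestamps in seconds.
events = [(3, '{"id": 1}'), (61, '{"id": 2}'), (59, '{"id": 3}'), (125, '{"id": 4}')]

for (start, end), lines in sorted(group_by_window(events, size=60).items()):
    # One output shard per window, e.g. a file named after the interval.
    print(f"window [{start}, {end}): {lines}")
```

With a bounded text file every element eventually arrives, so a single global group is workable; with a PubSub-style source the per-interval grouping is what gives the writer a point at which a shard is complete.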

> WriteToText crash with "GlobalWindow -> ._IntervalWindowBase"
> -
>
> Key: BEAM-6860
> URL: https://issues.apache.org/jira/browse/BEAM-6860
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.11.0
> Environment: macOS, DirectRunner, python 2.7.15 via 
> pyenv/pyenv-virtualenv
>Reporter: Henrik
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Labels: newbie
> Fix For: 2.16.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Main error:
> > Cannot convert GlobalWindow to 
> > apache_beam.utils.windowed_value._IntervalWindowBase
> This is very hard for me to debug. Adding a ParDo step beforehand that prints the 
> input gives me just what I want, so the lines of data to serialise are 
> "alright"; just JSON strings, in fact.
> Stacktrace:
> {code:java}
> Traceback (most recent call last):
>   File "./okr_end_ride.py", line 254, in <module>
>     run()
>   File "./okr_end_ride.py", line 250, in run
>     run_pipeline(pipeline_options, known_args)
>   File "./okr_end_ride.py", line 198, in run_pipeline
>     | 'write_all' >> WriteToText(known_args.output, 
> file_name_suffix=".txt")
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py",
>  line 426, in __exit__
>     self.run().wait_until_finish()
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py",
>  line 406, in run
>     self._options).run(False)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py",
>  line 419, in run
>     return self.runner.run_pipeline(self, self._options)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/direct/direct_runner.py",
>  line 132, in run_pipeline
>     return runner.run_pipeline(pipeline, options)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 275, in run_pipeline
>     default_environment=self._default_environment))
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 278, in run_via_runner_api
>     return self.run_stages(*self.create_stages(pipeline_proto))
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 354, in run_stages
>     stage_context.safe_coders)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 509, in run_stage
>     data_input, data_output)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 1206, in process_bundle
>     result_future = self._controller.control_handler.push(process_bundle)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py",
>  line 821, in push
>     response = self.worker.do_instruction(request)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 265, in do_instruction
>     request.instruction_id)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 281, in process_bundle
>     delayed_applications = bundle_processor.process_bundle(instruction_id)
>   File 
> "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 552, in process_bundle
>     op.finish()
>   File "apache_beam/runners/worker/operations.py", line 549, in 
> apache_beam.runners.worker.operations.DoOperation.finish
>   File "apache_beam/runners/worker/operations.py", line 550, in 
> apache_beam.runners.worker.operations.DoOperation.finish
>   File "apache_beam/runners/worker/operations.py", line 551, in 
> 

[jira] [Work logged] (BEAM-9703) Create py validations runner test for metrics

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9703?focusedWorklogId=420005=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-420005
 ]

ASF GitHub Bot logged work on BEAM-9703:


Author: ASF GitHub Bot
Created on: 10/Apr/20 04:22
Start Date: 10/Apr/20 04:22
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #11319: [BEAM-9703]Include 
user distribution into metric-dedicated validate runner test.
URL: https://github.com/apache/beam/pull/11319#issuecomment-611871116
 
 
   pull from head and rebased (no changes to previous commits)
   
   please try again
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 420005)
Time Spent: 2.5h  (was: 2h 20m)

> Create py validations runner test for metrics
> -
>
> Key: BEAM-9703
> URL: https://issues.apache.org/jira/browse/BEAM-9703
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Some of the metrics are not covered by a dedicated validation runner test. 
> Would like to create these if needed. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9703) Create py validations runner test for metrics

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9703?focusedWorklogId=419998=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419998
 ]

ASF GitHub Bot logged work on BEAM-9703:


Author: ASF GitHub Bot
Created on: 10/Apr/20 04:14
Start Date: 10/Apr/20 04:14
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #11319: [BEAM-9703]Include 
user distribution into metric-dedicated validate runner test.
URL: https://github.com/apache/beam/pull/11319#issuecomment-611868777
 
 
   gradlew lint and test both work fine on my local repository. 
   
   test command: python setup.py nosetests --tests 
apache_beam.metrics.metric_test:MetricsTest.test_user_counter_using_pardo
   
   And the PythonLint failure reported by GitHub is in a different file (not touched by this 
PR):
   00:42:57 apache_beam/transforms/environments.py:255:12: F821 undefined name 
'from_container_image'
   00:42:57 return from_container_image(
   00:42:57^
   00:42:57 1 F821 undefined name 'from_container_image'
   
   Not sure why.  Looking. 
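
For context on the lint code quoted above: F821 is the flake8/pyflakes "undefined name" check. The sketch below shows one common way such a report arises, a bare call to a name that only exists as a class attribute. It is a hypothetical illustration and is not claimed to be what environments.py contains; the class Env and the methods broken and fixed are invented here, and only the name from_container_image is borrowed from the log.

```python
class Env:
    """Hypothetical class; only the method name is borrowed from the lint log."""

    @classmethod
    def from_container_image(cls, image):
        return {"image": image}

    @classmethod
    def broken(cls, image):
        # Bare name: class attributes are not in scope inside a method body,
        # so this is looked up as a global. A linter reports F821 here, and
        # the call raises NameError at run time.
        return from_container_image(image)

    @classmethod
    def fixed(cls, image):
        # Qualifying the call through cls resolves the name.
        return cls.from_container_image(image)


try:
    Env.broken("beam-sdk")
except NameError as err:
    print("NameError:", err)

print(Env.fixed("beam-sdk"))
```

Because the check is static, it fires whether or not the offending line is ever executed, which is one reason a lint job can fail on a file that local tests never reach.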
 



Issue Time Tracking
---

Worklog Id: (was: 419998)
Time Spent: 2h 20m  (was: 2h 10m)

> Create py validations runner test for metrics
> -
>
> Key: BEAM-9703
> URL: https://issues.apache.org/jira/browse/BEAM-9703
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Some of the metrics are not covered by a dedicated validation runner test. 
> Would like to create these if needed. 





[jira] [Work logged] (BEAM-9703) Create py validations runner test for metrics

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9703?focusedWorklogId=419996=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419996
 ]

ASF GitHub Bot logged work on BEAM-9703:


Author: ASF GitHub Bot
Created on: 10/Apr/20 04:11
Start Date: 10/Apr/20 04:11
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #11319: [BEAM-9703]Include 
user distribution into metric-dedicated validate runner test.
URL: https://github.com/apache/beam/pull/11319#issuecomment-611868777
 
 
   lint and test both work fine on my local repository. 
   
   And the PythonLint failure reported by GitHub is in a different file:
   00:42:57 apache_beam/transforms/environments.py:255:12: F821 undefined name 
'from_container_image'
   00:42:57 return from_container_image(
   00:42:57^
   00:42:57 1 F821 undefined name 'from_container_image'
   
   Not sure why.  Looking. 
 



Issue Time Tracking
---

Worklog Id: (was: 419996)
Time Spent: 2h 10m  (was: 2h)

> Create py validations runner test for metrics
> -
>
> Key: BEAM-9703
> URL: https://issues.apache.org/jira/browse/BEAM-9703
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Some of the metrics are not covered by a dedicated validation runner test. 
> Would like to create these if needed. 





[jira] [Work logged] (BEAM-9678) Introduction Kata | Go SDK Code Katas

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9678?focusedWorklogId=419987=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419987
 ]

ASF GitHub Bot logged work on BEAM-9678:


Author: ASF GitHub Bot
Created on: 10/Apr/20 03:18
Start Date: 10/Apr/20 03:18
Worklog Time Spent: 10m 
  Work Description: damondouglas commented on issue #11340: [BEAM-9678] 
Create Go SDK introduction kata
URL: https://github.com/apache/beam/pull/11340#issuecomment-611857902
 
 
   @lostluck @henryken I've made the recommended changes. Thank you for 
reviewing. 
 



Issue Time Tracking
---

Worklog Id: (was: 419987)
Time Spent: 1.5h  (was: 1h 20m)

> Introduction Kata | Go SDK Code Katas
> -
>
> Key: BEAM-9678
> URL: https://issues.apache.org/jira/browse/BEAM-9678
> Project: Beam
>  Issue Type: Sub-task
>  Components: katas, sdk-go
>Reporter: Damon Douglas
>Assignee: Damon Douglas
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> An Introduction kata patterned after 
> [https://github.com/apache/beam/tree/master/learning/katas/java/Introduction] 
> where the takeaway is an individual's ability to start an Apache Beam 
> pipeline using the Golang SDK.





[jira] [Work logged] (BEAM-9678) Introduction Kata | Go SDK Code Katas

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9678?focusedWorklogId=419985=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419985
 ]

ASF GitHub Bot logged work on BEAM-9678:


Author: ASF GitHub Bot
Created on: 10/Apr/20 03:17
Start Date: 10/Apr/20 03:17
Worklog Time Spent: 10m 
  Work Description: damondouglas commented on pull request #11340: 
[BEAM-9678] Create Go SDK introduction kata
URL: https://github.com/apache/beam/pull/11340#discussion_r406584247
 
 

 ##
 File path: learning/katas/go/Introduction/task2/test/task_test.go
 ##
 @@ -0,0 +1,44 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package test
+
+import (
+   "io/ioutil"
+   "task"
+   "testing"
+)
+
+func TestTask(t *testing.T) {
+   err := task.Task()
+   if err != nil {
+       t.Error(err)
+   }
+   data, err := ioutil.ReadFile(task.OutputFile)
+   if err != nil {
+       t.Error(err)
+   }
+
+   want := "Hello Beam\n"
+   got := string(data)
+   if want != got {
+       t.Errorf("want: %s got: %s", want, got)
+   }
+
+   err = ioutil.WriteFile(task.OutputFile, []byte{}, 0644)
 
 Review comment:
   Done :)
 



Issue Time Tracking
---

Worklog Id: (was: 419985)
Time Spent: 1h 10m  (was: 1h)

> Introduction Kata | Go SDK Code Katas
> -
>
> Key: BEAM-9678
> URL: https://issues.apache.org/jira/browse/BEAM-9678
> Project: Beam
>  Issue Type: Sub-task
>  Components: katas, sdk-go
>Reporter: Damon Douglas
>Assignee: Damon Douglas
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> An Introduction kata patterned after 
> [https://github.com/apache/beam/tree/master/learning/katas/java/Introduction] 
> where the takeaway is an individual's ability to start an Apache Beam 
> pipeline using the Golang SDK.





[jira] [Work logged] (BEAM-9678) Introduction Kata | Go SDK Code Katas

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9678?focusedWorklogId=419986=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419986
 ]

ASF GitHub Bot logged work on BEAM-9678:


Author: ASF GitHub Bot
Created on: 10/Apr/20 03:17
Start Date: 10/Apr/20 03:17
Worklog Time Spent: 10m 
  Work Description: damondouglas commented on pull request #11340: 
[BEAM-9678] Create Go SDK introduction kata
URL: https://github.com/apache/beam/pull/11340#discussion_r406584290
 
 

 ##
 File path: learning/katas/go/Introduction/task1/task-info.yaml
 ##
 @@ -0,0 +1,39 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+type: edu
+custom_name: Setup
+files:
+- name: test/task_test.go
+  visible: false
+- name: task.go
+  visible: true
+  placeholders:
+  - offset: 961
+    length: 26
+    placeholder_text: 'TODO: create a new pipeline'
+  - offset: 1012
+    length: 34
+    placeholder_text: 'TODO: execute the pipeline'
+- name: go.mod
+  visible: false
+- name: go.sum
+  visible: true
 
 Review comment:
   Fixed :)
 



Issue Time Tracking
---

Worklog Id: (was: 419986)
Time Spent: 1h 20m  (was: 1h 10m)

> Introduction Kata | Go SDK Code Katas
> -
>
> Key: BEAM-9678
> URL: https://issues.apache.org/jira/browse/BEAM-9678
> Project: Beam
>  Issue Type: Sub-task
>  Components: katas, sdk-go
>Reporter: Damon Douglas
>Assignee: Damon Douglas
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> An Introduction kata patterned after 
> [https://github.com/apache/beam/tree/master/learning/katas/java/Introduction] 
> where the takeaway is an individual's ability to start an Apache Beam 
> pipeline using the Golang SDK.





[jira] [Work logged] (BEAM-9678) Introduction Kata | Go SDK Code Katas

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9678?focusedWorklogId=419983=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419983
 ]

ASF GitHub Bot logged work on BEAM-9678:


Author: ASF GitHub Bot
Created on: 10/Apr/20 03:14
Start Date: 10/Apr/20 03:14
Worklog Time Spent: 10m 
  Work Description: damondouglas commented on pull request #11340: 
[BEAM-9678] Create Go SDK introduction kata
URL: https://github.com/apache/beam/pull/11340#discussion_r406583732
 
 

 ##
 File path: learning/katas/go/Introduction/lesson-info.yaml
 ##
 @@ -0,0 +1,23 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+content:
 
 Review comment:
   Thank you for reviewing.  I reorganized the Introduction section to pattern 
after the structure in the Java katas.
 



Issue Time Tracking
---

Worklog Id: (was: 419983)
Time Spent: 1h  (was: 50m)

> Introduction Kata | Go SDK Code Katas
> -
>
> Key: BEAM-9678
> URL: https://issues.apache.org/jira/browse/BEAM-9678
> Project: Beam
>  Issue Type: Sub-task
>  Components: katas, sdk-go
>Reporter: Damon Douglas
>Assignee: Damon Douglas
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> An Introduction kata patterned after 
> [https://github.com/apache/beam/tree/master/learning/katas/java/Introduction] 
> where the takeaway is an individual's ability to start an Apache Beam 
> pipeline using the Golang SDK.





[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419958=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419958
 ]

ASF GitHub Bot logged work on BEAM-9651:


Author: ASF GitHub Bot
Created on: 10/Apr/20 02:05
Start Date: 10/Apr/20 02:05
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11364: [BEAM-9651] Prevent 
StreamPool and stream initialization livelock
URL: https://github.com/apache/beam/pull/11364#issuecomment-611840338
 
 
   They would run automatically as a cron PostCommit or you can manually launch 
them from the Jenkins UI.
 



Issue Time Tracking
---

Worklog Id: (was: 419958)
Time Spent: 3h  (was: 2h 50m)

> StreamingDataflowWorker stuck waiting for 
> org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
> --
>
> Key: BEAM-9651
> URL: https://issues.apache.org/jira/browse/BEAM-9651
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Operation ongoing in step  for at least 28h10m00s without 
> outputting or completing in state windmill-read at 
> sun.misc.Unsafe.park(Native Method) at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at 
> java.util.concurrent.Phaser$QNode.block(Phaser.java:1140) at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at 
> java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067) at 
> java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758) at 
> org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.<init>(GrpcWindmillServer.java:941)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506)
>  at 
> org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown
>  Source) at 
> org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:159)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:158)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191)
>  at 
> org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199)
>  at 
> org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433)
>  at 
> org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328)
>  at 
> org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389)
>  at
> 
> Because the stream is started in a StreamPool synchronized block, all other 
> threads interacting with StreamPool to get or release streams end up blocking.
> It is unclear if the stream never became usable and thus blocked forever or 
> if there is a race with the use of the Phaser that causes the stuckness.
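
The blocking pattern described in the issue can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Beam worker code and not necessarily the approach taken in the linked pull requests: the class StreamPool below, its methods, and the slow_factory helper are all hypothetical. It contrasts creating a stream while holding the pool lock with doing only bookkeeping under the lock.

```python
import threading


class StreamPool:
    """Hypothetical pool; names and structure are illustrative, not Beam's."""

    def __init__(self, factory):
        self._factory = factory  # may block for a long time
        self._lock = threading.Lock()
        self._streams = []

    def get_stream_holding_lock(self):
        # Shape described in the issue: the slow factory call runs while the
        # lock is held, so every other caller of the pool waits behind it.
        with self._lock:
            if not self._streams:
                self._streams.append(self._factory())
            return self._streams[0]

    def get_stream(self):
        # Alternative shape: only bookkeeping happens under the lock.
        with self._lock:
            if self._streams:
                return self._streams[0]
        stream = self._factory()  # slow work, lock not held
        with self._lock:
            if not self._streams:
                self._streams.append(stream)
            # If another thread won the race, `stream` is dropped here; real
            # code would need to close it.
            return self._streams[0]

    def size(self):
        with self._lock:
            return len(self._streams)


def demo():
    gate = threading.Event()     # released when the "stream" becomes usable
    entered = threading.Event()  # set once the factory has been entered

    def slow_factory():
        entered.set()
        gate.wait()
        return "stream"

    pool = StreamPool(slow_factory)
    worker = threading.Thread(target=pool.get_stream)
    worker.start()
    entered.wait(timeout=5)
    size_while_blocked = pool.size()  # returns promptly: the lock is free
    gate.set()
    worker.join(timeout=5)
    return size_while_blocked, pool.size()


print(demo())
```

In the demo, the main thread can still query the pool while the worker is stuck inside the factory. Swapping in get_stream_holding_lock would make that pool.size() call wait until the factory returns, which mirrors the report that all other threads end up blocking.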





[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419957=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419957
 ]

ASF GitHub Bot logged work on BEAM-9562:


Author: ASF GitHub Bot
Created on: 10/Apr/20 02:01
Start Date: 10/Apr/20 02:01
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11373: [BEAM-9562] 
Update Element.timer to Element.timers
URL: https://github.com/apache/beam/pull/11373#discussion_r406567717
 
 

 ##
 File path: model/fn-execution/src/main/proto/beam_fn_api.proto
 ##
 @@ -516,7 +516,7 @@ message Elements {
   repeated Data data = 1;
 
   // (Optional)  A list of timer byte streams.
-  repeated Timer timer = 2;
+  repeated Timer timers = 2;
 
 Review comment:
   @robertwb did you want to rename this field or the proto message Timer -> 
Timers or both?
 



Issue Time Tracking
---

Worklog Id: (was: 419957)
Time Spent: 21h  (was: 20h 50m)

> Remove timer from PCollection and treat timers as Elements 
> ---
>
> Key: BEAM-9562
> URL: https://issues.apache.org/jira/browse/BEAM-9562
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-harness, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 21h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419956=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419956
 ]

ASF GitHub Bot logged work on BEAM-9562:


Author: ASF GitHub Bot
Created on: 10/Apr/20 02:00
Start Date: 10/Apr/20 02:00
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11373: [BEAM-9562] Update 
Element.timer to Element.timers
URL: https://github.com/apache/beam/pull/11373#issuecomment-611839103
 
 
   please regenerate the go protos
 



Issue Time Tracking
---

Worklog Id: (was: 419956)
Time Spent: 20h 50m  (was: 20h 40m)

> Remove timer from PCollection and treat timers as Elements 
> ---
>
> Key: BEAM-9562
> URL: https://issues.apache.org/jira/browse/BEAM-9562
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-harness, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419953=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419953
 ]

ASF GitHub Bot logged work on BEAM-9651:


Author: ASF GitHub Bot
Created on: 10/Apr/20 01:59
Start Date: 10/Apr/20 01:59
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11368: [BEAM-9651] Prevent 
StreamPool and stream initialization livelock
URL: https://github.com/apache/beam/pull/11368#issuecomment-611838764
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 419953)
Time Spent: 2h 50m  (was: 2h 40m)

> StreamingDataflowWorker stuck waiting for 
> org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
> --
>
> Key: BEAM-9651
> URL: https://issues.apache.org/jira/browse/BEAM-9651
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Operation ongoing in step  for at least 28h10m00s without 
> outputting or completing in state windmill-read at 
> sun.misc.Unsafe.park(Native Method) at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at 
> java.util.concurrent.Phaser$QNode.block(Phaser.java:1140) at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at 
> java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067) at 
> java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758) at 
> org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.<init>(GrpcWindmillServer.java:941)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506)
>  at 
> org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown
>  Source) at 
> org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:159)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:158)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191)
>  at 
> org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199)
>  at 
> org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433)
>  at 
> org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328)
>  at 
> org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389)
>  at
> 
> Because the stream is started in a StreamPool synchronized block, all other 
> threads interacting with StreamPool to get or release streams end up blocking.
> It is unclear if the stream never became usable and thus blocked forever or 
> if there is a race with the use of the Phaser that causes the stuckness.





[jira] [Commented] (BEAM-9721) Dataflow-based Jenkins jobs are failing due to lack of --region option

2020-04-09 Thread Daniel Oliveira (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080150#comment-17080150
 ] 

Daniel Oliveira commented on BEAM-9721:
---

Looks like this is also affecting beam_PerformanceTests_WordCountIT jobs 
([example|https://builds.apache.org/job/beam_PerformanceTests_WordCountIT_Py27/1314/]).

> Dataflow-based Jenkins jobs are failing due to lack of --region option
> --
>
> Key: BEAM-9721
> URL: https://issues.apache.org/jira/browse/BEAM-9721
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures, testing
>Reporter: Michał Walenia
>Assignee: Michał Walenia
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9674) "Selected fields list too long" error when calling tables.get in BigQueryStorageTableSource

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9674?focusedWorklogId=419927=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419927
 ]

ASF GitHub Bot logged work on BEAM-9674:


Author: ASF GitHub Bot
Created on: 10/Apr/20 00:56
Start Date: 10/Apr/20 00:56
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #11292: [BEAM-9674] Don't 
specify selected fields when fetching BigQuery table size
URL: https://github.com/apache/beam/pull/11292#issuecomment-611822669
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 419927)
Time Spent: 2h 20m  (was: 2h 10m)

> "Selected fields list too long" error when calling tables.get in 
> BigQueryStorageTableSource
> ---
>
> Key: BEAM-9674
> URL: https://issues.apache.org/jira/browse/BEAM-9674
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.19.0
>Reporter: Kenneth Jung
>Assignee: Kenneth Jung
>Priority: Minor
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Customers experience errors similar to the following:
>  Caused by: 
> com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad 
> Request { "code" : 400, "errors" : [
> { "domain" : "global", "message" : "Selected fields too long:  must 
> be less than 16384 characters.", "reason" : "invalid" }
> ], "message" : "Selected fields too long:  must be less than 16384 
> characters.", "status" : "INVALID_ARGUMENT" } 
> com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
>  
> com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
>  
> com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
>  
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
>  com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1097) 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
>  
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
>  
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.executeWithRetries(BigQueryServicesImpl.java:938)
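
As a rough illustration of why wide tables can hit this error, the Python sketch below joins field names into a single comma-separated parameter and compares its length to the 16384-character limit quoted in the message. The helper names and the generated column names are hypothetical, and the comma-separated encoding is an assumption about how the selected-fields parameter is built.

```python
# Limit taken from the error message: "must be less than 16384 characters".
MAX_SELECTED_FIELDS_CHARS = 16384


def selected_fields_param(field_names):
    """Join field names into one comma-separated string (assumed encoding)."""
    return ",".join(field_names)


def selected_fields_too_long(field_names):
    """Return True when the joined parameter would reach or exceed the limit."""
    return len(selected_fields_param(field_names)) >= MAX_SELECTED_FIELDS_CHARS


# Hypothetical schemas: a narrow table and a very wide one.
narrow = ["id", "name", "created_at"]
wide = ["column_%04d" % i for i in range(2000)]
```

Here `selected_fields_too_long(narrow)` is false while `selected_fields_too_long(wide)` is true, which is consistent with the approach in the PR title of not specifying selected fields at all when only the table size is needed.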





[jira] [Work logged] (BEAM-9674) "Selected fields list too long" error when calling tables.get in BigQueryStorageTableSource

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9674?focusedWorklogId=419926=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419926
 ]

ASF GitHub Bot logged work on BEAM-9674:


Author: ASF GitHub Bot
Created on: 10/Apr/20 00:55
Start Date: 10/Apr/20 00:55
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #11292: [BEAM-9674] Don't 
specify selected fields when fetching BigQuery table size
URL: https://github.com/apache/beam/pull/11292#issuecomment-611822518
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 419926)
Time Spent: 2h 10m  (was: 2h)

> "Selected fields list too long" error when calling tables.get in 
> BigQueryStorageTableSource
> ---
>
> Key: BEAM-9674
> URL: https://issues.apache.org/jira/browse/BEAM-9674
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.19.0
>Reporter: Kenneth Jung
>Assignee: Kenneth Jung
>Priority: Minor
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Customers experience errors similar to the following:
>  Caused by: 
> com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad 
> Request { "code" : 400, "errors" : [
> { "domain" : "global", "message" : "Selected fields too long:  must 
> be less than 16384 characters.", "reason" : "invalid" }
> ], "message" : "Selected fields too long:  must be less than 16384 
> characters.", "status" : "INVALID_ARGUMENT" } 
> com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
>  
> com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
>  
> com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
>  
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
>  com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1097) 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
>  
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
>  
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
>  
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.executeWithRetries(BigQueryServicesImpl.java:938)





[jira] [Work logged] (BEAM-5504) PubsubAvroTable

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5504?focusedWorklogId=419922=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419922
 ]

ASF GitHub Bot logged work on BEAM-5504:


Author: ASF GitHub Bot
Created on: 10/Apr/20 00:51
Start Date: 10/Apr/20 00:51
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10487: [BEAM-5504] Introduce 
PubsubAvroTable
URL: https://github.com/apache/beam/pull/10487#issuecomment-611821536
 
 
   Is this PR still active?
 



Issue Time Tracking
---

Worklog Id: (was: 419922)
Time Spent: 4.5h  (was: 4h 20m)

> PubsubAvroTable
> ---
>
> Key: BEAM-5504
> URL: https://issues.apache.org/jira/browse/BEAM-5504
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Jing Chen
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419923=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419923
 ]

ASF GitHub Bot logged work on BEAM-9562:


Author: ASF GitHub Bot
Created on: 10/Apr/20 00:51
Start Date: 10/Apr/20 00:51
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #11373: [BEAM-9562] 
Update Element.timer to Element.timers
URL: https://github.com/apache/beam/pull/11373
 
 

[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=419920=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419920
 ]

ASF GitHub Bot logged work on BEAM-9468:


Author: ASF GitHub Bot
Created on: 10/Apr/20 00:49
Start Date: 10/Apr/20 00:49
Worklog Time Spent: 10m 
  Work Description: jaketf commented on pull request #11151: [BEAM-9468]  
Hl7v2 io
URL: https://github.com/apache/beam/pull/11151#discussion_r406551322
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -454,6 +455,7 @@ class BeamModulePlugin implements Plugin<Project> {
 google_api_services_clouddebugger   : 
"com.google.apis:google-api-services-clouddebugger:v2-rev20191003-$google_clients_version",
 google_api_services_cloudresourcemanager: 
"com.google.apis:google-api-services-cloudresourcemanager:v1-rev20191206-$google_clients_version",
 google_api_services_dataflow: 
"com.google.apis:google-api-services-dataflow:v1b3-rev20190927-$google_clients_version",
+google_api_services_healthcare  : 
"com.google.apis:google-api-services-healthcare:v1alpha2-rev20190901-$google_cloud_healthcare_version",
 
 Review comment:
   I'm rather new to these generated clients and to Beam.
   Is there any process this community has followed in the past for finding a 
suitable version in situations like this?
   Making such a global upgrade seems like a lot for this PR.
   Naturally, if I just flip google_clients_version to 1.30.9, a lot breaks:
   ```
   Could not determine the dependencies of task ':sdks:java:harness:shadowJar'.
   > Could not resolve all dependencies for configuration ':sdks:java:harness:runtimeClasspath'.
      > Could not find com.google.apis:google-api-services-cloudresourcemanager:v1-rev20191206-1.30.9.
        Required by:
            project :sdks:java:harness > project :sdks:java:extensions:google-cloud-platform-core
      > Could not find com.google.apis:google-api-services-storage:v1-rev20191011-1.30.9.
        Required by:
            project :sdks:java:harness > project :sdks:java:extensions:google-cloud-platform-core
      > Could not find com.google.oauth-client:google-oauth-client:1.30.9.
      > Could not find com.google.apis:google-api-services-storage:v1-rev20191011-1.30.9.
        Required by:
            project :sdks:java:harness > project :sdks:java:extensions:google-cloud-platform-core > com.google.cloud.bigdataoss:gcsio:2.0.0
            project :sdks:java:harness > project :sdks:java:extensions:google-cloud-platform-core > com.google.cloud.bigdataoss:util:2.0.0
      > Could not find com.google.oauth-client:google-oauth-client:1.30.9.
        Required by:
            project :sdks:java:harness > project :sdks:java:extensions:google-cloud-platform-core > com.google.cloud.bigdataoss:gcsio:2.0.0
            project :sdks:java:harness > project :sdks:java:extensions:google-cloud-platform-core > com.google.cloud.bigdataoss:util:2.0.0
      > Could not find com.google.oauth-client:google-oauth-client-java6:1.30.9.
        Required by:
            project :sdks:java:harness > project :sdks:java:extensions:google-cloud-platform-core > com.google.cloud.bigdataoss:gcsio:2.0.0
            project :sdks:java:harness > project :sdks:java:extensions:google-cloud-platform-core > com.google.cloud.bigdataoss:util:2.0.0
      > Could not find com.google.oauth-client:google-oauth-client-java6:1.30.9.
        Required by:
            project :sdks:java:harness > project :sdks:java:extensions:google-cloud-platform-core > com.google.cloud.bigdataoss:gcsio:2.0.0 > com.google.api-client:google-api-client-java6:1.30.9
   * 
   ```
   
   seems like our culprits are 
   - [ ] `google-oauth-client`
   - [ ] `google-oauth-client-java6`
   - [ ] `google-api-services-storage`
   
   If I go in and massage the revision numbers to make this work, how 
will I ensure that everything relying on those deps still works? Is that 
covered by the pre/post-commit ITs?
 



Issue Time Tracking
---

Worklog Id: (was: 419920)
Time Spent: 19h  (was: 18h 50m)

> Add Google Cloud Healthcare API IO Connectors
> -
>
> Key: BEAM-9468
> URL: https://issues.apache.org/jira/browse/BEAM-9468
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>

[jira] [Work logged] (BEAM-8910) Use AVRO instead of JSON in BigQuery bounded source.

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8910?focusedWorklogId=419916=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419916
 ]

ASF GitHub Bot logged work on BEAM-8910:


Author: ASF GitHub Bot
Created on: 10/Apr/20 00:30
Start Date: 10/Apr/20 00:30
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #11086: [BEAM-8910] Make 
custom BQ source read from Avro
URL: https://github.com/apache/beam/pull/11086#issuecomment-611816596
 
 
   Run Python 3.5 PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 419916)
Time Spent: 5.5h  (was: 5h 20m)

> Use AVRO instead of JSON in BigQuery bounded source.
> 
>
> Key: BEAM-8910
> URL: https://issues.apache.org/jira/browse/BEAM-8910
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Kamil Wasilewski
>Assignee: Pablo Estrada
>Priority: Minor
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> The proposed BigQuery bounded source in the Python SDK (see PR: 
> [https://github.com/apache/beam/pull/9772]) uses a BigQuery export job to 
> take a snapshot of the table and reads from each produced JSON file. A 
> performance improvement can be gained by switching to AVRO instead.





[jira] [Work logged] (BEAM-9443) support direct_num_workers=0

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9443?focusedWorklogId=419915=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419915
 ]

ASF GitHub Bot logged work on BEAM-9443:


Author: ASF GitHub Bot
Created on: 10/Apr/20 00:29
Start Date: 10/Apr/20 00:29
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #11372: 
[BEAM-9443] support direct_num_workers=0
URL: https://github.com/apache/beam/pull/11372#discussion_r406546390
 
 

 ##
 File path: CHANGES.md
 ##
 @@ -67,6 +67,7 @@
 [Ensuring Python Type 
Safety](https://beam.apache.org/documentation/sdks/python-type-safety/)
 and an upcoming
 [blog 
post](https://beam.apache.org/blog/python/typing/2020/03/06/python-typing.html).
+* --direct_num_workers=0 is supported for FnApi. It will set the number of 
threads/subprocesses to number of cores of the machine executing the 
pipeline([BEAM-9443](https://issues.apache.org/jira/browse/BEAM-9443)).
 
 Review comment:
   It is added to 2.21.0. Please let me know if the branch is already cut; I 
can move it to 2.22.0 if needed. 
 



Issue Time Tracking
---

Worklog Id: (was: 419915)
Time Spent: 20m  (was: 10m)

> support direct_num_workers=0 
> -
>
> Key: BEAM-9443
> URL: https://issues.apache.org/jira/browse/BEAM-9443
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.22.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> when direct_num_workers=0, set it to number of cores.
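
The behaviour described in this issue can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `resolve_num_workers` is a hypothetical helper, not the function used in the Beam SDK, and it relies on the standard library's `multiprocessing.cpu_count()` to obtain the number of cores.

```python
import multiprocessing


def resolve_num_workers(direct_num_workers):
    """Map 0 to the machine's core count; pass other values through.

    Hypothetical helper for illustration only.
    """
    if direct_num_workers == 0:
        return multiprocessing.cpu_count()
    return direct_num_workers
```

With this sketch, `resolve_num_workers(0)` returns the core count of the machine running the pipeline, and any explicit value such as `resolve_num_workers(4)` is returned unchanged.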





[jira] [Work logged] (BEAM-9443) support direct_num_workers=0

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9443?focusedWorklogId=419912=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419912
 ]

ASF GitHub Bot logged work on BEAM-9443:


Author: ASF GitHub Bot
Created on: 10/Apr/20 00:27
Start Date: 10/Apr/20 00:27
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #11372: 
[BEAM-9443] support direct_num_workers=0
URL: https://github.com/apache/beam/pull/11372
 
 
   R: @ibzib 
   
   
   

[jira] [Work started] (BEAM-9443) support direct_num_workers=0

2020-04-09 Thread Hannah Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-9443 started by Hannah Jiang.
--
> support direct_num_workers=0 
> -
>
> Key: BEAM-9443
> URL: https://issues.apache.org/jira/browse/BEAM-9443
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.22.0
>
>
> when direct_num_workers=0, set it to number of cores.





[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419907=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419907
 ]

ASF GitHub Bot logged work on BEAM-9562:


Author: ASF GitHub Bot
Created on: 10/Apr/20 00:19
Start Date: 10/Apr/20 00:19
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11314: [BEAM-9562] 
Send Timers over Data Channel as Elements
URL: https://github.com/apache/beam/pull/11314
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 419907)
Time Spent: 20.5h  (was: 20h 20m)

> Remove timer from PCollection and treat timers as Elements 
> ---
>
> Key: BEAM-9562
> URL: https://issues.apache.org/jira/browse/BEAM-9562
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-harness, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 20.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9703) Create py validations runner test for metrics

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9703?focusedWorklogId=419905=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419905
 ]

ASF GitHub Bot logged work on BEAM-9703:


Author: ASF GitHub Bot
Created on: 10/Apr/20 00:15
Start Date: 10/Apr/20 00:15
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11319: [BEAM-9703]Include 
user distribution into metric-dedicated validate runner test.
URL: https://github.com/apache/beam/pull/11319#issuecomment-611812812
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 419905)
Time Spent: 2h  (was: 1h 50m)

> Create py validations runner test for metrics
> -
>
> Key: BEAM-9703
> URL: https://issues.apache.org/jira/browse/BEAM-9703
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Some of the metrics are not covered by a dedicated validation runner test. 
> Would like to create these if needed. 





[jira] [Resolved] (BEAM-9727) Auto populate required feature experiment flags for enable dataflow runner v2

2020-04-09 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-9727.
-
Resolution: Fixed

> Auto populate required feature experiment flags for enable dataflow runner v2
> -
>
> Key: BEAM-9727
> URL: https://issues.apache.org/jira/browse/BEAM-9727
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9727) Auto populate required feature experiment flags for enable dataflow runner v2

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9727?focusedWorklogId=419902=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419902
 ]

ASF GitHub Bot logged work on BEAM-9727:


Author: ASF GitHub Bot
Created on: 10/Apr/20 00:13
Start Date: 10/Apr/20 00:13
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11355: [BEAM-9727] 
Automatically set required experiment flags for dataflow runner v2.
URL: https://github.com/apache/beam/pull/11355
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 419902)
Time Spent: 3h 20m  (was: 3h 10m)

> Auto populate required feature experiment flags for enable dataflow runner v2
> -
>
> Key: BEAM-9727
> URL: https://issues.apache.org/jira/browse/BEAM-9727
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419897=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419897
 ]

ASF GitHub Bot logged work on BEAM-9651:


Author: ASF GitHub Bot
Created on: 09/Apr/20 23:59
Start Date: 09/Apr/20 23:59
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #11368: [BEAM-9651] 
Prevent StreamPool and stream initialization livelock
URL: https://github.com/apache/beam/pull/11368#issuecomment-611808620
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 419897)
Time Spent: 2.5h  (was: 2h 20m)

> StreamingDataflowWorker stuck waiting for 
> org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
> --
>
> Key: BEAM-9651
> URL: https://issues.apache.org/jira/browse/BEAM-9651
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Operation ongoing in step  for at least 28h10m00s without outputting or completing in state windmill-read
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at java.util.concurrent.Phaser$QNode.block(Phaser.java:1140)
>   at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067)
>   at java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758)
>   at org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.<init>(GrpcWindmillServer.java:941)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506)
>   at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown Source)
>   at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:159)
>   at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:158)
>   at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191)
>   at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199)
>   at org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433)
>   at org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328)
>   at org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389)
>   at
> 
> Because the stream is started in a StreamPool synchronized block, all other 
> threads interacting with StreamPool to get or release streams end up blocking.
> It is unclear if the stream never became usable and thus blocked forever or 
> if there is a race with the use of the Phaser that causes the stuckness.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419898=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419898
 ]

ASF GitHub Bot logged work on BEAM-9651:


Author: ASF GitHub Bot
Created on: 09/Apr/20 23:59
Start Date: 09/Apr/20 23:59
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #11368: [BEAM-9651] 
Prevent StreamPool and stream initialization livelock
URL: https://github.com/apache/beam/pull/11368#issuecomment-611808657
 
 
   run dataflow validatesrunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 419898)
Time Spent: 2h 40m  (was: 2.5h)

> StreamingDataflowWorker stuck waiting for 
> org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
> --
>
> Key: BEAM-9651
> URL: https://issues.apache.org/jira/browse/BEAM-9651
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Operation ongoing in step  for at least 28h10m00s without outputting or completing in state windmill-read
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at java.util.concurrent.Phaser$QNode.block(Phaser.java:1140)
>   at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067)
>   at java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758)
>   at org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.<init>(GrpcWindmillServer.java:941)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506)
>   at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown Source)
>   at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:159)
>   at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:158)
>   at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191)
>   at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199)
>   at org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433)
>   at org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328)
>   at org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389)
>   at
> 
> Because the stream is started in a StreamPool synchronized block, all other 
> threads interacting with StreamPool to get or release streams end up blocking.
> It is unclear if the stream never became usable and thus blocked forever or 
> if there is a race with the use of the Phaser that causes the stuckness.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419896=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419896
 ]

ASF GitHub Bot logged work on BEAM-9651:


Author: ASF GitHub Bot
Created on: 09/Apr/20 23:56
Start Date: 09/Apr/20 23:56
Worklog Time Spent: 10m 
  Work Description: scwhittle commented on issue #11368: [BEAM-9651] 
Prevent StreamPool and stream initialization livelock
URL: https://github.com/apache/beam/pull/11368#issuecomment-611808055
 
 
   R: @reuvenlax 
   Please trigger the tests before merging
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 419896)
Time Spent: 2h 20m  (was: 2h 10m)

> StreamingDataflowWorker stuck waiting for 
> org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
> --
>
> Key: BEAM-9651
> URL: https://issues.apache.org/jira/browse/BEAM-9651
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Operation ongoing in step  for at least 28h10m00s without outputting or completing in state windmill-read
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at java.util.concurrent.Phaser$QNode.block(Phaser.java:1140)
>   at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067)
>   at java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758)
>   at org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.<init>(GrpcWindmillServer.java:941)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506)
>   at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown Source)
>   at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:159)
>   at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:158)
>   at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191)
>   at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199)
>   at org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433)
>   at org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328)
>   at org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389)
>   at
> 
> Because the stream is started in a StreamPool synchronized block, all other 
> threads interacting with StreamPool to get or release streams end up blocking.
> It is unclear if the stream never became usable and thus blocked forever or 
> if there is a race with the use of the Phaser that causes the stuckness.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419894=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419894
 ]

ASF GitHub Bot logged work on BEAM-9651:


Author: ASF GitHub Bot
Created on: 09/Apr/20 23:56
Start Date: 09/Apr/20 23:56
Worklog Time Spent: 10m 
  Work Description: scwhittle commented on issue #11368: [BEAM-9651] 
Prevent StreamPool and stream initialization livelock
URL: https://github.com/apache/beam/pull/11368#issuecomment-611808055
 
 
   R: @reuvenlax 
   With tests this time :)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 419894)
Time Spent: 2h 10m  (was: 2h)

> StreamingDataflowWorker stuck waiting for 
> org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
> --
>
> Key: BEAM-9651
> URL: https://issues.apache.org/jira/browse/BEAM-9651
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Operation ongoing in step  for at least 28h10m00s without outputting or completing in state windmill-read
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at java.util.concurrent.Phaser$QNode.block(Phaser.java:1140)
>   at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067)
>   at java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758)
>   at org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.<init>(GrpcWindmillServer.java:941)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506)
>   at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown Source)
>   at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:159)
>   at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:158)
>   at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191)
>   at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199)
>   at org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433)
>   at org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328)
>   at org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389)
>   at
> 
> Because the stream is started in a StreamPool synchronized block, all other 
> threads interacting with StreamPool to get or release streams end up blocking.
> It is unclear if the stream never became usable and thus blocked forever or 
> if there is a race with the use of the Phaser that causes the stuckness.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9600) Implement GetJobMetrics in Flink uber jar job server

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9600?focusedWorklogId=419895=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419895
 ]

ASF GitHub Bot logged work on BEAM-9600:


Author: ASF GitHub Bot
Created on: 09/Apr/20 23:56
Start Date: 09/Apr/20 23:56
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #11369: [BEAM-9600] Get 
metrics in Flink uber jar job server.
URL: https://github.com/apache/beam/pull/11369
 
 
   Accumulators are visible in the Flink REST API at the `/accumulators` 
endpoint, including the Beam Flink runner's custom accumulator, 
`__metricscontainers`. The value of each accumulator is printed using the 
accumulator's `toString` method. Previously, 
`MetricsContainerStepMap::toString` printed some sort of bespoke format. In 
this PR, I change it to use the JSON representation of the underlying protos so 
it's easy to parse on the Python side.
   
   This implementation ignores `committed` metrics. My understanding is that 
the Flink runner doesn't support them anyway, so I assume the distinction isn't 
important here.
   
   
   
   

[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419893=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419893
 ]

ASF GitHub Bot logged work on BEAM-9651:


Author: ASF GitHub Bot
Created on: 09/Apr/20 23:55
Start Date: 09/Apr/20 23:55
Worklog Time Spent: 10m 
  Work Description: scwhittle commented on pull request #11368: [BEAM-9651] 
Prevent StreamPool and stream initialization livelock
URL: https://github.com/apache/beam/pull/11368
 
 
   Reduce the StreamPool synchronized block by excluding stream creation and
   requiring it for release. Some Dataflow pipelines were observing stuck
   get and release methods due to stuck stream creation waiting for the
   onReady callback. The DirectStreamObserver is modified to additionally
   poll isReady() to handle cases where blocked threads are blocking the
   onReady callback. Additionally, as we would prefer to possibly OOM
   rather than become permanently stuck, we give up waiting for onReady
   after a minute.
   
   This is a recommit of a reverted commit to allow for testing before
   merging.
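
As a rough illustration of the two ideas in the description above (this is not the code in the PR): the sketch below holds the pool lock only for bookkeeping and creates streams outside it, and it waits for readiness by alternating a short Phaser wait with an `isReady` poll, giving up after a deadline. `StreamPoolSketch`, `Pool`, `awaitReady`, and the timeout values are hypothetical names and parameters chosen for the example; the real `StreamPool` and `DirectStreamObserver` may be structured differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch only; not the actual Beam Dataflow worker classes. */
public class StreamPoolSketch {

  /** A tiny pool that takes its lock for bookkeeping, never while creating a stream. */
  public static final class Pool<S> {
    private final Deque<S> idle = new ArrayDeque<>();
    private final Supplier<S> factory;

    public Pool(Supplier<S> factory) {
      this.factory = factory;
    }

    public S get() {
      synchronized (this) {
        S cached = idle.pollFirst();
        if (cached != null) {
          return cached;
        }
      }
      // Creation may be slow or may block, so it runs outside the lock and
      // cannot hold up other threads calling get() or release().
      return factory.get();
    }

    public void release(S stream) {
      synchronized (this) {
        idle.addFirst(stream);
      }
    }
  }

  /**
   * Waits until isReady reports true. Between polls it blocks on the phaser
   * for at most pollMillis, so a missed callback cannot block it forever.
   * Returns false if maxWaitMillis elapses or the thread is interrupted.
   */
  public static boolean awaitReady(
      Phaser phaser, BooleanSupplier isReady, long pollMillis, long maxWaitMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
    int phase = phaser.getPhase();
    while (!isReady.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // give up instead of staying stuck
      }
      try {
        phaser.awaitAdvanceInterruptibly(phase, pollMillis, TimeUnit.MILLISECONDS);
        phase = phaser.getPhase();
      } catch (TimeoutException e) {
        // No callback advanced the phaser in time; loop and poll again.
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Pool<String> pool = new Pool<>(() -> "fresh-stream");
    String stream = pool.get();
    pool.release(stream);
    boolean ready = awaitReady(new Phaser(1), () -> true, 10, 1000);
    System.out.println(stream + " ready=" + ready);
  }
}
```

In this sketch the caller decides what to do when `awaitReady` returns false; the description above suggests the real change prefers to proceed and risk an OOM over waiting indefinitely.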
   
   

[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419879=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419879
 ]

ASF GitHub Bot logged work on BEAM-9651:


Author: ASF GitHub Bot
Created on: 09/Apr/20 23:36
Start Date: 09/Apr/20 23:36
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #11367: Revert 
"[BEAM-9651] Prevent StreamPool and stream initialization livelock"
URL: https://github.com/apache/beam/pull/11367
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 419879)
Time Spent: 1h 50m  (was: 1h 40m)

> StreamingDataflowWorker stuck waiting for 
> org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
> --
>
> Key: BEAM-9651
> URL: https://issues.apache.org/jira/browse/BEAM-9651
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Operation ongoing in step  for at least 28h10m00s without outputting or completing in state windmill-read
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at java.util.concurrent.Phaser$QNode.block(Phaser.java:1140)
>   at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067)
>   at java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758)
>   at org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.<init>(GrpcWindmillServer.java:941)
>   at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506)
>   at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown Source)
>   at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:159)
>   at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:158)
>   at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191)
>   at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199)
>   at org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433)
>   at org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328)
>   at org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389)
>   at
> 
> Because the stream is started in a StreamPool synchronized block, all other 
> threads interacting with StreamPool to get or release streams end up blocking.
> It is unclear if the stream never became usable and thus blocked forever or 
> if there is a race with the use of the Phaser that causes the stuckness.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=419867=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419867
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 09/Apr/20 23:03
Start Date: 09/Apr/20 23:03
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #11243: 
[BEAM-9136]Add licenses for dependencies for Java
URL: https://github.com/apache/beam/pull/11243#discussion_r406523425
 
 

 ##
 File path: sdks/java/container/license_scripts/pull_licenses_java.py
 ##
 @@ -0,0 +1,186 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+"""
+A script to pull licenses/notices/source code for Java dependencies.
 
 Review comment:
   This is done. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 419867)
Time Spent: 21h  (was: 20h 50m)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 21h
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419864=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419864
 ]

ASF GitHub Bot logged work on BEAM-9651:


Author: ASF GitHub Bot
Created on: 09/Apr/20 23:02
Start Date: 09/Apr/20 23:02
Worklog Time Spent: 10m 
  Work Description: scwhittle commented on issue #11367: Revert 
"[BEAM-9651] Prevent StreamPool and stream initialization livelock"
URL: https://github.com/apache/beam/pull/11367#issuecomment-611794270
 
 
   R: @lukecwik 
 



Issue Time Tracking
---

Worklog Id: (was: 419864)
Time Spent: 1h 40m  (was: 1.5h)

> StreamingDataflowWorker stuck waiting for 
> org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
> --
>
> Key: BEAM-9651
> URL: https://issues.apache.org/jira/browse/BEAM-9651
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Operation ongoing in step  for at least 28h10m00s without 
> outputting or completing in state windmill-read
>  at sun.misc.Unsafe.park(Native Method)
>  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  at java.util.concurrent.Phaser$QNode.block(Phaser.java:1140)
>  at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>  at java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067)
>  at java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758)
>  at org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49)
>  at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615)
>  at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946)
>  at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628)
>  at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.<init>(GrpcWindmillServer.java:941)
>  at org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506)
>  at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown Source)
>  at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:159)
>  at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:158)
>  at org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191)
>  at org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199)
>  at org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433)
>  at org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328)
>  at org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389)
>  at
> 
> Because the stream is started in a StreamPool synchronized block, all other 
> threads interacting with StreamPool to get or release streams end up blocking.
> It is unclear if the stream never became usable and thus blocked forever or 
> if there is a race with the use of the Phaser that causes the stuckness.
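
The blocking pattern described above can be illustrated with a small, self-contained Python sketch. This is not Beam's code and not necessarily what the fix in the pull request does; `GatedStream`, `LockedInitPool`, `UnlockedInitPool`, and `size_seen_during_startup` are hypothetical names invented for the illustration. It only shows why starting a slow stream while holding the pool lock stalls every other caller, and how moving start-up outside the lock avoids that.

```python
import threading


class GatedStream:
    """Hypothetical stand-in for a stream whose start-up blocks until `gate` is set."""

    def __init__(self, started, gate):
        started.set()  # tell the observer that start-up has begun
        gate.wait()    # simulates a handshake that may take arbitrarily long


class LockedInitPool:
    """Starts the stream while holding the pool lock (the problematic shape)."""

    def __init__(self, started, gate):
        self._lock = threading.Lock()
        self._started, self._gate = started, gate
        self._streams = []

    def get_stream(self):
        with self._lock:
            stream = GatedStream(self._started, self._gate)  # blocks under the lock
            self._streams.append(stream)
            return stream

    def try_size(self, timeout):
        """Return the pool size, or None if the lock was not free in time."""
        if not self._lock.acquire(timeout=timeout):
            return None
        try:
            return len(self._streams)
        finally:
            self._lock.release()


class UnlockedInitPool(LockedInitPool):
    """Starts the stream first and takes the lock only to record it."""

    def get_stream(self):
        stream = GatedStream(self._started, self._gate)  # no lock held here
        with self._lock:
            self._streams.append(stream)
        return stream


def size_seen_during_startup(pool_cls):
    """Ask for the pool size while another thread is stuck in stream start-up."""
    started, gate = threading.Event(), threading.Event()
    pool = pool_cls(started, gate)
    worker = threading.Thread(target=pool.get_stream)
    worker.start()
    started.wait()                    # worker is now inside stream start-up
    seen = pool.try_size(timeout=0.2)
    gate.set()                        # let start-up finish
    worker.join()
    return seen
```

With `LockedInitPool` the second caller times out waiting for the lock; with `UnlockedInitPool` it gets an answer immediately, at the cost of the pool not knowing about a stream until its start-up has completed.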





[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419861=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419861
 ]

ASF GitHub Bot logged work on BEAM-9651:


Author: ASF GitHub Bot
Created on: 09/Apr/20 22:59
Start Date: 09/Apr/20 22:59
Worklog Time Spent: 10m 
  Work Description: scwhittle commented on issue #11367: Revert 
"[BEAM-9651] Prevent StreamPool and stream initialization livelock"
URL: https://github.com/apache/beam/pull/11367#issuecomment-611793406
 
 
   R: @reuvenlax 
 



Issue Time Tracking
---

Worklog Id: (was: 419861)
Time Spent: 1.5h  (was: 1h 20m)

> StreamingDataflowWorker stuck waiting for 
> org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
> --
>
> Key: BEAM-9651
> URL: https://issues.apache.org/jira/browse/BEAM-9651
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>


[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419860=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419860
 ]

ASF GitHub Bot logged work on BEAM-9651:


Author: ASF GitHub Bot
Created on: 09/Apr/20 22:58
Start Date: 09/Apr/20 22:58
Worklog Time Spent: 10m 
  Work Description: scwhittle commented on pull request #11367: Revert 
"[BEAM-9651] Prevent StreamPool and stream initialization livelock"
URL: https://github.com/apache/beam/pull/11367
 
 
   Reverts apache/beam#11364 because the tests were not run before merging.
 



Issue Time Tracking
---

Worklog Id: (was: 419860)
Time Spent: 1h 20m  (was: 1h 10m)

> StreamingDataflowWorker stuck waiting for 
> org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
> --
>
> Key: BEAM-9651
> URL: https://issues.apache.org/jira/browse/BEAM-9651
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419850=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419850
 ]

ASF GitHub Bot logged work on BEAM-9651:


Author: ASF GitHub Bot
Created on: 09/Apr/20 22:40
Start Date: 09/Apr/20 22:40
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #11364: [BEAM-9651] 
Prevent StreamPool and stream initialization livelock
URL: https://github.com/apache/beam/pull/11364#issuecomment-611787993
 
 
   @lukecwik my bad - I missed that the tests hadn't triggered. Is it possible 
to trigger tests now, or do we need a new PR?
 



Issue Time Tracking
---

Worklog Id: (was: 419850)
Time Spent: 1h 10m  (was: 1h)

> StreamingDataflowWorker stuck waiting for 
> org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
> --
>
> Key: BEAM-9651
> URL: https://issues.apache.org/jira/browse/BEAM-9651
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419849=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419849
 ]

ASF GitHub Bot logged work on BEAM-9651:


Author: ASF GitHub Bot
Created on: 09/Apr/20 22:39
Start Date: 09/Apr/20 22:39
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #11364: [BEAM-9651] 
Prevent StreamPool and stream initialization livelock
URL: https://github.com/apache/beam/pull/11364#issuecomment-611787817
 
 
   run dataflow validatesrunner
 



Issue Time Tracking
---

Worklog Id: (was: 419849)
Time Spent: 1h  (was: 50m)

> StreamingDataflowWorker stuck waiting for 
> org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
> --
>
> Key: BEAM-9651
> URL: https://issues.apache.org/jira/browse/BEAM-9651
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>


[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419846=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419846
 ]

ASF GitHub Bot logged work on BEAM-9651:


Author: ASF GitHub Bot
Created on: 09/Apr/20 22:33
Start Date: 09/Apr/20 22:33
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11364: [BEAM-9651] Prevent 
StreamPool and stream initialization livelock
URL: https://github.com/apache/beam/pull/11364#issuecomment-611785948
 
 
   Note that the tests never ran on this change. Please only merge on green.
   
   Only committers can start tests on PRs, so a missing test run is sometimes 
overlooked because a change can look merge-ready.
 



Issue Time Tracking
---

Worklog Id: (was: 419846)
Time Spent: 50m  (was: 40m)

> StreamingDataflowWorker stuck waiting for 
> org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
> --
>
> Key: BEAM-9651
> URL: https://issues.apache.org/jira/browse/BEAM-9651
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (BEAM-9727) Auto populate required feature experiment flags for enable dataflow runner v2

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9727?focusedWorklogId=419843=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419843
 ]

ASF GitHub Bot logged work on BEAM-9727:


Author: ASF GitHub Bot
Created on: 09/Apr/20 22:31
Start Date: 09/Apr/20 22:31
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11355: [BEAM-9727] 
Automatically set required experiment flags for dataflow runner v2.
URL: https://github.com/apache/beam/pull/11355#issuecomment-611785115
 
 
   > > Consider adding the logic here instead:
   > > https://github.com/apache/beam/blob/79b2d87b59819ee55fb8600e8a845c6ba5b98d64/sdks/python/apache_beam/pipeline.py#L206
   > > 
   > > This would add the experiment slightly earlier than when the first ptransform is applied.
   > 
   > I initially considered adding it to pipeline.py following this check, but feel that the logic is too Dataflow-specific to be added here.
   
   sgtm, in the worst case we can tell users to use a longer list of 
experiments if we find a problem later.
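
   For readers following the thread, the general idea of auto-populating experiment flags can be sketched as below. This is only an illustration under assumptions: the function name and the flag names (`runner_v2_flag`, `feature_a`, `feature_b`) are placeholders, not the flags the PR actually sets, and the real change lives in the Dataflow runner code rather than in a standalone helper.

```python
def with_required_experiments(experiments, trigger, required):
    """Return a new list that adds `required` flags when `trigger` is present.

    All flag names passed in are supplied by the caller; nothing here is
    tied to real Dataflow experiment names.
    """
    result = list(experiments or [])
    if trigger in result:
        for flag in required:
            if flag not in result:  # keep user-supplied flags, avoid duplicates
                result.append(flag)
    return result


# Placeholder flag names, chosen only for the example.
print(with_required_experiments(
    ["runner_v2_flag"], "runner_v2_flag", ["feature_a", "feature_b"]))
```

   If such automatic expansion ever misbehaves, the fallback mentioned above still works: the user passes the longer list explicitly, and the helper leaves it unchanged.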
 



Issue Time Tracking
---

Worklog Id: (was: 419843)
Time Spent: 3h 10m  (was: 3h)

> Auto populate required feature experiment flags for enable dataflow runner v2
> -
>
> Key: BEAM-9727
> URL: https://issues.apache.org/jira/browse/BEAM-9727
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-1819) Key should be available in @OnTimer methods

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1819?focusedWorklogId=419840=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419840
 ]

ASF GitHub Bot logged work on BEAM-1819:


Author: ASF GitHub Bot
Created on: 09/Apr/20 22:25
Start Date: 09/Apr/20 22:25
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #11154: [BEAM-1819] Key 
should be available in @OnTimer methods
URL: https://github.com/apache/beam/pull/11154#issuecomment-611782665
 
 
   There appears to be a compilation failure in Java
 



Issue Time Tracking
---

Worklog Id: (was: 419840)
Time Spent: 16h 40m  (was: 16.5h)

> Key should be available in @OnTimer methods
> ---
>
> Key: BEAM-1819
> URL: https://issues.apache.org/jira/browse/BEAM-1819
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 16h 40m
>  Remaining Estimate: 0h
>
> Every timer firing has an associated key. This key should be available when 
> the timer is delivered to a user's {{DoFn}}, so they don't have to store it 
> in state.





[jira] [Work logged] (BEAM-1589) Add OnWindowExpiration method to Stateful DoFn

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1589?focusedWorklogId=419839=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419839
 ]

ASF GitHub Bot logged work on BEAM-1589:


Author: ASF GitHub Bot
Created on: 09/Apr/20 22:24
Start Date: 09/Apr/20 22:24
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #11350: [BEAM-1589] 
Added @onWindowExpiration annotation.
URL: https://github.com/apache/beam/pull/11350#discussion_r406510585
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingModeExecutionContext.java
 ##
 @@ -591,7 +591,7 @@ public void flushState() {
   timerId,
   "",
   cleanupTime,
-  cleanupTime,
 
 Review comment:
   We also seem to set GC timers in ReduceFnRunner.java. @kennknowles do you 
know why we have both?
 



Issue Time Tracking
---

Worklog Id: (was: 419839)
Time Spent: 2h 50m  (was: 2h 40m)

> Add OnWindowExpiration method to Stateful DoFn
> --
>
> Key: BEAM-1589
> URL: https://issues.apache.org/jira/browse/BEAM-1589
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core, sdk-java-core
>Reporter: Jingsong Lee
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> See BEAM-1517
> This allows the user to do some work before the state's garbage collection.
> It seems kind of annoying, but on the other hand forgetting to set a final 
> timer to flush state is probably data loss most of the time.
> FlinkRunner does this work very simply, but other runners, such as 
> DirectRunner, need to traverse all the states to do this, and maybe it's a 
> little hard.
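
The data-loss concern in the description can be shown with a toy model in Python. It is not the Beam API: `BufferingFn`, `on_window_expiration`, and `expire_window` are hypothetical names, and the loop over all keys only mimics the "traverse all the states" approach mentioned for runners that lack a cheaper mechanism.

```python
class BufferingFn:
    """Toy stateful function that buffers values per key."""

    def __init__(self):
        self.state = {}    # per-key buffered values
        self.emitted = []  # what reached downstream

    def process(self, key, value):
        self.state.setdefault(key, []).append(value)

    def on_window_expiration(self, key):
        # Last chance to flush buffered data before state is collected.
        self.emitted.append((key, sum(self.state.get(key, []))))


def expire_window(fn, call_hook):
    """Garbage-collect all state, optionally invoking the expiration hook first."""
    for key in list(fn.state):
        if call_hook:
            fn.on_window_expiration(key)
        del fn.state[key]


flushed = BufferingFn()
flushed.process("a", 1)
flushed.process("a", 2)
expire_window(flushed, call_hook=True)

dropped = BufferingFn()
dropped.process("a", 1)
dropped.process("a", 2)
expire_window(dropped, call_hook=False)

print(flushed.emitted, dropped.emitted)
```

Without the hook, the buffered values for key `a` are discarded silently, which is the situation the issue wants to make harder to reach by accident.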





[jira] [Work logged] (BEAM-1589) Add OnWindowExpiration method to Stateful DoFn

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1589?focusedWorklogId=419837=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419837
 ]

ASF GitHub Bot logged work on BEAM-1589:


Author: ASF GitHub Bot
Created on: 09/Apr/20 22:24
Start Date: 09/Apr/20 22:24
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #11350: [BEAM-1589] 
Added @onWindowExpiration annotation.
URL: https://github.com/apache/beam/pull/11350#discussion_r406508585
 
 

 ##
 File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java
 ##
 @@ -841,6 +847,238 @@ public BundleFinalizer bundleFinalizer() {
 }
   }
 
+  /**
+   * A concrete implementation of {@link DoFnInvoker.ArgumentProvider} used for running a {@link
+   * DoFn} on window expiration.
+   */
+  private class OnWindowExpirationArgumentProvider extends DoFn.OnTimerContext
 
 Review comment:
   Why is this based on OnTimerContext?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 419837)
Time Spent: 2h 40m  (was: 2.5h)

> Add OnWindowExpiration method to Stateful DoFn
> --
>
> Key: BEAM-1589
> URL: https://issues.apache.org/jira/browse/BEAM-1589
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core, sdk-java-core
>Reporter: Jingsong Lee
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> See BEAM-1517
> This allows the user to do some work before the state's garbage collection.
> It seems kind of annoying, but on the other hand forgetting to set a final 
> timer to flush state is probably data loss most of the time.
> FlinkRunner does this work very simply, but other runners, such as 
> DirectRunner, need to traverse all the states to do this, and maybe it's a 
> little hard.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-1589) Add OnWindowExpiration method to Stateful DoFn

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1589?focusedWorklogId=419838=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419838
 ]

ASF GitHub Bot logged work on BEAM-1589:


Author: ASF GitHub Bot
Created on: 09/Apr/20 22:24
Start Date: 09/Apr/20 22:24
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #11350: [BEAM-1589] 
Added @onWindowExpiration annotation.
URL: https://github.com/apache/beam/pull/11350#discussion_r406508479
 
 

 ##
 File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/DoFnRunner.java
 ##
 @@ -52,6 +52,12 @@ void onTimer(
*/
   void finishBundle();
 
+  /**
+   * Calls a {@link DoFn DoFn's} {@link DoFn.OnWindowExpiration 
@OnWindowExpiration} method and
+   * performs additional task, such as extracts a value saved in a state 
before garbage collection.
+   */
+  void onWindowExpiration(BoundedWindow window, Instant timestamp, TimeDomain 
timeDomain);
 
 Review comment:
   What do timestamp and timeDomain mean in this context? 
   
   Also presumably you do want to be able to access the key in 
onWindowExpiration
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 419838)
Time Spent: 2h 40m  (was: 2.5h)

> Add OnWindowExpiration method to Stateful DoFn
> --
>
> Key: BEAM-1589
> URL: https://issues.apache.org/jira/browse/BEAM-1589
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core, sdk-java-core
>Reporter: Jingsong Lee
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> See BEAM-1517
> This allows the user to do some work before the state's garbage collection.
> It seems kind of annoying, but on the other hand forgetting to set a final 
> timer to flush state is probably data loss most of the time.
> FlinkRunner does this work very simply, but other runners, such as 
> DirectRunner, need to traverse all the states to do this, and maybe it's a 
> little hard.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9735) Performance regression in Python Batch pipeline in Reshuffle

2020-04-09 Thread Ankur Goenka (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka updated BEAM-9735:
---
Priority: Blocker  (was: Major)

> Performance regression in Python Batch pipeline in Reshuffle
> 
>
> Key: BEAM-9735
> URL: https://issues.apache.org/jira/browse/BEAM-9735
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Blocker
> Fix For: 2.21.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-1819) Key should be available in @OnTimer methods

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1819?focusedWorklogId=419823=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419823
 ]

ASF GitHub Bot logged work on BEAM-1819:


Author: ASF GitHub Bot
Created on: 09/Apr/20 22:10
Start Date: 09/Apr/20 22:10
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #11154: [BEAM-1819] Key 
should be available in @OnTimer methods
URL: https://github.com/apache/beam/pull/11154#issuecomment-611777422
 
 
   Run Java Flink PortableValidatesRunner Streaming
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 419823)
Time Spent: 16.5h  (was: 16h 20m)

> Key should be available in @OnTimer methods
> ---
>
> Key: BEAM-1819
> URL: https://issues.apache.org/jira/browse/BEAM-1819
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 16.5h
>  Remaining Estimate: 0h
>
> Every timer firing has an associated key. This key should be available when 
> the timer is delivered to a user's {{DoFn}}, so they don't have to store it 
> in state.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-1819) Key should be available in @OnTimer methods

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1819?focusedWorklogId=419820=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419820
 ]

ASF GitHub Bot logged work on BEAM-1819:


Author: ASF GitHub Bot
Created on: 09/Apr/20 22:09
Start Date: 09/Apr/20 22:09
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #11154: [BEAM-1819] Key 
should be available in @OnTimer methods
URL: https://github.com/apache/beam/pull/11154#issuecomment-611776987
 
 
   run dataflow validatesrunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 419820)
Time Spent: 16h  (was: 15h 50m)

> Key should be available in @OnTimer methods
> ---
>
> Key: BEAM-1819
> URL: https://issues.apache.org/jira/browse/BEAM-1819
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 16h
>  Remaining Estimate: 0h
>
> Every timer firing has an associated key. This key should be available when 
> the timer is delivered to a user's {{DoFn}}, so they don't have to store it 
> in state.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-1819) Key should be available in @OnTimer methods

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1819?focusedWorklogId=419821=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419821
 ]

ASF GitHub Bot logged work on BEAM-1819:


Author: ASF GitHub Bot
Created on: 09/Apr/20 22:09
Start Date: 09/Apr/20 22:09
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #11154: [BEAM-1819] Key 
should be available in @OnTimer methods
URL: https://github.com/apache/beam/pull/11154#issuecomment-611777030
 
 
   run flink validatesrunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 419821)
Time Spent: 16h 10m  (was: 16h)

> Key should be available in @OnTimer methods
> ---
>
> Key: BEAM-1819
> URL: https://issues.apache.org/jira/browse/BEAM-1819
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 16h 10m
>  Remaining Estimate: 0h
>
> Every timer firing has an associated key. This key should be available when 
> the timer is delivered to a user's {{DoFn}}, so they don't have to store it 
> in state.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-1819) Key should be available in @OnTimer methods

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1819?focusedWorklogId=419822=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419822
 ]

ASF GitHub Bot logged work on BEAM-1819:


Author: ASF GitHub Bot
Created on: 09/Apr/20 22:09
Start Date: 09/Apr/20 22:09
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on issue #11154: [BEAM-1819] Key 
should be available in @OnTimer methods
URL: https://github.com/apache/beam/pull/11154#issuecomment-611777072
 
 
   run spark validatesrunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 419822)
Time Spent: 16h 20m  (was: 16h 10m)

> Key should be available in @OnTimer methods
> ---
>
> Key: BEAM-1819
> URL: https://issues.apache.org/jira/browse/BEAM-1819
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Thomas Groh
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 16h 20m
>  Remaining Estimate: 0h
>
> Every timer firing has an associated key. This key should be available when 
> the timer is delivered to a user's {{DoFn}}, so they don't have to store it 
> in state.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9727) Auto populate required feature experiment flags for enable dataflow runner v2

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9727?focusedWorklogId=419818=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419818
 ]

ASF GitHub Bot logged work on BEAM-9727:


Author: ASF GitHub Bot
Created on: 09/Apr/20 22:08
Start Date: 09/Apr/20 22:08
Worklog Time Spent: 10m 
  Work Description: ananvay commented on issue #11355: [BEAM-9727] 
Automatically set required experiment flags for dataflow runner v2.
URL: https://github.com/apache/beam/pull/11355#issuecomment-611776464
 
 
   LGTM
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 419818)
Time Spent: 3h  (was: 2h 50m)

> Auto populate required feature experiment flags for enable dataflow runner v2
> -
>
> Key: BEAM-9727
> URL: https://issues.apache.org/jira/browse/BEAM-9727
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9735) Performance regression in Python Batch pipeline in Reshuffle

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9735?focusedWorklogId=419819=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419819
 ]

ASF GitHub Bot logged work on BEAM-9735:


Author: ASF GitHub Bot
Created on: 09/Apr/20 22:08
Start Date: 09/Apr/20 22:08
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #11365: [BEAM-9735] 
Adding Always trigger and using it in Reshuffle
URL: https://github.com/apache/beam/pull/11365
 
 
   R: @robertwb 
   

[jira] [Created] (BEAM-9735) Performance regression in Python Batch pipeline in Reshuffle

2020-04-09 Thread Ankur Goenka (Jira)
Ankur Goenka created BEAM-9735:
--

 Summary: Performance regression in Python Batch pipeline in 
Reshuffle
 Key: BEAM-9735
 URL: https://issues.apache.org/jira/browse/BEAM-9735
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Ankur Goenka
Assignee: Ankur Goenka
 Fix For: 2.21.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419817=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419817
 ]

ASF GitHub Bot logged work on BEAM-9651:


Author: ASF GitHub Bot
Created on: 09/Apr/20 22:04
Start Date: 09/Apr/20 22:04
Worklog Time Spent: 10m 
  Work Description: reuvenlax commented on pull request #11364: [BEAM-9651] 
Prevent StreamPool and stream initialization livelock
URL: https://github.com/apache/beam/pull/11364
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 419817)
Time Spent: 40m  (was: 0.5h)

> StreamingDataflowWorker stuck waiting for 
> org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
> --
>
> Key: BEAM-9651
> URL: https://issues.apache.org/jira/browse/BEAM-9651
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Operation ongoing in step  for at least 28h10m00s without 
> outputting or completing in state windmill-read at 
> sun.misc.Unsafe.park(Native Method) at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at 
> java.util.concurrent.Phaser$QNode.block(Phaser.java:1140) at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at 
> java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067) at 
> java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758) at 
> org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.<init>(GrpcWindmillServer.java:941)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506)
>  at 
> org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown
>  Source) at 
> org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:159)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:158)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191)
>  at 
> org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199)
>  at 
> org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433)
>  at 
> org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328)
>  at 
> org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389)
>  at
> 
> Because the stream is started in a StreamPool synchronized block, all other 
> threads interacting with StreamPool to get or release streams end up blocking.
> It is unclear if the stream never became usable and thus blocked forever or 
> if there is a race with the use of the Phaser that causes the stuckness.
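
The blocking pattern described above can be sketched in plain Python. This is a toy model, not the Dataflow worker code and not necessarily what the linked pull request does: `StreamPool`, `slow_factory`, and the string "streams" are hypothetical stand-ins. The point is that the potentially blocking stream creation runs outside the pool's lock, so a stream that is slow to start stalls only the thread that asked for it.

```python
import threading


class StreamPool:
    """Toy pool that never runs the (possibly blocking) factory under its lock."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._idle = []

    def get(self):
        with self._lock:
            if self._idle:
                return self._idle.pop()
        # The lock is released here: a slow or stuck factory blocks only this caller.
        return self._factory()

    def release(self, stream):
        with self._lock:
            self._idle.append(stream)


gate = threading.Event()     # stands in for a stream that is slow to start
started = threading.Event()  # signals that the worker has entered the factory


def slow_factory():
    started.set()
    gate.wait()
    return "new-stream"


pool = StreamPool(slow_factory)
results = []
worker = threading.Thread(target=lambda: results.append(pool.get()))
worker.start()
started.wait()  # the worker is now blocked inside the factory

pool.release("recycled-stream")  # would hang if get() still held the lock
reused = pool.get()              # served from the idle list, no factory call

gate.set()  # let the slow stream finish starting
worker.join()
```

If `get()` called the factory while holding the lock, the `release` call in the main thread would wait for as long as the stream took to start, which mirrors the situation in the report.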



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-9278) Make HBase client a provided dependency in HBaseIO

2020-04-09 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-9278:
--

Assignee: (was: Ismaël Mejía)

> Make HBase client a provided dependency in HBaseIO
> --
>
> Key: BEAM-9278
> URL: https://issues.apache.org/jira/browse/BEAM-9278
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-hbase
>Reporter: Ismaël Mejía
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> HBaseIO relies on an included version of `hbase-shaded-client`. However, it is 
> common for Hadoop distributions to provide their own client versions, so it is 
> a good idea to let users provide their own shaded client versions.
> In particular, this enables users to provide HBase 2-based dependencies. We 
> are not yet testing those dependencies against Beam because of unrelated 
> issues in BEAM-7435 (so Beam internal testing is still WIP).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=419813=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419813
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 09/Apr/20 22:01
Start Date: 09/Apr/20 22:01
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #11243: 
[BEAM-9136]Add licenses for dependencies for Java
URL: https://github.com/apache/beam/pull/11243#discussion_r406502326
 
 

 ##
 File path: sdks/java/container/build.gradle
 ##
 @@ -16,7 +16,11 @@
  * limitations under the License.
  */
 
-plugins { id 'org.apache.beam.module' }
+plugins {
+  id 'org.apache.beam.module'
+  id 'com.github.jk1.dependency-license-report' version '1.13'
 
 Review comment:
   If I understand correctly, dependencies whose jars do not include a license or 
notice are initially *treated* as missing dependencies. In reality the 
dependency scan is correct; only the licenses are missing. 
`com.github.jk1.dependency-license-report` also generates an `index.json` file 
that lists all dependencies. I used this file to go through every dependency 
and pull its licenses/notices when they were not pulled automatically.
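
A rough sketch of that pass over `index.json`, in Python. The field names used here (`dependencies`, `moduleName`, `moduleVersion`, `moduleLicense`, `moduleLicenseUrl`) are assumptions about the report layout and should be checked against the file the plugin actually produces; the sample data is made up.

```python
import json


def dependencies_missing_license(report):
    """Return 'name:version' for entries with neither a license name nor a license URL.

    The key names are assumed, not taken from the plugin's documentation.
    """
    missing = []
    for dep in report.get("dependencies", []):
        if not dep.get("moduleLicense") and not dep.get("moduleLicenseUrl"):
            missing.append(
                "{}:{}".format(dep.get("moduleName"), dep.get("moduleVersion")))
    return missing


# Made-up sample in the assumed layout.
sample = json.loads("""
{"dependencies": [
  {"moduleName": "com.example:with-license", "moduleVersion": "1.0",
   "moduleLicense": "Apache License, Version 2.0",
   "moduleLicenseUrl": "https://www.apache.org/licenses/LICENSE-2.0"},
  {"moduleName": "com.example:no-license", "moduleVersion": "2.3"}
]}
""")
missing = dependencies_missing_license(sample)
```

The entries returned by such a function are the ones whose licenses and notices would need to be pulled by hand.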
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 419813)
Time Spent: 20h 40m  (was: 20.5h)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 20h 40m
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=419814=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419814
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 09/Apr/20 22:01
Start Date: 09/Apr/20 22:01
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #11243: 
[BEAM-9136]Add licenses for dependencies for Java
URL: https://github.com/apache/beam/pull/11243#discussion_r406502531
 
 

 ##
 File path: sdks/java/container/license_scripts/license_script.sh
 ##
 @@ -0,0 +1,37 @@
+ # Licensed to the Apache Software Foundation (ASF) under one
+ # or more contributor license agreements.  See the NOTICE file
+ # distributed with this work for additional information
+ # regarding copyright ownership.  The ASF licenses this file
+ # to you under the Apache License, Version 2.0 (the
+ # License); you may not use this file except in compliance
+ # with the License.  You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an AS IS BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
+set -e
+# reports are generated at ~/beam/java_third_party_licenses
+./gradlew generateLicenseReport --rerun-tasks
 
 Review comment:
   This is the same as the answer below.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 419814)
Time Spent: 20h 50m  (was: 20h 40m)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (BEAM-7770) Add ReadAll transform for SolrIO

2020-04-09 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-7770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-7770 started by Ismaël Mejía.
--
> Add ReadAll transform for SolrIO
> 
>
> Key: BEAM-7770
> URL: https://issues.apache.org/jira/browse/BEAM-7770
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-solr
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> SolrIO already uses a composable approach internally, but we need to expose an 
> explicit ReadAll transform that lets users create reads in the middle of a 
> Pipeline, which improves composability.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-7429) Rename ReadAllViaFileBasedSource to ReadFilesViaFileBasedSource

2020-04-09 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-7429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-7429:
--

Assignee: Ismaël Mejía

> Rename ReadAllViaFileBasedSource to ReadFilesViaFileBasedSource
> ---
>
> Key: BEAM-7429
> URL: https://issues.apache.org/jira/browse/BEAM-7429
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>
> `ReadAllViaFileBasedSource` is not used by the `ReadAll` transform but by 
> the `ReadFiles` transform, and it operates on ReadableFile objects, so it 
> makes more sense to call it `ReadFilesViaFileBasedSource`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-7429) Rename ReadAllViaFileBasedSource to ReadFilesViaFileBasedSource

2020-04-09 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-7429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-7429:
--

Assignee: (was: Ismaël Mejía)

> Rename ReadAllViaFileBasedSource to ReadFilesViaFileBasedSource
> ---
>
> Key: BEAM-7429
> URL: https://issues.apache.org/jira/browse/BEAM-7429
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Priority: Minor
>
> `ReadAllViaFileBasedSource` is not used by the `ReadAll` transform but by 
> the `ReadFiles` transform, and it operates on ReadableFile objects, so it 
> makes more sense to call it `ReadFilesViaFileBasedSource`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (BEAM-9095) API improvements for KuduIO

2020-04-09 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-9095 started by Ismaël Mejía.
--
> API improvements for KuduIO
> ---
>
> Key: BEAM-9095
> URL: https://issues.apache.org/jira/browse/BEAM-9095
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kudu
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>
> The KuduIO API has some minor issues that could easily be fixed, including 
> typos, parameter types that are inconsistent with kudu-client, and deviations 
> from Beam transform style.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9726) Don't require --region for non-service Dataflow endpoints.

2020-04-09 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver resolved BEAM-9726.
---
Resolution: Fixed

> Don't require --region for non-service Dataflow endpoints.
> --
>
> Key: BEAM-9726
> URL: https://issues.apache.org/jira/browse/BEAM-9726
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Some Dataflow internal tests don't run on the real Dataflow service. Since 
> region only applies to the real Dataflow service, we should not require these 
> tests to specify a region.
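
The rule described above can be sketched as a small validation function. This is an illustration only: `validate_region` is a hypothetical helper, not the SDK's actual check, and the endpoint constant is an assumed value for the real service's address.

```python
# Assumed address of the real Dataflow service; treat as illustrative.
DATAFLOW_SERVICE_ENDPOINT = "https://dataflow.googleapis.com"


def validate_region(endpoint, region):
    """Return a list of validation errors.

    A region is required only when the endpoint is the real Dataflow service.
    """
    errors = []
    is_real_service = endpoint.rstrip("/") == DATAFLOW_SERVICE_ENDPOINT
    if is_real_service and not region:
        errors.append("--region is required when targeting the Dataflow service")
    return errors


service_errors = validate_region("https://dataflow.googleapis.com", None)
internal_errors = validate_region("http://localhost:8080", None)
```

With this shape, a test that points at a local or internal endpoint passes validation without a region, while a job aimed at the real service still has to name one.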



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-9428) CVEs in the dependencies of hive-exec for HiveIO

2020-04-09 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-9428:
--

Assignee: (was: Ismaël Mejía)

> CVEs in the dependencies of hive-exec for HiveIO
> 
>
> Key: BEAM-9428
> URL: https://issues.apache.org/jira/browse/BEAM-9428
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-hcatalog
>Reporter: XuCongying
>Priority: Major
> Attachments: apache-beam_CVE-report.md
>
>
> Hello, your project uses some dependencies with CVEs. I found that the buggy 
> methods of the CVEs are in the program execution path of your project, which 
> puts your project at risk. I suggest a library update. See details below:
>  * *Vulnerable Dependency:* org.apache.hive : hive-exec : 2.1.0
>  * *Call Chain to Buggy Methods:*
>  ** *Some files in your project call the library method 
> org.apache.hadoop.hive.ql.Driver.run(java.lang.String), which can reach the 
> buggy method of 
> [CVE-2017-12625|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-12625].*
>  *** Files in your project:  
> sdks/java/io/hcatalog/src/main/java/org/apache/beam/sdk/io/hcatalog/test/EmbeddedMetastoreService.java
>  *** One of the possible call chains:
> org.apache.hadoop.hive.ql.Driver.run(java.lang.String)
> org.apache.hadoop.hive.ql.Driver.run(java.lang.String,boolean)
> org.apache.hadoop.hive.ql.Driver.runInternal(java.lang.String,boolean)
> org.apache.hadoop.hive.ql.Driver.compileInternal(java.lang.String)
> org.apache.hadoop.hive.ql.Driver.compile(java.lang.String)
> org.apache.hadoop.hive.ql.Driver.compile(java.lang.String,boolean)
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(java.lang.String,org.apache.hadoop.hive.ql.Context)
>  [buggy method]
>  ** *Update suggestion:* version 3.1.2. Version 3.1.2 is a safe version without CVEs. 
> From 2.1.0 to 3.1.2, 2 APIs (called 2 times in your project) were 
> removed and 3 APIs (called 3 times in your project) were modified.





[jira] [Work started] (BEAM-9406) Convert KuduIO away from BoundedSource

2020-04-09 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-9406 started by Ismaël Mejía.
--
> Convert KuduIO away from BoundedSource
> --
>
> Key: BEAM-9406
> URL: https://issues.apache.org/jira/browse/BEAM-9406
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kudu
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>
> Convert KuduIO to use the DoFn API instead of BoundedSource to be consistent 
> with recent patterns of use on Beam.





[jira] [Assigned] (BEAM-9351) Upgrade Hive/HCatalog to version 2.3.6

2020-04-09 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-9351:
--

Assignee: (was: Ismaël Mejía)

> Upgrade Hive/HCatalog to version 2.3.6
> --
>
> Key: BEAM-9351
> URL: https://issues.apache.org/jira/browse/BEAM-9351
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-hcatalog
>Reporter: Ismaël Mejía
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The Beam Hive/HCatalog dependency is a bit outdated; it is probably a good idea 
> to update it to the most recent stable version of the 2.x.x line.





[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=419808=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419808
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:45
Start Date: 09/Apr/20 21:45
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #11243: 
[BEAM-9136]Add licenses for dependencies for Java
URL: https://github.com/apache/beam/pull/11243#discussion_r406495748
 
 

 ##
 File path: sdks/java/container/license_scripts/pull_licenses_java.py
 ##
 @@ -0,0 +1,186 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+"""
+A script to pull licenses/notices/source code for Java dependencies.
 
 Review comment:
   We can do that by adding some more code to the scripts. I will add it.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 419808)
Time Spent: 20.5h  (was: 20h 20m)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 20.5h
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.





[jira] [Updated] (BEAM-9734) Revert https://github.com/apache/beam/pull/11122 which is a potential regression

2020-04-09 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9734:
---
Status: Open  (was: Triage Needed)

> Revert https://github.com/apache/beam/pull/11122 which is a potential 
> regression
> 
>
> Key: BEAM-9734
> URL: https://issues.apache.org/jira/browse/BEAM-9734
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Blocker
> Fix For: 2.21.0
>
>
> This is potentially a regression for Dataflow. We should revert and 
> re-introduce as an optional change that can be controlled by a user option.





[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=419806=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419806
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:43
Start Date: 09/Apr/20 21:43
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #11243: 
[BEAM-9136]Add licenses for dependencies for Java
URL: https://github.com/apache/beam/pull/11243#discussion_r406494846
 
 

 ##
 File path: sdks/java/container/build.gradle
 ##
 @@ -68,6 +73,28 @@ golang {
   }
 }
 
+// this is a workaround to call generateLicenseReport task from project root 
directory.
+// generateLicenseReport does not return correct dependency list when not 
called from the root.
+task generateThirdPartyLicenses(type: Exec) {
+  workingDir project.rootProject.projectDir
+  commandLine './sdks/java/container/license_scripts/license_script.sh'
 
 Review comment:
   When the `generateLicenseReport` task is triggered by `dependsOn`, the task 
does not run at the project root directory regardless of how the project 
scope is set, and the dependency list is not correct. This is a workaround to execute 
`generateLicenseReport` from the root directory and get the correct dependency list.
 



Issue Time Tracking
---

Worklog Id: (was: 419806)
Time Spent: 20h 20m  (was: 20h 10m)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 20h 20m
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.





[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=419801=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419801
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:37
Start Date: 09/Apr/20 21:37
Worklog Time Spent: 10m 
  Work Description: alanmyrvold commented on pull request #11243: 
[BEAM-9136]Add licenses for dependencies for Java
URL: https://github.com/apache/beam/pull/11243#discussion_r406487569
 
 

 ##
 File path: sdks/java/container/build.gradle
 ##
 @@ -16,7 +16,11 @@
  * limitations under the License.
  */
 
-plugins { id 'org.apache.beam.module' }
+plugins {
+  id 'org.apache.beam.module'
+  id 'com.github.jk1.dependency-license-report' version '1.13'
 
 Review comment:
   Kenn mentioned that hierynomus/license-gradle-plugin found more accurate license 
dependencies; does that need to be added?
 



Issue Time Tracking
---

Worklog Id: (was: 419801)
Time Spent: 19h 50m  (was: 19h 40m)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 19h 50m
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.





[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=419804=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419804
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:37
Start Date: 09/Apr/20 21:37
Worklog Time Spent: 10m 
  Work Description: alanmyrvold commented on pull request #11243: 
[BEAM-9136]Add licenses for dependencies for Java
URL: https://github.com/apache/beam/pull/11243#discussion_r40641
 
 

 ##
 File path: sdks/java/container/build.gradle
 ##
 @@ -68,6 +73,28 @@ golang {
   }
 }
 
+// this is a workaround to call generateLicenseReport task from project root 
directory.
+// generateLicenseReport does not return correct dependency list when not 
called from the root.
+task generateThirdPartyLicenses(type: Exec) {
+  workingDir project.rootProject.projectDir
+  commandLine './sdks/java/container/license_scripts/license_script.sh'
 
 Review comment:
   Why does the gradle task need to start a shell script that runs gradle? Can 
the gradle task be a dependency instead?
 



Issue Time Tracking
---

Worklog Id: (was: 419804)
Time Spent: 20h 10m  (was: 20h)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 20h 10m
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.





[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=419803=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419803
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:37
Start Date: 09/Apr/20 21:37
Worklog Time Spent: 10m 
  Work Description: alanmyrvold commented on pull request #11243: 
[BEAM-9136]Add licenses for dependencies for Java
URL: https://github.com/apache/beam/pull/11243#discussion_r406488322
 
 

 ##
 File path: sdks/java/container/license_scripts/license_script.sh
 ##
 @@ -0,0 +1,37 @@
+ # Licensed to the Apache Software Foundation (ASF) under one
+ # or more contributor license agreements.  See the NOTICE file
+ # distributed with this work for additional information
+ # regarding copyright ownership.  The ASF licenses this file
+ # to you under the Apache License, Version 2.0 (the
+ # License); you may not use this file except in compliance
+ # with the License.  You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an AS IS BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
+set -e
+# reports are generated at ~/beam/java_third_party_licenses
+./gradlew generateLicenseReport --rerun-tasks
 
 Review comment:
   Why does the gradle task need to run gradle? Can it be a dependency above?
 



Issue Time Tracking
---

Worklog Id: (was: 419803)
Time Spent: 20h 10m  (was: 20h)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 20h 10m
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.





[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=419802=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419802
 ]

ASF GitHub Bot logged work on BEAM-9136:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:37
Start Date: 09/Apr/20 21:37
Worklog Time Spent: 10m 
  Work Description: alanmyrvold commented on pull request #11243: 
[BEAM-9136]Add licenses for dependencies for Java
URL: https://github.com/apache/beam/pull/11243#discussion_r406491805
 
 

 ##
 File path: sdks/java/container/license_scripts/pull_licenses_java.py
 ##
 @@ -0,0 +1,186 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+"""
+A script to pull licenses/notices/source code for Java dependencies.
 
 Review comment:
   In addition to pulling the licenses, can this script generate a CSV of 
Library name, link to license, license name (like Apache, LGPL, etc), and 
whether source is downloaded/included?
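
A CSV report of the kind requested above could look roughly like the sketch below. It is not taken from pull_licenses_java.py: the function name, the column names, and the shape of the `dependencies` records are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical column names for the report.
COLUMNS = ["library", "license_url", "license_name", "source_included"]


def write_license_csv(dependencies, out):
    """Write one CSV row per dependency to the text stream `out`.

    Each dependency is assumed to be a dict with the keys
    name, license_url, license_name and source_included.
    """
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    for dep in dependencies:
        writer.writerow([
            dep["name"],
            dep["license_url"],
            dep["license_name"],
            "yes" if dep["source_included"] else "no",
        ])


if __name__ == "__main__":
    buf = io.StringIO()
    write_license_csv(
        [{
            "name": "example-lib",  # made-up dependency
            "license_url": "http://www.apache.org/licenses/LICENSE-2.0",
            "license_name": "Apache-2.0",
            "source_included": False,
        }],
        buf,
    )
    print(buf.getvalue())
```

Using the csv module rather than joining strings by hand keeps library names that contain commas or quotes correctly escaped.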
 



Issue Time Tracking
---

Worklog Id: (was: 419802)
Time Spent: 20h  (was: 19h 50m)

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 20h
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.





[jira] [Work logged] (BEAM-9722) Add SnowflakeIO to Java SDK

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9722?focusedWorklogId=419800=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419800
 ]

ASF GitHub Bot logged work on BEAM-9722:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:37
Start Date: 09/Apr/20 21:37
Worklog Time Spent: 10m 
  Work Description: takidau commented on pull request #11360: 
[WIP][BEAM-9722] added SnowflakeIO with Read operation
URL: https://github.com/apache/beam/pull/11360#discussion_r406491845
 
 

 ##
 File path: 
sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/SnowflakeIO.java
 ##
 @@ -0,0 +1,691 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.snowflake;
+
+import static org.apache.beam.sdk.io.TextIO.readFiles;
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+
+import com.google.auto.value.AutoValue;
+import com.opencsv.CSVParser;
+import com.opencsv.CSVParserBuilder;
+import java.io.IOException;
+import java.io.Serializable;
+import java.security.PrivateKey;
+import java.sql.Connection;
+import java.sql.SQLException;
+import java.text.SimpleDateFormat;
+import java.util.Date;
+import java.util.UUID;
+import java.util.concurrent.ConcurrentHashMap;
+import javax.annotation.Nullable;
+import javax.sql.DataSource;
+import net.snowflake.client.jdbc.SnowflakeBasicDataSource;
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.io.FileIO;
+import 
org.apache.beam.sdk.io.snowflake.credentials.KeyPairSnowflakeCredentials;
+import 
org.apache.beam.sdk.io.snowflake.credentials.OAuthTokenSnowflakeCredentials;
+import org.apache.beam.sdk.io.snowflake.credentials.SnowflakeCredentials;
+import 
org.apache.beam.sdk.io.snowflake.credentials.UsernamePasswordSnowflakeCredentials;
+import org.apache.beam.sdk.options.ValueProvider;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.PTransform;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.SerializableFunction;
+import org.apache.beam.sdk.transforms.Wait;
+import org.apache.beam.sdk.transforms.display.DisplayData;
+import org.apache.beam.sdk.transforms.display.HasDisplayData;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * IO to read and write data on Snowflake.
+ *
+ * SnowflakeIO uses https://docs.snowflake.net/manuals/user-guide/jdbc.html;>Snowflake
+ * JDBC driver under the hood, but data isn't read/written using JDBC 
directly. Instead,
+ * SnowflakeIO uses dedicated COPY operations to read/write data 
from/to Google Cloud
+ * Storage.
+ *
+ * To configure SnowflakeIO to read/write from your Snowflake instance, you 
have to provide a
+ * {@link DataSourceConfiguration} using {@link
+ * DataSourceConfiguration#create(SnowflakeCredentials)}, where {@link 
SnowflakeCredentials might be
+ * created using {@link 
org.apache.beam.sdk.io.snowflake.credentials.SnowflakeCredentialsFactory}}.
+ * Additionally one of {@link DataSourceConfiguration#withServerName(String)} 
or {@link
+ * DataSourceConfiguration#withUrl(String)} must be used to tell SnowflakeIO 
which instance to use.
+ * 
+ * There are also other options available to configure connection to Snowflake:
+ *
+ * 
+ *   {@link DataSourceConfiguration#withWarehouse(String)} to specify 
which Warehouse to use
+ *   {@link DataSourceConfiguration#withDatabase(String)} to specify which 
Database to connect
+ *   to
+ *   {@link DataSourceConfiguration#withSchema(String)} to specify which 
schema to use
+ *   {@link DataSourceConfiguration#withRole(String)} to specify which 
role to use
+ *   {@link DataSourceConfiguration#withPortNumber(int)} to specify custom 
port of Snowflake
+ *   instance
+ * 
+ *
+ * For example:
+ *
+ * {@code
+ * SnowflakeIO.DataSourceConfiguration dataSourceConfiguration =
+ * 
SnowflakeIO.DataSourceConfiguration.create(SnowflakeCredentialsFactory.of(options))
+ *  

[jira] [Work logged] (BEAM-9727) Auto populate required feature experiment flags for enable dataflow runner v2

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9727?focusedWorklogId=419799=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419799
 ]

ASF GitHub Bot logged work on BEAM-9727:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:37
Start Date: 09/Apr/20 21:37
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #11355: [BEAM-9727] 
Automatically set required experiment flags for dataflow runner v2.
URL: https://github.com/apache/beam/pull/11355#issuecomment-611764830
 
 
   > Consider adding the logic here instead:
   > 
https://github.com/apache/beam/blob/79b2d87b59819ee55fb8600e8a845c6ba5b98d64/sdks/python/apache_beam/pipeline.py#L206
   > 
   > This would add the experiment slightly earlier than when the first 
ptransform is applied.
   
   I initially considered adding it to pipeline.py after this check, but 
felt that the logic is too Dataflow-specific to be added here.
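
Wherever the logic ends up living, the auto-population step itself can be sketched as a small pure function. This is not the code from the pull request: the flag names and the trigger name below are hypothetical placeholders, not real Dataflow experiments.

```python
# Hypothetical flag names, used only for illustration.
REQUIRED_EXPERIMENTS = ("example_flag_a", "example_flag_b")


def with_required_experiments(experiments, trigger,
                              required=REQUIRED_EXPERIMENTS):
    """Return a copy of `experiments` with the required flags appended.

    Flags are added only when `trigger` is already present, and flags
    that are already set are not duplicated.
    """
    result = list(experiments or [])
    if trigger not in result:
        return result
    for flag in required:
        if flag not in result:
            result.append(flag)
    return result
```

Keeping the function free of side effects makes it easy to call from either the runner or pipeline.py, which is the placement question discussed above.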
 



Issue Time Tracking
---

Worklog Id: (was: 419799)
Time Spent: 2h 50m  (was: 2h 40m)

> Auto populate required feature experiment flags for enable dataflow runner v2
> -
>
> Key: BEAM-9727
> URL: https://issues.apache.org/jira/browse/BEAM-9727
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419796=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419796
 ]

ASF GitHub Bot logged work on BEAM-9562:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:36
Start Date: 09/Apr/20 21:36
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #11314: [BEAM-9562] 
Send Timers over Data Channel as Elements
URL: https://github.com/apache/beam/pull/11314#discussion_r406491316
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py
 ##
 @@ -536,7 +525,8 @@ def _run_stage(self,
 runner_execution_context,
 bundle_context_manager,
 data_input,
-data_output,
+data_output, {},
 
 Review comment:
   yapf put the {} here.
 



Issue Time Tracking
---

Worklog Id: (was: 419796)
Time Spent: 20h 20m  (was: 20h 10m)

> Remove timer from PCollection and treat timers as Elements 
> ---
>
> Key: BEAM-9562
> URL: https://issues.apache.org/jira/browse/BEAM-9562
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-harness, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 20h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419797=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419797
 ]

ASF GitHub Bot logged work on BEAM-9651:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:36
Start Date: 09/Apr/20 21:36
Worklog Time Spent: 10m 
  Work Description: dpmills commented on issue #11364: [BEAM-9651] Prevent 
StreamPool and stream initialization livelock
URL: https://github.com/apache/beam/pull/11364#issuecomment-611764743
 
 
   LGTM
 



Issue Time Tracking
---

Worklog Id: (was: 419797)
Time Spent: 0.5h  (was: 20m)

> StreamingDataflowWorker stuck waiting for 
> org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
> --
>
> Key: BEAM-9651
> URL: https://issues.apache.org/jira/browse/BEAM-9651
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Operation ongoing in step  for at least 28h10m00s without 
> outputting or completing in state windmill-read at 
> sun.misc.Unsafe.park(Native Method) at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at 
> java.util.concurrent.Phaser$QNode.block(Phaser.java:1140) at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at 
> java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067) at 
> java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758) at 
> org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.<init>(GrpcWindmillServer.java:941)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506)
>  at 
> org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown
>  Source) at 
> org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:159)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:158)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191)
>  at 
> org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199)
>  at 
> org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433)
>  at 
> org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328)
>  at 
> org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389)
>  at
> 
> Because the stream is started in a StreamPool synchronized block, all other 
> threads interacting with StreamPool to get or release streams end up blocking.
> It is unclear if the stream never became usable and thus blocked forever or 
> if there is a race with the use of the Phaser that causes the stuckness.
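
The blocking pattern described above, and one general way to avoid it, can be illustrated with a toy pool. This is a sketch in Python rather than the worker's Java, it is not necessarily how the linked pull request addresses the issue, and the class and method names are made up for the example.

```python
import threading


class StreamPool:
    """Toy pool that creates its stream outside the pool lock."""

    def __init__(self, factory):
        self._factory = factory  # may block for a long time
        self._lock = threading.Lock()
        self._stream = None

    def get_stream(self):
        # Fast path: only a short critical section to read the cache.
        with self._lock:
            if self._stream is not None:
                return self._stream
        # The slow creation runs without holding the lock, so other
        # threads using the pool are not stuck behind it.
        candidate = self._factory()
        with self._lock:
            # If another thread won the race, keep its stream and drop
            # ours. A real pool would also close the unused candidate.
            if self._stream is None:
                self._stream = candidate
            return self._stream
```

The trade-off is that two threads may occasionally both create a stream, with one discarded, in exchange for never holding the pool lock across a call that might not return.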





[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419793=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419793
 ]

ASF GitHub Bot logged work on BEAM-9562:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:34
Start Date: 09/Apr/20 21:34
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #11314: [BEAM-9562] 
Send Timers over Data Channel as Elements
URL: https://github.com/apache/beam/pull/11314#discussion_r406490454
 
 

 ##
 File path: sdks/python/apache_beam/transforms/core.py
 ##
 @@ -1272,6 +1272,8 @@ def expand(self, pcoll):
 key_coder = coder.key_coder()
   else:
 key_coder = coders.registry.get_coder(typehints.Any)
+  self.window_coder = pcoll.windowing.windowfn.get_window_coder()
 
 Review comment:
   No. Will remove.
 



Issue Time Tracking
---

Worklog Id: (was: 419793)
Time Spent: 19h 50m  (was: 19h 40m)

> Remove timer from PCollection and treat timers as Elements 
> ---
>
> Key: BEAM-9562
> URL: https://issues.apache.org/jira/browse/BEAM-9562
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-harness, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 19h 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419795=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419795
 ]

ASF GitHub Bot logged work on BEAM-9562:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:34
Start Date: 09/Apr/20 21:34
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #11314: [BEAM-9562] 
Send Timers over Data Channel as Elements
URL: https://github.com/apache/beam/pull/11314#discussion_r406490556
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py
 ##
 @@ -987,8 +1019,10 @@ def __init__(
 
   def process_bundle(self,
  inputs,  # type: Mapping[str, PartitionableBuffer]
- expected_outputs  # type: DataOutput
-):
+ expected_outputs,  # type: DataOutput
+ fired_timers,  # type: Mapping[str, Mapping[str, 
PartitionableBuffer]]
 
 Review comment:
   I updated the `fired_timers` implementation but forgot to update the typing 
here. Thanks!
 



Issue Time Tracking
---

Worklog Id: (was: 419795)
Time Spent: 20h 10m  (was: 20h)

> Remove timer from PCollection and treat timers as Elements 
> ---
>
> Key: BEAM-9562
> URL: https://issues.apache.org/jira/browse/BEAM-9562
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-harness, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 20h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419794=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419794
 ]

ASF GitHub Bot logged work on BEAM-9562:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:34
Start Date: 09/Apr/20 21:34
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #11314: [BEAM-9562] 
Send Timers over Data Channel as Elements
URL: https://github.com/apache/beam/pull/11314#discussion_r406490504
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -837,25 +869,59 @@ def process_bundle(self, instruction_id):
 op.execution_context = execution_context
 op.start()
 
-  # Inject inputs from data plane.
+  # Each data_channel is mapped to a list of expected inputs which includes
+  # both data input and timer input. The data input is identified by
+  # transform_id. The timer input is identified by
+  # (transform_id, timer_family_id).
   data_channels = collections.defaultdict(
   list
   )  # type: DefaultDict[data_plane.GrpcClientDataChannel, List[str]]
+
+  # Inject data inputs from data plane.
 
 Review comment:
   Updated the comment.
 



Issue Time Tracking
---

Worklog Id: (was: 419794)
Time Spent: 20h  (was: 19h 50m)

> Remove timer from PCollection and treat timers as Elements 
> ---
>
> Key: BEAM-9562
> URL: https://issues.apache.org/jira/browse/BEAM-9562
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-harness, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 20h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419783=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419783
 ]

ASF GitHub Bot logged work on BEAM-9651:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:16
Start Date: 09/Apr/20 21:16
Worklog Time Spent: 10m 
  Work Description: scwhittle commented on issue #11364: [BEAM-9651] 
Prevent StreamPool and stream initialization livelock
URL: https://github.com/apache/beam/pull/11364#issuecomment-611756669
 
 
   R: @dpmills @reuvenlax 
 



Issue Time Tracking
---

Worklog Id: (was: 419783)
Time Spent: 20m  (was: 10m)

> StreamingDataflowWorker stuck waiting for 
> org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
> --
>
> Key: BEAM-9651
> URL: https://issues.apache.org/jira/browse/BEAM-9651
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Operation ongoing in step  for at least 28h10m00s without 
> outputting or completing in state windmill-read at 
> sun.misc.Unsafe.park(Native Method) at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at 
> java.util.concurrent.Phaser$QNode.block(Phaser.java:1140) at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at 
> java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067) at 
> java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758) at 
> org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.<init>(GrpcWindmillServer.java:941)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506)
>  at 
> org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown
>  Source) at 
> org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:159)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.<init>(WindmillServerStub.java:158)
>  at 
> org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191)
>  at 
> org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199)
>  at 
> org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433)
>  at 
> org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328)
>  at 
> org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389)
>  at
> 
> Because the stream is started in a StreamPool synchronized block, all other 
> threads interacting with StreamPool to get or release streams end up blocking.
> It is unclear if the stream never became usable and thus blocked forever or 
> if there is a race with the use of the Phaser that causes the stuckness.
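
The locking problem described above can be illustrated with a minimal sketch. This is not Beam's StreamPool and not the actual patch: NarrowLockPool, its idle list, and the demo class are hypothetical names invented for illustration. The only point shown is that a potentially blocking creation step runs outside the lock, so a creation that gets stuck cannot block other threads that want to get or release a stream.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CountDownLatch;
import java.util.function.Supplier;

/** Hypothetical pool for illustration only; not Beam's StreamPool. */
final class NarrowLockPool<S> {
  // The factory may block, e.g. while waiting for a channel to become ready.
  private final Supplier<S> factory;
  private final Deque<S> idle = new ArrayDeque<>();

  NarrowLockPool(Supplier<S> factory) {
    this.factory = factory;
  }

  S acquire() {
    synchronized (this) {
      S cached = idle.pollFirst();
      if (cached != null) {
        return cached;
      }
    }
    // Slow path: create the stream WITHOUT holding the lock, so a stuck
    // creation cannot block release() or other acquire() calls.
    return factory.get();
  }

  synchronized void release(S stream) {
    idle.addFirst(stream);
  }

  synchronized int idleCount() {
    return idle.size();
  }
}

class NarrowLockPoolDemo {
  public static void main(String[] args) throws Exception {
    CountDownLatch creationStarted = new CountDownLatch(1);
    CountDownLatch allowCreation = new CountDownLatch(1);

    // Simulate a stream creation that hangs until we let it proceed.
    NarrowLockPool<String> pool = new NarrowLockPool<>(() -> {
      creationStarted.countDown();
      try {
        allowCreation.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      return "new-stream";
    });

    Thread creator = new Thread(pool::acquire);
    creator.start();
    creationStarted.await(); // creation is now "stuck" in another thread

    pool.release("recycled");      // does not block: the lock is free
    String got = pool.acquire();   // served from the idle list
    System.out.println(got);       // prints "recycled"

    allowCreation.countDown();
    creator.join();
  }
}
```

If the factory call were moved inside the synchronized block, the release() call in the demo would wait for the stuck creation, which is the behaviour the issue reports.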





[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419780=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419780
 ]

ASF GitHub Bot logged work on BEAM-9562:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:15
Start Date: 09/Apr/20 21:15
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #11314: [BEAM-9562] 
Send Timers over Data Channel as Elements
URL: https://github.com/apache/beam/pull/11314#discussion_r406481820
 
 

 ##
 File path: sdks/python/apache_beam/transforms/core.py
 ##
 @@ -1323,12 +1325,47 @@ def _pardo_fn_data(self):
 windowing = None
 return self.fn, self.args, self.kwargs, si_tags_and_types, windowing
 
-  def to_runner_api_parameter(self, context):
+  def _get_key_and_window_coder(self, named_inputs):
+if named_inputs is None or not self._signature.is_stateful_dofn():
+  return None, None
+main_input = list(set(named_inputs.keys()) - set(self.side_inputs))[0]
+input_pcoll = named_inputs[main_input]
+kv_type_hint = input_pcoll.element_type
+if kv_type_hint and kv_type_hint != typehints.Any:
+  coder = coders.registry.get_coder(kv_type_hint)
+  if not coder.is_kv_coder():
+raise ValueError(
+'Input elements to the transform %s with stateful DoFn must be '
+'key-value pairs.' % self)
+  key_coder = coder.key_coder()
+else:
+  key_coder = coders.registry.get_coder(typehints.Any)
+window_coder = input_pcoll.windowing.windowfn.get_window_coder()
+return key_coder, window_coder
+
+  def to_runner_api(self, context, **extra_kwargs):
 
 Review comment:
   We can delete this override since we pass `extra_kwargs` from `PTransform` 
now. 
 



Issue Time Tracking
---

Worklog Id: (was: 419780)
Time Spent: 19h 40m  (was: 19.5h)

> Remove timer from PCollection and treat timers as Elements 
> ---
>
> Key: BEAM-9562
> URL: https://issues.apache.org/jira/browse/BEAM-9562
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-harness, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 19h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419782=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419782
 ]

ASF GitHub Bot logged work on BEAM-9651:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:15
Start Date: 09/Apr/20 21:15
Worklog Time Spent: 10m 
  Work Description: scwhittle commented on pull request #11364: [BEAM-9651] 
Prevent StreamPool and stream initialization livelock
URL: https://github.com/apache/beam/pull/11364
 
 
   Reduce the StreamPool synchronized block so that it does not include stream
   creation, which includes sending initial messages. Additionally, remove it
   from being required to release holds. Some pipelines were observing get and
   release methods stuck on synchronization. This was due to a stuck stream
   creation waiting for the onReady callback, possibly because the callback
   was queued behind other callbacks on blocked gRPC threads. This was
   addressed in the DirectStreamObserver used for the Fn API, so we reuse it
   here.
   
   
   

[jira] [Work logged] (BEAM-9727) Auto populate required feature experiment flags for enable dataflow runner v2

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9727?focusedWorklogId=419777=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419777
 ]

ASF GitHub Bot logged work on BEAM-9727:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:12
Start Date: 09/Apr/20 21:12
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11355: [BEAM-9727] 
Automatically set required experiment flags for dataflow runner v2.
URL: https://github.com/apache/beam/pull/11355#issuecomment-611754766
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 419777)
Time Spent: 2h 40m  (was: 2.5h)

> Auto populate required feature experiment flags for enable dataflow runner v2
> -
>
> Key: BEAM-9727
> URL: https://issues.apache.org/jira/browse/BEAM-9727
> Project: Beam
>  Issue Type: Task
>  Components: runner-dataflow
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9346) TFRecordIO inefficient read from sideinput causing pipeline to be slow

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9346?focusedWorklogId=419775=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419775
 ]

ASF GitHub Bot logged work on BEAM-9346:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:11
Start Date: 09/Apr/20 21:11
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11122: 
[BEAM-9346] Improve the efficiency of TFRecordIO
URL: https://github.com/apache/beam/pull/11122#discussion_r406479773
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java
 ##
 @@ -410,13 +412,44 @@ private GatherResults(Coder resultCoder) {
   } else {
 // Pass results via a side input rather than reshuffle, because we 
need to get an empty
 // iterable to finalize if there are no results.
-return input
-.getPipeline()
-.apply(Reify.viewInGlobalWindow(input.apply(View.asList()), 
ListCoder.of(resultCoder)));
+return input.apply("ToList", Combine.globally(new 
ToListCombineFn<>()));
 
 Review comment:
   Created https://issues.apache.org/jira/browse/BEAM-9734 to make sure that 
this does not get into 2.21.0
 



Issue Time Tracking
---

Worklog Id: (was: 419775)
Time Spent: 3h 50m  (was: 3h 40m)

> TFRecordIO inefficient read from sideinput causing pipeline to be slow
> --
>
> Key: BEAM-9346
> URL: https://issues.apache.org/jira/browse/BEAM-9346
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ban Piao
>Assignee: Piotr Szuberski
>Priority: Major
>  Labels: dataflow, easyfix, performance
> Fix For: Not applicable
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> In TFRecordIO, Reify.viewInGlobalWindow(input.apply(View.asList()), 
> ListCoder.of(resultCoder)) is an inefficient way of reading a large set of 
> side inputs.
> The pipeline can be sped up significantly by combining the 
> PCollection<ResultT> into a single-element PCollection<List<ResultT>>.
> Sample code: 
>  
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java#L412
>  from
> ```
> return input
>     .getPipeline()
>     .apply(Reify.viewInGlobalWindow(input.apply(View.asList()), 
>         ListCoder.of(resultCoder)));
> ```
> to
> ```
> return input.apply("ToList", Combine.globally(new ToListCombineFn<>()));
> ```
> where ToListCombineFn is defined as
> ```
> public static class ToListCombineFn<ResultT>
>     extends CombineFn<ResultT, List<ResultT>, List<ResultT>> {
>   @Override
>   public List<ResultT> createAccumulator() {
>     return new ArrayList<>();
>   }
>   @Override
>   public List<ResultT> addInput(List<ResultT> mutableAccumulator, ResultT input) {
>     mutableAccumulator.add(input);
>     return mutableAccumulator;
>   }
>   @Override
>   public List<ResultT> mergeAccumulators(Iterable<List<ResultT>> accumulators) {
>     Iterator<List<ResultT>> iter = accumulators.iterator();
>     if (!iter.hasNext()) {
>       return new ArrayList<>();
>     }
>     List<ResultT> merged = iter.next();
>     while (iter.hasNext()) {
>       merged.addAll(iter.next());
>     }
>     return merged;
>   }
>   @Override
>   public List<ResultT> extractOutput(List<ResultT> accumulator) {
>     return accumulator;
>   }
> }
> ```





[jira] [Updated] (BEAM-9734) Revert https://github.com/apache/beam/pull/11122 which is a potential regression

2020-04-09 Thread Chamikara Madhusanka Jayalath (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chamikara Madhusanka Jayalath updated BEAM-9734:

Priority: Blocker  (was: Major)

> Revert https://github.com/apache/beam/pull/11122 which is a potential 
> regression
> 
>
> Key: BEAM-9734
> URL: https://issues.apache.org/jira/browse/BEAM-9734
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Blocker
> Fix For: 2.21.0
>
>
> This is potentially a regression for Dataflow. We should revert and 
> re-introduce as an optional change that can be controlled by a user option.





[jira] [Created] (BEAM-9734) Revert https://github.com/apache/beam/pull/11122 which is a potential regression

2020-04-09 Thread Chamikara Madhusanka Jayalath (Jira)
Chamikara Madhusanka Jayalath created BEAM-9734:
---

 Summary: Revert https://github.com/apache/beam/pull/11122 which is 
a potential regression
 Key: BEAM-9734
 URL: https://issues.apache.org/jira/browse/BEAM-9734
 Project: Beam
  Issue Type: Bug
  Components: sdk-java-core
Reporter: Chamikara Madhusanka Jayalath
Assignee: Chamikara Madhusanka Jayalath
 Fix For: 2.21.0


This is potentially a regression for Dataflow. We should revert and 
re-introduce as an optional change that can be controlled by a user option.





[jira] [Work logged] (BEAM-9346) TFRecordIO inefficient read from sideinput causing pipeline to be slow

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9346?focusedWorklogId=419768=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419768
 ]

ASF GitHub Bot logged work on BEAM-9346:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:08
Start Date: 09/Apr/20 21:08
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #11122: 
[BEAM-9346] Improve the efficiency of TFRecordIO
URL: https://github.com/apache/beam/pull/11122#discussion_r406478334
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java
 ##
 @@ -410,13 +412,44 @@ private GatherResults(Coder resultCoder) {
   } else {
 // Pass results via a side input rather than reshuffle, because we 
need to get an empty
 // iterable to finalize if there are no results.
-return input
-.getPipeline()
-.apply(Reify.viewInGlobalWindow(input.apply(View.asList()), 
ListCoder.of(resultCoder)));
+return input.apply("ToList", Combine.globally(new 
ToListCombineFn<>()));
 
 Review comment:
   What's the verdict here? I suggest we revert this and reintroduce it as an 
optional change, since this is a potential regression for the Dataflow runner.
 



Issue Time Tracking
---

Worklog Id: (was: 419768)
Time Spent: 3h 40m  (was: 3.5h)

> TFRecordIO inefficient read from sideinput causing pipeline to be slow
> --
>
> Key: BEAM-9346
> URL: https://issues.apache.org/jira/browse/BEAM-9346
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ban Piao
>Assignee: Piotr Szuberski
>Priority: Major
>  Labels: dataflow, easyfix, performance
> Fix For: Not applicable
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> In TFRecordIO, Reify.viewInGlobalWindow(input.apply(View.asList()), 
> ListCoder.of(resultCoder)) is an inefficient way of reading a large set of 
> side inputs.
> The pipeline can be sped up significantly by combining the 
> PCollection<ResultT> into a single-element PCollection<List<ResultT>>.
> Sample code: 
>  
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java#L412
>  from
> ```
> return input
>     .getPipeline()
>     .apply(Reify.viewInGlobalWindow(input.apply(View.asList()), 
>         ListCoder.of(resultCoder)));
> ```
> to
> ```
> return input.apply("ToList", Combine.globally(new ToListCombineFn<>()));
> ```
> where ToListCombineFn is defined as
> ```
> public static class ToListCombineFn<ResultT>
>     extends CombineFn<ResultT, List<ResultT>, List<ResultT>> {
>   @Override
>   public List<ResultT> createAccumulator() {
>     return new ArrayList<>();
>   }
>   @Override
>   public List<ResultT> addInput(List<ResultT> mutableAccumulator, ResultT input) {
>     mutableAccumulator.add(input);
>     return mutableAccumulator;
>   }
>   @Override
>   public List<ResultT> mergeAccumulators(Iterable<List<ResultT>> accumulators) {
>     Iterator<List<ResultT>> iter = accumulators.iterator();
>     if (!iter.hasNext()) {
>       return new ArrayList<>();
>     }
>     List<ResultT> merged = iter.next();
>     while (iter.hasNext()) {
>       merged.addAll(iter.next());
>     }
>     return merged;
>   }
>   @Override
>   public List<ResultT> extractOutput(List<ResultT> accumulator) {
>     return accumulator;
>   }
> }
> ```





[jira] [Updated] (BEAM-9136) Add LICENSES and NOTICES to docker images

2020-04-09 Thread Hannah Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hannah Jiang updated BEAM-9136:
---
Fix Version/s: (was: 2.22.0)
   2.21.0

> Add LICENSES and NOTICES to docker images
> -
>
> Key: BEAM-9136
> URL: https://issues.apache.org/jira/browse/BEAM-9136
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 19h 40m
>  Remaining Estimate: 0h
>
> Scan dependencies and add licenses and notices of the dependencies to SDK 
> docker images.





[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=419764=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419764
 ]

ASF GitHub Bot logged work on BEAM-9468:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:05
Start Date: 09/Apr/20 21:05
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11151: [BEAM-9468]  
Hl7v2 io
URL: https://github.com/apache/beam/pull/11151#discussion_r406476510
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -454,6 +455,7 @@ class BeamModulePlugin implements Plugin {
 google_api_services_clouddebugger   : 
"com.google.apis:google-api-services-clouddebugger:v2-rev20191003-$google_clients_version",
 google_api_services_cloudresourcemanager: 
"com.google.apis:google-api-services-cloudresourcemanager:v1-rev20191206-$google_clients_version",
 google_api_services_dataflow: 
"com.google.apis:google-api-services-dataflow:v1b3-rev20190927-$google_clients_version",
+google_api_services_healthcare  : 
"com.google.apis:google-api-services-healthcare:v1alpha2-rev20190901-$google_cloud_healthcare_version",
 
 Review comment:
   It doesn't have to be 1.30.3, it could be any later version. The purpose of 
them being the same is to ensure that GCP dependencies are compatible with each 
other so the healthcare dep doesn't break other GCP deps and vice versa.
   
   It sounds like you're going to need to update google_clients_version to a 
later version so it supports existing GCP deps and also the new dep you're adding.
 



Issue Time Tracking
---

Worklog Id: (was: 419764)
Time Spent: 18h 50m  (was: 18h 40m)

> Add Google Cloud Healthcare API IO Connectors
> -
>
> Key: BEAM-9468
> URL: https://issues.apache.org/jira/browse/BEAM-9468
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 18h 50m
>  Remaining Estimate: 0h
>
> Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud 
> Healthcare API|https://cloud.google.com/healthcare/docs/]
> HL7v2IO
> FHIRIO
> DICOM 





[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=419763=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419763
 ]

ASF GitHub Bot logged work on BEAM-9468:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:04
Start Date: 09/Apr/20 21:04
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #11151: [BEAM-9468]  
Hl7v2 io
URL: https://github.com/apache/beam/pull/11151#discussion_r406476510
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -454,6 +455,7 @@ class BeamModulePlugin implements Plugin {
 google_api_services_clouddebugger   : 
"com.google.apis:google-api-services-clouddebugger:v2-rev20191003-$google_clients_version",
 google_api_services_cloudresourcemanager: 
"com.google.apis:google-api-services-cloudresourcemanager:v1-rev20191206-$google_clients_version",
 google_api_services_dataflow: 
"com.google.apis:google-api-services-dataflow:v1b3-rev20190927-$google_clients_version",
+google_api_services_healthcare  : 
"com.google.apis:google-api-services-healthcare:v1alpha2-rev20190901-$google_cloud_healthcare_version",
 
 Review comment:
   It doesn't have to be 1.30.3, it could be any later version. The purpose of 
them being the same is to ensure that GCP dependencies are compatible with each 
other so the healthcare dep doesn't break other GCP deps and vice versa.
 



Issue Time Tracking
---

Worklog Id: (was: 419763)
Time Spent: 18h 40m  (was: 18.5h)

> Add Google Cloud Healthcare API IO Connectors
> -
>
> Key: BEAM-9468
> URL: https://issues.apache.org/jira/browse/BEAM-9468
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Jacob Ferriero
>Assignee: Jacob Ferriero
>Priority: Minor
>  Time Spent: 18h 40m
>  Remaining Estimate: 0h
>
> Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud 
> Healthcare API|https://cloud.google.com/healthcare/docs/]
> HL7v2IO
> FHIRIO
> DICOM 





[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419762=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419762
 ]

ASF GitHub Bot logged work on BEAM-9562:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:01
Start Date: 09/Apr/20 21:01
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11314: [BEAM-9562] 
Send Timers over Data Channel as Elements
URL: https://github.com/apache/beam/pull/11314#discussion_r406468331
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py
 ##
 @@ -536,7 +525,8 @@ def _run_stage(self,
 runner_execution_context,
 bundle_context_manager,
 data_input,
-data_output,
+data_output, {},
 
 Review comment:
   Put {} on its own line. (Surprised yapf didn't complain, or maybe you 
haven't run it yet.)
 



Issue Time Tracking
---

Worklog Id: (was: 419762)

> Remove timer from PCollection and treat timers as Elements 
> ---
>
> Key: BEAM-9562
> URL: https://issues.apache.org/jira/browse/BEAM-9562
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-harness, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 19.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419758=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419758
 ]

ASF GitHub Bot logged work on BEAM-9562:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:01
Start Date: 09/Apr/20 21:01
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11314: [BEAM-9562] 
Send Timers over Data Channel as Elements
URL: https://github.com/apache/beam/pull/11314#discussion_r406444781
 
 

 ##
 File path: sdks/python/apache_beam/transforms/core.py
 ##
 @@ -1323,12 +1325,47 @@ def _pardo_fn_data(self):
 windowing = None
 return self.fn, self.args, self.kwargs, si_tags_and_types, windowing
 
-  def to_runner_api_parameter(self, context):
+  def _get_key_and_window_coder(self, named_inputs):
+if named_inputs is None or not self._signature.is_stateful_dofn():
+  return None, None
+main_input = list(set(named_inputs.keys()) - set(self.side_inputs))[0]
+input_pcoll = named_inputs[main_input]
+kv_type_hint = input_pcoll.element_type
+if kv_type_hint and kv_type_hint != typehints.Any:
+  coder = coders.registry.get_coder(kv_type_hint)
+  if not coder.is_kv_coder():
+raise ValueError(
+'Input elements to the transform %s with stateful DoFn must be '
+'key-value pairs.' % self)
+  key_coder = coder.key_coder()
+else:
+  key_coder = coders.registry.get_coder(typehints.Any)
+window_coder = input_pcoll.windowing.windowfn.get_window_coder()
+return key_coder, window_coder
+
+  def to_runner_api(self, context, **extra_kwargs):
 
 Review comment:
   This code looks like it's copied from the superclass. Instead, just do
   
   ```
   def to_runner_api(self, context, named_inputs, **extra_kwargs):
     return super(ParDo, self).to_runner_api(
         context, named_inputs=named_inputs, **extra_kwargs)
   ```
 



Issue Time Tracking
---

Worklog Id: (was: 419758)
Time Spent: 19h 20m  (was: 19h 10m)

> Remove timer from PCollection and treat timers as Elements 
> ---
>
> Key: BEAM-9562
> URL: https://issues.apache.org/jira/browse/BEAM-9562
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-harness, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 19h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419759=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419759
 ]

ASF GitHub Bot logged work on BEAM-9562:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:01
Start Date: 09/Apr/20 21:01
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11314: [BEAM-9562] 
Send Timers over Data Channel as Elements
URL: https://github.com/apache/beam/pull/11314#discussion_r406473230
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py
 ##
 @@ -914,6 +926,17 @@ def process_bundle(self,
 
 split_manager = self._select_split_manager()
 if not split_manager:
+  # Send the fired timers if any.
+  for (transform_id, timer_family_id), timers in fired_timers.items():
+self._send_timers_to_worker(
+process_bundle_id, transform_id, timer_family_id, timers)
+
+  for transform_id, timer_family_id in (
+  set(expected_output_timers.keys()) - set(fired_timers.keys())):
+# Close the stream if there is no timers to be sent.
 
 Review comment:
   This is a subtle point. I might write something like "The worker waits for a 
logical timer stream to be closed for every possible timer, regardless of 
whether there are any timers to be sent."
   
   Maybe it'd be clearer to iterate over `expected_output_timers`, and send 
`fired_timers.get((transform_id, timer_family_id), [])`. 
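A minimal sketch of the suggested loop, assuming both mappings are keyed by `(transform_id, timer_family_id)` as in the diff above. The `send` callback and the sample data are hypothetical stand-ins, not the PR's actual code. Every expected timer stream gets exactly one call, with an empty list when nothing fired, so the stream can always be closed.

```python
def send_all_timer_streams(expected_output_timers, fired_timers, send):
  """Calls send() once per expected timer stream, with [] if nothing fired."""
  for transform_id, timer_family_id in expected_output_timers:
    timers = fired_timers.get((transform_id, timer_family_id), [])
    send(transform_id, timer_family_id, timers)


# Hypothetical data: two expected timer streams, only one of which fired.
expected = {('pardo_1', 'flush'): 'buf_a', ('pardo_1', 'gc'): 'buf_b'}
fired = {('pardo_1', 'flush'): ['timer_1']}
sent = []
send_all_timer_streams(
    expected, fired, lambda t, f, timers: sent.append((t, f, timers)))
```

With a single loop there is no separate set-difference pass, which is the point of the suggestion.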
 



Issue Time Tracking
---

Worklog Id: (was: 419759)
Time Spent: 19.5h  (was: 19h 20m)

> Remove timer from PCollection and treat timers as Elements 
> ---
>
> Key: BEAM-9562
> URL: https://issues.apache.org/jira/browse/BEAM-9562
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-harness, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 19.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419760=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419760
 ]

ASF GitHub Bot logged work on BEAM-9562:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:01
Start Date: 09/Apr/20 21:01
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11314: [BEAM-9562] 
Send Timers over Data Channel as Elements
URL: https://github.com/apache/beam/pull/11314#discussion_r406474463
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/portability/fn_api_runner/execution.py
 ##
 @@ -355,20 +364,41 @@ def _build_process_bundle_descriptor(self):
 items()),
 environments=dict(
 self.execution_context.pipeline_components.environments.items()),
-state_api_service_descriptor=self.state_api_service_descriptor())
+state_api_service_descriptor=self.state_api_service_descriptor(),
+timer_api_service_descriptor=self.data_api_service_descriptor())
 
   def get_input_coder_impl(self, transform_id):
 # type: (str) -> CoderImpl
 coder_id = beam_fn_api_pb2.RemoteGrpcPort.FromString(
 self.process_bundle_descriptor.transforms[transform_id].spec.payload
 ).coder_id
 assert coder_id
+return self.get_coder_impl(coder_id)
+
+  def _build_timer_coders_id_map(self):
+timer_coder_ids = {}
+for transform_id, transform_proto in (self._process_bundle_descriptor
+.transforms.items()):
+  if transform_proto.spec.urn == common_urns.primitives.PAR_DO.urn:
+pardo_payload = proto_utils.parse_Bytes(
+transform_proto.spec.payload, beam_runner_api_pb2.ParDoPayload)
+for id, timer_family_spec in pardo_payload.timer_family_specs.items():
+  timer_coder_ids[(transform_id, id)] = (
+  timer_family_spec.timer_family_coder_id)
+return timer_coder_ids
+
+  def get_coder_impl(self, coder_id):
 if coder_id in self.execution_context.safe_coders:
   return self.execution_context.pipeline_context.coders[
   self.execution_context.safe_coders[coder_id]].get_impl()
 else:
   return self.execution_context.pipeline_context.coders[coder_id].get_impl()
 
+  def get_timer_coder_impl(self, transform_id, timer_family_id):
+assert (transform_id, timer_family_id) in self._timer_coder_ids
 
 Review comment:
   The KeyError raised by the lookup below when the key is not present will be sufficient. 
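To illustrate the point, here is a small sketch with a hypothetical dictionary and helper name (not the PR's code). A plain dictionary lookup already fails with a `KeyError` that names the missing `(transform_id, timer_family_id)` pair, so a preceding `assert` adds little.

```python
timer_coder_ids = {('pardo_1', 'flush'): 'coder_7'}  # hypothetical contents


def get_timer_coder_id(transform_id, timer_family_id):
  # No assert needed: a missing key raises KeyError carrying the missing pair.
  return timer_coder_ids[(transform_id, timer_family_id)]


try:
  get_timer_coder_id('pardo_1', 'gc')
except KeyError as e:
  missing = e.args[0]  # the key that was looked up
```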
 



Issue Time Tracking
---

Worklog Id: (was: 419760)

> Remove timer from PCollection and treat timers as Elements 
> ---
>
> Key: BEAM-9562
> URL: https://issues.apache.org/jira/browse/BEAM-9562
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-harness, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 19.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419756=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419756
 ]

ASF GitHub Bot logged work on BEAM-9562:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:01
Start Date: 09/Apr/20 21:01
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11314: [BEAM-9562] 
Send Timers over Data Channel as Elements
URL: https://github.com/apache/beam/pull/11314#discussion_r406466215
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/portability/fn_api_runner/translations.py
 ##
 @@ -1321,80 +1321,18 @@ def remove_data_plane_ops(stages, pipeline_context):
   yield stage
 
 
-def inject_timer_pcollections(stages, pipeline_context):
+def setup_timer_mapping(stages, pipeline_context):
   # type: (Iterable[Stage], TransformContext) -> Iterator[Stage]
 
-  """Create PCollections for fired timers and to-be-set timers.
-
-  At execution time, fired timers and timers-to-set are represented as
-  PCollections that are managed by the runner.  This phase adds the
-  necissary collections, with their read and writes, to any stages using
-  timers.
+  """Set up a mapping of {transform_id: [timer_ids]} for each stage.
   """
   for stage in stages:
-for transform in list(stage.transforms):
+for transform in stage.transforms:
   if transform.spec.urn in PAR_DO_URNS:
 payload = proto_utils.parse_Bytes(
 transform.spec.payload, beam_runner_api_pb2.ParDoPayload)
-for tag, spec in payload.timer_family_specs.items():
-  if len(transform.inputs) > 1:
-raise NotImplementedError('Timers and side inputs.')
-  input_pcoll = pipeline_context.components.pcollections[next(
-  iter(transform.inputs.values()))]
-  # Create the appropriate coder for the timer PCollection.
-  key_coder_id = input_pcoll.coder_id
-  if (pipeline_context.components.coders[key_coder_id].spec.urn ==
-  common_urns.coders.KV.urn):
-key_coder_id = pipeline_context.components.coders[
-key_coder_id].component_coder_ids[0]
-  key_timer_coder_id = pipeline_context.add_or_get_coder_id(
-  beam_runner_api_pb2.Coder(
-  spec=beam_runner_api_pb2.FunctionSpec(
-  urn=common_urns.coders.KV.urn),
-  component_coder_ids=[
-  key_coder_id, spec.timer_family_coder_id
-  ]))
-  # Inject the read and write pcollections.
-  timer_read_pcoll = unique_name(
-  pipeline_context.components.pcollections,
-  '%s_timers_to_read_%s' % (transform.unique_name, tag))
-  timer_write_pcoll = unique_name(
-  pipeline_context.components.pcollections,
-  '%s_timers_to_write_%s' % (transform.unique_name, tag))
-  pipeline_context.components.pcollections[timer_read_pcoll].CopyFrom(
-  beam_runner_api_pb2.PCollection(
-  unique_name=timer_read_pcoll,
-  coder_id=key_timer_coder_id,
-  windowing_strategy_id=input_pcoll.windowing_strategy_id,
-  is_bounded=input_pcoll.is_bounded))
-  pipeline_context.components.pcollections[timer_write_pcoll].CopyFrom(
-  beam_runner_api_pb2.PCollection(
-  unique_name=timer_write_pcoll,
-  coder_id=key_timer_coder_id,
-  windowing_strategy_id=input_pcoll.windowing_strategy_id,
-  is_bounded=input_pcoll.is_bounded))
-  stage.transforms.append(
-  beam_runner_api_pb2.PTransform(
-  unique_name=timer_read_pcoll + '/Read',
-  outputs={'out': timer_read_pcoll},
-  spec=beam_runner_api_pb2.FunctionSpec(
-  urn=bundle_processor.DATA_INPUT_URN,
-  payload=create_buffer_id(timer_read_pcoll,
-   kind='timers'
-  stage.transforms.append(
-  beam_runner_api_pb2.PTransform(
-  unique_name=timer_write_pcoll + '/Write',
-  inputs={'in': timer_write_pcoll},
-  spec=beam_runner_api_pb2.FunctionSpec(
-  urn=bundle_processor.DATA_OUTPUT_URN,
-  payload=create_buffer_id(
-  timer_write_pcoll, kind='timers'
-  assert tag not in transform.inputs
-  transform.inputs[tag] = timer_read_pcoll
-  assert tag not in transform.outputs
-  transform.outputs[tag] = timer_write_pcoll
-  stage.timer_pcollections.append(
-  (timer_read_pcoll + '/Read', timer_write_pcoll))
+for timer_family_id in payload.timer_family_specs.keys():
+  

[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419755=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419755
 ]

ASF GitHub Bot logged work on BEAM-9562:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:01
Start Date: 09/Apr/20 21:01
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11314: [BEAM-9562] 
Send Timers over Data Channel as Elements
URL: https://github.com/apache/beam/pull/11314#discussion_r406465514
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/bundle_processor.py
 ##
 @@ -837,25 +869,59 @@ def process_bundle(self, instruction_id):
 op.execution_context = execution_context
 op.start()
 
-  # Inject inputs from data plane.
+  # Each data_channel is mapped to a list of expected inputs which includes
+  # both data input and timer input. The data input is identied by
+  # transform_id. The data input is identified by
+  # (transform_id, timer_family_id).
   data_channels = collections.defaultdict(
   list
   )  # type: DefaultDict[data_plane.GrpcClientDataChannel, List[str]]
+
+  # Inject data inputs from data plane.
 
 Review comment:
   This comment is a bit misleading, as the injection doesn't happen in this 
for loop. (Similarly with timers.)
 



Issue Time Tracking
---

Worklog Id: (was: 419755)
Time Spent: 19h 10m  (was: 19h)

> Remove timer from PCollection and treat timers as Elements 
> ---
>
> Key: BEAM-9562
> URL: https://issues.apache.org/jira/browse/BEAM-9562
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-harness, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 19h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419757=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419757
 ]

ASF GitHub Bot logged work on BEAM-9562:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:01
Start Date: 09/Apr/20 21:01
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11314: [BEAM-9562] 
Send Timers over Data Channel as Elements
URL: https://github.com/apache/beam/pull/11314#discussion_r406467481
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py
 ##
 @@ -987,8 +1019,10 @@ def __init__(
 
   def process_bundle(self,
  inputs,  # type: Mapping[str, PartitionableBuffer]
- expected_outputs  # type: DataOutput
-):
+ expected_outputs,  # type: DataOutput
+ fired_timers,  # type: Mapping[str, Mapping[str, PartitionableBuffer]]
 
 Review comment:
   For consistency, should this be a `Mapping[Tuple[str, str], 
PartitionableBuffer]`?
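As a rough illustration of the difference between the two shapes, the sketch below converts a nested mapping into one keyed by `(transform_id, timer_family_id)` tuples. `Buffer` is a stand-in for `PartitionableBuffer`, and `flatten_timers` is a hypothetical helper, not part of Beam.

```python
from typing import Dict, List, Mapping, Tuple

Buffer = List[bytes]  # stand-in for PartitionableBuffer (assumption)


def flatten_timers(nested):
  # type: (Mapping[str, Mapping[str, Buffer]]) -> Dict[Tuple[str, str], Buffer]
  """Re-keys {transform_id: {timer_family_id: buf}} by (transform_id, timer_family_id)."""
  return {
      (transform_id, timer_family_id): buf
      for transform_id, families in nested.items()
      for timer_family_id, buf in families.items()
  }


nested = {'pardo_1': {'flush': [b'a'], 'gc': []}}
flat = flatten_timers(nested)
```

The tuple-keyed form matches how the timer mappings are indexed elsewhere in this PR's diffs, which is the consistency the comment asks about.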
 



Issue Time Tracking
---

Worklog Id: (was: 419757)
Time Spent: 19h 20m  (was: 19h 10m)

> Remove timer from PCollection and treat timers as Elements 
> ---
>
> Key: BEAM-9562
> URL: https://issues.apache.org/jira/browse/BEAM-9562
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-harness, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 19h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419761=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419761
 ]

ASF GitHub Bot logged work on BEAM-9562:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:01
Start Date: 09/Apr/20 21:01
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11314: [BEAM-9562] 
Send Timers over Data Channel as Elements
URL: https://github.com/apache/beam/pull/11314#discussion_r406471091
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py
 ##
 @@ -896,7 +906,9 @@ def _generate_splits_for_testing(self,
 
   def process_bundle(self,
  inputs,  # type: Mapping[str, PartitionableBuffer]
- expected_outputs  # type: DataOutput
+ expected_outputs,  # type: DataOutput
+ fired_timers,  # type: Mapping[str, Mapping[str, PartitionableBuffer]]
 
 Review comment:
   Mapping[Tuple[str, str], PartitionableBuffer]?
 



Issue Time Tracking
---

Worklog Id: (was: 419761)

> Remove timer from PCollection and treat timers as Elements 
> ---
>
> Key: BEAM-9562
> URL: https://issues.apache.org/jira/browse/BEAM-9562
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-harness, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 19.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419754=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419754
 ]

ASF GitHub Bot logged work on BEAM-9562:


Author: ASF GitHub Bot
Created on: 09/Apr/20 21:01
Start Date: 09/Apr/20 21:01
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #11314: [BEAM-9562] 
Send Timers over Data Channel as Elements
URL: https://github.com/apache/beam/pull/11314#discussion_r406444656
 
 

 ##
 File path: sdks/python/apache_beam/transforms/core.py
 ##
 @@ -1272,6 +1272,8 @@ def expand(self, pcoll):
 key_coder = coder.key_coder()
   else:
 key_coder = coders.registry.get_coder(typehints.Any)
+  self.window_coder = pcoll.windowing.windowfn.get_window_coder()
 
 Review comment:
   Are these still used?
 



Issue Time Tracking
---

Worklog Id: (was: 419754)
Time Spent: 19h  (was: 18h 50m)

> Remove timer from PCollection and treat timers as Elements 
> ---
>
> Key: BEAM-9562
> URL: https://issues.apache.org/jira/browse/BEAM-9562
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-harness, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
> Fix For: 2.21.0
>
>  Time Spent: 19h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-1589) Add OnWindowExpiration method to Stateful DoFn

2020-04-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1589?focusedWorklogId=419753=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419753
 ]

ASF GitHub Bot logged work on BEAM-1589:


Author: ASF GitHub Bot
Created on: 09/Apr/20 20:58
Start Date: 09/Apr/20 20:58
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #11350: [BEAM-1589] Added 
@onWindowExpiration annotation.
URL: https://github.com/apache/beam/pull/11350#issuecomment-611749077
 
 
   Any discussion about adding support for @OnWindowExpiration for portable 
runners?
 



Issue Time Tracking
---

Worklog Id: (was: 419753)
Time Spent: 2.5h  (was: 2h 20m)

> Add OnWindowExpiration method to Stateful DoFn
> --
>
> Key: BEAM-1589
> URL: https://issues.apache.org/jira/browse/BEAM-1589
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-core, sdk-java-core
>Reporter: Jingsong Lee
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> See BEAM-1517
> This allows the user to do some work before the state's garbage collection.
> It seems kind of annoying, but on the other hand forgetting to set a final 
> timer to flush state is probably data loss most of the time.
> FlinkRunner does this work very simply, but other runners, such as 
> DirectRunner, need to traverse all the states to do this, and maybe it's a 
> little hard.




