[jira] [Commented] (BEAM-6860) WriteToText crash with "GlobalWindow -> ._IntervalWindowBase"
[ https://issues.apache.org/jira/browse/BEAM-6860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080225#comment-17080225 ] Josh Peng commented on BEAM-6860: - [~bhulette] in the example code I was just having it come from a text file, but in my more real-world scenario I am reading from PubSub, that is why I wanted to window it before WriteToText. It errors in both scenarios regardless of initial input source when doing the WriteToText with a fixed window. Going into global window from streaming source becomes unwieldy. > WriteToText crash with "GlobalWindow -> ._IntervalWindowBase" > - > > Key: BEAM-6860 > URL: https://issues.apache.org/jira/browse/BEAM-6860 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.11.0 > Environment: macOS, DirectRunner, python 2.7.15 via > pyenv/pyenv-virtualenv >Reporter: Henrik >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Labels: newbie > Fix For: 2.16.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Main error: > > Cannot convert GlobalWindow to > > apache_beam.utils.windowed_value._IntervalWindowBase > This is very hard for me to debug. Doing a DoPar call before, printing the > input, gives me just what I want; so the lines of data to serialise are > "alright"; just JSON strings, in fact. > Stacktrace: > {code:java} > Traceback (most recent call last): > File "./okr_end_ride.py", line 254, in > run() > File "./okr_end_ride.py", line 250, in run > run_pipeline(pipeline_options, known_args) > File "./okr_end_ride.py", line 198, in run_pipeline > | 'write_all' >> WriteToText(known_args.output, > file_name_suffix=".txt") > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py", > line 426, in __exit__ > self.run().wait_until_finish() > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py", > line 406, in run > self._options).run(False) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/pipeline.py", > line 419, in run > return self.runner.run_pipeline(self, self._options) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/direct/direct_runner.py", > line 132, in run_pipeline > return runner.run_pipeline(pipeline, options) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 275, in run_pipeline > default_environment=self._default_environment)) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 278, in run_via_runner_api > return self.run_stages(*self.create_stages(pipeline_proto)) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 354, in run_stages > stage_context.safe_coders) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 509, in run_stage > data_input, data_output) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 1206, in process_bundle > result_future = self._controller.control_handler.push(process_bundle) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/portability/fn_api_runner.py", > line 821, in push > response = self.worker.do_instruction(request) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 265, in do_instruction > request.instruction_id) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py", > line 281, in process_bundle > delayed_applications = bundle_processor.process_bundle(instruction_id) > File > "/Users/h/.pyenv/versions/2.7.15/envs/log-analytics/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 552, in process_bundle > op.finish() > File "apache_beam/runners/worker/operations.py", line 549, in > apache_beam.runners.worker.operations.DoOperation.finish > File "apache_beam/runners/worker/operations.py", line 550, in > apache_beam.runners.worker.operations.DoOperation.finish > File "apache_beam/runners/worker/operations.py", line 551, in >
[jira] [Work logged] (BEAM-9703) Create py validations runner test for metrics
[ https://issues.apache.org/jira/browse/BEAM-9703?focusedWorklogId=420005=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-420005 ] ASF GitHub Bot logged work on BEAM-9703: Author: ASF GitHub Bot Created on: 10/Apr/20 04:22 Start Date: 10/Apr/20 04:22 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #11319: [BEAM-9703]Include user distritribution into metric-dedicated validate runner test. URL: https://github.com/apache/beam/pull/11319#issuecomment-611871116 pull from head and rebased (no changes to previous commits) please try again This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 420005) Time Spent: 2.5h (was: 2h 20m) > Create py validations runner test for metrics > - > > Key: BEAM-9703 > URL: https://issues.apache.org/jira/browse/BEAM-9703 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Ruoyun Huang >Assignee: Ruoyun Huang >Priority: Minor > Time Spent: 2.5h > Remaining Estimate: 0h > > Some of the metrics are not covered by dedicated validation runner test. > Would like create these if needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9703) Create py validations runner test for metrics
[ https://issues.apache.org/jira/browse/BEAM-9703?focusedWorklogId=419998=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419998 ] ASF GitHub Bot logged work on BEAM-9703: Author: ASF GitHub Bot Created on: 10/Apr/20 04:14 Start Date: 10/Apr/20 04:14 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #11319: [BEAM-9703]Include user distritribution into metric-dedicated validate runner test. URL: https://github.com/apache/beam/pull/11319#issuecomment-611868777 gradlew lint and test both work fine on my local repository. test command: python setup.py nosetests --tests apache_beam.metrics.metric_test:MetricsTest.test_user_counter_using_pardo And the PythonLint by github, is from a different file(not touched by this PR): 00:42:57 apache_beam/transforms/environments.py:255:12: F821 undefined name 'from_container_image' 00:42:57 return from_container_image( 00:42:57^ 00:42:57 1 F821 undefined name 'from_container_image' Not sure why. Looking. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419998) Time Spent: 2h 20m (was: 2h 10m) > Create py validations runner test for metrics > - > > Key: BEAM-9703 > URL: https://issues.apache.org/jira/browse/BEAM-9703 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Ruoyun Huang >Assignee: Ruoyun Huang >Priority: Minor > Time Spent: 2h 20m > Remaining Estimate: 0h > > Some of the metrics are not covered by dedicated validation runner test. > Would like create these if needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9703) Create py validations runner test for metrics
[ https://issues.apache.org/jira/browse/BEAM-9703?focusedWorklogId=419996=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419996 ] ASF GitHub Bot logged work on BEAM-9703: Author: ASF GitHub Bot Created on: 10/Apr/20 04:11 Start Date: 10/Apr/20 04:11 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #11319: [BEAM-9703]Include user distritribution into metric-dedicated validate runner test. URL: https://github.com/apache/beam/pull/11319#issuecomment-611868777 lint and test both work fine on my local repository. And the PythonLint by github reports from a different file: 00:42:57 apache_beam/transforms/environments.py:255:12: F821 undefined name 'from_container_image' 00:42:57 return from_container_image( 00:42:57^ 00:42:57 1 F821 undefined name 'from_container_image' Not sure why. Looking. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419996) Time Spent: 2h 10m (was: 2h) > Create py validations runner test for metrics > - > > Key: BEAM-9703 > URL: https://issues.apache.org/jira/browse/BEAM-9703 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Ruoyun Huang >Assignee: Ruoyun Huang >Priority: Minor > Time Spent: 2h 10m > Remaining Estimate: 0h > > Some of the metrics are not covered by dedicated validation runner test. > Would like create these if needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9678) Introduction Kata | Go SDK Code Katas
[ https://issues.apache.org/jira/browse/BEAM-9678?focusedWorklogId=419987=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419987 ] ASF GitHub Bot logged work on BEAM-9678: Author: ASF GitHub Bot Created on: 10/Apr/20 03:18 Start Date: 10/Apr/20 03:18 Worklog Time Spent: 10m Work Description: damondouglas commented on issue #11340: [BEAM-9678] Create Go SDK introduction kata URL: https://github.com/apache/beam/pull/11340#issuecomment-611857902 @lostluck @henryken I've made the recommended changes. Thank you for reviewing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419987) Time Spent: 1.5h (was: 1h 20m) > Introduction Kata | Go SDK Code Katas > - > > Key: BEAM-9678 > URL: https://issues.apache.org/jira/browse/BEAM-9678 > Project: Beam > Issue Type: Sub-task > Components: katas, sdk-go >Reporter: Damon Douglas >Assignee: Damon Douglas >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > An Introduction kata patterns after > [https://github.com/apache/beam/tree/master/learning/katas/java/Introduction] > where the take away is an individual's ability to start an Apache Beam > pipeline using the Golang SDK. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9678) Introduction Kata | Go SDK Code Katas
[ https://issues.apache.org/jira/browse/BEAM-9678?focusedWorklogId=419985=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419985 ] ASF GitHub Bot logged work on BEAM-9678: Author: ASF GitHub Bot Created on: 10/Apr/20 03:17 Start Date: 10/Apr/20 03:17 Worklog Time Spent: 10m Work Description: damondouglas commented on pull request #11340: [BEAM-9678] Create Go SDK introduction kata URL: https://github.com/apache/beam/pull/11340#discussion_r406584247 ## File path: learning/katas/go/Introduction/task2/test/task_test.go ## @@ -0,0 +1,44 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +//http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package test + +import ( + "io/ioutil" + "task" + "testing" +) + +func TestTask(t *testing.T) { + err := task.Task() + if err != nil { + t.Error(err) + } + data, err := ioutil.ReadFile(task.OutputFile) + if err != nil { + t.Error(err) + } + + want := "Hello Beam\n" + got := string(data) + if want != got { + t.Errorf("want: %s got: %s", want, got) + } + + err = ioutil.WriteFile(task.OutputFile, []byte{}, 0644) Review comment: Done :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419985) Time Spent: 1h 10m (was: 1h) > Introduction Kata | Go SDK Code Katas > - > > Key: BEAM-9678 > URL: https://issues.apache.org/jira/browse/BEAM-9678 > Project: Beam > Issue Type: Sub-task > Components: katas, sdk-go >Reporter: Damon Douglas >Assignee: Damon Douglas >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > An Introduction kata patterns after > [https://github.com/apache/beam/tree/master/learning/katas/java/Introduction] > where the take away is an individual's ability to start an Apache Beam > pipeline using the Golang SDK. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9678) Introduction Kata | Go SDK Code Katas
[ https://issues.apache.org/jira/browse/BEAM-9678?focusedWorklogId=419986=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419986 ] ASF GitHub Bot logged work on BEAM-9678: Author: ASF GitHub Bot Created on: 10/Apr/20 03:17 Start Date: 10/Apr/20 03:17 Worklog Time Spent: 10m Work Description: damondouglas commented on pull request #11340: [BEAM-9678] Create Go SDK introduction kata URL: https://github.com/apache/beam/pull/11340#discussion_r406584290 ## File path: learning/katas/go/Introduction/task1/task-info.yaml ## @@ -0,0 +1,39 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + +type: edu +custom_name: Setup +files: +- name: test/task_test.go + visible: false +- name: task.go + visible: true + placeholders: + - offset: 961 +length: 26 +placeholder_text: 'TODO: create a new pipeline' + - offset: 1012 +length: 34 +placeholder_text: 'TODO: execute the pipeline' +- name: go.mod + visible: false +- name: go.sum + visible: true Review comment: Fixed :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419986) Time Spent: 1h 20m (was: 1h 10m) > Introduction Kata | Go SDK Code Katas > - > > Key: BEAM-9678 > URL: https://issues.apache.org/jira/browse/BEAM-9678 > Project: Beam > Issue Type: Sub-task > Components: katas, sdk-go >Reporter: Damon Douglas >Assignee: Damon Douglas >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > An Introduction kata patterns after > [https://github.com/apache/beam/tree/master/learning/katas/java/Introduction] > where the take away is an individual's ability to start an Apache Beam > pipeline using the Golang SDK. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9678) Introduction Kata | Go SDK Code Katas
[ https://issues.apache.org/jira/browse/BEAM-9678?focusedWorklogId=419983=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419983 ] ASF GitHub Bot logged work on BEAM-9678: Author: ASF GitHub Bot Created on: 10/Apr/20 03:14 Start Date: 10/Apr/20 03:14 Worklog Time Spent: 10m Work Description: damondouglas commented on pull request #11340: [BEAM-9678] Create Go SDK introduction kata URL: https://github.com/apache/beam/pull/11340#discussion_r406583732 ## File path: learning/katas/go/Introduction/lesson-info.yaml ## @@ -0,0 +1,23 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + +content: Review comment: Thank you for reviewing. I reorganized the Introduction section to pattern after the structure in the Java katas. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419983) Time Spent: 1h (was: 50m) > Introduction Kata | Go SDK Code Katas > - > > Key: BEAM-9678 > URL: https://issues.apache.org/jira/browse/BEAM-9678 > Project: Beam > Issue Type: Sub-task > Components: katas, sdk-go >Reporter: Damon Douglas >Assignee: Damon Douglas >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > An Introduction kata patterns after > [https://github.com/apache/beam/tree/master/learning/katas/java/Introduction] > where the take away is an individual's ability to start an Apache Beam > pipeline using the Golang SDK. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
[ https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419958=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419958 ] ASF GitHub Bot logged work on BEAM-9651: Author: ASF GitHub Bot Created on: 10/Apr/20 02:05 Start Date: 10/Apr/20 02:05 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11364: [BEAM-9651] Prevent StreamPool and stream initialization livelock URL: https://github.com/apache/beam/pull/11364#issuecomment-611840338 They would run automatically as a cron PostCommit or you can manually launch them from the Jenkins UI. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419958) Time Spent: 3h (was: 2h 50m) > StreamingDataflowWorker stuck waiting for > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext > -- > > Key: BEAM-9651 > URL: https://issues.apache.org/jira/browse/BEAM-9651 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Sam Whittle >Assignee: Sam Whittle >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > > Operation ongoing in step for at least 28h10m00s without > outputting or completing in state windmill-read at > sun.misc.Unsafe.park(Native Method) at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at > java.util.concurrent.Phaser$QNode.block(Phaser.java:1140) at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at > java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067) at > java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758) at > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.(GrpcWindmillServer.java:941) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown > Source) at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:159) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:158) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328) > at > org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389) > at > > Because the stream is started in a StreamPool synchronized block, all other > threads interacting with StreamPool to get or release streams end up blocking. > It is unclear if the stream never became usable and thus blocked forever or > if there is a race with the use of the Phaser that causes the stuckness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419957=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419957 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 10/Apr/20 02:01 Start Date: 10/Apr/20 02:01 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11373: [BEAM-9562] Update Element.timer to Element.timers URL: https://github.com/apache/beam/pull/11373#discussion_r406567717 ## File path: model/fn-execution/src/main/proto/beam_fn_api.proto ## @@ -516,7 +516,7 @@ message Elements { repeated Data data = 1; // (Optional) A list of timer byte streams. - repeated Timer timer = 2; + repeated Timer timers = 2; Review comment: @robertwb did you want to rename this field or the proto message Timer -> Timers or both? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419957) Time Spent: 21h (was: 20h 50m) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-java-harness, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 21h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419956=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419956 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 10/Apr/20 02:00 Start Date: 10/Apr/20 02:00 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11373: [BEAM-9562] Update Element.timer to Element.timers URL: https://github.com/apache/beam/pull/11373#issuecomment-611839103 please regenerate the go protos This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419956) Time Spent: 20h 50m (was: 20h 40m) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-java-harness, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 20h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
[ https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419953=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419953 ] ASF GitHub Bot logged work on BEAM-9651: Author: ASF GitHub Bot Created on: 10/Apr/20 01:59 Start Date: 10/Apr/20 01:59 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11368: [BEAM-9651] Prevent StreamPool and stream initialization livelock URL: https://github.com/apache/beam/pull/11368#issuecomment-611838764 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419953) Time Spent: 2h 50m (was: 2h 40m) > StreamingDataflowWorker stuck waiting for > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext > -- > > Key: BEAM-9651 > URL: https://issues.apache.org/jira/browse/BEAM-9651 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Sam Whittle >Assignee: Sam Whittle >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > Operation ongoing in step for at least 28h10m00s without > outputting or completing in state windmill-read at > sun.misc.Unsafe.park(Native Method) at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at > java.util.concurrent.Phaser$QNode.block(Phaser.java:1140) at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at > java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067) at > java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758) at > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.(GrpcWindmillServer.java:941) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown > Source) at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:159) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:158) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328) > at > org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389) > at > > Because the stream is started in a StreamPool synchronized block, all other > threads interacting with StreamPool to get or release streams end up blocking. > It is unclear if the stream never became usable and thus blocked forever or > if there is a race with the use of the Phaser that causes the stuckness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9721) Dataflow-based Jenkins jobs are failing due to lack of --region option
[ https://issues.apache.org/jira/browse/BEAM-9721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17080150#comment-17080150 ] Daniel Oliveira commented on BEAM-9721: --- Looks like this is also affecting beam_PerformanceTests_WordCountIT jobs ([example|https://builds.apache.org/job/beam_PerformanceTests_WordCountIT_Py27/1314/]). > Dataflow-based Jenkins jobs are failing due to lack of --region option > -- > > Key: BEAM-9721 > URL: https://issues.apache.org/jira/browse/BEAM-9721 > Project: Beam > Issue Type: Bug > Components: test-failures, testing >Reporter: Michał Walenia >Assignee: Michał Walenia >Priority: Major > Time Spent: 5h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9674) "Selected fields list too long" error when calling tables.get in BigQueryStorageTableSource
[ https://issues.apache.org/jira/browse/BEAM-9674?focusedWorklogId=419927=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419927 ] ASF GitHub Bot logged work on BEAM-9674: Author: ASF GitHub Bot Created on: 10/Apr/20 00:56 Start Date: 10/Apr/20 00:56 Worklog Time Spent: 10m Work Description: aaltay commented on issue #11292: [BEAM-9674] Don't specify selected fields when fetching BigQuery table size URL: https://github.com/apache/beam/pull/11292#issuecomment-611822669 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419927) Time Spent: 2h 20m (was: 2h 10m) > "Selected fields list too long" error when calling tables.get in > BigQueryStorageTableSource > --- > > Key: BEAM-9674 > URL: https://issues.apache.org/jira/browse/BEAM-9674 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.19.0 >Reporter: Kenneth Jung >Assignee: Kenneth Jung >Priority: Minor > Time Spent: 2h 20m > Remaining Estimate: 0h > > Customers experience errors similar to the following: > Caused by: > com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad > Request { "code" : 400, "errors" : [ > { "domain" : "global", "message" : "Selected fields too long: must > be less than 16384 characters.", "reason" : "invalid" } > ], "message" : "Selected fields too long: must be less than 16384 > characters.", "status" : "INVALID_ARGUMENT" } > com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146) > > com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113) > > com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40) > > com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321) > com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1097) > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419) > > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352) > > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469) > > org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.executeWithRetries(BigQueryServicesImpl.java:938) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9674) "Selected fields list too long" error when calling tables.get in BigQueryStorageTableSource
[ https://issues.apache.org/jira/browse/BEAM-9674?focusedWorklogId=419926=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419926 ] ASF GitHub Bot logged work on BEAM-9674: Author: ASF GitHub Bot Created on: 10/Apr/20 00:55 Start Date: 10/Apr/20 00:55 Worklog Time Spent: 10m Work Description: aaltay commented on issue #11292: [BEAM-9674] Don't specify selected fields when fetching BigQuery table size URL: https://github.com/apache/beam/pull/11292#issuecomment-611822518 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419926) Time Spent: 2h 10m (was: 2h) > "Selected fields list too long" error when calling tables.get in > BigQueryStorageTableSource > --- > > Key: BEAM-9674 > URL: https://issues.apache.org/jira/browse/BEAM-9674 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.19.0 >Reporter: Kenneth Jung >Assignee: Kenneth Jung >Priority: Minor > Time Spent: 2h 10m > Remaining Estimate: 0h > > Customers experience errors similar to the following: > Caused by: > com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad > Request { "code" : 400, "errors" : [ > { "domain" : "global", "message" : "Selected fields too long: must > be less than 16384 characters.", "reason" : "invalid" } > ], "message" : "Selected fields too long: must be less than 16384 > characters.", "status" : "INVALID_ARGUMENT" } > com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146) > > com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113) > > com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40) > > com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321) > com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1097) > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419) > > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352) > > com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469) > > org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl.executeWithRetries(BigQueryServicesImpl.java:938) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-5504) PubsubAvroTable
[ https://issues.apache.org/jira/browse/BEAM-5504?focusedWorklogId=419922=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419922 ] ASF GitHub Bot logged work on BEAM-5504: Author: ASF GitHub Bot Created on: 10/Apr/20 00:51 Start Date: 10/Apr/20 00:51 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10487: [BEAM-5504] Introduce PubsubAvroTable URL: https://github.com/apache/beam/pull/10487#issuecomment-611821536 Is this PR still active? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419922) Time Spent: 4.5h (was: 4h 20m) > PubsubAvroTable > --- > > Key: BEAM-5504 > URL: https://issues.apache.org/jira/browse/BEAM-5504 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Rui Wang >Assignee: Jing Chen >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419923=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419923 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 10/Apr/20 00:51 Start Date: 10/Apr/20 00:51 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #11373: [BEAM-9562] Update Element.timer to Element.timers URL: https://github.com/apache/beam/pull/11373 **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=419920=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419920 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 10/Apr/20 00:49 Start Date: 10/Apr/20 00:49 Worklog Time Spent: 10m Work Description: jaketf commented on pull request #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#discussion_r406551322 ## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy ## @@ -454,6 +455,7 @@ class BeamModulePlugin implements Plugin { google_api_services_clouddebugger : "com.google.apis:google-api-services-clouddebugger:v2-rev20191003-$google_clients_version", google_api_services_cloudresourcemanager: "com.google.apis:google-api-services-cloudresourcemanager:v1-rev20191206-$google_clients_version", google_api_services_dataflow: "com.google.apis:google-api-services-dataflow:v1b3-rev20190927-$google_clients_version", +google_api_services_healthcare : "com.google.apis:google-api-services-healthcare:v1alpha2-rev20190901-$google_cloud_healthcare_version", Review comment: I'm rather new to these generated clients and beam. Is there any process this community has followed in the past for finding a suitable version in situations like this? Seems like a lot for this PR to make this rather global upgrade. Naturally if I just flip google_clients_version to 1.30.9 it breaks a lot: ``` Could not determine the dependencies of task ':sdks:java:harness:shadowJar'. > Could not resolve all dependencies for configuration ':sdks:java:harness:runtimeClasspath'. > Could not find com.google.apis:google-api-services-cloudresourcemanager:v1-rev20191206-1.30.9. Required by: project :sdks:java:harness > project :sdks:java:extensions:google-cloud-platform-core > Could not find com.google.apis:google-api-services-storage:v1-rev20191011-1.30.9. Required by: project :sdks:java:harness > project :sdks:java:extensions:google-cloud-platform-core > Could not find com.google.oauth-client:google-oauth-client:1.30.9. > Could not find com.google.apis:google-api-services-storage:v1-rev20191011-1.30.9. Required by: project :sdks:java:harness > project :sdks:java:extensions:google-cloud-platform-core > com.google.cloud.bigdataoss:gcsio:2.0.0 project :sdks:java:harness > project :sdks:java:extensions:google-cloud-platform-core > com.google.cloud.bigdataoss:util:2.0.0 > Could not find com.google.oauth-client:google-oauth-client:1.30.9. Required by: project :sdks:java:harness > project :sdks:java:extensions:google-cloud-platform-core > com.google.cloud.bigdataoss:gcsio:2.0.0 project :sdks:java:harness > project :sdks:java:extensions:google-cloud-platform-core > com.google.cloud.bigdataoss:util:2.0.0 > Could not find com.google.oauth-client:google-oauth-client-java6:1.30.9. Required by: project :sdks:java:harness > project :sdks:java:extensions:google-cloud-platform-core > com.google.cloud.bigdataoss:gcsio:2.0.0 project :sdks:java:harness > project :sdks:java:extensions:google-cloud-platform-core > com.google.cloud.bigdataoss:util:2.0.0 > Could not find com.google.oauth-client:google-oauth-client-java6:1.30.9. Required by: project :sdks:java:harness > project :sdks:java:extensions:google-cloud-platform-core > com.google.cloud.bigdataoss:gcsio:2.0.0 > com.google.api-client:google-api-client-java6:1.30.9 * ``` seems like our culprits are - [ ] `google-oauth-client` - [ ] `google-oauth-client-java6` - [ ] `google-api-services-storage` If I go in and try to massage the revision numbers to make this work how will I ensure that all those reliant on those deps are still working? Is that covered in pre/post commit ITs? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419920) Time Spent: 19h (was: 18h 50m) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >
[jira] [Work logged] (BEAM-8910) Use AVRO instead of JSON in BigQuery bounded source.
[ https://issues.apache.org/jira/browse/BEAM-8910?focusedWorklogId=419916=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419916 ] ASF GitHub Bot logged work on BEAM-8910: Author: ASF GitHub Bot Created on: 10/Apr/20 00:30 Start Date: 10/Apr/20 00:30 Worklog Time Spent: 10m Work Description: pabloem commented on issue #11086: [BEAM-8910] Make custom BQ source read from Avro URL: https://github.com/apache/beam/pull/11086#issuecomment-611816596 Run Python 3.5 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419916) Time Spent: 5.5h (was: 5h 20m) > Use AVRO instead of JSON in BigQuery bounded source. > > > Key: BEAM-8910 > URL: https://issues.apache.org/jira/browse/BEAM-8910 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Kamil Wasilewski >Assignee: Pablo Estrada >Priority: Minor > Time Spent: 5.5h > Remaining Estimate: 0h > > The proposed BigQuery bounded source in Python SDK (see PR: > [https://github.com/apache/beam/pull/9772)] uses a BigQuery export job to > take a snapshot of the table and read from each produced JSON file. A > performance improvement can be gain by switching to AVRO instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9443) support direct_num_workers=0
[ https://issues.apache.org/jira/browse/BEAM-9443?focusedWorklogId=419915=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419915 ] ASF GitHub Bot logged work on BEAM-9443: Author: ASF GitHub Bot Created on: 10/Apr/20 00:29 Start Date: 10/Apr/20 00:29 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #11372: [BEAM-9443] support direct_num_workers=0 URL: https://github.com/apache/beam/pull/11372#discussion_r406546390 ## File path: CHANGES.md ## @@ -67,6 +67,7 @@ [Ensuring Python Type Safety](https://beam.apache.org/documentation/sdks/python-type-safety/) and an upcoming [blog post](https://beam.apache.org/blog/python/typing/2020/03/06/python-typing.html). +* --direct_num_workers=0 is supported for FnApi. It will set the number of threads/subprocesses to number of cores of the machine executing the pipeline([BEAM-9443](https://issues.apache.org/jira/browse/BEAM-9443)). Review comment: It is added to 2.21.0, please let me know if the branch is already cut, I can move it to 2.22.0 if needed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419915) Time Spent: 20m (was: 10m) > support direct_num_workers=0 > - > > Key: BEAM-9443 > URL: https://issues.apache.org/jira/browse/BEAM-9443 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.22.0 > > Time Spent: 20m > Remaining Estimate: 0h > > when direct_num_workers=0, set it to number of cores. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9443) support direct_num_workers=0
[ https://issues.apache.org/jira/browse/BEAM-9443?focusedWorklogId=419912=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419912 ] ASF GitHub Bot logged work on BEAM-9443: Author: ASF GitHub Bot Created on: 10/Apr/20 00:27 Start Date: 10/Apr/20 00:27 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #11372: [BEAM-9443] support direct_num_workers=0 URL: https://github.com/apache/beam/pull/11372 R: @ibzib Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python |
[jira] [Work started] (BEAM-9443) support direct_num_workers=0
[ https://issues.apache.org/jira/browse/BEAM-9443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-9443 started by Hannah Jiang. -- > support direct_num_workers=0 > - > > Key: BEAM-9443 > URL: https://issues.apache.org/jira/browse/BEAM-9443 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.22.0 > > > when direct_num_workers=0, set it to number of cores. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419907=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419907 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 10/Apr/20 00:19 Start Date: 10/Apr/20 00:19 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11314: [BEAM-9562] Send Timers over Data Channel as Elements URL: https://github.com/apache/beam/pull/11314 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419907) Time Spent: 20.5h (was: 20h 20m) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-java-harness, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 20.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9703) Create py validations runner test for metrics
[ https://issues.apache.org/jira/browse/BEAM-9703?focusedWorklogId=419905=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419905 ] ASF GitHub Bot logged work on BEAM-9703: Author: ASF GitHub Bot Created on: 10/Apr/20 00:15 Start Date: 10/Apr/20 00:15 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11319: [BEAM-9703]Include user distritribution into metric-dedicated validate runner test. URL: https://github.com/apache/beam/pull/11319#issuecomment-611812812 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419905) Time Spent: 2h (was: 1h 50m) > Create py validations runner test for metrics > - > > Key: BEAM-9703 > URL: https://issues.apache.org/jira/browse/BEAM-9703 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Ruoyun Huang >Assignee: Ruoyun Huang >Priority: Minor > Time Spent: 2h > Remaining Estimate: 0h > > Some of the metrics are not covered by dedicated validation runner test. > Would like create these if needed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9727) Auto populate required feature experiment flags for enable dataflow runner v2
[ https://issues.apache.org/jira/browse/BEAM-9727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Cwik resolved BEAM-9727. - Resolution: Fixed > Auto populate required feature experiment flags for enable dataflow runner v2 > - > > Key: BEAM-9727 > URL: https://issues.apache.org/jira/browse/BEAM-9727 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9727) Auto populate required feature experiment flags for enable dataflow runner v2
[ https://issues.apache.org/jira/browse/BEAM-9727?focusedWorklogId=419902=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419902 ] ASF GitHub Bot logged work on BEAM-9727: Author: ASF GitHub Bot Created on: 10/Apr/20 00:13 Start Date: 10/Apr/20 00:13 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11355: [BEAM-9727] Automatically set required experiment flags for dataflow runner v2. URL: https://github.com/apache/beam/pull/11355 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419902) Time Spent: 3h 20m (was: 3h 10m) > Auto populate required feature experiment flags for enable dataflow runner v2 > - > > Key: BEAM-9727 > URL: https://issues.apache.org/jira/browse/BEAM-9727 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
[ https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419897=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419897 ] ASF GitHub Bot logged work on BEAM-9651: Author: ASF GitHub Bot Created on: 09/Apr/20 23:59 Start Date: 09/Apr/20 23:59 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #11368: [BEAM-9651] Prevent StreamPool and stream initialization livelock URL: https://github.com/apache/beam/pull/11368#issuecomment-611808620 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419897) Time Spent: 2.5h (was: 2h 20m) > StreamingDataflowWorker stuck waiting for > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext > -- > > Key: BEAM-9651 > URL: https://issues.apache.org/jira/browse/BEAM-9651 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Sam Whittle >Assignee: Sam Whittle >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > Operation ongoing in step for at least 28h10m00s without > outputting or completing in state windmill-read at > sun.misc.Unsafe.park(Native Method) at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at > java.util.concurrent.Phaser$QNode.block(Phaser.java:1140) at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at > java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067) at > java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758) at > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.(GrpcWindmillServer.java:941) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown > Source) at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:159) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:158) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328) > at > org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389) > at > > Because the stream is started in a StreamPool synchronized block, all other > threads interacting with StreamPool to get or release streams end up blocking. > It is unclear if the stream never became usable and thus blocked forever or > if there is a race with the use of the Phaser that causes the stuckness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
[ https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419898=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419898 ] ASF GitHub Bot logged work on BEAM-9651: Author: ASF GitHub Bot Created on: 09/Apr/20 23:59 Start Date: 09/Apr/20 23:59 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #11368: [BEAM-9651] Prevent StreamPool and stream initialization livelock URL: https://github.com/apache/beam/pull/11368#issuecomment-611808657 run dataflow validatesrunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419898) Time Spent: 2h 40m (was: 2.5h) > StreamingDataflowWorker stuck waiting for > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext > -- > > Key: BEAM-9651 > URL: https://issues.apache.org/jira/browse/BEAM-9651 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Sam Whittle >Assignee: Sam Whittle >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > Operation ongoing in step for at least 28h10m00s without > outputting or completing in state windmill-read at > sun.misc.Unsafe.park(Native Method) at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at > java.util.concurrent.Phaser$QNode.block(Phaser.java:1140) at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at > java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067) at > java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758) at > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.(GrpcWindmillServer.java:941) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown > Source) at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:159) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:158) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328) > at > org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389) > at > > Because the stream is started in a StreamPool synchronized block, all other > threads interacting with StreamPool to get or release streams end up blocking. > It is unclear if the stream never became usable and thus blocked forever or > if there is a race with the use of the Phaser that causes the stuckness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
[ https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419896=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419896 ] ASF GitHub Bot logged work on BEAM-9651: Author: ASF GitHub Bot Created on: 09/Apr/20 23:56 Start Date: 09/Apr/20 23:56 Worklog Time Spent: 10m Work Description: scwhittle commented on issue #11368: [BEAM-9651] Prevent StreamPool and stream initialization livelock URL: https://github.com/apache/beam/pull/11368#issuecomment-611808055 R: @reuvenlax Please trigger the tests before merging This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419896) Time Spent: 2h 20m (was: 2h 10m) > StreamingDataflowWorker stuck waiting for > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext > -- > > Key: BEAM-9651 > URL: https://issues.apache.org/jira/browse/BEAM-9651 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Sam Whittle >Assignee: Sam Whittle >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > Operation ongoing in step for at least 28h10m00s without > outputting or completing in state windmill-read at > sun.misc.Unsafe.park(Native Method) at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at > java.util.concurrent.Phaser$QNode.block(Phaser.java:1140) at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at > java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067) at > java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758) at > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.(GrpcWindmillServer.java:941) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown > Source) at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:159) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:158) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328) > at > org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389) > at > > Because the stream is started in a StreamPool synchronized block, all other > threads interacting with StreamPool to get or release streams end up blocking. > It is unclear if the stream never became usable and thus blocked forever or > if there is a race with the use of the Phaser that causes the stuckness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
[ https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419894=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419894 ] ASF GitHub Bot logged work on BEAM-9651: Author: ASF GitHub Bot Created on: 09/Apr/20 23:56 Start Date: 09/Apr/20 23:56 Worklog Time Spent: 10m Work Description: scwhittle commented on issue #11368: [BEAM-9651] Prevent StreamPool and stream initialization livelock URL: https://github.com/apache/beam/pull/11368#issuecomment-611808055 R: @reuvenlax With tests this time :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419894) Time Spent: 2h 10m (was: 2h) > StreamingDataflowWorker stuck waiting for > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext > -- > > Key: BEAM-9651 > URL: https://issues.apache.org/jira/browse/BEAM-9651 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Sam Whittle >Assignee: Sam Whittle >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > Operation ongoing in step for at least 28h10m00s without > outputting or completing in state windmill-read at > sun.misc.Unsafe.park(Native Method) at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at > java.util.concurrent.Phaser$QNode.block(Phaser.java:1140) at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at > java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067) at > java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758) at > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.(GrpcWindmillServer.java:941) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown > Source) at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:159) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:158) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328) > at > org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389) > at > > Because the stream is started in a StreamPool synchronized block, all other > threads interacting with StreamPool to get or release streams end up blocking. > It is unclear if the stream never became usable and thus blocked forever or > if there is a race with the use of the Phaser that causes the stuckness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9600) Implement GetJobMetrics in Flink uber jar job server
[ https://issues.apache.org/jira/browse/BEAM-9600?focusedWorklogId=419895=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419895 ] ASF GitHub Bot logged work on BEAM-9600: Author: ASF GitHub Bot Created on: 09/Apr/20 23:56 Start Date: 09/Apr/20 23:56 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11369: [BEAM-9600] Get metrics in Flink uber jar job server. URL: https://github.com/apache/beam/pull/11369 Accumulators are visible in the Flink REST API at the `/accumulators` endpoint, including the Beam Flink runner's custom accumulator, `__metricscontainers`. The value of each accumulator is printed using the accumulator's `toString` method. Previously, `MetricsContainerStepMap::toString` printed some sort of bespoke format. In this PR, I change it to use the JSON representation of the underlying protos so it's easy to parse on the Python side. This implementation ignores `committed` metrics. My understanding is that the Flink runner doesn't support them anyway, so I assume the distinction isn't important here. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build
[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
[ https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419893=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419893 ] ASF GitHub Bot logged work on BEAM-9651: Author: ASF GitHub Bot Created on: 09/Apr/20 23:55 Start Date: 09/Apr/20 23:55 Worklog Time Spent: 10m Work Description: scwhittle commented on pull request #11368: [BEAM-9651] Prevent StreamPool and stream initialization livelock URL: https://github.com/apache/beam/pull/11368 Reduce StreamPool synchronized block by excluding stream creation and requiring it for release. Some Dataflow pipelines were observing stuck get and release methods due to stuck stream creation waiting for the onReady callback. The DirectStreamObserver is modified to additionally pool isReady() to handle cases where blocked threads are blocking the onReady callback. Additionally as we would prefer to possibly OOM versus becoming permanently stuck we give up waiting for onReady after a minute. This is a recommit of a reverted commit to allow for testing before merging. **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) |
[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
[ https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419879=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419879 ] ASF GitHub Bot logged work on BEAM-9651: Author: ASF GitHub Bot Created on: 09/Apr/20 23:36 Start Date: 09/Apr/20 23:36 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #11367: Revert "[BEAM-9651] Prevent StreamPool and stream initialization livelock" URL: https://github.com/apache/beam/pull/11367 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419879) Time Spent: 1h 50m (was: 1h 40m) > StreamingDataflowWorker stuck waiting for > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext > -- > > Key: BEAM-9651 > URL: https://issues.apache.org/jira/browse/BEAM-9651 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Sam Whittle >Assignee: Sam Whittle >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > Operation ongoing in step for at least 28h10m00s without > outputting or completing in state windmill-read at > sun.misc.Unsafe.park(Native Method) at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at > java.util.concurrent.Phaser$QNode.block(Phaser.java:1140) at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at > java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067) at > java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758) at > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.(GrpcWindmillServer.java:941) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown > Source) at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:159) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:158) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328) > at > org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389) > at > > Because the stream is started in a StreamPool synchronized block, all other > threads interacting with StreamPool to get or release streams end up blocking. > It is unclear if the stream never became usable and thus blocked forever or > if there is a race with the use of the Phaser that causes the stuckness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=419867=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419867 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 09/Apr/20 23:03 Start Date: 09/Apr/20 23:03 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #11243: [BEAM-9136]Add licenses for dependencies for Java URL: https://github.com/apache/beam/pull/11243#discussion_r406523425 ## File path: sdks/java/container/license_scripts/pull_licenses_java.py ## @@ -0,0 +1,186 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +""" +A script to pull licenses/notices/source code for Java dependencies. Review comment: This is done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419867) Time Spent: 21h (was: 20h 50m) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 21h > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
[ https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419864=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419864 ] ASF GitHub Bot logged work on BEAM-9651: Author: ASF GitHub Bot Created on: 09/Apr/20 23:02 Start Date: 09/Apr/20 23:02 Worklog Time Spent: 10m Work Description: scwhittle commented on issue #11367: Revert "[BEAM-9651] Prevent StreamPool and stream initialization livelock" URL: https://github.com/apache/beam/pull/11367#issuecomment-611794270 R: @lukecwik This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419864) Time Spent: 1h 40m (was: 1.5h) > StreamingDataflowWorker stuck waiting for > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext > -- > > Key: BEAM-9651 > URL: https://issues.apache.org/jira/browse/BEAM-9651 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Sam Whittle >Assignee: Sam Whittle >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > Operation ongoing in step for at least 28h10m00s without > outputting or completing in state windmill-read at > sun.misc.Unsafe.park(Native Method) at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at > java.util.concurrent.Phaser$QNode.block(Phaser.java:1140) at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at > java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067) at > java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758) at > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.(GrpcWindmillServer.java:941) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown > Source) at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:159) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:158) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328) > at > org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389) > at > > Because the stream is started in a StreamPool synchronized block, all other > threads interacting with StreamPool to get or release streams end up blocking. > It is unclear if the stream never became usable and thus blocked forever or > if there is a race with the use of the Phaser that causes the stuckness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
[ https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419861=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419861 ] ASF GitHub Bot logged work on BEAM-9651: Author: ASF GitHub Bot Created on: 09/Apr/20 22:59 Start Date: 09/Apr/20 22:59 Worklog Time Spent: 10m Work Description: scwhittle commented on issue #11367: Revert "[BEAM-9651] Prevent StreamPool and stream initialization livelock" URL: https://github.com/apache/beam/pull/11367#issuecomment-611793406 R: @reuvenlax This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419861) Time Spent: 1.5h (was: 1h 20m) > StreamingDataflowWorker stuck waiting for > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext > -- > > Key: BEAM-9651 > URL: https://issues.apache.org/jira/browse/BEAM-9651 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Sam Whittle >Assignee: Sam Whittle >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > Operation ongoing in step for at least 28h10m00s without > outputting or completing in state windmill-read at > sun.misc.Unsafe.park(Native Method) at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at > java.util.concurrent.Phaser$QNode.block(Phaser.java:1140) at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at > java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067) at > java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758) at > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.(GrpcWindmillServer.java:941) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown > Source) at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:159) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:158) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328) > at > org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389) > at > > Because the stream is started in a StreamPool synchronized block, all other > threads interacting with StreamPool to get or release streams end up blocking. > It is unclear if the stream never became usable and thus blocked forever or > if there is a race with the use of the Phaser that causes the stuckness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
[ https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419860=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419860 ] ASF GitHub Bot logged work on BEAM-9651: Author: ASF GitHub Bot Created on: 09/Apr/20 22:58 Start Date: 09/Apr/20 22:58 Worklog Time Spent: 10m Work Description: scwhittle commented on pull request #11367: Revert "[BEAM-9651] Prevent StreamPool and stream initialization livelock" URL: https://github.com/apache/beam/pull/11367 Reverts apache/beam#11364 because the tests were not run before merging. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419860) Time Spent: 1h 20m (was: 1h 10m) > StreamingDataflowWorker stuck waiting for > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext > -- > > Key: BEAM-9651 > URL: https://issues.apache.org/jira/browse/BEAM-9651 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Sam Whittle >Assignee: Sam Whittle >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > Operation ongoing in step for at least 28h10m00s without > outputting or completing in state windmill-read at > sun.misc.Unsafe.park(Native Method) at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at > java.util.concurrent.Phaser$QNode.block(Phaser.java:1140) at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at > java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067) at > java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758) at > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.(GrpcWindmillServer.java:941) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown > Source) at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:159) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:158) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328) > at > org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389) > at > > Because the stream is started in a StreamPool synchronized block, all other > threads interacting with StreamPool to get or release streams end up blocking. > It is unclear if the stream never became usable and thus blocked forever or > if there is a race with the use of the Phaser that causes the stuckness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
[ https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419850=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419850 ] ASF GitHub Bot logged work on BEAM-9651: Author: ASF GitHub Bot Created on: 09/Apr/20 22:40 Start Date: 09/Apr/20 22:40 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #11364: [BEAM-9651] Prevent StreamPool and stream initialization livelock URL: https://github.com/apache/beam/pull/11364#issuecomment-611787993 @lukecwik my bad - I missed that the tests hadn't triggered. is it possible to trigger tests now, or do we need a new PR? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419850) Time Spent: 1h 10m (was: 1h) > StreamingDataflowWorker stuck waiting for > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext > -- > > Key: BEAM-9651 > URL: https://issues.apache.org/jira/browse/BEAM-9651 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Sam Whittle >Assignee: Sam Whittle >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > Operation ongoing in step for at least 28h10m00s without > outputting or completing in state windmill-read at > sun.misc.Unsafe.park(Native Method) at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at > java.util.concurrent.Phaser$QNode.block(Phaser.java:1140) at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at > java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067) at > java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758) at > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.(GrpcWindmillServer.java:941) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown > Source) at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:159) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:158) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328) > at > org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389) > at > > Because the stream is started in a StreamPool synchronized block, all other > threads interacting with StreamPool to get or release streams end up blocking. > It is unclear if the stream never became usable and thus blocked forever or > if there is a race with the use of the Phaser that causes the stuckness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
[ https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419849=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419849 ] ASF GitHub Bot logged work on BEAM-9651: Author: ASF GitHub Bot Created on: 09/Apr/20 22:39 Start Date: 09/Apr/20 22:39 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #11364: [BEAM-9651] Prevent StreamPool and stream initialization livelock URL: https://github.com/apache/beam/pull/11364#issuecomment-611787817 run dataflow validatesrunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419849) Time Spent: 1h (was: 50m) > StreamingDataflowWorker stuck waiting for > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext > -- > > Key: BEAM-9651 > URL: https://issues.apache.org/jira/browse/BEAM-9651 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Sam Whittle >Assignee: Sam Whittle >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Operation ongoing in step for at least 28h10m00s without > outputting or completing in state windmill-read at > sun.misc.Unsafe.park(Native Method) at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at > java.util.concurrent.Phaser$QNode.block(Phaser.java:1140) at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at > java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067) at > java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758) at > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.(GrpcWindmillServer.java:941) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown > Source) at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:159) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:158) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328) > at > org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389) > at > > Because the stream is started in a StreamPool synchronized block, all other > threads interacting with StreamPool to get or release streams end up blocking. > It is unclear if the stream never became usable and thus blocked forever or > if there is a race with the use of the Phaser that causes the stuckness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
[ https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419846=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419846 ] ASF GitHub Bot logged work on BEAM-9651: Author: ASF GitHub Bot Created on: 09/Apr/20 22:33 Start Date: 09/Apr/20 22:33 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11364: [BEAM-9651] Prevent StreamPool and stream initialization livelock URL: https://github.com/apache/beam/pull/11364#issuecomment-611785948 Note that the tests never ran on this change. Please only merge on green. Only committers can start tests on PRs which can sometimes be missed because changes can look like they are merge ready. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419846) Time Spent: 50m (was: 40m) > StreamingDataflowWorker stuck waiting for > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext > -- > > Key: BEAM-9651 > URL: https://issues.apache.org/jira/browse/BEAM-9651 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Sam Whittle >Assignee: Sam Whittle >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Operation ongoing in step for at least 28h10m00s without > outputting or completing in state windmill-read at > sun.misc.Unsafe.park(Native Method) at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at > java.util.concurrent.Phaser$QNode.block(Phaser.java:1140) at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at > java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067) at > java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758) at > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.(GrpcWindmillServer.java:941) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown > Source) at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:159) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:158) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328) > at > org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389) > at > > Because the stream is started in a StreamPool synchronized block, all other > threads interacting with StreamPool to get or release streams end up blocking. > It is unclear if the stream never became usable and thus blocked forever or > if there is a race with the use of the Phaser that causes the stuckness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9727) Auto populate required feature experiment flags for enable dataflow runner v2
[ https://issues.apache.org/jira/browse/BEAM-9727?focusedWorklogId=419843=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419843 ] ASF GitHub Bot logged work on BEAM-9727: Author: ASF GitHub Bot Created on: 09/Apr/20 22:31 Start Date: 09/Apr/20 22:31 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11355: [BEAM-9727] Automatically set required experiment flags for dataflow runner v2. URL: https://github.com/apache/beam/pull/11355#issuecomment-611785115 > > Consider adding the logic here instead: > > https://github.com/apache/beam/blob/79b2d87b59819ee55fb8600e8a845c6ba5b98d64/sdks/python/apache_beam/pipeline.py#L206 > > > > This would add the experiment slightly earlier then when the first ptransform is applied. > > I initially considered to add it into pipeline.py following this check, but feel that the logic is too dataflow-specific to be added here. sgtm, in the worst case we can tell users to use a longer list of experiments if we find a problem later. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419843) Time Spent: 3h 10m (was: 3h) > Auto populate required feature experiment flags for enable dataflow runner v2 > - > > Key: BEAM-9727 > URL: https://issues.apache.org/jira/browse/BEAM-9727 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1819) Key should be available in @OnTimer methods
[ https://issues.apache.org/jira/browse/BEAM-1819?focusedWorklogId=419840=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419840 ] ASF GitHub Bot logged work on BEAM-1819: Author: ASF GitHub Bot Created on: 09/Apr/20 22:25 Start Date: 09/Apr/20 22:25 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #11154: [BEAM-1819] Key should be available in @OnTimer methods URL: https://github.com/apache/beam/pull/11154#issuecomment-611782665 There appears to be a compilation failure in Java This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419840) Time Spent: 16h 40m (was: 16.5h) > Key should be available in @OnTimer methods > --- > > Key: BEAM-1819 > URL: https://issues.apache.org/jira/browse/BEAM-1819 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Thomas Groh >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 16h 40m > Remaining Estimate: 0h > > Every timer firing has an associated key. This key should be available when > the timer is delivered to a user's {{DoFn}}, so they don't have to store it > in state. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1589) Add OnWindowExpiration method to Stateful DoFn
[ https://issues.apache.org/jira/browse/BEAM-1589?focusedWorklogId=419839=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419839 ] ASF GitHub Bot logged work on BEAM-1589: Author: ASF GitHub Bot Created on: 09/Apr/20 22:24 Start Date: 09/Apr/20 22:24 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #11350: [BEAM-1589] Added @onWindowExpiration annotation. URL: https://github.com/apache/beam/pull/11350#discussion_r406510585 ## File path: runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingModeExecutionContext.java ## @@ -591,7 +591,7 @@ public void flushState() { timerId, "", cleanupTime, - cleanupTime, Review comment: We also seem to set GC timers in ReduceFnRunner.java. @kennknowles do you know why we have both? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419839) Time Spent: 2h 50m (was: 2h 40m) > Add OnWindowExpiration method to Stateful DoFn > -- > > Key: BEAM-1589 > URL: https://issues.apache.org/jira/browse/BEAM-1589 > Project: Beam > Issue Type: New Feature > Components: runner-core, sdk-java-core >Reporter: Jingsong Lee >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > See BEAM-1517 > This allows the user to do some work before the state's garbage collection. > It seems kind of annoying, but on the other hand forgetting to set a final > timer to flush state is probably data loss most of the time. > FlinkRunner does this work very simply, but other runners, such as > DirectRunner, need to traverse all the states to do this, and maybe it's a > little hard. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1589) Add OnWindowExpiration method to Stateful DoFn
[ https://issues.apache.org/jira/browse/BEAM-1589?focusedWorklogId=419837=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419837 ] ASF GitHub Bot logged work on BEAM-1589: Author: ASF GitHub Bot Created on: 09/Apr/20 22:24 Start Date: 09/Apr/20 22:24 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #11350: [BEAM-1589] Added @onWindowExpiration annotation. URL: https://github.com/apache/beam/pull/11350#discussion_r406508585 ## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java ## @@ -841,6 +847,238 @@ public BundleFinalizer bundleFinalizer() { } } + /** + * A concrete implementation of {@link DoFnInvoker.ArgumentProvider} used for running a {@link + * DoFn} on window expiration. + */ + private class OnWindowExpirationArgumentProvider extends DoFn.OnTimerContext Review comment: Why is this based on OnTimerContext? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419837) Time Spent: 2h 40m (was: 2.5h) > Add OnWindowExpiration method to Stateful DoFn > -- > > Key: BEAM-1589 > URL: https://issues.apache.org/jira/browse/BEAM-1589 > Project: Beam > Issue Type: New Feature > Components: runner-core, sdk-java-core >Reporter: Jingsong Lee >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > See BEAM-1517 > This allows the user to do some work before the state's garbage collection. > It seems kind of annoying, but on the other hand forgetting to set a final > timer to flush state is probably data loss most of the time. > FlinkRunner does this work very simply, but other runners, such as > DirectRunner, need to traverse all the states to do this, and maybe it's a > little hard. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1589) Add OnWindowExpiration method to Stateful DoFn
[ https://issues.apache.org/jira/browse/BEAM-1589?focusedWorklogId=419838=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419838 ] ASF GitHub Bot logged work on BEAM-1589: Author: ASF GitHub Bot Created on: 09/Apr/20 22:24 Start Date: 09/Apr/20 22:24 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #11350: [BEAM-1589] Added @onWindowExpiration annotation. URL: https://github.com/apache/beam/pull/11350#discussion_r406508479 ## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/DoFnRunner.java ## @@ -52,6 +52,12 @@ void onTimer( */ void finishBundle(); + /** + * Calls a {@link DoFn DoFn's} {@link DoFn.OnWindowExpiration @OnWindowExpiration} method and + * performs additional task, such as extracts a value saved in a state before garbage collection. + */ + void onWindowExpiration(BoundedWindow window, Instant timestamp, TimeDomain timeDomain); Review comment: What do timestamp and timeDomain mean in this context? Also presumably you do want to be able to access the key in onWindowExpiration This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419838) Time Spent: 2h 40m (was: 2.5h) > Add OnWindowExpiration method to Stateful DoFn > -- > > Key: BEAM-1589 > URL: https://issues.apache.org/jira/browse/BEAM-1589 > Project: Beam > Issue Type: New Feature > Components: runner-core, sdk-java-core >Reporter: Jingsong Lee >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > See BEAM-1517 > This allows the user to do some work before the state's garbage collection. > It seems kind of annoying, but on the other hand forgetting to set a final > timer to flush state is probably data loss most of the time. > FlinkRunner does this work very simply, but other runners, such as > DirectRunner, need to traverse all the states to do this, and maybe it's a > little hard. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9735) Performance regression in Python Batch pipeline in Reshuffle
[ https://issues.apache.org/jira/browse/BEAM-9735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Goenka updated BEAM-9735: --- Priority: Blocker (was: Major) > Performance regression in Python Batch pipeline in Reshuffle > > > Key: BEAM-9735 > URL: https://issues.apache.org/jira/browse/BEAM-9735 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Blocker > Fix For: 2.21.0 > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1819) Key should be available in @OnTimer methods
[ https://issues.apache.org/jira/browse/BEAM-1819?focusedWorklogId=419823=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419823 ] ASF GitHub Bot logged work on BEAM-1819: Author: ASF GitHub Bot Created on: 09/Apr/20 22:10 Start Date: 09/Apr/20 22:10 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #11154: [BEAM-1819] Key should be available in @OnTimer methods URL: https://github.com/apache/beam/pull/11154#issuecomment-611777422 Run Java Flink PortableValidatesRunner Streaming This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419823) Time Spent: 16.5h (was: 16h 20m) > Key should be available in @OnTimer methods > --- > > Key: BEAM-1819 > URL: https://issues.apache.org/jira/browse/BEAM-1819 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Thomas Groh >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 16.5h > Remaining Estimate: 0h > > Every timer firing has an associated key. This key should be available when > the timer is delivered to a user's {{DoFn}}, so they don't have to store it > in state. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1819) Key should be available in @OnTimer methods
[ https://issues.apache.org/jira/browse/BEAM-1819?focusedWorklogId=419820=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419820 ] ASF GitHub Bot logged work on BEAM-1819: Author: ASF GitHub Bot Created on: 09/Apr/20 22:09 Start Date: 09/Apr/20 22:09 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #11154: [BEAM-1819] Key should be available in @OnTimer methods URL: https://github.com/apache/beam/pull/11154#issuecomment-611776987 run dataflow validatesrunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419820) Time Spent: 16h (was: 15h 50m) > Key should be available in @OnTimer methods > --- > > Key: BEAM-1819 > URL: https://issues.apache.org/jira/browse/BEAM-1819 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Thomas Groh >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 16h > Remaining Estimate: 0h > > Every timer firing has an associated key. This key should be available when > the timer is delivered to a user's {{DoFn}}, so they don't have to store it > in state. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1819) Key should be available in @OnTimer methods
[ https://issues.apache.org/jira/browse/BEAM-1819?focusedWorklogId=419821=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419821 ] ASF GitHub Bot logged work on BEAM-1819: Author: ASF GitHub Bot Created on: 09/Apr/20 22:09 Start Date: 09/Apr/20 22:09 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #11154: [BEAM-1819] Key should be available in @OnTimer methods URL: https://github.com/apache/beam/pull/11154#issuecomment-611777030 run flink validatesrunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419821) Time Spent: 16h 10m (was: 16h) > Key should be available in @OnTimer methods > --- > > Key: BEAM-1819 > URL: https://issues.apache.org/jira/browse/BEAM-1819 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Thomas Groh >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 16h 10m > Remaining Estimate: 0h > > Every timer firing has an associated key. This key should be available when > the timer is delivered to a user's {{DoFn}}, so they don't have to store it > in state. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1819) Key should be available in @OnTimer methods
[ https://issues.apache.org/jira/browse/BEAM-1819?focusedWorklogId=419822=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419822 ] ASF GitHub Bot logged work on BEAM-1819: Author: ASF GitHub Bot Created on: 09/Apr/20 22:09 Start Date: 09/Apr/20 22:09 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #11154: [BEAM-1819] Key should be available in @OnTimer methods URL: https://github.com/apache/beam/pull/11154#issuecomment-611777072 run spark validatesrunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419822) Time Spent: 16h 20m (was: 16h 10m) > Key should be available in @OnTimer methods > --- > > Key: BEAM-1819 > URL: https://issues.apache.org/jira/browse/BEAM-1819 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Thomas Groh >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 16h 20m > Remaining Estimate: 0h > > Every timer firing has an associated key. This key should be available when > the timer is delivered to a user's {{DoFn}}, so they don't have to store it > in state. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9727) Auto populate required feature experiment flags for enable dataflow runner v2
[ https://issues.apache.org/jira/browse/BEAM-9727?focusedWorklogId=419818=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419818 ] ASF GitHub Bot logged work on BEAM-9727: Author: ASF GitHub Bot Created on: 09/Apr/20 22:08 Start Date: 09/Apr/20 22:08 Worklog Time Spent: 10m Work Description: ananvay commented on issue #11355: [BEAM-9727] Automatically set required experiment flags for dataflow runner v2. URL: https://github.com/apache/beam/pull/11355#issuecomment-611776464 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419818) Time Spent: 3h (was: 2h 50m) > Auto populate required feature experiment flags for enable dataflow runner v2 > - > > Key: BEAM-9727 > URL: https://issues.apache.org/jira/browse/BEAM-9727 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 3h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9735) Performance regression in Python Batch pipeline in Reshuffle
[ https://issues.apache.org/jira/browse/BEAM-9735?focusedWorklogId=419819=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419819 ] ASF GitHub Bot logged work on BEAM-9735: Author: ASF GitHub Bot Created on: 09/Apr/20 22:08 Start Date: 09/Apr/20 22:08 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #11365: [BEAM-9735] Adding Always trigger and using it in Reshuffle URL: https://github.com/apache/beam/pull/11365 R: @robertwb **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
[jira] [Created] (BEAM-9735) Performance regression in Python Batch pipeline in Reshuffle
Ankur Goenka created BEAM-9735: -- Summary: Performance regression in Python Batch pipeline in Reshuffle Key: BEAM-9735 URL: https://issues.apache.org/jira/browse/BEAM-9735 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Ankur Goenka Assignee: Ankur Goenka Fix For: 2.21.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
[ https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419817=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419817 ] ASF GitHub Bot logged work on BEAM-9651: Author: ASF GitHub Bot Created on: 09/Apr/20 22:04 Start Date: 09/Apr/20 22:04 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #11364: [BEAM-9651] Prevent StreamPool and stream initialization livelock URL: https://github.com/apache/beam/pull/11364 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419817) Time Spent: 40m (was: 0.5h) > StreamingDataflowWorker stuck waiting for > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext > -- > > Key: BEAM-9651 > URL: https://issues.apache.org/jira/browse/BEAM-9651 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Sam Whittle >Assignee: Sam Whittle >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Operation ongoing in step for at least 28h10m00s without > outputting or completing in state windmill-read at > sun.misc.Unsafe.park(Native Method) at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at > java.util.concurrent.Phaser$QNode.block(Phaser.java:1140) at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at > java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067) at > java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758) at > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.(GrpcWindmillServer.java:941) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown > Source) at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:159) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:158) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328) > at > org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389) > at > > Because the stream is started in a StreamPool synchronized block, all other > threads interacting with StreamPool to get or release streams end up blocking. > It is unclear if the stream never became usable and thus blocked forever or > if there is a race with the use of the Phaser that causes the stuckness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-9278) Make HBase client a provided dependency in HBaseIO
[ https://issues.apache.org/jira/browse/BEAM-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía reassigned BEAM-9278: -- Assignee: (was: Ismaël Mejía) > Make HBase client a provided dependency in HBaseIO > -- > > Key: BEAM-9278 > URL: https://issues.apache.org/jira/browse/BEAM-9278 > Project: Beam > Issue Type: Improvement > Components: io-java-hbase >Reporter: Ismaël Mejía >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > HBaseIO relies on included version of `hbase-shaded-client`. However it is > common that Hadoop distributions provide their own client versions, so it is > a good idea to allow users to provide their own shaded client versions. > This in particular enables users to provide HBase 2 based dependencies. We > are not yet testing deps against Beam due to unrelated issues at BEAM-7435 > (so Beam internal testing is still WIP). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=419813=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419813 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 09/Apr/20 22:01 Start Date: 09/Apr/20 22:01 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #11243: [BEAM-9136]Add licenses for dependencies for Java URL: https://github.com/apache/beam/pull/11243#discussion_r406502326 ## File path: sdks/java/container/build.gradle ## @@ -16,7 +16,11 @@ * limitations under the License. */ -plugins { id 'org.apache.beam.module' } +plugins { + id 'org.apache.beam.module' + id 'com.github.jk1.dependency-license-report' version '1.13' Review comment: If I understand it correctly, dependencies without licenses/notice included in jar are *considered* as the missing dependencies at first. However, the reality is dependency scan is correct, it only missing licenses. `com.github.jk1.dependency-license-report` generates `index.json` file as well, which provides a list of all dependencies. I used this file to go through all dependencies and pull licenses/notices if they are not pulled automatically. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419813) Time Spent: 20h 40m (was: 20.5h) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 20h 40m > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=419814=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419814 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 09/Apr/20 22:01 Start Date: 09/Apr/20 22:01 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #11243: [BEAM-9136]Add licenses for dependencies for Java URL: https://github.com/apache/beam/pull/11243#discussion_r406502531 ## File path: sdks/java/container/license_scripts/license_script.sh ## @@ -0,0 +1,37 @@ + # Licensed to the Apache Software Foundation (ASF) under one + # or more contributor license agreements. See the NOTICE file + # distributed with this work for additional information + # regarding copyright ownership. The ASF licenses this file + # to you under the Apache License, Version 2.0 (the + # License); you may not use this file except in compliance + # with the License. You may obtain a copy of the License at + # + # http://www.apache.org/licenses/LICENSE-2.0 + # + # Unless required by applicable law or agreed to in writing, software + # distributed under the License is distributed on an AS IS BASIS, + # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + # See the License for the specific language governing permissions and + # limitations under the License. + +set -e +# reports are generated at ~/beam/java_third_party_licenses +./gradlew generateLicenseReport --rerun-tasks Review comment: This is same as answered below. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419814) Time Spent: 20h 50m (was: 20h 40m) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 20h 50m > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-7770) Add ReadAll transform for SolrIO
[ https://issues.apache.org/jira/browse/BEAM-7770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-7770 started by Ismaël Mejía. -- > Add ReadAll transform for SolrIO > > > Key: BEAM-7770 > URL: https://issues.apache.org/jira/browse/BEAM-7770 > Project: Beam > Issue Type: Improvement > Components: io-java-solr >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > SolrIO already uses internally a composable approach but we need to expose an > explicit ReadAll transform that allows user to create reads in the middle of > the Pipeline to improve composability (e.g. Reads in the middle of a > Pipeline). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-7429) Rename ReadAllViaFileBasedSource to ReadFilesViaFileBasedSource
[ https://issues.apache.org/jira/browse/BEAM-7429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía reassigned BEAM-7429: -- Assignee: Ismaël Mejía > Rename ReadAllViaFileBasedSource to ReadFilesViaFileBasedSource > --- > > Key: BEAM-7429 > URL: https://issues.apache.org/jira/browse/BEAM-7429 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > > `ReadAllViaFileBasedSource` is not used by the `ReadAll` transform, but by > the `ReadFiles` transform and uses ReadableFile objects so it makes sense to > better call it `ReadFilesViaFileBasedSource`. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-7429) Rename ReadAllViaFileBasedSource to ReadFilesViaFileBasedSource
[ https://issues.apache.org/jira/browse/BEAM-7429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía reassigned BEAM-7429: -- Assignee: (was: Ismaël Mejía) > Rename ReadAllViaFileBasedSource to ReadFilesViaFileBasedSource > --- > > Key: BEAM-7429 > URL: https://issues.apache.org/jira/browse/BEAM-7429 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Ismaël Mejía >Priority: Minor > > `ReadAllViaFileBasedSource` is not used by the `ReadAll` transform, but by > the `ReadFiles` transform and uses ReadableFile objects so it makes sense to > better call it `ReadFilesViaFileBasedSource`. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-9095) API improvements for KuduIO
[ https://issues.apache.org/jira/browse/BEAM-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-9095 started by Ismaël Mejía. -- > API improvements for KuduIO > --- > > Key: BEAM-9095 > URL: https://issues.apache.org/jira/browse/BEAM-9095 > Project: Beam > Issue Type: Improvement > Components: io-java-kudu >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > > KuduIO API has some minor issues that could easily be fixed including typos, > inconsistent (with kudu-client) types for some parameters and Beam transform > style improvements. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9726) Don't require --region for non-service Dataflow endpoints.
[ https://issues.apache.org/jira/browse/BEAM-9726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver resolved BEAM-9726. --- Resolution: Fixed > Don't require --region for non-service Dataflow endpoints. > -- > > Key: BEAM-9726 > URL: https://issues.apache.org/jira/browse/BEAM-9726 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: 2.21.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Some Dataflow internal tests don't run on the real Dataflow service. Since > region only applies to the real Dataflow service, we should not require these > tests to specify a region. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-9428) CVEs in the dependencies of hive-exec for HiveIO
[ https://issues.apache.org/jira/browse/BEAM-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía reassigned BEAM-9428: -- Assignee: (was: Ismaël Mejía) > CVEs in the dependencies of hive-exec for HiveIO > > > Key: BEAM-9428 > URL: https://issues.apache.org/jira/browse/BEAM-9428 > Project: Beam > Issue Type: Bug > Components: io-java-hcatalog >Reporter: XuCongying >Priority: Major > Attachments: apache-beam_CVE-report.md > > > Hello, Your project uses some dependencies with CVEs. I found that the buggy > methods of the CVEs are in the program execution path of your project, which > makes your project at risk. I suggest a library update. See details below: > * *Vulnerable Dependency:* org.apache.hive : hive-exec : 2.1.0 > * *Call Chain to Buggy Methods:* > ** *Some files in your project call the library method > org.apache.hadoop.hive.ql.Driver.run(java.lang.String), which can reach the > buggy method of > [CVE-2017-12625|https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-12625].* > *** Files in your project: > sdks/java/io/hcatalog/src/main/java/org/apache/beam/sdk/io/hcatalog/test/EmbeddedMetastoreService.java > *** One of the possible call chain: > org.apache.hadoop.hive.ql.Driver.run(java.lang.String) > org.apache.hadoop.hive.ql.Driver.run(java.lang.String,boolean) > org.apache.hadoop.hive.ql.Driver.runInternal(java.lang.String,boolean) > org.apache.hadoop.hive.ql.Driver.compileInternal(java.lang.String) > org.apache.hadoop.hive.ql.Driver.compile(java.lang.String) > org.apache.hadoop.hive.ql.Driver.compile(java.lang.String,boolean) > org.apache.hadoop.hive.ql.parse.ParseDriver.parse(java.lang.String,org.apache.hadoop.hive.ql.Context) > [buggy method] > ** *Update suggestion:* version 3.1.2 3.1.2 is a safe version without CVEs. > From 2.1.0 to 3.1.2, 2 of the APIs (called by 2 times in your project) were > removed, 3 APIs (called by 3 times in your project) were modified. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-9406) Convert KuduIO away from BoundedSource
[ https://issues.apache.org/jira/browse/BEAM-9406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-9406 started by Ismaël Mejía. -- > Convert KuduIO away from BoundedSource > -- > > Key: BEAM-9406 > URL: https://issues.apache.org/jira/browse/BEAM-9406 > Project: Beam > Issue Type: Improvement > Components: io-java-kudu >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > > Convert KuduIO to use the DoFn API instead of BoundedSource to be consistent > with recent patterns of use on Beam. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-9351) Upgrade Hive/HCatalog to version 2.3.6
[ https://issues.apache.org/jira/browse/BEAM-9351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía reassigned BEAM-9351: -- Assignee: (was: Ismaël Mejía) > Upgrade Hive/HCatalog to version 2.3.6 > -- > > Key: BEAM-9351 > URL: https://issues.apache.org/jira/browse/BEAM-9351 > Project: Beam > Issue Type: Improvement > Components: io-java-hcatalog >Reporter: Ismaël Mejía >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > Beam Hive/HCatalog dependency is a bit outdated, it is probably a good idea > to update it to the most recent stable stable version of the 2.x.x line. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=419808=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419808 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 09/Apr/20 21:45 Start Date: 09/Apr/20 21:45 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #11243: [BEAM-9136]Add licenses for dependencies for Java URL: https://github.com/apache/beam/pull/11243#discussion_r406495748 ## File path: sdks/java/container/license_scripts/pull_licenses_java.py ## @@ -0,0 +1,186 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +""" +A script to pull licenses/notices/source code for Java dependencies. Review comment: We can do that by adding some more code to the scripts. I will add it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419808) Time Spent: 20.5h (was: 20h 20m) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 20.5h > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9734) Revert https://github.com/apache/beam/pull/11122 which is a potential regression
[ https://issues.apache.org/jira/browse/BEAM-9734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9734: --- Status: Open (was: Triage Needed) > Revert https://github.com/apache/beam/pull/11122 which is a potential > regression > > > Key: BEAM-9734 > URL: https://issues.apache.org/jira/browse/BEAM-9734 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Blocker > Fix For: 2.21.0 > > > This is potentially a regression for Dataflow. We should revert and > re-introduce as an optional change that can be controlled by a user option. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=419806=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419806 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 09/Apr/20 21:43 Start Date: 09/Apr/20 21:43 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #11243: [BEAM-9136]Add licenses for dependencies for Java URL: https://github.com/apache/beam/pull/11243#discussion_r406494846 ## File path: sdks/java/container/build.gradle ## @@ -68,6 +73,28 @@ golang { } } +// this is a workaround to call generateLicenseReport task from project root directory. +// generateLicenseReport does not return correct dependency list when not called from the root. +task generateThirdPartyLicenses(type: Exec) { + workingDir project.rootProject.projectDir + commandLine './sdks/java/container/license_scripts/license_script.sh' Review comment: When the `generateLicenseReport` task is triggered by `dependsOn`, the task is not running at the project root directory regardless how to set project scope, and dependency list is not correct. This is a workaround to execute the `generateLicenseReport` from root directory and get correct dependency list. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419806) Time Spent: 20h 20m (was: 20h 10m) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 20h 20m > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=419801=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419801 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 09/Apr/20 21:37 Start Date: 09/Apr/20 21:37 Worklog Time Spent: 10m Work Description: alanmyrvold commented on pull request #11243: [BEAM-9136]Add licenses for dependencies for Java URL: https://github.com/apache/beam/pull/11243#discussion_r406487569 ## File path: sdks/java/container/build.gradle ## @@ -16,7 +16,11 @@ * limitations under the License. */ -plugins { id 'org.apache.beam.module' } +plugins { + id 'org.apache.beam.module' + id 'com.github.jk1.dependency-license-report' version '1.13' Review comment: Kenn mentioned hierynomus/license-gradle-plugin found more accurate license dependencies; does that need to be added? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419801) Time Spent: 19h 50m (was: 19h 40m) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 19h 50m > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=419804=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419804 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 09/Apr/20 21:37 Start Date: 09/Apr/20 21:37 Worklog Time Spent: 10m Work Description: alanmyrvold commented on pull request #11243: [BEAM-9136]Add licenses for dependencies for Java URL: https://github.com/apache/beam/pull/11243#discussion_r40641 ## File path: sdks/java/container/build.gradle ## @@ -68,6 +73,28 @@ golang { } } +// this is a workaround to call generateLicenseReport task from project root directory. +// generateLicenseReport does not return correct dependency list when not called from the root. +task generateThirdPartyLicenses(type: Exec) { + workingDir project.rootProject.projectDir + commandLine './sdks/java/container/license_scripts/license_script.sh' Review comment: Why does the gradle task need to start a shell script that runs gradle? Can the gradle task be a dependency instead? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419804) Time Spent: 20h 10m (was: 20h) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 20h 10m > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=419803=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419803 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 09/Apr/20 21:37 Start Date: 09/Apr/20 21:37 Worklog Time Spent: 10m Work Description: alanmyrvold commented on pull request #11243: [BEAM-9136]Add licenses for dependencies for Java URL: https://github.com/apache/beam/pull/11243#discussion_r406488322 ## File path: sdks/java/container/license_scripts/license_script.sh ## @@ -0,0 +1,37 @@ + # Licensed to the Apache Software Foundation (ASF) under one + # or more contributor license agreements. See the NOTICE file + # distributed with this work for additional information + # regarding copyright ownership. The ASF licenses this file + # to you under the Apache License, Version 2.0 (the + # License); you may not use this file except in compliance + # with the License. You may obtain a copy of the License at + # + # http://www.apache.org/licenses/LICENSE-2.0 + # + # Unless required by applicable law or agreed to in writing, software + # distributed under the License is distributed on an AS IS BASIS, + # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + # See the License for the specific language governing permissions and + # limitations under the License. + +set -e +# reports are generated at ~/beam/java_third_party_licenses +./gradlew generateLicenseReport --rerun-tasks Review comment: Why does the gradle task need to run gradle? Can it be a dependency above? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419803) Time Spent: 20h 10m (was: 20h) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 20h 10m > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?focusedWorklogId=419802=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419802 ] ASF GitHub Bot logged work on BEAM-9136: Author: ASF GitHub Bot Created on: 09/Apr/20 21:37 Start Date: 09/Apr/20 21:37 Worklog Time Spent: 10m Work Description: alanmyrvold commented on pull request #11243: [BEAM-9136]Add licenses for dependencies for Java URL: https://github.com/apache/beam/pull/11243#discussion_r406491805 ## File path: sdks/java/container/license_scripts/pull_licenses_java.py ## @@ -0,0 +1,186 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +""" +A script to pull licenses/notices/source code for Java dependencies. Review comment: In addition to pulling the licenses, can this script generate a CSV of Library name, link to license, license name (like Apache, LGPL, etc), and whether source is downloaded/included? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419802) Time Spent: 20h (was: 19h 50m) > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 20h > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9722) Add SnowflakeIO to Java SDK
[ https://issues.apache.org/jira/browse/BEAM-9722?focusedWorklogId=419800=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419800 ] ASF GitHub Bot logged work on BEAM-9722: Author: ASF GitHub Bot Created on: 09/Apr/20 21:37 Start Date: 09/Apr/20 21:37 Worklog Time Spent: 10m Work Description: takidau commented on pull request #11360: [WIP][BEAM-9722] added SnowflakeIO with Read operation URL: https://github.com/apache/beam/pull/11360#discussion_r406491845 ## File path: sdks/java/io/snowflake/src/main/java/org/apache/beam/sdk/io/snowflake/SnowflakeIO.java ## @@ -0,0 +1,691 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.snowflake; + +import static org.apache.beam.sdk.io.TextIO.readFiles; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; + +import com.google.auto.value.AutoValue; +import com.opencsv.CSVParser; +import com.opencsv.CSVParserBuilder; +import java.io.IOException; +import java.io.Serializable; +import java.security.PrivateKey; +import java.sql.Connection; +import java.sql.SQLException; +import java.text.SimpleDateFormat; +import java.util.Date; +import java.util.UUID; +import java.util.concurrent.ConcurrentHashMap; +import javax.annotation.Nullable; +import javax.sql.DataSource; +import net.snowflake.client.jdbc.SnowflakeBasicDataSource; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.io.snowflake.credentials.KeyPairSnowflakeCredentials; +import org.apache.beam.sdk.io.snowflake.credentials.OAuthTokenSnowflakeCredentials; +import org.apache.beam.sdk.io.snowflake.credentials.SnowflakeCredentials; +import org.apache.beam.sdk.io.snowflake.credentials.UsernamePasswordSnowflakeCredentials; +import org.apache.beam.sdk.options.ValueProvider; +import org.apache.beam.sdk.transforms.Create; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.SerializableFunction; +import org.apache.beam.sdk.transforms.Wait; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.transforms.display.HasDisplayData; +import org.apache.beam.sdk.values.PBegin; +import org.apache.beam.sdk.values.PCollection; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * IO to read and write data on Snowflake. + * + * SnowflakeIO uses https://docs.snowflake.net/manuals/user-guide/jdbc.html;>Snowflake + * JDBC driver under the hood, but data isn't read/written using JDBC directly. Instead, + * SnowflakeIO uses dedicated COPY operations to read/write data from/to Google Cloud + * Storage. + * + * To configure SnowflakeIO to read/write from your Snowflake instance, you have to provide a + * {@link DataSourceConfiguration} using {@link + * DataSourceConfiguration#create(SnowflakeCredentials)}, where {@link SnowflakeCredentials might be + * created using {@link org.apache.beam.sdk.io.snowflake.credentials.SnowflakeCredentialsFactory}}. + * Additionally one of {@link DataSourceConfiguration#withServerName(String)} or {@link + * DataSourceConfiguration#withUrl(String)} must be used to tell SnowflakeIO which instance to use. + * + * There are also other options available to configure connection to Snowflake: + * + * + * {@link DataSourceConfiguration#withWarehouse(String)} to specify which Warehouse to use + * {@link DataSourceConfiguration#withDatabase(String)} to specify which Database to connect + * to + * {@link DataSourceConfiguration#withSchema(String)} to specify which schema to use + * {@link DataSourceConfiguration#withRole(String)} to specify which role to use + * {@link DataSourceConfiguration#withPortNumber(int)} to specify custom port of Snowflake + * instance + * + * + * For example: + * + * {@code + * SnowflakeIO.DataSourceConfiguration dataSourceConfiguration = + * SnowflakeIO.DataSourceConfiguration.create(SnowflakeCredentialsFactory.of(options)) + *
[jira] [Work logged] (BEAM-9727) Auto populate required feature experiment flags for enable dataflow runner v2
[ https://issues.apache.org/jira/browse/BEAM-9727?focusedWorklogId=419799=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419799 ] ASF GitHub Bot logged work on BEAM-9727: Author: ASF GitHub Bot Created on: 09/Apr/20 21:37 Start Date: 09/Apr/20 21:37 Worklog Time Spent: 10m Work Description: y1chi commented on issue #11355: [BEAM-9727] Automatically set required experiment flags for dataflow runner v2. URL: https://github.com/apache/beam/pull/11355#issuecomment-611764830 > Consider adding the logic here instead: > https://github.com/apache/beam/blob/79b2d87b59819ee55fb8600e8a845c6ba5b98d64/sdks/python/apache_beam/pipeline.py#L206 > > This would add the experiment slightly earlier then when the first ptransform is applied. I initially considered to add it into pipeline.py following this check, but feel that the logic is too dataflow-specific to be added here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419799) Time Spent: 2h 50m (was: 2h 40m) > Auto populate required feature experiment flags for enable dataflow runner v2 > - > > Key: BEAM-9727 > URL: https://issues.apache.org/jira/browse/BEAM-9727 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419796=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419796 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 09/Apr/20 21:36 Start Date: 09/Apr/20 21:36 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #11314: [BEAM-9562] Send Timers over Data Channel as Elements URL: https://github.com/apache/beam/pull/11314#discussion_r406491316 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py ## @@ -536,7 +525,8 @@ def _run_stage(self, runner_execution_context, bundle_context_manager, data_input, -data_output, +data_output, {}, Review comment: yapf helps me put the {} here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419796) Time Spent: 20h 20m (was: 20h 10m) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-java-harness, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 20h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
[ https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419797=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419797 ] ASF GitHub Bot logged work on BEAM-9651: Author: ASF GitHub Bot Created on: 09/Apr/20 21:36 Start Date: 09/Apr/20 21:36 Worklog Time Spent: 10m Work Description: dpmills commented on issue #11364: [BEAM-9651] Prevent StreamPool and stream initialization livelock URL: https://github.com/apache/beam/pull/11364#issuecomment-611764743 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419797) Time Spent: 0.5h (was: 20m) > StreamingDataflowWorker stuck waiting for > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext > -- > > Key: BEAM-9651 > URL: https://issues.apache.org/jira/browse/BEAM-9651 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Sam Whittle >Assignee: Sam Whittle >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Operation ongoing in step for at least 28h10m00s without > outputting or completing in state windmill-read at > sun.misc.Unsafe.park(Native Method) at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at > java.util.concurrent.Phaser$QNode.block(Phaser.java:1140) at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at > java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067) at > java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758) at > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.(GrpcWindmillServer.java:941) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown > Source) at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:159) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:158) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328) > at > org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389) > at > > Because the stream is started in a StreamPool synchronized block, all other > threads interacting with StreamPool to get or release streams end up blocking. > It is unclear if the stream never became usable and thus blocked forever or > if there is a race with the use of the Phaser that causes the stuckness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419793=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419793 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 09/Apr/20 21:34 Start Date: 09/Apr/20 21:34 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #11314: [BEAM-9562] Send Timers over Data Channel as Elements URL: https://github.com/apache/beam/pull/11314#discussion_r406490454 ## File path: sdks/python/apache_beam/transforms/core.py ## @@ -1272,6 +1272,8 @@ def expand(self, pcoll): key_coder = coder.key_coder() else: key_coder = coders.registry.get_coder(typehints.Any) + self.window_coder = pcoll.windowing.windowfn.get_window_coder() Review comment: No. Will removed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419793) Time Spent: 19h 50m (was: 19h 40m) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-java-harness, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 19h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419795=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419795 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 09/Apr/20 21:34 Start Date: 09/Apr/20 21:34 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #11314: [BEAM-9562] Send Timers over Data Channel as Elements URL: https://github.com/apache/beam/pull/11314#discussion_r406490556 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py ## @@ -987,8 +1019,10 @@ def __init__( def process_bundle(self, inputs, # type: Mapping[str, PartitionableBuffer] - expected_outputs # type: DataOutput -): + expected_outputs, # type: DataOutput + fired_timers, # type: Mapping[str, Mapping[str, PartitionableBuffer]] Review comment: I updated the `fired_timers` implementation but forgot to update the typing here. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419795) Time Spent: 20h 10m (was: 20h) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-java-harness, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 20h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419794=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419794 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 09/Apr/20 21:34 Start Date: 09/Apr/20 21:34 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #11314: [BEAM-9562] Send Timers over Data Channel as Elements URL: https://github.com/apache/beam/pull/11314#discussion_r406490504 ## File path: sdks/python/apache_beam/runners/worker/bundle_processor.py ## @@ -837,25 +869,59 @@ def process_bundle(self, instruction_id): op.execution_context = execution_context op.start() - # Inject inputs from data plane. + # Each data_channel is mapped to a list of expected inputs which includes + # both data input and timer input. The data input is identied by + # transform_id. The data input is identified by + # (transform_id, timer_family_id). data_channels = collections.defaultdict( list ) # type: DefaultDict[data_plane.GrpcClientDataChannel, List[str]] + + # Inject data inputs from data plane. Review comment: Updated the comment. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419794) Time Spent: 20h (was: 19h 50m) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-java-harness, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 20h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
[ https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419783=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419783 ] ASF GitHub Bot logged work on BEAM-9651: Author: ASF GitHub Bot Created on: 09/Apr/20 21:16 Start Date: 09/Apr/20 21:16 Worklog Time Spent: 10m Work Description: scwhittle commented on issue #11364: [BEAM-9651] Prevent StreamPool and stream initialization livelock URL: https://github.com/apache/beam/pull/11364#issuecomment-611756669 R: @dpmills @reuvenlax This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419783) Time Spent: 20m (was: 10m) > StreamingDataflowWorker stuck waiting for > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext > -- > > Key: BEAM-9651 > URL: https://issues.apache.org/jira/browse/BEAM-9651 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Sam Whittle >Assignee: Sam Whittle >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Operation ongoing in step for at least 28h10m00s without > outputting or completing in state windmill-read at > sun.misc.Unsafe.park(Native Method) at > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at > java.util.concurrent.Phaser$QNode.block(Phaser.java:1140) at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) at > java.util.concurrent.Phaser.internalAwaitAdvance(Phaser.java:1067) at > java.util.concurrent.Phaser.awaitAdvanceInterruptibly(Phaser.java:758) at > org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext(DirectStreamObserver.java:49) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.send(GrpcWindmillServer.java:615) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.onNewStream(GrpcWindmillServer.java:946) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$AbstractWindmillStream.startStream(GrpcWindmillServer.java:628) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer$GrpcGetDataStream.(GrpcWindmillServer.java:941) > at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.getDataStream(GrpcWindmillServer.java:506) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub$$Lambda$129/665137804.get(Unknown > Source) at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:159) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool$StreamData.(WindmillServerStub.java:158) > at > org.apache.beam.runners.dataflow.worker.windmill.WindmillServerStub$StreamPool.getStream(WindmillServerStub.java:191) > at > org.apache.beam.runners.dataflow.worker.MetricTrackingWindmillServerStub.getStateData(MetricTrackingWindmillServerStub.java:199) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader.startBatchAndBlock(WindmillStateReader.java:433) > at > org.apache.beam.runners.dataflow.worker.WindmillStateReader$WrappedFuture.get(WindmillStateReader.java:328) > at > org.apache.beam.runners.dataflow.worker.WindmillStateInternals$WindmillValue.read(WindmillStateInternals.java:389) > at > > Because the stream is started in a StreamPool synchronized block, all other > threads interacting with StreamPool to get or release streams end up blocking. > It is unclear if the stream never became usable and thus blocked forever or > if there is a race with the use of the Phaser that causes the stuckness. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419780=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419780 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 09/Apr/20 21:15 Start Date: 09/Apr/20 21:15 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #11314: [BEAM-9562] Send Timers over Data Channel as Elements URL: https://github.com/apache/beam/pull/11314#discussion_r406481820 ## File path: sdks/python/apache_beam/transforms/core.py ## @@ -1323,12 +1325,47 @@ def _pardo_fn_data(self): windowing = None return self.fn, self.args, self.kwargs, si_tags_and_types, windowing - def to_runner_api_parameter(self, context): + def _get_key_and_window_coder(self, named_inputs): +if named_inputs is None or not self._signature.is_stateful_dofn(): + return None, None +main_input = list(set(named_inputs.keys()) - set(self.side_inputs))[0] +input_pcoll = named_inputs[main_input] +kv_type_hint = input_pcoll.element_type +if kv_type_hint and kv_type_hint != typehints.Any: + coder = coders.registry.get_coder(kv_type_hint) + if not coder.is_kv_coder(): +raise ValueError( +'Input elements to the transform %s with stateful DoFn must be ' +'key-value pairs.' % self) + key_coder = coder.key_coder() +else: + key_coder = coders.registry.get_coder(typehints.Any) +window_coder = input_pcoll.windowing.windowfn.get_window_coder() +return key_coder, window_coder + + def to_runner_api(self, context, **extra_kwargs): Review comment: We can delete this override since we pass `extra_kwargs` from `PTransform` now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419780) Time Spent: 19h 40m (was: 19.5h) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-java-harness, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 19h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9651) StreamingDataflowWorker stuck waiting for org.apache.beam.runners.dataflow.worker.windmill.DirectStreamObserver.onNext
[ https://issues.apache.org/jira/browse/BEAM-9651?focusedWorklogId=419782=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419782 ] ASF GitHub Bot logged work on BEAM-9651: Author: ASF GitHub Bot Created on: 09/Apr/20 21:15 Start Date: 09/Apr/20 21:15 Worklog Time Spent: 10m Work Description: scwhittle commented on pull request #11364: [BEAM-9651] Prevent StreamPool and stream initialization livelock URL: https://github.com/apache/beam/pull/11364 Reduce StreamPool synchronized block section to not include stream creation which includes sending initial messages. Additionally remove it from being required to release holds. Some pipelines were observing stuck get and release methods on synchronization. This was due to a stuck stream creation waiting for the onReady callback. This could possibly have been due to the callback being queued behind other callbacks on blocked grpc threads. This was addressed in the DirectStreamObserver used for the fn api so we reuse it here. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build
[jira] [Work logged] (BEAM-9727) Auto populate required feature experiment flags for enable dataflow runner v2
[ https://issues.apache.org/jira/browse/BEAM-9727?focusedWorklogId=419777=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419777 ] ASF GitHub Bot logged work on BEAM-9727: Author: ASF GitHub Bot Created on: 09/Apr/20 21:12 Start Date: 09/Apr/20 21:12 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11355: [BEAM-9727] Automatically set required experiment flags for dataflow runner v2. URL: https://github.com/apache/beam/pull/11355#issuecomment-611754766 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419777) Time Spent: 2h 40m (was: 2.5h) > Auto populate required feature experiment flags for enable dataflow runner v2 > - > > Key: BEAM-9727 > URL: https://issues.apache.org/jira/browse/BEAM-9727 > Project: Beam > Issue Type: Task > Components: runner-dataflow >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9346) TFRecordIO inefficient read from sideinput causing pipeline to be slow
[ https://issues.apache.org/jira/browse/BEAM-9346?focusedWorklogId=419775=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419775 ] ASF GitHub Bot logged work on BEAM-9346: Author: ASF GitHub Bot Created on: 09/Apr/20 21:11 Start Date: 09/Apr/20 21:11 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11122: [BEAM-9346] Improve the efficiency of TFRecordIO URL: https://github.com/apache/beam/pull/11122#discussion_r406479773 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java ## @@ -410,13 +412,44 @@ private GatherResults(Coder resultCoder) { } else { // Pass results via a side input rather than reshuffle, because we need to get an empty // iterable to finalize if there are no results. -return input -.getPipeline() -.apply(Reify.viewInGlobalWindow(input.apply(View.asList()), ListCoder.of(resultCoder))); +return input.apply("ToList", Combine.globally(new ToListCombineFn<>())); Review comment: Created https://issues.apache.org/jira/browse/BEAM-9734 to make sure that this does not get into 2.21.0 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419775) Time Spent: 3h 50m (was: 3h 40m) > TFRecordIO inefficient read from sideinput causing pipeline to be slow > -- > > Key: BEAM-9346 > URL: https://issues.apache.org/jira/browse/BEAM-9346 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Ban Piao >Assignee: Piotr Szuberski >Priority: Major > Labels: dataflow, easyfix, performance > Fix For: Not applicable > > Time Spent: 3h 50m > Remaining Estimate: 0h > > In TFRecordIO, Reify.viewInGlobalWindow(input.apply(View.asList()), > ListCoder.of(resultCoder)) is an inefficient way of reading large set of side > input. > Pipeline can be sped up significantly by combinging the PCollection > to a single element PCollection>. > Sample code: > > https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java#L412 > from > ``` > return input > .getPipeline() > .apply(Reify.viewInGlobalWindow(input.apply(View.asList()), > ListCoder.of(resultCoder))); > ``` > to > ``` > return input.apply("ToList", Combine.globally(new ToListCombineFn<>())); > ``` > where ToListCombineFn is defined as > ``` > public static class ToListCombineFn extends CombineFn List, List> { > @Override > public List createAccumulator() { > return new ArrayList<>(); > } > @Override > public List addInput(List mutableAccumulator, ResultT > input) { > mutableAccumulator.add(input); > return mutableAccumulator; > } > @Override > public List mergeAccumulators(Iterable> > accumulators) { > Iterator> iter = accumulators.iterator(); > if (!iter.hasNext()) { > return new ArrayList<>(); > } > List merged = iter.next(); > while (iter.hasNext()) { > merged.addAll(iter.next()); > } > return merged; > } > @Override > public List extractOutput(List accumulator) { > return accumulator; > } > } > ``` -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9734) Revert https://github.com/apache/beam/pull/11122 which is a potential regression
[ https://issues.apache.org/jira/browse/BEAM-9734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chamikara Madhusanka Jayalath updated BEAM-9734: Priority: Blocker (was: Major) > Revert https://github.com/apache/beam/pull/11122 which is a potential > regression > > > Key: BEAM-9734 > URL: https://issues.apache.org/jira/browse/BEAM-9734 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Blocker > Fix For: 2.21.0 > > > This is potentially a regression for Dataflow. We should revert and > re-introduce as an optional change that can be controlled by a user option. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9734) Revert https://github.com/apache/beam/pull/11122 which is a potential regression
Chamikara Madhusanka Jayalath created BEAM-9734: --- Summary: Revert https://github.com/apache/beam/pull/11122 which is a potential regression Key: BEAM-9734 URL: https://issues.apache.org/jira/browse/BEAM-9734 Project: Beam Issue Type: Bug Components: sdk-java-core Reporter: Chamikara Madhusanka Jayalath Assignee: Chamikara Madhusanka Jayalath Fix For: 2.21.0 This is potentially a regression for Dataflow. We should revert and re-introduce as an optional change that can be controlled by a user option. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9346) TFRecordIO inefficient read from sideinput causing pipeline to be slow
[ https://issues.apache.org/jira/browse/BEAM-9346?focusedWorklogId=419768=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419768 ] ASF GitHub Bot logged work on BEAM-9346: Author: ASF GitHub Bot Created on: 09/Apr/20 21:08 Start Date: 09/Apr/20 21:08 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11122: [BEAM-9346] Improve the efficiency of TFRecordIO URL: https://github.com/apache/beam/pull/11122#discussion_r406478334 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java ## @@ -410,13 +412,44 @@ private GatherResults(Coder resultCoder) { } else { // Pass results via a side input rather than reshuffle, because we need to get an empty // iterable to finalize if there are no results. -return input -.getPipeline() -.apply(Reify.viewInGlobalWindow(input.apply(View.asList()), ListCoder.of(resultCoder))); +return input.apply("ToList", Combine.globally(new ToListCombineFn<>())); Review comment: What's the verdict here ? I suggest we revert this and introduce as an optional change since this is a potential regression for Dataflow runner. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419768) Time Spent: 3h 40m (was: 3.5h) > TFRecordIO inefficient read from sideinput causing pipeline to be slow > -- > > Key: BEAM-9346 > URL: https://issues.apache.org/jira/browse/BEAM-9346 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Ban Piao >Assignee: Piotr Szuberski >Priority: Major > Labels: dataflow, easyfix, performance > Fix For: Not applicable > > Time Spent: 3h 40m > Remaining Estimate: 0h > > In TFRecordIO, Reify.viewInGlobalWindow(input.apply(View.asList()), > ListCoder.of(resultCoder)) is an inefficient way of reading large set of side > input. > Pipeline can be sped up significantly by combinging the PCollection > to a single element PCollection>. > Sample code: > > https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java#L412 > from > ``` > return input > .getPipeline() > .apply(Reify.viewInGlobalWindow(input.apply(View.asList()), > ListCoder.of(resultCoder))); > ``` > to > ``` > return input.apply("ToList", Combine.globally(new ToListCombineFn<>())); > ``` > where ToListCombineFn is defined as > ``` > public static class ToListCombineFn extends CombineFn List, List> { > @Override > public List createAccumulator() { > return new ArrayList<>(); > } > @Override > public List addInput(List mutableAccumulator, ResultT > input) { > mutableAccumulator.add(input); > return mutableAccumulator; > } > @Override > public List mergeAccumulators(Iterable> > accumulators) { > Iterator> iter = accumulators.iterator(); > if (!iter.hasNext()) { > return new ArrayList<>(); > } > List merged = iter.next(); > while (iter.hasNext()) { > merged.addAll(iter.next()); > } > return merged; > } > @Override > public List extractOutput(List accumulator) { > return accumulator; > } > } > ``` -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9136) Add LICENSES and NOTICES to docker images
[ https://issues.apache.org/jira/browse/BEAM-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hannah Jiang updated BEAM-9136: --- Fix Version/s: (was: 2.22.0) 2.21.0 > Add LICENSES and NOTICES to docker images > - > > Key: BEAM-9136 > URL: https://issues.apache.org/jira/browse/BEAM-9136 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 19h 40m > Remaining Estimate: 0h > > Scan dependencies and add licenses and notices of the dependencies to SDK > docker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=419764=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419764 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 09/Apr/20 21:05 Start Date: 09/Apr/20 21:05 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#discussion_r406476510 ## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy ## @@ -454,6 +455,7 @@ class BeamModulePlugin implements Plugin { google_api_services_clouddebugger : "com.google.apis:google-api-services-clouddebugger:v2-rev20191003-$google_clients_version", google_api_services_cloudresourcemanager: "com.google.apis:google-api-services-cloudresourcemanager:v1-rev20191206-$google_clients_version", google_api_services_dataflow: "com.google.apis:google-api-services-dataflow:v1b3-rev20190927-$google_clients_version", +google_api_services_healthcare : "com.google.apis:google-api-services-healthcare:v1alpha2-rev20190901-$google_cloud_healthcare_version", Review comment: It doesn't have to be 1.30.3, it could be any later version. The purpose of them being the same is to ensure that GCP dependencies are compatible with each other so the healthcare dep doesn't break other GCP deps and vice versa. It sounds like your going to need to update google_clients_version to a later version so it supports existing GCP deps and also the new dep your adding. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419764) Time Spent: 18h 50m (was: 18h 40m) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 18h 50m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9468) Add Google Cloud Healthcare API IO Connectors
[ https://issues.apache.org/jira/browse/BEAM-9468?focusedWorklogId=419763=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419763 ] ASF GitHub Bot logged work on BEAM-9468: Author: ASF GitHub Bot Created on: 09/Apr/20 21:04 Start Date: 09/Apr/20 21:04 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11151: [BEAM-9468] Hl7v2 io URL: https://github.com/apache/beam/pull/11151#discussion_r406476510 ## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy ## @@ -454,6 +455,7 @@ class BeamModulePlugin implements Plugin { google_api_services_clouddebugger : "com.google.apis:google-api-services-clouddebugger:v2-rev20191003-$google_clients_version", google_api_services_cloudresourcemanager: "com.google.apis:google-api-services-cloudresourcemanager:v1-rev20191206-$google_clients_version", google_api_services_dataflow: "com.google.apis:google-api-services-dataflow:v1b3-rev20190927-$google_clients_version", +google_api_services_healthcare : "com.google.apis:google-api-services-healthcare:v1alpha2-rev20190901-$google_cloud_healthcare_version", Review comment: It doesn't have to be 1.30.3, it could be any later version. The purpose of them being the same is to ensure that GCP dependencies are compatible with each other so the healthcare dep doesn't break other GCP deps and vice versa. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419763) Time Spent: 18h 40m (was: 18.5h) > Add Google Cloud Healthcare API IO Connectors > - > > Key: BEAM-9468 > URL: https://issues.apache.org/jira/browse/BEAM-9468 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Jacob Ferriero >Assignee: Jacob Ferriero >Priority: Minor > Time Spent: 18h 40m > Remaining Estimate: 0h > > Add IO Transforms for the HL7v2, FHIR and DICOM stores in the [Google Cloud > Healthcare API|https://cloud.google.com/healthcare/docs/] > HL7v2IO > FHIRIO > DICOM -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419762=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419762 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 09/Apr/20 21:01 Start Date: 09/Apr/20 21:01 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11314: [BEAM-9562] Send Timers over Data Channel as Elements URL: https://github.com/apache/beam/pull/11314#discussion_r406468331 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py ## @@ -536,7 +525,8 @@ def _run_stage(self, runner_execution_context, bundle_context_manager, data_input, -data_output, +data_output, {}, Review comment: Put {} on its own line. (Surprised yapf didn't complain, or maybe you haven't run it yet.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419762) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-java-harness, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 19.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419758=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419758 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 09/Apr/20 21:01 Start Date: 09/Apr/20 21:01 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11314: [BEAM-9562] Send Timers over Data Channel as Elements URL: https://github.com/apache/beam/pull/11314#discussion_r406444781 ## File path: sdks/python/apache_beam/transforms/core.py ## @@ -1323,12 +1325,47 @@ def _pardo_fn_data(self): windowing = None return self.fn, self.args, self.kwargs, si_tags_and_types, windowing - def to_runner_api_parameter(self, context): + def _get_key_and_window_coder(self, named_inputs): +if named_inputs is None or not self._signature.is_stateful_dofn(): + return None, None +main_input = list(set(named_inputs.keys()) - set(self.side_inputs))[0] +input_pcoll = named_inputs[main_input] +kv_type_hint = input_pcoll.element_type +if kv_type_hint and kv_type_hint != typehints.Any: + coder = coders.registry.get_coder(kv_type_hint) + if not coder.is_kv_coder(): +raise ValueError( +'Input elements to the transform %s with stateful DoFn must be ' +'key-value pairs.' % self) + key_coder = coder.key_coder() +else: + key_coder = coders.registry.get_coder(typehints.Any) +window_coder = input_pcoll.windowing.windowfn.get_window_coder() +return key_coder, window_coder + + def to_runner_api(self, context, **extra_kwargs): Review comment: This code looks like it's copied from the superclass, instead just do ``` def to_runner_api(self, context, named_inputs, **extra_kwargs): super(ParDo, self).to_runner_api, named_inputs=named_inputs, **extra_kwargs) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419758) Time Spent: 19h 20m (was: 19h 10m) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-java-harness, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 19h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419759=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419759 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 09/Apr/20 21:01 Start Date: 09/Apr/20 21:01 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11314: [BEAM-9562] Send Timers over Data Channel as Elements URL: https://github.com/apache/beam/pull/11314#discussion_r406473230 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py ## @@ -914,6 +926,17 @@ def process_bundle(self, split_manager = self._select_split_manager() if not split_manager: + # Send the fired timers if any. + for (transform_id, timer_family_id), timers in fired_timers.items(): +self._send_timers_to_worker( +process_bundle_id, transform_id, timer_family_id, timers) + + for transform_id, timer_family_id in ( + set(expected_output_timers.keys()) - set(fired_timers.keys())): +# Close the stream if there is no timers to be sent. Review comment: This is a subtle point. I might write something like "The worker waits for a logical timer stream to be closed for every possible timer, regardless of whether there are any timers to be sent." Maybe it'd be clearer to iterate over `expected_output_timers`, and send `fired_timers.get((transform_id, timer_family_id), [])`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419759) Time Spent: 19.5h (was: 19h 20m) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-java-harness, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 19.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419760=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419760 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 09/Apr/20 21:01 Start Date: 09/Apr/20 21:01 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11314: [BEAM-9562] Send Timers over Data Channel as Elements URL: https://github.com/apache/beam/pull/11314#discussion_r406474463 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/execution.py ## @@ -355,20 +364,41 @@ def _build_process_bundle_descriptor(self): items()), environments=dict( self.execution_context.pipeline_components.environments.items()), -state_api_service_descriptor=self.state_api_service_descriptor()) +state_api_service_descriptor=self.state_api_service_descriptor(), +timer_api_service_descriptor=self.data_api_service_descriptor()) def get_input_coder_impl(self, transform_id): # type: (str) -> CoderImpl coder_id = beam_fn_api_pb2.RemoteGrpcPort.FromString( self.process_bundle_descriptor.transforms[transform_id].spec.payload ).coder_id assert coder_id +return self.get_coder_impl(coder_id) + + def _build_timer_coders_id_map(self): +timer_coder_ids = {} +for transform_id, transform_proto in (self._process_bundle_descriptor +.transforms.items()): + if transform_proto.spec.urn == common_urns.primitives.PAR_DO.urn: +pardo_payload = proto_utils.parse_Bytes( +transform_proto.spec.payload, beam_runner_api_pb2.ParDoPayload) +for id, timer_family_spec in pardo_payload.timer_family_specs.items(): + timer_coder_ids[(transform_id, id)] = ( + timer_family_spec.timer_family_coder_id) +return timer_coder_ids + + def get_coder_impl(self, coder_id): if coder_id in self.execution_context.safe_coders: return self.execution_context.pipeline_context.coders[ self.execution_context.safe_coders[coder_id]].get_impl() else: return self.execution_context.pipeline_context.coders[coder_id].get_impl() + def get_timer_coder_impl(self, transform_id, timer_family_id): +assert (transform_id, timer_family_id) in self._timer_coder_ids Review comment: The key error if it's not present below will be sufficient. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419760) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-java-harness, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 19.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419756=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419756 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 09/Apr/20 21:01 Start Date: 09/Apr/20 21:01 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11314: [BEAM-9562] Send Timers over Data Channel as Elements URL: https://github.com/apache/beam/pull/11314#discussion_r406466215 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/translations.py ## @@ -1321,80 +1321,18 @@ def remove_data_plane_ops(stages, pipeline_context): yield stage -def inject_timer_pcollections(stages, pipeline_context): +def setup_timer_mapping(stages, pipeline_context): # type: (Iterable[Stage], TransformContext) -> Iterator[Stage] - """Create PCollections for fired timers and to-be-set timers. - - At execution time, fired timers and timers-to-set are represented as - PCollections that are managed by the runner. This phase adds the - necissary collections, with their read and writes, to any stages using - timers. + """Set up a mapping of {transform_id: [timer_ids]} for each stage. """ for stage in stages: -for transform in list(stage.transforms): +for transform in stage.transforms: if transform.spec.urn in PAR_DO_URNS: payload = proto_utils.parse_Bytes( transform.spec.payload, beam_runner_api_pb2.ParDoPayload) -for tag, spec in payload.timer_family_specs.items(): - if len(transform.inputs) > 1: -raise NotImplementedError('Timers and side inputs.') - input_pcoll = pipeline_context.components.pcollections[next( - iter(transform.inputs.values()))] - # Create the appropriate coder for the timer PCollection. - key_coder_id = input_pcoll.coder_id - if (pipeline_context.components.coders[key_coder_id].spec.urn == - common_urns.coders.KV.urn): -key_coder_id = pipeline_context.components.coders[ -key_coder_id].component_coder_ids[0] - key_timer_coder_id = pipeline_context.add_or_get_coder_id( - beam_runner_api_pb2.Coder( - spec=beam_runner_api_pb2.FunctionSpec( - urn=common_urns.coders.KV.urn), - component_coder_ids=[ - key_coder_id, spec.timer_family_coder_id - ])) - # Inject the read and write pcollections. - timer_read_pcoll = unique_name( - pipeline_context.components.pcollections, - '%s_timers_to_read_%s' % (transform.unique_name, tag)) - timer_write_pcoll = unique_name( - pipeline_context.components.pcollections, - '%s_timers_to_write_%s' % (transform.unique_name, tag)) - pipeline_context.components.pcollections[timer_read_pcoll].CopyFrom( - beam_runner_api_pb2.PCollection( - unique_name=timer_read_pcoll, - coder_id=key_timer_coder_id, - windowing_strategy_id=input_pcoll.windowing_strategy_id, - is_bounded=input_pcoll.is_bounded)) - pipeline_context.components.pcollections[timer_write_pcoll].CopyFrom( - beam_runner_api_pb2.PCollection( - unique_name=timer_write_pcoll, - coder_id=key_timer_coder_id, - windowing_strategy_id=input_pcoll.windowing_strategy_id, - is_bounded=input_pcoll.is_bounded)) - stage.transforms.append( - beam_runner_api_pb2.PTransform( - unique_name=timer_read_pcoll + '/Read', - outputs={'out': timer_read_pcoll}, - spec=beam_runner_api_pb2.FunctionSpec( - urn=bundle_processor.DATA_INPUT_URN, - payload=create_buffer_id(timer_read_pcoll, - kind='timers' - stage.transforms.append( - beam_runner_api_pb2.PTransform( - unique_name=timer_write_pcoll + '/Write', - inputs={'in': timer_write_pcoll}, - spec=beam_runner_api_pb2.FunctionSpec( - urn=bundle_processor.DATA_OUTPUT_URN, - payload=create_buffer_id( - timer_write_pcoll, kind='timers' - assert tag not in transform.inputs - transform.inputs[tag] = timer_read_pcoll - assert tag not in transform.outputs - transform.outputs[tag] = timer_write_pcoll - stage.timer_pcollections.append( - (timer_read_pcoll + '/Read', timer_write_pcoll)) +for timer_family_id in payload.timer_family_specs.keys(): +
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419755=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419755 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 09/Apr/20 21:01 Start Date: 09/Apr/20 21:01 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11314: [BEAM-9562] Send Timers over Data Channel as Elements URL: https://github.com/apache/beam/pull/11314#discussion_r406465514 ## File path: sdks/python/apache_beam/runners/worker/bundle_processor.py ## @@ -837,25 +869,59 @@ def process_bundle(self, instruction_id): op.execution_context = execution_context op.start() - # Inject inputs from data plane. + # Each data_channel is mapped to a list of expected inputs which includes + # both data input and timer input. The data input is identied by + # transform_id. The data input is identified by + # (transform_id, timer_family_id). data_channels = collections.defaultdict( list ) # type: DefaultDict[data_plane.GrpcClientDataChannel, List[str]] + + # Inject data inputs from data plane. Review comment: This comment is a bit misleading, as the injection doesn't happen in this for loop. (Similarly with timers.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419755) Time Spent: 19h 10m (was: 19h) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-java-harness, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 19h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419757=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419757 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 09/Apr/20 21:01 Start Date: 09/Apr/20 21:01 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11314: [BEAM-9562] Send Timers over Data Channel as Elements URL: https://github.com/apache/beam/pull/11314#discussion_r406467481 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py ## @@ -987,8 +1019,10 @@ def __init__( def process_bundle(self, inputs, # type: Mapping[str, PartitionableBuffer] - expected_outputs # type: DataOutput -): + expected_outputs, # type: DataOutput + fired_timers, # type: Mapping[str, Mapping[str, PartitionableBuffer]] Review comment: For consistency, should this be a `Mapping[Tuple[str, str], PartitionableBuffer]`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419757) Time Spent: 19h 20m (was: 19h 10m) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-java-harness, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 19h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419761=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419761 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 09/Apr/20 21:01 Start Date: 09/Apr/20 21:01 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11314: [BEAM-9562] Send Timers over Data Channel as Elements URL: https://github.com/apache/beam/pull/11314#discussion_r406471091 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner/fn_runner.py ## @@ -896,7 +906,9 @@ def _generate_splits_for_testing(self, def process_bundle(self, inputs, # type: Mapping[str, PartitionableBuffer] - expected_outputs # type: DataOutput + expected_outputs, # type: DataOutput + fired_timers, # type: Mapping[str, Mapping[str, PartitionableBuffer]] Review comment: Mapping[Tuple[str, str], PartitionableBuffer]? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419761) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-java-harness, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 19.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9562) Remove timer from PCollection and treat timers as Elements
[ https://issues.apache.org/jira/browse/BEAM-9562?focusedWorklogId=419754=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419754 ] ASF GitHub Bot logged work on BEAM-9562: Author: ASF GitHub Bot Created on: 09/Apr/20 21:01 Start Date: 09/Apr/20 21:01 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11314: [BEAM-9562] Send Timers over Data Channel as Elements URL: https://github.com/apache/beam/pull/11314#discussion_r406444656 ## File path: sdks/python/apache_beam/transforms/core.py ## @@ -1272,6 +1272,8 @@ def expand(self, pcoll): key_coder = coder.key_coder() else: key_coder = coders.registry.get_coder(typehints.Any) + self.window_coder = pcoll.windowing.windowfn.get_window_coder() Review comment: Are these still used? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419754) Time Spent: 19h (was: 18h 50m) > Remove timer from PCollection and treat timers as Elements > --- > > Key: BEAM-9562 > URL: https://issues.apache.org/jira/browse/BEAM-9562 > Project: Beam > Issue Type: New Feature > Components: sdk-java-harness, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.21.0 > > Time Spent: 19h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1589) Add OnWindowExpiration method to Stateful DoFn
[ https://issues.apache.org/jira/browse/BEAM-1589?focusedWorklogId=419753=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-419753 ] ASF GitHub Bot logged work on BEAM-1589: Author: ASF GitHub Bot Created on: 09/Apr/20 20:58 Start Date: 09/Apr/20 20:58 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #11350: [BEAM-1589] Added @onWindowExpiration annotation. URL: https://github.com/apache/beam/pull/11350#issuecomment-611749077 Any discussion about adding support for @OnWindowExpiration for portable runners? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 419753) Time Spent: 2.5h (was: 2h 20m) > Add OnWindowExpiration method to Stateful DoFn > -- > > Key: BEAM-1589 > URL: https://issues.apache.org/jira/browse/BEAM-1589 > Project: Beam > Issue Type: New Feature > Components: runner-core, sdk-java-core >Reporter: Jingsong Lee >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > See BEAM-1517 > This allows the user to do some work before the state's garbage collection. > It seems kind of annoying, but on the other hand forgetting to set a final > timer to flush state is probably data loss most of the time. > FlinkRunner does this work very simply, but other runners, such as > DirectRunner, need to traverse all the states to do this, and maybe it's a > little hard. -- This message was sent by Atlassian Jira (v8.3.4#803005)