[jira] [Work logged] (BEAM-7948) Add time-based cache threshold support in the Java data service
[ https://issues.apache.org/jira/browse/BEAM-7948?focusedWorklogId=347238&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347238 ] ASF GitHub Bot logged work on BEAM-7948: Author: ASF GitHub Bot Created on: 21/Nov/19 07:31 Start Date: 21/Nov/19 07:31 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on issue #9949: [BEAM-7948] Add time-based cache threshold support in the Java data s… URL: https://github.com/apache/beam/pull/9949#issuecomment-556957491 I have updated the PR accordingly, except for one comment that I am not quite sure about; I left a reply to you there. :) I would appreciate it if you could take another look :) @lukecwik This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347238) Time Spent: 2h 50m (was: 2h 40m) > Add time-based cache threshold support in the Java data service > --- > > Key: BEAM-7948 > URL: https://issues.apache.org/jira/browse/BEAM-7948 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > Currently only size-based cache threshold is supported in data service. It > should also support the time-based cache threshold. This is very important, > especially for streaming jobs which are sensitive to the delay. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7948) Add time-based cache threshold support in the Java data service
[ https://issues.apache.org/jira/browse/BEAM-7948?focusedWorklogId=347237&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347237 ] ASF GitHub Bot logged work on BEAM-7948: Author: ASF GitHub Bot Created on: 21/Nov/19 07:27 Start Date: 21/Nov/19 07:27 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on pull request #9949: [BEAM-7948] Add time-based cache threshold support in the Java data s… URL: https://github.com/apache/beam/pull/9949#discussion_r348928070 ## File path: sdks/java/fn-execution/src/main/java/org/apache/beam/sdk/fn/data/BeamFnDataTimeBasedBufferingOutboundObserver.java ## @@ -48,25 +46,27 @@ Coder coder, StreamObserver outboundObserver) { super(sizeLimit, outputLocation, coder, outboundObserver); -this.lock = new Object(); +this.flushLock = new Object(); this.flushFuture = Executors.newSingleThreadScheduledExecutor( new ThreadFactoryBuilder() .setDaemon(true) .setNameFormat("DataBufferOutboundFlusher-thread") .build()) .scheduleAtFixedRate(this::periodicFlush, timeLimit, timeLimit, TimeUnit.MILLISECONDS); Review comment: I found that the callable version of schedule is not a periodic action. So we have to create a new method which wraps the `flush`. What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347237) Time Spent: 2h 40m (was: 2.5h) > Add time-based cache threshold support in the Java data service > --- > > Key: BEAM-7948 > URL: https://issues.apache.org/jira/browse/BEAM-7948 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > Currently only size-based cache threshold is supported in data service. 
It > should also support the time-based cache threshold. This is very important, > especially for streaming jobs which are sensitive to the delay. -- This message was sent by Atlassian Jira (v8.3.4#803005)
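The review comment above turns on a detail of `java.util.concurrent.ScheduledExecutorService`: the `Callable` overload of `schedule` runs once, while `scheduleAtFixedRate` repeats but accepts only a `Runnable`. A rough sketch of the resulting wrapper pattern follows. `TimedBuffer`, its `String` elements and the `sentSnapshot` helper are hypothetical stand-ins, not the actual Beam observer; only the executor calls are real JDK API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a buffering outbound observer; not Beam's class. */
class TimedBuffer {
  private final Object flushLock = new Object();
  private final List<String> buffer = new ArrayList<>();
  private final List<String> sent = new ArrayList<>();
  private final ScheduledExecutorService executor;
  private final ScheduledFuture<?> flushFuture;

  TimedBuffer(long timeLimitMillis) {
    executor = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "DataBufferOutboundFlusher-thread");
      t.setDaemon(true); // do not keep the JVM alive just for flushing
      return t;
    });
    // scheduleAtFixedRate takes a Runnable, so flush() (which declares a
    // checked exception) is reached through the periodicFlush() wrapper.
    flushFuture = executor.scheduleAtFixedRate(
        this::periodicFlush, timeLimitMillis, timeLimitMillis, TimeUnit.MILLISECONDS);
  }

  void accept(String element) {
    synchronized (flushLock) {
      buffer.add(element);
    }
  }

  /** Moves buffered elements to the (simulated) outbound side. */
  void flush() throws IOException {
    synchronized (flushLock) {
      sent.addAll(buffer);
      buffer.clear();
    }
  }

  /** Runnable-compatible wrapper; a thrown exception cancels later runs. */
  private void periodicFlush() {
    try {
      flush();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  List<String> sentSnapshot() {
    synchronized (flushLock) {
      return new ArrayList<>(sent);
    }
  }

  /** Stops the timer and flushes whatever is still buffered. */
  void close() {
    flushFuture.cancel(false);
    executor.shutdown();
    periodicFlush();
  }
}
```

One design point worth checking in any real implementation: when the task passed to `scheduleAtFixedRate` throws, the executor suppresses all subsequent executions, so the wrapper has to decide whether to rethrow or to record the failure and keep the timer running.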
[jira] [Work logged] (BEAM-8619) Tear down the DoFns upon the control service termination in Java SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8619?focusedWorklogId=347221&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347221 ] ASF GitHub Bot logged work on BEAM-8619: Author: ASF GitHub Bot Created on: 21/Nov/19 06:54 Start Date: 21/Nov/19 06:54 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on issue #10126: [BEAM-8619] Tear down the DoFns upon the control service termination … URL: https://github.com/apache/beam/pull/10126#issuecomment-556947219 Thanks for the review and valuable comments. @lukecwik I divided the change into 4 commits; does that make sense to you? Feel free to tell me if you want me to split the changes into new PRs. :) Best, Jincheng This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347221) Time Spent: 1h (was: 50m) > Tear down the DoFns upon the control service termination in Java SDK harness > > > Key: BEAM-8619 > URL: https://issues.apache.org/jira/browse/BEAM-8619 > Project: Beam > Issue Type: Improvement > Components: sdk-java-harness >Affects Versions: 2.18.0 >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Per the discussion in the ML, the detail can be found [1], the teardown of > DoFns should be supported in the portability framework. It happens at two > places: > 1) Upon the control service termination > 2) Tear down the unused DoFns periodically > The aim of this JIRA is to add support for teardown the DoFns upon the > control service termination in Java SDK harness. > [1] > https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8619) Tear down the DoFns upon the control service termination in Java SDK harness
[ https://issues.apache.org/jira/browse/BEAM-8619?focusedWorklogId=347210&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347210 ] ASF GitHub Bot logged work on BEAM-8619: Author: ASF GitHub Bot Created on: 21/Nov/19 06:39 Start Date: 21/Nov/19 06:39 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on pull request #10126: [BEAM-8619] Tear down the DoFns upon the control service termination … URL: https://github.com/apache/beam/pull/10126#discussion_r348915951 ## File path: sdks/java/harness/src/test/java/org/apache/beam/fn/harness/FnApiDoFnRunnerTest.java ## @@ -226,6 +226,7 @@ public void testUsingUserState() throws Exception { consumers, startFunctionRegistry, finishFunctionRegistry, +new ArrayList<>()::add, Review comment: Sorry, I don't think I fully understand what you mean. Can you explain it more? :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347210) Time Spent: 50m (was: 40m) > Tear down the DoFns upon the control service termination in Java SDK harness > > > Key: BEAM-8619 > URL: https://issues.apache.org/jira/browse/BEAM-8619 > Project: Beam > Issue Type: Improvement > Components: sdk-java-harness >Affects Versions: 2.18.0 >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Per the discussion in the ML, the detail can be found [1], the teardown of > DoFns should be supported in the portability framework. It happens at two > places: > 1) Upon the control service termination > 2) Tear down the unused DoFns periodically > The aim of this JIRA is to add support for teardown the DoFns upon the > control service termination in Java SDK harness. 
> [1] > https://lists.apache.org/thread.html/0c4a4cf83cf2e35c3dfeb9d906e26cd82d3820968ba6f862f91739e4@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
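The test diff above passes `new ArrayList<>()::add` as an extra argument, which suggests (this is an assumption, the thread does not spell it out) that the runner factory receives a consumer used to register teardown callbacks. A minimal sketch of such a registry follows; `TeardownRegistry`, `ThrowingRunnable`, `registrar` and `tearDownAll` are hypothetical names, not the harness API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical harness fragment; the names here are illustrative only. */
class TeardownRegistry {
  /** A callback that may throw, as user teardown code often can. */
  interface ThrowingRunnable {
    void run() throws Exception;
  }

  private final List<ThrowingRunnable> tearDownFunctions = new ArrayList<>();

  /** Consumer handed to whatever creates the DoFn runners. */
  Consumer<ThrowingRunnable> registrar() {
    // List.add returns boolean; as a Consumer the result is simply dropped,
    // the same way the method reference in the test diff is used.
    return tearDownFunctions::add;
  }

  /**
   * Runs every registered teardown once, continuing past failures so that one
   * broken DoFn does not block the others. Returns the number of failures.
   */
  int tearDownAll() {
    int failures = 0;
    for (ThrowingRunnable fn : tearDownFunctions) {
      try {
        fn.run();
      } catch (Exception e) {
        failures++;
      }
    }
    tearDownFunctions.clear();
    return failures;
  }
}
```

In this sketch the control-service shutdown path would call `tearDownAll()` exactly once; a test that does not care about teardown can pass a throwaway list's `add` method instead.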
[jira] [Work logged] (BEAM-7854) Reading files from local file system does not fully support glob
[ https://issues.apache.org/jira/browse/BEAM-7854?focusedWorklogId=347165&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347165 ] ASF GitHub Bot logged work on BEAM-7854: Author: ASF GitHub Bot Created on: 21/Nov/19 05:03 Start Date: 21/Nov/19 05:03 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #9197: [BEAM-7854] Resolve parent folder recursively in LocalFileSystem matc… URL: https://github.com/apache/beam/pull/9197#issuecomment-556921396 @lukecwik this one probably should have been squashed (just happened to come across these commits in the history debugging #10028) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347165) Time Spent: 4h 50m (was: 4h 40m) > Reading files from local file system does not fully support glob > > > Key: BEAM-7854 > URL: https://issues.apache.org/jira/browse/BEAM-7854 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Tomer Zeltzer >Assignee: Tomer Zeltzer >Priority: Major > Fix For: 2.16.0 > > Time Spent: 4h 50m > Remaining Estimate: 0h > > Folder structure: > {code:java} > A > B > a=100 > data1 > file1.zst > file2.zst > a=999 > data2 > file6.zst > a=397 > data3 > file7.zst{code} > > Glob: > > {code:java} > /A/B/a=[0-9][0-9][0-9]/*/*{code} > Code: > > {code:java} > input.apply(Create.of(patterns)) > .apply("Matching patterns", FileIO.matchAll()) > .apply(FileIO.readMatches()); > {code} > > input is of type PBegin. > The above code matches 0 files even though, from the glob, it's clear it > should match all files. I suspect it's because of line 227, where only the > first parent folder is checked while it could be an asterisk in a glob. 
I > believe the right behaviour should be to check all parent folders and use the > first one that exists. -- This message was sent by Atlassian Jira (v8.3.4#803005)
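The behaviour proposed in the issue (walk up the pattern's parents and start matching from the first one that exists) can be sketched with plain `java.nio.file`. This is an illustration, not the code in Beam's `LocalFileSystem`: the `LocalGlob` class and its methods are hypothetical, the directory tree in `demo()` is a guess at the nesting of the flattened listing in the report, and the sketch assumes a POSIX file system where `*` and `[` are legal in a `Path`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper, not Beam's LocalFileSystem. */
class LocalGlob {
  /** Walks up from the pattern until a directory that actually exists is found. */
  static Path firstExistingParent(Path pattern) {
    Path candidate = pattern.getParent();
    while (candidate != null && !Files.isDirectory(candidate)) {
      candidate = candidate.getParent();
    }
    return candidate;
  }

  /** Returns the regular files under the first existing parent that match the glob. */
  static List<Path> match(String absoluteGlob) {
    Path root = firstExistingParent(Paths.get(absoluteGlob));
    if (root == null) {
      return Collections.emptyList();
    }
    PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + absoluteGlob);
    try (Stream<Path> paths = Files.walk(root)) {
      return paths.filter(Files::isRegularFile)
          .filter(matcher::matches)
          .sorted()
          .collect(Collectors.toList());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Builds a tree like the one in the report under a temp dir and matches it. */
  static List<String> demo() {
    String[] files = {
      "A/B/a=100/data1/file1.zst", "A/B/a=100/data1/file2.zst",
      "A/B/a=999/data2/file6.zst", "A/B/a=397/data3/file7.zst"
    };
    try {
      Path tmp = Files.createTempDirectory("globdemo");
      for (String f : files) {
        Path p = tmp.resolve(f);
        Files.createDirectories(p.getParent());
        Files.createFile(p);
      }
      return match(tmp + "/A/B/a=[0-9][0-9][0-9]/*/*").stream()
          .map(p -> p.getFileName().toString())
          .sorted()
          .collect(Collectors.toList());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With this layout `LocalGlob.demo()` should return all four file names, because the walk starts at `A/B` (the first parent without wildcards that exists) rather than giving up when the immediate parent `a=[0-9][0-9][0-9]/*` is not a real directory.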
[jira] [Work logged] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()
[ https://issues.apache.org/jira/browse/BEAM-8651?focusedWorklogId=347162&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347162 ] ASF GitHub Bot logged work on BEAM-8651: Author: ASF GitHub Bot Created on: 21/Nov/19 04:58 Start Date: 21/Nov/19 04:58 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10185: [BEAM-8651] Cherrypick PR #10167 to the release branch. URL: https://github.com/apache/beam/pull/10185#issuecomment-556920352 R: @Ardagan cc: @robertwb This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347162) Time Spent: 2.5h (was: 2h 20m) > Python 3 portable pipelines sometimes fail with errors in > StockUnpickler.find_class() > - > > Key: BEAM-8651 > URL: https://issues.apache.org/jira/browse/BEAM-8651 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Blocker > Fix For: 2.17.0 > > Attachments: beam8651.py > > Time Spent: 2.5h > Remaining Estimate: 0h > > Several Beam users [1,2] reported an error which happens on Python 3 in > StockUnpickler.find_class. > So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on Flink > and Dataflow runners. On Dataflow runner so far I have seen this in streaming > pipelines only, which use portable SDK worker. 
> Typical stack trace: > {noformat} > File > "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 1148, in _create_pardo_operation > dofn_data = pickler.loads(serialized_fn) > > File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, > in loads > return dill.loads(s) > > File "python3.5/site-packages/dill/_dill.py", line 317, in loads > > return load(file, ignore) > > File "python3.5/site-packages/dill/_dill.py", line 305, in load > > obj = pik.load() > > File "python3.5/site-packages/dill/_dill.py", line 474, in find_class > > return StockUnpickler.find_class(self, module, name) > > AttributeError: Can't get attribute 'ClassName' on 'python3.5/site-packages/filename.py'> > {noformat} > According to Guenther from [1]: > {quote} > This looks exactly like a race condition that we've encountered on Python > 3.7.1: There's a bug in some older 3.7.x releases that breaks the > thread-safety of the unpickler, as concurrent unpickle threads can access a > module before it has been fully imported. See > https://bugs.python.org/issue34572 for more information. > The traceback shows a Python 3.6 venv so this could be a different issue > (the unpickle bug was introduced in version 3.7). If it's the same bug then > upgrading to Python 3.7.3 or higher should fix that issue. One potential > workaround is to ensure that all of the modules get imported during the > initialization of the sdk_worker, as this bug only affects imports done by > the unpickler. > {quote} > Opening this for visibility. Current open questions are: > 1. Find a minimal example to reproduce this issue. > 2. Figure out whether users are still affected by this issue on Python 3.7.3. > 3. Communicate workarounds for 3.5, 3.6 users affected by this. > [1] > https://lists.apache.org/thread.html/5581ddfcf6d2ae10d25b834b8a61ebee265ffbcf650c6ec8d1e69408@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()
[ https://issues.apache.org/jira/browse/BEAM-8651?focusedWorklogId=347161&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347161 ] ASF GitHub Bot logged work on BEAM-8651: Author: ASF GitHub Bot Created on: 21/Nov/19 04:57 Start Date: 21/Nov/19 04:57 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #10185: [BEAM-8651] Cherrypick PR #10167 to the release branch. URL: https://github.com/apache/beam/pull/10185 This is a cherrypick of #10167 to 2.17.0 release branch.
[jira] [Work logged] (BEAM-8568) Local file system does not match relative path with wildcards
[ https://issues.apache.org/jira/browse/BEAM-8568?focusedWorklogId=347160&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347160 ] ASF GitHub Bot logged work on BEAM-8568: Author: ASF GitHub Bot Created on: 21/Nov/19 04:57 Start Date: 21/Nov/19 04:57 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #10028: [BEAM-8568] Fixed problem that LocalFileSystem no longer supports wil… URL: https://github.com/apache/beam/pull/10028#issuecomment-556917287 Tests pass on the release branch. https://gradle.com/s/5jl76y2tkiwmc So something about this change is causing the error deterministically, as you say. Since it is healthy on `master`, perhaps there are other coupled commits that need to be cherrypicked. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347160) Time Spent: 4.5h (was: 4h 20m) > Local file system does not match relative path with wildcards > - > > Key: BEAM-8568 > URL: https://issues.apache.org/jira/browse/BEAM-8568 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.16.0 >Reporter: Ondrej Cerny >Assignee: David Moravek >Priority: Major > Fix For: 2.17.0 > > Time Spent: 4.5h > Remaining Estimate: 0h > > CWD structure: > {code} > src/test/resources/input/sometestfile.txt > {code} > > Code: > {code:java} > input > .apply(Create.of("src/test/resources/input/*")) > .apply(FileIO.matchAll()) > .apply(FileIO.readMatches()) > {code} > The code above doesn't match any file starting with Beam 2.16.0. The regression > has been introduced in BEAM-7854. -- This message was sent by Atlassian Jira (v8.3.4#803005)
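The report does not say exactly where the relative pattern stops matching, so the following is only one plausible illustration of how such a mismatch can arise with `java.nio.file` globs: a matcher built from a relative glob is compared against the full path string, so it matches relative candidates but not their absolute forms. The `RelativeGlob` class and both method names are hypothetical, and the sketch assumes a POSIX file system and a working directory without glob metacharacters.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

/** Hypothetical helper showing relative versus absolute glob matching. */
class RelativeGlob {
  /** Matches the candidate against the glob exactly as the user wrote it. */
  static boolean matchesAsWritten(String glob, Path candidate) {
    PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
    return matcher.matches(candidate);
  }

  /** Resolves both glob and candidate against the working directory first. */
  static boolean matchesNormalized(String glob, Path candidate) {
    // Assumes the working directory contains no glob metacharacters.
    String absoluteGlob = Paths.get(glob).toAbsolutePath().toString();
    PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + absoluteGlob);
    return matcher.matches(candidate.toAbsolutePath());
  }
}
```

If a directory walk yields absolute paths while the matcher was compiled from the relative pattern, nothing matches; normalizing both sides to the same form avoids that.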
[jira] [Work logged] (BEAM-8568) Local file system does not match relative path with wildcards
[ https://issues.apache.org/jira/browse/BEAM-8568?focusedWorklogId=347156&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347156 ] ASF GitHub Bot logged work on BEAM-8568: Author: ASF GitHub Bot Created on: 21/Nov/19 04:42 Start Date: 21/Nov/19 04:42 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #10028: [BEAM-8568] Fixed problem that LocalFileSystem no longer supports wil… URL: https://github.com/apache/beam/pull/10028#issuecomment-556917287 Tests pass on the release branch... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347156) Time Spent: 4h 20m (was: 4h 10m) > Local file system does not match relative path with wildcards > - > > Key: BEAM-8568 > URL: https://issues.apache.org/jira/browse/BEAM-8568 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.16.0 >Reporter: Ondrej Cerny >Assignee: David Moravek >Priority: Major > Fix For: 2.17.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > CWD structure: > {code} > src/test/resources/input/sometestfile.txt > {code} > > Code: > {code:java} > input > .apply(Create.of("src/test/resources/input/*")) > .apply(FileIO.matchAll()) > .apply(FileIO.readMatches()) > {code} > The code above doesn't match any file starting with Beam 2.16.0. The regression > has been introduced in BEAM-7854. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8568) Local file system does not match relative path with wildcards
[ https://issues.apache.org/jira/browse/BEAM-8568?focusedWorklogId=347148&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347148 ] ASF GitHub Bot logged work on BEAM-8568: Author: ASF GitHub Bot Created on: 21/Nov/19 03:54 Start Date: 21/Nov/19 03:54 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #10028: [BEAM-8568] Fixed problem that LocalFileSystem no longer supports wil… URL: https://github.com/apache/beam/pull/10028#issuecomment-556908586 I could not identify a stuck job, even though the tests were stuck. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347148) Time Spent: 4h 10m (was: 4h) > Local file system does not match relative path with wildcards > - > > Key: BEAM-8568 > URL: https://issues.apache.org/jira/browse/BEAM-8568 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.16.0 >Reporter: Ondrej Cerny >Assignee: David Moravek >Priority: Major > Fix For: 2.17.0 > > Time Spent: 4h 10m > Remaining Estimate: 0h > > CWD structure: > {code} > src/test/resources/input/sometestfile.txt > {code} > > Code: > {code:java} > input > .apply(Create.of("src/test/resources/input/*")) > .apply(FileIO.matchAll()) > .apply(FileIO.readMatches()) > {code} > The code above doesn't match any file starting with Beam 2.16.0. The regression > has been introduced in BEAM-7854. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8568) Local file system does not match relative path with wildcards
[ https://issues.apache.org/jira/browse/BEAM-8568?focusedWorklogId=347143&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347143 ] ASF GitHub Bot logged work on BEAM-8568: Author: ASF GitHub Bot Created on: 21/Nov/19 03:21 Start Date: 21/Nov/19 03:21 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #10028: [BEAM-8568] Fixed problem that LocalFileSystem no longer supports wil… URL: https://github.com/apache/beam/pull/10028#issuecomment-556902409 Hmm, confirmed that it times out. Running locally for me it is up to 50+ minutes. Presumably just stuck. I'll check. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347143) Time Spent: 4h (was: 3h 50m) > Local file system does not match relative path with wildcards > - > > Key: BEAM-8568 > URL: https://issues.apache.org/jira/browse/BEAM-8568 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.16.0 >Reporter: Ondrej Cerny >Assignee: David Moravek >Priority: Major > Fix For: 2.17.0 > > Time Spent: 4h > Remaining Estimate: 0h > > CWD structure: > {code} > src/test/resources/input/sometestfile.txt > {code} > > Code: > {code:java} > input > .apply(Create.of("src/test/resources/input/*")) > .apply(FileIO.matchAll()) > .apply(FileIO.readMatches()) > {code} > The code above doesn't match any file starting with Beam 2.16.0. The regression > has been introduced in BEAM-7854. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8624) Implement FnService for status api in Dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-8624?focusedWorklogId=347142&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347142 ] ASF GitHub Bot logged work on BEAM-8624: Author: ASF GitHub Bot Created on: 21/Nov/19 03:19 Start Date: 21/Nov/19 03:19 Worklog Time Spent: 10m Work Description: y1chi commented on issue #10115: [BEAM-8624] Implement Worker Status FnService in Dataflow runner URL: https://github.com/apache/beam/pull/10115#issuecomment-556901957 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347142) Time Spent: 3h 10m (was: 3h) > Implement FnService for status api in Dataflow runner > - > > Key: BEAM-8624 > URL: https://issues.apache.org/jira/browse/BEAM-8624 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8624) Implement FnService for status api in Dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-8624?focusedWorklogId=347141&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347141 ] ASF GitHub Bot logged work on BEAM-8624: Author: ASF GitHub Bot Created on: 21/Nov/19 03:19 Start Date: 21/Nov/19 03:19 Worklog Time Spent: 10m Work Description: y1chi commented on issue #10115: [BEAM-8624] Implement Worker Status FnService in Dataflow runner URL: https://github.com/apache/beam/pull/10115#issuecomment-556277599 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347141) Time Spent: 3h (was: 2h 50m) > Implement FnService for status api in Dataflow runner > - > > Key: BEAM-8624 > URL: https://issues.apache.org/jira/browse/BEAM-8624 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8568) Local file system does not match relative path with wildcards
[ https://issues.apache.org/jira/browse/BEAM-8568?focusedWorklogId=347133&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347133 ] ASF GitHub Bot logged work on BEAM-8568: Author: ASF GitHub Bot Created on: 21/Nov/19 02:25 Start Date: 21/Nov/19 02:25 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #10028: [BEAM-8568] Fixed problem that LocalFileSystem no longer supports wil… URL: https://github.com/apache/beam/pull/10028#issuecomment-556854842 Something to do with Jenkins. That gradle scan I posted was a run against this PR's head (not the merge commit, so maybe that is the problem) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347133) Time Spent: 3h 50m (was: 3h 40m) > Local file system does not match relative path with wildcards > - > > Key: BEAM-8568 > URL: https://issues.apache.org/jira/browse/BEAM-8568 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.16.0 >Reporter: Ondrej Cerny >Assignee: David Moravek >Priority: Major > Fix For: 2.17.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > CWD structure: > {code} > src/test/resources/input/sometestfile.txt > {code} > > Code: > {code:java} > input > .apply(Create.of("src/test/resources/input/*")) > .apply(FileIO.matchAll()) > .apply(FileIO.readMatches()) > {code} > The code above doesn't match any file starting with Beam 2.16.0. The regression > has been introduced in BEAM-7854. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8746) Allow the local job service to work from inside docker
[ https://issues.apache.org/jira/browse/BEAM-8746?focusedWorklogId=347126&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347126 ] ASF GitHub Bot logged work on BEAM-8746: Author: ASF GitHub Bot Created on: 21/Nov/19 02:10 Start Date: 21/Nov/19 02:10 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10161: [BEAM-8746] Make local job service accessible from external machines URL: https://github.com/apache/beam/pull/10161#discussion_r348866533 ## File path: sdks/python/apache_beam/runners/portability/local_job_service.py ## @@ -95,7 +95,7 @@ def create_beam_job(self, preparation_id, job_name, pipeline, options): def start_grpc_server(self, port=0): self._server = grpc.server(UnboundedThreadPoolExecutor()) -port = self._server.add_insecure_port('localhost:%d' % port) +port = self._server.add_insecure_port('[::]:%d' % port) Review comment: I think it'd make sense for this to be parameterized, likely with localhost as a default. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347126) Time Spent: 1h 20m (was: 1h 10m) > Allow the local job service to work from inside docker > -- > > Key: BEAM-8746 > URL: https://issues.apache.org/jira/browse/BEAM-8746 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently the connection is refused. It's a simple fix. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7850) Make Environment a top level attribute of PTransform
[ https://issues.apache.org/jira/browse/BEAM-7850?focusedWorklogId=347124&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347124 ] ASF GitHub Bot logged work on BEAM-7850: Author: ASF GitHub Bot Created on: 21/Nov/19 02:01 Start Date: 21/Nov/19 02:01 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10183: [BEAM-7850] Makes environment ID a top level attribute of PTransform. URL: https://github.com/apache/beam/pull/10183#issuecomment-556829893 Thanks. Yeah, design for cross-language UDFs should be done separately and SdkFunctionSpec is inadequate for this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347124) Time Spent: 50m (was: 40m) > Make Environment a top level attribute of PTransform > > > Key: BEAM-7850 > URL: https://issues.apache.org/jira/browse/BEAM-7850 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Currently Environment is not a top level attribute of the PTransform (of > runner API proto). > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > Instead it is hidden inside various payload objects. For example, for ParDo, > environment will be inside SdkFunctionSpec of ParDoPayload. > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > > This makes tracking environment of different types of PTransforms harder and > we have to fork code (on the type of PTransform) to extract the Environment > where the PTransform should be executed. 
It will probably be simpler to just > make Environment a top level attribute of PTransform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347123&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347123 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 21/Nov/19 01:59 Start Date: 21/Nov/19 01:59 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10173: [BEAM-8575] Added two unit tests in CombineTest class to test AccumulatingCombine URL: https://github.com/apache/beam/pull/10173#discussion_r348864276 ## File path: sdks/python/apache_beam/transforms/combiners_test.py ## @@ -393,6 +395,54 @@ def test_global_fanout(self): | beam.CombineGlobally(combine.MeanCombineFn()).with_fanout(11)) assert_that(result, equal_to([49.5])) + @attr('ValidatesRunner') + def test_accumulating_combine(self): Review comment: This seems mostly redundant with https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py#L562 , other than the fact that it does globally as well. (Globally is just built on top of per-key, so there's little value in making it validates runner.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347123) Time Spent: 15h 20m (was: 15h 10m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 15h 20m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7850) Make Environment a top level attribute of PTransform
[ https://issues.apache.org/jira/browse/BEAM-7850?focusedWorklogId=347121&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347121 ] ASF GitHub Bot logged work on BEAM-7850: Author: ASF GitHub Bot Created on: 21/Nov/19 01:59 Start Date: 21/Nov/19 01:59 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10183: [BEAM-7850] Makes environment ID a top level attribute of PTransform. URL: https://github.com/apache/beam/pull/10183#issuecomment-556829893 Thanks. Yeah, design for cross-language UDFs should be done separately and SdkFunctionSpec is inadequate for this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347121) Time Spent: 40m (was: 0.5h) > Make Environment a top level attribute of PTransform > > > Key: BEAM-7850 > URL: https://issues.apache.org/jira/browse/BEAM-7850 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Currently Environment is not a top level attribute of the PTransform (of > runner API proto). > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > Instead it is hidden inside various payload objects. For example, for ParDo, > environment will be inside SdkFunctionSpec of ParDoPayload. > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > > This makes tracking environment of different types of PTransforms harder and > we have to fork code (on the type of PTransform) to extract the Environment > where the PTransform should be executed. 
It will probably be simpler to just > make Environment a top level attribute of PTransform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347122&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347122 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 21/Nov/19 01:59 Start Date: 21/Nov/19 01:59 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10173: [BEAM-8575] Added two unit tests in CombineTest class to test AccumulatingCombine URL: https://github.com/apache/beam/pull/10173#discussion_r348863302 ## File path: sdks/python/apache_beam/transforms/combiners_test.py ## @@ -393,6 +395,54 @@ def test_global_fanout(self): | beam.CombineGlobally(combine.MeanCombineFn()).with_fanout(11)) assert_that(result, equal_to([49.5])) + @attr('ValidatesRunner') + def test_accumulating_combine(self): +with TestPipeline() as p: + input = (p + | beam.Create([('a', 1), + ('a', 1), + ('a', 4), + ('b', 1), + ('b', 13)])) + # The mean of all values regardless of key. + global_mean = (input + | beam.Values() + | beam.CombineGlobally(combine.MeanCombineFn())) + + # The (key, mean) pairs for all keys. + mean_per_key = (input | beam.CombinePerKey(combine.MeanCombineFn())) + + expected_mean_per_key = [('a', 2), ('b', 7)] + assert_that(global_mean, equal_to([4]), label='global mean') + assert_that(mean_per_key, equal_to(expected_mean_per_key), + label='mean per key') + + @attr('ValidatesRunner') + def test_accumulating_combine_empty(self): +# For each element in a PCollection, if it is float('NaN'), then emits +# a string 'NaN', otherwise emits str(element). +class FormatNaNDoFn(beam.DoFn): + def process(self, element): +return ([str(element)], ['NaN'])[math.isnan(element)] + +with TestPipeline() as p: + input = (p | beam.Create([])) + + # Compute the mean of all values in the PCollection, + # then format the mean. Since the Pcollection is empty, + # the mean is float('NaN'), and is formatted to be a string 'NaN'. 
+ global_mean = (input + | beam.Values() + | beam.CombineGlobally(combine.MeanCombineFn()) + | beam.ParDo(FormatNaNDoFn())) Review comment: What about just doing beam.Map(str)? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347122) Time Spent: 15h 10m (was: 15h) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 15h 10m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
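The values expected by the quoted tests can be checked with a plain-Python sketch of a mean accumulator (running sum and count). This is not Beam's `MeanCombineFn`, only an illustration under the assumption stated in the diff that the mean of an empty input is NaN. It also shows what a plain `str` conversion, as suggested in the review comment, would produce for NaN. The name `mean_of` is hypothetical.

```python
def mean_of(values):
    """Accumulate (sum, count) and return the mean, or NaN for empty input."""
    total, count = 0, 0
    for value in values:
        total, count = total + value, count + 1
    return total / count if count else float("nan")


pairs = [("a", 1), ("a", 1), ("a", 4), ("b", 1), ("b", 13)]

# Mean of all values regardless of key.
global_mean = mean_of(value for _, value in pairs)

# Group values by key, then take the mean per key.
grouped = {}
for key, value in pairs:
    grouped.setdefault(key, []).append(value)
mean_per_key = sorted((key, mean_of(values)) for key, values in grouped.items())

# Python formats NaN as lowercase 'nan' when converted with str().
empty_mean_as_text = str(mean_of([]))
```

Because `str(float('nan'))` is `'nan'` in Python, a test that switched to a plain string conversion would presumably need to expect the lowercase form rather than `'NaN'`.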
[jira] [Work logged] (BEAM-7850) Make Environment a top level attribute of PTransform
[ https://issues.apache.org/jira/browse/BEAM-7850?focusedWorklogId=347119&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347119 ] ASF GitHub Bot logged work on BEAM-7850: Author: ASF GitHub Bot Created on: 21/Nov/19 01:51 Start Date: 21/Nov/19 01:51 Worklog Time Spent: 10m Work Description: robertwb commented on issue #10183: [BEAM-7850] Makes environment ID a top level attribute of PTransform. URL: https://github.com/apache/beam/pull/10183#issuecomment-556821932 The proto changes make sense to me. The one bit I'm not sure of is how this will look for cross-language UDFs, but I think that will still look very different than the current SdkFunctionSpecs, and possibly will be modeled as "side" PTransforms which this would be in line with. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347119) Time Spent: 0.5h (was: 20m) > Make Environment a top level attribute of PTransform > > > Key: BEAM-7850 > URL: https://issues.apache.org/jira/browse/BEAM-7850 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently Environment is not a top level attribute of the PTransform (of > runner API proto). > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > Instead it is hidden inside various payload objects. For example, for ParDo, > environment will be inside SdkFunctionSpec of ParDoPayload. 
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > > This makes tracking environment of different types of PTransforms harder and > we have to fork code (on the type of PTransform) to extract the Environment > where the PTransform should be executed. It will probably be simpler to just > make Environment a top level attribute of PTransform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8658) Optionally set artifact staging port in FlinkUberJarJobServer
[ https://issues.apache.org/jira/browse/BEAM-8658?focusedWorklogId=347113&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347113 ] ASF GitHub Bot logged work on BEAM-8658: Author: ASF GitHub Bot Created on: 21/Nov/19 01:17 Start Date: 21/Nov/19 01:17 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10163: [BEAM-8658] [BEAM-8781] Optionally set jar and artifact staging port … URL: https://github.com/apache/beam/pull/10163#issuecomment-556752216 I expanded this PR a bit to include the job and expansion ports as well. PTAL This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347113) Time Spent: 50m (was: 40m) > Optionally set artifact staging port in FlinkUberJarJobServer > - > > Key: BEAM-8658 > URL: https://issues.apache.org/jira/browse/BEAM-8658 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 50m > Remaining Estimate: 0h > > In certain network environments, port forwarding is necessary for our GRPC > servers, such as the artifact staging server. Currently, the port for > FlinkUberJarJobServer's artifact staging server is chosen randomly (0). We > will need to let the user choose it if they are to forward that port. > https://github.com/apache/beam/blob/802e7cd86024c21d7b2eeb45f0e7c8e370661610/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server.py#L129 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
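The difference between a randomly chosen port (0) and a user-chosen port can be sketched with the standard `socket` module. This is not the FlinkUberJarJobServer code; the helper name `bind_port` is hypothetical, and the "user-chosen" port here simply reuses a port the OS has just handed out so the sketch stays runnable.

```python
import socket


def bind_port(port=0):
    """Bind a TCP socket on localhost; port 0 asks the OS to pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", port))
    return sock, sock.getsockname()[1]


# Port 0: the actual port is only known after binding, so it cannot be forwarded in advance.
ephemeral_sock, ephemeral_port = bind_port()
ephemeral_sock.close()

# Explicit port: the caller knows the number up front and can set up port forwarding for it.
fixed_sock, fixed_port = bind_port(ephemeral_port)
fixed_sock.close()
```

Exposing the port as an optional setting that still defaults to 0 would keep the current behaviour for users who do not need forwarding.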
[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=347112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347112 ] ASF GitHub Bot logged work on BEAM-8016: Author: ASF GitHub Bot Created on: 21/Nov/19 01:15 Start Date: 21/Nov/19 01:15 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #10132: [BEAM-8016] Pipeline Graph URL: https://github.com/apache/beam/pull/10132#issuecomment-556749010 R: @pabloem Hi Pablo, could you please take a last round of review for this PR? Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347112) Time Spent: 6h 20m (was: 6h 10m) > Render Beam Pipeline as DOT with Interactive Beam > --- > > Key: BEAM-8016 > URL: https://issues.apache.org/jira/browse/BEAM-8016 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 6h 20m > Remaining Estimate: 0h > > With work in https://issues.apache.org/jira/browse/BEAM-7760, Beam pipeline > converted to DOT then rendered should mark user defined variables on edges. > With work in https://issues.apache.org/jira/browse/BEAM-7926, it might be > redundant or confusing to render arbitrary random sample PCollection data on > edges. > We'll also make sure edges in the graph corresponds to output -> input > relationship in the user defined pipeline. Each edge is one output. If > multiple down stream inputs take the same output, it should be rendered as > one edge diverging into two instead of two edges. > For advanced interactivity highlight where each execution highlights the part > of the pipeline really executed from the original pipeline, we'll also > provide the support in beta. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
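One possible way to render "one edge diverging into two" in DOT is sketched below. This is an assumption for illustration, not the approach taken in the PR: each output is emitted as a small point-shaped node, with a single labelled edge from the producer and one edge per consumer. The function name `to_dot` and the transform names are hypothetical.

```python
def to_dot(outputs):
    """Build DOT text; outputs maps (producer, output_name) to a list of consumers."""
    lines = ["digraph G {"]
    for (producer, output_name), consumers in outputs.items():
        # A point-shaped node stands for the output, so it is drawn exactly once.
        lines.append('  "%s" [shape=point];' % output_name)
        # Single labelled edge from the producing transform to the output.
        lines.append('  "%s" -> "%s" [label="%s"];' % (producer, output_name, output_name))
        # The edge then diverges to every transform that consumes the output.
        for consumer in consumers:
            lines.append('  "%s" -> "%s";' % (output_name, consumer))
    lines.append("}")
    return "\n".join(lines)


dot = to_dot({("Read", "lines"): ["CountWords", "WriteRaw"]})
```

With this layout the label for a user-defined variable appears once per output, regardless of how many downstream transforms read it.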
[jira] [Work logged] (BEAM-8691) Beam Dependency Update Request: com.google.cloud.bigtable:bigtable-client-core
[ https://issues.apache.org/jira/browse/BEAM-8691?focusedWorklogId=347106&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347106 ] ASF GitHub Bot logged work on BEAM-8691: Author: ASF GitHub Bot Created on: 21/Nov/19 00:55 Start Date: 21/Nov/19 00:55 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10144: [BEAM-8691] Upgrading bigtable-client-core to latest 1.12.1 URL: https://github.com/apache/beam/pull/10144#issuecomment-556703155 Thanks to @elharo telling me about the linkage checker but this may help you perform the analysis faster: https://github.com/apache/beam/pull/10184 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347106) Time Spent: 3h 20m (was: 3h 10m) > Beam Dependency Update Request: com.google.cloud.bigtable:bigtable-client-core > -- > > Key: BEAM-8691 > URL: https://issues.apache.org/jira/browse/BEAM-8691 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Tomo Suzuki >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > - 2019-11-15 19:39:51.523448 > - > Please consider upgrading the dependency > com.google.cloud.bigtable:bigtable-client-core. > The current version is 1.8.0. The latest version is 1.12.1 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-11-19 21:05:43.901882 > - > Please consider upgrading the dependency > com.google.cloud.bigtable:bigtable-client-core. > The current version is 1.8.0. The latest version is 1.12.1 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7278) Upgrade some Beam dependencies
[ https://issues.apache.org/jira/browse/BEAM-7278?focusedWorklogId=347105&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347105 ] ASF GitHub Bot logged work on BEAM-7278: Author: ASF GitHub Bot Created on: 21/Nov/19 00:53 Start Date: 21/Nov/19 00:53 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10184: [BEAM-7278, BEAM-2530] Add support for using a Java linkage testing tool to aid upgrading dependencies. URL: https://github.com/apache/beam/pull/10184#issuecomment-556695677 R: @elharo @suztomo CC: @kennknowles @iemejia This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347105) Time Spent: 20m (was: 10m) > Upgrade some Beam dependencies > -- > > Key: BEAM-7278 > URL: https://issues.apache.org/jira/browse/BEAM-7278 > Project: Beam > Issue Type: Task > Components: dependencies >Reporter: Etienne Chauchot >Assignee: Mujuzi Moses >Priority: Critical > Time Spent: 20m > Remaining Estimate: 0h > > Some dependencies need to be upgraded. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7278) Upgrade some Beam dependencies
[ https://issues.apache.org/jira/browse/BEAM-7278?focusedWorklogId=347101&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347101 ] ASF GitHub Bot logged work on BEAM-7278: Author: ASF GitHub Bot Created on: 21/Nov/19 00:51 Start Date: 21/Nov/19 00:51 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10184: [BEAM-7278, BEAM-2530] Add support for using a Java linkage testing tool to aid upgrading dependencies. URL: https://github.com/apache/beam/pull/10184 For example: ``` ./gradlew -Ppublishing -PjavaLinkageArtifacts=beam-sdks-java-core,beam-sdks-java-io-jdbc :checkJavaLinkage ``` More details in https://lists.apache.org/thread.html/eb5d95b9a33d7e32dc9bcd0f7d48ba8711d42bd7ed03b9cf0f1103f1@%3Cdev.beam.apache.org%3E
[jira] [Work logged] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()
[ https://issues.apache.org/jira/browse/BEAM-8651?focusedWorklogId=347097&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347097 ] ASF GitHub Bot logged work on BEAM-8651: Author: ASF GitHub Bot Created on: 21/Nov/19 00:41 Start Date: 21/Nov/19 00:41 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #10167: [BEAM-8651] Guard pickling operations with a lock to prevent race condition in module imports. URL: https://github.com/apache/beam/pull/10167 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347097) Time Spent: 2h 10m (was: 2h) > Python 3 portable pipelines sometimes fail with errors in > StockUnpickler.find_class() > - > > Key: BEAM-8651 > URL: https://issues.apache.org/jira/browse/BEAM-8651 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Blocker > Fix For: 2.17.0 > > Attachments: beam8651.py > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Several Beam users [1,2] reported an error which happens on Python 3 in > StockUnpickler.find_class. > So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on Flink > and Dataflow runners. On Dataflow runner so far I have seen this in streaming > pipelines only, which use portable SDK worker. 
> Typical stack trace: > {noformat} > File > "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 1148, in _create_pardo_operation > dofn_data = pickler.loads(serialized_fn) > > File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, > in loads > return dill.loads(s) > > File "python3.5/site-packages/dill/_dill.py", line 317, in loads > > return load(file, ignore) > > File "python3.5/site-packages/dill/_dill.py", line 305, in load > > obj = pik.load() > > File "python3.5/site-packages/dill/_dill.py", line 474, in find_class > > return StockUnpickler.find_class(self, module, name) > > AttributeError: Can't get attribute 'ClassName' on 'python3.5/site-packages/filename.py'> > {noformat} > According to Guenther from [1]: > {quote} > This looks exactly like a race condition that we've encountered on Python > 3.7.1: There's a bug in some older 3.7.x releases that breaks the > thread-safety of the unpickler, as concurrent unpickle threads can access a > module before it has been fully imported. See > https://bugs.python.org/issue34572 for more information. > The traceback shows a Python 3.6 venv so this could be a different issue > (the unpickle bug was introduced in version 3.7). If it's the same bug then > upgrading to Python 3.7.3 or higher should fix that issue. One potential > workaround is to ensure that all of the modules get imported during the > initialization of the sdk_worker, as this bug only affects imports done by > the unpickler. > {quote} > Opening this for visibility. Current open questions are: > 1. Find a minimal example to reproduce this issue. > 2. Figure out whether users are still affected by this issue on Python 3.7.3. > 3. Communicate workarounds to the 3.5 and 3.6 users affected by this. > [1] > https://lists.apache.org/thread.html/5581ddfcf6d2ae10d25b834b8a61ebee265ffbcf650c6ec8d1e69408@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
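The PR title for this issue describes guarding pickling operations with a lock. A minimal sketch of that idea follows, using the standard `pickle` module instead of dill and hypothetical wrapper names `dumps` and `loads`; it is not the code from the PR. A re-entrant lock is assumed here so that a nested call from the same thread does not deadlock.

```python
import pickle
import threading

# One process-wide lock serializes all pickling and unpickling (assumed design).
_pickle_lock = threading.RLock()


def dumps(obj):
    """Pickle obj while holding the lock."""
    with _pickle_lock:
        return pickle.dumps(obj)


def loads(data):
    """Unpickle data while holding the lock, so imports triggered here never interleave."""
    with _pickle_lock:
        return pickle.loads(data)


payload = dumps({"fn": "ClassName", "args": [1, 2, 3]})
results = []

# Several threads unpickle the same payload concurrently; the lock runs them one at a time.
threads = [threading.Thread(target=lambda: results.append(loads(payload))) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

The trade-off is that unpickling no longer runs in parallel across threads, which is usually acceptable because it happens mostly during operation setup.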
[jira] [Work logged] (BEAM-8794) Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule
[ https://issues.apache.org/jira/browse/BEAM-8794?focusedWorklogId=347088&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347088 ] ASF GitHub Bot logged work on BEAM-8794: Author: ASF GitHub Bot Created on: 21/Nov/19 00:25 Start Date: 21/Nov/19 00:25 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10180: [BEAM-8794] Conditional aggregate project merge URL: https://github.com/apache/beam/pull/10180#issuecomment-556512353 Run sql postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347088) Time Spent: 1h (was: 50m) > Projects should be handled by an IOPushDownRule before applying > AggregateProjectMergeRule > - > > Key: BEAM-8794 > URL: https://issues.apache.org/jira/browse/BEAM-8794 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > It is more efficient to push-down projected fields at an IO level (vs merging > with an Aggregate), when supported. > When running queries like: > {code:java} > select SUM(score) as total_score from group by name{code} > Projects get merged with an aggregate; as a result, Calc (after an > IOSourceRel) projects all fields and the BeamIOPushDown rule does not know which > fields can be dropped, thus not dropping any. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8794) Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule
[ https://issues.apache.org/jira/browse/BEAM-8794?focusedWorklogId=347089&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347089 ] ASF GitHub Bot logged work on BEAM-8794: Author: ASF GitHub Bot Created on: 21/Nov/19 00:25 Start Date: 21/Nov/19 00:25 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10180: [BEAM-8794] Conditional aggregate project merge URL: https://github.com/apache/beam/pull/10180#issuecomment-556609585 Run sql postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347089) Time Spent: 1h 10m (was: 1h) > Projects should be handled by an IOPushDownRule before applying > AggregateProjectMergeRule > - > > Key: BEAM-8794 > URL: https://issues.apache.org/jira/browse/BEAM-8794 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > It is more efficient to push-down projected fields at an IO level (vs merging > with an Aggregate), when supported. > When running queries like: > {code:java} > select SUM(score) as total_score from group by name{code} > Projects get merged with an aggregate; as a result, Calc (after an > IOSourceRel) projects all fields and the BeamIOPushDown rule does not know which > fields can be dropped, thus not dropping any. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8603) Add Python SqlTransform example script
[ https://issues.apache.org/jira/browse/BEAM-8603?focusedWorklogId=347087&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347087 ] ASF GitHub Bot logged work on BEAM-8603: Author: ASF GitHub Bot Created on: 21/Nov/19 00:19 Start Date: 21/Nov/19 00:19 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #10055: [BEAM-8603] Add Python SqlTransform example script URL: https://github.com/apache/beam/pull/10055#discussion_r348841220 ## File path: sdks/java/extensions/sql/build.gradle ## @@ -24,6 +24,7 @@ plugins { } applyJavaNature( automaticModuleName: 'org.apache.beam.sdk.extensions.sql', + shadowClosure: {}, Review comment: We need two jars. (1) Jar for expansion service that contains the transform classes that need to be expanded (this can be multiple jars as well, IO transforms just have to be in the class path to be picked up by the AutoService). (2) Jar to be passed to the Java worker. For me, building a shadow Jar of java-harness (:sdks:java:harness:shadowJar) worked. It probably makes sense to move expansion service to its own Gradle module and support building a shadow jar with all in-built cross-language transforms in Beam and release that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347087) Time Spent: 1h (was: 50m) > Add Python SqlTransform example script > -- > > Key: BEAM-8603 > URL: https://issues.apache.org/jira/browse/BEAM-8603 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7850) Make Environment a top level attribute of PTransform
[ https://issues.apache.org/jira/browse/BEAM-7850?focusedWorklogId=347084&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347084 ] ASF GitHub Bot logged work on BEAM-7850: Author: ASF GitHub Bot Created on: 21/Nov/19 00:11 Start Date: 21/Nov/19 00:11 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10183: [BEAM-7850] Makes environment ID a top level attribute of PTransform. URL: https://github.com/apache/beam/pull/10183#issuecomment-556563438 cc: @robertwb and @lukecwik I'm adding the rest of the refactoring needed for this but can you take a quick look to see if the proto changes look good ? I did not preserve tags since I think we do not worry about backwards compatibility at this point but lemme know if I should. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347084) Time Spent: 20m (was: 10m) > Make Environment a top level attribute of PTransform > > > Key: BEAM-7850 > URL: https://issues.apache.org/jira/browse/BEAM-7850 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Currently Environment is not a top level attribute of the PTransform (of > runner API proto). > [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > Instead it is hidden inside various payload objects. For example, for ParDo, > environment will be inside SdkFunctionSpec of ParDoPayload. 
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99] > > This makes tracking environment of different types of PTransforms harder and > we have to fork code (on the type of PTransform) to extract the Environment > where the PTransform should be executed. It will probably be simpler to just > make Environment a top level attribute of PTransform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347083&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347083 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 21/Nov/19 00:10 Start Date: 21/Nov/19 00:10 Worklog Time Spent: 10m Work Description: bumblebee-coming commented on issue #10173: [BEAM-8575] Added two unit tests in CombineTest class to test simple … URL: https://github.com/apache/beam/pull/10173#issuecomment-556201858 Although the names of the tests contain "accumulating", those tests are not related to ACCUMULATING or DISCARDING mode. They are testing simple combine cases. Since the Java tests have these names, Python tests follow them. Note that the "simple combine cases" I mentioned above has no special meaning. It is different from the "SimpleCombine" in Java tests, which has a special meaning. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347083) Time Spent: 15h (was: 14h 50m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 15h > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7850) Make Environment a top level attribute of PTransform
[ https://issues.apache.org/jira/browse/BEAM-7850?focusedWorklogId=347082&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347082 ] ASF GitHub Bot logged work on BEAM-7850: Author: ASF GitHub Bot Created on: 21/Nov/19 00:09 Start Date: 21/Nov/19 00:09 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #10183: [BEAM-7850] Makes environment ID a top level attribute of PTransform. URL: https://github.com/apache/beam/pull/10183 Removes SDKFunctionSpec and replaces all usages of it with FunctionSpec.
[jira] [Work logged] (BEAM-8592) DataCatalogTableProvider should not squash table components together into a string
[ https://issues.apache.org/jira/browse/BEAM-8592?focusedWorklogId=347081&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347081 ] ASF GitHub Bot logged work on BEAM-8592: Author: ASF GitHub Bot Created on: 20/Nov/19 23:58 Start Date: 20/Nov/19 23:58 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #10021: [BEAM-8592] Adjusting ZetaSQL table resolution to standard URL: https://github.com/apache/beam/pull/10021#issuecomment-556555460 Please take another look. I have restored unit testing to ensure that `TableResolution` works as expected. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347081) Time Spent: 1h 20m (was: 1h 10m) > DataCatalogTableProvider should not squash table components together into a > string > -- > > Key: BEAM-8592 > URL: https://issues.apache.org/jira/browse/BEAM-8592 > Project: Beam > Issue Type: Bug > Components: dsl-sql, dsl-sql-zetasql >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently, if a user writes a table name like \{{foo.`baz.bar`.bizzle}} > representing the components \{{"foo", "baz.bar", "bizzle"}} the > DataCatalogTableProvider will concatenate the components into a string and > resolve the identifier as if it represented \{{"foo", "baz", "bar", > "bizzle"}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
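The bug described in BEAM-8592 can be reproduced in miniature. The sketch below is not the DataCatalogTableProvider or ZetaSQL resolution code: `split_identifier` is a hypothetical helper that handles only dots and backticks. It shows how joining the components into one string and splitting again loses the boundary that the backticks protected.

```python
def split_identifier(identifier):
    """Split a dotted identifier into components, keeping backtick-quoted parts whole."""
    components = []
    current = []
    quoted = False
    for ch in identifier:
        if ch == "`":
            # Toggle quoting; the backtick itself is not part of the component.
            quoted = not quoted
        elif ch == "." and not quoted:
            # An unquoted dot ends the current component.
            components.append("".join(current))
            current = []
        else:
            current.append(ch)
    components.append("".join(current))
    return components


# Correct: three components, the middle one containing a dot.
components = split_identifier("foo.`baz.bar`.bizzle")

# The problematic round trip: squash into a string, then resolve by splitting on dots.
squashed = ".".join(components)
reparsed = squashed.split(".")
```

Here `components` keeps `baz.bar` as one component, while `reparsed` has four components, which is the misresolution the issue describes. Passing the component list through unchanged avoids the problem.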
[jira] [Created] (BEAM-8797) Add artifactEndpoint to PortablePipelineOptions.java
Kyle Weaver created BEAM-8797: - Summary: Add artifactEndpoint to PortablePipelineOptions.java Key: BEAM-8797 URL: https://issues.apache.org/jira/browse/BEAM-8797 Project: Beam Issue Type: Improvement Components: runner-flink Reporter: Kyle Weaver Assignee: Kyle Weaver Same as BEAM-8660 but for Java. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8797) Add artifactEndpoint to PortablePipelineOptions.java
[ https://issues.apache.org/jira/browse/BEAM-8797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-8797: -- Status: Open (was: Triage Needed) > Add artifactEndpoint to PortablePipelineOptions.java > > > Key: BEAM-8797 > URL: https://issues.apache.org/jira/browse/BEAM-8797 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > > Same as BEAM-8660 but for Java. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8016) Render Beam Pipeline as DOT with Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8016?focusedWorklogId=347079&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347079 ] ASF GitHub Bot logged work on BEAM-8016: Author: ASF GitHub Bot Created on: 20/Nov/19 23:50 Start Date: 20/Nov/19 23:50 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #10132: [BEAM-8016] Pipeline Graph URL: https://github.com/apache/beam/pull/10132#issuecomment-556553496 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347079) Time Spent: 6h 10m (was: 6h) > Render Beam Pipeline as DOT with Interactive Beam > --- > > Key: BEAM-8016 > URL: https://issues.apache.org/jira/browse/BEAM-8016 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 6h 10m > Remaining Estimate: 0h > > With the work in https://issues.apache.org/jira/browse/BEAM-7760, a Beam pipeline > converted to DOT and then rendered should mark user-defined variables on edges. > With the work in https://issues.apache.org/jira/browse/BEAM-7926, it might be > redundant or confusing to render arbitrary random sample PCollection data on > edges. > We'll also make sure edges in the graph correspond to the output -> input > relationship in the user-defined pipeline. Each edge is one output. If > multiple downstream inputs take the same output, it should be rendered as > one edge diverging into two instead of two edges. > For advanced interactive highlighting, where each execution highlights the part > of the original pipeline that was actually executed, we'll also > provide support in beta. -- This message was sent by Atlassian Jira (v8.3.4#803005)
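One way to approximate "one edge diverging into two" in DOT is to route a shared output through a small junction node. The sketch below is an assumption about how this could be drawn, not the Interactive Beam implementation: `to_dot`, the `_fork` naming, and the input mapping are all hypothetical. It uses only standard Graphviz attributes (`shape=point`, `arrowhead=none`, `label`).

```python
def to_dot(outputs):
    """Render a DOT digraph.

    `outputs` maps an output name to (producer transform, list of consumer transforms).
    Each output is labeled exactly once, even when several consumers read it.
    """
    lines = ["digraph G {"]
    for output, (producer, consumers) in outputs.items():
        if not consumers:
            # Nothing reads this output, so there is no edge to draw.
            continue
        if len(consumers) == 1:
            lines.append(f'  "{producer}" -> "{consumers[0]}" [label="{output}"];')
            continue
        # Shared output: draw one labeled edge to a point-shaped junction,
        # then unlabeled branches from the junction to each consumer.
        junction = f"{output}_fork"
        lines.append(f'  "{junction}" [shape=point];')
        lines.append(
            f'  "{producer}" -> "{junction}" [label="{output}", arrowhead=none];'
        )
        for consumer in consumers:
            lines.append(f'  "{junction}" -> "{consumer}";')
    lines.append("}")
    return "\n".join(lines)


dot = to_dot({
    "words": ("Read", ["Count", "Filter"]),
    "counts": ("Count", ["Write"]),
})
```

The resulting text can be passed to the Graphviz `dot` tool; the `words` label appears once, on the edge leading into the junction.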
[jira] [Work logged] (BEAM-8496) remove SDF translators in flink streaming transform translator
[ https://issues.apache.org/jira/browse/BEAM-8496?focusedWorklogId=347078&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347078 ] ASF GitHub Bot logged work on BEAM-8496: Author: ASF GitHub Bot Created on: 20/Nov/19 23:50 Start Date: 20/Nov/19 23:50 Worklog Time Spent: 10m Work Description: vectorijk commented on issue #9903: [BEAM-8496] remove SDF translators from flink translator URL: https://github.com/apache/beam/pull/9903#issuecomment-55655 Run Flink ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347078) Time Spent: 1h (was: 50m) > remove SDF translators in flink streaming transform translator > -- > > Key: BEAM-8496 > URL: https://issues.apache.org/jira/browse/BEAM-8496 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kai Jiang >Assignee: Kai Jiang >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Since URN of SDF has been moved to runners-core-construction-java, we need to > remove it. > Otherwise, in failed nexmark Jenkins > [job|https://builds.apache.org/job/beam_PostCommit_Java_Nexmark_Flink/4128/console], > it causes duplicated transformer registered in > [PTransformTranslation.KnownTransformPayloadTranslator()|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformTranslation.java#L290] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8796) Optionally configure static job port for JavaJarJobServer
[ https://issues.apache.org/jira/browse/BEAM-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-8796: -- Issue Type: Improvement (was: Bug) > Optionally configure static job port for JavaJarJobServer > - > > Key: BEAM-8796 > URL: https://issues.apache.org/jira/browse/BEAM-8796 > Project: Beam > Issue Type: Improvement > Components: runner-flink, runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > > Right now, ports are always dynamically assigned. > https://github.com/apache/beam/blob/10243dc78d5472a5c312a316f03c6d4c622840ea/sdks/python/apache_beam/runners/portability/job_server.py#L144 -- This message was sent by Atlassian Jira (v8.3.4#803005)
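The difference between a dynamically assigned port and an optional static one can be sketched with the standard `socket` module. This is not the code in `job_server.py`: `pick_job_port` and its `configured_port` parameter are hypothetical, and treating `0` as "choose dynamically" is an assumption made for the example.

```python
import socket


def pick_job_port(configured_port=0):
    """Return configured_port if it is nonzero, otherwise an OS-assigned free port."""
    if configured_port:
        # Static configuration: the caller decides the port.
        return configured_port
    # Dynamic assignment: binding to port 0 asks the OS for any free port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("localhost", 0))
        return sock.getsockname()[1]


static_port = pick_job_port(8099)
dynamic_port = pick_job_port()
```

A static port is easier to expose through firewalls or container port mappings. Note that the dynamically chosen port is released before the server binds it, so another process could take it in between.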
[jira] [Work logged] (BEAM-8743) Add support for flat schemas in pubsub
[ https://issues.apache.org/jira/browse/BEAM-8743?focusedWorklogId=347075&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347075 ] ASF GitHub Bot logged work on BEAM-8743: Author: ASF GitHub Bot Created on: 20/Nov/19 23:44 Start Date: 20/Nov/19 23:44 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #10158: [BEAM-8743] Add support for flat schemas in pubsub URL: https://github.com/apache/beam/pull/10158#discussion_r348805530 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubMessageToRow.java ## @@ -95,28 +109,40 @@ public void processElement(ProcessContext context) { * payload, and attributes. */ private List getFieldValues(ProcessContext context) { +Row payload = parsePayloadJsonRow(context.element()); return messageSchema().getFields().stream() -.map(field -> getValueForField(field, context.timestamp(), context.element())) +.map( +field -> +getValueForField( +field, context.timestamp(), context.element().getAttributeMap(), payload)) .collect(toList()); } private Object getValueForField( - Schema.Field field, Instant timestamp, PubsubMessage pubsubMessage) { - -switch (field.getName()) { - case TIMESTAMP_FIELD: + Schema.Field field, Instant timestamp, Map attributeMap, Row payload) { +// TODO: do this check once at construction time, rather than for every element. Review comment: I imagine you just fork the DoFn and share utility code? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347075) Time Spent: 1h (was: 50m) > Add support for flat schemas in pubsub > -- > > Key: BEAM-8743 > URL: https://issues.apache.org/jira/browse/BEAM-8743 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Fix For: 2.18.0 > > Time Spent: 1h > Remaining Estimate: 0h > > See > https://lists.apache.org/thread.html/bf4c37f21bda194d7f8c40f6e7b9a776262415755cc1658412af3c76@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8743) Add support for flat schemas in pubsub
[ https://issues.apache.org/jira/browse/BEAM-8743?focusedWorklogId=347076&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347076 ] ASF GitHub Bot logged work on BEAM-8743: Author: ASF GitHub Bot Created on: 20/Nov/19 23:44 Start Date: 20/Nov/19 23:44 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #10158: [BEAM-8743] Add support for flat schemas in pubsub URL: https://github.com/apache/beam/pull/10158#issuecomment-556552084 (be sure to `rebase -i` the fixup commits) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347076) Time Spent: 1h 10m (was: 1h) > Add support for flat schemas in pubsub > -- > > Key: BEAM-8743 > URL: https://issues.apache.org/jira/browse/BEAM-8743 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Fix For: 2.18.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > See > https://lists.apache.org/thread.html/bf4c37f21bda194d7f8c40f6e7b9a776262415755cc1658412af3c76@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8743) Add support for flat schemas in pubsub
[ https://issues.apache.org/jira/browse/BEAM-8743?focusedWorklogId=347074&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347074 ] ASF GitHub Bot logged work on BEAM-8743: Author: ASF GitHub Bot Created on: 20/Nov/19 23:44 Start Date: 20/Nov/19 23:44 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #10158: [BEAM-8743] Add support for flat schemas in pubsub URL: https://github.com/apache/beam/pull/10158#discussion_r348802643 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubMessageToRow.java ## @@ -68,8 +69,21 @@ public abstract boolean useDlq(); + public abstract boolean useFlatSchema(); + private Schema payloadSchema() { -return messageSchema().getField(PAYLOAD_FIELD).getType().getRowSchema(); +if (useFlatSchema()) { + Schema.Builder builder = Schema.builder(); + for (Schema.Field field : messageSchema().getFields()) { +if (field.getName().equals(TIMESTAMP_FIELD)) { + continue; +} +builder.addField(field); + } + return builder.build(); +} else { + return messageSchema().getField(PAYLOAD_FIELD).getType().getRowSchema(); Review comment: nit: shorter branch first is slightly easier to read This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347074) Time Spent: 50m (was: 40m) > Add support for flat schemas in pubsub > -- > > Key: BEAM-8743 > URL: https://issues.apache.org/jira/browse/BEAM-8743 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Fix For: 2.18.0 > > Time Spent: 50m > Remaining Estimate: 0h > > See > https://lists.apache.org/thread.html/bf4c37f21bda194d7f8c40f6e7b9a776262415755cc1658412af3c76@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8743) Add support for flat schemas in pubsub
[ https://issues.apache.org/jira/browse/BEAM-8743?focusedWorklogId=347073&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347073 ] ASF GitHub Bot logged work on BEAM-8743: Author: ASF GitHub Bot Created on: 20/Nov/19 23:44 Start Date: 20/Nov/19 23:44 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #10158: [BEAM-8743] Add support for flat schemas in pubsub URL: https://github.com/apache/beam/pull/10158#discussion_r348803919 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/PubsubMessageToRow.java ## @@ -68,8 +69,21 @@ public abstract boolean useDlq(); + public abstract boolean useFlatSchema(); + private Schema payloadSchema() { -return messageSchema().getField(PAYLOAD_FIELD).getType().getRowSchema(); +if (useFlatSchema()) { + Schema.Builder builder = Schema.builder(); + for (Schema.Field field : messageSchema().getFields()) { Review comment: nit: might be a pithy way to do e.g. `messageSchema().getFields().stream().filter(f -> !f.getName().equals(TIMESTAMP_FIELD)` but it might just get crufty anyhow This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347073) Time Spent: 40m (was: 0.5h) > Add support for flat schemas in pubsub > -- > > Key: BEAM-8743 > URL: https://issues.apache.org/jira/browse/BEAM-8743 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Fix For: 2.18.0 > > Time Spent: 40m > Remaining Estimate: 0h > > See > https://lists.apache.org/thread.html/bf4c37f21bda194d7f8c40f6e7b9a776262415755cc1658412af3c76@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=347072&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347072 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 20/Nov/19 23:42 Start Date: 20/Nov/19 23:42 Worklog Time Spent: 10m Work Description: bumblebee-coming commented on issue #10173: [BEAM-8575] Added two unit tests in CombineTest class to test simple … URL: https://github.com/apache/beam/pull/10173#issuecomment-556551619 R: @robertwb This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347072) Time Spent: 14h 50m (was: 14h 40m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 14h 50m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8795) runners:spark:compileJava broken on master
[ https://issues.apache.org/jira/browse/BEAM-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver resolved BEAM-8795. --- Fix Version/s: Not applicable Resolution: Fixed > runners:spark:compileJava broken on master > -- > > Key: BEAM-8795 > URL: https://issues.apache.org/jira/browse/BEAM-8795 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: Not applicable > > Time Spent: 0.5h > Remaining Estimate: 0h > > https://github.com/apache/beam/pull/10147 > beam/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/functions/SparkSideInputReader.java:49: > error: incompatible types: MultimapView is not a functional interface > o -> Collections.EMPTY_LIST; > ^ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8795) runners:spark:compileJava broken on master
[ https://issues.apache.org/jira/browse/BEAM-8795?focusedWorklogId=347071&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347071 ] ASF GitHub Bot logged work on BEAM-8795: Author: ASF GitHub Bot Created on: 20/Nov/19 23:34 Start Date: 20/Nov/19 23:34 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #10182: [BEAM-8795] fix Spark runner build URL: https://github.com/apache/beam/pull/10182 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347071) Time Spent: 0.5h (was: 20m) > runners:spark:compileJava broken on master > -- > > Key: BEAM-8795 > URL: https://issues.apache.org/jira/browse/BEAM-8795 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > https://github.com/apache/beam/pull/10147 > beam/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/functions/SparkSideInputReader.java:49: > error: incompatible types: MultimapView is not a functional interface > o -> Collections.EMPTY_LIST; > ^ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8796) Optionally configure static job port for JavaJarJobServer
[ https://issues.apache.org/jira/browse/BEAM-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-8796: -- Status: Open (was: Triage Needed) > Optionally configure static job port for JavaJarJobServer > - > > Key: BEAM-8796 > URL: https://issues.apache.org/jira/browse/BEAM-8796 > Project: Beam > Issue Type: Bug > Components: runner-flink, runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > > Right now, ports are always dynamically assigned. > https://github.com/apache/beam/blob/10243dc78d5472a5c312a316f03c6d4c622840ea/sdks/python/apache_beam/runners/portability/job_server.py#L144 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8796) Optionally configure static job port for JavaJarJobServer
Kyle Weaver created BEAM-8796: - Summary: Optionally configure static job port for JavaJarJobServer Key: BEAM-8796 URL: https://issues.apache.org/jira/browse/BEAM-8796 Project: Beam Issue Type: Bug Components: runner-flink, runner-spark Reporter: Kyle Weaver Assignee: Kyle Weaver Right now, ports are always dynamically assigned. https://github.com/apache/beam/blob/10243dc78d5472a5c312a316f03c6d4c622840ea/sdks/python/apache_beam/runners/portability/job_server.py#L144 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8629) WithTypeHints._get_or_create_type_hints may return a mutable copy of the class type hints.
[ https://issues.apache.org/jira/browse/BEAM-8629?focusedWorklogId=347069&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347069 ] ASF GitHub Bot logged work on BEAM-8629: Author: ASF GitHub Bot Created on: 20/Nov/19 23:28 Start Date: 20/Nov/19 23:28 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10080: [BEAM-8629] Don't return mutable class type hints. URL: https://github.com/apache/beam/pull/10080 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347069) Time Spent: 1h 20m (was: 1h 10m) > WithTypeHints._get_or_create_type_hints may return a mutable copy of the > class type hints. > -- > > Key: BEAM-8629 > URL: https://issues.apache.org/jira/browse/BEAM-8629 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
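The issue has no description, so the following is only a reading of its title: if a "get or create" accessor hands out the class-level hints object itself, a mutation through one instance changes the hints for the whole class. The sketch is not Beam's `WithTypeHints`; `Transform`, `class_hints`, and both accessor names are hypothetical.

```python
import copy


class Transform:
    # Class-level default hints, shared by every instance of the class.
    class_hints = {"input_types": None}

    def get_or_create_hints_shared(self):
        """Problematic variant: the instance aliases the class-level dict."""
        if "_hints" not in self.__dict__:
            self._hints = type(self).class_hints
        return self._hints

    def get_or_create_hints_copied(self):
        """Safer variant: the instance gets its own copy on first access."""
        if "_hints" not in self.__dict__:
            self._hints = copy.deepcopy(type(self).class_hints)
        return self._hints


class Shared(Transform):
    class_hints = {"input_types": None}


class Copied(Transform):
    class_hints = {"input_types": None}


# Mutating through an instance leaks into the class in the shared variant only.
Shared().get_or_create_hints_shared()["input_types"] = int
Copied().get_or_create_hints_copied()["input_types"] = int
```

After these two calls, `Shared.class_hints` has been modified for all future instances, while `Copied.class_hints` is unchanged.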
[jira] [Updated] (BEAM-8793) installGcpTest task flakes
[ https://issues.apache.org/jira/browse/BEAM-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-8793: -- Description: I've also seen this happen with :sdks:python:test-suites:portable:py37:installGcpTest. 11:01:38 > Task :sdks:python:test-suites:direct:py35:installGcpTest FAILED 11:01:38 Obtaining file:///home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python 11:01:38 ERROR: Command errored out with exit status 1: 11:01:38 command: /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/bin/python3.5 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py'"'"'; __file__='"'"'/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info 11:01:38 cwd: /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/ 11:01:38 Complete output (37 lines): 11:01:38 Traceback (most recent call last): 11:01:38 File "", line 1, in 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py", line 264, in 11:01:38 'test': generate_protos_first(test), 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/__init__.py", line 144, in setup 11:01:38 _install_setup_requires(attrs) 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/__init__.py", line 139, in _install_setup_requires 11:01:38 dist.fetch_build_eggs(dist.setup_requires) 11:01:38 File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/dist.py", line 720, in fetch_build_eggs 11:01:38 replace_conflicting=True, 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py", line 782, in resolve 11:01:38 replace_conflicting=replace_conflicting 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py", line 1065, in best_match 11:01:38 return self.obtain(req, installer) 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py", line 1077, in obtain 11:01:38 return installer(requirement) 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/dist.py", line 787, in fetch_build_egg 11:01:38 return cmd.easy_install(req) 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 679, in easy_install 11:01:38 return self.install_item(spec, dist.location, tmpdir, deps) 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 705, in install_item 11:01:38 dists = self.install_eggs(spec, download, tmpdir) 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 855, in install_eggs 11:01:38 return [self.install_wheel(dist_filename, tmpdir)] 11:01:38 File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 1073, in install_wheel 11:01:38 os.path.dirname(destination) 11:01:38 File "/usr/lib/python3.5/distutils/cmd.py", line 336, in execute 11:01:38 util.execute(func, args, msg, dry_run=self.dry_run) 11:01:38 File "/usr/lib/python3.5/distutils/util.py", line 301, in execute 11:01:38 func(*args) 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/wheel.py", line 101, in install_as_egg 11:01:38 self._install_as_egg(destination_eggdir, zf) 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build
[jira] [Assigned] (BEAM-8504) BigQueryIO DIRECT_READ is broken
[ https://issues.apache.org/jira/browse/BEAM-8504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reassigned BEAM-8504: - Assignee: Gleb Kanterov (was: Aryan Naraghi) > BigQueryIO DIRECT_READ is broken > > > Key: BEAM-8504 > URL: https://issues.apache.org/jira/browse/BEAM-8504 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.16.0, 2.17.0 >Reporter: Gleb Kanterov >Assignee: Gleb Kanterov >Priority: Major > Fix For: 2.17.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT > (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with > 2.15.0. > {code} > java.io.IOException: Failed to start reading from source: name: > "projects//locations/eu/streams/" > at > org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140) > at > 
org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.IllegalArgumentException: Fraction consumed from > previous response (0.0) is not less than fraction consumed from current > response (0.0). > at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440) > at > org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243) > at > org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206) > at > org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601) > ... 14 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
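For readers following the trace above: the `IllegalArgumentException` comes from a precondition that compares the progress fraction of two consecutive read responses. The sketch below is a stand-alone illustration, not the Beam source. The function name `check_progress` and the `strict` flag are hypothetical, and whether the actual fix relaxes the comparison is an assumption. It only shows why a strict "less than" check rejects two responses that both report 0.0.

```python
def check_progress(previous: float, current: float, strict: bool) -> None:
    """Raise ValueError if progress appears to move backwards.

    Hypothetical helper: `strict=True` mirrors the wording of the error in
    the trace (previous must be *less than* current); `strict=False` also
    accepts equal fractions.
    """
    ok = previous < current if strict else previous <= current
    if not ok:
        raise ValueError(
            "Fraction consumed from previous response (%s) is not less than "
            "fraction consumed from current response (%s)." % (previous, current))


# Two consecutive responses that both report 0.0 pass the relaxed check...
check_progress(0.0, 0.0, strict=False)

# ...but fail the strict one, which matches the message in the stack trace.
try:
    check_progress(0.0, 0.0, strict=True)
except ValueError as err:
    print(err)
```

Under this reading, a stream whose first responses carry no measurable progress would trip the strict check as soon as reading starts, which would be consistent with the failure surfacing in `start()`.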
[jira] [Commented] (BEAM-8504) BigQueryIO DIRECT_READ is broken
[ https://issues.apache.org/jira/browse/BEAM-8504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978813#comment-16978813 ] Kenneth Knowles commented on BEAM-8504: --- LGTM. Thanks! Just close this out when green & merged. > BigQueryIO DIRECT_READ is broken > > > Key: BEAM-8504 > URL: https://issues.apache.org/jira/browse/BEAM-8504 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.16.0, 2.17.0 >Reporter: Gleb Kanterov >Assignee: Aryan Naraghi >Priority: Major > Fix For: 2.17.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT > (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with > 2.15.0. > {code} > java.io.IOException: Failed to start reading from source: name: > "projects//locations/eu/streams/" > at > org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140) > at > 
org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.IllegalArgumentException: Fraction consumed from > previous response (0.0) is not less than fraction consumed from current > response (0.0). > at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440) > at > org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243) > at > org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206) > at > org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601) > ... 14 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8504) BigQueryIO DIRECT_READ is broken
[ https://issues.apache.org/jira/browse/BEAM-8504?focusedWorklogId=347066&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347066 ] ASF GitHub Bot logged work on BEAM-8504: Author: ASF GitHub Bot Created on: 20/Nov/19 23:11 Start Date: 20/Nov/19 23:11 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #10168: [BEAM-8504] Cherry-pick into release-2.17.0 URL: https://github.com/apache/beam/pull/10168#issuecomment-556530499 Looks good to merge to release branch when green. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347066) Time Spent: 2.5h (was: 2h 20m) > BigQueryIO DIRECT_READ is broken > > > Key: BEAM-8504 > URL: https://issues.apache.org/jira/browse/BEAM-8504 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.16.0, 2.17.0 >Reporter: Gleb Kanterov >Assignee: Aryan Naraghi >Priority: Major > Fix For: 2.17.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT > (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with > 2.15.0. 
> {code} > java.io.IOException: Failed to start reading from source: name: > "projects//locations/eu/streams/" > at > org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.IllegalArgumentException: Fraction consumed from > previous response (0.0) is not less than fraction consumed from current > response (0.0). 
> at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440) > at > org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243) > at > org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206) > at > org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601) > ... 14 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8489) Python typehints: filter callable output type hint should not be used
[ https://issues.apache.org/jira/browse/BEAM-8489?focusedWorklogId=347065&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347065 ] ASF GitHub Bot logged work on BEAM-8489: Author: ASF GitHub Bot Created on: 20/Nov/19 23:07 Start Date: 20/Nov/19 23:07 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #9890: [BEAM-8489] Filter: don't use callable's output type URL: https://github.com/apache/beam/pull/9890#discussion_r348791496
## File path: sdks/python/apache_beam/transforms/core.py ##
@@ -1544,10 +1544,16 @@ def Filter(fn, *args, **kwargs): # pylint: disable=invalid-name
   # TODO: What about callable classes?
   if hasattr(fn, '__name__'):
     wrapper.__name__ = fn.__name__
+
+  # Get type hints from this instance or the callable. Do not use output type
+  # hints from the callable (which should be bool if set).
+  fn_type_hints = typehints.decorators.IOTypeHints.from_callable(fn)
+  if fn_type_hints is not None:
+    fn_type_hints.output_types = None
Review comment: With this change, do we still need both branches in line 1559, 1563? Perhaps we can make the evaluation more deterministic as in:
```
if (get_type_hints(wrapper).input_types and
    get_type_hints(wrapper).input_types[0]):
  output_hint = get_type_hints(wrapper).input_types[0][0]
  get_type_hints(wrapper).set_output_types(typehints.Iterable[output_hint])
```
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347065) Time Spent: 0.5h (was: 20m) > Python typehints: filter callable output type hint should not be used > - > > Key: BEAM-8489 > URL: https://issues.apache.org/jira/browse/BEAM-8489 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > A filter function returns bool, while the Filter() transform outputs the same > element type as the input. -- This message was sent by Atlassian Jira (v8.3.4#803005)
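The point of BEAM-8489 (the predicate returns bool, while the transform emits the same element type it receives) can be sketched without Beam at all. The helper below is a minimal illustration using only the standard `typing` module. `filter_output_hint` and `is_even` are hypothetical names, and this is not how `apache_beam.typehints` is implemented.

```python
from typing import Callable, Iterable, get_type_hints


def filter_output_hint(fn: Callable) -> object:
    """Return the output hint a Filter-like transform could derive from `fn`.

    The predicate's return annotation (bool) is deliberately ignored; the
    output element type is taken from the first annotated input instead.
    """
    hints = get_type_hints(fn)
    hints.pop('return', None)  # bool describes the predicate, not the output
    if not hints:
        return Iterable[object]  # no input annotation to go on
    first_input = next(iter(hints.values()))
    return Iterable[first_input]


def is_even(x: int) -> bool:
    return x % 2 == 0


print(filter_output_hint(is_even))
```

Here `is_even` is annotated as returning `bool`, yet the derived hint is an iterable of `int`, which is the behaviour the issue description asks for.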
[jira] [Updated] (BEAM-8795) runners:spark:compileJava broken on master
[ https://issues.apache.org/jira/browse/BEAM-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-8795: -- Status: Open (was: Triage Needed) > runners:spark:compileJava broken on master > -- > > Key: BEAM-8795 > URL: https://issues.apache.org/jira/browse/BEAM-8795 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > https://github.com/apache/beam/pull/10147 > beam/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/functions/SparkSideInputReader.java:49: > error: incompatible types: MultimapView is not a functional interface > o -> Collections.EMPTY_LIST; > ^ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8794) Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule
[ https://issues.apache.org/jira/browse/BEAM-8794?focusedWorklogId=347053&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347053 ] ASF GitHub Bot logged work on BEAM-8794: Author: ASF GitHub Bot Created on: 20/Nov/19 22:53 Start Date: 20/Nov/19 22:53 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10180: [BEAM-8794] Conditional aggregate project merge URL: https://github.com/apache/beam/pull/10180#issuecomment-556512353 Run sql postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347053) Time Spent: 50m (was: 40m) > Projects should be handled by an IOPushDownRule before applying > AggregateProjectMergeRule > - > > Key: BEAM-8794 > URL: https://issues.apache.org/jira/browse/BEAM-8794 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > It is more efficient to push down projected fields at an IO level (vs merging > with an Aggregate), when supported. > When running queries like: > {code:java} > select SUM(score) as total_score from group by name{code} > Projects get merged with an aggregate; as a result, Calc (after an > IOSourceRel) projects all fields and the BeamIOPushDown rule does not know what > fields can be dropped, thus not dropping any. -- This message was sent by Atlassian Jira (v8.3.4#803005)
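The effect described in BEAM-8794 can be pictured with a toy model. This is only a sketch under stated assumptions: `pushed_down_fields`, the four-column `schema`, and the use of `None` to stand for "the Project was merged into the Aggregate" are all hypothetical, and none of it reflects the Calcite or Beam SQL APIs.

```python
def pushed_down_fields(schema, project_fields=None):
    """Return the fields an IO source would read in this toy model.

    `project_fields=None` models the case where the Project was merged
    into the Aggregate before the push-down rule ran, so the rule has no
    explicit field list and cannot drop anything.
    """
    if project_fields is None:
        return list(schema)
    wanted = set(project_fields)
    return [field for field in schema if field in wanted]


# Hypothetical table schema; the query in the issue only needs name and score.
schema = ["id", "name", "score", "country"]

print(pushed_down_fields(schema))                     # merged first: all fields
print(pushed_down_fields(schema, ["score", "name"]))  # pushed down first
```

In the second call only the two fields the aggregate needs survive, which is the saving the issue expects when the push-down rule runs before the merge rule.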
[jira] [Work logged] (BEAM-8794) Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule
[ https://issues.apache.org/jira/browse/BEAM-8794?focusedWorklogId=347051&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347051 ] ASF GitHub Bot logged work on BEAM-8794: Author: ASF GitHub Bot Created on: 20/Nov/19 22:53 Start Date: 20/Nov/19 22:53 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10180: [BEAM-8794] Conditional aggregate project merge URL: https://github.com/apache/beam/pull/10180#issuecomment-556466893 Run sql postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347051) Time Spent: 40m (was: 0.5h) > Projects should be handled by an IOPushDownRule before applying > AggregateProjectMergeRule > - > > Key: BEAM-8794 > URL: https://issues.apache.org/jira/browse/BEAM-8794 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > It is more efficient to push down projected fields at an IO level (vs merging > with an Aggregate), when supported. > When running queries like: > {code:java} > select SUM(score) as total_score from group by name{code} > Projects get merged with an aggregate; as a result, Calc (after an > IOSourceRel) projects all fields and the BeamIOPushDown rule does not know what > fields can be dropped, thus not dropping any. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8795) runners:spark:compileJava broken on master
[ https://issues.apache.org/jira/browse/BEAM-8795?focusedWorklogId=347047&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347047 ] ASF GitHub Bot logged work on BEAM-8795: Author: ASF GitHub Bot Created on: 20/Nov/19 22:51 Start Date: 20/Nov/19 22:51 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10182: [BEAM-8795] fix Spark runner build URL: https://github.com/apache/beam/pull/10182#issuecomment-556509852 Run Spark ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347047) Time Spent: 20m (was: 10m) > runners:spark:compileJava broken on master > -- > > Key: BEAM-8795 > URL: https://issues.apache.org/jira/browse/BEAM-8795 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > https://github.com/apache/beam/pull/10147 > beam/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/functions/SparkSideInputReader.java:49: > error: incompatible types: MultimapView is not a functional interface > o -> Collections.EMPTY_LIST; > ^ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3419) Enable iterable side input for beam runners.
[ https://issues.apache.org/jira/browse/BEAM-3419?focusedWorklogId=347045&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347045 ] ASF GitHub Bot logged work on BEAM-3419: Author: ASF GitHub Bot Created on: 20/Nov/19 22:47 Start Date: 20/Nov/19 22:47 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10147: [BEAM-3419] Flesh out iterable side inputs and key enumeration for multimaps in shared libraries URL: https://github.com/apache/beam/pull/10147#issuecomment-556506562 #10182 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347045) Time Spent: 3h 50m (was: 3h 40m) > Enable iterable side input for beam runners. > > > Key: BEAM-3419 > URL: https://issues.apache.org/jira/browse/BEAM-3419 > Project: Beam > Issue Type: Improvement > Components: runner-core >Reporter: Robert Bradshaw >Priority: Major > Time Spent: 3h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8795) runners:spark:compileJava broken on master
[ https://issues.apache.org/jira/browse/BEAM-8795?focusedWorklogId=347044&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347044 ] ASF GitHub Bot logged work on BEAM-8795: Author: ASF GitHub Bot Created on: 20/Nov/19 22:46 Start Date: 20/Nov/19 22:46 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #10182: [BEAM-8795] fix Spark runner build URL: https://github.com/apache/beam/pull/10182
[jira] [Work logged] (BEAM-3419) Enable iterable side input for beam runners.
[ https://issues.apache.org/jira/browse/BEAM-3419?focusedWorklogId=347043&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347043 ] ASF GitHub Bot logged work on BEAM-3419: Author: ASF GitHub Bot Created on: 20/Nov/19 22:45 Start Date: 20/Nov/19 22:45 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10147: [BEAM-3419] Flesh out iterable side inputs and key enumeration for multimaps in shared libraries URL: https://github.com/apache/beam/pull/10147#issuecomment-556503702 @ibzib has a fix in flight for this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347043) Time Spent: 3h 40m (was: 3.5h) > Enable iterable side input for beam runners. > > > Key: BEAM-3419 > URL: https://issues.apache.org/jira/browse/BEAM-3419 > Project: Beam > Issue Type: Improvement > Components: runner-core >Reporter: Robert Bradshaw >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8795) runners:spark:compileJava broken on master
Kyle Weaver created BEAM-8795: - Summary: runners:spark:compileJava broken on master Key: BEAM-8795 URL: https://issues.apache.org/jira/browse/BEAM-8795 Project: Beam Issue Type: Bug Components: runner-spark Reporter: Kyle Weaver Assignee: Kyle Weaver https://github.com/apache/beam/pull/10147 beam/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/functions/SparkSideInputReader.java:49: error: incompatible types: MultimapView is not a functional interface o -> Collections.EMPTY_LIST; ^ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3419) Enable iterable side input for beam runners.
[ https://issues.apache.org/jira/browse/BEAM-3419?focusedWorklogId=347041&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347041 ] ASF GitHub Bot logged work on BEAM-3419: Author: ASF GitHub Bot Created on: 20/Nov/19 22:42 Start Date: 20/Nov/19 22:42 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10147: [BEAM-3419] Flesh out iterable side inputs and key enumeration for multimaps in shared libraries URL: https://github.com/apache/beam/pull/10147#issuecomment-556500524
```
beam/runners/spark/src/main/java/org/apache/beam/runners/spark/structuredstreaming/translation/batch/functions/SparkSideInputReader.java:49: error: incompatible types: MultimapView is not a functional interface
o -> Collections.EMPTY_LIST;
^
```
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347041) Time Spent: 3.5h (was: 3h 20m) > Enable iterable side input for beam runners. > > > Key: BEAM-3419 > URL: https://issues.apache.org/jira/browse/BEAM-3419 > Project: Beam > Issue Type: Improvement > Components: runner-core >Reporter: Robert Bradshaw >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8504) BigQueryIO DIRECT_READ is broken
[ https://issues.apache.org/jira/browse/BEAM-8504?focusedWorklogId=347039&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347039 ] ASF GitHub Bot logged work on BEAM-8504: Author: ASF GitHub Bot Created on: 20/Nov/19 22:41 Start Date: 20/Nov/19 22:41 Worklog Time Spent: 10m Work Description: kanterov commented on issue #10168: [BEAM-8504] Cherry-pick into release-2.17.0 URL: https://github.com/apache/beam/pull/10168#issuecomment-556499571 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347039) Time Spent: 2h 10m (was: 2h) > BigQueryIO DIRECT_READ is broken > > > Key: BEAM-8504 > URL: https://issues.apache.org/jira/browse/BEAM-8504 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.16.0, 2.17.0 >Reporter: Gleb Kanterov >Assignee: Aryan Naraghi >Priority: Major > Fix For: 2.17.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT > (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with > 2.15.0. 
> {code} > java.io.IOException: Failed to start reading from source: name: > "projects//locations/eu/streams/" > at > org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.IllegalArgumentException: Fraction consumed from > previous response (0.0) is not less than fraction consumed from current > response (0.0). 
> at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440) > at > org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243) > at > org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206) > at > org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601) > ... 14 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8504) BigQueryIO DIRECT_READ is broken
[ https://issues.apache.org/jira/browse/BEAM-8504?focusedWorklogId=347040&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347040 ] ASF GitHub Bot logged work on BEAM-8504: Author: ASF GitHub Bot Created on: 20/Nov/19 22:41 Start Date: 20/Nov/19 22:41 Worklog Time Spent: 10m Work Description: kanterov commented on issue #10168: [BEAM-8504] Cherry-pick into release-2.17.0 URL: https://github.com/apache/beam/pull/10168#issuecomment-556499732 Run Java_Examples_Dataflow PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347040) Time Spent: 2h 20m (was: 2h 10m) > BigQueryIO DIRECT_READ is broken > > > Key: BEAM-8504 > URL: https://issues.apache.org/jira/browse/BEAM-8504 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.16.0, 2.17.0 >Reporter: Gleb Kanterov >Assignee: Aryan Naraghi >Priority: Major > Fix For: 2.17.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > The issue is reproducible with 2.16.0, 2.17.0 candidate and 2.18.0-SNAPSHOT > (as of d96c6b21a8a95b01944016584bc8e4ad1ab5f6a6), and not reproducible with > 2.15.0. 
> {code} > java.io.IOException: Failed to start reading from source: name: > "projects//locations/eu/streams/" > at > org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:604) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.start(ReadOperation.java:361) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159) > at > org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380) > at > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120) > at > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.IllegalArgumentException: Fraction consumed from > previous response (0.0) is not less than fraction consumed from current > response (0.0). 
> at > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440) > at > org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:243) > at > org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.start(BigQueryStorageStreamSource.java:206) > at > org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.start(WorkerCustomSources.java:601) > ... 14 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3419) Enable iterable side input for beam runners.
[ https://issues.apache.org/jira/browse/BEAM-3419?focusedWorklogId=347038&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347038 ] ASF GitHub Bot logged work on BEAM-3419: Author: ASF GitHub Bot Created on: 20/Nov/19 22:41 Start Date: 20/Nov/19 22:41 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10147: [BEAM-3419] Flesh out iterable side inputs and key enumeration for multimaps in shared libraries URL: https://github.com/apache/beam/pull/10147#issuecomment-556499485 I think this breaks :runners:spark:compileJava on master. @lukecwik can you please take a look? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347038) Time Spent: 3h 20m (was: 3h 10m) > Enable iterable side input for beam runners. > > > Key: BEAM-3419 > URL: https://issues.apache.org/jira/browse/BEAM-3419 > Project: Beam > Issue Type: Improvement > Components: runner-core >Reporter: Robert Bradshaw >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8794) Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule
[ https://issues.apache.org/jira/browse/BEAM-8794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov updated BEAM-8794: Status: Open (was: Triage Needed) > Projects should be handled by an IOPushDownRule before applying > AggregateProjectMergeRule > - > > Key: BEAM-8794 > URL: https://issues.apache.org/jira/browse/BEAM-8794 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > It is more efficient to push-down projected fields at an IO level (vs merging > with an Aggregate), when supported. > When running queries like: > {code:java} > select SUM(score) as total_score from group by name{code} > Projects get merged with an aggregate; as a result, Calc (after an > IOSourceRel) projects all fields and the BeamIOPushDown rule does not know what > fields can be dropped, so it drops none. -- This message was sent by Atlassian Jira (v8.3.4#803005)
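To make the motivation in this issue concrete, here is a minimal Python sketch (not Beam code) of projecting fields at the IO level before aggregating. The names `read_table` and `sum_score_by_name`, the sample rows, and the `comment` column are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample table; the query only needs "name" and "score".
ROWS = [
    {"name": "ann", "score": 3, "comment": "first"},
    {"name": "bob", "score": 5, "comment": "second"},
    {"name": "ann", "score": 4, "comment": "third"},
]


def read_table(rows, field_names=None):
    """Simulated IO reader. With field_names set, unused columns never leave the source."""
    for row in rows:
        if field_names is None:
            yield dict(row)
        else:
            yield {name: row[name] for name in field_names}


def sum_score_by_name(rows):
    """Rough equivalent of SUM(score) grouped by name, over dict rows."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["name"]] += row["score"]
    return dict(totals)


# Without push-down every column is read; with it, only the two needed ones.
full = sum_score_by_name(read_table(ROWS))
projected = sum_score_by_name(read_table(ROWS, ["name", "score"]))
```

Both calls produce the same totals; the difference is only in how much data the simulated reader hands to the aggregation step.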
[jira] [Work logged] (BEAM-4776) Java PortableRunner should support metrics
[ https://issues.apache.org/jira/browse/BEAM-4776?focusedWorklogId=347035&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347035 ] ASF GitHub Bot logged work on BEAM-4776: Author: ASF GitHub Bot Created on: 20/Nov/19 22:35 Start Date: 20/Nov/19 22:35 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #10105: [BEAM-4776] Add metrics support to Java PortableRunner URL: https://github.com/apache/beam/pull/10105#issuecomment-556493149 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347035) Time Spent: 5h 10m (was: 5h) > Java PortableRunner should support metrics > -- > > Key: BEAM-4776 > URL: https://issues.apache.org/jira/browse/BEAM-4776 > Project: Beam > Issue Type: Bug > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Michal Walenia >Priority: Major > Time Spent: 5h 10m > Remaining Estimate: 0h > > BEAM-4775 concerns adding metrics to the JobService API; the current issue is > about making PortableRunner understand them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-4776) Java PortableRunner should support metrics
[ https://issues.apache.org/jira/browse/BEAM-4776?focusedWorklogId=347037&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347037 ] ASF GitHub Bot logged work on BEAM-4776: Author: ASF GitHub Bot Created on: 20/Nov/19 22:35 Start Date: 20/Nov/19 22:35 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #10105: [BEAM-4776] Add metrics support to Java PortableRunner URL: https://github.com/apache/beam/pull/10105#issuecomment-556493528 Run Java Flink PortableValidatesRunner Streaming This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347037) Time Spent: 5.5h (was: 5h 20m) > Java PortableRunner should support metrics > -- > > Key: BEAM-4776 > URL: https://issues.apache.org/jira/browse/BEAM-4776 > Project: Beam > Issue Type: Bug > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Michal Walenia >Priority: Major > Time Spent: 5.5h > Remaining Estimate: 0h > > BEAM-4775 concerns adding metrics to the JobService API; the current issue is > about making PortableRunner understand them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-4776) Java PortableRunner should support metrics
[ https://issues.apache.org/jira/browse/BEAM-4776?focusedWorklogId=347036&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347036 ] ASF GitHub Bot logged work on BEAM-4776: Author: ASF GitHub Bot Created on: 20/Nov/19 22:35 Start Date: 20/Nov/19 22:35 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #10105: [BEAM-4776] Add metrics support to Java PortableRunner URL: https://github.com/apache/beam/pull/10105#issuecomment-556493435 Run Java Flink PortableValidatesRunner Batch This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347036) Time Spent: 5h 20m (was: 5h 10m) > Java PortableRunner should support metrics > -- > > Key: BEAM-4776 > URL: https://issues.apache.org/jira/browse/BEAM-4776 > Project: Beam > Issue Type: Bug > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Michal Walenia >Priority: Major > Time Spent: 5h 20m > Remaining Estimate: 0h > > BEAM-4775 concerns adding metrics to the JobService API; the current issue is > about making PortableRunner understand them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8794) Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule
[ https://issues.apache.org/jira/browse/BEAM-8794?focusedWorklogId=347029&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347029 ] ASF GitHub Bot logged work on BEAM-8794: Author: ASF GitHub Bot Created on: 20/Nov/19 22:11 Start Date: 20/Nov/19 22:11 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10180: [BEAM-8794] Conditional aggregate project merge URL: https://github.com/apache/beam/pull/10180#issuecomment-556466893 Run sql postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347029) Time Spent: 0.5h (was: 20m) > Projects should be handled by an IOPushDownRule before applying > AggregateProjectMergeRule > - > > Key: BEAM-8794 > URL: https://issues.apache.org/jira/browse/BEAM-8794 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > It is more efficient to push-down projected fields at an IO level (vs merging > with an Aggregate), when supported. > When running queries like: > {code:java} > select SUM(score) as total_score from group by name{code} > Projects get merged with an aggregate; as a result, Calc (after an > IOSourceRel) projects all fields and the BeamIOPushDown rule does not know what > fields can be dropped, so it drops none. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-4663) Implement Cost calculations for Cost-Based Optimization (CBO)
[ https://issues.apache.org/jira/browse/BEAM-4663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Jiang resolved BEAM-4663. - Fix Version/s: Not applicable Resolution: Invalid > Implement Cost calculations for Cost-Based Optimization (CBO) > -- > > Key: BEAM-4663 > URL: https://issues.apache.org/jira/browse/BEAM-4663 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql >Reporter: Kai Jiang >Assignee: Kai Jiang >Priority: Major > Fix For: Not applicable > > Time Spent: 2h 20m > Remaining Estimate: 0h > > To support CBO, we should implement methods in each Beam*Rel.java. > computeSelfCost(...) as our first step. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8746) Allow the local job service to work from inside docker
[ https://issues.apache.org/jira/browse/BEAM-8746?focusedWorklogId=347032&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347032 ] ASF GitHub Bot logged work on BEAM-8746: Author: ASF GitHub Bot Created on: 20/Nov/19 22:14 Start Date: 20/Nov/19 22:14 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #10161: [BEAM-8746] Make local job service accessible from external machines URL: https://github.com/apache/beam/pull/10161#discussion_r348773235 ## File path: sdks/python/apache_beam/runners/portability/local_job_service.py ## @@ -95,7 +95,7 @@ def create_beam_job(self, preparation_id, job_name, pipeline, options): def start_grpc_server(self, port=0): self._server = grpc.server(UnboundedThreadPoolExecutor()) -port = self._server.add_insecure_port('localhost:%d' % port) +port = self._server.add_insecure_port('[::]:%d' % port) Review comment: > Could this not be handled in the subclass? Yeah, let me look into the best design for this. It'd be nice if we took this opportunity to make more than the hostname configurable (i.e. provide a way to further configure the server before starting). I'll propose an alternative later today. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347032) Time Spent: 1h 10m (was: 1h) > Allow the local job service to work from inside docker > -- > > Key: BEAM-8746 > URL: https://issues.apache.org/jira/browse/BEAM-8746 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > Currently the connection is refused. It's a simple fix. -- This message was sent by Atlassian Jira (v8.3.4#803005)
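The difference between the two bind addresses in the diff above can be sketched with the Python standard library alone. This is an illustration under assumptions, not the job service code: `bind_listener` is a hypothetical helper, plain TCP sockets stand in for the gRPC server, and the IPv4 wildcard `0.0.0.0` stands in for the `[::]` address used in the pull request.

```python
import socket


def bind_listener(host, port=0):
    """Bind an IPv4 TCP listener on host:port and return (socket, bound_address)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    sock.listen(1)
    return sock, sock.getsockname()


# Loopback only: reachable from the same network namespace, which is why a
# client in another container or on another machine gets "connection refused".
local_sock, local_addr = bind_listener("127.0.0.1")

# Wildcard: accepts connections arriving on any IPv4 interface of the host.
any_sock, any_addr = bind_listener("0.0.0.0")

local_sock.close()
any_sock.close()
```

Taking the host as a parameter, as in this sketch, is one way the bind address could be made configurable rather than hard-coded.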
[jira] [Work logged] (BEAM-8691) Beam Dependency Update Request: com.google.cloud.bigtable:bigtable-client-core
[ https://issues.apache.org/jira/browse/BEAM-8691?focusedWorklogId=347031&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347031 ] ASF GitHub Bot logged work on BEAM-8691: Author: ASF GitHub Bot Created on: 20/Nov/19 22:12 Start Date: 20/Nov/19 22:12 Worklog Time Spent: 10m Work Description: suztomo commented on pull request #10144: [BEAM-8691] Upgrading bigtable-client-core to latest 1.12.1 URL: https://github.com/apache/beam/pull/10144 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347031) Time Spent: 3h 10m (was: 3h) > Beam Dependency Update Request: com.google.cloud.bigtable:bigtable-client-core > -- > > Key: BEAM-8691 > URL: https://issues.apache.org/jira/browse/BEAM-8691 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Tomo Suzuki >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > - 2019-11-15 19:39:51.523448 > - > Please consider upgrading the dependency > com.google.cloud.bigtable:bigtable-client-core. > The current version is 1.8.0. The latest version is 1.12.1 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-11-19 21:05:43.901882 > - > Please consider upgrading the dependency > com.google.cloud.bigtable:bigtable-client-core. > The current version is 1.8.0. The latest version is 1.12.1 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8691) Beam Dependency Update Request: com.google.cloud.bigtable:bigtable-client-core
[ https://issues.apache.org/jira/browse/BEAM-8691?focusedWorklogId=347030&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347030 ] ASF GitHub Bot logged work on BEAM-8691: Author: ASF GitHub Bot Created on: 20/Nov/19 22:12 Start Date: 20/Nov/19 22:12 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10144: [BEAM-8691] Upgrading bigtable-client-core to latest 1.12.1 URL: https://github.com/apache/beam/pull/10144#issuecomment-556468925 Closing this for now while investigating the errors. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347030) Time Spent: 3h (was: 2h 50m) > Beam Dependency Update Request: com.google.cloud.bigtable:bigtable-client-core > -- > > Key: BEAM-8691 > URL: https://issues.apache.org/jira/browse/BEAM-8691 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Tomo Suzuki >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > > - 2019-11-15 19:39:51.523448 > - > Please consider upgrading the dependency > com.google.cloud.bigtable:bigtable-client-core. > The current version is 1.8.0. The latest version is 1.12.1 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-11-19 21:05:43.901882 > - > Please consider upgrading the dependency > com.google.cloud.bigtable:bigtable-client-core. > The current version is 1.8.0. The latest version is 1.12.1 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines
[ https://issues.apache.org/jira/browse/BEAM-8343?focusedWorklogId=347027&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347027 ] ASF GitHub Bot logged work on BEAM-8343: Author: ASF GitHub Bot Created on: 20/Nov/19 22:02 Start Date: 20/Nov/19 22:02 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10060: [BEAM-8343] [SQL] Updated the cost model to favor IO with push-down. URL: https://github.com/apache/beam/pull/10060#issuecomment-556457627 CC: @TheNeuralBit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347027) Time Spent: 7h (was: 6h 50m) > Add means for IO APIs to support predicate and/or project push-down when > running SQL pipelines > -- > > Key: BEAM-8343 > URL: https://issues.apache.org/jira/browse/BEAM-8343 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Fix For: 2.18.0 > > Time Spent: 7h > Remaining Estimate: 0h > > The objective is to create a universal way for Beam SQL IO APIs to support > predicate/project push-down. > A proposed way to achieve that is by introducing an interface responsible > for identifying what portion(s) of a Calc can be moved down to IO layer. 
> Also, adding the following methods to the BeamSqlTable interface to pass the necessary > parameters to IO APIs: > - BeamSqlTableFilter constructFilter(List filter) > - ProjectSupport supportsProjects() > - PCollection buildIOReader(PBegin begin, BeamSqlTableFilter filters, > List fieldNames) > > ProjectSupport is an enum with the following options: > * NONE > * WITHOUT_FIELD_REORDERING > * WITH_FIELD_REORDERING > > Design doc > [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
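The three ProjectSupport options listed above can be modelled in a few lines of Python. This is only a sketch of one plausible reading of the option names, not the Java interface from the pull request: the helper `pushed_down_fields` and the example schema are hypothetical, and the assumption is that WITHOUT_FIELD_REORDERING lets an IO drop fields but keeps them in schema order.

```python
import enum


class ProjectSupport(enum.Enum):
    """Python stand-in for the enum described in the issue."""
    NONE = "none"
    WITHOUT_FIELD_REORDERING = "without_field_reordering"
    WITH_FIELD_REORDERING = "with_field_reordering"


def pushed_down_fields(schema_fields, requested_fields, support):
    """Return the fields, in order, that an IO with the given support would emit."""
    if support is ProjectSupport.NONE:
        # No project push-down: the IO emits every field of the schema.
        return list(schema_fields)
    if support is ProjectSupport.WITH_FIELD_REORDERING:
        # The IO can emit exactly the requested fields in the requested order.
        return list(requested_fields)
    # WITHOUT_FIELD_REORDERING: unused fields are dropped, schema order is kept.
    wanted = set(requested_fields)
    return [field for field in schema_fields if field in wanted]


schema = ["id", "name", "score"]
requested = ["score", "name"]
```

Under this reading, a Calc would still be needed after the IO to reorder fields in the WITHOUT_FIELD_REORDERING case, and to drop them in the NONE case.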
[jira] [Work logged] (BEAM-8794) Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule
[ https://issues.apache.org/jira/browse/BEAM-8794?focusedWorklogId=347026&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347026 ] ASF GitHub Bot logged work on BEAM-8794: Author: ASF GitHub Bot Created on: 20/Nov/19 22:01 Start Date: 20/Nov/19 22:01 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10180: [BEAM-8794] Conditional aggregate project merge URL: https://github.com/apache/beam/pull/10180#issuecomment-556456864 CC: @TheNeuralBit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347026) Time Spent: 20m (was: 10m) > Projects should be handled by an IOPushDownRule before applying > AggregateProjectMergeRule > - > > Key: BEAM-8794 > URL: https://issues.apache.org/jira/browse/BEAM-8794 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > It is more efficient to push-down projected fields at an IO level (vs merging > with an Aggregate), when supported. > When running queries like: > {code:java} > select SUM(score) as total_score from group by name{code} > Projects get merged with an aggregate; as a result, Calc (after an > IOSourceRel) projects all fields and the BeamIOPushDown rule does not know what > fields can be dropped, so it drops none. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8691) Beam Dependency Update Request: com.google.cloud.bigtable:bigtable-client-core
[ https://issues.apache.org/jira/browse/BEAM-8691?focusedWorklogId=347022&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347022 ] ASF GitHub Bot logged work on BEAM-8691: Author: ASF GitHub Bot Created on: 20/Nov/19 21:50 Start Date: 20/Nov/19 21:50 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10144: [BEAM-8691] Upgrading bigtable-client-core to latest 1.12.1 URL: https://github.com/apache/beam/pull/10144#issuecomment-556444847 Run Java PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347022) Time Spent: 2h 50m (was: 2h 40m) > Beam Dependency Update Request: com.google.cloud.bigtable:bigtable-client-core > -- > > Key: BEAM-8691 > URL: https://issues.apache.org/jira/browse/BEAM-8691 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Tomo Suzuki >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > - 2019-11-15 19:39:51.523448 > - > Please consider upgrading the dependency > com.google.cloud.bigtable:bigtable-client-core. > The current version is 1.8.0. The latest version is 1.12.1 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-11-19 21:05:43.901882 > - > Please consider upgrading the dependency > com.google.cloud.bigtable:bigtable-client-core. > The current version is 1.8.0. The latest version is 1.12.1 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8691) Beam Dependency Update Request: com.google.cloud.bigtable:bigtable-client-core
[ https://issues.apache.org/jira/browse/BEAM-8691?focusedWorklogId=347021&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347021 ] ASF GitHub Bot logged work on BEAM-8691: Author: ASF GitHub Bot Created on: 20/Nov/19 21:50 Start Date: 20/Nov/19 21:50 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10144: [BEAM-8691] Upgrading bigtable-client-core to latest 1.12.1 URL: https://github.com/apache/beam/pull/10144#issuecomment-556444783 @lukecwik The post commit seems to have detected an issue. Thank you for advice. ``` Failure message was: java.lang.AbstractMethodError: com.google.api.gax.grpc.InstantiatingGrpcChannelProvider.needsCredentials()Z at com.google.api.gax.rpc.ClientContext.create(ClientContext.java:157) at com.google.cloud.bigquery.storage.v1beta1.stub.EnhancedBigQueryStorageStub.create(EnhancedBigQueryStorageStub.java:89) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347021) Time Spent: 2h 40m (was: 2.5h) > Beam Dependency Update Request: com.google.cloud.bigtable:bigtable-client-core > -- > > Key: BEAM-8691 > URL: https://issues.apache.org/jira/browse/BEAM-8691 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Tomo Suzuki >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > - 2019-11-15 19:39:51.523448 > - > Please consider upgrading the dependency > com.google.cloud.bigtable:bigtable-client-core. > The current version is 1.8.0. The latest version is 1.12.1 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. 
> - 2019-11-19 21:05:43.901882 > - > Please consider upgrading the dependency > com.google.cloud.bigtable:bigtable-client-core. > The current version is 1.8.0. The latest version is 1.12.1 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8492) Python typehints: don't try to strip_iterable from None
[ https://issues.apache.org/jira/browse/BEAM-8492?focusedWorklogId=347019&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347019 ] ASF GitHub Bot logged work on BEAM-8492: Author: ASF GitHub Bot Created on: 20/Nov/19 21:46 Start Date: 20/Nov/19 21:46 Worklog Time Spent: 10m Work Description: udim commented on issue #9895: [BEAM-8492] Allow None, Optional return hints for DoFn.process and friends URL: https://github.com/apache/beam/pull/9895#issuecomment-556440849 I've fixed some linter errors. Should be okay to review now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347019) Time Spent: 50m (was: 40m) > Python typehints: don't try to strip_iterable from None > --- > > Key: BEAM-8492 > URL: https://issues.apache.org/jira/browse/BEAM-8492 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > The return value of DoFn.process can be an iterable of elements or None. > Handle the case when the output type hint of process is None. -- This message was sent by Atlassian Jira (v8.3.4#803005)
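The case described in this issue, a return hint that is an iterable of elements or None, can be sketched with the standard `typing` module. This is a hypothetical stand-alone helper, not Beam's implementation; it assumes Python 3.8+ for `typing.get_origin` and `typing.get_args`, and it only covers simple hints such as `Iterable[T]`, `List[T]` and `Optional[...]` around them.

```python
import collections.abc
import typing


def strip_iterable(hint):
    """Return the element type of an iterable return hint, or None for 'no output'."""
    # A bare None (or NoneType) hint means the function produces no elements.
    if hint is None or hint is type(None):
        return None
    origin = typing.get_origin(hint)
    args = typing.get_args(hint)
    # Optional[X] is Union[X, None]: drop the None member and look at X.
    if origin is typing.Union:
        non_none = [arg for arg in args if arg is not type(None)]
        if len(non_none) == 1:
            return strip_iterable(non_none[0])
        raise TypeError(f"unsupported union hint: {hint!r}")
    # For hints such as Iterable[T] or List[T], the first argument is the element type.
    if isinstance(origin, type) and issubclass(origin, collections.abc.Iterable) and args:
        return args[0]
    raise TypeError(f"expected an iterable hint, got {hint!r}")
```

For example, `strip_iterable(typing.Optional[typing.List[str]])` yields `str`, while a non-iterable hint such as `int` raises `TypeError` instead of being silently accepted.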
[jira] [Work logged] (BEAM-8658) Optionally set artifact staging port in FlinkUberJarJobServer
[ https://issues.apache.org/jira/browse/BEAM-8658?focusedWorklogId=347017&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347017 ] ASF GitHub Bot logged work on BEAM-8658: Author: ASF GitHub Bot Created on: 20/Nov/19 21:38 Start Date: 20/Nov/19 21:38 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10163: [BEAM-8658] [BEAM-8781] Optionally set jar and artifact staging port … URL: https://github.com/apache/beam/pull/10163#issuecomment-556403050 Does anyone know if we publish nightly snapshots of the Flink job server jar anywhere? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347017) Time Spent: 40m (was: 0.5h) > Optionally set artifact staging port in FlinkUberJarJobServer > - > > Key: BEAM-8658 > URL: https://issues.apache.org/jira/browse/BEAM-8658 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 40m > Remaining Estimate: 0h > > In certain network environments, port forwarding is necessary for our GRPC > servers, such as the artifact staging server. Currently, the port for > FlinkUberJarJobServer's artifact staging server is chosen randomly (0). We > will need to let the user choose it if they are to forward that port. > https://github.com/apache/beam/blob/802e7cd86024c21d7b2eeb45f0e7c8e370661610/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server.py#L129 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
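One way to let the user choose the port while keeping today's default is an option that defaults to 0. The sketch below uses `argparse`; the flag name `--artifact_port` and the helper `parse_artifact_port` are assumptions made for illustration and are not taken from the pull request.

```python
import argparse


def parse_artifact_port(argv):
    """Parse a hypothetical --artifact_port flag; 0 keeps 'any free port' behaviour."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--artifact_port",
        type=int,
        default=0,
        help="Port for the artifact staging server (0 = let the OS pick one).",
    )
    args = parser.parse_args(argv)
    # Reject values that cannot be TCP ports before any server is started.
    if not 0 <= args.artifact_port <= 65535:
        parser.error("--artifact_port must be between 0 and 65535")
    return args.artifact_port
```

With no flag the function returns 0, so existing setups are unaffected; a user behind port forwarding would pass a fixed value such as `--artifact_port 8098` and forward that port.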
[jira] [Work logged] (BEAM-7390) Colab examples for aggregation transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=347016&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347016 ] ASF GitHub Bot logged work on BEAM-7390: Author: ASF GitHub Bot Created on: 20/Nov/19 21:36 Start Date: 20/Nov/19 21:36 Worklog Time Spent: 10m Work Description: davidcavazos commented on issue #10174: [BEAM-7390] Add code snippet for Sample URL: https://github.com/apache/beam/pull/10174#issuecomment-556429558 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347016) Time Spent: 3h 20m (was: 3h 10m) > Colab examples for aggregation transforms (Python) > -- > > Key: BEAM-7390 > URL: https://issues.apache.org/jira/browse/BEAM-7390 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 3h 20m > Remaining Estimate: 0h > > Merge aggregation Colabs into the transform catalog -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7390) Colab examples for aggregation transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7390?focusedWorklogId=347015&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347015 ] ASF GitHub Bot logged work on BEAM-7390: Author: ASF GitHub Bot Created on: 20/Nov/19 21:36 Start Date: 20/Nov/19 21:36 Worklog Time Spent: 10m Work Description: davidcavazos commented on issue #10174: [BEAM-7390] Add code snippet for Sample URL: https://github.com/apache/beam/pull/10174#issuecomment-556429558 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347015) Time Spent: 3h 10m (was: 3h) > Colab examples for aggregation transforms (Python) > -- > > Key: BEAM-7390 > URL: https://issues.apache.org/jira/browse/BEAM-7390 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 3h 10m > Remaining Estimate: 0h > > Merge aggregation Colabs into the transform catalog -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8787) Python setup issues
[ https://issues.apache.org/jira/browse/BEAM-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978745#comment-16978745 ] Tomo Suzuki commented on BEAM-8787: --- I'm feeling our python3.6 installation is broken: {noformat} suztomo@suxtomo24:~$ which python3.6 /usr/bin/python3.6 suztomo@suxtomo24:~$ python3.6 --version Python 3.6.8 suztomo@suxtomo24:~$ python3.6 -m pip install foo Traceback (most recent call last): File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main "__main__", mod_spec) File "/usr/lib/python3.6/runpy.py", line 85, in _run_code exec(code, run_globals) File "/usr/lib/python3/dist-packages/pip/__main__.py", line 16, in from pip._internal import main as _main # isort:skip # noqa File "/usr/lib/python3/dist-packages/pip/_internal/__init__.py", line 40, in from pip._internal.cli.autocompletion import autocomplete File "/usr/lib/python3/dist-packages/pip/_internal/cli/autocompletion.py", line 8, in from pip._internal.cli.main_parser import create_main_parser File "/usr/lib/python3/dist-packages/pip/_internal/cli/main_parser.py", line 8, in from pip._internal.cli import cmdoptions File "/usr/lib/python3/dist-packages/pip/_internal/cli/cmdoptions.py", line 17, in from pip._internal.locations import USER_CACHE_DIR, src_prefix File "/usr/lib/python3/dist-packages/pip/_internal/locations.py", line 10, in from distutils import sysconfig as distutils_sysconfig ImportError: cannot import name 'sysconfig' {noformat} Found http://b/119097564 > Python setup issues > --- > > Key: BEAM-8787 > URL: https://issues.apache.org/jira/browse/BEAM-8787 > Project: Beam > Issue Type: Bug > Components: build-system >Affects Versions: 2.16.0 > Environment: debian x86 (gLinux) >Reporter: Elliotte Rusty Harold >Priority: Major > > This could be an issue with incomplete or inaccurate contributing docs. tldr; > `./gradlew check` fails on Debian after initial checkout. 
> The docs say that one should first run: > sudo apt-get install \ > openjdk-8-jdk \ > python-setuptools \ > python-pip \ > virtualenv > but even after running this, pieces are missing. I'm still debugging exactly > what's missing but the symptoms look like this: > > Task :sdks:python:test-suites:tox:py35:setupVirtualenv FAILED > The path python3.5 (from --python=python3.5) does not exist > > Task :sdks:python:test-suites:tox:py36:setupVirtualenv FAILED > [ant:fmpp] Traceback (most recent call last): > [ant:fmpp] File "/usr/lib/python3/dist-packages/virtualenv.py", line 25, in > > [ant:fmpp] import distutils.sysconfig > [ant:fmpp] ModuleNotFoundError: No module named 'distutils.sysconfig' > ... > FAILURE: Build completed with 2 failures. > 1: Task failed with an exception. > --- > * What went wrong: > Execution failed for task ':sdks:python:test-suites:tox:py35:setupVirtualenv'. > > Process 'command 'virtualenv'' finished with non-zero exit value 3 > Indeed there is no Python 3.5 on this system: > gnome-user-share python2.6 > gnome-vfs-2.0 python2.7 > gnupg python3 > gnupg2 python3.6 > gold-ld python3.7 > goobuntu-config-tools python3.8 > But nowhere in the setup docs do we say that Python 3.5 is required to build > this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
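The two symptoms reported in BEAM-8787 above (a missing python3.5 binary and a missing distutils.sysconfig module) can be checked up front with a short script. This is a sketch under assumptions: the helper names are hypothetical, the interpreter list is taken from the failing task names in the report, and nothing here is part of Beam's build.

```python
import importlib.util
import shutil


def find_interpreters(names):
    """Map each interpreter name to its path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


def has_module(module_name):
    """Report whether the running interpreter can locate module_name."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. distutils) is itself missing.
        return False


if __name__ == "__main__":
    found = find_interpreters(["python3.5", "python3.6", "python3.7"])
    for name, path in found.items():
        print(name, "->", path or "not found")
    print("distutils.sysconfig importable:", has_module("distutils.sysconfig"))
```

Note that has_module only inspects the interpreter running the script, so it would need to be run once per Python version of interest.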
[jira] [Work logged] (BEAM-8490) Python typehints: properly resolve empty dict type
[ https://issues.apache.org/jira/browse/BEAM-8490?focusedWorklogId=347014&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347014 ] ASF GitHub Bot logged work on BEAM-8490: Author: ASF GitHub Bot Created on: 20/Nov/19 21:30 Start Date: 20/Nov/19 21:30 Worklog Time Spent: 10m Work Description: udim commented on issue #9894: [BEAM-8490] Fix instance_to_type for empty containers URL: https://github.com/apache/beam/pull/9894#issuecomment-556422501 R: @kennknowles CC: @robertwb This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347014) Time Spent: 20m (was: 10m) > Python typehints: properly resolve empty dict type > -- > > Key: BEAM-8490 > URL: https://issues.apache.org/jira/browse/BEAM-8490 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Currently: > {code} > trivial_inference.instance_to_type({}) > {code} > returns > {code} > Dict[Union[], Union[]] > {code} > instead of > {code} > Dict[Any,Any] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
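The empty-container case in BEAM-8490 above can be illustrated with the standard typing module. The function below is a simplified stand-in that reuses the name instance_to_type for readability; it is not Beam's trivial_inference code and only handles dicts. It shows why an empty dict needs an explicit fallback: a union over zero observed types is not meaningful, so Any is used instead.

```python
import typing


def _union_of(values):
    """Build a Union of the types observed in values (must be non-empty)."""
    types = sorted({type(v) for v in values}, key=lambda t: t.__name__)
    return typing.Union[tuple(types)]


def instance_to_type(obj):
    """Infer a typing hint from an instance (simplified sketch, dicts only)."""
    if isinstance(obj, dict):
        if not obj:
            # No keys or values to inspect: fall back to Any rather than
            # attempting an empty Union.
            return typing.Dict[typing.Any, typing.Any]
        return typing.Dict[_union_of(obj.keys()), _union_of(obj.values())]
    return type(obj)
```

Here instance_to_type({}) gives Dict[Any, Any], and instance_to_type({"a": 1}) gives Dict[str, int].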
[jira] [Work logged] (BEAM-8794) Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule
[ https://issues.apache.org/jira/browse/BEAM-8794?focusedWorklogId=347013&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347013 ] ASF GitHub Bot logged work on BEAM-8794: Author: ASF GitHub Bot Created on: 20/Nov/19 21:29 Start Date: 20/Nov/19 21:29 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10180: [BEAM-8794] Conditional aggregate project merge URL: https://github.com/apache/beam/pull/10180#issuecomment-556421814 R: @apilloud CC: @amaliujia This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347013) Remaining Estimate: 0h Time Spent: 10m > Projects should be handled by an IOPushDownRule before applying > AggregateProjectMergeRule > - > > Key: BEAM-8794 > URL: https://issues.apache.org/jira/browse/BEAM-8794 > Project: Beam > Issue Type: Improvement > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > It is more efficient to push down projected fields at the IO level (vs merging > with an Aggregate), when supported. > When running queries like: > {code:java} > select SUM(score) as total_score from group by name{code} > Projects get merged with an aggregate; as a result, Calc (after an > IOSourceRel) projects all fields and the BeamIOPushDown rule does not know what > fields can be dropped, thus not dropping any. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8269) IOTypehints.from_callable doesn't convert native type hints to Beam
[ https://issues.apache.org/jira/browse/BEAM-8269?focusedWorklogId=347012&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347012 ] ASF GitHub Bot logged work on BEAM-8269: Author: ASF GitHub Bot Created on: 20/Nov/19 21:29 Start Date: 20/Nov/19 21:29 Worklog Time Spent: 10m Work Description: udim commented on issue #9602: [BEAM-8269] Convert Py3 type hints to Beam types URL: https://github.com/apache/beam/pull/9602#issuecomment-556421486 @robertwb PTAL, changes: - Commented out one test case with a TODO(BEAM-8492) comment. - Added a _LOGGER.info message when converting an unknown typing module type to typehints.Any. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347012) Time Spent: 1h 10m (was: 1h) > IOTypehints.from_callable doesn't convert native type hints to Beam > --- > > Key: BEAM-8269 > URL: https://issues.apache.org/jira/browse/BEAM-8269 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > Users typically write type hints using typing module types. We should allow > that, but internally convert these types to Beam module types for now. > In the future, Beam should stop using these internal types (BEAM-8156). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8794) Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule
Kirill Kozlov created BEAM-8794: --- Summary: Projects should be handled by an IOPushDownRule before applying AggregateProjectMergeRule Key: BEAM-8794 URL: https://issues.apache.org/jira/browse/BEAM-8794 Project: Beam Issue Type: Improvement Components: dsl-sql Reporter: Kirill Kozlov Assignee: Kirill Kozlov It is more efficient to push down projected fields at the IO level (vs merging with an Aggregate), when supported. When running queries like: {code:java} select SUM(score) as total_score from group by name{code} Projects get merged with an aggregate; as a result, Calc (after an IOSourceRel) projects all fields and the BeamIOPushDown rule does not know what fields can be dropped, thus not dropping any. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-4132) Element type inference doesn't work for multi-output DoFns
[ https://issues.apache.org/jira/browse/BEAM-4132?focusedWorklogId=347010&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347010 ] ASF GitHub Bot logged work on BEAM-4132: Author: ASF GitHub Bot Created on: 20/Nov/19 21:21 Start Date: 20/Nov/19 21:21 Worklog Time Spent: 10m Work Description: udim commented on issue #10142: [BEAM-4132] Set multi-output PCollections types to Any URL: https://github.com/apache/beam/pull/10142#issuecomment-556412724 R: @kennknowles CC: @robertwb This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347010) Time Spent: 2h 40m (was: 2.5h) > Element type inference doesn't work for multi-output DoFns > -- > > Key: BEAM-4132 > URL: https://issues.apache.org/jira/browse/BEAM-4132 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.4.0 >Reporter: Chuan Yu Foo >Assignee: Udi Meiri >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > TLDR: if you have a multi-output DoFn, then the non-main PCollections will > incorrectly have their element types set to None. This affects type checking > for pipelines involving these PCollections. 
> Minimal example: > {code} > import apache_beam as beam > class TripleDoFn(beam.DoFn): > def process(self, elem): > yield elem > if elem % 2 == 0: > yield beam.pvalue.TaggedOutput('ten_times', elem * 10) > if elem % 3 == 0: > yield beam.pvalue.TaggedOutput('hundred_times', elem * 100) > > @beam.typehints.with_input_types(int) > @beam.typehints.with_output_types(int) > class MultiplyBy(beam.DoFn): > def __init__(self, multiplier): > self._multiplier = multiplier > def process(self, elem): > return elem * self._multiplier > > def main(): > with beam.Pipeline() as p: > x, a, b = ( > p > | 'Create' >> beam.Create([1, 2, 3]) > | 'TripleDo' >> beam.ParDo(TripleDoFn()).with_outputs( > 'ten_times', 'hundred_times', main='main_output')) > _ = a | 'MultiplyBy2' >> beam.ParDo(MultiplyBy(2)) > if __name__ == '__main__': > main() > {code} > Running this yields the following error: > {noformat} > apache_beam.typehints.decorators.TypeCheckError: Type hint violation for > 'MultiplyBy2': requires but got None for elem > {noformat} > Replacing {{a}} with {{b}} yields the same error. Replacing {{a}} with {{x}} > instead yields the following error: > {noformat} > apache_beam.typehints.decorators.TypeCheckError: Type hint violation for > 'MultiplyBy2': requires but got Union[TaggedOutput, int] for elem > {noformat} > I would expect Beam to correctly infer that {{a}} and {{b}} have element > types of {{int}} rather than {{None}}, and I would also expect Beam to > correctly figure out that the element types of {{x}} are compatible with > {{int}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8343) Add means for IO APIs to support predicate and/or project push-down when running SQL pipelines
[ https://issues.apache.org/jira/browse/BEAM-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirill Kozlov resolved BEAM-8343. - Fix Version/s: 2.18.0 Resolution: Fixed > Add means for IO APIs to support predicate and/or project push-down when > running SQL pipelines > -- > > Key: BEAM-8343 > URL: https://issues.apache.org/jira/browse/BEAM-8343 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Kirill Kozlov >Assignee: Kirill Kozlov >Priority: Major > Fix For: 2.18.0 > > Time Spent: 6h 50m > Remaining Estimate: 0h > > The objective is to create a universal way for Beam SQL IO APIs to support > predicate/project push-down. > A proposed way to achieve that is by introducing an interface responsible > for identifying what portion(s) of a Calc can be moved down to IO layer. > Also, adding following methods to a BeamSqlTable interface to pass necessary > parameters to IO APIs: > - BeamSqlTableFilter constructFilter(List filter) > - ProjectSupport supportsProjects() > - PCollection buildIOReader(PBegin begin, BeamSqlTableFilter filters, > List fieldNames) > > ProjectSupport is an enum with the following options: > * NONE > * WITHOUT_FIELD_REORDERING > * WITH_FIELD_REORDERING > > Design doc > [link|https://docs.google.com/document/d/1-ysD7U7qF3MAmSfkbXZO_5PLJBevAL9bktlLCerd_jE/edit?usp=sharing]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8658) Optionally set artifact staging port in FlinkUberJarJobServer
[ https://issues.apache.org/jira/browse/BEAM-8658?focusedWorklogId=347008&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347008 ] ASF GitHub Bot logged work on BEAM-8658: Author: ASF GitHub Bot Created on: 20/Nov/19 21:12 Start Date: 20/Nov/19 21:12 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10163: [BEAM-8658] [BEAM-8781] Optionally set jar and artifact staging port … URL: https://github.com/apache/beam/pull/10163#issuecomment-556403050 For my reference, does anyone know if we publish nightly snapshots of the Flink job server jar anywhere? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347008) Time Spent: 0.5h (was: 20m) > Optionally set artifact staging port in FlinkUberJarJobServer > - > > Key: BEAM-8658 > URL: https://issues.apache.org/jira/browse/BEAM-8658 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 0.5h > Remaining Estimate: 0h > > In certain network environments, port forwarding is necessary for our GRPC > servers, such as the artifact staging server. Currently, the port for > FlinkUberJarJobServer's artifact staging server is chosen randomly (0). We > will need to let the user choose it if they are to forward that port. > https://github.com/apache/beam/blob/802e7cd86024c21d7b2eeb45f0e7c8e370661610/sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server.py#L129 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-4776) Java PortableRunner should support metrics
[ https://issues.apache.org/jira/browse/BEAM-4776?focusedWorklogId=347007&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347007 ] ASF GitHub Bot logged work on BEAM-4776: Author: ASF GitHub Bot Created on: 20/Nov/19 21:12 Start Date: 20/Nov/19 21:12 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #10105: [BEAM-4776] Add metrics support to Java PortableRunner URL: https://github.com/apache/beam/pull/10105#issuecomment-556402486 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347007) Time Spent: 5h (was: 4h 50m) > Java PortableRunner should support metrics > -- > > Key: BEAM-4776 > URL: https://issues.apache.org/jira/browse/BEAM-4776 > Project: Beam > Issue Type: Bug > Components: runner-core >Reporter: Eugene Kirpichov >Assignee: Michal Walenia >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > BEAM-4775 concerns adding metrics to the JobService API; the current issue is > about making PortableRunner understand them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8747) Remove Unused non-vendored Guava compile dependencies
[ https://issues.apache.org/jira/browse/BEAM-8747?focusedWorklogId=347006&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347006 ] ASF GitHub Bot logged work on BEAM-8747: Author: ASF GitHub Bot Created on: 20/Nov/19 21:10 Start Date: 20/Nov/19 21:10 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10172: [BEAM-8747] Guava dependency cleanup URL: https://github.com/apache/beam/pull/10172#issuecomment-556400195 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347006) Time Spent: 50m (was: 40m) > Remove Unused non-vendored Guava compile dependencies > - > > Key: BEAM-8747 > URL: https://issues.apache.org/jira/browse/BEAM-8747 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Major > Attachments: Guava used as fully-qualified class name.png > > Time Spent: 50m > Remaining Estimate: 0h > > [~kenn] says: > BeamModulePlugin just contains lists of versions to ease coordination across > Beam modules, but mostly does not create dependencies. Most of Beam's modules > only depend on a few things there. For example Guava is not a core > dependency, but here is where it is actually depended upon: > $ find . -name build.gradle | xargs grep library.java.guava > ./sdks/java/core/build.gradle: shadowTest library.java.guava_testlib > ./sdks/java/extensions/sql/jdbc/build.gradle: compile library.java.guava > ./sdks/java/io/google-cloud-platform/build.gradle: compile library.java.guava > ./sdks/java/io/kinesis/build.gradle: testCompile library.java.guava_testlib > These results appear to be misleading. 
Grepping for 'import > com.google.common', I see this as the actual state of things: > - GCP connector does not appear to actually depend on Guava in compile scope > - The Beam SQL JDBC driver does not appear to actually depend on Guava in > compile scope > - The Dataflow Java worker does depend on Guava at compile scope but has > incorrect dependencies (and it probably shouldn't) > - KinesisIO does depend on Guava at compile scope but has incorrect > dependencies (Kinesis libs have Guava on API surface so it is OK here, but > should be correctly declared) > - ZetaSQL translator does depend on Guava at compile scope but has incorrect > dependencies (ZetaSQL has it on API surface so it is OK here, but should be > correctly declared) > We used to have an analysis that prevented this class of error. > Once the errors are fixed, the guava_version is simply a version that we have > discovered that seems to work for both Kinesis and ZetaSQL, libraries we do > not control. Kinesis producer is built against 18.0. Kinesis client against > 26.0-jre. ZetaSQL against 26.0-android. > (or maybe I messed up in my analysis) > Kenn -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8793) installGcpTest task flakes
[ https://issues.apache.org/jira/browse/BEAM-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-8793: -- Summary: installGcpTest task flakes (was: :sdks:python:test-suites:direct:py3x:installGcpTest flakes) > installGcpTest task flakes > -- > > Key: BEAM-8793 > URL: https://issues.apache.org/jira/browse/BEAM-8793 > Project: Beam > Issue Type: Improvement > Components: test-failures >Reporter: Kyle Weaver >Priority: Major > > I've also seen this happen with > :sdks:python:test-suites:portable:py37:installGcpTest. > 11:01:38 > Task :sdks:python:test-suites:direct:py35:installGcpTest FAILED > 11:01:38 Obtaining > file:///home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python > 11:01:38 ERROR: Command errored out with exit status 1: > 11:01:38 command: > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/bin/python3.5 > -c 'import sys, setuptools, tokenize; sys.argv[0] = > '"'"'/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py'"'"'; > > __file__='"'"'/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py'"'"';f=getattr(tokenize, > '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', > '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' > egg_info > 11:01:38 cwd: > /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/ > 11:01:38 Complete output (37 lines): > 11:01:38 Traceback (most recent call last): > 11:01:38 File "", line 1, in > 11:01:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py", > line 264, in > 11:01:38 'test': generate_protos_first(test), > 11:01:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/__init__.py", > line 144, in setup > 11:01:38 
_install_setup_requires(attrs) > 11:01:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/__init__.py", > line 139, in _install_setup_requires > 11:01:38 dist.fetch_build_eggs(dist.setup_requires) > 11:01:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/dist.py", > line 720, in fetch_build_eggs > 11:01:38 replace_conflicting=True, > 11:01:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py", > line 782, in resolve > 11:01:38 replace_conflicting=replace_conflicting > 11:01:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py", > line 1065, in best_match > 11:01:38 return self.obtain(req, installer) > 11:01:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py", > line 1077, in obtain > 11:01:38 return installer(requirement) > 11:01:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/dist.py", > line 787, in fetch_build_egg > 11:01:38 return cmd.easy_install(req) > 11:01:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py", > line 679, in easy_install > 11:01:38 return self.install_item(spec, dist.location, tmpdir, deps) > 11:01:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py", > line 705, in install_item > 11:01:38 dists = 
self.install_eggs(spec, download, tmpdir) > 11:01:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py", > line 855, in install_eggs > 11:01:38 return [self.install_wheel(dist_filename, tmpdir)] > 11:01:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py", > line 1073, in install_wheel > 11:01:38 os.path.dirname(destination) > 11:01:38 File "/usr/lib/python3.5/distutils/cmd.py"
[jira] [Work logged] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()
[ https://issues.apache.org/jira/browse/BEAM-8651?focusedWorklogId=347005&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347005 ] ASF GitHub Bot logged work on BEAM-8651: Author: ASF GitHub Bot Created on: 20/Nov/19 21:03 Start Date: 20/Nov/19 21:03 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10167: [BEAM-8651] Guard pickling operations with a lock to prevent race condition in module imports. URL: https://github.com/apache/beam/pull/10167#issuecomment-556392796 Thanks, @ibzib ! All tests besides Direct Runner tests passed in the previous run: https://scans.gradle.com/s/eptpl337kz4ck. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347005) Time Spent: 2h (was: 1h 50m) > Python 3 portable pipelines sometimes fail with errors in > StockUnpickler.find_class() > - > > Key: BEAM-8651 > URL: https://issues.apache.org/jira/browse/BEAM-8651 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Blocker > Fix For: 2.17.0 > > Attachments: beam8651.py > > Time Spent: 2h > Remaining Estimate: 0h > > Several Beam users [1,2] reported an error which happens on Python 3 in > StockUnpickler.find_class. > So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on Flink > and Dataflow runners. On Dataflow runner so far I have seen this in streaming > pipelines only, which use portable SDK worker. 
> Typical stack trace: > {noformat} > File > "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 1148, in _create_pardo_operation > dofn_data = pickler.loads(serialized_fn) > > File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, > in loads > return dill.loads(s) > > File "python3.5/site-packages/dill/_dill.py", line 317, in loads > > return load(file, ignore) > > File "python3.5/site-packages/dill/_dill.py", line 305, in load > > obj = pik.load() > > File "python3.5/site-packages/dill/_dill.py", line 474, in find_class > > return StockUnpickler.find_class(self, module, name) > > AttributeError: Can't get attribute 'ClassName' on 'python3.5/site-packages/filename.py'> > {noformat} > According to Guenther from [1]: > {quote} > This looks exactly like a race condition that we've encountered on Python > 3.7.1: There's a bug in some older 3.7.x releases that breaks the > thread-safety of the unpickler, as concurrent unpickle threads can access a > module before it has been fully imported. See > https://bugs.python.org/issue34572 for more information. > The traceback shows a Python 3.6 venv so this could be a different issue > (the unpickle bug was introduced in version 3.7). If it's the same bug then > upgrading to Python 3.7.3 or higher should fix that issue. One potential > workaround is to ensure that all of the modules get imported during the > initialization of the sdk_worker, as this bug only affects imports done by > the unpickler. > {quote} > Opening this for visibility. Current open questions are: > 1. Find a minimal example to reproduce this issue. > 2. Figure out whether users are still affected by this issue on Python 3.7.3. > 3. Communicate workarounds for 3.5, 3.6 users affected by this. > [1] > https://lists.apache.org/thread.html/5581ddfcf6d2ae10d25b834b8a61ebee265ffbcf650c6ec8d1e69408@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
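The PR title for BEAM-8651 above describes guarding pickling operations with a lock so that concurrent unpickling threads cannot observe a half-imported module. A minimal sketch of that idea follows. It uses the standard pickle module as a stand-in for dill, and the names _pickle_lock, dumps, loads, and load_concurrently are assumptions for illustration; the actual change in apache_beam/internal/pickler.py may differ.

```python
import pickle
import threading

# One process-wide lock. An RLock lets a pickling hook that re-enters these
# helpers on the same thread proceed without deadlocking.
_pickle_lock = threading.RLock()


def dumps(obj):
    """Serialize obj while holding the shared lock."""
    with _pickle_lock:
        return pickle.dumps(obj)


def loads(data):
    """Deserialize data while holding the shared lock.

    Only one thread at a time can trigger imports from the unpickler.
    """
    with _pickle_lock:
        return pickle.loads(data)


def load_concurrently(payload, num_threads=8):
    """Unpickle the same payload from several threads and collect results."""
    results = [None] * num_threads

    def worker(index):
        results[index] = loads(payload)

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

The trade-off in this sketch is that all (un)pickling is serialized across threads, which exchanges some parallelism for safety.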
[jira] [Updated] (BEAM-8793) :sdks:python:test-suites:direct:py3x:installGcpTest flakes
[ https://issues.apache.org/jira/browse/BEAM-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-8793: -- Description: I've also seen this happen with :sdks:python:test-suites:portable:py37:installGcpTest. 11:01:38 > Task :sdks:python:test-suites:direct:py35:installGcpTest FAILED 11:01:38 Obtaining file:///home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python 11:01:38 ERROR: Command errored out with exit status 1: 11:01:38 command: /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/bin/python3.5 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py'"'"'; __file__='"'"'/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info 11:01:38 cwd: /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/ 11:01:38 Complete output (37 lines): 11:01:38 Traceback (most recent call last): 11:01:38 File "", line 1, in 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py", line 264, in 11:01:38 'test': generate_protos_first(test), 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/__init__.py", line 144, in setup 11:01:38 _install_setup_requires(attrs) 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/__init__.py", line 139, in _install_setup_requires 11:01:38 dist.fetch_build_eggs(dist.setup_requires) 11:01:38 File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/dist.py", line 720, in fetch_build_eggs 11:01:38 replace_conflicting=True, 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py", line 782, in resolve 11:01:38 replace_conflicting=replace_conflicting 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py", line 1065, in best_match 11:01:38 return self.obtain(req, installer) 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py", line 1077, in obtain 11:01:38 return installer(requirement) 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/dist.py", line 787, in fetch_build_egg 11:01:38 return cmd.easy_install(req) 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 679, in easy_install 11:01:38 return self.install_item(spec, dist.location, tmpdir, deps) 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 705, in install_item 11:01:38 dists = self.install_eggs(spec, download, tmpdir) 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 855, in install_eggs 11:01:38 return [self.install_wheel(dist_filename, tmpdir)] 11:01:38 File 
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 1073, in install_wheel 11:01:38 os.path.dirname(destination) 11:01:38 File "/usr/lib/python3.5/distutils/cmd.py", line 336, in execute 11:01:38 util.execute(func, args, msg, dry_run=self.dry_run) 11:01:38 File "/usr/lib/python3.5/distutils/util.py", line 301, in execute 11:01:38 func(*args) 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/wheel.py", line 101, in install_as_egg 11:01:38 self._install_as_egg(destination_eggdir, zf) 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build
[jira] [Work logged] (BEAM-8746) Allow the local job service to work from inside docker
[ https://issues.apache.org/jira/browse/BEAM-8746?focusedWorklogId=347003&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347003 ] ASF GitHub Bot logged work on BEAM-8746: Author: ASF GitHub Bot Created on: 20/Nov/19 21:01 Start Date: 20/Nov/19 21:01 Worklog Time Spent: 10m Work Description: mxm commented on pull request #10161: [BEAM-8746] Make local job service accessible from external machines URL: https://github.com/apache/beam/pull/10161#discussion_r348742690 ## File path: sdks/python/apache_beam/runners/portability/local_job_service.py ## @@ -95,7 +95,7 @@ def create_beam_job(self, preparation_id, job_name, pipeline, options): def start_grpc_server(self, port=0): self._server = grpc.server(UnboundedThreadPoolExecutor()) -port = self._server.add_insecure_port('localhost:%d' % port) +port = self._server.add_insecure_port('[::]:%d' % port) Review comment: Could this not be handled in the subclass? I think the notion of the `LocalJobServer` is not to listen on all interfaces. We could make the bind address configurable. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347003) Time Spent: 1h (was: 50m) > Allow the local job service to work from inside docker > -- > > Key: BEAM-8746 > URL: https://issues.apache.org/jira/browse/BEAM-8746 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Currently the connection is refused. It's a simple fix. -- This message was sent by Atlassian Jira (v8.3.4#803005)
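The suggestion in the review comment above, making the bind address configurable rather than hard-coding `localhost` or `[::]`, could look roughly like the sketch below. The helper name `format_bind_address` and its `host` parameter are hypothetical and not part of Beam; the sketch only builds the address string that would be handed to `add_insecure_port`, and it assumes that a bare IPv6 literal needs to be wrapped in brackets.

```python
def format_bind_address(host='localhost', port=0):
    """Return a 'host:port' string suitable for a server bind call.

    Hypothetical helper for illustration only. The default keeps the
    loopback-only behaviour; a caller that needs to accept connections
    from other machines (for example from inside docker) can pass '::'.
    """
    # A bare IPv6 literal such as '::' contains colons, so it is wrapped
    # in brackets to keep the port separator unambiguous.
    if ':' in host and not host.startswith('['):
        host = '[%s]' % host
    return '%s:%d' % (host, port)


# Default: listen on the loopback interface only, on an OS-chosen port.
local_only = format_bind_address()

# Opt-in: listen on all interfaces on a fixed port.
all_interfaces = format_bind_address('::', 8099)
```

Under this assumption, a method like `start_grpc_server` would take the host as an extra argument and call `add_insecure_port(format_bind_address(host, port))`, so the default stays restricted to the local machine and only a subclass or caller that asks for it listens on all interfaces.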
[jira] [Updated] (BEAM-8792) Bring back the names of the runtime metrics to "runtime"
[ https://issues.apache.org/jira/browse/BEAM-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukasz Gajowy updated BEAM-8792: Status: Open (was: Triage Needed) > Bring back the names of the runtime metrics to "runtime" > > > Key: BEAM-8792 > URL: https://issues.apache.org/jira/browse/BEAM-8792 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Kamil Wasilewski >Priority: Major > > Since this PR (https://github.com/apache/beam/pull/8941), the names of the > runtime metrics defined in Python load test pipelines have changed to a > combination of the metrics namespace and "runtime". This made querying the BigQuery > table containing the results more difficult. The goal is to bring back the > names of the metrics to "runtime" to stay consistent with the previous records. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()
[ https://issues.apache.org/jira/browse/BEAM-8651?focusedWorklogId=347002&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347002 ] ASF GitHub Bot logged work on BEAM-8651: Author: ASF GitHub Bot Created on: 20/Nov/19 21:01 Start Date: 20/Nov/19 21:01 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10167: [BEAM-8651] Guard pickling operations with a lock to prevent race condition in module imports. URL: https://github.com/apache/beam/pull/10167#issuecomment-556390364 Filed https://issues.apache.org/jira/browse/BEAM-8793 for the test flake. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 347002) Time Spent: 1h 40m (was: 1.5h) > Python 3 portable pipelines sometimes fail with errors in > StockUnpickler.find_class() > - > > Key: BEAM-8651 > URL: https://issues.apache.org/jira/browse/BEAM-8651 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Blocker > Fix For: 2.17.0 > > Attachments: beam8651.py > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Several Beam users [1,2] reported an error which happens on Python 3 in > StockUnpickler.find_class. > So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on Flink > and Dataflow runners. On Dataflow runner so far I have seen this in streaming > pipelines only, which use portable SDK worker. 
> Typical stack trace: > {noformat} > File > "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", > line 1148, in _create_pardo_operation > dofn_data = pickler.loads(serialized_fn) > > File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, > in loads > return dill.loads(s) > > File "python3.5/site-packages/dill/_dill.py", line 317, in loads > > return load(file, ignore) > > File "python3.5/site-packages/dill/_dill.py", line 305, in load > > obj = pik.load() > > File "python3.5/site-packages/dill/_dill.py", line 474, in find_class > > return StockUnpickler.find_class(self, module, name) > > AttributeError: Can't get attribute 'ClassName' on 'python3.5/site-packages/filename.py'> > {noformat} > According to Guenther from [1]: > {quote} > This looks exactly like a race condition that we've encountered on Python > 3.7.1: There's a bug in some older 3.7.x releases that breaks the > thread-safety of the unpickler, as concurrent unpickle threads can access a > module before it has been fully imported. See > https://bugs.python.org/issue34572 for more information. > The traceback shows a Python 3.6 venv so this could be a different issue > (the unpickle bug was introduced in version 3.7). If it's the same bug then > upgrading to Python 3.7.3 or higher should fix that issue. One potential > workaround is to ensure that all of the modules get imported during the > initialization of the sdk_worker, as this bug only affects imports done by > the unpickler. > {quote} > Opening this for visibility. Current open questions are: > 1. Find a minimal example to reproduce this issue. > 2. Figure out whether users are still affected by this issue on Python 3.7.3. > 3. Communicate workarounds to 3.5 and 3.6 users affected by this. > [1] > https://lists.apache.org/thread.html/5581ddfcf6d2ae10d25b834b8a61ebee265ffbcf650c6ec8d1e69408@%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
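The approach named in the PR title, guarding pickling operations with a lock, together with the pre-import workaround quoted above, can be sketched as follows. This is a minimal illustration only: it uses the standard-library `pickle` in place of `dill`, and the names `_pickle_lock`, `dumps`, `loads`, and `preimport` are assumptions made for this sketch, not the code in PR #10167.

```python
import importlib
import pickle
import threading

# One process-wide lock (hypothetical name). It is re-entrant so that a
# nested dumps/loads call on the same thread does not deadlock.
_pickle_lock = threading.RLock()


def dumps(obj):
    """Serialize obj while holding the lock."""
    with _pickle_lock:
        return pickle.dumps(obj)


def loads(data):
    """Deserialize data while holding the lock.

    Only one thread at a time runs the unpickler, so module imports
    triggered by unpickling are not raced by another unpickling thread.
    """
    with _pickle_lock:
        return pickle.loads(data)


def preimport(module_names):
    """Import modules eagerly, e.g. during worker start-up.

    Sketch of the quoted workaround: modules that are already imported
    do not need to be imported by the unpickler later.
    """
    return [importlib.import_module(name) for name in module_names]
```

The lock serializes all pickling in the process, which trades some concurrency for safety; the pre-import helper is only useful when the set of modules is known before any bundle is processed.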
[jira] [Work logged] (BEAM-8651) Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()
[ https://issues.apache.org/jira/browse/BEAM-8651?focusedWorklogId=347004&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-347004 ] ASF GitHub Bot logged work on BEAM-8651: Author: ASF GitHub Bot Created on: 20/Nov/19 21:02 Start Date: 20/Nov/19 21:02 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10167: [BEAM-8651] Guard pickling operations with a lock to prevent race condition in module imports. URL: https://github.com/apache/beam/pull/10167#issuecomment-556391540 Run Python 3.5 PostCommit Issue Time Tracking --- Worklog Id: (was: 347004) Time Spent: 1h 50m (was: 1h 40m) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8793) :sdks:python:test-suites:direct:py3x:installGcpTest flakes
Kyle Weaver created BEAM-8793: - Summary: :sdks:python:test-suites:direct:py3x:installGcpTest flakes Key: BEAM-8793 URL: https://issues.apache.org/jira/browse/BEAM-8793 Project: Beam Issue Type: Improvement Components: test-failures Reporter: Kyle Weaver 11:01:38 > Task :sdks:python:test-suites:direct:py35:installGcpTest FAILED 11:01:38 Obtaining file:///home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python 11:01:38 ERROR: Command errored out with exit status 1: 11:01:38 command: /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/bin/python3.5 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py'"'"'; __file__='"'"'/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info 11:01:38 cwd: /home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/ 11:01:38 Complete output (37 lines): 11:01:38 Traceback (most recent call last): 11:01:38 File "<string>", line 1, in <module> 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/sdks/python/setup.py", line 264, in <module> 11:01:38 'test': generate_protos_first(test), 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/__init__.py", line 144, in setup 11:01:38 _install_setup_requires(attrs) 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/__init__.py", line 139, in _install_setup_requires 11:01:38 dist.fetch_build_eggs(dist.setup_requires) 11:01:38 File
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/dist.py", line 720, in fetch_build_eggs 11:01:38 replace_conflicting=True, 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py", line 782, in resolve 11:01:38 replace_conflicting=replace_conflicting 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py", line 1065, in best_match 11:01:38 return self.obtain(req, installer) 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/pkg_resources/__init__.py", line 1077, in obtain 11:01:38 return installer(requirement) 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/dist.py", line 787, in fetch_build_egg 11:01:38 return cmd.easy_install(req) 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 679, in easy_install 11:01:38 return self.install_item(spec, dist.location, tmpdir, deps) 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 705, in install_item 11:01:38 dists = self.install_eggs(spec, download, tmpdir) 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 855, in install_eggs 11:01:38 return [self.install_wheel(dist_filename, tmpdir)] 11:01:38 File
"/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 1073, in install_wheel 11:01:38 os.path.dirname(destination) 11:01:38 File "/usr/lib/python3.5/distutils/cmd.py", line 336, in execute 11:01:38 util.execute(func, args, msg, dry_run=self.dry_run) 11:01:38 File "/usr/lib/python3.5/distutils/util.py", line 301, in execute 11:01:38 func(*args) 11:01:38 File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python35_PR/src/build/gradleenv/1398941889/lib/python3.5/site-packages/setuptools/wheel.py", line 101, in install_as_egg 11:01:38