[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386436&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386436 ]

ASF GitHub Bot logged work on BEAM-9146:

Author: ASF GitHub Bot
Created on: 13/Feb/20 08:11
Start Date: 13/Feb/20 08:11
Worklog Time Spent: 10m

Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585603649

retest this please

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 386436)
Time Spent: 8h 50m (was: 8h 40m)

> [Python] PTransform that integrates Video Intelligence functionality
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
> Issue Type: Sub-task
> Components: io-py-gcp
> Reporter: Kamil Wasilewski
> Assignee: Kamil Wasilewski
> Priority: Major
> Time Spent: 8h 50m
> Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video Intelligence functionality [1].
> The transform should be able to take either a video's GCS location or video data bytes as input.
>
> [1] https://cloud.google.com/video-intelligence/

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386435&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386435 ]

ASF GitHub Bot logged work on BEAM-9146:

Author: ASF GitHub Bot
Created on: 13/Feb/20 08:11
Start Date: 13/Feb/20 08:11
Worklog Time Spent: 10m

Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585603580

I cannot seem to make the tests fail locally so this is proving to be quite a struggle 🙄 But hopefully this should fix it.

Issue Time Tracking
---
Worklog Id: (was: 386435)
Time Spent: 8h 40m (was: 8.5h)

> [Python] PTransform that integrates Video Intelligence functionality
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
> Issue Type: Sub-task
> Components: io-py-gcp
> Reporter: Kamil Wasilewski
> Assignee: Kamil Wasilewski
> Priority: Major
> Time Spent: 8h 40m
> Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video Intelligence functionality [1].
> The transform should be able to take either a video's GCS location or video data bytes as input.
>
> [1] https://cloud.google.com/video-intelligence/
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386437&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386437 ]

ASF GitHub Bot logged work on BEAM-9146:

Author: ASF GitHub Bot
Created on: 13/Feb/20 08:12
Start Date: 13/Feb/20 08:12
Worklog Time Spent: 10m

Work Description: mwalenia commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585603857

retest this please

Issue Time Tracking
---
Worklog Id: (was: 386437)
Time Spent: 9h (was: 8h 50m)

> [Python] PTransform that integrates Video Intelligence functionality
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
> Issue Type: Sub-task
> Components: io-py-gcp
> Reporter: Kamil Wasilewski
> Assignee: Kamil Wasilewski
> Priority: Major
> Time Spent: 9h
> Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video Intelligence functionality [1].
> The transform should be able to take either a video's GCS location or video data bytes as input.
>
> [1] https://cloud.google.com/video-intelligence/
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386440&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386440 ]

ASF GitHub Bot logged work on BEAM-9146:

Author: ASF GitHub Bot
Created on: 13/Feb/20 08:15
Start Date: 13/Feb/20 08:15
Worklog Time Spent: 10m

Work Description: mwalenia commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585605111

retest this please

Issue Time Tracking
---
Worklog Id: (was: 386440)
Time Spent: 9h 10m (was: 9h)

> [Python] PTransform that integrates Video Intelligence functionality
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
> Issue Type: Sub-task
> Components: io-py-gcp
> Reporter: Kamil Wasilewski
> Assignee: Kamil Wasilewski
> Priority: Major
> Time Spent: 9h 10m
> Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video Intelligence functionality [1].
> The transform should be able to take either a video's GCS location or video data bytes as input.
>
> [1] https://cloud.google.com/video-intelligence/
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386441&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386441 ]

ASF GitHub Bot logged work on BEAM-9146:

Author: ASF GitHub Bot
Created on: 13/Feb/20 08:16
Start Date: 13/Feb/20 08:16
Worklog Time Spent: 10m

Work Description: mwalenia commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585603857

retest this please

Issue Time Tracking
---
Worklog Id: (was: 386441)
Time Spent: 9h 20m (was: 9h 10m)

> [Python] PTransform that integrates Video Intelligence functionality
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
> Issue Type: Sub-task
> Components: io-py-gcp
> Reporter: Kamil Wasilewski
> Assignee: Kamil Wasilewski
> Priority: Major
> Time Spent: 9h 20m
> Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video Intelligence functionality [1].
> The transform should be able to take either a video's GCS location or video data bytes as input.
>
> [1] https://cloud.google.com/video-intelligence/
[jira] [Updated] (BEAM-9302) No space left on device - apache-beam-jenkins-7
[ https://issues.apache.org/jira/browse/BEAM-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismaël Mejía updated BEAM-9302:
---
Status: Open (was: Triage Needed)

> No space left on device - apache-beam-jenkins-7
>
> Key: BEAM-9302
> URL: https://issues.apache.org/jira/browse/BEAM-9302
> Project: Beam
> Issue Type: Bug
> Components: build-system
> Reporter: Michał Walenia
> Priority: Blocker
>
> [https://builds.apache.org/job/beam_PreCommit_SQL_Commit/543/consoleFull] log of a failed job with this error
[jira] [Assigned] (BEAM-9302) No space left on device - apache-beam-jenkins-7
[ https://issues.apache.org/jira/browse/BEAM-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismaël Mejía reassigned BEAM-9302:
---
Assignee: Udi Meiri

> No space left on device - apache-beam-jenkins-7
>
> Key: BEAM-9302
> URL: https://issues.apache.org/jira/browse/BEAM-9302
> Project: Beam
> Issue Type: Bug
> Components: build-system
> Reporter: Michał Walenia
> Assignee: Udi Meiri
> Priority: Blocker
>
> [https://builds.apache.org/job/beam_PreCommit_SQL_Commit/543/consoleFull] log of a failed job with this error
[jira] [Resolved] (BEAM-9302) No space left on device - apache-beam-jenkins-7
[ https://issues.apache.org/jira/browse/BEAM-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismaël Mejía resolved BEAM-9302.
---
Fix Version/s: Not applicable
Resolution: Fixed

> No space left on device - apache-beam-jenkins-7
>
> Key: BEAM-9302
> URL: https://issues.apache.org/jira/browse/BEAM-9302
> Project: Beam
> Issue Type: Bug
> Components: build-system
> Reporter: Michał Walenia
> Assignee: Udi Meiri
> Priority: Blocker
> Fix For: Not applicable
>
> [https://builds.apache.org/job/beam_PreCommit_SQL_Commit/543/consoleFull] log of a failed job with this error
[jira] [Commented] (BEAM-9305) Support ValueProvider for BigQuerySource query string
[ https://issues.apache.org/jira/browse/BEAM-9305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036040#comment-17036040 ]

Elias Djurfeldt commented on BEAM-9305:
---
[~udim] is this something that we'd like to support? If so, I can implement it.

> Support ValueProvider for BigQuerySource query string
>
> Key: BEAM-9305
> URL: https://issues.apache.org/jira/browse/BEAM-9305
> Project: Beam
> Issue Type: New Feature
> Components: io-py-gcp
> Reporter: Elias Djurfeldt
> Assignee: Elias Djurfeldt
> Priority: Minor
>
> Users should be able to use ValueProviders for the query string in BigQuerySource.
>
> Ref: [https://stackoverflow.com/questions/60146887/expected-eta-to-avail-pipeline-i-o-and-runtime-parameters-in-apache-beam-gcp-dat/60170614?noredirect=1#comment106464448_60170614]
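The request above hinges on Beam's ValueProvider contract: a pipeline (typically a Dataflow template) is constructed without the concrete query string, and the value is only resolved via `.get()` at run time. Below is a minimal, Beam-independent sketch of that contract; the class names mirror Beam's `StaticValueProvider`/`RuntimeValueProvider`, but this is an illustration of the deferred-resolution idea, not Beam's actual implementation.

```python
class ValueProvider:
    """Interface for values resolved at pipeline run time, not construction time."""

    def get(self):
        raise NotImplementedError


class StaticValueProvider(ValueProvider):
    """Wraps a value already known at pipeline construction time."""

    def __init__(self, value):
        self.value = value

    def get(self):
        return self.value


class RuntimeValueProvider(ValueProvider):
    """Resolved from runtime options; raises if accessed before they exist."""

    runtime_options = None  # populated only when the templated pipeline runs

    def __init__(self, option_name, default=None):
        self.option_name = option_name
        self.default = default

    def get(self):
        if RuntimeValueProvider.runtime_options is None:
            raise RuntimeError(
                '%s is not accessible at pipeline construction time' % self.option_name)
        return RuntimeValueProvider.runtime_options.get(self.option_name, self.default)


def resolve_query(query):
    # A source supporting this feature would accept either a plain string or a
    # ValueProvider, deferring the .get() call until execution.
    return query.get() if isinstance(query, ValueProvider) else query
```

The point for BigQuerySource is the last function: the source must hold on to the ValueProvider and only call `.get()` when the read is actually executed.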
[jira] [Work logged] (BEAM-9265) @RequiresTimeSortedInput does not respect allowedLateness
[ https://issues.apache.org/jira/browse/BEAM-9265?focusedWorklogId=386453&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386453 ]

ASF GitHub Bot logged work on BEAM-9265:

Author: ASF GitHub Bot
Created on: 13/Feb/20 09:02
Start Date: 13/Feb/20 09:02
Worklog Time Spent: 10m

Work Description: dmvk commented on pull request #10795: [BEAM-9265] @RequiresTimeSortedInput respects allowedLateness
URL: https://github.com/apache/beam/pull/10795#discussion_r378724102

## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java
##
@@ -2409,6 +2410,39 @@ public void testRequiresTimeSortedInputWithTestStream() {
   testTimeSortedInput(numElements, pipeline.apply(stream.advanceWatermarkToInfinity()));
 }
+@Test
+@Category({
+  ValidatesRunner.class,
+  UsesStatefulParDo.class,
+  UsesRequiresTimeSortedInput.class,
+  UsesStrictTimerOrdering.class,
+  UsesTestStream.class
+})
+public void testRequiresTimeSortedInputWithLateDataAndAllowedLateness() {
+  // generate list long enough to rule out random shuffle in sorted order
+  int numElements = 1000;
+  List<Long> eventStamps =
+      LongStream.range(0, numElements)
+          .mapToObj(i -> numElements - i)
+          .collect(Collectors.toList());
+  TestStream.Builder<Long> input = TestStream.create(VarLongCoder.of());
+  for (Long stamp : eventStamps) {
+    input = input.addElements(TimestampedValue.of(stamp, Instant.ofEpochMilli(stamp)));
+    if (stamp == 100) {
+      // advance watermark when we have 100 remaining elements

Review comment: 900 remaining elements?

Issue Time Tracking
---
Worklog Id: (was: 386453)
Time Spent: 2h 50m (was: 2h 40m)

> @RequiresTimeSortedInput does not respect allowedLateness
>
> Key: BEAM-9265
> URL: https://issues.apache.org/jira/browse/BEAM-9265
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.20.0
> Reporter: Jan Lukavský
> Assignee: Jan Lukavský
> Priority: Major
> Fix For: 2.20.0
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> Currently, @RequiresTimeSortedInput drops data with respect to allowedLateness, but timers are triggered without respecting it. We have to:
> - drop data that is too late (after allowedLateness)
> - setup timer for _minTimestamp + allowedLateness_
> - hold output watermark at _minTimestamp_
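The three bullet points in the issue description above amount to a buffer-and-flush scheme. Here is a plain-Python sketch of that logic under the stated rules; it is illustrative only (the real implementation lives in Java's `StatefulDoFnRunner` and uses timers and watermark holds rather than an explicit flush call), and all names are hypothetical.

```python
class TimeSortedBuffer:
    """Sketch of @RequiresTimeSortedInput semantics with allowed lateness.

    An arriving element is dropped if its timestamp is behind
    (watermark - allowed_lateness). A flush emits, in timestamp order,
    everything up to (watermark - allowed_lateness): only those elements
    can no longer be preceded by admissible late input.
    """

    def __init__(self, allowed_lateness):
        self.allowed_lateness = allowed_lateness
        self.buffer = []  # (timestamp, element) pairs
        self.watermark = float('-inf')

    def add(self, timestamp, element):
        # Drop data that is too late (after allowedLateness).
        if timestamp < self.watermark - self.allowed_lateness:
            return False
        self.buffer.append((timestamp, element))
        return True

    def advance_watermark(self, new_watermark):
        # Stands in for the flush timer set at minTimestamp + allowedLateness;
        # in the real runner a hold keeps the output watermark at minTimestamp
        # until this fires.
        self.watermark = new_watermark
        flush_bound = new_watermark - self.allowed_lateness
        ready = sorted(t for t in self.buffer if t[0] <= flush_bound)
        self.buffer = [t for t in self.buffer if t[0] > flush_bound]
        return [element for _, element in ready]
```

An element with timestamp `t` is only emitted once the watermark passes `t + allowed_lateness`, since until then late data could still sort before it; that is exactly why the fix sets the flush timer at _minTimestamp + allowedLateness_ instead of _minTimestamp_.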
[jira] [Work logged] (BEAM-9265) @RequiresTimeSortedInput does not respect allowedLateness
[ https://issues.apache.org/jira/browse/BEAM-9265?focusedWorklogId=386454&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386454 ]

ASF GitHub Bot logged work on BEAM-9265:

Author: ASF GitHub Bot
Created on: 13/Feb/20 09:02
Start Date: 13/Feb/20 09:02
Worklog Time Spent: 10m

Work Description: dmvk commented on pull request #10795: [BEAM-9265] @RequiresTimeSortedInput respects allowedLateness
URL: https://github.com/apache/beam/pull/10795#discussion_r378725413

## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/StatefulDoFnRunner.java
##
@@ -252,18 +255,29 @@ private void onSortFlushTimer(BoundedWindow window, Instant timestamp) {
   keep.forEach(sortBuffer::add);
   minStampState.write(newMinStamp);
   if (newMinStamp.isBefore(BoundedWindow.TIMESTAMP_MAX_VALUE)) {
-    setupFlushTimerAndWatermarkHold(namespace, newMinStamp);
+    setupFlushTimerAndWatermarkHold(namespace, window, newMinStamp);
   } else {
     clearWatermarkHold(namespace);
   }
 }
-private void setupFlushTimerAndWatermarkHold(StateNamespace namespace, Instant flush) {
+private void setupFlushTimerAndWatermarkHold(

Review comment: Can you add a javadoc for this method?

Issue Time Tracking
---
Worklog Id: (was: 386454)
Time Spent: 2h 50m (was: 2h 40m)

> @RequiresTimeSortedInput does not respect allowedLateness
>
> Key: BEAM-9265
> URL: https://issues.apache.org/jira/browse/BEAM-9265
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.20.0
> Reporter: Jan Lukavský
> Assignee: Jan Lukavský
> Priority: Major
> Fix For: 2.20.0
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> Currently, @RequiresTimeSortedInput drops data with respect to allowedLateness, but timers are triggered without respecting it. We have to:
> - drop data that is too late (after allowedLateness)
> - setup timer for _minTimestamp + allowedLateness_
> - hold output watermark at _minTimestamp_
[jira] [Work logged] (BEAM-9160) Update AWS SDK to support Kubernetes Pod Level Identity
[ https://issues.apache.org/jira/browse/BEAM-9160?focusedWorklogId=386477&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386477 ]

ASF GitHub Bot logged work on BEAM-9160:

Author: ASF GitHub Bot
Created on: 13/Feb/20 09:24
Start Date: 13/Feb/20 09:24
Worklog Time Spent: 10m

Work Description: iemejia commented on pull request #10846: [BEAM-9160] Removed WebIdentityTokenCredentialsProvider explicit json (de)serialization in AWS2 module
URL: https://github.com/apache/beam/pull/10846

Issue Time Tracking
---
Worklog Id: (was: 386477)
Time Spent: 3.5h (was: 3h 20m)

> Update AWS SDK to support Kubernetes Pod Level Identity
>
> Key: BEAM-9160
> URL: https://issues.apache.org/jira/browse/BEAM-9160
> Project: Beam
> Issue Type: Improvement
> Components: dependencies
> Affects Versions: 2.17.0
> Reporter: Mohamed Noah
> Priority: Major
> Fix For: 2.20.0
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> Many organizations have started leveraging pod-level identity in Kubernetes. The current version of the AWS SDK packaged with Beam 2.17.0 is out of date and doesn't provide native support for pod-level identity access management.
>
> It is recommended that we introduce support to access AWS resources such as S3 using pod-level identity.
>
> Current Version of the AWS Java SDK in Beam:
> def aws_java_sdk_version = "1.11.519"
>
> Proposed AWS Java SDK Version:
> <dependency>
>   <groupId>com.amazonaws</groupId>
>   <artifactId>aws-java-sdk</artifactId>
>   <version>1.11.710</version>
> </dependency>
[jira] [Work logged] (BEAM-9160) Update AWS SDK to support Kubernetes Pod Level Identity
[ https://issues.apache.org/jira/browse/BEAM-9160?focusedWorklogId=386476&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386476 ]

ASF GitHub Bot logged work on BEAM-9160:

Author: ASF GitHub Bot
Created on: 13/Feb/20 09:24
Start Date: 13/Feb/20 09:24
Worklog Time Spent: 10m

Work Description: iemejia commented on issue #10846: [BEAM-9160] Removed WebIdentityTokenCredentialsProvider explicit json (de)serialization in AWS2 module
URL: https://github.com/apache/beam/pull/10846#issuecomment-585631045

Merged manually to squash the commits

Issue Time Tracking
---
Worklog Id: (was: 386476)
Time Spent: 3h 20m (was: 3h 10m)

> Update AWS SDK to support Kubernetes Pod Level Identity
>
> Key: BEAM-9160
> URL: https://issues.apache.org/jira/browse/BEAM-9160
> Project: Beam
> Issue Type: Improvement
> Components: dependencies
> Affects Versions: 2.17.0
> Reporter: Mohamed Noah
> Priority: Major
> Fix For: 2.20.0
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> Many organizations have started leveraging pod-level identity in Kubernetes. The current version of the AWS SDK packaged with Beam 2.17.0 is out of date and doesn't provide native support for pod-level identity access management.
>
> It is recommended that we introduce support to access AWS resources such as S3 using pod-level identity.
>
> Current Version of the AWS Java SDK in Beam:
> def aws_java_sdk_version = "1.11.519"
>
> Proposed AWS Java SDK Version:
> <dependency>
>   <groupId>com.amazonaws</groupId>
>   <artifactId>aws-java-sdk</artifactId>
>   <version>1.11.710</version>
> </dependency>
[jira] [Work logged] (BEAM-9292) Provide an ability to specify additional maven repositories for published POMs
[ https://issues.apache.org/jira/browse/BEAM-9292?focusedWorklogId=386481&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386481 ]

ASF GitHub Bot logged work on BEAM-9292:

Author: ASF GitHub Bot
Created on: 13/Feb/20 09:28
Start Date: 13/Feb/20 09:28
Worklog Time Spent: 10m

Work Description: iemejia commented on pull request #10832: [BEAM-9292] Provide an ability to specify additional maven repositories for published POMs
URL: https://github.com/apache/beam/pull/10832

Issue Time Tracking
---
Worklog Id: (was: 386481)
Time Spent: 1h 10m (was: 1h)

> Provide an ability to specify additional maven repositories for published POMs
>
> Key: BEAM-9292
> URL: https://issues.apache.org/jira/browse/BEAM-9292
> Project: Beam
> Issue Type: Improvement
> Components: build-system, io-java-kafka
> Reporter: Alexey Romanenko
> Assignee: Alexey Romanenko
> Priority: Major
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> To support Confluent Schema Registry, KafkaIO has a dependency on {{io.confluent:kafka-avro-serializer}} from the https://packages.confluent.io/maven/ repository. In this case, it should add this repository into the published KafkaIO POM file. Otherwise, it will fail with the following error during building a user pipeline:
> {code}
> [ERROR] Failed to execute goal on project kafka-io: Could not resolve dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in central (https://repo.maven.apache.org/maven2) -> [Help 1]
> {code}
[jira] [Work logged] (BEAM-9292) Provide an ability to specify additional maven repositories for published POMs
[ https://issues.apache.org/jira/browse/BEAM-9292?focusedWorklogId=386483&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386483 ]

ASF GitHub Bot logged work on BEAM-9292:

Author: ASF GitHub Bot
Created on: 13/Feb/20 09:29
Start Date: 13/Feb/20 09:29
Worklog Time Spent: 10m

Work Description: iemejia commented on issue #10832: [BEAM-9292] Provide an ability to specify additional maven repositories for published POMs
URL: https://github.com/apache/beam/pull/10832#issuecomment-585632888

Merged eagerly because this is breaking the SNAPSHOTs + linkage error analysis

Issue Time Tracking
---
Worklog Id: (was: 386483)
Time Spent: 1h 20m (was: 1h 10m)

> Provide an ability to specify additional maven repositories for published POMs
>
> Key: BEAM-9292
> URL: https://issues.apache.org/jira/browse/BEAM-9292
> Project: Beam
> Issue Type: Improvement
> Components: build-system, io-java-kafka
> Reporter: Alexey Romanenko
> Assignee: Alexey Romanenko
> Priority: Major
> Fix For: 2.20.0
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> To support Confluent Schema Registry, KafkaIO has a dependency on {{io.confluent:kafka-avro-serializer}} from the https://packages.confluent.io/maven/ repository. In this case, it should add this repository into the published KafkaIO POM file. Otherwise, it will fail with the following error during building a user pipeline:
> {code}
> [ERROR] Failed to execute goal on project kafka-io: Could not resolve dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in central (https://repo.maven.apache.org/maven2) -> [Help 1]
> {code}
[jira] [Resolved] (BEAM-9292) Provide an ability to specify additional maven repositories for published POMs
[ https://issues.apache.org/jira/browse/BEAM-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismaël Mejía resolved BEAM-9292.
---
Fix Version/s: 2.20.0
Resolution: Fixed

> Provide an ability to specify additional maven repositories for published POMs
>
> Key: BEAM-9292
> URL: https://issues.apache.org/jira/browse/BEAM-9292
> Project: Beam
> Issue Type: Improvement
> Components: build-system, io-java-kafka
> Reporter: Alexey Romanenko
> Assignee: Alexey Romanenko
> Priority: Major
> Fix For: 2.20.0
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> To support Confluent Schema Registry, KafkaIO has a dependency on {{io.confluent:kafka-avro-serializer}} from the https://packages.confluent.io/maven/ repository. In this case, it should add this repository into the published KafkaIO POM file. Otherwise, it will fail with the following error during building a user pipeline:
> {code}
> [ERROR] Failed to execute goal on project kafka-io: Could not resolve dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in central (https://repo.maven.apache.org/maven2) -> [Help 1]
> {code}
[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK
[ https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=386486&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386486 ]

ASF GitHub Bot logged work on BEAM-9291:

Author: ASF GitHub Bot
Created on: 13/Feb/20 09:32
Start Date: 13/Feb/20 09:32
Worklog Time Spent: 10m

Work Description: stankiewicz commented on issue #10829: [BEAM-9291] Upload graph option in dataflow's python sdk
URL: https://github.com/apache/beam/pull/10829#issuecomment-585634276

retest this please

Issue Time Tracking
---
Worklog Id: (was: 386486)
Time Spent: 1h 50m (was: 1h 40m)

> upload_graph support in Dataflow Python SDK
>
> Key: BEAM-9291
> URL: https://issues.apache.org/jira/browse/BEAM-9291
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Reporter: Radosław Stankiewicz
> Assignee: Radosław Stankiewicz
> Priority: Minor
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> upload_graph option is not supported in Dataflow's Python SDK so there is no workaround for large graphs.
[jira] [Work logged] (BEAM-9265) @RequiresTimeSortedInput does not respect allowedLateness
[ https://issues.apache.org/jira/browse/BEAM-9265?focusedWorklogId=386497&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386497 ]

ASF GitHub Bot logged work on BEAM-9265:

Author: ASF GitHub Bot
Created on: 13/Feb/20 09:49
Start Date: 13/Feb/20 09:49
Worklog Time Spent: 10m

Work Description: je-ik commented on pull request #10795: [BEAM-9265] @RequiresTimeSortedInput respects allowedLateness
URL: https://github.com/apache/beam/pull/10795#discussion_r378753051

## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java
##
@@ -2409,6 +2410,39 @@ public void testRequiresTimeSortedInputWithTestStream() {
   testTimeSortedInput(numElements, pipeline.apply(stream.advanceWatermarkToInfinity()));
 }
+@Test
+@Category({
+  ValidatesRunner.class,
+  UsesStatefulParDo.class,
+  UsesRequiresTimeSortedInput.class,
+  UsesStrictTimerOrdering.class,
+  UsesTestStream.class
+})
+public void testRequiresTimeSortedInputWithLateDataAndAllowedLateness() {
+  // generate list long enough to rule out random shuffle in sorted order
+  int numElements = 1000;
+  List<Long> eventStamps =
+      LongStream.range(0, numElements)
+          .mapToObj(i -> numElements - i)
+          .collect(Collectors.toList());
+  TestStream.Builder<Long> input = TestStream.create(VarLongCoder.of());
+  for (Long stamp : eventStamps) {
+    input = input.addElements(TimestampedValue.of(stamp, Instant.ofEpochMilli(stamp)));
+    if (stamp == 100) {
+      // advance watermark when we have 100 remaining elements

Review comment: 100 because the stamp is descending. The watermark advances past the last 100 elements which should get dropped.

Issue Time Tracking
---
Worklog Id: (was: 386497)
Time Spent: 3h (was: 2h 50m)

> @RequiresTimeSortedInput does not respect allowedLateness
>
> Key: BEAM-9265
> URL: https://issues.apache.org/jira/browse/BEAM-9265
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.20.0
> Reporter: Jan Lukavský
> Assignee: Jan Lukavský
> Priority: Major
> Fix For: 2.20.0
> Time Spent: 3h
> Remaining Estimate: 0h
>
> Currently, @RequiresTimeSortedInput drops data with respect to allowedLateness, but timers are triggered without respecting it. We have to:
> - drop data that is too late (after allowedLateness)
> - setup timer for _minTimestamp + allowedLateness_
> - hold output watermark at _minTimestamp_
[jira] [Updated] (BEAM-9292) Provide an ability to specify additional maven repositories for published POMs
[ https://issues.apache.org/jira/browse/BEAM-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Romanenko updated BEAM-9292:
---
Description:
To support Confluent Schema Registry, KafkaIO has a dependency on {{io.confluent:kafka-avro-serializer}} from the https://packages.confluent.io/maven/ repository. In this case, it should add this repository into the published KafkaIO POM file. Otherwise, it will fail with the following error during building a user pipeline:

{code}
[ERROR] Failed to execute goal on project kafka-io: Could not resolve dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in central (https://repo.maven.apache.org/maven2) -> [Help 1]
{code}

The repositories for publishing can be added via the {{mavenRepositories}} argument in the build script for Java configuration. For example (KafkaIO):

{code}
$ cat sdks/java/io/kafka/build.gradle
...
applyJavaNature(
  ...
  mavenRepositories: [
    [id: 'io.confluent', url: 'https://packages.confluent.io/maven/']
  ]
)
...
{code}

It will generate the following XML snippet in the POM file of the {{beam-sdks-java-io-kafka}} artifact after publishing:

{code}
<repositories>
  <repository>
    <id>io.confluent</id>
    <url>https://packages.confluent.io/maven/</url>
  </repository>
</repositories>
{code}

was:
To support Confluent Schema Registry, KafkaIO has a dependency on {{io.confluent:kafka-avro-serializer}} from the https://packages.confluent.io/maven/ repository. In this case, it should add this repository into the published KafkaIO POM file. Otherwise, it will fail with the following error during building a user pipeline:

{code}
[ERROR] Failed to execute goal on project kafka-io: Could not resolve dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in central (https://repo.maven.apache.org/maven2) -> [Help 1]
{code}

> Provide an ability to specify additional maven repositories for published POMs
>
> Key: BEAM-9292
> URL: https://issues.apache.org/jira/browse/BEAM-9292
> Project: Beam
> Issue Type: Improvement
> Components: build-system, io-java-kafka
> Reporter: Alexey Romanenko
> Assignee: Alexey Romanenko
> Priority: Major
> Fix For: 2.20.0
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> To support Confluent Schema Registry, KafkaIO has a dependency on {{io.confluent:kafka-avro-serializer}} from the https://packages.confluent.io/maven/ repository. In this case, it should add this repository into the published KafkaIO POM file. Otherwise, it will fail with the following error during building a user pipeline:
> {code}
> [ERROR] Failed to execute goal on project kafka-io: Could not resolve dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in central (https://repo.maven.apache.org/maven2) -> [Help 1]
> {code}
> The repositories for publishing can be added via the {{mavenRepositories}} argument in the build script for Java configuration.
> For example (KafkaIO):
> {code}
> $ cat sdks/java/io/kafka/build.gradle
> ...
> applyJavaNature(
>   ...
>   mavenRepositories: [
>     [id: 'io.confluent', url: 'https://packages.confluent.io/maven/']
>   ]
> )
> ...
> {code}
> It will generate the following XML snippet in the POM file of the {{beam-sdks-java-io-kafka}} artifact after publishing:
> {code}
> <repositories>
>   <repository>
>     <id>io.confluent</id>
>     <url>https://packages.confluent.io/maven/</url>
>   </repository>
> </repositories>
> {code}
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386507&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386507 ] ASF GitHub Bot logged work on BEAM-9146: Author: ASF GitHub Bot Created on: 13/Feb/20 10:07 Start Date: 13/Feb/20 10:07 Worklog Time Spent: 10m Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK URL: https://github.com/apache/beam/pull/10764#issuecomment-585648977 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386507) Time Spent: 9.5h (was: 9h 20m) > [Python] PTransform that integrates Video Intelligence functionality > > > Key: BEAM-9146 > URL: https://issues.apache.org/jira/browse/BEAM-9146 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 9.5h > Remaining Estimate: 0h > > The goal is to create a PTransform that integrates Google Cloud Video > Intelligence functionality [1]. > The transform should be able to take both video GCS location or video data > bytes as an input. > [1] https://cloud.google.com/video-intelligence/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality
[ https://issues.apache.org/jira/browse/BEAM-9247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036100#comment-17036100 ] Kamil Wasilewski commented on BEAM-9247: I think in some cases it makes sense to accept both image and image_context as an input, in the form of a tuple. On the other hand, only a few Video Intelligence and Cloud Vision features require additional parameters; in those cases, video_context/image_context is always empty. What about creating two public PTransforms? The first transform could let the user specify one unique context per element, and the second could accept the simpler form of input (without context). The user could then choose between finer control over the input data and simplicity. In terms of implementation, these transforms could use the same DoFn underneath. Let me know what you think. > [Python] PTransform that integrates Cloud Vision functionality > -- > > Key: BEAM-9247 > URL: https://issues.apache.org/jira/browse/BEAM-9247 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Elias Djurfeldt >Priority: Major > > The goal is to create a PTransform that integrates Google Cloud Vision API > functionality [1]. > [1] https://cloud.google.com/vision/docs/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
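The two-transform design proposed in the comment above can be sketched as follows. All names here are hypothetical stand-ins (plain functions rather than real Beam PTransform/DoFn classes); this is only the shape of the idea, not the Beam API.

```python
def _annotate(element, context):
    """Shared per-element logic, analogous to the single DoFn both
    public transforms would wrap. Here it just records its inputs."""
    return {"input": element, "context": context}


def annotate_with_context(pairs):
    """Richer entry point: each element is an (element, context) tuple,
    letting the user specify one unique context per element."""
    return [_annotate(element, context) for element, context in pairs]


def annotate(elements):
    """Simpler entry point: bare elements with an empty context, covering
    the common case where no additional parameters are needed."""
    return [_annotate(element, None) for element in elements]
```

Both entry points funnel into `_annotate`, mirroring the suggestion that the two public PTransforms could share the same DoFn underneath.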
[jira] [Work logged] (BEAM-9292) Provide an ability to specify additional maven repositories for published POMs
[ https://issues.apache.org/jira/browse/BEAM-9292?focusedWorklogId=386509&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386509 ] ASF GitHub Bot logged work on BEAM-9292: Author: ASF GitHub Bot Created on: 13/Feb/20 10:15 Start Date: 13/Feb/20 10:15 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #10832: [BEAM-9292] Provide an ability to specify additional maven repositories for published POMs URL: https://github.com/apache/beam/pull/10832#issuecomment-585652309 Thanks, @iemejia ! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386509) Time Spent: 1.5h (was: 1h 20m) > Provide an ability to specify additional maven repositories for published POMs > -- > > Key: BEAM-9292 > URL: https://issues.apache.org/jira/browse/BEAM-9292 > Project: Beam > Issue Type: Improvement > Components: build-system, io-java-kafka >Reporter: Alexey Romanenko >Assignee: Alexey Romanenko >Priority: Major > Fix For: 2.20.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > To support Confluent Schema Registry, KafkaIO has a dependency on > {{io.confluent:kafka-avro-serializer}} from > https://packages.confluent.io/maven/ repository. In this case, it should add > this repository into published KafkaIO POM file. 
Otherwise, it will fail with > the following error during building a user pipeline: > {code} > [ERROR] Failed to execute goal on project kafka-io: Could not resolve > dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: > Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in > central (https://repo.maven.apache.org/maven2) -> [Help 1] > {code} > The repositories for publishing can be added via the {{mavenRepositories}} > argument in the Java build configuration. > For example (KafkaIO): > {code} > $ cat sdks/java/io/kafka/build.gradle > ... > applyJavaNature( > ... > mavenRepositories: [ > [id: 'io.confluent', url: 'https://packages.confluent.io/maven/'] > ] > ) > ... > {code} > This generates the following XML snippet in the POM file of the > {{beam-sdks-java-io-kafka}} artifact after publishing: > {code} > <repositories> > <repository> > <id>io.confluent</id> > <url>https://packages.confluent.io/maven/</url> > </repository> > </repositories> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8
[ https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386523&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386523 ] ASF GitHub Bot logged work on BEAM-9281: Author: ASF GitHub Bot Created on: 13/Feb/20 10:30 Start Date: 13/Feb/20 10:30 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] Update commons-csv to version 1.8 URL: https://github.com/apache/beam/pull/10818#issuecomment-585658529 Run PythonLint PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386523) Time Spent: 50m (was: 40m) > Update commons-csv to version 1.8 > - > > Key: BEAM-9281 > URL: https://issues.apache.org/jira/browse/BEAM-9281 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8
[ https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386525&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386525 ] ASF GitHub Bot logged work on BEAM-9281: Author: ASF GitHub Bot Created on: 13/Feb/20 10:31 Start Date: 13/Feb/20 10:31 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] Update commons-csv to version 1.8 URL: https://github.com/apache/beam/pull/10818#issuecomment-585658639 Run Python2_PVR_Flink PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386525) Time Spent: 1h 10m (was: 1h) > Update commons-csv to version 1.8 > - > > Key: BEAM-9281 > URL: https://issues.apache.org/jira/browse/BEAM-9281 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8
[ https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386524&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386524 ] ASF GitHub Bot logged work on BEAM-9281: Author: ASF GitHub Bot Created on: 13/Feb/20 10:31 Start Date: 13/Feb/20 10:31 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] Update commons-csv to version 1.8 URL: https://github.com/apache/beam/pull/10818#issuecomment-585658584 Run PythonFormatter PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386524) Time Spent: 1h (was: 50m) > Update commons-csv to version 1.8 > - > > Key: BEAM-9281 > URL: https://issues.apache.org/jira/browse/BEAM-9281 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8
[ https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386526&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386526 ] ASF GitHub Bot logged work on BEAM-9281: Author: ASF GitHub Bot Created on: 13/Feb/20 10:31 Start Date: 13/Feb/20 10:31 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] Update commons-csv to version 1.8 URL: https://github.com/apache/beam/pull/10818#issuecomment-585658720 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386526) Time Spent: 1h 20m (was: 1h 10m) > Update commons-csv to version 1.8 > - > > Key: BEAM-9281 > URL: https://issues.apache.org/jira/browse/BEAM-9281 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9298) Drop support for Flink 1.7
[ https://issues.apache.org/jira/browse/BEAM-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036121#comment-17036121 ] Maximilian Michels commented on BEAM-9298: -- +1 to removing 1.7. It's a good idea to announce this on the mailing list prior to the removal; it will help communicate the change in case any users are still relying on the build. > Drop support for Flink 1.7 > --- > > Key: BEAM-9298 > URL: https://issues.apache.org/jira/browse/BEAM-9298 > Project: Beam > Issue Type: Task > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > > With Flink 1.10 around the corner (more detail can be found in BEAM-9295), we > should consider dropping support for Flink 1.7. Dropping 1.7 will also > decrease the build time. > What do you think? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386530&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386530 ] ASF GitHub Bot logged work on BEAM-9146: Author: ASF GitHub Bot Created on: 13/Feb/20 10:33 Start Date: 13/Feb/20 10:33 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK URL: https://github.com/apache/beam/pull/10764#issuecomment-585659566 @EDjur unfortunately, your trigger phrases won't start the tests - there are Jenkins restrictions that prevent it This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386530) Time Spent: 9h 50m (was: 9h 40m) > [Python] PTransform that integrates Video Intelligence functionality > > > Key: BEAM-9146 > URL: https://issues.apache.org/jira/browse/BEAM-9146 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 9h 50m > Remaining Estimate: 0h > > The goal is to create a PTransform that integrates Google Cloud Video > Intelligence functionality [1]. > The transform should be able to take both video GCS location or video data > bytes as an input. > [1] https://cloud.google.com/video-intelligence/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386529&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386529 ] ASF GitHub Bot logged work on BEAM-9146: Author: ASF GitHub Bot Created on: 13/Feb/20 10:33 Start Date: 13/Feb/20 10:33 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK URL: https://github.com/apache/beam/pull/10764#issuecomment-585659326 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386529) Time Spent: 9h 40m (was: 9.5h) > [Python] PTransform that integrates Video Intelligence functionality > > > Key: BEAM-9146 > URL: https://issues.apache.org/jira/browse/BEAM-9146 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 9h 40m > Remaining Estimate: 0h > > The goal is to create a PTransform that integrates Google Cloud Video > Intelligence functionality [1]. > The transform should be able to take both video GCS location or video data > bytes as an input. > [1] https://cloud.google.com/video-intelligence/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386532&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386532 ] ASF GitHub Bot logged work on BEAM-9146: Author: ASF GitHub Bot Created on: 13/Feb/20 10:37 Start Date: 13/Feb/20 10:37 Worklog Time Spent: 10m Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK URL: https://github.com/apache/beam/pull/10764#issuecomment-585661147 > @EDjur your phrase triggers unfortunately won't trigger the tests - there are Jenkins restrictions that won't let you do it Gotcha. Thanks for triggering them! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386532) Time Spent: 10h (was: 9h 50m) > [Python] PTransform that integrates Video Intelligence functionality > > > Key: BEAM-9146 > URL: https://issues.apache.org/jira/browse/BEAM-9146 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 10h > Remaining Estimate: 0h > > The goal is to create a PTransform that integrates Google Cloud Video > Intelligence functionality [1]. > The transform should be able to take both video GCS location or video data > bytes as an input. > [1] https://cloud.google.com/video-intelligence/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386534&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386534 ] ASF GitHub Bot logged work on BEAM-9146: Author: ASF GitHub Bot Created on: 13/Feb/20 10:37 Start Date: 13/Feb/20 10:37 Worklog Time Spent: 10m Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK URL: https://github.com/apache/beam/pull/10764#issuecomment-585603649 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386534) Time Spent: 10h 20m (was: 10h 10m) > [Python] PTransform that integrates Video Intelligence functionality > > > Key: BEAM-9146 > URL: https://issues.apache.org/jira/browse/BEAM-9146 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 10h 20m > Remaining Estimate: 0h > > The goal is to create a PTransform that integrates Google Cloud Video > Intelligence functionality [1]. > The transform should be able to take both video GCS location or video data > bytes as an input. > [1] https://cloud.google.com/video-intelligence/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386533&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386533 ] ASF GitHub Bot logged work on BEAM-9146: Author: ASF GitHub Bot Created on: 13/Feb/20 10:37 Start Date: 13/Feb/20 10:37 Worklog Time Spent: 10m Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK URL: https://github.com/apache/beam/pull/10764#issuecomment-585648977 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386533) Time Spent: 10h 10m (was: 10h) > [Python] PTransform that integrates Video Intelligence functionality > > > Key: BEAM-9146 > URL: https://issues.apache.org/jira/browse/BEAM-9146 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 10h 10m > Remaining Estimate: 0h > > The goal is to create a PTransform that integrates Google Cloud Video > Intelligence functionality [1]. > The transform should be able to take both video GCS location or video data > bytes as an input. > [1] https://cloud.google.com/video-intelligence/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386535&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386535 ] ASF GitHub Bot logged work on BEAM-9146: Author: ASF GitHub Bot Created on: 13/Feb/20 10:37 Start Date: 13/Feb/20 10:37 Worklog Time Spent: 10m Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK URL: https://github.com/apache/beam/pull/10764#issuecomment-585431841 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386535) Time Spent: 10.5h (was: 10h 20m) > [Python] PTransform that integrates Video Intelligence functionality > > > Key: BEAM-9146 > URL: https://issues.apache.org/jira/browse/BEAM-9146 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 10.5h > Remaining Estimate: 0h > > The goal is to create a PTransform that integrates Google Cloud Video > Intelligence functionality [1]. > The transform should be able to take both video GCS location or video data > bytes as an input. > [1] https://cloud.google.com/video-intelligence/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8
[ https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386540&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386540 ] ASF GitHub Bot logged work on BEAM-9281: Author: ASF GitHub Bot Created on: 13/Feb/20 10:47 Start Date: 13/Feb/20 10:47 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] Update commons-csv to version 1.8 URL: https://github.com/apache/beam/pull/10818#issuecomment-585658639 Run Python2_PVR_Flink PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386540) Time Spent: 2h (was: 1h 50m) > Update commons-csv to version 1.8 > - > > Key: BEAM-9281 > URL: https://issues.apache.org/jira/browse/BEAM-9281 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8
[ https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386537&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386537 ] ASF GitHub Bot logged work on BEAM-9281: Author: ASF GitHub Bot Created on: 13/Feb/20 10:47 Start Date: 13/Feb/20 10:47 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] Update commons-csv to version 1.8 URL: https://github.com/apache/beam/pull/10818#issuecomment-585665000 Seems like this dependency is used only in `sdks/java/extensions/sql`. Can someone from Beam SQL developers take a look? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386537) Time Spent: 1.5h (was: 1h 20m) > Update commons-csv to version 1.8 > - > > Key: BEAM-9281 > URL: https://issues.apache.org/jira/browse/BEAM-9281 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8
[ https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386541&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386541 ] ASF GitHub Bot logged work on BEAM-9281: Author: ASF GitHub Bot Created on: 13/Feb/20 10:47 Start Date: 13/Feb/20 10:47 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] Update commons-csv to version 1.8 URL: https://github.com/apache/beam/pull/10818#issuecomment-585658720 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386541) Time Spent: 2h 10m (was: 2h) > Update commons-csv to version 1.8 > - > > Key: BEAM-9281 > URL: https://issues.apache.org/jira/browse/BEAM-9281 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8
[ https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386539&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386539 ] ASF GitHub Bot logged work on BEAM-9281: Author: ASF GitHub Bot Created on: 13/Feb/20 10:47 Start Date: 13/Feb/20 10:47 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] Update commons-csv to version 1.8 URL: https://github.com/apache/beam/pull/10818#issuecomment-585658584 Run PythonFormatter PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386539) Time Spent: 1h 50m (was: 1h 40m) > Update commons-csv to version 1.8 > - > > Key: BEAM-9281 > URL: https://issues.apache.org/jira/browse/BEAM-9281 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8
[ https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386538&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386538 ] ASF GitHub Bot logged work on BEAM-9281: Author: ASF GitHub Bot Created on: 13/Feb/20 10:47 Start Date: 13/Feb/20 10:47 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] Update commons-csv to version 1.8 URL: https://github.com/apache/beam/pull/10818#issuecomment-585658529 Run PythonLint PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386538) Time Spent: 1h 40m (was: 1.5h) > Update commons-csv to version 1.8 > - > > Key: BEAM-9281 > URL: https://issues.apache.org/jira/browse/BEAM-9281 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9258) [Python] PTransform that connects to Cloud DLP deidentification service
[ https://issues.apache.org/jira/browse/BEAM-9258?focusedWorklogId=386544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386544 ] ASF GitHub Bot logged work on BEAM-9258: Author: ASF GitHub Bot Created on: 13/Feb/20 10:51 Start Date: 13/Feb/20 10:51 Worklog Time Spent: 10m Work Description: mwalenia commented on pull request #10849: [BEAM-9258] Integrate Google Cloud Data loss prevention functionality for Python SDK URL: https://github.com/apache/beam/pull/10849 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Work logged] (BEAM-9258) [Python] PTransform that connects to Cloud DLP deidentification service
[ https://issues.apache.org/jira/browse/BEAM-9258?focusedWorklogId=386545&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386545 ] ASF GitHub Bot logged work on BEAM-9258: Author: ASF GitHub Bot Created on: 13/Feb/20 10:52 Start Date: 13/Feb/20 10:52 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #10849: [BEAM-9258] Integrate Google Cloud Data loss prevention functionality for Python SDK URL: https://github.com/apache/beam/pull/10849#issuecomment-585666957 R: @aaltay Can you take a look at this? Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386545) Time Spent: 20m (was: 10m) > [Python] PTransform that connects to Cloud DLP deidentification service > --- > > Key: BEAM-9258 > URL: https://issues.apache.org/jira/browse/BEAM-9258 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Michał Walenia >Assignee: Michał Walenia >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386556&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386556 ] ASF GitHub Bot logged work on BEAM-9146: Author: ASF GitHub Bot Created on: 13/Feb/20 11:21 Start Date: 13/Feb/20 11:21 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK URL: https://github.com/apache/beam/pull/10764#issuecomment-585678000 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386556) Time Spent: 10h 40m (was: 10.5h) > [Python] PTransform that integrates Video Intelligence functionality > > > Key: BEAM-9146 > URL: https://issues.apache.org/jira/browse/BEAM-9146 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 10h 40m > Remaining Estimate: 0h > > The goal is to create a PTransform that integrates Google Cloud Video > Intelligence functionality [1]. > The transform should be able to take either a video GCS location or video data > bytes as input. > [1] https://cloud.google.com/video-intelligence/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
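The dual input described in the issue (either a GCS location or raw video bytes) can be sketched as plain dispatch logic. This is a hedged illustration only: the helper name `to_video_request` is hypothetical, and the `input_uri`/`input_content` dict keys mirror the Video Intelligence API request fields, not the Beam transform itself.

```python
def to_video_request(element):
    """Map a PCollection element to a Video Intelligence request payload.

    Accepts either raw video bytes or a `gs://` URI string, as the issue
    description requires. Names and dict shape are illustrative assumptions.
    """
    if isinstance(element, bytes):
        # Raw video content passed inline.
        return {'input_content': element}
    if isinstance(element, str) and element.startswith('gs://'):
        # Video stored in GCS, referenced by URI.
        return {'input_uri': element}
    raise ValueError(
        'expected gs:// URI or raw video bytes, got %r' % (element,))
```

A DoFn wrapping the client would call such a helper once per element and forward the resulting request.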
[jira] [Commented] (BEAM-9085) Investigate performance difference between Python 2/3 on Dataflow
[ https://issues.apache.org/jira/browse/BEAM-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036145#comment-17036145 ] Kamil Wasilewski commented on BEAM-9085: Thanks [~tvalentyn], great work! I'll take a look at this in the next couple of days. I'll keep you posted on any progress. > Investigate performance difference between Python 2/3 on Dataflow > - > > Key: BEAM-9085 > URL: https://issues.apache.org/jira/browse/BEAM-9085 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Kamil Wasilewski >Assignee: Valentyn Tymofieiev >Priority: Major > > Tests show that the performance of core Beam operations in Python 3.x on > Dataflow can be a few times slower than in Python 2.7. We should investigate > the cause of the problem. > Currently, we have one ParDo test that is run both in Py3 and Py2 [1]. A > dashboard with runtime results can be found here [2]. > [1] sdks/python/apache_beam/testing/load_tests/pardo_test.py > [2] https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9265) @RequiresTimeSortedInput does not respect allowedLateness
[ https://issues.apache.org/jira/browse/BEAM-9265?focusedWorklogId=386577&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386577 ] ASF GitHub Bot logged work on BEAM-9265: Author: ASF GitHub Bot Created on: 13/Feb/20 11:45 Start Date: 13/Feb/20 11:45 Worklog Time Spent: 10m Work Description: dmvk commented on pull request #10795: [BEAM-9265] @RequiresTimeSortedInput respects allowedLateness URL: https://github.com/apache/beam/pull/10795#discussion_r378811377 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java ## @@ -2409,6 +2410,39 @@ public void testRequiresTimeSortedInputWithTestStream() { testTimeSortedInput(numElements, pipeline.apply(stream.advanceWatermarkToInfinity())); } +@Test +@Category({ + ValidatesRunner.class, + UsesStatefulParDo.class, + UsesRequiresTimeSortedInput.class, + UsesStrictTimerOrdering.class, + UsesTestStream.class +}) +public void testRequiresTimeSortedInputWithLateDataAndAllowedLateness() { + // generate list long enough to rule out random shuffle in sorted order + int numElements = 1000; + List<Long> eventStamps = + LongStream.range(0, numElements) + .mapToObj(i -> numElements - i) + .collect(Collectors.toList()); + TestStream.Builder<Long> input = TestStream.create(VarLongCoder.of()); + for (Long stamp : eventStamps) { +input = input.addElements(TimestampedValue.of(stamp, Instant.ofEpochMilli(stamp))); +if (stamp == 100) { + // advance watermark when we have 100 remaining elements Review comment: 🤦‍♂️ makes sense, thanks for clarification This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386577) Time Spent: 3h 10m (was: 3h) > @RequiresTimeSortedInput does not respect allowedLateness > - > > Key: BEAM-9265 > URL: https://issues.apache.org/jira/browse/BEAM-9265 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.20.0 >Reporter: Jan Lukavský >Assignee: Jan Lukavský >Priority: Major > Fix For: 2.20.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Currently, @RequiresTimeSortedInput drops data with respect to > allowedLateness, but timers are triggered without respecting it. We have to: > - drop data that is too late (after allowedLateness) > - setup timer for _minTimestamp + allowedLateness_ > - hold output watermark at _minTimestamp_ -- This message was sent by Atlassian Jira (v8.3.4#803005)
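The three fixes listed in the issue description (drop data that is too late, set a timer at minTimestamp + allowedLateness, hold the output watermark at minTimestamp) can be sketched as plain timestamp arithmetic. This is a hedged illustration under stated assumptions: function names are hypothetical, timestamps are epoch millis, and this is not the actual Beam Java implementation.

```python
def is_droppably_late(element_ts, input_watermark, allowed_lateness):
    # An element is dropped once the input watermark has passed its
    # timestamp by more than the allowed lateness.
    return element_ts + allowed_lateness < input_watermark

def flush_timer_ts(min_buffered_ts, allowed_lateness):
    # The sort buffer may only flush its earliest entry once no data that
    # is still within allowed lateness can arrive before it.
    return min_buffered_ts + allowed_lateness

def output_watermark_hold(min_buffered_ts):
    # The output watermark is held at the minimum buffered timestamp so
    # that flushed elements are not themselves late downstream.
    return min_buffered_ts
```

With, say, allowedLateness of 200ms, an element stamped 100 is droppable only after the watermark passes 300, and the buffer holding it flushes at 300 while the output watermark stays held at 100.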
[jira] [Work logged] (BEAM-9265) @RequiresTimeSortedInput does not respect allowedLateness
[ https://issues.apache.org/jira/browse/BEAM-9265?focusedWorklogId=386578&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386578 ] ASF GitHub Bot logged work on BEAM-9265: Author: ASF GitHub Bot Created on: 13/Feb/20 11:45 Start Date: 13/Feb/20 11:45 Worklog Time Spent: 10m Work Description: dmvk commented on pull request #10795: [BEAM-9265] @RequiresTimeSortedInput respects allowedLateness URL: https://github.com/apache/beam/pull/10795#discussion_r378811377 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java ## @@ -2409,6 +2410,39 @@ public void testRequiresTimeSortedInputWithTestStream() { testTimeSortedInput(numElements, pipeline.apply(stream.advanceWatermarkToInfinity())); } +@Test +@Category({ + ValidatesRunner.class, + UsesStatefulParDo.class, + UsesRequiresTimeSortedInput.class, + UsesStrictTimerOrdering.class, + UsesTestStream.class +}) +public void testRequiresTimeSortedInputWithLateDataAndAllowedLateness() { + // generate list long enough to rule out random shuffle in sorted order + int numElements = 1000; + List<Long> eventStamps = + LongStream.range(0, numElements) + .mapToObj(i -> numElements - i) + .collect(Collectors.toList()); + TestStream.Builder<Long> input = TestStream.create(VarLongCoder.of()); + for (Long stamp : eventStamps) { +input = input.addElements(TimestampedValue.of(stamp, Instant.ofEpochMilli(stamp))); +if (stamp == 100) { + // advance watermark when we have 100 remaining elements Review comment: 🤦‍♂️ makes sense, thanks for clarification This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386578) Time Spent: 3h 20m (was: 3h 10m) > @RequiresTimeSortedInput does not respect allowedLateness > - > > Key: BEAM-9265 > URL: https://issues.apache.org/jira/browse/BEAM-9265 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.20.0 >Reporter: Jan Lukavský >Assignee: Jan Lukavský >Priority: Major > Fix For: 2.20.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Currently, @RequiresTimeSortedInput drops data with respect to > allowedLateness, but timers are triggered without respecting it. We have to: > - drop data that is too late (after allowedLateness) > - setup timer for _minTimestamp + allowedLateness_ > - hold output watermark at _minTimestamp_ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9298) Drop support for Flink 1.7
[ https://issues.apache.org/jira/browse/BEAM-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036154#comment-17036154 ] sunjincheng commented on BEAM-9298: --- Thank you all! I will do it accordingly after adding the Flink 1.10 build target. :) > Drop support for Flink 1.7 > --- > > Key: BEAM-9298 > URL: https://issues.apache.org/jira/browse/BEAM-9298 > Project: Beam > Issue Type: Task > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > > With Flink 1.10 around the corner (more detail can be found in BEAM-9295), we > should consider dropping support for Flink 1.7. Dropping 1.7 will also > decrease the build time. > What do you think? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9298) Drop support for Flink 1.7
[ https://issues.apache.org/jira/browse/BEAM-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036154#comment-17036154 ] sunjincheng edited comment on BEAM-9298 at 2/13/20 11:46 AM: - I see, thank you all! I will do it accordingly after adding the Flink 1.10 build target. :) was (Author: sunjincheng121): Thank you all! I will do it accordingly after adding the Flink 1.10 build target. :) > Drop support for Flink 1.7 > --- > > Key: BEAM-9298 > URL: https://issues.apache.org/jira/browse/BEAM-9298 > Project: Beam > Issue Type: Task > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > > With Flink 1.10 around the corner (more detail can be found in BEAM-9295), we > should consider dropping support for Flink 1.7. Dropping 1.7 will also > decrease the build time. > What do you think? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9299) Upgrade Flink Runner to 1.8.3 and 1.9.2
[ https://issues.apache.org/jira/browse/BEAM-9299?focusedWorklogId=386587&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386587 ] ASF GitHub Bot logged work on BEAM-9299: Author: ASF GitHub Bot Created on: 13/Feb/20 11:53 Start Date: 13/Feb/20 11:53 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on pull request #10850: [BEAM-9299] Upgrade Flink Runner from 1.8.2 to 1.8.3 URL: https://github.com/apache/beam/pull/10850 Upgrade Flink Runner 1.8.x to 1.8.3 [Standard Beam PR-template table of post-commit test status badges omitted]
[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8
[ https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386605&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386605 ] ASF GitHub Bot logged work on BEAM-9281: Author: ASF GitHub Bot Created on: 13/Feb/20 12:56 Start Date: 13/Feb/20 12:56 Worklog Time Spent: 10m Work Description: iemejia commented on issue #10818: [BEAM-9281] Update commons-csv to version 1.8 URL: https://github.com/apache/beam/pull/10818#issuecomment-585741925 The PR is passing 17 checks in total, including both `Run SQL PreCommit` and `SQL Post Commit Tests`. If there is an error, then we have bad test coverage; what about merging as it is? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386605) Time Spent: 2h 20m (was: 2h 10m) > Update commons-csv to version 1.8 > - > > Key: BEAM-9281 > URL: https://issues.apache.org/jira/browse/BEAM-9281 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality
[ https://issues.apache.org/jira/browse/BEAM-9247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036209#comment-17036209 ] Elias Djurfeldt commented on BEAM-9247: --- How about something like this? [https://github.com/EDjur/beam/commit/cae4f82957017df31a68de36288b74db2616fa78#diff-0cd6945eb0fdc8a93cb9d7fef0859ca0R94] This keeps a single PTransform and DoFn, but accepts {code:java} @typehints.with_input_types( Union[Tuple[Union[text_type, binary_type], vision.types.ImageContext], Union[text_type, binary_type]]){code} Defaults the `image_context` to `None` and unpacks the tuple if needed. Only concern I have is it might be confusing having a PTransform that accepts either a tuple of (image, ImageContext) or just string of image. But this reduces code duplication quite a lot. > [Python] PTransform that integrates Cloud Vision functionality > -- > > Key: BEAM-9247 > URL: https://issues.apache.org/jira/browse/BEAM-9247 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Elias Djurfeldt >Priority: Major > > The goal is to create a PTransform that integrates Google Cloud Vision API > functionality [1]. > [1] https://cloud.google.com/vision/docs/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
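The dispatch described in the comment — accepting either a bare image or an (image, ImageContext) tuple, with `image_context` defaulting to `None` — can be sketched in isolation. The helper below is a hypothetical stand-in for the unpacking inside the DoFn; it is not the actual PR code.

```python
def unpack_image_and_context(element):
    """Normalize the transform's input to an (image, image_context) pair.

    `element` is either (image, image_context) or a bare image, where image
    may be a GCS URI string or raw bytes, per the Union type hint quoted in
    the comment.
    """
    if isinstance(element, tuple):
        image, image_context = element
    else:
        image, image_context = element, None
    return image, image_context
```

A single `process` method can then call this once and pass both values to the Vision client, which is what makes one PTransform suffice for both input shapes.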
[jira] [Comment Edited] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality
[ https://issues.apache.org/jira/browse/BEAM-9247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036209#comment-17036209 ] Elias Djurfeldt edited comment on BEAM-9247 at 2/13/20 1:12 PM: How about something like this? [https://github.com/EDjur/beam/commit/cae4f82957017df31a68de36288b74db2616fa78#diff-0cd6945eb0fdc8a93cb9d7fef0859ca0R94] Have a look at the tests too to see how I envision it working. This keeps a single PTransform and DoFn, but accepts {code:java} @typehints.with_input_types( Union[Tuple[Union[text_type, binary_type], vision.types.ImageContext], Union[text_type, binary_type]]){code} Defaults the `image_context` to `None` and unpacks the tuple if needed. Only concern I have is it might be confusing having a PTransform that accepts either a tuple of (image, ImageContext) or just string of image. But this reduces code duplication quite a lot. was (Author: eliasdjur): How about something like this? [https://github.com/EDjur/beam/commit/cae4f82957017df31a68de36288b74db2616fa78#diff-0cd6945eb0fdc8a93cb9d7fef0859ca0R94] This keeps a single PTransform and DoFn, but accepts {code:java} @typehints.with_input_types( Union[Tuple[Union[text_type, binary_type], vision.types.ImageContext], Union[text_type, binary_type]]){code} Defaults the `image_context` to `None` and unpacks the tuple if needed. Only concern I have is it might be confusing having a PTransform that accepts either a tuple of (image, ImageContext) or just string of image. But this reduces code duplication quite a lot. > [Python] PTransform that integrates Cloud Vision functionality > -- > > Key: BEAM-9247 > URL: https://issues.apache.org/jira/browse/BEAM-9247 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Elias Djurfeldt >Priority: Major > > The goal is to create a PTransform that integrates Google Cloud Vision API > functionality [1]. 
> [1] https://cloud.google.com/vision/docs/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386614&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386614 ] ASF GitHub Bot logged work on BEAM-9146: Author: ASF GitHub Bot Created on: 13/Feb/20 13:19 Start Date: 13/Feb/20 13:19 Worklog Time Spent: 10m Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK URL: https://github.com/apache/beam/pull/10764#issuecomment-585750091 I'm having a small discussion regarding the `video_context` with @kamilwu here: https://issues.apache.org/jira/browse/BEAM-9247 Perhaps this is something we should look at integrating into this PR before merging too (or after, considering the use case seems small). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386614) Time Spent: 10h 50m (was: 10h 40m) > [Python] PTransform that integrates Video Intelligence functionality > > > Key: BEAM-9146 > URL: https://issues.apache.org/jira/browse/BEAM-9146 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 10h 50m > Remaining Estimate: 0h > > The goal is to create a PTransform that integrates Google Cloud Video > Intelligence functionality [1]. > The transform should be able to take either a video GCS location or video data > bytes as input. > [1] https://cloud.google.com/video-intelligence/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality
[ https://issues.apache.org/jira/browse/BEAM-9247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036209#comment-17036209 ] Elias Djurfeldt edited comment on BEAM-9247 at 2/13/20 1:31 PM: How about something like this? [https://github.com/EDjur/beam/commit/cae4f82957017df31a68de36288b74db2616fa78#diff-0cd6945eb0fdc8a93cb9d7fef0859ca0R94] Have a look at the tests too to see how I envision it working. This keeps a single PTransform and DoFn, but accepts {code:java} @typehints.with_input_types( Union[Tuple[Union[text_type, binary_type], vision.types.ImageContext], Union[text_type, binary_type]]){code} Defaults the `image_context` to `None` and unpacks the tuple if needed. Only concern I have is it might be confusing having a PTransform that accepts either a tuple of (image, ImageContext) or just Union[string, bytes] of image. But this reduces code duplication quite a lot. was (Author: eliasdjur): How about something like this? [https://github.com/EDjur/beam/commit/cae4f82957017df31a68de36288b74db2616fa78#diff-0cd6945eb0fdc8a93cb9d7fef0859ca0R94] Have a look at the tests too to see how I envision it working. This keeps a single PTransform and DoFn, but accepts {code:java} @typehints.with_input_types( Union[Tuple[Union[text_type, binary_type], vision.types.ImageContext], Union[text_type, binary_type]]){code} Defaults the `image_context` to `None` and unpacks the tuple if needed. Only concern I have is it might be confusing having a PTransform that accepts either a tuple of (image, ImageContext) or just string of image. But this reduces code duplication quite a lot. 
> [Python] PTransform that integrates Cloud Vision functionality > -- > > Key: BEAM-9247 > URL: https://issues.apache.org/jira/browse/BEAM-9247 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Elias Djurfeldt >Priority: Major > > The goal is to create a PTransform that integrates Google Cloud Vision API > functionality [1]. > [1] https://cloud.google.com/vision/docs/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality
[ https://issues.apache.org/jira/browse/BEAM-9247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036209#comment-17036209 ] Elias Djurfeldt edited comment on BEAM-9247 at 2/13/20 1:31 PM: How about something like this? [https://github.com/EDjur/beam/commit/cae4f82957017df31a68de36288b74db2616fa78#diff-0cd6945eb0fdc8a93cb9d7fef0859ca0R94] Have a look at the tests too to see how I envision it working. This keeps a single PTransform and DoFn, but accepts {code:java} @typehints.with_input_types( Union[Tuple[Union[text_type, binary_type], vision.types.ImageContext], Union[text_type, binary_type]]){code} Defaults the `image_context` to `None` and unpacks the tuple if needed. Only concern I have is it might be confusing having a PTransform that accepts either a tuple of (Union[string, bytes], ImageContext) or just Union[string, bytes] of image. But this reduces code duplication quite a lot. was (Author: eliasdjur): How about something like this? [https://github.com/EDjur/beam/commit/cae4f82957017df31a68de36288b74db2616fa78#diff-0cd6945eb0fdc8a93cb9d7fef0859ca0R94] Have a look at the tests too to see how I envision it working. This keeps a single PTransform and DoFn, but accepts {code:java} @typehints.with_input_types( Union[Tuple[Union[text_type, binary_type], vision.types.ImageContext], Union[text_type, binary_type]]){code} Defaults the `image_context` to `None` and unpacks the tuple if needed. Only concern I have is it might be confusing having a PTransform that accepts either a tuple of (image, ImageContext) or just Union[string, bytes] of image. But this reduces code duplication quite a lot. 
> [Python] PTransform that integrates Cloud Vision functionality > -- > > Key: BEAM-9247 > URL: https://issues.apache.org/jira/browse/BEAM-9247 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Elias Djurfeldt >Priority: Major > > The goal is to create a PTransform that integrates Google Cloud Vision API > functionality [1]. > [1] https://cloud.google.com/vision/docs/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8
[ https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386638 ] ASF GitHub Bot logged work on BEAM-9281: Author: ASF GitHub Bot Created on: 13/Feb/20 13:57 Start Date: 13/Feb/20 13:57 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on pull request #10818: [BEAM-9281] Update commons-csv to version 1.8 URL: https://github.com/apache/beam/pull/10818 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386638) Time Spent: 2.5h (was: 2h 20m) > Update commons-csv to version 1.8 > - > > Key: BEAM-9281 > URL: https://issues.apache.org/jira/browse/BEAM-9281 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9281) Update commons-csv to version 1.8
[ https://issues.apache.org/jira/browse/BEAM-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Romanenko resolved BEAM-9281. Resolution: Fixed > Update commons-csv to version 1.8 > - > > Key: BEAM-9281 > URL: https://issues.apache.org/jira/browse/BEAM-9281 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality
[ https://issues.apache.org/jira/browse/BEAM-9247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036296#comment-17036296 ] Kamil Wasilewski commented on BEAM-9247: If it reduces code duplication significantly, then I'm fine with having only one PTransform. Please make sure that the PTransform has clear documentation on what the expected input types are and what the difference between them is. > [Python] PTransform that integrates Cloud Vision functionality > -- > > Key: BEAM-9247 > URL: https://issues.apache.org/jira/browse/BEAM-9247 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Elias Djurfeldt >Priority: Major > > The goal is to create a PTransform that integrates Google Cloud Vision API > functionality [1]. > [1] https://cloud.google.com/vision/docs/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-9248) [Python] PTransform that integrates Cloud Natural Language functionality
[ https://issues.apache.org/jira/browse/BEAM-9248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamil Wasilewski reassigned BEAM-9248: -- Assignee: Michał Walenia > [Python] PTransform that integrates Cloud Natural Language functionality > > > Key: BEAM-9248 > URL: https://issues.apache.org/jira/browse/BEAM-9248 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Michał Walenia >Priority: Major > > The goal is to create a PTransform that integrates Google Cloud Natural > Language API functionality [1]. > [1] https://cloud.google.com/natural-language/docs/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9301) Check in beam-linkage-check.sh
[ https://issues.apache.org/jira/browse/BEAM-9301?focusedWorklogId=386702&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386702 ] ASF GitHub Bot logged work on BEAM-9301: Author: ASF GitHub Bot Created on: 13/Feb/20 15:24 Start Date: 13/Feb/20 15:24 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10841: [BEAM-9301] Check in beam-linkage-check.sh URL: https://github.com/apache/beam/pull/10841#discussion_r378930051 ## File path: sdks/java/build-tools/beam-linkage-check.sh ## @@ -0,0 +1,99 @@ +#!/bin/bash Review comment: No issues, the shebang should cover it so ok for me This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386702) Time Spent: 40m (was: 0.5h) > Check in beam-linkage-check.sh > -- > > Key: BEAM-9301 > URL: https://issues.apache.org/jira/browse/BEAM-9301 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > https://github.com/apache/beam/pull/10769#issuecomment-584571787 > bq. @suztomo can you contribute this script maybe into Beam's build-tools > directory so we can improve it a bit for further use? > This is a temporary solution before exclusion rules in Linkage Checker > (BEAM-9206) are implemented. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.
[ https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=386705&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386705 ] ASF GitHub Bot logged work on BEAM-6522: Author: ASF GitHub Bot Created on: 13/Feb/20 15:26 Start Date: 13/Feb/20 15:26 Worklog Time Spent: 10m Work Description: lazylynx commented on pull request #10838: [BEAM-6522] [BEAM-7455] Unskip Avro IO tests that are now passing. URL: https://github.com/apache/beam/pull/10838#discussion_r378930566 ## File path: sdks/python/apache_beam/examples/fastavro_it_test.py ## @@ -181,7 +175,7 @@ def assertEqual(l, r): if l != r: raise BeamAssertException('Assertion failed: %s == %s' % (l, r)) -assertEqual(v.keys(), ['avro', 'fastavro']) +assertEqual(list(v.keys()), ['avro', 'fastavro']) Review comment: We should use `sorted` instead of `list` if tests run in Python 2.7/3.5/3.6. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386705) Time Spent: 10h (was: 9h 50m) > Dill fails to pickle avro.RecordSchema classes on Python 3. > > > Key: BEAM-6522 > URL: https://issues.apache.org/jira/browse/BEAM-6522 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: 2.16.0 > > Time Spent: 10h > Remaining Estimate: 0h > > The avroio module still has 4 failing tests. This is actually 2 times the > same 2 tests, both for Avro and Fastavro. 
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform* > *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform* > fail with: > {code:java} > Traceback (most recent call last): > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", > line 432, in test_sink_transform > | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line > 528, in expand > return pcoll | beam.io.iobase.Write(self._sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line > 960, in expand > return pcoll | WriteImpl(self.sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File 
"/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line > 979, in expand > lambda _, sink: sink.initialize_write(), self.sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 1103, in Map > pardo = FlatMap(wrapper, *args, **kwargs) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 1054, in FlatMap > pardo = ParDo(Callable
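The review exchange above (use `sorted` instead of `list` when the tests may run on Python 2.7/3.5/3.6) can be illustrated with a small standalone sketch. The helper below mirrors the test's `assertEqual`, and the sample dict is invented for illustration, not taken from the actual test data:

```python
def assert_equal(left, right):
    # Mirrors the assertEqual helper in fastavro_it_test.py.
    if left != right:
        raise AssertionError('Assertion failed: %s == %s' % (left, right))

# On Python versions without guaranteed dict ordering (< 3.7), comparing
# list(v.keys()) against a fixed list can fail depending on hashing or
# insertion order. Sorting first makes the comparison deterministic on
# any interpreter.
v = {'fastavro': ['rec2'], 'avro': ['rec1']}
assert_equal(sorted(v.keys()), ['avro', 'fastavro'])
```

On CPython 3.7+, `list(v.keys())` here would be `['fastavro', 'avro']` (insertion order), so the `sorted` call is what keeps the assertion stable.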
[jira] [Work logged] (BEAM-9301) Check in beam-linkage-check.sh
[ https://issues.apache.org/jira/browse/BEAM-9301?focusedWorklogId=386712&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386712 ] ASF GitHub Bot logged work on BEAM-9301: Author: ASF GitHub Bot Created on: 13/Feb/20 15:31 Start Date: 13/Feb/20 15:31 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10841: [BEAM-9301] Check in beam-linkage-check.sh URL: https://github.com/apache/beam/pull/10841 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386712) Time Spent: 50m (was: 40m) > Check in beam-linkage-check.sh > -- > > Key: BEAM-9301 > URL: https://issues.apache.org/jira/browse/BEAM-9301 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > https://github.com/apache/beam/pull/10769#issuecomment-584571787 > bq. @suztomo can you contribute this script maybe into Beam's build-tools > directory so we can improve it a bit for further use? > This is a temporary solution before exclusion rules in Linkage Checker > (BEAM-9206) are implemented. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9301) Check in beam-linkage-check.sh
[ https://issues.apache.org/jira/browse/BEAM-9301?focusedWorklogId=386713&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386713 ] ASF GitHub Bot logged work on BEAM-9301: Author: ASF GitHub Bot Created on: 13/Feb/20 15:32 Start Date: 13/Feb/20 15:32 Worklog Time Spent: 10m Work Description: iemejia commented on issue #10841: [BEAM-9301] Check in beam-linkage-check.sh URL: https://github.com/apache/beam/pull/10841#issuecomment-585817360 Thanks it will be really useful to have this here so we can evolve it together if needed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386713) Time Spent: 1h (was: 50m) > Check in beam-linkage-check.sh > -- > > Key: BEAM-9301 > URL: https://issues.apache.org/jira/browse/BEAM-9301 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > https://github.com/apache/beam/pull/10769#issuecomment-584571787 > bq. @suztomo can you contribute this script maybe into Beam's build-tools > directory so we can improve it a bit for further use? > This is a temporary solution before exclusion rules in Linkage Checker > (BEAM-9206) are implemented. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9301) Check in beam-linkage-check.sh
[ https://issues.apache.org/jira/browse/BEAM-9301?focusedWorklogId=386714&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386714 ] ASF GitHub Bot logged work on BEAM-9301: Author: ASF GitHub Bot Created on: 13/Feb/20 15:33 Start Date: 13/Feb/20 15:33 Worklog Time Spent: 10m Work Description: iemejia commented on issue #10841: [BEAM-9301] Check in beam-linkage-check.sh URL: https://github.com/apache/beam/pull/10841#issuecomment-585817896 Oops, first suggestion of improvement: detect eagerly whether master is checked out, to not waste time before failing This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386714) Time Spent: 1h 10m (was: 1h) > Check in beam-linkage-check.sh > -- > > Key: BEAM-9301 > URL: https://issues.apache.org/jira/browse/BEAM-9301 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > https://github.com/apache/beam/pull/10769#issuecomment-584571787 > bq. @suztomo can you contribute this script maybe into Beam's build-tools > directory so we can improve it a bit for further use? > This is a temporary solution before exclusion rules in Linkage Checker > (BEAM-9206) are implemented. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK
[ https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=386717&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386717 ] ASF GitHub Bot logged work on BEAM-9291: Author: ASF GitHub Bot Created on: 13/Feb/20 15:41 Start Date: 13/Feb/20 15:41 Worklog Time Spent: 10m Work Description: stankiewicz commented on issue #10829: [BEAM-9291] Upload graph option in dataflow's python sdk URL: https://github.com/apache/beam/pull/10829#issuecomment-585821888 I've squashed commits. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386717) Time Spent: 2h 10m (was: 2h) > upload_graph support in Dataflow Python SDK > --- > > Key: BEAM-9291 > URL: https://issues.apache.org/jira/browse/BEAM-9291 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Radosław Stankiewicz >Assignee: Radosław Stankiewicz >Priority: Minor > Time Spent: 2h 10m > Remaining Estimate: 0h > > upload_graph option is not supported in Dataflow's Python SDK so there is no > workaround for large graphs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
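For context, `upload_graph` is passed to Dataflow like any other `--experiments` value. A minimal sketch of assembling such pipeline arguments — the helper function, project id, and bucket below are placeholders for illustration, not part of the SDK:

```python
def dataflow_args(project, temp_location, experiments):
    """Build a Dataflow argv list with the given experiments enabled."""
    args = [
        '--runner=DataflowRunner',
        '--project=%s' % project,              # placeholder project id
        '--temp_location=%s' % temp_location,  # placeholder GCS bucket
    ]
    # Each experiment is passed as its own --experiments flag.
    args += ['--experiments=%s' % e for e in experiments]
    return args

args = dataflow_args('my-project', 'gs://my-bucket/tmp', ['upload_graph'])
```

With `upload_graph` enabled, the intent (per this issue) is that the runner uploads the job graph separately rather than embedding it in the job request, which matters for very large graphs.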
[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.
[ https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=386718&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386718 ] ASF GitHub Bot logged work on BEAM-6522: Author: ASF GitHub Bot Created on: 13/Feb/20 15:41 Start Date: 13/Feb/20 15:41 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #10838: [BEAM-6522] [BEAM-7455] Unskip Avro IO tests that are now passing. URL: https://github.com/apache/beam/pull/10838#discussion_r378941516 ## File path: sdks/python/apache_beam/examples/fastavro_it_test.py ## @@ -181,7 +175,7 @@ def assertEqual(l, r): if l != r: raise BeamAssertException('Assertion failed: %s == %s' % (l, r)) -assertEqual(v.keys(), ['avro', 'fastavro']) +assertEqual(list(v.keys()), ['avro', 'fastavro']) Review comment: Good point, fixed. Thanks! PTAL. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386718) Time Spent: 10h 10m (was: 10h) > Dill fails to pickle avro.RecordSchema classes on Python 3. > > > Key: BEAM-6522 > URL: https://issues.apache.org/jira/browse/BEAM-6522 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: 2.16.0 > > Time Spent: 10h 10m > Remaining Estimate: 0h > > The avroio module still has 4 failing tests. This is actually 2 times the > same 2 tests, both for Avro and Fastavro. 
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform* > *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform* > fail with: > {code:java} > Traceback (most recent call last): > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", > line 432, in test_sink_transform > | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line > 528, in expand > return pcoll | beam.io.iobase.Write(self._sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line > 960, in expand > return pcoll | WriteImpl(self.sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File 
"/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line > 979, in expand > lambda _, sink: sink.initialize_write(), self.sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 1103, in Map > pardo = FlatMap(wrapper, *args, **kwargs) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 1054, in FlatMap > pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs) > Fi
[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK
[ https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=386716&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386716 ] ASF GitHub Bot logged work on BEAM-9291: Author: ASF GitHub Bot Created on: 13/Feb/20 15:41 Start Date: 13/Feb/20 15:41 Worklog Time Spent: 10m Work Description: stankiewicz commented on issue #10829: [BEAM-9291] Upload graph option in dataflow's python sdk URL: https://github.com/apache/beam/pull/10829#issuecomment-585821810 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386716) Time Spent: 2h (was: 1h 50m) > upload_graph support in Dataflow Python SDK > --- > > Key: BEAM-9291 > URL: https://issues.apache.org/jira/browse/BEAM-9291 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Radosław Stankiewicz >Assignee: Radosław Stankiewicz >Priority: Minor > Time Spent: 2h > Remaining Estimate: 0h > > upload_graph option is not supported in Dataflow's Python SDK so there is no > workaround for large graphs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.
[ https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=386721&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386721 ] ASF GitHub Bot logged work on BEAM-6522: Author: ASF GitHub Bot Created on: 13/Feb/20 15:54 Start Date: 13/Feb/20 15:54 Worklog Time Spent: 10m Work Description: lazylynx commented on issue #10838: [BEAM-6522] [BEAM-7455] Unskip Avro IO tests that are now passing. URL: https://github.com/apache/beam/pull/10838#issuecomment-585829022 LGTM! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386721) Time Spent: 10h 20m (was: 10h 10m) > Dill fails to pickle avro.RecordSchema classes on Python 3. > > > Key: BEAM-6522 > URL: https://issues.apache.org/jira/browse/BEAM-6522 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: 2.16.0 > > Time Spent: 10h 20m > Remaining Estimate: 0h > > The avroio module still has 4 failing tests. This is actually 2 times the > same 2 tests, both for Avro and Fastavro. 
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform* > *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform* > fail with: > {code:java} > Traceback (most recent call last): > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", > line 432, in test_sink_transform > | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line > 528, in expand > return pcoll | beam.io.iobase.Write(self._sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line > 960, in expand > return pcoll | WriteImpl(self.sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File 
"/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line > 979, in expand > lambda _, sink: sink.initialize_write(), self.sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 1103, in Map > pardo = FlatMap(wrapper, *args, **kwargs) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 1054, in FlatMap > pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 864, in __init__ > super(ParDo, self).__init__(fn, *args, **kwargs) > File > "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py", > line 646, in __init__ > self.args = pickler.loads(pickler.dumps(self.args)) > File > "/home/robbe/workspace/beam/sdks/python/apache_beam/internal/pickle
[jira] [Created] (BEAM-9307) new spark runner: put windows inside the key to avoid having all values for the same key in memory
Etienne Chauchot created BEAM-9307: -- Summary: new spark runner: put windows inside the key to avoid having all values for the same key in memory Key: BEAM-9307 URL: https://issues.apache.org/jira/browse/BEAM-9307 Project: Beam Issue Type: Improvement Components: runner-spark Reporter: Etienne Chauchot Like it was done for the current runner. See: [https://www.youtube.com/watch?v=ZIFtmx8nBow&t=721s] min 10 -- This message was sent by Atlassian Jira (v8.3.4#803005)
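The proposed change — moving the window into the grouping key — can be sketched in plain Python. This is a toy model of the idea, not the Spark runner's actual code: grouping by the composite `(key, window)` means each group only holds one window's values, instead of every window's values for a key at once.

```python
from collections import defaultdict

def group_by_key_and_window(elements):
    """Group (key, window, value) triples by the composite (key, window).

    Keying by (key, window) instead of key alone bounds each group to a
    single window's values, so a grouping step never needs all values
    for a key across all windows in memory at the same time.
    """
    grouped = defaultdict(list)
    for key, window, value in elements:
        grouped[(key, window)].append(value)
    return dict(grouped)

groups = group_by_key_and_window([('k', 0, 'a'), ('k', 1, 'b'), ('k', 0, 'c')])
```

Here the two windows of key `'k'` land in separate groups, `('k', 0)` and `('k', 1)`, rather than one combined group for `'k'`.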
[jira] [Created] (BEAM-9308) Optimize state cleanup at end-of-window
Steve Niemitz created BEAM-9308: --- Summary: Optimize state cleanup at end-of-window Key: BEAM-9308 URL: https://issues.apache.org/jira/browse/BEAM-9308 Project: Beam Issue Type: Improvement Components: runner-dataflow Reporter: Steve Niemitz Assignee: Steve Niemitz When using state with a large keyspace, you can end up with a large amount of state cleanup timers set to fire all 1ms after the end of a window. This can cause a momentary (I've observed 1-3 minute) lag in processing while windmill and the java harness fire and process these cleanup timers. By spreading the firing over a short period after the end of the window, we can decorrelate the firing of the timers and smooth the load out, resulting in much less impact from state cleanup. -- This message was sent by Atlassian Jira (v8.3.4#803005)
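The decorrelation idea described above can be sketched as a deterministic per-key offset added to the end-of-window cleanup timestamp. The function and parameter names (and the hashing scheme) are illustrative, not Dataflow's actual implementation:

```python
import hashlib

def cleanup_timer_ts(window_end_ms, key, spread_ms=60000):
    """Spread state-cleanup timers over spread_ms after window end.

    Instead of firing every key's cleanup timer at window_end + 1ms,
    derive a stable per-key offset so firings are spread out and the
    load on the runner is smoothed. Hashing keeps the offset
    deterministic for a given key.
    """
    digest = hashlib.sha1(key.encode('utf-8')).digest()
    offset = int.from_bytes(digest[:4], 'big') % spread_ms
    return window_end_ms + 1 + offset
```

Because the offset is derived from the key, the same key always gets the same firing time, so retries and replays stay consistent while the aggregate timer load is spread over the window-end-plus-`spread_ms` interval.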
[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.
[ https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=386734&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386734 ] ASF GitHub Bot logged work on BEAM-6522: Author: ASF GitHub Bot Created on: 13/Feb/20 16:33 Start Date: 13/Feb/20 16:33 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #10838: [BEAM-6522] [BEAM-7455] Unskip Avro IO tests that are now passing. URL: https://github.com/apache/beam/pull/10838 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386734) Time Spent: 10.5h (was: 10h 20m) > Dill fails to pickle avro.RecordSchema classes on Python 3. > > > Key: BEAM-6522 > URL: https://issues.apache.org/jira/browse/BEAM-6522 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: 2.16.0 > > Time Spent: 10.5h > Remaining Estimate: 0h > > The avroio module still has 4 failing tests. This is actually 2 times the > same 2 tests, both for Avro and Fastavro. 
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform* > *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform* > fail with: > {code:java} > Traceback (most recent call last): > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", > line 432, in test_sink_transform > | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line > 528, in expand > return pcoll | beam.io.iobase.Write(self._sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line > 960, in expand > return pcoll | WriteImpl(self.sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File 
"/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line > 979, in expand > lambda _, sink: sink.initialize_write(), self.sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 1103, in Map > pardo = FlatMap(wrapper, *args, **kwargs) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 1054, in FlatMap > pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 864, in __init__ > super(ParDo, self).__init__(fn, *args, **kwargs) > File > "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py", > line 646, in __init__ > self.args = pickler.loads(pickler.dumps(self.args)) > File > "/home/robbe/workspace/beam/sdks/python/apache_beam/internal/pickler.py", > line 247, in l
[jira] [Commented] (BEAM-7455) Improve Avro IO integration test coverage on Python 3.
[ https://issues.apache.org/jira/browse/BEAM-7455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036345#comment-17036345 ] Valentyn Tymofieiev commented on BEAM-7455: --- With #9466, Py3 integration test coverage of Avro IO matches Py2 test coverage. This is done via fastavro_it_test.py, which runs a pipeline using Avro and FastAvro. If/when we drop the avro dependency, we may need to update this test. > Improve Avro IO integration test coverage on Python 3. > -- > > Key: BEAM-7455 > URL: https://issues.apache.org/jira/browse/BEAM-7455 > Project: Beam > Issue Type: Sub-task > Components: io-py-avro >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 6h 10m > Remaining Estimate: 0h > > It seems that we don't have an integration test for Avro IO on Python 3: > fastavro_it_test [1] depends on both avro and fastavro, however avro package > currently does not work with Beam on Python 3, so we don't have an > integration test that exercises Avro IO on Python 3. > We should add an integration test for Avro IO that does not need both > libraries at the same time, and instead can run using either library. > [~frederik] is this something you could help with? > cc: [~chamikara] [~Juta] > [1] > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/fastavro_it_test.py -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-7455) Improve Avro IO integration test coverage on Python 3.
[ https://issues.apache.org/jira/browse/BEAM-7455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev resolved BEAM-7455. --- Fix Version/s: Not applicable Resolution: Fixed > Improve Avro IO integration test coverage on Python 3. > -- > > Key: BEAM-7455 > URL: https://issues.apache.org/jira/browse/BEAM-7455 > Project: Beam > Issue Type: Sub-task > Components: io-py-avro >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: Not applicable > > Time Spent: 6h 10m > Remaining Estimate: 0h > > It seems that we don't have an integration test for Avro IO on Python 3: > fastavro_it_test [1] depends on both avro and fastavro, however avro package > currently does not work with Beam on Python 3, so we don't have an > integration test that exercises Avro IO on Python 3. > We should add an integration test for Avro IO that does not need both > libraries at the same time, and instead can run using either library. > [~frederik] is this something you could help with? > cc: [~chamikara] [~Juta] > [1] > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/fastavro_it_test.py -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK
[ https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=386739&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386739 ] ASF GitHub Bot logged work on BEAM-9291: Author: ASF GitHub Bot Created on: 13/Feb/20 16:44 Start Date: 13/Feb/20 16:44 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10829: [BEAM-9291] Upload graph option in dataflow's python sdk URL: https://github.com/apache/beam/pull/10829#issuecomment-585853970 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386739) Time Spent: 2h 20m (was: 2h 10m) > upload_graph support in Dataflow Python SDK > --- > > Key: BEAM-9291 > URL: https://issues.apache.org/jira/browse/BEAM-9291 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Radosław Stankiewicz >Assignee: Radosław Stankiewicz >Priority: Minor > Time Spent: 2h 20m > Remaining Estimate: 0h > > upload_graph option is not supported in Dataflow's Python SDK so there is no > workaround for large graphs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=386738&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386738 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 13/Feb/20 16:44 Start Date: 13/Feb/20 16:44 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#issuecomment-585853706 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386738) Time Spent: 21h 20m (was: 21h 10m) > Create a Spanner IO for Python > -- > > Key: BEAM-7246 > URL: https://issues.apache.org/jira/browse/BEAM-7246 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 21h 20m > Remaining Estimate: 0h > > Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only). > Testing in this work item will be in the form of DirectRunner tests and > manual testing. > Integration and performance tests are a separate work item (not included > here). > See https://beam.apache.org/documentation/io/built-in/. The goal is to add > Google Cloud Spanner to the Database column for the Python/Batch row. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=386741&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386741 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 13/Feb/20 16:49 Start Date: 13/Feb/20 16:49 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#issuecomment-585856253 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386741) Time Spent: 21.5h (was: 21h 20m) > Create a Spanner IO for Python > -- > > Key: BEAM-7246 > URL: https://issues.apache.org/jira/browse/BEAM-7246 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 21.5h > Remaining Estimate: 0h > > Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only). > Testing in this work item will be in the form of DirectRunner tests and > manual testing. > Integration and performance tests are a separate work item (not included > here). > See https://beam.apache.org/documentation/io/built-in/. The goal is to add > Google Cloud Spanner to the Database column for the Python/Batch row. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK
[ https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=386742&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386742 ] ASF GitHub Bot logged work on BEAM-9291: Author: ASF GitHub Bot Created on: 13/Feb/20 16:49 Start Date: 13/Feb/20 16:49 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10829: [BEAM-9291] Upload graph option in dataflow's python sdk URL: https://github.com/apache/beam/pull/10829#issuecomment-585856275 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386742) Time Spent: 2.5h (was: 2h 20m) > upload_graph support in Dataflow Python SDK > --- > > Key: BEAM-9291 > URL: https://issues.apache.org/jira/browse/BEAM-9291 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Radosław Stankiewicz >Assignee: Radosław Stankiewicz >Priority: Minor > Time Spent: 2.5h > Remaining Estimate: 0h > > upload_graph option is not supported in Dataflow's Python SDK so there is no > workaround for large graphs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality
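The two input shapes proposed in the comment above can be sketched with plain type aliases (all names here are hypothetical; only the input types come from the comment, and a dict stands in for vision.types.ImageContext):

```python
from typing import Optional, Tuple, Union

# Option 1: each element pairs an image (GCS path or raw bytes) with its own
# per-element ImageContext.
PerElementInput = Tuple[Union[str, bytes], Optional[dict]]

# Option 2: elements are bare images; a single ImageContext arrives as a
# side input, provided statically or dynamically.
SideInputElement = Union[str, bytes]

def which_transform(element) -> str:
    """Toy dispatcher: which of the two proposed transforms fits an element."""
    if isinstance(element, tuple):
        return "per-element-context"     # option 1
    return "context-as-side-input"       # option 2
```

The dispatcher is only illustrative; the point is that splitting the API avoids one transform having to accept both shapes.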
[ https://issues.apache.org/jira/browse/BEAM-9247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036348#comment-17036348 ] Ahmet Altay commented on BEAM-9247: --- I think this will be a confusing API. I will suggest 2 PTransforms of the form # input type : Union[text_type, binary_type], vision.types.ImageContext] # input type: Union[text_type, binary_type] and a side input with: vision.types.ImageContext (side input could be provided statically or dynamically.) If it is possible to build a unified PTransform with reduced code, offering the above two could be done with minimal wrapper code on top of the first. > [Python] PTransform that integrates Cloud Vision functionality > -- > > Key: BEAM-9247 > URL: https://issues.apache.org/jira/browse/BEAM-9247 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Elias Djurfeldt >Priority: Major > > The goal is to create a PTransform that integrates Google Cloud Vision API > functionality [1]. > [1] https://cloud.google.com/vision/docs/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9308) Optimize state cleanup at end-of-window
[ https://issues.apache.org/jira/browse/BEAM-9308?focusedWorklogId=386743&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386743 ] ASF GitHub Bot logged work on BEAM-9308: Author: ASF GitHub Bot Created on: 13/Feb/20 16:50 Start Date: 13/Feb/20 16:50 Worklog Time Spent: 10m Work Description: steveniemitz commented on pull request #10852: [BEAM-9308] Decorrelate state cleanup timers URL: https://github.com/apache/beam/pull/10852 In our larger streaming pipelines, we generally observe a short blip (1-3 minutes) in event processing, as well as an increase in lag following window closing. One reason for this is the state cleanup timers all firing once a window closes. We've been running this PR in our dev environment for a few days now, and the results are impressive. By decorrelating (jittering) the state cleanup timer, we spread the timer load across a short period of time, with the trade-off of holding state for a slightly longer period of time. In practice though, I've actually noticed our state cleans up QUICKER with this change, because the timers don't all compete with each other. I'd like to contribute this back (and could modify the core StatefulDoFn runner as well) if we agree this is something useful. There are a couple of points for discussion: - I chose 3 minutes arbitrarily based on some experimentation; should this be configurable somehow? - I use the "user" key (from their KV input) to derive a consistent jitter amount. The only real reason for this is to prevent the timer from moving around with each element (as it would if we used a random amount each time instead). I'm not sure if this actually matters in practice, since timers are supposed to be cheap to reset? - I added a counter which has been useful for debugging (and seeing how many keys are active each window), but could be removed. Interested to hear thoughts here. 
Here's a before and after of our pubsub latency: before: ![image](https://user-images.githubusercontent.com/1882981/74457778-bb9cf080-4e56-11ea-900c-69f2a4a28613.png) after: ![image](https://user-images.githubusercontent.com/1882981/74457812-c788b280-4e56-11ea-801e-a4b69a84a10b.png) Based on the counter I added, we're firing ~20 million timers, across 50 workers = ~400,000 timers / worker. So rather than having them all fire in one shot, we can spread them over 3 minutes, for only ~2,000 timers / sec, which is much more reasonable. cc @lukecwik @pabloem Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
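A minimal, runner-agnostic sketch of the jitter derivation described in the PR (names are hypothetical; the actual code is in #10852): hashing the user key yields a stable offset in [0, 180) seconds, so the cleanup timers for different keys spread out instead of all firing when the window closes.

```python
import hashlib

MAX_JITTER_SECONDS = 180  # ~3 minutes, the window the author chose experimentally

def cleanup_jitter(user_key: bytes, max_jitter: int = MAX_JITTER_SECONDS) -> int:
    """Return a stable per-key offset, in seconds, to add to the cleanup timer.

    Deriving the offset from the key (rather than drawing a fresh random
    number each time) keeps the timer target stable when it is re-set for
    the same key on later elements.
    """
    digest = hashlib.sha256(user_key).digest()
    return int.from_bytes(digest[:8], "big") % max_jitter
```

Per the numbers in the PR description, ~400,000 timers per worker spread over 180 seconds works out to roughly 2,200 timer firings per worker per second, versus all of them at once.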
[jira] [Work logged] (BEAM-9258) [Python] PTransform that connects to Cloud DLP deidentification service
[ https://issues.apache.org/jira/browse/BEAM-9258?focusedWorklogId=386759&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386759 ] ASF GitHub Bot logged work on BEAM-9258: Author: ASF GitHub Bot Created on: 13/Feb/20 17:10 Start Date: 13/Feb/20 17:10 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10849: [BEAM-9258] Integrate Google Cloud Data loss prevention functionality for Python SDK URL: https://github.com/apache/beam/pull/10849#discussion_r378995313 ## File path: sdks/python/apache_beam/ml/gcp/cloud_dlp.py ## @@ -0,0 +1,224 @@ +# /* +# * Licensed to the Apache Software Foundation (ASF) under one +# * or more contributor license agreements. See the NOTICE file +# * distributed with this work for additional information +# * regarding copyright ownership. The ASF licenses this file +# * to you under the Apache License, Version 2.0 (the +# * "License"); you may not use this file except in compliance +# * with the License. You may obtain a copy of the License at +# * +# * http://www.apache.org/licenses/LICENSE-2.0 +# * +# * Unless required by applicable law or agreed to in writing, software +# * distributed under the License is distributed on an "AS IS" BASIS, +# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# * See the License for the specific language governing permissions and +# * limitations under the License. +# */ + +"""``PTransforms`` that implement Google Cloud Data Loss Prevention +functionality. +""" + +from __future__ import absolute_import + +import logging + +from google.cloud import dlp_v2 + +import apache_beam as beam +from apache_beam.utils import retry +from apache_beam.utils.annotations import experimental + +__all__ = ['MaskDetectedDetails', 'InspectForDetails'] + +_LOGGER = logging.getLogger(__name__) + + +@experimental() +class MaskDetectedDetails(beam.PTransform): + """Scrubs sensitive information detected in text. 
+ The ``PTransform`` returns a ``PCollection`` of ``str`` + Example usage:: +pipeline | MaskDetectedDetails(project='example-gcp-project', + deidentification_config={ + 'info_type_transformations: { + 'transformations': [{ + 'primitive_transformation': { + 'character_mask_config': { + 'masking_character': '#' + } + } + }] + } + }, inspection_config={'info_types': [{'name': 'EMAIL_ADDRESS'}]}) + """ + def __init__( + self, + project=None, + deidentification_template_name=None, + deidentification_config=None, + inspection_template_name=None, + inspection_config=None, + timeout=None): +"""Initializes a :class:`MaskDetectedDetails` transform. +Args: + project (str): Required. GCP project in which the data processing is +to be done + deidentification_template_name (str): Either this or +`deidentification_config` required. Name of +deidentification template to be used on detected sensitive information +instances in text. + deidentification_config +(``Union[dict, google.cloud.dlp_v2.types.DeidentifyConfig]``): +Configuration for the de-identification of the content item. + inspection_template_name (str): This or `inspection_config` required. +Name of inspection template to be used +to detect sensitive data in text. + inspection_config +(``Union[dict, google.cloud.dlp_v2.types.InspectConfig]``): +Configuration for the inspector used to detect sensitive data in text. + timeout (float): Optional. The amount of time, in seconds, to wait for +the request to complete. +""" +self.config = {} +self.project = project +self.timeout = timeout +if project is None: + raise ValueError( + 'GCP project name needs to be specified in "project" property') +if deidentification_template_name is not None \ +and deidentification_config is not None: + raise ValueError( + 'Both deidentification_template_name and ' + 'deidentification_config were specified.' 
+ ' Please specify only one of these.') +elif deidentification_template_name is None \ +and deidentification_config is None: + raise ValueError( + 'deidentification_template_name or ' + 'deidentification_config must be specified.') +elif deidentification_template_name is not None: + self.config['deidentify_template_name'] = deidentification_template_name +else: + self.config['deidentify_config'] = deidentification_config + +if inspection_template_name is not None and inspection_config is not None: + raise ValueError( + 'Both inspe
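As quoted in the diff above, the docstring's example usage drops the closing quote on the 'info_type_transformations' key. The two configuration dicts from that example, written out as valid Python:

```python
# Corrected version of the configs from the quoted MaskDetectedDetails
# docstring example: mask detected email addresses with '#'.
deidentification_config = {
    'info_type_transformations': {
        'transformations': [{
            'primitive_transformation': {
                'character_mask_config': {
                    'masking_character': '#'
                }
            }
        }]
    }
}
inspection_config = {'info_types': [{'name': 'EMAIL_ADDRESS'}]}
```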
[jira] [Work logged] (BEAM-9258) [Python] PTransform that connects to Cloud DLP deidentification service
[ https://issues.apache.org/jira/browse/BEAM-9258?focusedWorklogId=386760&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386760 ] ASF GitHub Bot logged work on BEAM-9258: Author: ASF GitHub Bot Created on: 13/Feb/20 17:10 Start Date: 13/Feb/20 17:10 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10849: [BEAM-9258] Integrate Google Cloud Data loss prevention functionality for Python SDK URL: https://github.com/apache/beam/pull/10849#discussion_r378994491 ## File path: sdks/python/apache_beam/ml/gcp/cloud_dlp.py ## @@ -0,0 +1,224 @@ +# /* +# * Licensed to the Apache Software Foundation (ASF) under one +# * or more contributor license agreements. See the NOTICE file +# * distributed with this work for additional information +# * regarding copyright ownership. The ASF licenses this file +# * to you under the Apache License, Version 2.0 (the +# * "License"); you may not use this file except in compliance +# * with the License. You may obtain a copy of the License at +# * +# * http://www.apache.org/licenses/LICENSE-2.0 +# * +# * Unless required by applicable law or agreed to in writing, software +# * distributed under the License is distributed on an "AS IS" BASIS, +# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# * See the License for the specific language governing permissions and +# * limitations under the License. +# */ + +"""``PTransforms`` that implement Google Cloud Data Loss Prevention +functionality. +""" + +from __future__ import absolute_import + +import logging + +from google.cloud import dlp_v2 + +import apache_beam as beam +from apache_beam.utils import retry +from apache_beam.utils.annotations import experimental + +__all__ = ['MaskDetectedDetails', 'InspectForDetails'] + +_LOGGER = logging.getLogger(__name__) + + +@experimental() +class MaskDetectedDetails(beam.PTransform): + """Scrubs sensitive information detected in text. 
[jira] [Work logged] (BEAM-9258) [Python] PTransform that connects to Cloud DLP deidentification service
[ https://issues.apache.org/jira/browse/BEAM-9258?focusedWorklogId=386755&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386755 ] ASF GitHub Bot logged work on BEAM-9258: Author: ASF GitHub Bot Created on: 13/Feb/20 17:10 Start Date: 13/Feb/20 17:10 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10849: [BEAM-9258] Integrate Google Cloud Data loss prevention functionality for Python SDK URL: https://github.com/apache/beam/pull/10849#discussion_r378987762 ## File path: sdks/python/apache_beam/ml/gcp/cloud_dlp.py ## @@ -0,0 +1,224 @@ +# /* +# * Licensed to the Apache Software Foundation (ASF) under one +# * or more contributor license agreements. See the NOTICE file +# * distributed with this work for additional information +# * regarding copyright ownership. The ASF licenses this file +# * to you under the Apache License, Version 2.0 (the +# * "License"); you may not use this file except in compliance +# * with the License. You may obtain a copy of the License at +# * +# * http://www.apache.org/licenses/LICENSE-2.0 +# * +# * Unless required by applicable law or agreed to in writing, software +# * distributed under the License is distributed on an "AS IS" BASIS, +# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# * See the License for the specific language governing permissions and +# * limitations under the License. +# */ + +"""``PTransforms`` that implement Google Cloud Data Loss Prevention +functionality. +""" + +from __future__ import absolute_import + +import logging + +from google.cloud import dlp_v2 + +import apache_beam as beam +from apache_beam.utils import retry +from apache_beam.utils.annotations import experimental + +__all__ = ['MaskDetectedDetails', 'InspectForDetails'] + +_LOGGER = logging.getLogger(__name__) + + +@experimental() +class MaskDetectedDetails(beam.PTransform): + """Scrubs sensitive information detected in text. 
+ The ``PTransform`` returns a ``PCollection`` of ``str`` + Example usage:: +pipeline | MaskDetectedDetails(project='example-gcp-project', + deidentification_config={ + 'info_type_transformations: { + 'transformations': [{ + 'primitive_transformation': { + 'character_mask_config': { + 'masking_character': '#' + } + } + }] + } + }, inspection_config={'info_types': [{'name': 'EMAIL_ADDRESS'}]}) + """ + def __init__( + self, + project=None, + deidentification_template_name=None, + deidentification_config=None, + inspection_template_name=None, + inspection_config=None, + timeout=None): +"""Initializes a :class:`MaskDetectedDetails` transform. +Args: + project (str): Required. GCP project in which the data processing is Review comment: Would it make sense to default to the project from gcp options? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386755) Time Spent: 0.5h (was: 20m) > [Python] PTransform that connects to Cloud DLP deidentification service > --- > > Key: BEAM-9258 > URL: https://issues.apache.org/jira/browse/BEAM-9258 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Michał Walenia >Assignee: Michał Walenia >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9258) [Python] PTransform that connects to Cloud DLP deidentification service
[ https://issues.apache.org/jira/browse/BEAM-9258?focusedWorklogId=386756&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386756 ] ASF GitHub Bot logged work on BEAM-9258: Author: ASF GitHub Bot Created on: 13/Feb/20 17:10 Start Date: 13/Feb/20 17:10 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10849: [BEAM-9258] Integrate Google Cloud Data loss prevention functionality for Python SDK URL: https://github.com/apache/beam/pull/10849#discussion_r378989261 ## File path: sdks/python/apache_beam/ml/gcp/cloud_dlp.py ## @@ -0,0 +1,224 @@ +# /* +# * Licensed to the Apache Software Foundation (ASF) under one +# * or more contributor license agreements. See the NOTICE file +# * distributed with this work for additional information +# * regarding copyright ownership. The ASF licenses this file +# * to you under the Apache License, Version 2.0 (the +# * "License"); you may not use this file except in compliance +# * with the License. You may obtain a copy of the License at +# * +# * http://www.apache.org/licenses/LICENSE-2.0 +# * +# * Unless required by applicable law or agreed to in writing, software +# * distributed under the License is distributed on an "AS IS" BASIS, +# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# * See the License for the specific language governing permissions and +# * limitations under the License. +# */ + +"""``PTransforms`` that implement Google Cloud Data Loss Prevention +functionality. +""" + +from __future__ import absolute_import + +import logging + +from google.cloud import dlp_v2 + +import apache_beam as beam Review comment: Let's do more explicit imports here e.g. from apache_beam.transforms import ParDo from apache_beam.transforms import PTransform then use them directly like `PTransform` instead of the `beam.PTransform` style. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386756) Time Spent: 40m (was: 0.5h) > [Python] PTransform that connects to Cloud DLP deidentification service > --- > > Key: BEAM-9258 > URL: https://issues.apache.org/jira/browse/BEAM-9258 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Michał Walenia >Assignee: Michał Walenia >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9258) [Python] PTransform that connects to Cloud DLP deidentification service
[ https://issues.apache.org/jira/browse/BEAM-9258?focusedWorklogId=386758&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386758 ] ASF GitHub Bot logged work on BEAM-9258: Author: ASF GitHub Bot Created on: 13/Feb/20 17:10 Start Date: 13/Feb/20 17:10 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10849: [BEAM-9258] Integrate Google Cloud Data loss prevention functionality for Python SDK URL: https://github.com/apache/beam/pull/10849#discussion_r378998869 ## File path: sdks/python/setup.py ## @@ -203,6 +203,7 @@ def get_version(): 'google-cloud-bigquery>=1.6.0,<1.18.0', 'google-cloud-core>=0.28.1,<2', 'google-cloud-bigtable>=0.31.1,<1.1.0', +'google-cloud-dlp >=0.12.0,<=0.13.0', Review comment: Versions after 0.13 will not support py2. (notice at the top: https://googleapis.dev/python/dlp/latest/gapic/v2/api.html) I wonder if we need to add a comment note here for the person who will upgrade version ranges next? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386758) Time Spent: 50m (was: 40m) > [Python] PTransform that connects to Cloud DLP deidentification service > --- > > Key: BEAM-9258 > URL: https://issues.apache.org/jira/browse/BEAM-9258 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Michał Walenia >Assignee: Michał Walenia >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
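A sketch of the kind of note the reviewer suggests, against the quoted setup.py hunk (the surrounding list is abbreviated and the wording is illustrative, not the actual Beam source):

```python
# Excerpt-style sketch of the dependency list from sdks/python/setup.py, with
# the suggested note for whoever bumps version ranges next:
REQUIRED_PACKAGES = [
    "google-cloud-bigtable>=0.31.1,<1.1.0",
    # NOTE: google-cloud-dlp releases after 0.13 drop Python 2 support;
    # keep the upper bound at 0.13.x while Beam still supports py2.
    "google-cloud-dlp>=0.12.0,<=0.13.0",
]
```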
[jira] [Work logged] (BEAM-9258) [Python] PTransform that connects to Cloud DLP deidentification service
[ https://issues.apache.org/jira/browse/BEAM-9258?focusedWorklogId=386757&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386757 ] ASF GitHub Bot logged work on BEAM-9258: Author: ASF GitHub Bot Created on: 13/Feb/20 17:10 Start Date: 13/Feb/20 17:10 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10849: [BEAM-9258] Integrate Google Cloud Data loss prevention functionality for Python SDK URL: https://github.com/apache/beam/pull/10849#discussion_r378990795 ## File path: sdks/python/apache_beam/ml/gcp/cloud_dlp.py ## @@ -0,0 +1,224 @@ +# /* +# * Licensed to the Apache Software Foundation (ASF) under one +# * or more contributor license agreements. See the NOTICE file +# * distributed with this work for additional information +# * regarding copyright ownership. The ASF licenses this file +# * to you under the Apache License, Version 2.0 (the +# * "License"); you may not use this file except in compliance +# * with the License. You may obtain a copy of the License at +# * +# * http://www.apache.org/licenses/LICENSE-2.0 +# * +# * Unless required by applicable law or agreed to in writing, software +# * distributed under the License is distributed on an "AS IS" BASIS, +# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# * See the License for the specific language governing permissions and +# * limitations under the License. +# */ + +"""``PTransforms`` that implement Google Cloud Data Loss Prevention +functionality. +""" + +from __future__ import absolute_import + +import logging + +from google.cloud import dlp_v2 + +import apache_beam as beam +from apache_beam.utils import retry +from apache_beam.utils.annotations import experimental + +__all__ = ['MaskDetectedDetails', 'InspectForDetails'] + +_LOGGER = logging.getLogger(__name__) + + +@experimental() +class MaskDetectedDetails(beam.PTransform): + """Scrubs sensitive information detected in text. 
[jira] [Work logged] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
[ https://issues.apache.org/jira/browse/BEAM-9228?focusedWorklogId=386765&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386765 ] ASF GitHub Bot logged work on BEAM-9228: Author: ASF GitHub Bot Created on: 13/Feb/20 17:23 Start Date: 13/Feb/20 17:23 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on issue #10847: [BEAM-9228] Support further partition for FnApi ListBuffer URL: https://github.com/apache/beam/pull/10847#issuecomment-585873015 errors: https://builds.apache.org/job/beam_PreCommit_Python_Commit/11130/testReport/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386765) Time Spent: 50m (was: 40m) > _SDFBoundedSourceWrapper doesn't distribute data to multiple workers > > > Key: BEAM-9228 > URL: https://issues.apache.org/jira/browse/BEAM-9228 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.16.0, 2.18.0 >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > A user reported the following issue. > - > I have a set of tfrecord files, obtained by converting parquet files with > Spark. Each file is roughly 1GB and I have 11 of those. > I would expect simple statistics gathering (i.e. counting the number of items across > all files) to scale linearly with respect to the number of cores on my system. 
> I am able to reproduce the issue with the minimal snippet below > {code:java} > import apache_beam as beam > from apache_beam.options.pipeline_options import PipelineOptions > from apache_beam.runners.portability import fn_api_runner > from apache_beam.portability.api import beam_runner_api_pb2 > from apache_beam.portability import python_urns > import sys > pipeline_options = PipelineOptions(['--direct_num_workers', '4']) > file_pattern = 'part-r-00* > runner=fn_api_runner.FnApiRunner( > default_environment=beam_runner_api_pb2.Environment( > urn=python_urns.SUBPROCESS_SDK, > payload=b'%s -m apache_beam.runners.worker.sdk_worker_main' > % sys.executable.encode('ascii'))) > p = beam.Pipeline(runner=runner, options=pipeline_options) > lines = (p | 'read' >> beam.io.tfrecordio.ReadFromTFRecord(file_pattern) > | beam.combiners.Count.Globally() > | beam.io.WriteToText('/tmp/output')) > p.run() > {code} > Only one combination of apache_beam revision / worker type seems to work (I > refer to https://beam.apache.org/documentation/runners/direct/ for the worker > types) > * beam 2.16; neither multithread nor multiprocess achieve high cpu usage on > multiple cores > * beam 2.17: able to achieve high cpu usage on all 4 cores > * beam 2.18: not tested the mulithreaded mode but the multiprocess mode fails > when trying to serialize the Environment instance most likely because of a > change from 2.17 to 2.18. > I also tried briefly SparkRunner with version 2.16 but was no able to achieve > any throughput. > What is the recommnended way to achieve what I am trying to ? How can I > troubleshoot ? > -- > This is caused by [this > PR|https://github.com/apache/beam/commit/02f8ad4eee3ec0ea8cbdc0f99c1dad29f00a9f60]. > A [workaround|https://github.com/apache/beam/pull/10729] is tried, which is > rolling back iobase.py not to use _SDFBoundedSourceWrapper. This confirmed > that data is distributed to multiple workers, however, there are some > regressions with SDF wrapper tests. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
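The scaling expectation in the report above (per-shard counting should parallelize across cores) can be sketched without Beam; the counting function and shard contents below are hypothetical stand-ins for TFRecord files:

```python
def count_records(shard):
    # Stand-in for "count the items in one tfrecord shard".
    return len(shard)

def count_all(shards, mapper=map):
    # With the default builtin `map`, everything runs in one process,
    # which is the behavior the bug report observes. Passing
    # multiprocessing.Pool(n).map instead fans the shards out across
    # processes, which is the roughly-linear scaling the user expected:
    #
    #   with Pool(4) as pool:
    #       total = count_all(shards, pool.map)
    return sum(mapper(count_records, shards))
```

The bug is that `_SDFBoundedSourceWrapper` effectively pins the whole read to one worker, so the `mapper` never receives more than a single unit of work.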
[jira] [Work logged] (BEAM-9292) Provide an ability to specify additional maven repositories for published POMs
[ https://issues.apache.org/jira/browse/BEAM-9292?focusedWorklogId=386772&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386772 ] ASF GitHub Bot logged work on BEAM-9292: Author: ASF GitHub Bot Created on: 13/Feb/20 17:26 Start Date: 13/Feb/20 17:26 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #10832: [BEAM-9292] Provide an ability to specify additional maven repositories for published POMs URL: https://github.com/apache/beam/pull/10832#issuecomment-585874564 Sorry for the slowness on review - LGTM for sure. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386772) Time Spent: 1h 40m (was: 1.5h) > Provide an ability to specify additional maven repositories for published POMs > -- > > Key: BEAM-9292 > URL: https://issues.apache.org/jira/browse/BEAM-9292 > Project: Beam > Issue Type: Improvement > Components: build-system, io-java-kafka >Reporter: Alexey Romanenko >Assignee: Alexey Romanenko >Priority: Major > Fix For: 2.20.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > To support Confluent Schema Registry, KafkaIO has a dependency on > {{io.confluent:kafka-avro-serializer}} from > https://packages.confluent.io/maven/ repository. In this case, it should add > this repository into published KafkaIO POM file. 
Otherwise, it will fail with > the following error while building a user pipeline: > {code} > [ERROR] Failed to execute goal on project kafka-io: Could not resolve > dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: > Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in > central (https://repo.maven.apache.org/maven2) -> [Help 1] > {code} > The repositories for publishing can be added via the {{mavenRepositories}} > argument in the build script for Java configuration. > For example (KafkaIO): > {code} > $ cat sdks/java/io/kafka/build.gradle > ... > applyJavaNature( > ... > mavenRepositories: [ > [id: 'io.confluent', url: 'https://packages.confluent.io/maven/'] > ] > ) > ... > {code} > It will generate the following xml snippet in the pom file of the > {{beam-sdks-java-io-kafka}} artifact after publishing: > {code} > <repositories> > <repository> > <id>io.confluent</id> > <url>https://packages.confluent.io/maven/</url> > </repository> > </repositories> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chamikara Madhusanka Jayalath reassigned BEAM-3788: --- Assignee: Chamikara Madhusanka Jayalath > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > > This will be implemented using the Splittable DoFn framework. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chamikara Madhusanka Jayalath updated BEAM-3788: Description: Java KafkaIO will be made available to Python users as a cross-language transform. (was: This will be implemented using the Splittable DoFn framework.) > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > > Java KafkaIO will be made available to Python users as a cross-language > transform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read
[ https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=386776&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386776 ] ASF GitHub Bot logged work on BEAM-8382: Author: ASF GitHub Bot Created on: 13/Feb/20 17:32 Start Date: 13/Feb/20 17:32 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #9765: [WIP][BEAM-8382] Add rate limit policy to KinesisIO.Read URL: https://github.com/apache/beam/pull/9765#issuecomment-585877558 @jfarr Thank you very much for detailed testing and explanation. In the end, what do you think would be the best solution in this case? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386776) Time Spent: 12h 20m (was: 12h 10m) > Add polling interval to KinesisIO.Read > -- > > Key: BEAM-8382 > URL: https://issues.apache.org/jira/browse/BEAM-8382 > Project: Beam > Issue Type: Improvement > Components: io-java-kinesis >Affects Versions: 2.13.0, 2.14.0, 2.15.0 >Reporter: Jonothan Farr >Assignee: Jonothan Farr >Priority: Major > Time Spent: 12h 20m > Remaining Estimate: 0h > > With the current implementation we are observing Kinesis throttling due to > ReadProvisionedThroughputExceeded on the order of hundreds of times per > second, regardless of the actual Kinesis throughput. This is because the > ShardReadersPool readLoop() method is polling getRecords() as fast as > possible. > From the KDS documentation: > {quote}Each shard can support up to five read transactions per second. > {quote} > and > {quote}For best results, sleep for at least 1 second (1,000 milliseconds) > between calls to getRecords to avoid exceeding the limit on getRecords > frequency. 
> {quote} > [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html] > [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
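The fix being discussed amounts to spacing getRecords() calls by a minimum interval. A minimal, Beam-independent sketch with an injectable clock so the logic is testable (class and method names are illustrative, not KinesisIO's actual API):

```python
class FixedDelayPolicy:
    """Ensure successive polls are at least `interval` seconds apart.

    `clock` returns the current time in seconds; `sleep` blocks for the
    given number of seconds. Injecting both keeps the policy testable
    without real waiting.
    """

    def __init__(self, interval, clock, sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time of the previous poll, None before first

    def before_poll(self):
        # Call this immediately before each getRecords() request.
        now = self.clock()
        if self._last is not None:
            wait = self.interval - (now - self._last)
            if wait > 0:
                self.sleep(wait)
        self._last = self.clock()
```

With `interval=1.0` this matches the ~1s spacing AWS suggests; the hard KDS limit of 5 read transactions per second per shard would correspond to `interval=0.2`.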
[jira] [Resolved] (BEAM-9059) Migrate PTransformTranslation to use string constants
[ https://issues.apache.org/jira/browse/BEAM-9059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Cwik resolved BEAM-9059. - Fix Version/s: 2.20.0 Resolution: Fixed > Migrate PTransformTranslation to use string constants > - > > Key: BEAM-9059 > URL: https://issues.apache.org/jira/browse/BEAM-9059 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > This allows for the values to be used within switch case statements. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9309) Remove deprecated primitive beam:transform:read:v1 and beam_fn_api_use_deprecated_read experiment everywhere
Luke Cwik created BEAM-9309: --- Summary: Remove deprecated primitive beam:transform:read:v1 and beam_fn_api_use_deprecated_read experiment everywhere Key: BEAM-9309 URL: https://issues.apache.org/jira/browse/BEAM-9309 Project: Beam Issue Type: Bug Components: beam-model, sdk-java-core Reporter: Luke Cwik Remove all references and usages of the experiment beam_fn_api_use_deprecated_read and also the deprecated transform beam:transform:read:v1 everywhere. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036376#comment-17036376 ] Chamikara Madhusanka Jayalath commented on BEAM-3788: - We are actively working on making Java KafkaIO available to Python users as a cross-language transform for all runners. The current solution [1] makes the Java Kafka source, which is based on the UnboundedSource framework, available through the cross-language transforms framework. But this will not really work for portable runners without transform overrides, since the UnboundedSource framework is not available to portable runners (this currently works for Flink because FlinkRunner overrides this source with a native source implementation). We need a Kafka cross-language transform based on an SDF version of KafkaIO which can be made available to Python users. In parallel, we are working on an UnboundedSource-to-SDF converter, so this can simply be a cross-language transform wrapping an SDF generated from the existing Kafka UnboundedSource through this converter. [1] [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/external/kafka.py] > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > > Java KafkaIO will be made available to Python users as a cross-language > transform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
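For intuition, the SDF machinery the comment refers to revolves around claimable, splittable restrictions. A toy offset tracker (heavily simplified; not Beam's actual RestrictionTracker API) shows the two operations an UnboundedSource-to-SDF converter must provide so a runner can parallelize and checkpoint the read:

```python
class OffsetRestrictionTracker:
    """Tracks work over the half-open offset range [start, stop)."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
        self.position = start  # next offset that may be claimed

    def try_claim(self, offset):
        # The DoFn must claim each offset before processing it; a False
        # return means the runner has taken that work away.
        if offset < self.position or offset >= self.stop:
            return False
        self.position = offset + 1
        return True

    def try_split(self, at):
        # Split off [at, stop) as residual work for another worker;
        # keep [start, at) as the primary. Returns None if `at` is not
        # strictly between the current position and stop.
        if at <= self.position or at >= self.stop:
            return None
        residual = (at, self.stop)
        self.stop = at
        return residual
```

It is this splitting step that `_SDFBoundedSourceWrapper`-style reads rely on to distribute data across workers.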
[jira] [Assigned] (BEAM-9085) Investigate performance difference between Python 2/3 on Dataflow
[ https://issues.apache.org/jira/browse/BEAM-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev reassigned BEAM-9085: - Assignee: Kamil Wasilewski (was: Valentyn Tymofieiev) > Investigate performance difference between Python 2/3 on Dataflow > - > > Key: BEAM-9085 > URL: https://issues.apache.org/jira/browse/BEAM-9085 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > > Tests show that the performance of core Beam operations in Python 3.x on > Dataflow can be a few times slower than in Python 2.7. We should investigate > the cause of the problem. > Currently, we have one ParDo test that is run both in Py3 and Py2 [1]. A > dashboard with runtime results can be found here [2]. > [1] sdks/python/apache_beam/testing/load_tests/pardo_test.py > [2] https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9085) Investigate performance difference between Python 2/3 on Dataflow
[ https://issues.apache.org/jira/browse/BEAM-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036377#comment-17036377 ] Valentyn Tymofieiev commented on BEAM-9085: --- Thanks, [~kamilwu]. I'll reassign the issue to you for now, for the follow-up changes in the test infra. > Investigate performance difference between Python 2/3 on Dataflow > - > > Key: BEAM-9085 > URL: https://issues.apache.org/jira/browse/BEAM-9085 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Kamil Wasilewski >Assignee: Valentyn Tymofieiev >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9085) Performance regression in np.random.RandomState() skews performance test results across Python 2/3 on Dataflow
[ https://issues.apache.org/jira/browse/BEAM-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev updated BEAM-9085: -- Summary: Performance regression in np.random.RandomState() skews performance test results across Python 2/3 on Dataflow (was: Investigate performance difference between Python 2/3 on Dataflow) > Performance regression in np.random.RandomState() skews performance test > results across Python 2/3 on Dataflow > -- > > Key: BEAM-9085 > URL: https://issues.apache.org/jira/browse/BEAM-9085 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
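One way to isolate a constructor-cost regression like the one named in the retitled issue is a microbenchmark that times object construction separately from reuse of an existing instance. A stdlib-only sketch of such a harness (using `random.Random` purely as a stand-in for `np.random.RandomState`, which is what the issue actually measures):

```python
import random
import timeit

def per_call_cost(stmt, setup='pass', number=1000):
    """Average wall-clock seconds per execution of `stmt`."""
    total = timeit.timeit(stmt, setup=setup, number=number, globals=globals())
    return total / number

# Construction cost: pay the RNG setup price on every call.
construction = per_call_cost('random.Random()')

# Reuse cost: construct once in setup, then only draw numbers.
reuse = per_call_cost('rng.random()', setup='rng = random.Random()')
```

If construction dominates and a hot code path constructs a fresh RNG per element or per bundle, a library-level regression in the constructor shows up directly as per-element overhead, skewing cross-version comparisons exactly as described.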
[jira] [Work logged] (BEAM-9279) Make HBase.ReadAll based on Reads instead of HBaseQuery
[ https://issues.apache.org/jira/browse/BEAM-9279?focusedWorklogId=386782&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386782 ] ASF GitHub Bot logged work on BEAM-9279: Author: ASF GitHub Bot Created on: 13/Feb/20 17:49 Start Date: 13/Feb/20 17:49 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #10815: [BEAM-9279] Make HBase.ReadAll based on Reads instead of HBaseQuery URL: https://github.com/apache/beam/pull/10815#issuecomment-585885902 Thank you for the contribution! Sorry for the delay with the review - I'll be off next week and will get back to this PR after that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386782) Time Spent: 50m (was: 40m) > Make HBase.ReadAll based on Reads instead of HBaseQuery > --- > > Key: BEAM-9279 > URL: https://issues.apache.org/jira/browse/BEAM-9279 > Project: Beam > Issue Type: Improvement > Components: io-java-hbase >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > HBaseIO support for SplittableDoFn introduced a new request type, HBaseQuery; > however, the attributes defined in that class are already available in > HBaseIO.Read. Letting users define pipelines based on HBaseIO.Read makes it > possible to read from multiple clusters, because the Configuration is now > part of the request object. -- This message was sent by Atlassian Jira (v8.3.4#803005)
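The design point here (each read request carries its own connection Configuration, so one ReadAll-style transform can span clusters) can be sketched as follows; class and function names are hypothetical illustrations, not HBaseIO's real classes:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReadRequest:
    # `cluster` stands in for the per-request HBase Configuration that
    # BEAM-9279 moves into the request object itself.
    cluster: str
    table_id: str

def read_all(requests, connect):
    """Process a stream of ReadRequests, connecting per request.

    `connect` maps a cluster identifier to a callable client, so two
    requests naming different clusters are served by different
    connections within the same transform.
    """
    results = []
    for req in requests:
        client = connect(req.cluster)
        results.extend(client(req.table_id))
    return results
```

Under the old HBaseQuery design the configuration lived on the transform, so every element of the input PCollection was forced onto the same cluster.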
[jira] [Updated] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
[ https://issues.apache.org/jira/browse/BEAM-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hannah Jiang updated BEAM-9228: --- Fix Version/s: 2.20.0 > _SDFBoundedSourceWrapper doesn't distribute data to multiple workers > > > Key: BEAM-9228 > URL: https://issues.apache.org/jira/browse/BEAM-9228 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.16.0, 2.18.0 >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.20.0 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
[ https://issues.apache.org/jira/browse/BEAM-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hannah Jiang updated BEAM-9228: --- Affects Version/s: 2.19.0 > _SDFBoundedSourceWrapper doesn't distribute data to multiple workers > > > Key: BEAM-9228 > URL: https://issues.apache.org/jira/browse/BEAM-9228 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.16.0, 2.18.0, 2.19.0 >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.20.0 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9279) Make HBase.ReadAll based on Reads instead of HBaseQuery
[ https://issues.apache.org/jira/browse/BEAM-9279?focusedWorklogId=386784&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386784 ] ASF GitHub Bot logged work on BEAM-9279: Author: ASF GitHub Bot Created on: 13/Feb/20 17:59 Start Date: 13/Feb/20 17:59 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on pull request #10815: [BEAM-9279] Make HBase.ReadAll based on Reads instead of HBaseQuery URL: https://github.com/apache/beam/pull/10815#discussion_r379025287 File path: sdks/java/io/hbase/src/main/java/org/apache/beam/sdk/io/hbase/HBaseIO.java
{code:java}
@@ -240,63 +245,109 @@ private Read(
     @Override
     public void populateDisplayData(DisplayData.Builder builder) {
       super.populateDisplayData(builder);
-      builder.add(DisplayData.item("configuration", serializableConfiguration.get().toString()));
+      builder.add(DisplayData.item("configuration", configuration.toString()));
       builder.add(DisplayData.item("tableId", tableId));
-      builder.addIfNotNull(DisplayData.item("scan", serializableScan.get().toString()));
+      builder.addIfNotNull(DisplayData.item("scan", scan.toString()));
     }

     public Configuration getConfiguration() {
-      return serializableConfiguration.get();
+      return configuration;
     }

     public String getTableId() {
       return tableId;
     }

     public Scan getScan() {
-      return serializableScan.get();
+      return scan;
     }

     /** Returns the range of keys that will be read from the table. */
     public ByteKeyRange getKeyRange() {
-      byte[] startRow = serializableScan.get().getStartRow();
-      byte[] stopRow = serializableScan.get().getStopRow();
+      byte[] startRow = scan.getStartRow();
+      byte[] stopRow = scan.getStopRow();
       return ByteKeyRange.of(ByteKey.copyFrom(startRow), ByteKey.copyFrom(stopRow));
     }

-    private final SerializableConfiguration serializableConfiguration;
+    @Override
+    public boolean equals(Object o) {
+      if (this == o) {
+        return true;
+      }
+      if (o == null || getClass() != o.getClass()) {
+        return false;
+      }
+      Read read = (Read) o;
+      return configuration.toString().equals(read.configuration.toString())
+          && Objects.equals(tableId, read.tableId)
+          && scan.toString().equals(read.scan.toString());
+    }
+
+    @Override
+    public int hashCode() {
+      return Objects.hash(configuration, tableId, scan);
+    }
+
+    private Object writeReplace() {
+      return new SerializationProxy(this);
+    }
+
+    private static class SerializationProxy implements Serializable {
+      public SerializationProxy() {}
+
+      public SerializationProxy(Read read) {
+        configuration = read.configuration;
+        tableId = read.tableId;
+        scan = read.scan;
+      }
+
+      private void writeObject(ObjectOutputStream out) throws IOException {
+        SerializableCoder.of(SerializableConfiguration.class)
+            .encode(new SerializableConfiguration(this.configuration), out);
+        StringUtf8Coder.of().encode(this.tableId, out);
+        ProtobufUtil.toScan(this.scan).writeDelimitedTo(out);
+      }
+
+      private void readObject(ObjectInputStream in) throws IOException {
+        this.configuration = SerializableCoder.of(SerializableConfiguration.class).decode(in).get();
+        this.tableId = StringUtf8Coder.of().decode(in);
+        this.scan = ProtobufUtil.toScan(ClientProtos.Scan.parseDelimitedFrom(in));
+      }
+
+      Object readResolve() {
+        return HBaseIO.read().withConfiguration(configuration).withTableId(tableId).withScan(scan);
+      }
+
+      private Configuration configuration;
+      private String tableId;
+      private Scan scan;
+    }
+
+    @SuppressFBWarnings("SE_BAD_FIELD")
+    private final Configuration configuration;
+
     private final String tableId;
-    private final SerializableScan serializableScan;
+
+    @SuppressFBWarnings("SE_BAD_FIELD")
+    private final Scan scan;
   }

   /**
    * A {@link PTransform} that works like {@link #read}, but executes read operations coming from a
-   * {@link PCollection} of {@link HBaseQuery}.
+   * {@link PCollection} of {@link Read}.
    */
   public static ReadAll readAll() {
-    return new ReadAll(null);
+    return new ReadAll();
   }

   /** Implementation of {@link #readAll}. */
-  public static class ReadAll extends PTransform<PCollection<Read>, PCollection<Result>> {
-
-    private ReadAll(SerializableConfiguration serializableConfiguration) {
-      this.serializableConfiguration = serializableConfiguration;
-    }
-
-    /** Reads from the HBase instance indicated by the given configuration. */
-    public ReadAll withConfiguration(Configuration configuration) {
-      checkArgument(configuration != null, "configuration can not be null");
-      return
{code}
[jira] [Work logged] (BEAM-8758) Beam Dependency Update Request: com.google.cloud:google-cloud-spanner
[ https://issues.apache.org/jira/browse/BEAM-8758?focusedWorklogId=386785&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386785 ] ASF GitHub Bot logged work on BEAM-8758: Author: ASF GitHub Bot Created on: 13/Feb/20 18:08 Start Date: 13/Feb/20 18:08 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10765: [BEAM-8758] Google-cloud-spanner upgrade to 1.49.1 URL: https://github.com/apache/beam/pull/10765#issuecomment-585894780 Going to implement Luke's suggestion. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386785) Time Spent: 4h 20m (was: 4h 10m) > Beam Dependency Update Request: com.google.cloud:google-cloud-spanner > - > > Key: BEAM-8758 > URL: https://issues.apache.org/jira/browse/BEAM-8758 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Tomo Suzuki >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > - 2019-11-19 21:05:29.289016 > - > Please consider upgrading the dependency > com.google.cloud:google-cloud-spanner. > The current version is 1.6.0. The latest version is 1.46.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-02 12:11:08.926875 > - > Please consider upgrading the dependency > com.google.cloud:google-cloud-spanner. > The current version is 1.6.0. The latest version is 1.46.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-09 12:10:16.400168 > - > Please consider upgrading the dependency > com.google.cloud:google-cloud-spanner. 
> The current version is 1.6.0. The latest version is 1.47.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-23 12:10:17.656471 > - > Please consider upgrading the dependency > com.google.cloud:google-cloud-spanner. > The current version is 1.6.0. The latest version is 1.47.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-30 14:05:49.080960 > - > Please consider upgrading the dependency > com.google.cloud:google-cloud-spanner. > The current version is 1.6.0. The latest version is 1.47.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2020-01-06 12:09:23.346857 > - > Please consider upgrading the dependency > com.google.cloud:google-cloud-spanner. > The current version is 1.6.0. The latest version is 1.47.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2020-01-13 12:09:02.023131 > - > Please consider upgrading the dependency > com.google.cloud:google-cloud-spanner. > The current version is 1.6.0. The latest version is 1.47.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2020-01-20 12:08:38.419575 > - > Please consider upgrading the dependency > com.google.cloud:google-cloud-spanner. > The current version is 1.6.0. The latest version is 1.48.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2020-01-27 12:09:44.29834
[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage
[ https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=386794&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386794 ] ASF GitHub Bot logged work on BEAM-8889: Author: ASF GitHub Bot Created on: 13/Feb/20 18:34 Start Date: 13/Feb/20 18:34 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10769: [BEAM-8889] Upgrades gcsio to 2.0.0 URL: https://github.com/apache/beam/pull/10769#issuecomment-585905517 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386794) Remaining Estimate: 153h (was: 153h 10m) Time Spent: 15h (was: 14h 50m) > Make GcsUtil use GoogleCloudStorage > --- > > Key: BEAM-8889 > URL: https://issues.apache.org/jira/browse/BEAM-8889 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Esun Kim >Assignee: VASU NORI >Priority: Major > Labels: gcs > Original Estimate: 168h > Time Spent: 15h > Remaining Estimate: 153h > > [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java] > is a primary class to access Google Cloud Storage on Apache Beam. Current > implementation directly creates GoogleCloudStorageReadChannel and > GoogleCloudStorageWriteChannel by itself to read and write GCS data rather > than using > [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java] > which is an abstract class providing basic IO capability which eventually > creates channel objects. 
This request is about updating GcsUtil to use > GoogleCloudStorage to create read and write channels, which is expected to be > more flexible because it can easily pick up new changes, e.g. a new channel > implementation using a new protocol, without code changes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
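The refactor asks GcsUtil to depend on the GoogleCloudStorage abstraction rather than constructing channels itself. A language-agnostic sketch of that dependency inversion, in Python with illustrative names (the real classes are Java):

```python
from abc import ABC, abstractmethod

class Storage(ABC):
    """Plays the role of the GoogleCloudStorage abstraction."""

    @abstractmethod
    def open_read(self, path):
        ...

    @abstractmethod
    def open_write(self, path):
        ...

class InMemoryStorage(Storage):
    # One possible implementation; a new protocol would be another
    # subclass, picked up by StorageUtil without any code change there.
    def __init__(self):
        self.blobs = {}

    def open_read(self, path):
        return self.blobs[path]

    def open_write(self, path):
        def write(data):
            self.blobs[path] = data
        return write

class StorageUtil:
    """Depends only on the Storage interface, as GcsUtil should depend
    only on GoogleCloudStorage instead of building channels directly."""

    def __init__(self, storage):
        self.storage = storage

    def copy(self, src, dst):
        self.storage.open_write(dst)(self.storage.open_read(src))
```

The payoff is exactly the one the issue describes: swapping in a new channel implementation touches only the `Storage` subclass, never the utility layer.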