[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386436&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386436
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 13/Feb/20 08:11
Start Date: 13/Feb/20 08:11
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate 
GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585603649
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386436)
Time Spent: 8h 50m  (was: 8h 40m)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should accept either a GCS location of a video or raw video 
> bytes as input.
> [1] https://cloud.google.com/video-intelligence/
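A minimal sketch of the GCS-or-bytes routing the description asks for (the helper name and request shape are illustrative, not the final Beam API; the real transform would hand such a request to the Video Intelligence client):

```python
from typing import List, Union

def build_annotate_request(element: Union[str, bytes],
                           features: List[int]) -> dict:
    """Route the input to the right request field: the Video Intelligence
    API accepts either a GCS URI (input_uri) or raw bytes (input_content)."""
    request = {'features': list(features)}
    if isinstance(element, bytes):
        request['input_content'] = element  # raw video data
    else:
        request['input_uri'] = element      # e.g. 'gs://bucket/video.mp4'
    return request
```

A DoFn wrapping the client would call such a helper in process() and yield the long-running operation's result.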



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386435&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386435
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 13/Feb/20 08:11
Start Date: 13/Feb/20 08:11
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate 
GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585603580
 
 
   I cannot seem to make the tests fail locally, so this is proving to be quite 
a struggle 🙄 But hopefully this fixes it.
 



Issue Time Tracking
---

Worklog Id: (was: 386435)
Time Spent: 8h 40m  (was: 8.5h)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should accept either a GCS location of a video or raw video 
> bytes as input.
> [1] https://cloud.google.com/video-intelligence/





[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386437&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386437
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 13/Feb/20 08:12
Start Date: 13/Feb/20 08:12
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #10764: [BEAM-9146] 
Integrate GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585603857
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 386437)
Time Spent: 9h  (was: 8h 50m)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should accept either a GCS location of a video or raw video 
> bytes as input.
> [1] https://cloud.google.com/video-intelligence/





[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386440&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386440
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 13/Feb/20 08:15
Start Date: 13/Feb/20 08:15
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #10764: [BEAM-9146] 
Integrate GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585605111
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 386440)
Time Spent: 9h 10m  (was: 9h)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should accept either a GCS location of a video or raw video 
> bytes as input.
> [1] https://cloud.google.com/video-intelligence/





[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386441&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386441
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 13/Feb/20 08:16
Start Date: 13/Feb/20 08:16
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #10764: [BEAM-9146] 
Integrate GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585603857
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 386441)
Time Spent: 9h 20m  (was: 9h 10m)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should accept either a GCS location of a video or raw video 
> bytes as input.
> [1] https://cloud.google.com/video-intelligence/





[jira] [Updated] (BEAM-9302) No space left on device - apache-beam-jenkins-7

2020-02-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-9302:
---
Status: Open  (was: Triage Needed)

> No space left on device - apache-beam-jenkins-7
> ---
>
> Key: BEAM-9302
> URL: https://issues.apache.org/jira/browse/BEAM-9302
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Michał Walenia
>Priority: Blocker
>
> [https://builds.apache.org/job/beam_PreCommit_SQL_Commit/543/consoleFull] log 
> of a failed job with this error





[jira] [Assigned] (BEAM-9302) No space left on device - apache-beam-jenkins-7

2020-02-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-9302:
--

Assignee: Udi Meiri

> No space left on device - apache-beam-jenkins-7
> ---
>
> Key: BEAM-9302
> URL: https://issues.apache.org/jira/browse/BEAM-9302
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Michał Walenia
>Assignee: Udi Meiri
>Priority: Blocker
>
> [https://builds.apache.org/job/beam_PreCommit_SQL_Commit/543/consoleFull] log 
> of a failed job with this error





[jira] [Resolved] (BEAM-9302) No space left on device - apache-beam-jenkins-7

2020-02-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-9302.

Fix Version/s: Not applicable
   Resolution: Fixed

> No space left on device - apache-beam-jenkins-7
> ---
>
> Key: BEAM-9302
> URL: https://issues.apache.org/jira/browse/BEAM-9302
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Michał Walenia
>Assignee: Udi Meiri
>Priority: Blocker
> Fix For: Not applicable
>
>
> [https://builds.apache.org/job/beam_PreCommit_SQL_Commit/543/consoleFull] log 
> of a failed job with this error





[jira] [Commented] (BEAM-9305) Support ValueProvider for BigQuerySource query string

2020-02-13 Thread Elias Djurfeldt (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036040#comment-17036040
 ] 

Elias Djurfeldt commented on BEAM-9305:
---

[~udim] is this something that we'd like to support? If so, I can implement it.

> Support ValueProvider for BigQuerySource query string
> -
>
> Key: BEAM-9305
> URL: https://issues.apache.org/jira/browse/BEAM-9305
> Project: Beam
>  Issue Type: New Feature
>  Components: io-py-gcp
>Reporter: Elias Djurfeldt
>Assignee: Elias Djurfeldt
>Priority: Minor
>
> Users should be able to use ValueProviders for the query string in 
> BigQuerySource.
> Ref: 
> [https://stackoverflow.com/questions/60146887/expected-eta-to-avail-pipeline-i-o-and-runtime-parameters-in-apache-beam-gcp-dat/60170614?noredirect=1#comment106464448_60170614]
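The deferred evaluation this feature needs can be sketched as follows. The StaticValueProvider class below is a minimal stand-in for apache_beam.options.value_provider.StaticValueProvider so the snippet runs without Beam installed, and resolve_query is a hypothetical helper, not existing BigQuerySource code:

```python
class StaticValueProvider:
    """Minimal stand-in for Beam's StaticValueProvider: wraps a value and
    exposes it through .get(), the ValueProvider access method."""
    def __init__(self, value_type, value):
        self._value = value_type(value)

    def get(self):
        return self._value

def resolve_query(query):
    """How a ValueProvider-aware BigQuerySource would defer evaluation:
    accept a plain string, or call .get() on a ValueProvider only when
    the pipeline actually executes (i.e. at template run time)."""
    return query.get() if hasattr(query, 'get') else query
```

This is the same pattern other ValueProvider-enabled Python I/O options follow: construction stores the provider, execution calls .get().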





[jira] [Work logged] (BEAM-9265) @RequiresTimeSortedInput does not respect allowedLateness

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9265?focusedWorklogId=386453&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386453
 ]

ASF GitHub Bot logged work on BEAM-9265:


Author: ASF GitHub Bot
Created on: 13/Feb/20 09:02
Start Date: 13/Feb/20 09:02
Worklog Time Spent: 10m 
  Work Description: dmvk commented on pull request #10795: [BEAM-9265] 
@RequiresTimeSortedInput respects allowedLateness
URL: https://github.com/apache/beam/pull/10795#discussion_r378724102
 
 

 ##
 File path: sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java
 ##
 @@ -2409,6 +2410,39 @@ public void testRequiresTimeSortedInputWithTestStream() {
   testTimeSortedInput(numElements, pipeline.apply(stream.advanceWatermarkToInfinity()));
 }

+@Test
+@Category({
+  ValidatesRunner.class,
+  UsesStatefulParDo.class,
+  UsesRequiresTimeSortedInput.class,
+  UsesStrictTimerOrdering.class,
+  UsesTestStream.class
+})
+public void testRequiresTimeSortedInputWithLateDataAndAllowedLateness() {
+  // generate list long enough to rule out random shuffle in sorted order
+  int numElements = 1000;
+  List<Long> eventStamps =
+      LongStream.range(0, numElements)
+          .mapToObj(i -> numElements - i)
+          .collect(Collectors.toList());
+  TestStream.Builder<Long> input = TestStream.create(VarLongCoder.of());
+  for (Long stamp : eventStamps) {
+    input = input.addElements(TimestampedValue.of(stamp, Instant.ofEpochMilli(stamp)));
+    if (stamp == 100) {
+      // advance watermark when we have 100 remaining elements
 
 Review comment:
   900 remaining elements?
 



Issue Time Tracking
---

Worklog Id: (was: 386453)
Time Spent: 2h 50m  (was: 2h 40m)

> @RequiresTimeSortedInput does not respect allowedLateness
> -
>
> Key: BEAM-9265
> URL: https://issues.apache.org/jira/browse/BEAM-9265
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.20.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Currently, @RequiresTimeSortedInput drops data with respect to 
> allowedLateness, but timers are triggered without respecting it. We have to:
>  - drop data that is too late (beyond allowedLateness)
>  - set up a flush timer for _minTimestamp + allowedLateness_
>  - hold the output watermark at _minTimestamp_
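The three steps can be sketched as plain timestamp arithmetic (millisecond integers stand in for Beam's Instant; this illustrates the rule, it is not the StatefulDoFnRunner implementation):

```python
def is_too_late(element_ts: int, input_watermark: int,
                allowed_lateness: int) -> bool:
    """Drop an element once the input watermark has passed its
    timestamp by more than allowedLateness."""
    return element_ts + allowed_lateness < input_watermark

def flush_timer_ts(min_ts: int, allowed_lateness: int) -> int:
    """The sort-buffer flush timer fires at minTimestamp + allowedLateness;
    meanwhile the output watermark stays held at minTimestamp."""
    return min_ts + allowed_lateness
```

With these two rules, every element that survives the lateness check is still buffered when its flush timer fires, so the sorted output remains correct.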





[jira] [Work logged] (BEAM-9265) @RequiresTimeSortedInput does not respect allowedLateness

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9265?focusedWorklogId=386454&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386454
 ]

ASF GitHub Bot logged work on BEAM-9265:


Author: ASF GitHub Bot
Created on: 13/Feb/20 09:02
Start Date: 13/Feb/20 09:02
Worklog Time Spent: 10m 
  Work Description: dmvk commented on pull request #10795: [BEAM-9265] 
@RequiresTimeSortedInput respects allowedLateness
URL: https://github.com/apache/beam/pull/10795#discussion_r378725413
 
 

 ##
 File path: runners/core-java/src/main/java/org/apache/beam/runners/core/StatefulDoFnRunner.java
 ##
 @@ -252,18 +255,29 @@ private void onSortFlushTimer(BoundedWindow window, Instant timestamp) {
     keep.forEach(sortBuffer::add);
     minStampState.write(newMinStamp);
     if (newMinStamp.isBefore(BoundedWindow.TIMESTAMP_MAX_VALUE)) {
-      setupFlushTimerAndWatermarkHold(namespace, newMinStamp);
+      setupFlushTimerAndWatermarkHold(namespace, window, newMinStamp);
     } else {
       clearWatermarkHold(namespace);
     }
   }

-  private void setupFlushTimerAndWatermarkHold(StateNamespace namespace, Instant flush) {
+  private void setupFlushTimerAndWatermarkHold(
 
 Review comment:
   Can you add a javadoc for this method?
 



Issue Time Tracking
---

Worklog Id: (was: 386454)
Time Spent: 2h 50m  (was: 2h 40m)

> @RequiresTimeSortedInput does not respect allowedLateness
> -
>
> Key: BEAM-9265
> URL: https://issues.apache.org/jira/browse/BEAM-9265
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.20.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Currently, @RequiresTimeSortedInput drops data with respect to 
> allowedLateness, but timers are triggered without respecting it. We have to:
>  - drop data that is too late (beyond allowedLateness)
>  - set up a flush timer for _minTimestamp + allowedLateness_
>  - hold the output watermark at _minTimestamp_





[jira] [Work logged] (BEAM-9160) Update AWS SDK to support Kubernetes Pod Level Identity

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9160?focusedWorklogId=386477&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386477
 ]

ASF GitHub Bot logged work on BEAM-9160:


Author: ASF GitHub Bot
Created on: 13/Feb/20 09:24
Start Date: 13/Feb/20 09:24
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10846: [BEAM-9160] 
Removed WebIdentityTokenCredentialsProvider explicit json (de)serialization in 
AWS2 module 
URL: https://github.com/apache/beam/pull/10846
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 386477)
Time Spent: 3.5h  (was: 3h 20m)

> Update AWS SDK to support Kubernetes Pod Level Identity
> ---
>
> Key: BEAM-9160
> URL: https://issues.apache.org/jira/browse/BEAM-9160
> Project: Beam
>  Issue Type: Improvement
>  Components: dependencies
>Affects Versions: 2.17.0
>Reporter: Mohamed Noah
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Many organizations have started leveraging pod-level identity in Kubernetes. 
> The version of the AWS SDK packaged with Beam 2.17.0 is out of date and does 
> not natively support pod-level identity access management.
>  
> It is recommended that we introduce support for accessing AWS resources such 
> as S3 using pod-level identity.
> Current AWS Java SDK version in Beam:
> {code}
> def aws_java_sdk_version = "1.11.519"
> {code}
> Proposed AWS Java SDK version:
> {code}
> <dependency>
>   <groupId>com.amazonaws</groupId>
>   <artifactId>aws-java-sdk</artifactId>
>   <version>1.11.710</version>
> </dependency>
> {code}





[jira] [Work logged] (BEAM-9160) Update AWS SDK to support Kubernetes Pod Level Identity

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9160?focusedWorklogId=386476&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386476
 ]

ASF GitHub Bot logged work on BEAM-9160:


Author: ASF GitHub Bot
Created on: 13/Feb/20 09:24
Start Date: 13/Feb/20 09:24
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10846: [BEAM-9160] Removed 
WebIdentityTokenCredentialsProvider explicit json (de)serialization in AWS2 
module 
URL: https://github.com/apache/beam/pull/10846#issuecomment-585631045
 
 
   Merged manually to squash the commits
 



Issue Time Tracking
---

Worklog Id: (was: 386476)
Time Spent: 3h 20m  (was: 3h 10m)

> Update AWS SDK to support Kubernetes Pod Level Identity
> ---
>
> Key: BEAM-9160
> URL: https://issues.apache.org/jira/browse/BEAM-9160
> Project: Beam
>  Issue Type: Improvement
>  Components: dependencies
>Affects Versions: 2.17.0
>Reporter: Mohamed Noah
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Many organizations have started leveraging pod-level identity in Kubernetes. 
> The version of the AWS SDK packaged with Beam 2.17.0 is out of date and does 
> not natively support pod-level identity access management.
>  
> It is recommended that we introduce support for accessing AWS resources such 
> as S3 using pod-level identity.
> Current AWS Java SDK version in Beam:
> {code}
> def aws_java_sdk_version = "1.11.519"
> {code}
> Proposed AWS Java SDK version:
> {code}
> <dependency>
>   <groupId>com.amazonaws</groupId>
>   <artifactId>aws-java-sdk</artifactId>
>   <version>1.11.710</version>
> </dependency>
> {code}





[jira] [Work logged] (BEAM-9292) Provide an ability to specify additional maven repositories for published POMs

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9292?focusedWorklogId=386481&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386481
 ]

ASF GitHub Bot logged work on BEAM-9292:


Author: ASF GitHub Bot
Created on: 13/Feb/20 09:28
Start Date: 13/Feb/20 09:28
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10832: [BEAM-9292] 
Provide an ability to specify additional maven repositories for published POMs
URL: https://github.com/apache/beam/pull/10832
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 386481)
Time Spent: 1h 10m  (was: 1h)

> Provide an ability to specify additional maven repositories for published POMs
> --
>
> Key: BEAM-9292
> URL: https://issues.apache.org/jira/browse/BEAM-9292
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, io-java-kafka
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> To support Confluent Schema Registry, KafkaIO has a dependency on 
> {{io.confluent:kafka-avro-serializer}} from the 
> https://packages.confluent.io/maven/ repository. This repository therefore 
> needs to be added to the published KafkaIO POM file; otherwise, building a 
> user pipeline fails with the following error:
> {code}
> [ERROR] Failed to execute goal on project kafka-io: Could not resolve 
> dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: 
> Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in 
> central (https://repo.maven.apache.org/maven2) -> [Help 1]
> {code}





[jira] [Work logged] (BEAM-9292) Provide an ability to specify additional maven repositories for published POMs

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9292?focusedWorklogId=386483&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386483
 ]

ASF GitHub Bot logged work on BEAM-9292:


Author: ASF GitHub Bot
Created on: 13/Feb/20 09:29
Start Date: 13/Feb/20 09:29
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10832: [BEAM-9292] Provide 
an ability to specify additional maven repositories for published POMs
URL: https://github.com/apache/beam/pull/10832#issuecomment-585632888
 
 
   Merged eagerly because this was breaking the SNAPSHOT builds and the 
linkage-error analysis
 



Issue Time Tracking
---

Worklog Id: (was: 386483)
Time Spent: 1h 20m  (was: 1h 10m)

> Provide an ability to specify additional maven repositories for published POMs
> --
>
> Key: BEAM-9292
> URL: https://issues.apache.org/jira/browse/BEAM-9292
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, io-java-kafka
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> To support Confluent Schema Registry, KafkaIO has a dependency on 
> {{io.confluent:kafka-avro-serializer}} from the 
> https://packages.confluent.io/maven/ repository. This repository therefore 
> needs to be added to the published KafkaIO POM file; otherwise, building a 
> user pipeline fails with the following error:
> {code}
> [ERROR] Failed to execute goal on project kafka-io: Could not resolve 
> dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: 
> Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in 
> central (https://repo.maven.apache.org/maven2) -> [Help 1]
> {code}





[jira] [Resolved] (BEAM-9292) Provide an ability to specify additional maven repositories for published POMs

2020-02-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-9292.

Fix Version/s: 2.20.0
   Resolution: Fixed

> Provide an ability to specify additional maven repositories for published POMs
> --
>
> Key: BEAM-9292
> URL: https://issues.apache.org/jira/browse/BEAM-9292
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, io-java-kafka
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> To support Confluent Schema Registry, KafkaIO has a dependency on 
> {{io.confluent:kafka-avro-serializer}} from the 
> https://packages.confluent.io/maven/ repository. This repository therefore 
> needs to be added to the published KafkaIO POM file; otherwise, building a 
> user pipeline fails with the following error:
> {code}
> [ERROR] Failed to execute goal on project kafka-io: Could not resolve 
> dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: 
> Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in 
> central (https://repo.maven.apache.org/maven2) -> [Help 1]
> {code}





[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=386486&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386486
 ]

ASF GitHub Bot logged work on BEAM-9291:


Author: ASF GitHub Bot
Created on: 13/Feb/20 09:32
Start Date: 13/Feb/20 09:32
Worklog Time Spent: 10m 
  Work Description: stankiewicz commented on issue #10829: [BEAM-9291] 
Upload graph option in dataflow's python sdk
URL: https://github.com/apache/beam/pull/10829#issuecomment-585634276
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 386486)
Time Spent: 1h 50m  (was: 1h 40m)

> upload_graph support in Dataflow Python SDK
> ---
>
> Key: BEAM-9291
> URL: https://issues.apache.org/jira/browse/BEAM-9291
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Radosław Stankiewicz
>Assignee: Radosław Stankiewicz
>Priority: Minor
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The upload_graph option is not supported in Dataflow's Python SDK, so there 
> is no workaround for large pipeline graphs. 
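A sketch of how a runner gates such a feature on an experiment flag; the parsing below is illustrative, not the actual Dataflow runner code, though the flag name upload_graph matches the existing Java SDK experiment:

```python
def experiments_from_args(argv):
    """Collect experiment names from --experiments flags, which may be
    repeated or comma-separated."""
    experiments = []
    for arg in argv:
        if arg.startswith('--experiments='):
            experiments.extend(arg.split('=', 1)[1].split(','))
    return experiments

def use_upload_graph(argv):
    """A runner would take the upload_graph code path (upload the job graph
    to GCS instead of inlining it in the request) when this returns True."""
    return 'upload_graph' in experiments_from_args(argv)
```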





[jira] [Work logged] (BEAM-9265) @RequiresTimeSortedInput does not respect allowedLateness

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9265?focusedWorklogId=386497&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386497
 ]

ASF GitHub Bot logged work on BEAM-9265:


Author: ASF GitHub Bot
Created on: 13/Feb/20 09:49
Start Date: 13/Feb/20 09:49
Worklog Time Spent: 10m 
  Work Description: je-ik commented on pull request #10795: [BEAM-9265] 
@RequiresTimeSortedInput respects allowedLateness
URL: https://github.com/apache/beam/pull/10795#discussion_r378753051
 
 

 ##
 File path: sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java
 ##
 @@ -2409,6 +2410,39 @@ public void testRequiresTimeSortedInputWithTestStream() {
   testTimeSortedInput(numElements, pipeline.apply(stream.advanceWatermarkToInfinity()));
 }

+@Test
+@Category({
+  ValidatesRunner.class,
+  UsesStatefulParDo.class,
+  UsesRequiresTimeSortedInput.class,
+  UsesStrictTimerOrdering.class,
+  UsesTestStream.class
+})
+public void testRequiresTimeSortedInputWithLateDataAndAllowedLateness() {
+  // generate list long enough to rule out random shuffle in sorted order
+  int numElements = 1000;
+  List<Long> eventStamps =
+      LongStream.range(0, numElements)
+          .mapToObj(i -> numElements - i)
+          .collect(Collectors.toList());
+  TestStream.Builder<Long> input = TestStream.create(VarLongCoder.of());
+  for (Long stamp : eventStamps) {
+    input = input.addElements(TimestampedValue.of(stamp, Instant.ofEpochMilli(stamp)));
+    if (stamp == 100) {
+      // advance watermark when we have 100 remaining elements
 
 Review comment:
   100, because the stamps are descending. The watermark advances past the last 
100 elements, which should get dropped.
 



Issue Time Tracking
---

Worklog Id: (was: 386497)
Time Spent: 3h  (was: 2h 50m)

> @RequiresTimeSortedInput does not respect allowedLateness
> -
>
> Key: BEAM-9265
> URL: https://issues.apache.org/jira/browse/BEAM-9265
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.20.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Currently, @RequiresTimeSortedInput drops data with respect to 
> allowedLateness, but timers are triggered without respecting it. We have to:
>  - drop data that is too late (after allowedLateness)
>  - setup timer for _minTimestamp + allowedLateness_
>  -  hold output watermark at _minTimestamp_





[jira] [Updated] (BEAM-9292) Provide an ability to specify additional maven repositories for published POMs

2020-02-13 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko updated BEAM-9292:
---
Description: 
To support Confluent Schema Registry, KafkaIO has a dependency on 
{{io.confluent:kafka-avro-serializer}} from the 
https://packages.confluent.io/maven/ repository. This repository therefore 
needs to be added to the published KafkaIO POM file; otherwise, building a 
user pipeline fails with the following error:
{code}
[ERROR] Failed to execute goal on project kafka-io: Could not resolve 
dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: 
Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in central 
(https://repo.maven.apache.org/maven2) -> [Help 1]
{code}

Additional repositories for published POMs can be specified via the 
{{mavenRepositories}} argument in the Java build configuration.

For example (KafkaIO):
{code}
$ cat sdks/java/io/kafka/build.gradle 

...
applyJavaNature(
  ...
  mavenRepositories: [
[id: 'io.confluent', url: 'https://packages.confluent.io/maven/']
  ]
)
...
{code}
It will generate the following XML snippet in the POM file of the 
{{beam-sdks-java-io-kafka}} artifact after publishing:
{code}
  <repositories>
    <repository>
      <id>io.confluent</id>
      <url>https://packages.confluent.io/maven/</url>
    </repository>
  </repositories>
{code}

  was:
To support Confluent Schema Registry, KafkaIO has a dependency on 
{{io.confluent:kafka-avro-serializer}} from 
https://packages.confluent.io/maven/ repository. In this case, it should add 
this repository into published KafkaIO POM file. Otherwise, it will fail with 
the following error during building a user pipeline:
{code}
[ERROR] Failed to execute goal on project kafka-io: Could not resolve 
dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: 
Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in central 
(https://repo.maven.apache.org/maven2) -> [Help 1]
{code}


> Provide an ability to specify additional maven repositories for published POMs
> --
>
> Key: BEAM-9292
> URL: https://issues.apache.org/jira/browse/BEAM-9292
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, io-java-kafka
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> To support Confluent Schema Registry, KafkaIO has a dependency on 
> {{io.confluent:kafka-avro-serializer}} from 
> the https://packages.confluent.io/maven/ repository. This repository should 
> therefore be added to the published KafkaIO POM file; otherwise, building a 
> user pipeline will fail with the following error:
> {code}
> [ERROR] Failed to execute goal on project kafka-io: Could not resolve 
> dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: 
> Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in 
> central (https://repo.maven.apache.org/maven2) -> [Help 1]
> {code}
> The repositories for publishing can be added via the {{mavenRepositories}} 
> argument in the build script's Java configuration. 
> For example (KafkaIO):
> {code}
> $ cat sdks/java/io/kafka/build.gradle 
> ...
> applyJavaNature(
>   ...
>   mavenRepositories: [
> [id: 'io.confluent', url: 'https://packages.confluent.io/maven/']
>   ]
> )
> ...
> {code}
> It will generate the following XML snippet in the POM file of the 
> {{beam-sdks-java-io-kafka}} artifact after publishing:
> {code}
>   <repositories>
>     <repository>
>       <id>io.confluent</id>
>       <url>https://packages.confluent.io/maven/</url>
>     </repository>
>   </repositories>
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386507&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386507
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 13/Feb/20 10:07
Start Date: 13/Feb/20 10:07
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate 
GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585648977
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386507)
Time Spent: 9.5h  (was: 9h 20m)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take either a video's GCS location or video 
> data bytes as input.
> [1] https://cloud.google.com/video-intelligence/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-13 Thread Kamil Wasilewski (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036100#comment-17036100
 ] 

Kamil Wasilewski commented on BEAM-9247:


I think in some cases it makes sense to accept both image and image_context in 
the form of a tuple as input. On the other hand, only a few Video 
Intelligence and Cloud Vision features require additional parameters; in that 
case, video_context/image_context is always empty.

What about creating two public PTransforms? The first transform could let the 
user specify one unique context per element, and the second transform could 
accept the simpler form of input (without a context). The user could then 
choose between finer control over the input data and simplicity. 

In terms of implementation, these transforms could use the same DoFn 
underneath. 

Let me know what you think about this.
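
As a rough illustration of this proposal (plain Python with hypothetical names, 
not the actual Beam API), both public entry points could delegate to the same 
core logic:

```python
def _annotate(element, context):
    # Shared core: in the real implementation this would be the single DoFn
    # calling the Cloud Vision / Video Intelligence client with the element
    # and its (possibly empty) context.
    return {"input": element, "context": context}

def annotate_with_context(pairs):
    """Transform 1: each element is an (input, context) tuple, giving the
    user one unique context per element."""
    return [_annotate(element, context) for element, context in pairs]

def annotate(elements, shared_context=None):
    """Transform 2: bare inputs without per-element context; an optional
    context (empty by default) is applied to every element."""
    return [_annotate(element, shared_context) for element in elements]
```

Both entry points funnel into `_annotate`, mirroring the idea that the two 
public PTransforms would share the same DoFn underneath.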

> [Python] PTransform that integrates Cloud Vision functionality
> --
>
> Key: BEAM-9247
> URL: https://issues.apache.org/jira/browse/BEAM-9247
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Elias Djurfeldt
>Priority: Major
>
> The goal is to create a PTransform that integrates Google Cloud Vision API 
> functionality [1].
> [1] https://cloud.google.com/vision/docs/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9292) Provide an ability to specify additional maven repositories for published POMs

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9292?focusedWorklogId=386509&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386509
 ]

ASF GitHub Bot logged work on BEAM-9292:


Author: ASF GitHub Bot
Created on: 13/Feb/20 10:15
Start Date: 13/Feb/20 10:15
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10832: [BEAM-9292] 
Provide an ability to specify additional maven repositories for published POMs
URL: https://github.com/apache/beam/pull/10832#issuecomment-585652309
 
 
   Thanks, @iemejia !
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386509)
Time Spent: 1.5h  (was: 1h 20m)

> Provide an ability to specify additional maven repositories for published POMs
> --
>
> Key: BEAM-9292
> URL: https://issues.apache.org/jira/browse/BEAM-9292
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, io-java-kafka
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> To support Confluent Schema Registry, KafkaIO has a dependency on 
> {{io.confluent:kafka-avro-serializer}} from 
> the https://packages.confluent.io/maven/ repository. This repository should 
> therefore be added to the published KafkaIO POM file; otherwise, building a 
> user pipeline will fail with the following error:
> {code}
> [ERROR] Failed to execute goal on project kafka-io: Could not resolve 
> dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: 
> Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in 
> central (https://repo.maven.apache.org/maven2) -> [Help 1]
> {code}
> The repositories for publishing can be added via the {{mavenRepositories}} 
> argument in the build script's Java configuration. 
> For example (KafkaIO):
> {code}
> $ cat sdks/java/io/kafka/build.gradle 
> ...
> applyJavaNature(
>   ...
>   mavenRepositories: [
> [id: 'io.confluent', url: 'https://packages.confluent.io/maven/']
>   ]
> )
> ...
> {code}
> It will generate the following XML snippet in the POM file of the 
> {{beam-sdks-java-io-kafka}} artifact after publishing:
> {code}
>   <repositories>
>     <repository>
>       <id>io.confluent</id>
>       <url>https://packages.confluent.io/maven/</url>
>     </repository>
>   </repositories>
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386523&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386523
 ]

ASF GitHub Bot logged work on BEAM-9281:


Author: ASF GitHub Bot
Created on: 13/Feb/20 10:30
Start Date: 13/Feb/20 10:30
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] 
Update commons-csv to version 1.8
URL: https://github.com/apache/beam/pull/10818#issuecomment-585658529
 
 
   Run PythonLint PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386523)
Time Spent: 50m  (was: 40m)

> Update commons-csv to version 1.8
> -
>
> Key: BEAM-9281
> URL: https://issues.apache.org/jira/browse/BEAM-9281
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Trivial
> Fix For: 2.20.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386525&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386525
 ]

ASF GitHub Bot logged work on BEAM-9281:


Author: ASF GitHub Bot
Created on: 13/Feb/20 10:31
Start Date: 13/Feb/20 10:31
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] 
Update commons-csv to version 1.8
URL: https://github.com/apache/beam/pull/10818#issuecomment-585658639
 
 
   Run Python2_PVR_Flink PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386525)
Time Spent: 1h 10m  (was: 1h)

> Update commons-csv to version 1.8
> -
>
> Key: BEAM-9281
> URL: https://issues.apache.org/jira/browse/BEAM-9281
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Trivial
> Fix For: 2.20.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386524&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386524
 ]

ASF GitHub Bot logged work on BEAM-9281:


Author: ASF GitHub Bot
Created on: 13/Feb/20 10:31
Start Date: 13/Feb/20 10:31
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] 
Update commons-csv to version 1.8
URL: https://github.com/apache/beam/pull/10818#issuecomment-585658584
 
 
   Run PythonFormatter PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386524)
Time Spent: 1h  (was: 50m)

> Update commons-csv to version 1.8
> -
>
> Key: BEAM-9281
> URL: https://issues.apache.org/jira/browse/BEAM-9281
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Trivial
> Fix For: 2.20.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386526&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386526
 ]

ASF GitHub Bot logged work on BEAM-9281:


Author: ASF GitHub Bot
Created on: 13/Feb/20 10:31
Start Date: 13/Feb/20 10:31
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] 
Update commons-csv to version 1.8
URL: https://github.com/apache/beam/pull/10818#issuecomment-585658720
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386526)
Time Spent: 1h 20m  (was: 1h 10m)

> Update commons-csv to version 1.8
> -
>
> Key: BEAM-9281
> URL: https://issues.apache.org/jira/browse/BEAM-9281
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Trivial
> Fix For: 2.20.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9298) Drop support for Flink 1.7

2020-02-13 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036121#comment-17036121
 ] 

Maximilian Michels commented on BEAM-9298:
--

+1 to removing 1.7

It's a good idea to announce this on the mailing list prior to the removal. 
It'll help communicate the change in case any users are still relying on the 
build. 

> Drop support for Flink 1.7 
> ---
>
> Key: BEAM-9298
> URL: https://issues.apache.org/jira/browse/BEAM-9298
> Project: Beam
>  Issue Type: Task
>  Components: runner-flink
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.20.0
>
>
> With Flink 1.10 around the corner (more detail can be found in BEAM-9295), we 
> should consider dropping support for Flink 1.7. Dropping 1.7 will also 
> decrease the build time.
> What do you think?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386530&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386530
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 13/Feb/20 10:33
Start Date: 13/Feb/20 10:33
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #10764: [BEAM-9146] 
Integrate GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585659566
 
 
   @EDjur your trigger phrases unfortunately won't trigger the tests - there 
are Jenkins restrictions that won't let you do it
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386530)
Time Spent: 9h 50m  (was: 9h 40m)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take either a video's GCS location or video 
> data bytes as input.
> [1] https://cloud.google.com/video-intelligence/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386529&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386529
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 13/Feb/20 10:33
Start Date: 13/Feb/20 10:33
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #10764: [BEAM-9146] 
Integrate GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585659326
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386529)
Time Spent: 9h 40m  (was: 9.5h)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take either a video's GCS location or video 
> data bytes as input.
> [1] https://cloud.google.com/video-intelligence/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386532&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386532
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 13/Feb/20 10:37
Start Date: 13/Feb/20 10:37
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate 
GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585661147
 
 
   > @EDjur your phrase triggers unfortunately won't trigger the tests - there 
are Jenkins restrictions that won't let you do it
   
   Gotcha. Thanks for triggering them!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386532)
Time Spent: 10h  (was: 9h 50m)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take either a video's GCS location or video 
> data bytes as input.
> [1] https://cloud.google.com/video-intelligence/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386534&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386534
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 13/Feb/20 10:37
Start Date: 13/Feb/20 10:37
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate 
GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585603649
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386534)
Time Spent: 10h 20m  (was: 10h 10m)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take either a video's GCS location or video 
> data bytes as input.
> [1] https://cloud.google.com/video-intelligence/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386533&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386533
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 13/Feb/20 10:37
Start Date: 13/Feb/20 10:37
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate 
GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585648977
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386533)
Time Spent: 10h 10m  (was: 10h)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take either a video's GCS location or video 
> data bytes as input.
> [1] https://cloud.google.com/video-intelligence/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386535&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386535
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 13/Feb/20 10:37
Start Date: 13/Feb/20 10:37
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate 
GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585431841
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386535)
Time Spent: 10.5h  (was: 10h 20m)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take either a video's GCS location or video 
> data bytes as input.
> [1] https://cloud.google.com/video-intelligence/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386540&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386540
 ]

ASF GitHub Bot logged work on BEAM-9281:


Author: ASF GitHub Bot
Created on: 13/Feb/20 10:47
Start Date: 13/Feb/20 10:47
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] 
Update commons-csv to version 1.8
URL: https://github.com/apache/beam/pull/10818#issuecomment-585658639
 
 
   Run Python2_PVR_Flink PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386540)
Time Spent: 2h  (was: 1h 50m)

> Update commons-csv to version 1.8
> -
>
> Key: BEAM-9281
> URL: https://issues.apache.org/jira/browse/BEAM-9281
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Trivial
> Fix For: 2.20.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386537&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386537
 ]

ASF GitHub Bot logged work on BEAM-9281:


Author: ASF GitHub Bot
Created on: 13/Feb/20 10:47
Start Date: 13/Feb/20 10:47
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] 
Update commons-csv to version 1.8
URL: https://github.com/apache/beam/pull/10818#issuecomment-585665000
 
 
   Seems like this dependency is used only in `sdks/java/extensions/sql`. Could 
someone from the Beam SQL developers take a look?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386537)
Time Spent: 1.5h  (was: 1h 20m)

> Update commons-csv to version 1.8
> -
>
> Key: BEAM-9281
> URL: https://issues.apache.org/jira/browse/BEAM-9281
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Trivial
> Fix For: 2.20.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386541&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386541
 ]

ASF GitHub Bot logged work on BEAM-9281:


Author: ASF GitHub Bot
Created on: 13/Feb/20 10:47
Start Date: 13/Feb/20 10:47
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] 
Update commons-csv to version 1.8
URL: https://github.com/apache/beam/pull/10818#issuecomment-585658720
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386541)
Time Spent: 2h 10m  (was: 2h)

> Update commons-csv to version 1.8
> -
>
> Key: BEAM-9281
> URL: https://issues.apache.org/jira/browse/BEAM-9281
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Trivial
> Fix For: 2.20.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386539&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386539
 ]

ASF GitHub Bot logged work on BEAM-9281:


Author: ASF GitHub Bot
Created on: 13/Feb/20 10:47
Start Date: 13/Feb/20 10:47
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] 
Update commons-csv to version 1.8
URL: https://github.com/apache/beam/pull/10818#issuecomment-585658584
 
 
   Run PythonFormatter PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386539)
Time Spent: 1h 50m  (was: 1h 40m)

> Update commons-csv to version 1.8
> -
>
> Key: BEAM-9281
> URL: https://issues.apache.org/jira/browse/BEAM-9281
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Trivial
> Fix For: 2.20.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386538&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386538
 ]

ASF GitHub Bot logged work on BEAM-9281:


Author: ASF GitHub Bot
Created on: 13/Feb/20 10:47
Start Date: 13/Feb/20 10:47
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] 
Update commons-csv to version 1.8
URL: https://github.com/apache/beam/pull/10818#issuecomment-585658529
 
 
   Run PythonLint PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386538)
Time Spent: 1h 40m  (was: 1.5h)

> Update commons-csv to version 1.8
> -
>
> Key: BEAM-9281
> URL: https://issues.apache.org/jira/browse/BEAM-9281
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Trivial
> Fix For: 2.20.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9258) [Python] PTransform that connects to Cloud DLP deidentification service

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9258?focusedWorklogId=386544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386544
 ]

ASF GitHub Bot logged work on BEAM-9258:


Author: ASF GitHub Bot
Created on: 13/Feb/20 10:51
Start Date: 13/Feb/20 10:51
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on pull request #10849: [BEAM-9258] 
Integrate Google Cloud Data loss prevention functionality for Python SDK
URL: https://github.com/apache/beam/pull/10849
 
 
   
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   [table of Jenkins post-commit build-status badges for Go/Java/Python across runners; truncated in the archive]

[jira] [Work logged] (BEAM-9258) [Python] PTransform that connects to Cloud DLP deidentification service

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9258?focusedWorklogId=386545&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386545
 ]

ASF GitHub Bot logged work on BEAM-9258:


Author: ASF GitHub Bot
Created on: 13/Feb/20 10:52
Start Date: 13/Feb/20 10:52
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #10849: [BEAM-9258] 
Integrate Google Cloud Data loss prevention functionality for Python SDK
URL: https://github.com/apache/beam/pull/10849#issuecomment-585666957
 
 
   R: @aaltay Can you take a look at this? Thanks!
 



Issue Time Tracking
---

Worklog Id: (was: 386545)
Time Spent: 20m  (was: 10m)

> [Python] PTransform that connects to Cloud DLP deidentification service
> ---
>
> Key: BEAM-9258
> URL: https://issues.apache.org/jira/browse/BEAM-9258
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Michał Walenia
>Assignee: Michał Walenia
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386556&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386556
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 13/Feb/20 11:21
Start Date: 13/Feb/20 11:21
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #10764: [BEAM-9146] 
Integrate GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585678000
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 386556)
Time Spent: 10h 40m  (was: 10.5h)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take either a video's GCS location or video data 
> bytes as input.
> [1] https://cloud.google.com/video-intelligence/
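The dual-input requirement quoted above (GCS URI or raw bytes) could be dispatched roughly as follows. This is an illustrative sketch, not the actual Beam transform; `build_request_input` is a hypothetical helper name:

```python
def build_request_input(video):
    """Map the transform's input to a Video Intelligence request field.

    A string starting with "gs://" is treated as a GCS location
    (input_uri); anything else is assumed to be raw video bytes
    (input_content).
    """
    if isinstance(video, str) and video.startswith("gs://"):
        return {"input_uri": video}
    return {"input_content": video}
```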



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9085) Investigate performance difference between Python 2/3 on Dataflow

2020-02-13 Thread Kamil Wasilewski (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036145#comment-17036145
 ] 

Kamil Wasilewski commented on BEAM-9085:


Thanks [~tvalentyn], great work!

I'll take a look at this in the next couple of days. I'll keep you posted on 
any progress.

> Investigate performance difference between Python 2/3 on Dataflow
> -
>
> Key: BEAM-9085
> URL: https://issues.apache.org/jira/browse/BEAM-9085
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Kamil Wasilewski
>Assignee: Valentyn Tymofieiev
>Priority: Major
>
> Tests show that the performance of core Beam operations in Python 3.x on 
> Dataflow can be a few times slower than in Python 2.7. We should investigate 
> the cause of the problem.
> Currently, we have one ParDo test that is run both in Py3 and Py2 [1]. A 
> dashboard with runtime results can be found here [2].
> [1] sdks/python/apache_beam/testing/load_tests/pardo_test.py
> [2] https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9265) @RequiresTimeSortedInput does not respect allowedLateness

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9265?focusedWorklogId=386577&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386577
 ]

ASF GitHub Bot logged work on BEAM-9265:


Author: ASF GitHub Bot
Created on: 13/Feb/20 11:45
Start Date: 13/Feb/20 11:45
Worklog Time Spent: 10m 
  Work Description: dmvk commented on pull request #10795: [BEAM-9265] 
@RequiresTimeSortedInput respects allowedLateness
URL: https://github.com/apache/beam/pull/10795#discussion_r378811377
 
 

 ##
 File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java
 ##
 @@ -2409,6 +2410,39 @@ public void testRequiresTimeSortedInputWithTestStream() 
{
   testTimeSortedInput(numElements, 
pipeline.apply(stream.advanceWatermarkToInfinity()));
 }
 
+@Test
+@Category({
+  ValidatesRunner.class,
+  UsesStatefulParDo.class,
+  UsesRequiresTimeSortedInput.class,
+  UsesStrictTimerOrdering.class,
+  UsesTestStream.class
+})
+public void testRequiresTimeSortedInputWithLateDataAndAllowedLateness() {
+  // generate list long enough to rule out random shuffle in sorted order
+  int numElements = 1000;
+  List<Long> eventStamps =
+  LongStream.range(0, numElements)
+  .mapToObj(i -> numElements - i)
+  .collect(Collectors.toList());
+  TestStream.Builder<Long> input = TestStream.create(VarLongCoder.of());
+  for (Long stamp : eventStamps) {
+input = input.addElements(TimestampedValue.of(stamp, 
Instant.ofEpochMilli(stamp)));
+if (stamp == 100) {
+  // advance watermark when we have 100 remaining elements
 
 Review comment:
   🤦‍♂ makes sense, thanks for the clarification
 



Issue Time Tracking
---

Worklog Id: (was: 386577)
Time Spent: 3h 10m  (was: 3h)

> @RequiresTimeSortedInput does not respect allowedLateness
> -
>
> Key: BEAM-9265
> URL: https://issues.apache.org/jira/browse/BEAM-9265
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.20.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Currently, @RequiresTimeSortedInput drops data with respect to 
> allowedLateness, but timers are triggered without respecting it. We have to:
>  - drop data that is too late (after allowedLateness)
>  - set up a timer for _minTimestamp + allowedLateness_
>  - hold the output watermark at _minTimestamp_
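The bullet points quoted above reduce to simple timestamp arithmetic. A minimal sketch of that semantics follows — hypothetical helpers, not the runner's actual code, using millisecond longs in place of Joda `Instant`:

```python
def is_droppably_late(element_ts_ms, input_watermark_ms, allowed_lateness_ms):
    """Data older than (input watermark - allowedLateness) should be dropped."""
    return element_ts_ms < input_watermark_ms - allowed_lateness_ms


def flush_timer_ts(min_buffered_ts_ms, allowed_lateness_ms):
    """The flush timer fires at minTimestamp + allowedLateness, while the
    output watermark is held at minTimestamp until the buffer is flushed."""
    return min_buffered_ts_ms + allowed_lateness_ms
```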



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9265) @RequiresTimeSortedInput does not respect allowedLateness

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9265?focusedWorklogId=386578&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386578
 ]

ASF GitHub Bot logged work on BEAM-9265:


Author: ASF GitHub Bot
Created on: 13/Feb/20 11:45
Start Date: 13/Feb/20 11:45
Worklog Time Spent: 10m 
  Work Description: dmvk commented on pull request #10795: [BEAM-9265] 
@RequiresTimeSortedInput respects allowedLateness
URL: https://github.com/apache/beam/pull/10795#discussion_r378811377
 
 

 ##
 File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java
 ##
 @@ -2409,6 +2410,39 @@ public void testRequiresTimeSortedInputWithTestStream() 
{
   testTimeSortedInput(numElements, 
pipeline.apply(stream.advanceWatermarkToInfinity()));
 }
 
+@Test
+@Category({
+  ValidatesRunner.class,
+  UsesStatefulParDo.class,
+  UsesRequiresTimeSortedInput.class,
+  UsesStrictTimerOrdering.class,
+  UsesTestStream.class
+})
+public void testRequiresTimeSortedInputWithLateDataAndAllowedLateness() {
+  // generate list long enough to rule out random shuffle in sorted order
+  int numElements = 1000;
+  List<Long> eventStamps =
+  LongStream.range(0, numElements)
+  .mapToObj(i -> numElements - i)
+  .collect(Collectors.toList());
+  TestStream.Builder<Long> input = TestStream.create(VarLongCoder.of());
+  for (Long stamp : eventStamps) {
+input = input.addElements(TimestampedValue.of(stamp, 
Instant.ofEpochMilli(stamp)));
+if (stamp == 100) {
+  // advance watermark when we have 100 remaining elements
 
 Review comment:
   🤦‍♂ makes sense, thanks for the clarification
 



Issue Time Tracking
---

Worklog Id: (was: 386578)
Time Spent: 3h 20m  (was: 3h 10m)

> @RequiresTimeSortedInput does not respect allowedLateness
> -
>
> Key: BEAM-9265
> URL: https://issues.apache.org/jira/browse/BEAM-9265
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.20.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Currently, @RequiresTimeSortedInput drops data with respect to 
> allowedLateness, but timers are triggered without respecting it. We have to:
>  - drop data that is too late (after allowedLateness)
>  - set up a timer for _minTimestamp + allowedLateness_
>  - hold the output watermark at _minTimestamp_



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9298) Drop support for Flink 1.7

2020-02-13 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036154#comment-17036154
 ] 

sunjincheng commented on BEAM-9298:
---

Thank you all! I will do it accordingly after adding the Flink 1.10 build target. :)

> Drop support for Flink 1.7 
> ---
>
> Key: BEAM-9298
> URL: https://issues.apache.org/jira/browse/BEAM-9298
> Project: Beam
>  Issue Type: Task
>  Components: runner-flink
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.20.0
>
>
> With Flink 1.10 around the corner (more detail can be found in BEAM-9295), we 
> should consider dropping support for Flink 1.7. Dropping 1.7 will also 
> decrease the build time.
> What do you think?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-9298) Drop support for Flink 1.7

2020-02-13 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036154#comment-17036154
 ] 

sunjincheng edited comment on BEAM-9298 at 2/13/20 11:46 AM:
-

I see, thank you all! I will do it accordingly after adding the Flink 1.10 build 
target. :)


was (Author: sunjincheng121):
Thank you all! I will do it accodingly after add Flink 1.10 build target. :)

> Drop support for Flink 1.7 
> ---
>
> Key: BEAM-9298
> URL: https://issues.apache.org/jira/browse/BEAM-9298
> Project: Beam
>  Issue Type: Task
>  Components: runner-flink
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.20.0
>
>
> With Flink 1.10 around the corner (more detail can be found in BEAM-9295), we 
> should consider dropping support for Flink 1.7. Dropping 1.7 will also 
> decrease the build time.
> What do you think?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9299) Upgrade Flink Runner to 1.8.3 and 1.9.2

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9299?focusedWorklogId=386587&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386587
 ]

ASF GitHub Bot logged work on BEAM-9299:


Author: ASF GitHub Bot
Created on: 13/Feb/20 11:53
Start Date: 13/Feb/20 11:53
Worklog Time Spent: 10m 
  Work Description: sunjincheng121 commented on pull request #10850: 
[BEAM-9299] Upgrade Flink Runner from 1.8.2 to 1.8.3
URL: https://github.com/apache/beam/pull/10850
 
 
   Upgrade Flink Runner 1.8.x to 1.8.3
   
   Post-Commit Tests Status (on master branch)
   [table of Jenkins post-commit build-status badges for Go/Java/Python across runners; truncated in the archive]

[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386605&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386605
 ]

ASF GitHub Bot logged work on BEAM-9281:


Author: ASF GitHub Bot
Created on: 13/Feb/20 12:56
Start Date: 13/Feb/20 12:56
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10818: [BEAM-9281] Update 
commons-csv to version 1.8
URL: https://github.com/apache/beam/pull/10818#issuecomment-585741925
 
 
   The PR is passing 17 checks in total, including both `Run SQL PreCommit` and 
`SQL Post Commit Tests`. If there is still an error, then we have bad test 
coverage; what about merging it as it is?
 



Issue Time Tracking
---

Worklog Id: (was: 386605)
Time Spent: 2h 20m  (was: 2h 10m)

> Update commons-csv to version 1.8
> -
>
> Key: BEAM-9281
> URL: https://issues.apache.org/jira/browse/BEAM-9281
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Trivial
> Fix For: 2.20.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-13 Thread Elias Djurfeldt (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036209#comment-17036209
 ] 

Elias Djurfeldt commented on BEAM-9247:
---

How about something like this?
[https://github.com/EDjur/beam/commit/cae4f82957017df31a68de36288b74db2616fa78#diff-0cd6945eb0fdc8a93cb9d7fef0859ca0R94]

This keeps a single PTransform and DoFn, but accepts 
{code:java}
@typehints.with_input_types(
Union[Tuple[Union[text_type, binary_type], vision.types.ImageContext],
  Union[text_type, binary_type]]){code}
Defaults the `image_context` to `None` and unpacks the tuple if needed.

My only concern is that it might be confusing to have a PTransform that accepts 
either a tuple of (image, ImageContext) or just a string of image data. But this 
reduces code duplication quite a lot.
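The unpacking behaviour described above (a tuple with a context, or a bare image with the context defaulting to `None`) could look like this sketch; `ImageContext` is stubbed here for illustration — the real class is `vision.types.ImageContext` — and `unpack_element` is a hypothetical helper name:

```python
from typing import Optional, Tuple, Union


class ImageContext:
    """Stand-in for vision.types.ImageContext, for illustration only."""


def unpack_element(
    element: Union[Tuple[Union[str, bytes], ImageContext], str, bytes]
) -> Tuple[Union[str, bytes], Optional[ImageContext]]:
    """Return (image, image_context), defaulting the context to None."""
    if isinstance(element, tuple):
        return element
    return element, None
```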

> [Python] PTransform that integrates Cloud Vision functionality
> --
>
> Key: BEAM-9247
> URL: https://issues.apache.org/jira/browse/BEAM-9247
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Elias Djurfeldt
>Priority: Major
>
> The goal is to create a PTransform that integrates Google Cloud Vision API 
> functionality [1].
> [1] https://cloud.google.com/vision/docs/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-13 Thread Elias Djurfeldt (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036209#comment-17036209
 ] 

Elias Djurfeldt edited comment on BEAM-9247 at 2/13/20 1:12 PM:


How about something like this?
 
[https://github.com/EDjur/beam/commit/cae4f82957017df31a68de36288b74db2616fa78#diff-0cd6945eb0fdc8a93cb9d7fef0859ca0R94]

Have a look at the tests too to see how I envision it working.

This keeps a single PTransform and DoFn, but accepts 
{code:java}
@typehints.with_input_types(
Union[Tuple[Union[text_type, binary_type], vision.types.ImageContext],
  Union[text_type, binary_type]]){code}
Defaults the `image_context` to `None` and unpacks the tuple if needed.

My only concern is that it might be confusing to have a PTransform that accepts 
either a tuple of (image, ImageContext) or just a string of image data. But this 
reduces code duplication quite a lot.


was (Author: eliasdjur):
How about something like this?
[https://github.com/EDjur/beam/commit/cae4f82957017df31a68de36288b74db2616fa78#diff-0cd6945eb0fdc8a93cb9d7fef0859ca0R94]

This keeps a single PTransform and DoFn, but accepts 
{code:java}
@typehints.with_input_types(
Union[Tuple[Union[text_type, binary_type], vision.types.ImageContext],
  Union[text_type, binary_type]]){code}
Defaults the `image_context` to `None` and unpacks the tuple if needed.

Only concern I have is it might be confusing having a PTransform that accepts 
either a tuple of (image, ImageContext) or just string of image. But this 
reduces code duplication quite a lot.

> [Python] PTransform that integrates Cloud Vision functionality
> --
>
> Key: BEAM-9247
> URL: https://issues.apache.org/jira/browse/BEAM-9247
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Elias Djurfeldt
>Priority: Major
>
> The goal is to create a PTransform that integrates Google Cloud Vision API 
> functionality [1].
> [1] https://cloud.google.com/vision/docs/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386614&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386614
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 13/Feb/20 13:19
Start Date: 13/Feb/20 13:19
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate 
GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585750091
 
 
   I'm having a small discussion regarding the `video_context` with @kamilwu 
here: https://issues.apache.org/jira/browse/BEAM-9247
   
   Perhaps this is something we should look at integrating into this PR before 
merging (or after, considering the use case seems small).
 



Issue Time Tracking
---

Worklog Id: (was: 386614)
Time Spent: 10h 50m  (was: 10h 40m)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take either a video's GCS location or video data 
> bytes as input.
> [1] https://cloud.google.com/video-intelligence/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-13 Thread Elias Djurfeldt (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036209#comment-17036209
 ] 

Elias Djurfeldt edited comment on BEAM-9247 at 2/13/20 1:31 PM:


How about something like this?
 
[https://github.com/EDjur/beam/commit/cae4f82957017df31a68de36288b74db2616fa78#diff-0cd6945eb0fdc8a93cb9d7fef0859ca0R94]

Have a look at the tests too to see how I envision it working.

This keeps a single PTransform and DoFn, but accepts 
{code:java}
@typehints.with_input_types(
Union[Tuple[Union[text_type, binary_type], vision.types.ImageContext],
  Union[text_type, binary_type]]){code}
Defaults the `image_context` to `None` and unpacks the tuple if needed.

My only concern is that it might be confusing to have a PTransform that accepts 
either a tuple of (image, ImageContext) or just a Union[string, bytes] image. 
But this reduces code duplication quite a lot.


was (Author: eliasdjur):
How about something like this?
 
[https://github.com/EDjur/beam/commit/cae4f82957017df31a68de36288b74db2616fa78#diff-0cd6945eb0fdc8a93cb9d7fef0859ca0R94]

Have a look at the tests too to see how I envision it working.

This keeps a single PTransform and DoFn, but accepts 
{code:java}
@typehints.with_input_types(
Union[Tuple[Union[text_type, binary_type], vision.types.ImageContext],
  Union[text_type, binary_type]]){code}
Defaults the `image_context` to `None` and unpacks the tuple if needed.

Only concern I have is it might be confusing having a PTransform that accepts 
either a tuple of (image, ImageContext) or just string of image. But this 
reduces code duplication quite a lot.

> [Python] PTransform that integrates Cloud Vision functionality
> --
>
> Key: BEAM-9247
> URL: https://issues.apache.org/jira/browse/BEAM-9247
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Elias Djurfeldt
>Priority: Major
>
> The goal is to create a PTransform that integrates Google Cloud Vision API 
> functionality [1].
> [1] https://cloud.google.com/vision/docs/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-13 Thread Elias Djurfeldt (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036209#comment-17036209
 ] 

Elias Djurfeldt edited comment on BEAM-9247 at 2/13/20 1:31 PM:


How about something like this?
 
[https://github.com/EDjur/beam/commit/cae4f82957017df31a68de36288b74db2616fa78#diff-0cd6945eb0fdc8a93cb9d7fef0859ca0R94]

Have a look at the tests too to see how I envision it working.

This keeps a single PTransform and DoFn, but accepts 
{code:java}
@typehints.with_input_types(
Union[Tuple[Union[text_type, binary_type], vision.types.ImageContext],
  Union[text_type, binary_type]]){code}
Defaults the `image_context` to `None` and unpacks the tuple if needed.

My only concern is that it might be confusing to have a PTransform that accepts 
either a tuple of (Union[string, bytes], ImageContext) or just a Union[string, 
bytes] image. But this reduces code duplication quite a lot.


was (Author: eliasdjur):
How about something like this?
 
[https://github.com/EDjur/beam/commit/cae4f82957017df31a68de36288b74db2616fa78#diff-0cd6945eb0fdc8a93cb9d7fef0859ca0R94]

Have a look at the tests too to see how I envision it working.

This keeps a single PTransform and DoFn, but accepts 
{code:java}
@typehints.with_input_types(
Union[Tuple[Union[text_type, binary_type], vision.types.ImageContext],
  Union[text_type, binary_type]]){code}
Defaults the `image_context` to `None` and unpacks the tuple if needed.

Only concern I have is it might be confusing having a PTransform that accepts 
either a tuple of (image, ImageContext) or just Union[string, bytes] of image. 
But this reduces code duplication quite a lot.

> [Python] PTransform that integrates Cloud Vision functionality
> --
>
> Key: BEAM-9247
> URL: https://issues.apache.org/jira/browse/BEAM-9247
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Elias Djurfeldt
>Priority: Major
>
> The goal is to create a PTransform that integrates Google Cloud Vision API 
> functionality [1].
> [1] https://cloud.google.com/vision/docs/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386638
 ]

ASF GitHub Bot logged work on BEAM-9281:


Author: ASF GitHub Bot
Created on: 13/Feb/20 13:57
Start Date: 13/Feb/20 13:57
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #10818: 
[BEAM-9281] Update commons-csv to version 1.8
URL: https://github.com/apache/beam/pull/10818
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386638)
Time Spent: 2.5h  (was: 2h 20m)

> Update commons-csv to version 1.8
> -
>
> Key: BEAM-9281
> URL: https://issues.apache.org/jira/browse/BEAM-9281
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Trivial
> Fix For: 2.20.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9281) Update commons-csv to version 1.8

2020-02-13 Thread Alexey Romanenko (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Romanenko resolved BEAM-9281.

Resolution: Fixed

> Update commons-csv to version 1.8
> -
>
> Key: BEAM-9281
> URL: https://issues.apache.org/jira/browse/BEAM-9281
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Trivial
> Fix For: 2.20.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-13 Thread Kamil Wasilewski (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036296#comment-17036296
 ] 

Kamil Wasilewski commented on BEAM-9247:


If it reduces code duplication significantly, then I'm fine with having only 
one PTransform. Please make sure that the PTransform has clear documentation 
on what the expected input types are and how they differ.

> [Python] PTransform that integrates Cloud Vision functionality
> --
>
> Key: BEAM-9247
> URL: https://issues.apache.org/jira/browse/BEAM-9247
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Elias Djurfeldt
>Priority: Major
>
> The goal is to create a PTransform that integrates Google Cloud Vision API 
> functionality [1].
> [1] https://cloud.google.com/vision/docs/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-9248) [Python] PTransform that integrates Cloud Natural Language functionality

2020-02-13 Thread Kamil Wasilewski (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamil Wasilewski reassigned BEAM-9248:
--

Assignee: Michał Walenia

> [Python] PTransform that integrates Cloud Natural Language functionality
> 
>
> Key: BEAM-9248
> URL: https://issues.apache.org/jira/browse/BEAM-9248
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Michał Walenia
>Priority: Major
>
> The goal is to create a PTransform that integrates Google Cloud Natural 
> Language API functionality [1].
> [1] https://cloud.google.com/natural-language/docs/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9301) Check in beam-linkage-check.sh

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9301?focusedWorklogId=386702&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386702
 ]

ASF GitHub Bot logged work on BEAM-9301:


Author: ASF GitHub Bot
Created on: 13/Feb/20 15:24
Start Date: 13/Feb/20 15:24
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10841: [BEAM-9301] 
Check in beam-linkage-check.sh
URL: https://github.com/apache/beam/pull/10841#discussion_r378930051
 
 

 ##
 File path: sdks/java/build-tools/beam-linkage-check.sh
 ##
 @@ -0,0 +1,99 @@
+#!/bin/bash
 
 Review comment:
  No issues, the shebang should cover it, so OK for me
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386702)
Time Spent: 40m  (was: 0.5h)

> Check in beam-linkage-check.sh
> --
>
> Key: BEAM-9301
> URL: https://issues.apache.org/jira/browse/BEAM-9301
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/pull/10769#issuecomment-584571787
> bq. @suztomo can you contribute this script maybe into Beam's build-tools 
> directory so we can improve it a bit for further use?
> This is a temporary solution before exclusion rules in Linkage Checker 
> (BEAM-9206) are implemented.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=386705&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386705
 ]

ASF GitHub Bot logged work on BEAM-6522:


Author: ASF GitHub Bot
Created on: 13/Feb/20 15:26
Start Date: 13/Feb/20 15:26
Worklog Time Spent: 10m 
  Work Description: lazylynx commented on pull request #10838: [BEAM-6522] 
[BEAM-7455] Unskip Avro IO tests that are now passing.
URL: https://github.com/apache/beam/pull/10838#discussion_r378930566
 
 

 ##
 File path: sdks/python/apache_beam/examples/fastavro_it_test.py
 ##
 @@ -181,7 +175,7 @@ def assertEqual(l, r):
   if l != r:
 raise BeamAssertException('Assertion failed: %s == %s' % (l, r))
 
-assertEqual(v.keys(), ['avro', 'fastavro'])
+assertEqual(list(v.keys()), ['avro', 'fastavro'])
 
 Review comment:
   We should use `sorted` instead of `list` if tests run in Python 2.7/3.5/3.6.
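The reviewer's point about `sorted` versus `list` can be illustrated with a minimal sketch (not the actual test code): on Python 2.7/3.5, dict iteration order is arbitrary, so comparing `list(v.keys())` against a fixed list may fail nondeterministically, while sorting both sides makes the assertion order-independent.

```python
# Dict key order is only a language guarantee from Python 3.7 onward
# (and an implementation detail in CPython 3.6).
v = {'fastavro': 2, 'avro': 1}

# Order-dependent check -- may fail on interpreters without ordered dicts:
#   list(v.keys()) == ['avro', 'fastavro']

# Order-independent check -- safe on 2.7/3.5/3.6 as well:
assert sorted(v.keys()) == ['avro', 'fastavro']
```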
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386705)
Time Spent: 10h  (was: 9h 50m)

> Dill fails to pickle  avro.RecordSchema classes on Python 3.
> 
>
> Key: BEAM-6522
> URL: https://issues.apache.org/jira/browse/BEAM-6522
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> The avroio module still has 4 failing tests. This is actually 2 times the 
> same 2 tests, both for Avro and Fastavro.
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform*
>  *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform*
> fail with:
> {code:java}
> Traceback (most recent call last):
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", 
> line 432, in test_sink_transform
> | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line 
> 528, in expand
> return pcoll | beam.io.iobase.Write(self._sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 960, in expand
> return pcoll | WriteImpl(self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 979, in expand
> lambda _, sink: sink.initialize_write(), self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1103, in Map
> pardo = FlatMap(wrapper, *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1054, in FlatMap
> pardo = ParDo(Callable

[jira] [Work logged] (BEAM-9301) Check in beam-linkage-check.sh

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9301?focusedWorklogId=386712&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386712
 ]

ASF GitHub Bot logged work on BEAM-9301:


Author: ASF GitHub Bot
Created on: 13/Feb/20 15:31
Start Date: 13/Feb/20 15:31
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10841: [BEAM-9301] 
Check in beam-linkage-check.sh
URL: https://github.com/apache/beam/pull/10841
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386712)
Time Spent: 50m  (was: 40m)

> Check in beam-linkage-check.sh
> --
>
> Key: BEAM-9301
> URL: https://issues.apache.org/jira/browse/BEAM-9301
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/pull/10769#issuecomment-584571787
> bq. @suztomo can you contribute this script maybe into Beam's build-tools 
> directory so we can improve it a bit for further use?
> This is a temporary solution before exclusion rules in Linkage Checker 
> (BEAM-9206) are implemented.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9301) Check in beam-linkage-check.sh

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9301?focusedWorklogId=386713&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386713
 ]

ASF GitHub Bot logged work on BEAM-9301:


Author: ASF GitHub Bot
Created on: 13/Feb/20 15:32
Start Date: 13/Feb/20 15:32
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10841: [BEAM-9301] Check in 
beam-linkage-check.sh
URL: https://github.com/apache/beam/pull/10841#issuecomment-585817360
 
 
   Thanks, it will be really useful to have this here so we can evolve it 
together if needed.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386713)
Time Spent: 1h  (was: 50m)

> Check in beam-linkage-check.sh
> --
>
> Key: BEAM-9301
> URL: https://issues.apache.org/jira/browse/BEAM-9301
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/pull/10769#issuecomment-584571787
> bq. @suztomo can you contribute this script maybe into Beam's build-tools 
> directory so we can improve it a bit for further use?
> This is a temporary solution before exclusion rules in Linkage Checker 
> (BEAM-9206) are implemented.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9301) Check in beam-linkage-check.sh

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9301?focusedWorklogId=386714&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386714
 ]

ASF GitHub Bot logged work on BEAM-9301:


Author: ASF GitHub Bot
Created on: 13/Feb/20 15:33
Start Date: 13/Feb/20 15:33
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10841: [BEAM-9301] Check in 
beam-linkage-check.sh
URL: https://github.com/apache/beam/pull/10841#issuecomment-585817896
 
 
   Oops, a first suggestion for improvement: eagerly detect whether master is 
checked out, so the script fails fast instead of wasting time first
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386714)
Time Spent: 1h 10m  (was: 1h)

> Check in beam-linkage-check.sh
> --
>
> Key: BEAM-9301
> URL: https://issues.apache.org/jira/browse/BEAM-9301
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/pull/10769#issuecomment-584571787
> bq. @suztomo can you contribute this script maybe into Beam's build-tools 
> directory so we can improve it a bit for further use?
> This is a temporary solution before exclusion rules in Linkage Checker 
> (BEAM-9206) are implemented.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=386717&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386717
 ]

ASF GitHub Bot logged work on BEAM-9291:


Author: ASF GitHub Bot
Created on: 13/Feb/20 15:41
Start Date: 13/Feb/20 15:41
Worklog Time Spent: 10m 
  Work Description: stankiewicz commented on issue #10829: [BEAM-9291] 
Upload graph option in dataflow's python sdk
URL: https://github.com/apache/beam/pull/10829#issuecomment-585821888
 
 
   I've squashed commits. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386717)
Time Spent: 2h 10m  (was: 2h)

> upload_graph support in Dataflow Python SDK
> ---
>
> Key: BEAM-9291
> URL: https://issues.apache.org/jira/browse/BEAM-9291
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Radosław Stankiewicz
>Assignee: Radosław Stankiewicz
>Priority: Minor
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> upload_graph option is not supported in Dataflow's Python SDK so there is no 
> workaround for large graphs. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=386718&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386718
 ]

ASF GitHub Bot logged work on BEAM-6522:


Author: ASF GitHub Bot
Created on: 13/Feb/20 15:41
Start Date: 13/Feb/20 15:41
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #10838: [BEAM-6522] 
[BEAM-7455] Unskip Avro IO tests that are now passing.
URL: https://github.com/apache/beam/pull/10838#discussion_r378941516
 
 

 ##
 File path: sdks/python/apache_beam/examples/fastavro_it_test.py
 ##
 @@ -181,7 +175,7 @@ def assertEqual(l, r):
   if l != r:
 raise BeamAssertException('Assertion failed: %s == %s' % (l, r))
 
-assertEqual(v.keys(), ['avro', 'fastavro'])
+assertEqual(list(v.keys()), ['avro', 'fastavro'])
 
 Review comment:
   Good point, fixed. Thanks! PTAL.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386718)
Time Spent: 10h 10m  (was: 10h)

> Dill fails to pickle  avro.RecordSchema classes on Python 3.
> 
>
> Key: BEAM-6522
> URL: https://issues.apache.org/jira/browse/BEAM-6522
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> The avroio module still has 4 failing tests. This is actually 2 times the 
> same 2 tests, both for Avro and Fastavro.
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform*
>  *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform*
> fail with:
> {code:java}
> Traceback (most recent call last):
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", 
> line 432, in test_sink_transform
> | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line 
> 528, in expand
> return pcoll | beam.io.iobase.Write(self._sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 960, in expand
> return pcoll | WriteImpl(self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 979, in expand
> lambda _, sink: sink.initialize_write(), self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1103, in Map
> pardo = FlatMap(wrapper, *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1054, in FlatMap
> pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs)
> Fi

[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=386716&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386716
 ]

ASF GitHub Bot logged work on BEAM-9291:


Author: ASF GitHub Bot
Created on: 13/Feb/20 15:41
Start Date: 13/Feb/20 15:41
Worklog Time Spent: 10m 
  Work Description: stankiewicz commented on issue #10829: [BEAM-9291] 
Upload graph option in dataflow's python sdk
URL: https://github.com/apache/beam/pull/10829#issuecomment-585821810
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386716)
Time Spent: 2h  (was: 1h 50m)

> upload_graph support in Dataflow Python SDK
> ---
>
> Key: BEAM-9291
> URL: https://issues.apache.org/jira/browse/BEAM-9291
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Radosław Stankiewicz
>Assignee: Radosław Stankiewicz
>Priority: Minor
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> upload_graph option is not supported in Dataflow's Python SDK so there is no 
> workaround for large graphs. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=386721&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386721
 ]

ASF GitHub Bot logged work on BEAM-6522:


Author: ASF GitHub Bot
Created on: 13/Feb/20 15:54
Start Date: 13/Feb/20 15:54
Worklog Time Spent: 10m 
  Work Description: lazylynx commented on issue #10838: [BEAM-6522] 
[BEAM-7455] Unskip Avro IO tests that are now passing.
URL: https://github.com/apache/beam/pull/10838#issuecomment-585829022
 
 
   LGTM!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386721)
Time Spent: 10h 20m  (was: 10h 10m)

> Dill fails to pickle  avro.RecordSchema classes on Python 3.
> 
>
> Key: BEAM-6522
> URL: https://issues.apache.org/jira/browse/BEAM-6522
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> The avroio module still has 4 failing tests. This is actually 2 times the 
> same 2 tests, both for Avro and Fastavro.
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform*
>  *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform*
> fail with:
> {code:java}
> Traceback (most recent call last):
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", 
> line 432, in test_sink_transform
> | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line 
> 528, in expand
> return pcoll | beam.io.iobase.Write(self._sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 960, in expand
> return pcoll | WriteImpl(self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 979, in expand
> lambda _, sink: sink.initialize_write(), self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1103, in Map
> pardo = FlatMap(wrapper, *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1054, in FlatMap
> pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 864, in __init__
> super(ParDo, self).__init__(fn, *args, **kwargs)
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 646, in __init__
> self.args = pickler.loads(pickler.dumps(self.args))
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/internal/pickle

[jira] [Created] (BEAM-9307) new spark runner: put windows inside the key to avoid having all values for the same key in memory

2020-02-13 Thread Etienne Chauchot (Jira)
Etienne Chauchot created BEAM-9307:
--

 Summary: new spark runner: put windows inside the key to avoid 
having all values for the same key in memory
 Key: BEAM-9307
 URL: https://issues.apache.org/jira/browse/BEAM-9307
 Project: Beam
  Issue Type: Improvement
  Components: runner-spark
Reporter: Etienne Chauchot


Like it was done for the current runner.

See: [https://www.youtube.com/watch?v=ZIFtmx8nBow&t=721s] min 10



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9308) Optimize state cleanup at end-of-window

2020-02-13 Thread Steve Niemitz (Jira)
Steve Niemitz created BEAM-9308:
---

 Summary: Optimize state cleanup at end-of-window
 Key: BEAM-9308
 URL: https://issues.apache.org/jira/browse/BEAM-9308
 Project: Beam
  Issue Type: Improvement
  Components: runner-dataflow
Reporter: Steve Niemitz
Assignee: Steve Niemitz


When using state with a large keyspace, you can end up with a large number of 
state-cleanup timers all set to fire 1 ms after the end of a window.  This can 
cause a momentary (I've observed 1-3 minute) lag in processing while windmill 
and the Java harness fire and process these cleanup timers.

By spreading the firing over a short period after the end of the window, we can 
decorrelate the timers and smooth out the load, resulting in much less impact 
from state cleanup.
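The decorrelation idea could be sketched like this. This is a hypothetical illustration under the assumptions stated in the issue, not the actual Dataflow runner code; `cleanup_fire_time` and `max_jitter_ms` are invented names:

```python
import random
from datetime import datetime, timedelta


def cleanup_fire_time(window_end, max_jitter_ms=60000):
    """Pick a jittered firing time for a state-cleanup timer.

    Instead of every key firing at window_end + 1ms, each timer fires at a
    random point within a short interval after the window end, spreading
    the cleanup load over time.
    """
    jitter = timedelta(milliseconds=random.randint(1, max_jitter_ms))
    return window_end + jitter


window_end = datetime(2020, 2, 13, 12, 0, 0)
fire_at = cleanup_fire_time(window_end)
# The jittered time always falls within (window_end, window_end + jitter window]:
assert window_end < fire_at <= window_end + timedelta(milliseconds=60000)
```

With many keys, each timer draws its own jitter, so the cleanup work is smeared across the interval rather than arriving as a single spike.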



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=386734&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386734
 ]

ASF GitHub Bot logged work on BEAM-6522:


Author: ASF GitHub Bot
Created on: 13/Feb/20 16:33
Start Date: 13/Feb/20 16:33
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #10838: [BEAM-6522] 
[BEAM-7455] Unskip Avro IO tests that are now passing.
URL: https://github.com/apache/beam/pull/10838
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386734)
Time Spent: 10.5h  (was: 10h 20m)

> Dill fails to pickle  avro.RecordSchema classes on Python 3.
> 
>
> Key: BEAM-6522
> URL: https://issues.apache.org/jira/browse/BEAM-6522
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> The avroio module still has 4 failing tests. This is actually 2 times the 
> same 2 tests, both for Avro and Fastavro.
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform*
>  *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform*
> fail with:
> {code:java}
> Traceback (most recent call last):
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", 
> line 432, in test_sink_transform
> | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line 
> 528, in expand
> return pcoll | beam.io.iobase.Write(self._sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 960, in expand
> return pcoll | WriteImpl(self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 979, in expand
> lambda _, sink: sink.initialize_write(), self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1103, in Map
> pardo = FlatMap(wrapper, *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1054, in FlatMap
> pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 864, in __init__
> super(ParDo, self).__init__(fn, *args, **kwargs)
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 646, in __init__
> self.args = pickler.loads(pickler.dumps(self.args))
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/internal/pickler.py", 
> line 247, in l

[jira] [Commented] (BEAM-7455) Improve Avro IO integration test coverage on Python 3.

2020-02-13 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036345#comment-17036345
 ] 

Valentyn Tymofieiev commented on BEAM-7455:
---

With #9466, Py3 integration test coverage of Avro IO matches Py2 test 
coverage. This is done via fastavro_it_test.py, which runs a pipeline using 
Avro and FastAvro. If/when we drop the avro dependency, we may need to update this test. 

> Improve Avro IO integration test coverage on Python 3.
> --
>
> Key: BEAM-7455
> URL: https://issues.apache.org/jira/browse/BEAM-7455
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-avro
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> It seems that we don't have an integration test for Avro IO on Python 3:
> fastavro_it_test [1] depends on both avro and fastavro, however avro package 
> currently does not work with Beam on Python 3, so we don't have an 
> integration test that exercises Avro IO on Python 3. 
> We should add an integration test for Avro IO that does not need both 
> libraries at the same time, and instead can run using either library. 
> [~frederik] is this something you could help with? 
> cc: [~chamikara] [~Juta]
> [1] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/fastavro_it_test.py



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-7455) Improve Avro IO integration test coverage on Python 3.

2020-02-13 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev resolved BEAM-7455.
---
Fix Version/s: Not applicable
   Resolution: Fixed

> Improve Avro IO integration test coverage on Python 3.
> --
>
> Key: BEAM-7455
> URL: https://issues.apache.org/jira/browse/BEAM-7455
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-avro
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> It seems that we don't have an integration test for Avro IO on Python 3:
> fastavro_it_test [1] depends on both avro and fastavro, however avro package 
> currently does not work with Beam on Python 3, so we don't have an 
> integration test that exercises Avro IO on Python 3. 
> We should add an integration test for Avro IO that does not need both 
> libraries at the same time, and instead can run using either library. 
> [~frederik] is this something you could help with? 
> cc: [~chamikara] [~Juta]
> [1] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/fastavro_it_test.py



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=386739&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386739
 ]

ASF GitHub Bot logged work on BEAM-9291:


Author: ASF GitHub Bot
Created on: 13/Feb/20 16:44
Start Date: 13/Feb/20 16:44
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10829: [BEAM-9291] Upload 
graph option in dataflow's python sdk
URL: https://github.com/apache/beam/pull/10829#issuecomment-585853970
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 386739)
Time Spent: 2h 20m  (was: 2h 10m)

> upload_graph support in Dataflow Python SDK
> ---
>
> Key: BEAM-9291
> URL: https://issues.apache.org/jira/browse/BEAM-9291
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Radosław Stankiewicz
>Assignee: Radosław Stankiewicz
>Priority: Minor
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The upload_graph option is not supported in Dataflow's Python SDK, so there is 
> no workaround for large graphs. 
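For illustration, assuming the Python SDK follows the Java SDK's convention of enabling this via the `upload_graph` experiment (the exact flag for Python is an assumption pending the linked PR; project and bucket names below are placeholders):

```python
# Hypothetical launch options once upload_graph support lands in the
# Python SDK, mirroring the Java SDK's experiment flag.
pipeline_args = [
    "--runner=DataflowRunner",
    "--project=my-gcp-project",            # placeholder project id
    "--temp_location=gs://my-bucket/tmp",  # placeholder GCS path
    "--experiments=upload_graph",          # assumed flag name
]
```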



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=386738&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386738
 ]

ASF GitHub Bot logged work on BEAM-7246:


Author: ASF GitHub Bot
Created on: 13/Feb/20 16:44
Start Date: 13/Feb/20 16:44
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10712: [BEAM-7246] Added 
Google Spanner Write Transform
URL: https://github.com/apache/beam/pull/10712#issuecomment-585853706
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 386738)
Time Spent: 21h 20m  (was: 21h 10m)

> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 21h 20m
>  Remaining Estimate: 0h
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=386741&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386741
 ]

ASF GitHub Bot logged work on BEAM-7246:


Author: ASF GitHub Bot
Created on: 13/Feb/20 16:49
Start Date: 13/Feb/20 16:49
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10712: [BEAM-7246] Added 
Google Spanner Write Transform
URL: https://github.com/apache/beam/pull/10712#issuecomment-585856253
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 386741)
Time Spent: 21.5h  (was: 21h 20m)

> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 21.5h
>  Remaining Estimate: 0h
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=386742&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386742
 ]

ASF GitHub Bot logged work on BEAM-9291:


Author: ASF GitHub Bot
Created on: 13/Feb/20 16:49
Start Date: 13/Feb/20 16:49
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10829: [BEAM-9291] Upload 
graph option in dataflow's python sdk
URL: https://github.com/apache/beam/pull/10829#issuecomment-585856275
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 386742)
Time Spent: 2.5h  (was: 2h 20m)

> upload_graph support in Dataflow Python SDK
> ---
>
> Key: BEAM-9291
> URL: https://issues.apache.org/jira/browse/BEAM-9291
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Radosław Stankiewicz
>Assignee: Radosław Stankiewicz
>Priority: Minor
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The upload_graph option is not supported in Dataflow's Python SDK, so there is 
> no workaround for large graphs. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-13 Thread Ahmet Altay (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036348#comment-17036348
 ] 

Ahmet Altay commented on BEAM-9247:
---

I think this will be a confusing API. I would suggest 2 PTransforms of the form:
 # input type: Tuple[Union[text_type, binary_type], vision.types.ImageContext]
 # input type: Union[text_type, binary_type], with a side input of 
vision.types.ImageContext (the side input could be provided statically or 
dynamically).

If it is possible to build a unified PTransform with reduced code, offering the 
above two could be done with minimal wrapper code on top of the first.
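A minimal sketch of the two suggested shapes (hypothetical function names; `call_vision_api` stands in for the real Vision client call and is not Beam's actual API):

```python
# Shape 1: each element carries its own context as part of the input tuple.
def annotate_with_context(element):
    image, image_context = element  # (Union[str, bytes], ImageContext)
    return call_vision_api(image, image_context)

# Shape 2: images as the main input, one shared context as a side input
# (which Beam could supply statically or dynamically).
def annotate(image, image_context_side_input):
    return call_vision_api(image, image_context_side_input)

def call_vision_api(image, context):
    # Stand-in for a client.annotate_image(...) call; returns a dummy response.
    return {"image": image, "context": context}
```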

> [Python] PTransform that integrates Cloud Vision functionality
> --
>
> Key: BEAM-9247
> URL: https://issues.apache.org/jira/browse/BEAM-9247
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Elias Djurfeldt
>Priority: Major
>
> The goal is to create a PTransform that integrates Google Cloud Vision API 
> functionality [1].
> [1] https://cloud.google.com/vision/docs/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9308) Optimize state cleanup at end-of-window

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9308?focusedWorklogId=386743&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386743
 ]

ASF GitHub Bot logged work on BEAM-9308:


Author: ASF GitHub Bot
Created on: 13/Feb/20 16:50
Start Date: 13/Feb/20 16:50
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on pull request #10852: 
[BEAM-9308] Decorrelate state cleanup timers
URL: https://github.com/apache/beam/pull/10852
 
 
   In our larger streaming pipelines, we generally observe a short blip (1-3 
minutes) in event processing, as well as an increase in lag following window 
closing.  One reason for this is the state cleanup timers all firing once a 
window closes.
   
   We've been running this PR in our dev environment for a few days now, and 
the results are impressive.  By decorrelating (jittering) the state cleanup 
timer, we spread the timer load across a short period of time, with the 
trade-off of holding state for a slightly longer period of time.  In practice 
though, I've actually noticed our state cleans up QUICKER with this change, 
because the timers don't all compete with each other.
   
   I'd like to contribute this back (and could modify the core StatefulDoFn 
runner as well) if we agree this is something useful.
   
   There are a couple of points for discussion:
   - I chose 3 minutes arbitrarily based on some experimentation, should this 
be configurable somehow?
   - I use the "user" key (from their KV input) to derive a consistent jitter 
amount.  The only real reason for this is to prevent the timer from moving 
around each element (if we used just a random amount each time instead).  I'm 
not sure if this actually matters in practice, since timers are supposed to be 
cheap to reset?
   - I added a counter which has been useful for debugging (and seeing how many 
keys are active each window), but could be removed.
   
   Interested to hear thoughts here.  
   
   Here's a before and after of our pubsub latency:
   
   before:
   
![image](https://user-images.githubusercontent.com/1882981/74457778-bb9cf080-4e56-11ea-900c-69f2a4a28613.png)
   
   after:
   
![image](https://user-images.githubusercontent.com/1882981/74457812-c788b280-4e56-11ea-801e-a4b69a84a10b.png)
   
   Based on the counter I added, we're firing ~20 million timers, across 50 
workers = ~400,000 timers / worker.  So rather than having them all fire in one 
shot, we can spread them over 3 minutes, for only ~2,000 timers / sec, which is 
much more reasonable.
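   The back-of-the-envelope arithmetic above can be checked directly (numbers taken from this comment; the ~2,000/sec figure rounds down from ~2,222):

```python
# Timer load per worker, before and after spreading over 3 minutes.
timers = 20_000_000
workers = 50
spread_seconds = 3 * 60

per_worker = timers // workers            # 400,000 timers per worker
per_second = per_worker / spread_seconds  # roughly 2,200 timers/sec per worker
```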
   
   cc @lukecwik @pabloem 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/j

[jira] [Work logged] (BEAM-9258) [Python] PTransform that connects to Cloud DLP deidentification service

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9258?focusedWorklogId=386759&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386759
 ]

ASF GitHub Bot logged work on BEAM-9258:


Author: ASF GitHub Bot
Created on: 13/Feb/20 17:10
Start Date: 13/Feb/20 17:10
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10849: [BEAM-9258] 
Integrate Google Cloud Data loss prevention functionality for Python SDK
URL: https://github.com/apache/beam/pull/10849#discussion_r378995313
 
 

 ##
 File path: sdks/python/apache_beam/ml/gcp/cloud_dlp.py
 ##
 @@ -0,0 +1,224 @@
+#  /*
+#   * Licensed to the Apache Software Foundation (ASF) under one
+#   * or more contributor license agreements.  See the NOTICE file
+#   * distributed with this work for additional information
+#   * regarding copyright ownership.  The ASF licenses this file
+#   * to you under the Apache License, Version 2.0 (the
+#   * "License"); you may not use this file except in compliance
+#   * with the License.  You may obtain a copy of the License at
+#   *
+#   * http://www.apache.org/licenses/LICENSE-2.0
+#   *
+#   * Unless required by applicable law or agreed to in writing, software
+#   * distributed under the License is distributed on an "AS IS" BASIS,
+#   * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#   * See the License for the specific language governing permissions and
+#   * limitations under the License.
+#   */
+
+"""``PTransforms`` that implement Google Cloud Data Loss Prevention
+functionality.
+"""
+
+from __future__ import absolute_import
+
+import logging
+
+from google.cloud import dlp_v2
+
+import apache_beam as beam
+from apache_beam.utils import retry
+from apache_beam.utils.annotations import experimental
+
+__all__ = ['MaskDetectedDetails', 'InspectForDetails']
+
+_LOGGER = logging.getLogger(__name__)
+
+
+@experimental()
+class MaskDetectedDetails(beam.PTransform):
+  """Scrubs sensitive information detected in text.
+  The ``PTransform`` returns a ``PCollection`` of ``str``
+  Example usage::
+pipeline | MaskDetectedDetails(project='example-gcp-project',
+  deidentification_config={
+  'info_type_transformations': {
+  'transformations': [{
+  'primitive_transformation': {
+  'character_mask_config': {
+  'masking_character': '#'
+  }
+  }
+  }]
+  }
+  }, inspection_config={'info_types': [{'name': 'EMAIL_ADDRESS'}]})
+  """
+  def __init__(
+  self,
+  project=None,
+  deidentification_template_name=None,
+  deidentification_config=None,
+  inspection_template_name=None,
+  inspection_config=None,
+  timeout=None):
+"""Initializes a :class:`MaskDetectedDetails` transform.
+Args:
+  project (str): Required. GCP project in which the data processing is
+to be done
+  deidentification_template_name (str): Either this or
+`deidentification_config` required. Name of
+deidentification template to be used on detected sensitive information
+instances in text.
+  deidentification_config
+(``Union[dict, google.cloud.dlp_v2.types.DeidentifyConfig]``):
+Configuration for the de-identification of the content item.
+  inspection_template_name (str): This or `inspection_config` required.
+Name of inspection template to be used
+to detect sensitive data in text.
+  inspection_config
+(``Union[dict, google.cloud.dlp_v2.types.InspectConfig]``):
+Configuration for the inspector used to detect sensitive data in text.
+  timeout (float): Optional. The amount of time, in seconds, to wait for
+the request to complete.
+"""
+self.config = {}
+self.project = project
+self.timeout = timeout
+if project is None:
+  raise ValueError(
+  'GCP project name needs to be specified in "project" property')
+if deidentification_template_name is not None \
+and deidentification_config is not None:
+  raise ValueError(
+  'Both deidentification_template_name and '
+  'deidentification_config were specified.'
+  ' Please specify only one of these.')
+elif deidentification_template_name is None \
+and deidentification_config is None:
+  raise ValueError(
+  'deidentification_template_name or '
+  'deidentification_config must be specified.')
+elif deidentification_template_name is not None:
+  self.config['deidentify_template_name'] = deidentification_template_name
+else:
+  self.config['deidentify_config'] = deidentification_config
+
+if inspection_template_name is not None and inspection_config is not None:
+  raise ValueError(
+  'Both inspe

[jira] [Work logged] (BEAM-9258) [Python] PTransform that connects to Cloud DLP deidentification service

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9258?focusedWorklogId=386760&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386760
 ]

ASF GitHub Bot logged work on BEAM-9258:


Author: ASF GitHub Bot
Created on: 13/Feb/20 17:10
Start Date: 13/Feb/20 17:10
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10849: [BEAM-9258] 
Integrate Google Cloud Data loss prevention functionality for Python SDK
URL: https://github.com/apache/beam/pull/10849#discussion_r378994491
 
 

 ##
 File path: sdks/python/apache_beam/ml/gcp/cloud_dlp.py
 ##
 @@ -0,0 +1,224 @@
+#  /*
+#   * Licensed to the Apache Software Foundation (ASF) under one
+#   * or more contributor license agreements.  See the NOTICE file
+#   * distributed with this work for additional information
+#   * regarding copyright ownership.  The ASF licenses this file
+#   * to you under the Apache License, Version 2.0 (the
+#   * "License"); you may not use this file except in compliance
+#   * with the License.  You may obtain a copy of the License at
+#   *
+#   * http://www.apache.org/licenses/LICENSE-2.0
+#   *
+#   * Unless required by applicable law or agreed to in writing, software
+#   * distributed under the License is distributed on an "AS IS" BASIS,
+#   * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#   * See the License for the specific language governing permissions and
+#   * limitations under the License.
+#   */
+
+"""``PTransforms`` that implement Google Cloud Data Loss Prevention
+functionality.
+"""
+
+from __future__ import absolute_import
+
+import logging
+
+from google.cloud import dlp_v2
+
+import apache_beam as beam
+from apache_beam.utils import retry
+from apache_beam.utils.annotations import experimental
+
+__all__ = ['MaskDetectedDetails', 'InspectForDetails']
+
+_LOGGER = logging.getLogger(__name__)
+
+
+@experimental()
+class MaskDetectedDetails(beam.PTransform):
+  """Scrubs sensitive information detected in text.
+  The ``PTransform`` returns a ``PCollection`` of ``str``
+  Example usage::
+pipeline | MaskDetectedDetails(project='example-gcp-project',
+  deidentification_config={
+  'info_type_transformations': {
+  'transformations': [{
+  'primitive_transformation': {
+  'character_mask_config': {
+  'masking_character': '#'
+  }
+  }
+  }]
+  }
+  }, inspection_config={'info_types': [{'name': 'EMAIL_ADDRESS'}]})
+  """
+  def __init__(
+  self,
+  project=None,
+  deidentification_template_name=None,
+  deidentification_config=None,
+  inspection_template_name=None,
+  inspection_config=None,
+  timeout=None):
+"""Initializes a :class:`MaskDetectedDetails` transform.
+Args:
+  project (str): Required. GCP project in which the data processing is
+to be done
+  deidentification_template_name (str): Either this or
+`deidentification_config` required. Name of
+deidentification template to be used on detected sensitive information
+instances in text.
+  deidentification_config
+(``Union[dict, google.cloud.dlp_v2.types.DeidentifyConfig]``):
+Configuration for the de-identification of the content item.
+  inspection_template_name (str): This or `inspection_config` required.
+Name of inspection template to be used
+to detect sensitive data in text.
+  inspection_config
+(``Union[dict, google.cloud.dlp_v2.types.InspectConfig]``):
+Configuration for the inspector used to detect sensitive data in text.
+  timeout (float): Optional. The amount of time, in seconds, to wait for
+the request to complete.
+"""
+self.config = {}
+self.project = project
+self.timeout = timeout
+if project is None:
+  raise ValueError(
+  'GCP project name needs to be specified in "project" property')
+if deidentification_template_name is not None \
+and deidentification_config is not None:
+  raise ValueError(
+  'Both deidentification_template_name and '
+  'deidentification_config were specified.'
+  ' Please specify only one of these.')
+elif deidentification_template_name is None \
+and deidentification_config is None:
+  raise ValueError(
+  'deidentification_template_name or '
+  'deidentification_config must be specified.')
+elif deidentification_template_name is not None:
+  self.config['deidentify_template_name'] = deidentification_template_name
+else:
+  self.config['deidentify_config'] = deidentification_config
+
+if inspection_template_name is not None and inspection_config is not None:
+  raise ValueError(
+  'Both inspe

[jira] [Work logged] (BEAM-9258) [Python] PTransform that connects to Cloud DLP deidentification service

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9258?focusedWorklogId=386755&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386755
 ]

ASF GitHub Bot logged work on BEAM-9258:


Author: ASF GitHub Bot
Created on: 13/Feb/20 17:10
Start Date: 13/Feb/20 17:10
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10849: [BEAM-9258] 
Integrate Google Cloud Data loss prevention functionality for Python SDK
URL: https://github.com/apache/beam/pull/10849#discussion_r378987762
 
 

 ##
 File path: sdks/python/apache_beam/ml/gcp/cloud_dlp.py
 ##
 @@ -0,0 +1,224 @@
+#  /*
+#   * Licensed to the Apache Software Foundation (ASF) under one
+#   * or more contributor license agreements.  See the NOTICE file
+#   * distributed with this work for additional information
+#   * regarding copyright ownership.  The ASF licenses this file
+#   * to you under the Apache License, Version 2.0 (the
+#   * "License"); you may not use this file except in compliance
+#   * with the License.  You may obtain a copy of the License at
+#   *
+#   * http://www.apache.org/licenses/LICENSE-2.0
+#   *
+#   * Unless required by applicable law or agreed to in writing, software
+#   * distributed under the License is distributed on an "AS IS" BASIS,
+#   * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#   * See the License for the specific language governing permissions and
+#   * limitations under the License.
+#   */
+
+"""``PTransforms`` that implement Google Cloud Data Loss Prevention
+functionality.
+"""
+
+from __future__ import absolute_import
+
+import logging
+
+from google.cloud import dlp_v2
+
+import apache_beam as beam
+from apache_beam.utils import retry
+from apache_beam.utils.annotations import experimental
+
+__all__ = ['MaskDetectedDetails', 'InspectForDetails']
+
+_LOGGER = logging.getLogger(__name__)
+
+
+@experimental()
+class MaskDetectedDetails(beam.PTransform):
+  """Scrubs sensitive information detected in text.
+  The ``PTransform`` returns a ``PCollection`` of ``str``
+  Example usage::
+pipeline | MaskDetectedDetails(project='example-gcp-project',
+  deidentification_config={
+  'info_type_transformations': {
+  'transformations': [{
+  'primitive_transformation': {
+  'character_mask_config': {
+  'masking_character': '#'
+  }
+  }
+  }]
+  }
+  }, inspection_config={'info_types': [{'name': 'EMAIL_ADDRESS'}]})
+  """
+  def __init__(
+  self,
+  project=None,
+  deidentification_template_name=None,
+  deidentification_config=None,
+  inspection_template_name=None,
+  inspection_config=None,
+  timeout=None):
+"""Initializes a :class:`MaskDetectedDetails` transform.
+Args:
+  project (str): Required. GCP project in which the data processing is
 
 Review comment:
   Would it make sense to default to the project from gcp options?
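   The reviewer's suggestion could be sketched as a fallback helper (hypothetical; `pipeline_options` below is a plain dict standing in for Beam's `GoogleCloudOptions`):

```python
def resolve_project(explicit_project, pipeline_options):
    """Prefer an explicitly passed project; otherwise fall back to the
    project configured on the pipeline's GCP options."""
    if explicit_project is not None:
        return explicit_project
    project = pipeline_options.get("project")  # stand-in for GoogleCloudOptions.project
    if project is None:
        raise ValueError(
            "GCP project must be set explicitly or via pipeline options")
    return project
```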
 



Issue Time Tracking
---

Worklog Id: (was: 386755)
Time Spent: 0.5h  (was: 20m)

> [Python] PTransform that connects to Cloud DLP deidentification service
> ---
>
> Key: BEAM-9258
> URL: https://issues.apache.org/jira/browse/BEAM-9258
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Michał Walenia
>Assignee: Michał Walenia
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9258) [Python] PTransform that connects to Cloud DLP deidentification service

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9258?focusedWorklogId=386756&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386756
 ]

ASF GitHub Bot logged work on BEAM-9258:


Author: ASF GitHub Bot
Created on: 13/Feb/20 17:10
Start Date: 13/Feb/20 17:10
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10849: [BEAM-9258] 
Integrate Google Cloud Data loss prevention functionality for Python SDK
URL: https://github.com/apache/beam/pull/10849#discussion_r378989261
 
 

 ##
 File path: sdks/python/apache_beam/ml/gcp/cloud_dlp.py
 ##
 @@ -0,0 +1,224 @@
+#  /*
+#   * Licensed to the Apache Software Foundation (ASF) under one
+#   * or more contributor license agreements.  See the NOTICE file
+#   * distributed with this work for additional information
+#   * regarding copyright ownership.  The ASF licenses this file
+#   * to you under the Apache License, Version 2.0 (the
+#   * "License"); you may not use this file except in compliance
+#   * with the License.  You may obtain a copy of the License at
+#   *
+#   * http://www.apache.org/licenses/LICENSE-2.0
+#   *
+#   * Unless required by applicable law or agreed to in writing, software
+#   * distributed under the License is distributed on an "AS IS" BASIS,
+#   * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#   * See the License for the specific language governing permissions and
+#   * limitations under the License.
+#   */
+
+"""``PTransforms`` that implement Google Cloud Data Loss Prevention
+functionality.
+"""
+
+from __future__ import absolute_import
+
+import logging
+
+from google.cloud import dlp_v2
+
+import apache_beam as beam
 
 Review comment:
   Let's do more explicit imports here
   
   e.g.
   from apache_beam.transforms import ParDo
   from apache_beam.transforms import PTransform
   
   then use them directly like `PTransform` instead of the `beam.PTransform` 
style.
 



Issue Time Tracking
---

Worklog Id: (was: 386756)
Time Spent: 40m  (was: 0.5h)

> [Python] PTransform that connects to Cloud DLP deidentification service
> ---
>
> Key: BEAM-9258
> URL: https://issues.apache.org/jira/browse/BEAM-9258
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Michał Walenia
>Assignee: Michał Walenia
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9258) [Python] PTransform that connects to Cloud DLP deidentification service

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9258?focusedWorklogId=386758&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386758
 ]

ASF GitHub Bot logged work on BEAM-9258:


Author: ASF GitHub Bot
Created on: 13/Feb/20 17:10
Start Date: 13/Feb/20 17:10
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10849: [BEAM-9258] 
Integrate Google Cloud Data loss prevention functionality for Python SDK
URL: https://github.com/apache/beam/pull/10849#discussion_r378998869
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -203,6 +203,7 @@ def get_version():
 'google-cloud-bigquery>=1.6.0,<1.18.0',
 'google-cloud-core>=0.28.1,<2',
 'google-cloud-bigtable>=0.31.1,<1.1.0',
+'google-cloud-dlp >=0.12.0,<=0.13.0',
 
 Review comment:
   Versions after 0.13 will not support py2. (notice at the top: 
https://googleapis.dev/python/dlp/latest/gapic/v2/api.html)
   
   I wonder if we need to add a comment note here for the person who will 
upgrade version ranges next?
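
The comment the reviewer asks for might look like the sketch below; the wording and placement are assumptions for illustration, not what was eventually merged into `sdks/python/setup.py`:

```python
# Fragment of the install_requires list in sdks/python/setup.py (illustrative).
# google-cloud-dlp releases after 0.13 drop Python 2 support, so whoever
# raises this upper bound next should re-check py2 compatibility first.
'google-cloud-dlp>=0.12.0,<=0.13.0',
```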
 



Issue Time Tracking
---

Worklog Id: (was: 386758)
Time Spent: 50m  (was: 40m)

> [Python] PTransform that connects to Cloud DLP deidentification service
> ---
>
> Key: BEAM-9258
> URL: https://issues.apache.org/jira/browse/BEAM-9258
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Michał Walenia
>Assignee: Michał Walenia
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9258) [Python] PTransform that connects to Cloud DLP deidentification service

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9258?focusedWorklogId=386757&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386757
 ]

ASF GitHub Bot logged work on BEAM-9258:


Author: ASF GitHub Bot
Created on: 13/Feb/20 17:10
Start Date: 13/Feb/20 17:10
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10849: [BEAM-9258] 
Integrate Google Cloud Data loss prevention functionality for Python SDK
URL: https://github.com/apache/beam/pull/10849#discussion_r378990795
 
 

 ##
 File path: sdks/python/apache_beam/ml/gcp/cloud_dlp.py
 ##
 @@ -0,0 +1,224 @@
+#  /*
+#   * Licensed to the Apache Software Foundation (ASF) under one
+#   * or more contributor license agreements.  See the NOTICE file
+#   * distributed with this work for additional information
+#   * regarding copyright ownership.  The ASF licenses this file
+#   * to you under the Apache License, Version 2.0 (the
+#   * "License"); you may not use this file except in compliance
+#   * with the License.  You may obtain a copy of the License at
+#   *
+#   * http://www.apache.org/licenses/LICENSE-2.0
+#   *
+#   * Unless required by applicable law or agreed to in writing, software
+#   * distributed under the License is distributed on an "AS IS" BASIS,
+#   * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#   * See the License for the specific language governing permissions and
+#   * limitations under the License.
+#   */
+
+"""``PTransforms`` that implement Google Cloud Data Loss Prevention
+functionality.
+"""
+
+from __future__ import absolute_import
+
+import logging
+
+from google.cloud import dlp_v2
+
+import apache_beam as beam
+from apache_beam.utils import retry
+from apache_beam.utils.annotations import experimental
+
+__all__ = ['MaskDetectedDetails', 'InspectForDetails']
+
+_LOGGER = logging.getLogger(__name__)
+
+
+@experimental()
+class MaskDetectedDetails(beam.PTransform):
+  """Scrubs sensitive information detected in text.
+  The ``PTransform`` returns a ``PCollection`` of ``str``.
+  Example usage::
+pipeline | MaskDetectedDetails(project='example-gcp-project',
+  deidentification_config={
+  'info_type_transformations': {
+  'transformations': [{
+  'primitive_transformation': {
+  'character_mask_config': {
+  'masking_character': '#'
+  }
+  }
+  }]
+  }
+  }, inspection_config={'info_types': [{'name': 'EMAIL_ADDRESS'}]})
+  """
+  def __init__(
+  self,
+  project=None,
+  deidentification_template_name=None,
+  deidentification_config=None,
+  inspection_template_name=None,
+  inspection_config=None,
+  timeout=None):
+"""Initializes a :class:`MaskDetectedDetails` transform.
+Args:
+  project (str): Required. GCP project in which the data processing is
+to be done
+  deidentification_template_name (str): Either this or
+`deidentification_config` is required. Name of
+deidentification template to be used on detected sensitive information
+instances in text.
+  deidentification_config
+(``Union[dict, google.cloud.dlp_v2.types.DeidentifyConfig]``):
+Configuration for the de-identification of the content item.
+  inspection_template_name (str): Either this or `inspection_config` is required.
+Name of inspection template to be used
+to detect sensitive data in text.
+  inspection_config
+(``Union[dict, google.cloud.dlp_v2.types.InspectConfig]``):
+Configuration for the inspector used to detect sensitive data in text.
+  timeout (float): Optional. The amount of time, in seconds, to wait for
+the request to complete.
+"""
+self.config = {}
+self.project = project
+self.timeout = timeout
+if project is None:
+  raise ValueError(
+  'GCP project name needs to be specified in "project" property')
+if deidentification_template_name is not None \
+and deidentification_config is not None:
+  raise ValueError(
+  'Both deidentification_template_name and '
+  'deidentification_config were specified.'
+  ' Please specify only one of these.')
+elif deidentification_template_name is None \
+and deidentification_config is None:
+  raise ValueError(
+  'deidentification_template_name or '
+  'deidentification_config must be specified.')
+elif deidentification_template_name is not None:
+  self.config['deidentify_template_name'] = deidentification_template_name
+else:
+  self.config['deidentify_config'] = deidentification_config
+
+if inspection_template_name is not None and inspection_config is not None:
+  raise ValueError(
+  'Both inspe

[jira] [Work logged] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9228?focusedWorklogId=386765&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386765
 ]

ASF GitHub Bot logged work on BEAM-9228:


Author: ASF GitHub Bot
Created on: 13/Feb/20 17:23
Start Date: 13/Feb/20 17:23
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on issue #10847: [BEAM-9228] 
Support further partition for FnApi ListBuffer
URL: https://github.com/apache/beam/pull/10847#issuecomment-585873015
 
 
   errors: 
https://builds.apache.org/job/beam_PreCommit_Python_Commit/11130/testReport/
 



Issue Time Tracking
---

Worklog Id: (was: 386765)
Time Spent: 50m  (was: 40m)

> _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
> 
>
> Key: BEAM-9228
> URL: https://issues.apache.org/jira/browse/BEAM-9228
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.16.0, 2.18.0
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> A user reported the following issue.
> -
> I have a set of tfrecord files, obtained by converting parquet files with 
> Spark. Each file is roughly 1GB and I have 11 of those.
> I would expect simple statistics gathering (i.e. counting the number of items in 
> all files) to scale linearly with respect to the number of cores on my system.
> I am able to reproduce the issue with the minimal snippet below
> {code:java}
> import apache_beam as beam
> from apache_beam.options.pipeline_options import PipelineOptions
> from apache_beam.runners.portability import fn_api_runner
> from apache_beam.portability.api import beam_runner_api_pb2
> from apache_beam.portability import python_urns
> import sys
> pipeline_options = PipelineOptions(['--direct_num_workers', '4'])
> file_pattern = 'part-r-00*'
> runner=fn_api_runner.FnApiRunner(
>   default_environment=beam_runner_api_pb2.Environment(
>   urn=python_urns.SUBPROCESS_SDK,
>   payload=b'%s -m apache_beam.runners.worker.sdk_worker_main'
> % sys.executable.encode('ascii')))
> p = beam.Pipeline(runner=runner, options=pipeline_options)
> lines = (p | 'read' >> beam.io.tfrecordio.ReadFromTFRecord(file_pattern)
>  | beam.combiners.Count.Globally()
>  | beam.io.WriteToText('/tmp/output'))
> p.run()
> {code}
> Only one combination of apache_beam revision / worker type seems to work (I 
> refer to https://beam.apache.org/documentation/runners/direct/ for the worker 
> types)
> * beam 2.16: neither the multithreaded nor the multiprocess mode achieves high 
> cpu usage on multiple cores
> * beam 2.17: able to achieve high cpu usage on all 4 cores
> * beam 2.18: the multithreaded mode was not tested, but the multiprocess mode fails 
> when trying to serialize the Environment instance most likely because of a 
> change from 2.17 to 2.18.
> I also briefly tried SparkRunner with version 2.16 but was not able to achieve 
> any throughput.
> What is the recommended way to achieve what I am trying to do? How can I 
> troubleshoot?
> --
> This is caused by [this 
> PR|https://github.com/apache/beam/commit/02f8ad4eee3ec0ea8cbdc0f99c1dad29f00a9f60].
> A [workaround|https://github.com/apache/beam/pull/10729] was tried: rolling 
> back iobase.py so that it does not use _SDFBoundedSourceWrapper. This confirmed 
> that data is distributed to multiple workers; however, there are some 
> regressions with SDF wrapper tests.





[jira] [Work logged] (BEAM-9292) Provide an ability to specify additional maven repositories for published POMs

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9292?focusedWorklogId=386772&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386772
 ]

ASF GitHub Bot logged work on BEAM-9292:


Author: ASF GitHub Bot
Created on: 13/Feb/20 17:26
Start Date: 13/Feb/20 17:26
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on issue #10832: [BEAM-9292] 
Provide an ability to specify additional maven repositories for published POMs
URL: https://github.com/apache/beam/pull/10832#issuecomment-585874564
 
 
   Sorry for the slowness on review - LGTM for sure.
 



Issue Time Tracking
---

Worklog Id: (was: 386772)
Time Spent: 1h 40m  (was: 1.5h)

> Provide an ability to specify additional maven repositories for published POMs
> --
>
> Key: BEAM-9292
> URL: https://issues.apache.org/jira/browse/BEAM-9292
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, io-java-kafka
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> To support Confluent Schema Registry, KafkaIO has a dependency on 
> {{io.confluent:kafka-avro-serializer}} from 
> the https://packages.confluent.io/maven/ repository. In this case, it should add 
> this repository into the published KafkaIO POM file. Otherwise, it will fail with 
> the following error during building a user pipeline:
> {code}
> [ERROR] Failed to execute goal on project kafka-io: Could not resolve 
> dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: 
> Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in 
> central (https://repo.maven.apache.org/maven2) -> [Help 1]
> {code}
> The repositories for publishing can be added with the {{mavenRepositories}} 
> argument in the Java build script configuration. 
> For example (KafkaIO):
> {code}
> $ cat sdks/java/io/kafka/build.gradle 
> ...
> applyJavaNature(
>   ...
>   mavenRepositories: [
> [id: 'io.confluent', url: 'https://packages.confluent.io/maven/']
>   ]
> )
> ...
> {code}
> It will generate the following XML snippet in the POM file of the 
> {{beam-sdks-java-io-kafka}} artifact after publishing:
> {code}
>   <repositories>
> <repository>
>   <id>io.confluent</id>
>   <url>https://packages.confluent.io/maven/</url>
> </repository>
>   </repositories>
> {code}





[jira] [Assigned] (BEAM-3788) Implement a Kafka IO for Python SDK

2020-02-13 Thread Chamikara Madhusanka Jayalath (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chamikara Madhusanka Jayalath reassigned BEAM-3788:
---

Assignee: Chamikara Madhusanka Jayalath

> Implement a Kafka IO for Python SDK
> ---
>
> Key: BEAM-3788
> URL: https://issues.apache.org/jira/browse/BEAM-3788
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>
> This will be implemented using the Splittable DoFn framework.





[jira] [Updated] (BEAM-3788) Implement a Kafka IO for Python SDK

2020-02-13 Thread Chamikara Madhusanka Jayalath (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chamikara Madhusanka Jayalath updated BEAM-3788:

Description: Java KafkaIO will be made available to Python users as a 
cross-language transform.  (was: This will be implemented using the Splittable 
DoFn framework.)

> Implement a Kafka IO for Python SDK
> ---
>
> Key: BEAM-3788
> URL: https://issues.apache.org/jira/browse/BEAM-3788
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>
> Java KafkaIO will be made available to Python users as a cross-language 
> transform.





[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=386776&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386776
 ]

ASF GitHub Bot logged work on BEAM-8382:


Author: ASF GitHub Bot
Created on: 13/Feb/20 17:32
Start Date: 13/Feb/20 17:32
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #9765: 
[WIP][BEAM-8382] Add rate limit policy to KinesisIO.Read
URL: https://github.com/apache/beam/pull/9765#issuecomment-585877558
 
 
   @jfarr Thank you very much for the detailed testing and explanation. In the end, 
what do you think would be the best solution in this case?
 



Issue Time Tracking
---

Worklog Id: (was: 386776)
Time Spent: 12h 20m  (was: 12h 10m)

> Add polling interval to KinesisIO.Read
> --
>
> Key: BEAM-8382
> URL: https://issues.apache.org/jira/browse/BEAM-8382
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kinesis
>Affects Versions: 2.13.0, 2.14.0, 2.15.0
>Reporter: Jonothan Farr
>Assignee: Jonothan Farr
>Priority: Major
>  Time Spent: 12h 20m
>  Remaining Estimate: 0h
>
> With the current implementation we are observing Kinesis throttling due to 
> ReadProvisionedThroughputExceeded on the order of hundreds of times per 
> second, regardless of the actual Kinesis throughput. This is because the 
> ShardReadersPool readLoop() method is polling getRecords() as fast as 
> possible.
> From the KDS documentation:
> {quote}Each shard can support up to five read transactions per second.
> {quote}
> and
> {quote}For best results, sleep for at least 1 second (1,000 milliseconds) 
> between calls to getRecords to avoid exceeding the limit on getRecords 
> frequency.
> {quote}
> [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html]
> [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html]
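
The guidance above (sleep at least one second between getRecords calls) can be sketched as a minimal rate-limited polling loop. This is a stdlib-only illustration of the idea, not KinesisIO's actual rate limit policy API; `rate_limited_poll`, `fetch`, `min_interval`, and `max_polls` are invented names:

```python
import time

def rate_limited_poll(fetch, min_interval=1.0, max_polls=5):
    """Call fetch() at most once per min_interval seconds.

    fetch stands in for a getRecords-style call; all names here are
    illustrative rather than the KinesisIO API.
    """
    results = []
    last_call = None
    for _ in range(max_polls):
        if last_call is not None:
            # Sleep off whatever is left of the interval before polling again.
            remaining = min_interval - (time.monotonic() - last_call)
            if remaining > 0:
                time.sleep(remaining)
        last_call = time.monotonic()
        results.append(fetch())
    return results
```

With `min_interval=1.0` such a loop stays well under the five-reads-per-second shard limit, at the cost of up to one second of added latency per poll.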





[jira] [Resolved] (BEAM-9059) Migrate PTransformTranslation to use string constants

2020-02-13 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-9059.
-
Fix Version/s: 2.20.0
   Resolution: Fixed

> Migrate PTransformTranslation to use string constants
> -
>
> Key: BEAM-9059
> URL: https://issues.apache.org/jira/browse/BEAM-9059
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Luke Cwik
>Assignee: Luke Cwik
>Priority: Trivial
> Fix For: 2.20.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> This allows for the values to be used within switch case statements.





[jira] [Created] (BEAM-9309) Remove deprecated primitive beam:transform:read:v1 and beam_fn_api_use_deprecated_read experiment everywhere

2020-02-13 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-9309:
---

 Summary: Remove deprecated primitive beam:transform:read:v1 and 
beam_fn_api_use_deprecated_read experiment everywhere
 Key: BEAM-9309
 URL: https://issues.apache.org/jira/browse/BEAM-9309
 Project: Beam
  Issue Type: Bug
  Components: beam-model, sdk-java-core
Reporter: Luke Cwik


Remove all references and usages of the experiment 
beam_fn_api_use_deprecated_read and also the deprecated transform 
beam:transform:read:v1 everywhere.





[jira] [Commented] (BEAM-3788) Implement a Kafka IO for Python SDK

2020-02-13 Thread Chamikara Madhusanka Jayalath (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036376#comment-17036376
 ] 

Chamikara Madhusanka Jayalath commented on BEAM-3788:
-

We are actively working on making Java KafkaIO available to Python users as a 
cross-language transform for all runners. The current solution [1] makes the Java 
Kafka source, which is based on the UnboundedSource framework, available through 
the cross-language transforms framework. But this will not really work for 
portable runners without transform overrides, since the UnboundedSource framework 
is not available to them (this currently works for Flink because the FlinkRunner 
overrides this source with a native source implementation). We need a Kafka 
cross-language transform based on an SDF version of KafkaIO that can be made 
available to Python users.

On the other hand, we are working on an UnboundedSource-to-SDF converter. So 
this can simply be a cross-language transform for a Kafka transform that uses an 
SDF for the existing Kafka UnboundedSource, generated through this converter.

 

[1][https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/external/kafka.py]

> Implement a Kafka IO for Python SDK
> ---
>
> Key: BEAM-3788
> URL: https://issues.apache.org/jira/browse/BEAM-3788
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>
> Java KafkaIO will be made available to Python users as a cross-language 
> transform.





[jira] [Assigned] (BEAM-9085) Investigate performance difference between Python 2/3 on Dataflow

2020-02-13 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev reassigned BEAM-9085:
-

Assignee: Kamil Wasilewski  (was: Valentyn Tymofieiev)

> Investigate performance difference between Python 2/3 on Dataflow
> -
>
> Key: BEAM-9085
> URL: https://issues.apache.org/jira/browse/BEAM-9085
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>
> Tests show that the performance of core Beam operations in Python 3.x on 
> Dataflow can be a few times slower than in Python 2.7. We should investigate 
> what's the cause of the problem.
> Currently, we have one ParDo test that is run both in Py3 and Py2 [1]. A 
> dashboard with runtime results can be found here [2].
> [1] sdks/python/apache_beam/testing/load_tests/pardo_test.py
> [2] https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536





[jira] [Commented] (BEAM-9085) Investigate performance difference between Python 2/3 on Dataflow

2020-02-13 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036377#comment-17036377
 ] 

Valentyn Tymofieiev commented on BEAM-9085:
---

Thanks, [~kamilwu]. I'll reassign the issue to you for now, for the follow-up 
changes in the test infra. 

> Investigate performance difference between Python 2/3 on Dataflow
> -
>
> Key: BEAM-9085
> URL: https://issues.apache.org/jira/browse/BEAM-9085
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Kamil Wasilewski
>Assignee: Valentyn Tymofieiev
>Priority: Major
>
> Tests show that the performance of core Beam operations in Python 3.x on 
> Dataflow can be a few times slower than in Python 2.7. We should investigate 
> what's the cause of the problem.
> Currently, we have one ParDo test that is run both in Py3 and Py2 [1]. A 
> dashboard with runtime results can be found here [2].
> [1] sdks/python/apache_beam/testing/load_tests/pardo_test.py
> [2] https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536





[jira] [Updated] (BEAM-9085) Performance regression in np.random.RandomState() skews performance test results across Python 2/3 on Dataflow

2020-02-13 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-9085:
--
Summary: Performance regression in np.random.RandomState() skews 
performance test results across Python 2/3 on Dataflow  (was: Investigate 
performance difference between Python 2/3 on Dataflow)

> Performance regression in np.random.RandomState() skews performance test 
> results across Python 2/3 on Dataflow
> --
>
> Key: BEAM-9085
> URL: https://issues.apache.org/jira/browse/BEAM-9085
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>
> Tests show that the performance of core Beam operations in Python 3.x on 
> Dataflow can be a few times slower than in Python 2.7. We should investigate 
> what's the cause of the problem.
> Currently, we have one ParDo test that is run both in Py3 and Py2 [1]. A 
> dashboard with runtime results can be found here [2].
> [1] sdks/python/apache_beam/testing/load_tests/pardo_test.py
> [2] https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536
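
A constructor micro-benchmark is one way to localize the kind of regression named in the retitled issue. The sketch below uses the stdlib `random.Random` purely as a stand-in for `np.random.RandomState` (an assumption for illustration; the actual investigation concerned numpy), just to show the shape of the measurement:

```python
import random
import timeit

def constructor_cost(factory, number=10000):
    """Average seconds per call to construct a fresh RNG via factory()."""
    return timeit.timeit(factory, number=number) / number

# Run the same measurement under Python 2.7 and 3.x interpreters to see
# whether RNG construction itself accounts for the slowdown.
per_call = constructor_cost(lambda: random.Random(42))
```

Comparing `per_call` across interpreter versions isolates RNG construction from the rest of the pipeline's runtime.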





[jira] [Work logged] (BEAM-9279) Make HBase.ReadAll based on Reads instead of HBaseQuery

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9279?focusedWorklogId=386782&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386782
 ]

ASF GitHub Bot logged work on BEAM-9279:


Author: ASF GitHub Bot
Created on: 13/Feb/20 17:49
Start Date: 13/Feb/20 17:49
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10815: [BEAM-9279] 
Make HBase.ReadAll based on Reads instead of HBaseQuery
URL: https://github.com/apache/beam/pull/10815#issuecomment-585885902
 
 
   Thank you for the contribution! Sorry for the delay with the review - I'll be off next 
week and will get back to this PR after that.
 



Issue Time Tracking
---

Worklog Id: (was: 386782)
Time Spent: 50m  (was: 40m)

> Make HBase.ReadAll based on Reads instead of HBaseQuery
> ---
>
> Key: BEAM-9279
> URL: https://issues.apache.org/jira/browse/BEAM-9279
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-hbase
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> HBaseIO support for SplittableDoFn introduced a new request type HBaseQuery, 
> however the attributes defined in that class are already available in 
> HBase.Read. Allowing users to define pipelines based on HBaseIO.Read allows 
> to create pipelines that can read from multiple clusters because the 
> Configuration now is part of the request object.





[jira] [Updated] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers

2020-02-13 Thread Hannah Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hannah Jiang updated BEAM-9228:
---
Fix Version/s: 2.20.0

> _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
> 
>
> Key: BEAM-9228
> URL: https://issues.apache.org/jira/browse/BEAM-9228
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.16.0, 2.18.0
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> A user reported the following issue.
> -
> I have a set of tfrecord files, obtained by converting parquet files with 
> Spark. Each file is roughly 1GB and I have 11 of those.
> I would expect simple statistics gathering (i.e. counting the number of items in 
> all files) to scale linearly with respect to the number of cores on my system.
> I am able to reproduce the issue with the minimal snippet below
> {code:java}
> import apache_beam as beam
> from apache_beam.options.pipeline_options import PipelineOptions
> from apache_beam.runners.portability import fn_api_runner
> from apache_beam.portability.api import beam_runner_api_pb2
> from apache_beam.portability import python_urns
> import sys
> pipeline_options = PipelineOptions(['--direct_num_workers', '4'])
> file_pattern = 'part-r-00*'
> runner=fn_api_runner.FnApiRunner(
>   default_environment=beam_runner_api_pb2.Environment(
>   urn=python_urns.SUBPROCESS_SDK,
>   payload=b'%s -m apache_beam.runners.worker.sdk_worker_main'
> % sys.executable.encode('ascii')))
> p = beam.Pipeline(runner=runner, options=pipeline_options)
> lines = (p | 'read' >> beam.io.tfrecordio.ReadFromTFRecord(file_pattern)
>  | beam.combiners.Count.Globally()
>  | beam.io.WriteToText('/tmp/output'))
> p.run()
> {code}
> Only one combination of apache_beam revision / worker type seems to work (I 
> refer to https://beam.apache.org/documentation/runners/direct/ for the worker 
> types)
> * beam 2.16: neither the multithreaded nor the multiprocess mode achieves high 
> cpu usage on multiple cores
> * beam 2.17: able to achieve high cpu usage on all 4 cores
> * beam 2.18: the multithreaded mode was not tested, but the multiprocess mode fails 
> when trying to serialize the Environment instance most likely because of a 
> change from 2.17 to 2.18.
> I also briefly tried SparkRunner with version 2.16 but was not able to achieve 
> any throughput.
> What is the recommnended way to achieve what I am trying to ? How can I 
> troubleshoot ?
> --
> This is caused by [this 
> PR|https://github.com/apache/beam/commit/02f8ad4eee3ec0ea8cbdc0f99c1dad29f00a9f60].
> A [workaround|https://github.com/apache/beam/pull/10729] was tried: rolling back 
> iobase.py so that it does not use _SDFBoundedSourceWrapper. This confirmed 
> that data is distributed to multiple workers; however, there are some 
> regressions in the SDF wrapper tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
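Independent of Beam, the linear-scaling expectation in the report above can be sketched with the standard library alone. This is an illustrative sketch, not Beam code: record counting is simplified to newline-delimited files, and `count_records`, `count_all`, and the `part-r-*` shard layout are assumptions for the example.

```python
import glob
import multiprocessing as mp


def count_records(path):
    # Stand-in for TFRecord parsing: count newline-delimited records in one shard.
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def count_all(pattern):
    # Pool() starts one worker per CPU core by default; shards are counted in
    # parallel, so wall time should scale roughly linearly with core count
    # until I/O becomes the bottleneck.
    files = sorted(glob.glob(pattern))
    with mp.Pool() as pool:
        return sum(pool.map(count_records, files))
```

With eleven ~1 GB shards the work is largely I/O bound, but the sketch shows the per-shard parallelism the reporter expects the direct runner's multiprocess mode to provide.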


[jira] [Updated] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers

2020-02-13 Thread Hannah Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hannah Jiang updated BEAM-9228:
---
Affects Version/s: 2.19.0

> _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
> 
>
> Key: BEAM-9228
> URL: https://issues.apache.org/jira/browse/BEAM-9228
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.16.0, 2.18.0, 2.19.0
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9279) Make HBase.ReadAll based on Reads instead of HBaseQuery

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9279?focusedWorklogId=386784&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386784
 ]

ASF GitHub Bot logged work on BEAM-9279:


Author: ASF GitHub Bot
Created on: 13/Feb/20 17:59
Start Date: 13/Feb/20 17:59
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #10815: 
[BEAM-9279] Make HBase.ReadAll based on Reads instead of HBaseQuery
URL: https://github.com/apache/beam/pull/10815#discussion_r379025287
 
 

 ##
 File path: 
sdks/java/io/hbase/src/main/java/org/apache/beam/sdk/io/hbase/HBaseIO.java
 ##
 @@ -240,63 +245,109 @@ private Read(
 @Override
 public void populateDisplayData(DisplayData.Builder builder) {
   super.populateDisplayData(builder);
-  builder.add(DisplayData.item("configuration", 
serializableConfiguration.get().toString()));
+  builder.add(DisplayData.item("configuration", configuration.toString()));
   builder.add(DisplayData.item("tableId", tableId));
-  builder.addIfNotNull(DisplayData.item("scan", 
serializableScan.get().toString()));
+  builder.addIfNotNull(DisplayData.item("scan", scan.toString()));
 }
 
 public Configuration getConfiguration() {
-  return serializableConfiguration.get();
+  return configuration;
 }
 
 public String getTableId() {
   return tableId;
 }
 
 public Scan getScan() {
-  return serializableScan.get();
+  return scan;
 }
 
 /** Returns the range of keys that will be read from the table. */
 public ByteKeyRange getKeyRange() {
-  byte[] startRow = serializableScan.get().getStartRow();
-  byte[] stopRow = serializableScan.get().getStopRow();
+  byte[] startRow = scan.getStartRow();
+  byte[] stopRow = scan.getStopRow();
   return ByteKeyRange.of(ByteKey.copyFrom(startRow), 
ByteKey.copyFrom(stopRow));
 }
 
-private final SerializableConfiguration serializableConfiguration;
+@Override
+public boolean equals(Object o) {
+  if (this == o) {
+return true;
+  }
+  if (o == null || getClass() != o.getClass()) {
+return false;
+  }
+  Read read = (Read) o;
+  return configuration.toString().equals(read.configuration.toString())
+  && Objects.equals(tableId, read.tableId)
+  && scan.toString().equals(read.scan.toString());
+}
+
+@Override
+public int hashCode() {
+  return Objects.hash(configuration, tableId, scan);
+}
+
+private Object writeReplace() {
+  return new SerializationProxy(this);
+}
+
+private static class SerializationProxy implements Serializable {
+  public SerializationProxy() {}
+
+  public SerializationProxy(Read read) {
+configuration = read.configuration;
+tableId = read.tableId;
+scan = read.scan;
+  }
+
+  private void writeObject(ObjectOutputStream out) throws IOException {
+SerializableCoder.of(SerializableConfiguration.class)
+.encode(new SerializableConfiguration(this.configuration), out);
+StringUtf8Coder.of().encode(this.tableId, out);
+ProtobufUtil.toScan(this.scan).writeDelimitedTo(out);
+  }
+
+  private void readObject(ObjectInputStream in) throws IOException {
+this.configuration = 
SerializableCoder.of(SerializableConfiguration.class).decode(in).get();
+this.tableId = StringUtf8Coder.of().decode(in);
+this.scan = 
ProtobufUtil.toScan(ClientProtos.Scan.parseDelimitedFrom(in));
+  }
+
+  Object readResolve() {
+return 
HBaseIO.read().withConfiguration(configuration).withTableId(tableId).withScan(scan);
+  }
+
+  private Configuration configuration;
+  private String tableId;
+  private Scan scan;
+}
+
+@SuppressFBWarnings("SE_BAD_FIELD")
+private final Configuration configuration;
+
 private final String tableId;
-private final SerializableScan serializableScan;
+
+@SuppressFBWarnings("SE_BAD_FIELD")
+private final Scan scan;
   }
 
   /**
* A {@link PTransform} that works like {@link #read}, but executes read 
operations coming from a
-   * {@link PCollection} of {@link HBaseQuery}.
+   * {@link PCollection} of {@link Read}.
*/
   public static ReadAll readAll() {
-return new ReadAll(null);
+return new ReadAll();
   }
 
   /** Implementation of {@link #readAll}. */
-  public static class ReadAll extends PTransform<PCollection<Read>, PCollection<Result>> {
-
-private ReadAll(SerializableConfiguration serializableConfiguration) {
-  this.serializableConfiguration = serializableConfiguration;
-}
-
-/** Reads from the HBase instance indicated by the* given configuration. */
-public ReadAll withConfiguration(Configuration configuration) {
-  checkArgument(configuration != null, "configuration can not be null");
-  return 
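The `writeReplace`/`readResolve` pair in the diff above is Java's serialization-proxy pattern: a compact proxy of primitive fields is written instead of the object graph, and the real object is rebuilt on deserialization. A rough Python analogue via `__reduce__` is sketched below; the `Scan` and `Read` classes are simplified stand-ins, not the HBase or Beam types.

```python
import pickle


class Scan:
    """Simplified stand-in for an HBase Scan, which is not serializable as-is."""

    def __init__(self, start_row, stop_row):
        self.start_row, self.stop_row = start_row, stop_row


class Read:
    def __init__(self, table_id, scan):
        self.table_id, self.scan = table_id, scan

    def __reduce__(self):
        # Like writeReplace(): pickle a compact tuple of primitive fields
        # plus a reconstruction callable, instead of the full object graph.
        return (_rebuild_read,
                (self.table_id, self.scan.start_row, self.scan.stop_row))


def _rebuild_read(table_id, start_row, stop_row):
    # Like readResolve(): reconstruct the real object from the proxy fields.
    return Read(table_id, Scan(start_row, stop_row))
```

The payoff is the same as in the diff: the non-serializable members never travel over the wire, only enough data to rebuild them.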

[jira] [Work logged] (BEAM-8758) Beam Dependency Update Request: com.google.cloud:google-cloud-spanner

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8758?focusedWorklogId=386785&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386785
 ]

ASF GitHub Bot logged work on BEAM-8758:


Author: ASF GitHub Bot
Created on: 13/Feb/20 18:08
Start Date: 13/Feb/20 18:08
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10765: [BEAM-8758] 
Google-cloud-spanner upgrade to 1.49.1
URL: https://github.com/apache/beam/pull/10765#issuecomment-585894780
 
 
   Going to implement Luke's suggestion.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386785)
Time Spent: 4h 20m  (was: 4h 10m)

> Beam Dependency Update Request: com.google.cloud:google-cloud-spanner
> -
>
> Key: BEAM-8758
> URL: https://issues.apache.org/jira/browse/BEAM-8758
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Assignee: Tomo Suzuki
>Priority: Major
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
>  - 2019-11-19 21:05:29.289016 
> -
> Please consider upgrading the dependency 
> com.google.cloud:google-cloud-spanner. 
> The current version is 1.6.0. The latest version is 1.46.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-02 12:11:08.926875 
> -
> Please consider upgrading the dependency 
> com.google.cloud:google-cloud-spanner. 
> The current version is 1.6.0. The latest version is 1.46.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-09 12:10:16.400168 
> -
> Please consider upgrading the dependency 
> com.google.cloud:google-cloud-spanner. 
> The current version is 1.6.0. The latest version is 1.47.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-23 12:10:17.656471 
> -
> Please consider upgrading the dependency 
> com.google.cloud:google-cloud-spanner. 
> The current version is 1.6.0. The latest version is 1.47.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-30 14:05:49.080960 
> -
> Please consider upgrading the dependency 
> com.google.cloud:google-cloud-spanner. 
> The current version is 1.6.0. The latest version is 1.47.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-01-06 12:09:23.346857 
> -
> Please consider upgrading the dependency 
> com.google.cloud:google-cloud-spanner. 
> The current version is 1.6.0. The latest version is 1.47.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-01-13 12:09:02.023131 
> -
> Please consider upgrading the dependency 
> com.google.cloud:google-cloud-spanner. 
> The current version is 1.6.0. The latest version is 1.47.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-01-20 12:08:38.419575 
> -
> Please consider upgrading the dependency 
> com.google.cloud:google-cloud-spanner. 
> The current version is 1.6.0. The latest version is 1.48.0 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2020-01-27 12:09:44.29834

[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-02-13 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=386794&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386794
 ]

ASF GitHub Bot logged work on BEAM-8889:


Author: ASF GitHub Bot
Created on: 13/Feb/20 18:34
Start Date: 13/Feb/20 18:34
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #10769: [BEAM-8889] 
Upgrades gcsio to 2.0.0
URL: https://github.com/apache/beam/pull/10769#issuecomment-585905517
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386794)
Remaining Estimate: 153h  (was: 153h 10m)
Time Spent: 15h  (was: 14h 50m)

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: Major
>  Labels: gcs
>   Original Estimate: 168h
>  Time Spent: 15h
>  Remaining Estimate: 153h
>
> [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java]
>  is a primary class to access Google Cloud Storage on Apache Beam. Current 
> implementation directly creates GoogleCloudStorageReadChannel and 
> GoogleCloudStorageWriteChannel by itself to read and write GCS data rather 
> than using 
> [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java]
>  which is an abstract class providing basic IO capability which eventually 
> creates channel objects. This request is about updating GcsUtil to use 
> GoogleCloudStorage to create read and write channel, which is expected 
> flexible because it can easily pick up the new change; e.g. new channel 
> implementation using new protocol without code change.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
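The GcsUtil refactor described above is a dependency-inversion move: callers program against an abstract storage interface and never construct concrete channels themselves, so a new channel implementation can be picked up without caller changes. A language-neutral sketch in Python; `CloudStorage`, `InMemoryStorage`, and `read_all` are illustrative names, not the actual gcsio or Beam APIs.

```python
import io
from abc import ABC, abstractmethod


class CloudStorage(ABC):
    """Hypothetical analogue of the GoogleCloudStorage abstraction."""

    @abstractmethod
    def create_read_channel(self, path) -> io.IOBase:
        ...


class InMemoryStorage(CloudStorage):
    """Test double; a GCS-backed implementation could be swapped in
    without any change to callers such as read_all below."""

    def __init__(self, blobs):
        self._blobs = blobs

    def create_read_channel(self, path):
        return io.BytesIO(self._blobs[path])


def read_all(storage: CloudStorage, path) -> bytes:
    # The caller depends only on the interface -- the point of the change:
    # it never instantiates a concrete read channel itself.
    with storage.create_read_channel(path) as ch:
        return ch.read()
```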

