[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=385715&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385715
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 12/Feb/20 08:02
Start Date: 12/Feb/20 08:02
Worklog Time Spent: 10m 
  Work Description: ananvay commented on issue #10835: [BEAM-8575] Removed 
MAX_TIMESTAMP from testing data
URL: https://github.com/apache/beam/pull/10835#issuecomment-585080387
 
 
   R: @robertwb 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 385715)
Time Spent: 50h 10m  (was: 50h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 50h 10m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9265) @RequiresTimeSortedInput does not respect allowedLateness

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9265?focusedWorklogId=385716&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385716
 ]

ASF GitHub Bot logged work on BEAM-9265:


Author: ASF GitHub Bot
Created on: 12/Feb/20 08:04
Start Date: 12/Feb/20 08:04
Worklog Time Spent: 10m 
  Work Description: dmvk commented on issue #10795: [BEAM-9265] 
@RequiresTimeSortedInput respects allowedLateness
URL: https://github.com/apache/beam/pull/10795#issuecomment-585080936
 
 
   Can you please add a description for the related 
[JIRA](https://issues.apache.org/jira/browse/BEAM-9265) issue?
 



Issue Time Tracking
---

Worklog Id: (was: 385716)
Time Spent: 2.5h  (was: 2h 20m)

> @RequiresTimeSortedInput does not respect allowedLateness
> -
>
> Key: BEAM-9265
> URL: https://issues.apache.org/jira/browse/BEAM-9265
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.20.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>




--


[jira] [Updated] (BEAM-9265) @RequiresTimeSortedInput does not respect allowedLateness

2020-02-12 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Lukavský updated BEAM-9265:
---
Description: 
Currently, @RequiresTimeSortedInput drops data with respect to allowedLateness, 
but timers are triggered without respecting it. We have to:

 - drop data that is too late (i.e. later than allowedLateness)
 - set up a timer for _minTimestamp + allowedLateness_
 - hold the output watermark at _minTimestamp_
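The three steps can be sketched as a small, illustrative simulation (plain Python, not the actual Beam Java internals; all names here are hypothetical):

```python
# Simulates the three steps: drop data later than allowedLateness,
# schedule a flush timer at minTimestamp + allowedLateness, and hold
# the output watermark at the minimum buffered timestamp.

def process(buffer, element_ts, input_watermark, allowed_lateness):
    """Return (buffer, flush_timer_ts, output_watermark_hold)."""
    if element_ts + allowed_lateness >= input_watermark:
        buffer.append(element_ts)  # still within allowed lateness
    # else: element is too late, drop it silently
    if buffer:
        min_ts = min(buffer)
        return buffer, min_ts + allowed_lateness, min_ts
    return buffer, None, None
```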

> @RequiresTimeSortedInput does not respect allowedLateness
> -
>
> Key: BEAM-9265
> URL: https://issues.apache.org/jira/browse/BEAM-9265
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.20.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Currently, @RequiresTimeSortedInput drops data with respect to 
> allowedLateness, but timers are triggered without respecting it. We have to:
>  - drop data that is too late (i.e. later than allowedLateness)
>  - set up a timer for _minTimestamp + allowedLateness_
>  - hold the output watermark at _minTimestamp_



--


[jira] [Commented] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality

2020-02-12 Thread Elias Djurfeldt (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035156#comment-17035156
 ] 

Elias Djurfeldt commented on BEAM-9247:
---

I realise after some more thinking that the logic for `image_context` is 
probably the same as for `video_context` for the videointelligenceml 
PTransform. 
[https://github.com/apache/beam/pull/10764/files#diff-069a81f5184d053ea32dd1558d4ee67bR92]

So we can either:
1. Assume all videos/images in the same PCollection share the same 
video_context/image_context, or
2. Let the user specify one unique video_context/image_context per element in 
the PCollection.

What are your thoughts [~kamilwu]?
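For what option 2 might look like at the element level, here is a hypothetical sketch (plain Python; `resolve_context` and the tuple convention are illustrative, not the API from PR #10764):

```python
# Option 2 sketch: an element may carry its own context as a
# (data, context) pair; otherwise a shared default applies (option 1).

def resolve_context(element, default_context=None):
    """Return a (data, context) pair for either input shape."""
    if isinstance(element, tuple) and len(element) == 2:
        return element
    return element, default_context
```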

> [Python] PTransform that integrates Cloud Vision functionality
> --
>
> Key: BEAM-9247
> URL: https://issues.apache.org/jira/browse/BEAM-9247
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Elias Djurfeldt
>Priority: Major
>
> The goal is to create a PTransform that integrates Google Cloud Vision API 
> functionality [1].
> [1] https://cloud.google.com/vision/docs/



--


[jira] [Created] (BEAM-9302) No space left on device - apache-beam-jenkins-7

2020-02-12 Thread Jira
Michał Walenia created BEAM-9302:


 Summary: No space left on device - apache-beam-jenkins-7
 Key: BEAM-9302
 URL: https://issues.apache.org/jira/browse/BEAM-9302
 Project: Beam
  Issue Type: Bug
  Components: build-system
Reporter: Michał Walenia






--


[jira] [Updated] (BEAM-9302) No space left on device - apache-beam-jenkins-7

2020-02-12 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michał Walenia updated BEAM-9302:
-
Description: 
Log of a failed job with this error: 
[https://builds.apache.org/job/beam_PreCommit_SQL_Commit/543/consoleFull]

> No space left on device - apache-beam-jenkins-7
> ---
>
> Key: BEAM-9302
> URL: https://issues.apache.org/jira/browse/BEAM-9302
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Michał Walenia
>Priority: Blocker
>
> Log of a failed job with this error: 
> [https://builds.apache.org/job/beam_PreCommit_SQL_Commit/543/consoleFull]



--


[jira] [Work logged] (BEAM-9160) Update AWS SDK to support Kubernetes Pod Level Identity

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9160?focusedWorklogId=385746&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385746
 ]

ASF GitHub Bot logged work on BEAM-9160:


Author: ASF GitHub Bot
Created on: 12/Feb/20 09:19
Start Date: 12/Feb/20 09:19
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #10836: [BEAM-9160] 
Removed WebIdentityTokenCredentialsProvider explicit json (de)serialization in 
AWS module
URL: https://github.com/apache/beam/pull/10836
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 385746)
Time Spent: 3h  (was: 2h 50m)

> Update AWS SDK to support Kubernetes Pod Level Identity
> ---
>
> Key: BEAM-9160
> URL: https://issues.apache.org/jira/browse/BEAM-9160
> Project: Beam
>  Issue Type: Improvement
>  Components: dependencies
>Affects Versions: 2.17.0
>Reporter: Mohamed Noah
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Many organizations have started leveraging pod level identity in Kubernetes. 
> The current version of the AWS SDK packaged with Beam 2.17.0 is out of date 
> and doesn't provide native support for pod-level identity access management.
>  
> It is recommended that we introduce support to access AWS resources such as 
> S3 using pod level identity. 
> Current Version of the AWS Java SDK in Beam:
> def aws_java_sdk_version = "1.11.519"
> Proposed AWS Java SDK Version:
> {code:xml}
> <dependency>
>   <groupId>com.amazonaws</groupId>
>   <artifactId>aws-java-sdk</artifactId>
>   <version>1.11.710</version>
> </dependency>
> {code}



--


[jira] [Work logged] (BEAM-9160) Update AWS SDK to support Kubernetes Pod Level Identity

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9160?focusedWorklogId=385745&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385745
 ]

ASF GitHub Bot logged work on BEAM-9160:


Author: ASF GitHub Bot
Created on: 12/Feb/20 09:19
Start Date: 12/Feb/20 09:19
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10836: [BEAM-9160] Removed 
WebIdentityTokenCredentialsProvider explicit json (de)serialization in AWS 
module
URL: https://github.com/apache/beam/pull/10836#issuecomment-585109076
 
 
   Merged manually to adjust the commit message
 



Issue Time Tracking
---

Worklog Id: (was: 385745)
Time Spent: 2h 50m  (was: 2h 40m)

> Update AWS SDK to support Kubernetes Pod Level Identity
> ---
>
> Key: BEAM-9160
> URL: https://issues.apache.org/jira/browse/BEAM-9160
> Project: Beam
>  Issue Type: Improvement
>  Components: dependencies
>Affects Versions: 2.17.0
>Reporter: Mohamed Noah
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Many organizations have started leveraging pod level identity in Kubernetes. 
> The current version of the AWS SDK packaged with Beam 2.17.0 is out of date 
> and doesn't provide native support for pod-level identity access management.
>  
> It is recommended that we introduce support to access AWS resources such as 
> S3 using pod level identity. 
> Current Version of the AWS Java SDK in Beam:
> def aws_java_sdk_version = "1.11.519"
> Proposed AWS Java SDK Version:
> {code:xml}
> <dependency>
>   <groupId>com.amazonaws</groupId>
>   <artifactId>aws-java-sdk</artifactId>
>   <version>1.11.710</version>
> </dependency>
> {code}



--


[jira] [Commented] (BEAM-2535) Allow explicit output time independent of firing specification for all timers

2020-02-12 Thread Shehzaad Nakhoda (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035192#comment-17035192
 ] 

Shehzaad Nakhoda commented on BEAM-2535:


[~reuvenlax] [~kenn]  can this be marked as resolved?

> Allow explicit output time independent of firing specification for all timers
> -
>
> Key: BEAM-2535
> URL: https://issues.apache.org/jira/browse/BEAM-2535
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 25h 10m
>  Remaining Estimate: 0h
>
> Today, we have insufficient control over the event time timestamp of elements 
> output from a timer callback.
> 1. For an event time timer, it is the timestamp of the timer itself.
>  2. For a processing time timer, it is the current input watermark at the 
> time of processing.
> But for both of these, we may want to reserve the right to output a 
> particular time, aka set a "watermark hold".
> A naive implementation of a {{TimerWithWatermarkHold}} would work for making 
> sure output is not droppable, but does not fully explain window expiration 
> and late data/timer dropping.
> In the natural interpretation of a timer as a feedback loop on a transform, 
> timers should be viewed as another channel of input, with a watermark, and 
> items on that channel _all need event time timestamps even if they are 
> delivered according to a different time domain_.
> I propose that the specification for when a timer should fire should be 
> separated (with nice defaults) from the specification of the event time of 
> resulting outputs. These timestamps will determine a side channel with a new 
> "timer watermark" that constrains the output watermark.
>  - We still need to fire event time timers according to the input watermark, 
> so that event time timers fire.
>  - Late data dropping and window expiration will be in terms of the minimum 
> of the input watermark and the timer watermark. In this way, whenever a timer 
> is set, the window is not going to be garbage collected.
>  - We will need to make sure we have a way to "wake up" a window once it is 
> expired; this may be as simple as exhausting the timer channel as soon as the 
> input watermark indicates expiration of a window
> This is mostly aimed at end-user timers in a stateful+timely {{DoFn}}. It 
> seems reasonable to use timers as an implementation detail (e.g. in 
> runners-core utilities) without wanting any of this additional machinery. For 
> example, if there is no possibility of output from the timer callback.
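The central constraint of the proposal, that late-data dropping and window expiration use the minimum of the input watermark and the timer watermark, can be sketched as follows (illustrative Python with hypothetical names, not runner code):

```python
# Pending timer output timestamps form a "timer watermark"; the
# effective watermark for late-data dropping and window expiration is
# the minimum of the input and timer watermarks, so a set timer keeps
# its window from being garbage collected.

def effective_watermark(input_watermark, pending_timer_holds):
    timer_watermark = min(pending_timer_holds, default=float("inf"))
    return min(input_watermark, timer_watermark)

def window_expired(window_end, input_watermark, pending_timer_holds):
    return effective_watermark(input_watermark, pending_timer_holds) > window_end
```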



--


[jira] [Updated] (BEAM-6857) Support dynamic timers

2020-02-12 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-6857:
---
Issue Type: New Feature  (was: Bug)

> Support dynamic timers
> --
>
> Key: BEAM-6857
> URL: https://issues.apache.org/jira/browse/BEAM-6857
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 30h 20m
>  Remaining Estimate: 0h
>
> The Beam timers API currently requires each timer to be statically specified 
> in the DoFn. The user must provide a separate callback method per timer. For 
> example:
>  
> {code:java}
> DoFn() {
>   @TimerId("timer1")
>   private final TimerSpec timer1 = TimerSpecs.timer(...);
> 
>   @TimerId("timer2")
>   private final TimerSpec timer2 = TimerSpecs.timer(...);
> 
>   // ... set timers in processElement ...
> 
>   @OnTimer("timer1")
>   public void onTimer1() { ... }
> 
>   @OnTimer("timer2")
>   public void onTimer2() { ... }
> }
> {code}
>  
> However there are many cases where the user does not know the set of timers 
> statically when writing their code. This happens when the timer tag should be 
> based on the data. It also happens when writing a DSL on top of Beam, where 
> the DSL author has to create DoFns but does not know statically which timers 
> their users will want to set (e.g. Scio).
>  
> The goal is to support dynamic timers. Something like the following:
>  
> {code:java}
> DoFn() {
>   @TimerId("timer")
>   private final TimerSpec timer1 = TimerSpecs.dynamicTimer(...);
> 
>   @ProcessElement
>   public void process(@TimerId("timer") DynamicTimer timer) {
>     timer.set("tag1", ts);
>     timer.set("tag2", ts);
>   }
> 
>   @OnTimer("timer")
>   public void onTimer1(@TimerTag String tag) { ... }
> }
> {code}
>  
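Whatever the eventual Java surface looks like, the data-driven behavior being requested can be simulated with a per-tag timer map (plain Python, hypothetical names):

```python
# A dynamic timer map: tags are data-driven strings, each mapped to a
# timestamp; a timer fires once the watermark reaches its timestamp.

class DynamicTimer:
    def __init__(self):
        self._timers = {}  # tag -> timestamp

    def set(self, tag, ts):
        self._timers[tag] = ts  # tags need not be known statically

    def fire_ready(self, watermark):
        """Pop and return (tag, ts) pairs whose time has come."""
        ready = sorted((ts, tag) for tag, ts in self._timers.items()
                       if ts <= watermark)
        for ts, tag in ready:
            del self._timers[tag]
        return [(tag, ts) for ts, tag in ready]
```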



--


[jira] [Resolved] (BEAM-6857) Support dynamic timers

2020-02-12 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved BEAM-6857.

Fix Version/s: 2.20.0
   Resolution: Fixed

> Support dynamic timers
> --
>
> Key: BEAM-6857
> URL: https://issues.apache.org/jira/browse/BEAM-6857
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 30h 20m
>  Remaining Estimate: 0h
>
> The Beam timers API currently requires each timer to be statically specified 
> in the DoFn. The user must provide a separate callback method per timer. For 
> example:
>  
> {code:java}
> DoFn() {
>   @TimerId("timer1")
>   private final TimerSpec timer1 = TimerSpecs.timer(...);
> 
>   @TimerId("timer2")
>   private final TimerSpec timer2 = TimerSpecs.timer(...);
> 
>   // ... set timers in processElement ...
> 
>   @OnTimer("timer1")
>   public void onTimer1() { ... }
> 
>   @OnTimer("timer2")
>   public void onTimer2() { ... }
> }
> {code}
>  
> However there are many cases where the user does not know the set of timers 
> statically when writing their code. This happens when the timer tag should be 
> based on the data. It also happens when writing a DSL on top of Beam, where 
> the DSL author has to create DoFns but does not know statically which timers 
> their users will want to set (e.g. Scio).
>  
> The goal is to support dynamic timers. Something like the following:
>  
> {code:java}
> DoFn() {
>   @TimerId("timer")
>   private final TimerSpec timer1 = TimerSpecs.dynamicTimer(...);
> 
>   @ProcessElement
>   public void process(@TimerId("timer") DynamicTimer timer) {
>     timer.set("tag1", ts);
>     timer.set("tag2", ts);
>   }
> 
>   @OnTimer("timer")
>   public void onTimer1(@TimerTag String tag) { ... }
> }
> {code}
>  



--


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=385839&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385839
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 12/Feb/20 12:01
Start Date: 12/Feb/20 12:01
Worklog Time Spent: 10m 
  Work Description: mwalenia commented on issue #10764: [BEAM-9146] 
Integrate GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585173351
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 385839)
Time Spent: 6h 50m  (was: 6h 40m)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take either a video's GCS location or video 
> data bytes as input.
> [1] https://cloud.google.com/video-intelligence/
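The dual input requirement (GCS location or raw bytes) might be dispatched roughly like this (illustrative Python; the request-field names mirror the Video Intelligence API's `input_uri`/`input_content`, but this is not the PR's code):

```python
# Dispatch on input shape: a gs:// URI goes into input_uri, raw video
# bytes into input_content.

def to_request(element):
    if isinstance(element, bytes):
        return {"input_content": element}
    if isinstance(element, str) and element.startswith("gs://"):
        return {"input_uri": element}
    raise TypeError("expected raw video bytes or a gs:// URI")
```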



--


[jira] [Work logged] (BEAM-9265) @RequiresTimeSortedInput does not respect allowedLateness

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9265?focusedWorklogId=385900&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385900
 ]

ASF GitHub Bot logged work on BEAM-9265:


Author: ASF GitHub Bot
Created on: 12/Feb/20 13:41
Start Date: 12/Feb/20 13:41
Worklog Time Spent: 10m 
  Work Description: je-ik commented on issue #10795: [BEAM-9265] 
@RequiresTimeSortedInput respects allowedLateness
URL: https://github.com/apache/beam/pull/10795#issuecomment-585210389
 
 
   Run Java11 PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 385900)
Time Spent: 2h 40m  (was: 2.5h)

> @RequiresTimeSortedInput does not respect allowedLateness
> -
>
> Key: BEAM-9265
> URL: https://issues.apache.org/jira/browse/BEAM-9265
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.20.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently, @RequiresTimeSortedInput drops data with respect to 
> allowedLateness, but timers are triggered without respecting it. We have to:
>  - drop data that is too late (i.e. later than allowedLateness)
>  - set up a timer for _minTimestamp + allowedLateness_
>  - hold the output watermark at _minTimestamp_



--


[jira] [Work logged] (BEAM-9292) Provide an ability to specify additional maven repositories for published POMs

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9292?focusedWorklogId=385907&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385907
 ]

ASF GitHub Bot logged work on BEAM-9292:


Author: ASF GitHub Bot
Created on: 12/Feb/20 13:51
Start Date: 12/Feb/20 13:51
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10832: [BEAM-9292] 
Provide an ability to specify additional maven repositories for published POMs
URL: https://github.com/apache/beam/pull/10832#issuecomment-585214400
 
 
   Run PythonFormatter PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 385907)
Time Spent: 50m  (was: 40m)

> Provide an ability to specify additional maven repositories for published POMs
> --
>
> Key: BEAM-9292
> URL: https://issues.apache.org/jira/browse/BEAM-9292
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, io-java-kafka
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> To support Confluent Schema Registry, KafkaIO depends on 
> {{io.confluent:kafka-avro-serializer}} from the 
> https://packages.confluent.io/maven/ repository, so this repository should be 
> added to the published KafkaIO POM file. Otherwise, building a user pipeline 
> fails with the following error:
> {code}
> [ERROR] Failed to execute goal on project kafka-io: Could not resolve 
> dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: 
> Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in 
> central (https://repo.maven.apache.org/maven2) -> [Help 1]
> {code}
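Concretely, the consumer-side failure disappears when the published POM carries the extra repository; an illustrative POM fragment (the repository `id` is an arbitrary choice):

```xml
<repositories>
  <repository>
    <id>confluent</id>
    <url>https://packages.confluent.io/maven/</url>
  </repository>
</repositories>
```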



--


[jira] [Work logged] (BEAM-9292) Provide an ability to specify additional maven repositories for published POMs

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9292?focusedWorklogId=385905&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385905
 ]

ASF GitHub Bot logged work on BEAM-9292:


Author: ASF GitHub Bot
Created on: 12/Feb/20 13:51
Start Date: 12/Feb/20 13:51
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10832: [BEAM-9292] 
Provide an ability to specify additional maven repositories for published POMs
URL: https://github.com/apache/beam/pull/10832#issuecomment-585214359
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 385905)
Time Spent: 40m  (was: 0.5h)

> Provide an ability to specify additional maven repositories for published POMs
> --
>
> Key: BEAM-9292
> URL: https://issues.apache.org/jira/browse/BEAM-9292
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, io-java-kafka
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> To support Confluent Schema Registry, KafkaIO depends on 
> {{io.confluent:kafka-avro-serializer}} from the 
> https://packages.confluent.io/maven/ repository, so this repository should be 
> added to the published KafkaIO POM file. Otherwise, building a user pipeline 
> fails with the following error:
> {code}
> [ERROR] Failed to execute goal on project kafka-io: Could not resolve 
> dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: 
> Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in 
> central (https://repo.maven.apache.org/maven2) -> [Help 1]
> {code}



--


[jira] [Work logged] (BEAM-9292) Provide an ability to specify additional maven repositories for published POMs

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9292?focusedWorklogId=385908&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385908
 ]

ASF GitHub Bot logged work on BEAM-9292:


Author: ASF GitHub Bot
Created on: 12/Feb/20 13:52
Start Date: 12/Feb/20 13:52
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10832: [BEAM-9292] 
Provide an ability to specify additional maven repositories for published POMs
URL: https://github.com/apache/beam/pull/10832#issuecomment-585214466
 
 
   Run PythonLint PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 385908)
Time Spent: 1h  (was: 50m)

> Provide an ability to specify additional maven repositories for published POMs
> --
>
> Key: BEAM-9292
> URL: https://issues.apache.org/jira/browse/BEAM-9292
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system, io-java-kafka
>Reporter: Alexey Romanenko
>Assignee: Alexey Romanenko
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> To support Confluent Schema Registry, KafkaIO depends on 
> {{io.confluent:kafka-avro-serializer}} from the 
> https://packages.confluent.io/maven/ repository, so this repository should be 
> added to the published KafkaIO POM file. Otherwise, building a user pipeline 
> fails with the following error:
> {code}
> [ERROR] Failed to execute goal on project kafka-io: Could not resolve 
> dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: 
> Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in 
> central (https://repo.maven.apache.org/maven2) -> [Help 1]
> {code}



--


[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=385953&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385953
 ]

ASF GitHub Bot logged work on BEAM-7246:


Author: ASF GitHub Bot
Created on: 12/Feb/20 14:47
Start Date: 12/Feb/20 14:47
Worklog Time Spent: 10m 
  Work Description: mszb commented on issue #10712: [BEAM-7246] Added 
Google Spanner Write Transform
URL: https://github.com/apache/beam/pull/10712#issuecomment-585239918
 
 
   @aaltay @markflyhigh It seems the jobs were triggered and completed 
successfully, but they show no activity on GitHub!
   
   https://builds.apache.org/job/beam_PreCommit_Python_Commit/11080/
   https://builds.apache.org/job/beam_PreCommit_PythonFormatter_Commit/67/
   
 



Issue Time Tracking
---

Worklog Id: (was: 385953)
Time Spent: 20.5h  (was: 20h 20m)

> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 20.5h
>  Remaining Estimate: 0h
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.



--


[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=385967&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385967
 ]

ASF GitHub Bot logged work on BEAM-6522:


Author: ASF GitHub Bot
Created on: 12/Feb/20 14:58
Start Date: 12/Feb/20 14:58
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10838: [BEAM-6522] 
[BEAM-7455] Unskip Avro IO tests that are now passing.
URL: https://github.com/apache/beam/pull/10838#issuecomment-585245841
 
 
   Run Python 3.7 PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 385967)
Time Spent: 8h 20m  (was: 8h 10m)

> Dill fails to pickle avro.RecordSchema classes on Python 3.
> 
>
> Key: BEAM-6522
> URL: https://issues.apache.org/jira/browse/BEAM-6522
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> The avroio module still has 4 failing tests: the same two tests run twice, 
> once for Avro and once for Fastavro.
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform*
>  *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform*
> fail with:
> {code:java}
> Traceback (most recent call last):
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", 
> line 432, in test_sink_transform
> | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line 
> 528, in expand
> return pcoll | beam.io.iobase.Write(self._sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 960, in expand
> return pcoll | WriteImpl(self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 979, in expand
> lambda _, sink: sink.initialize_write(), self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1103, in Map
> pardo = FlatMap(wrapper, *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1054, in FlatMap
> pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 864, in __init__
> super(ParDo, self).__init__(fn, *args, **kwargs)
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 646, in __init__
> self.args = pickler.loads(pickler.dumps(self.args))
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_be

[jira] [Created] (BEAM-9303) HDFS IT test fails on apache-beam-jenkins-7 : no space left on device

2020-02-12 Thread Valentyn Tymofieiev (Jira)
Valentyn Tymofieiev created BEAM-9303:
-

 Summary: HDFS IT test fails on apache-beam-jenkins-7 : no space 
left on device
 Key: BEAM-9303
 URL: https://issues.apache.org/jira/browse/BEAM-9303
 Project: Beam
  Issue Type: Improvement
  Components: test-failures
Reporter: Valentyn Tymofieiev
Assignee: Yifan Zou


22:34:34  > Task :sdks:python:test-suites:direct:py37:hdfsIntegrationTest
22:34:34  namenode_1  | 20/02/12 06:34:34 WARN 
namenode.NameNodeResourceChecker: Space available on volume '/dev/sda1' is 
20590592, which is below the configured reserved amount 104857600
22:34:34  namenode_1  | 20/02/12 06:34:34 WARN namenode.FSNamesystem: NameNode 
low on available disk space. Entering safe mode.
22:34:34  namenode_1  | 20/02/12 06:34:34 INFO namenode.FSEditLog: logSyncAll 
toSyncToTxId=1 lastSyncedTxid=1 mostRecentTxid=1
22:34:34  namenode_1  | 20/02/12 06:34:34 INFO namenode.FSEditLog: Number of 
transactions: 1 Total time for transactions(ms): 0 Number of transactions 
batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 9 
22:34:34  namenode_1  | 20/02/12 06:34:34 INFO namenode.FSEditLog: Done 
logSyncAll lastWrittenTxId=1 lastSyncedTxid=1 mostRecentTxid=1
22:34:34  namenode_1  | 20/02/12 06:34:34 INFO hdfs.StateChange: STATE* Safe 
mode is ON. 
22:34:34  namenode_1  | Resources are low on NN. Please add or free up more 
resources then turn off safe mode manually. NOTE:  If you turn off safe mode 
before adding resources, the NN will immediately return to safe mode. Use "hdfs 
dfsadmin -safemode leave" to turn safe mode off.
22:34:36  test_1  | ERROR: invocation failed (exit code 1), logfile: 
/app/sdks/python/target/.tox/hdfs_integration_test/log/hdfs_integration_test-2.log
22:34:36  test_1  | == log start 
===
22:34:36  test_1  | Processing 
./target/.tox/.tmp/package/1/apache-beam-2.20.0.dev0.zip
22:34:36  test_1  | Requirement already satisfied: crcmod<2.0,>=1.7 in 
./target/.tox/hdfs_integration_test/lib/python3.7/site-packages (from 
apache-beam==2.20.0.dev0) (1.7)
22:34:36  test_1  | Collecting dill<0.3.2,>=0.3.1.1
22:34:36  test_1  |   Downloading dill-0.3.1.1.tar.gz (151 kB)
22:34:36  test_1  | Collecting fastavro<0.22,>=0.21.4
22:34:36  test_1  |   Downloading 
fastavro-0.21.24-cp37-cp37m-manylinux1_x86_64.whl (1.2 MB)
22:34:36  test_1  | Requirement already satisfied: future<1.0.0,>=0.16.0 in 
./target/.tox/hdfs_integration_test/lib/python3.7/site-packages (from 
apache-beam==2.20.0.dev0) (0.16.0)
22:34:36  test_1  | Requirement already satisfied: grpcio<2,>=1.12.1 in 
./target/.tox/hdfs_integration_test/lib/python3.7/site-packages (from 
apache-beam==2.20.0.dev0) (1.27.1)
22:34:36  test_1  | Collecting hdfs<3.0.0,>=2.1.0
22:34:36  test_1  |   Downloading hdfs-2.5.8.tar.gz (41 kB)
22:34:36  test_1  | Collecting httplib2<=0.12.0,>=0.8
22:34:36  test_1  |   Downloading httplib2-0.12.0.tar.gz (218 kB)
22:34:36  test_1  | Requirement already satisfied: mock<3.0.0,>=1.0.1 in 
./target/.tox/hdfs_integration_test/lib/python3.7/site-packages (from 
apache-beam==2.20.0.dev0) (2.0.0)
22:34:36  test_1  | Collecting numpy<2,>=1.14.3
22:34:36  test_1  |   Downloading 
numpy-1.18.1-cp37-cp37m-manylinux1_x86_64.whl (20.1 MB)
22:34:36  test_1  | Collecting pymongo<4.0.0,>=3.8.0
22:34:36  test_1  |   Downloading 
pymongo-3.10.1-cp37-cp37m-manylinux2014_x86_64.whl (462 kB)
22:34:36  test_1  | Collecting oauth2client<4,>=2.0.1
22:34:36  test_1  |   Downloading oauth2client-3.0.0.tar.gz (77 kB)
22:34:36  test_1  | Requirement already satisfied: protobuf<4,>=3.5.0.post1 
in ./target/.tox/hdfs_integration_test/lib/python3.7/site-packages (from 
apache-beam==2.20.0.dev0) (3.11.3)
22:34:36  test_1  | Collecting pydot<2,>=1.2.0
22:34:36  test_1  |   Downloading pydot-1.4.1-py2.py3-none-any.whl (19 kB)
22:34:36  test_1  | Collecting python-dateutil<3,>=2.8.0
22:34:36  test_1  |   Downloading 
python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
22:34:36  test_1  | Collecting pytz>=2018.3
22:34:36  test_1  |   Downloading pytz-2019.3-py2.py3-none-any.whl (509 kB)
22:34:36  test_1  | Collecting typing<3.8.0,>=3.7.0
22:34:36  test_1  |   Downloading typing-3.7.4.1-py3-none-any.whl (25 kB)
22:34:36  test_1  | Collecting typing-extensions<3.8.0,>=3.7.0
22:34:36  test_1  |   Downloading 
typing_extensions-3.7.4.1-py3-none-any.whl (20 kB)
22:34:36  test_1  | Collecting avro-python3<2.0.0,>=1.8.1
22:34:36  test_1  |   Downloading avro-python3-1.9.1.tar.gz (36 kB)
22:34:36  test_1  | Collecting pyarrow<0.16.0,>=0.15.1
22:34:36  test_1  |   Downloading 
pyarrow-0.15.1-cp37-cp37m-manylinux2010_x86_64.whl (59.2 MB)
22:34:36  test_1  | Collecting cachetools<4,>=3.1.0
22:34:36  test_1
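The safe-mode entry logged above follows from a simple threshold comparison: the NameNode enters safe mode when free space on a monitored volume drops below the configured reserved amount. A sketch of the check the `NameNodeResourceChecker` warning describes, with the byte values taken from the log (the function name is illustrative, not the Hadoop API):

```python
# Values from the NameNodeResourceChecker warning above.
RESERVED_BYTES = 104_857_600   # configured reserved amount (100 MiB)
available_bytes = 20_590_592   # space available on /dev/sda1

def volume_low(available, reserved=RESERVED_BYTES):
    """True when a monitored volume has less free space than the reserved
    amount, which makes the NameNode enter safe mode."""
    return available < reserved

print(volume_low(available_bytes))  # prints True
```

Once space is reclaimed on the worker, the log's own instruction applies: turn safe mode off manually with `hdfs dfsadmin -safemode leave`.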

[jira] [Commented] (BEAM-9303) HDFS IT test fails on apache-beam-jenkins-7 : no space left on device

2020-02-12 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035431#comment-17035431
 ] 

Valentyn Tymofieiev commented on BEAM-9303:
---

Hi [~yifanzou], [~udim], what is our current approach to cleaning up Jenkins 
VMs? Are there any plans to have automated periodic cleanups?

Thanks.


> HDFS IT test fails on apache-beam-jenkins-7 : no space left on device
> -
>
> Key: BEAM-9303
> URL: https://issues.apache.org/jira/browse/BEAM-9303
> Project: Beam
>  Issue Type: Improvement
>  Components: test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Yifan Zou
>Priority: Major
>
> 22:34:34  > Task :sdks:python:test-suites:direct:py37:hdfsIntegrationTest
> 22:34:34  namenode_1  | 20/02/12 06:34:34 WARN 
> namenode.NameNodeResourceChecker: Space available on volume '/dev/sda1' is 
> 20590592, which is below the configured reserved amount 104857600
> 22:34:34  namenode_1  | 20/02/12 06:34:34 WARN namenode.FSNamesystem: 
> NameNode low on available disk space. Entering safe mode.
> 22:34:34  namenode_1  | 20/02/12 06:34:34 INFO namenode.FSEditLog: logSyncAll 
> toSyncToTxId=1 lastSyncedTxid=1 mostRecentTxid=1
> 22:34:34  namenode_1  | 20/02/12 06:34:34 INFO namenode.FSEditLog: Number of 
> transactions: 1 Total time for transactions(ms): 0 Number of transactions 
> batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 9 
> 22:34:34  namenode_1  | 20/02/12 06:34:34 INFO namenode.FSEditLog: Done 
> logSyncAll lastWrittenTxId=1 lastSyncedTxid=1 mostRecentTxid=1
> 22:34:34  namenode_1  | 20/02/12 06:34:34 INFO hdfs.StateChange: STATE* Safe 
> mode is ON. 
> 22:34:34  namenode_1  | Resources are low on NN. Please add or free up more 
> resources then turn off safe mode manually. NOTE:  If you turn off safe mode 
> before adding resources, the NN will immediately return to safe mode. Use 
> "hdfs dfsadmin -safemode leave" to turn safe mode off.
> 22:34:36  test_1  | ERROR: invocation failed (exit code 1), logfile: 
> /app/sdks/python/target/.tox/hdfs_integration_test/log/hdfs_integration_test-2.log
> 22:34:36  test_1  | == log start 
> ===
> 22:34:36  test_1  | Processing 
> ./target/.tox/.tmp/package/1/apache-beam-2.20.0.dev0.zip
> 22:34:36  test_1  | Requirement already satisfied: crcmod<2.0,>=1.7 in 
> ./target/.tox/hdfs_integration_test/lib/python3.7/site-packages (from 
> apache-beam==2.20.0.dev0) (1.7)
> 22:34:36  test_1  | Collecting dill<0.3.2,>=0.3.1.1
> 22:34:36  test_1  |   Downloading dill-0.3.1.1.tar.gz (151 kB)
> 22:34:36  test_1  | Collecting fastavro<0.22,>=0.21.4
> 22:34:36  test_1  |   Downloading 
> fastavro-0.21.24-cp37-cp37m-manylinux1_x86_64.whl (1.2 MB)
> 22:34:36  test_1  | Requirement already satisfied: future<1.0.0,>=0.16.0 
> in ./target/.tox/hdfs_integration_test/lib/python3.7/site-packages (from 
> apache-beam==2.20.0.dev0) (0.16.0)
> 22:34:36  test_1  | Requirement already satisfied: grpcio<2,>=1.12.1 in 
> ./target/.tox/hdfs_integration_test/lib/python3.7/site-packages (from 
> apache-beam==2.20.0.dev0) (1.27.1)
> 22:34:36  test_1  | Collecting hdfs<3.0.0,>=2.1.0
> 22:34:36  test_1  |   Downloading hdfs-2.5.8.tar.gz (41 kB)
> 22:34:36  test_1  | Collecting httplib2<=0.12.0,>=0.8
> 22:34:36  test_1  |   Downloading httplib2-0.12.0.tar.gz (218 kB)
> 22:34:36  test_1  | Requirement already satisfied: mock<3.0.0,>=1.0.1 in 
> ./target/.tox/hdfs_integration_test/lib/python3.7/site-packages (from 
> apache-beam==2.20.0.dev0) (2.0.0)
> 22:34:36  test_1  | Collecting numpy<2,>=1.14.3
> 22:34:36  test_1  |   Downloading 
> numpy-1.18.1-cp37-cp37m-manylinux1_x86_64.whl (20.1 MB)
> 22:34:36  test_1  | Collecting pymongo<4.0.0,>=3.8.0
> 22:34:36  test_1  |   Downloading 
> pymongo-3.10.1-cp37-cp37m-manylinux2014_x86_64.whl (462 kB)
> 22:34:36  test_1  | Collecting oauth2client<4,>=2.0.1
> 22:34:36  test_1  |   Downloading oauth2client-3.0.0.tar.gz (77 kB)
> 22:34:36  test_1  | Requirement already satisfied: 
> protobuf<4,>=3.5.0.post1 in 
> ./target/.tox/hdfs_integration_test/lib/python3.7/site-packages (from 
> apache-beam==2.20.0.dev0) (3.11.3)
> 22:34:36  test_1  | Collecting pydot<2,>=1.2.0
> 22:34:36  test_1  |   Downloading pydot-1.4.1-py2.py3-none-any.whl (19 kB)
> 22:34:36  test_1  | Collecting python-dateutil<3,>=2.8.0
> 22:34:36  test_1  |   Downloading 
> python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
> 22:34:36  test_1  | Collecting pytz>=2018.3
> 22:34:36  test_1  |   Downloading pytz-2019.3-py2.py3-none-any.whl (509 
> kB)
> 22:34:36  test_1  | Collecting typing<3.8.0,>=3.7.0
> 22:34:36  test_1  |   Downloading typing

[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385991&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385991
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 12/Feb/20 15:39
Start Date: 12/Feb/20 15:39
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on pull request #10290: 
[BEAM-8561] Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r378331613
 
 

 ##
 File path: 
sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftStruct.java
 ##
 @@ -0,0 +1,1232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
 
 Review comment:
   I don't think that thrift guarantees that generated code is forwards or 
backwards compatible, meaning that if we upgrade libthrift here, the code might 
no longer compile and would need to be regenerated. I'm +1 for having this 
generated at compile time.
 



Issue Time Tracking
---

Worklog Id: (was: 385991)
Time Spent: 14h 20m  (was: 14h 10m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 14h 20m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=386000&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386000
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 12/Feb/20 15:43
Start Date: 12/Feb/20 15:43
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-585267600
 
 
   One extra thing we just merged a plugin to auto label PRs for different 
components/extensions/ios, can you please add the label + path for thrift in 
this file:
   https://github.com/apache/beam/blob/master/.github/autolabeler.yml
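For context, an autolabeler configuration maps a label to a list of path globs. A hedged sketch of what the requested thrift entry might look like — both the label name and the glob are assumptions, not the actual merged config:

```yaml
# Hypothetical addition to .github/autolabeler.yml; verify the label name
# and glob style against the existing entries in that file.
"thrift": ["sdks/java/io/thrift/**/*"]
```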
 



Issue Time Tracking
---

Worklog Id: (was: 386000)
Time Spent: 14h 40m  (was: 14.5h)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 14h 40m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.





[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=385999&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-385999
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 12/Feb/20 15:43
Start Date: 12/Feb/20 15:43
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-585267600
 
 
   One extra thing: we just merged a plugin to auto label PRs for different 
components/extensions/IOs, can you please add the label + path for thrift in 
this file:
   https://github.com/apache/beam/blob/master/.github/autolabeler.yml
 



Issue Time Tracking
---

Worklog Id: (was: 385999)
Time Spent: 14.5h  (was: 14h 20m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.





[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=386003&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386003
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 12/Feb/20 15:49
Start Date: 12/Feb/20 15:49
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #10290: [BEAM-8561] Add 
ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#issuecomment-585267600
 
 
   One extra thing we just merged a plugin to auto label PRs for different 
components/extensions/ios, can you please also add the label + path for thrift 
in this file:
   https://github.com/apache/beam/blob/master/.github/autolabeler.yml
 



Issue Time Tracking
---

Worklog Id: (was: 386003)
Time Spent: 14h 50m  (was: 14h 40m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 14h 50m
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.





[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=386013&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386013
 ]

ASF GitHub Bot logged work on BEAM-9291:


Author: ASF GitHub Bot
Created on: 12/Feb/20 16:00
Start Date: 12/Feb/20 16:00
Worklog Time Spent: 10m 
  Work Description: stankiewicz commented on issue #10829: [BEAM-9291] 
Upload graph option in dataflow's python sdk
URL: https://github.com/apache/beam/pull/10829#issuecomment-585276594
 
 
   @aaltay 
   Before: "The job graph is too large. Please try again with a smaller job 
graph, or split your job into two or more smaller jobs.", 400,  during REST 
submit
   After: "Workflow failed. Causes: The job graph is too large. Please try 
again with a smaller graph." on runtime..
 



Issue Time Tracking
---

Worklog Id: (was: 386013)
Time Spent: 1h  (was: 50m)

> upload_graph support in Dataflow Python SDK
> ---
>
> Key: BEAM-9291
> URL: https://issues.apache.org/jira/browse/BEAM-9291
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Radosław Stankiewicz
>Assignee: Radosław Stankiewicz
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The upload_graph option is not supported in Dataflow's Python SDK, so there is 
> no workaround for large job graphs. 





[jira] [Created] (BEAM-9304) beam-sdks-java-io-google-cloud-platform imports conflicting versions for BigTable and Spanner

2020-02-12 Thread Knut Olav Loite (Jira)
Knut Olav Loite created BEAM-9304:
-

 Summary: beam-sdks-java-io-google-cloud-platform imports 
conflicting versions for BigTable and Spanner
 Key: BEAM-9304
 URL: https://issues.apache.org/jira/browse/BEAM-9304
 Project: Beam
  Issue Type: Bug
  Components: io-java-gcp
Affects Versions: 2.18.0
Reporter: Knut Olav Loite
 Attachments: SpannerRead.java, pom.xml

If I include `beam-sdks-java-io-google-cloud-platform` version 2.18.0 in a 
project and try to use `SpannerIO`, the exception 
`java.lang.NoClassDefFoundError: io/opencensus/trace/Tracestate` is thrown. This seems to 
be caused by conflicting versions of `io.opencensus:opencensus-api` being 
included by the BigTable client and the Spanner client. BigTable imports 
version 0.15.0. Spanner depends on 0.18.0, but as they are at the same level in 
the dependency tree and BigTable is defined first, version 0.15.0 is used.

 

The workaround for this issue is to exclude the BigTable client in the project 
pom in order to be able to use SpannerIO.

 

An example pom and simple Java class are listed below. If the commented 
exclusion of the BigTable client is removed, the example will run without 
problems. The example will also run without problems on Beam version 2.17 
without the exclusion.
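The workaround described above amounts to a Maven dependency exclusion. A sketch of what it might look like in the dependency block — the BigTable client coordinates (`com.google.cloud.bigtable:bigtable-client-core`) are an assumption, so check `mvn dependency:tree` for the exact artifact pulling in opencensus 0.15.0:

```xml
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
  <version>${apache_beam.version}</version>
  <exclusions>
    <!-- Assumed coordinates of the BigTable client; verify with
         mvn dependency:tree before relying on this. -->
    <exclusion>
      <groupId>com.google.cloud.bigtable</groupId>
      <artifactId>bigtable-client-core</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

With the older transitive dependency excluded, Maven's nearest-wins mediation resolves `io.opencensus:opencensus-api` to the 0.18.0 version that Spanner requires.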

 

--- POM File ---

http://maven.apache.org/POM/4.0.0"; 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"; 
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd";>
  4.0.0
  com.google.cloud
  beam-sdk-test
  0.0.1-SNAPSHOT
  Beam SDK Test

  
UTF-8
1.8
1.8
1.8
2.18.0
  
  
  

  org.apache.beam
  beam-sdks-java-io-google-cloud-platform
  ${apache_beam.version}
  



  org.apache.beam
  beam-runners-direct-java
  ${apache_beam.version}

  



--- JAVA FILE ---
/*
 * Copyright 2017 Google Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package com.google.cloud.beamsdk.test;

import com.google.cloud.spanner.Struct;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.spanner.SpannerIO;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.Validation;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Sum;
import org.apache.beam.sdk.transforms.ToString;
import org.apache.beam.sdk.values.PCollection;

/*
This sample demonstrates how to read from a Spanner table.

## Prerequisites
* Maven installed
* Set up GCP default credentials, one of the following:
- export GOOGLE_APPLICATION_CREDENTIALS=path/to/credentials.json
- gcloud auth application-default login
  
[https://developers.google.com/identity/protocols/application-default-credentials]
* Create the Spanner table to read from, you'll need:
- Instance ID
- Database ID
- Any table, preferably populated
  [https://cloud.google.com/spanner/docs/quickstart-console]

## How to run
mvn clean
mvn compile
mvn exec:java \
-Dexec.mainClass=com.example.dataflow.SpannerRead \
-Dexec.args="--instanceId=my-instance-id \
 --databaseId=my-database-id \
 --table=my_table \
 --output=path/to/output_file"
*/
public class SpannerRead {

  public interface Options extends PipelineOptions {

@Description("Spanner instance ID to query from")
@Validation.Required
String getInstanceId();

void setInstanceId(String value);

@Description("Spanner database name to query from")
@Validation.Required
String getDatabaseId();

void setDatabaseId(String value);

@Description("Spanner table name to query from")
@Validation.Required
String getTable();

void setTable(String value);

@Description("Output filename for row count")
@Validation.Required
String getOutput();

void setOutput(String value);
  }


  public static void main(String[] args) {
Options options = 
PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
Pipeline p = Pipeline.create(options);

String instanceId = options.getInstanceId();
String databaseId = options.getDatabaseId();
PCollection records = p.apply(
SpannerIO.read()
.wi

[jira] [Updated] (BEAM-9304) beam-sdks-java-io-google-cloud-platform imports conflicting versions for BigTable and Spanner

2020-02-12 Thread Knut Olav Loite (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Knut Olav Loite updated BEAM-9304:
--
Attachment: SpannerRead.java

> beam-sdks-java-io-google-cloud-platform imports conflicting versions for 
> BigTable and Spanner
> -
>
> Key: BEAM-9304
> URL: https://issues.apache.org/jira/browse/BEAM-9304
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.18.0
>Reporter: Knut Olav Loite
>Priority: Minor
> Attachments: SpannerRead.java, pom.xml
>
>
> If I include `beam-sdks-java-io-google-cloud-platform` version 2.18.0 in a 
> project and try to use `SpannerIO`, the exception 
> `java.lang.NoClassDefFoundError: io/opencensus/trace/Tracestate` is thrown. This seems 
> to be caused by conflicting versions of `io.opencensus:opencensus-api` being 
> included by the BigTable client and the Spanner client. BigTable imports 
> version 0.15.0. Spanner depends on 0.18.0, but as they are at the same level 
> in the dependency tree and BigTable is defined first, version 0.15.0 is used.
>  
> The workaround for this issue is to exclude the BigTable client in the 
> project pom in order to be able to use SpannerIO.
>  
> An example pom and simple Java class are listed below. If the commented 
> exclusion of the BigTable client is removed, the example will run without 
> problems. The example will also run without problems on Beam version 2.17 
> without the exclusion.
>  
> --- POM File ---
> http://maven.apache.org/POM/4.0.0"; 
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"; 
> xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
> http://maven.apache.org/xsd/maven-4.0.0.xsd";>
>   4.0.0
>   com.google.cloud
>   beam-sdk-test
>   0.0.1-SNAPSHOT
>   Beam SDK Test
>   
> UTF-8
> 1.8
> 1.8
> 1.8
> 2.18.0
>   
>   
>   
> 
>   org.apache.beam
>   beam-sdks-java-io-google-cloud-platform
>   ${apache_beam.version}
>   
> 
> 
>   org.apache.beam
>   beam-runners-direct-java
>   ${apache_beam.version}
> 
>   
> 
> --- JAVA FILE ---
> /*
>  * Copyright 2017 Google Inc.
>  *
>  * Licensed under the Apache License, Version 2.0 (the "License");
>  * you may not use this file except in compliance with the License.
>  * You may obtain a copy of the License at
>  *
>  * http://www.apache.org/licenses/LICENSE-2.0
>  *
>  * Unless required by applicable law or agreed to in writing, software
>  * distributed under the License is distributed on an "AS IS" BASIS,
>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  * See the License for the specific language governing permissions and
>  * limitations under the License.
>  */
> package com.google.cloud.beamsdk.test;
> import com.google.cloud.spanner.Struct;
> import org.apache.beam.sdk.Pipeline;
> import org.apache.beam.sdk.io.TextIO;
> import org.apache.beam.sdk.io.gcp.spanner.SpannerIO;
> import org.apache.beam.sdk.options.Description;
> import org.apache.beam.sdk.options.PipelineOptions;
> import org.apache.beam.sdk.options.PipelineOptionsFactory;
> import org.apache.beam.sdk.options.Validation;
> import org.apache.beam.sdk.transforms.Count;
> import org.apache.beam.sdk.transforms.Sum;
> import org.apache.beam.sdk.transforms.ToString;
> import org.apache.beam.sdk.values.PCollection;
> /*
> This sample demonstrates how to read from a Spanner table.
> ## Prerequisites
> * Maven installed
> * Set up GCP default credentials, one of the following:
> - export GOOGLE_APPLICATION_CREDENTIALS=path/to/credentials.json
> - gcloud auth application-default login
>   
> [https://developers.google.com/identity/protocols/application-default-credentials]
> * Create the Spanner table to read from, you'll need:
> - Instance ID
> - Database ID
> - Any table, preferably populated
>   [https://cloud.google.com/spanner/docs/quickstart-console]
> ## How to run
> mvn clean
> mvn compile
> mvn exec:java \
> -Dexec.mainClass=com.example.dataflow.SpannerRead \
> -Dexec.args="--instanceId=my-instance-id \
>  --databaseId=my-database-id \
>  --table=my_table \
>  --output=path/to/output_file"
> */
> public class SpannerRead {
>   public interface Options extends PipelineOptions {
> @Description("Spanner instance ID to query from")
> @Validation.Required
> String getInstanceId();
> void setInstanceId(String value);
> @Description("Spanner database name to query from")
> @Validation.Required
> String getDatabaseId();
> void setDatabaseId(String value);
> @Description("Spanner table name to query from")
> @Validation.Required
> String getTable();
> void setTable(String value);
> @Descri

[jira] [Updated] (BEAM-9304) beam-sdks-java-io-google-cloud-platform imports conflicting versions for BigTable and Spanner

2020-02-12 Thread Knut Olav Loite (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Knut Olav Loite updated BEAM-9304:
--
Attachment: pom.xml

> beam-sdks-java-io-google-cloud-platform imports conflicting versions for 
> BigTable and Spanner
> -
>
> Key: BEAM-9304
> URL: https://issues.apache.org/jira/browse/BEAM-9304
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.18.0
>Reporter: Knut Olav Loite
>Priority: Minor
> Attachments: SpannerRead.java, pom.xml
>
>
> If I include `beam-sdks-java-io-google-cloud-platform` version 2.18.0 in a 
> project and try to use `SpannerIO`, the exception 
> `java.lang.NoClassDefFoundError: io/opencensus/trace/Tracestate` is thrown. This seems 
> to be caused by conflicting versions of `io.opencensus:opencensus-api` being 
> included by the BigTable client and the Spanner client. BigTable imports 
> version 0.15.0. Spanner depends on 0.18.0, but as they are at the same level 
> in the dependency tree and BigTable is defined first, version 0.15.0 is used.
>  
> The workaround for this issue is to exclude the BigTable client in the 
> project pom in order to be able to use SpannerIO.
>  
> An example pom and simple Java class are listed below. If the commented 
> exclusion of the BigTable client is removed, the example will run without 
> problems. The example will also run without problems on Beam version 2.17 
> without the exclusion.
>  
> --- POM File ---
> http://maven.apache.org/POM/4.0.0"; 
> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"; 
> xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
> http://maven.apache.org/xsd/maven-4.0.0.xsd";>
>   4.0.0
>   com.google.cloud
>   beam-sdk-test
>   0.0.1-SNAPSHOT
>   Beam SDK Test
>   
> UTF-8
> 1.8
> 1.8
> 1.8
> 2.18.0
>   
>   
>   
> 
>   org.apache.beam
>   beam-sdks-java-io-google-cloud-platform
>   ${apache_beam.version}
>   
> 
> 
>   org.apache.beam
>   beam-runners-direct-java
>   ${apache_beam.version}
> 
>   
> 
> --- JAVA FILE ---
> /*
>  * Copyright 2017 Google Inc.
>  *
>  * Licensed under the Apache License, Version 2.0 (the "License");
>  * you may not use this file except in compliance with the License.
>  * You may obtain a copy of the License at
>  *
>  * http://www.apache.org/licenses/LICENSE-2.0
>  *
>  * Unless required by applicable law or agreed to in writing, software
>  * distributed under the License is distributed on an "AS IS" BASIS,
>  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>  * See the License for the specific language governing permissions and
>  * limitations under the License.
>  */
> package com.google.cloud.beamsdk.test;
> import com.google.cloud.spanner.Struct;
> import org.apache.beam.sdk.Pipeline;
> import org.apache.beam.sdk.io.TextIO;
> import org.apache.beam.sdk.io.gcp.spanner.SpannerIO;
> import org.apache.beam.sdk.options.Description;
> import org.apache.beam.sdk.options.PipelineOptions;
> import org.apache.beam.sdk.options.PipelineOptionsFactory;
> import org.apache.beam.sdk.options.Validation;
> import org.apache.beam.sdk.transforms.Count;
> import org.apache.beam.sdk.transforms.Sum;
> import org.apache.beam.sdk.transforms.ToString;
> import org.apache.beam.sdk.values.PCollection;
> /*
> This sample demonstrates how to read from a Spanner table.
> ## Prerequisites
> * Maven installed
> * Set up GCP default credentials, one of the following:
> - export GOOGLE_APPLICATION_CREDENTIALS=path/to/credentials.json
> - gcloud auth application-default login
>   
> [https://developers.google.com/identity/protocols/application-default-credentials]
> * Create the Spanner table to read from, you'll need:
> - Instance ID
> - Database ID
> - Any table, preferably populated
>   [https://cloud.google.com/spanner/docs/quickstart-console]
> ## How to run
> mvn clean
> mvn compile
> mvn exec:java \
> -Dexec.mainClass=com.example.dataflow.SpannerRead \
> -Dexec.args="--instanceId=my-instance-id \
>  --databaseId=my-database-id \
>  --table=my_table \
>  --output=path/to/output_file"
> */
> public class SpannerRead {
>   public interface Options extends PipelineOptions {
> @Description("Spanner instance ID to query from")
> @Validation.Required
> String getInstanceId();
> void setInstanceId(String value);
> @Description("Spanner database name to query from")
> @Validation.Required
> String getDatabaseId();
> void setDatabaseId(String value);
> @Description("Spanner table name to query from")
> @Validation.Required
> String getTable();
> void setTable(String value);
> @Description("Ou

[jira] [Updated] (BEAM-9304) beam-sdks-java-io-google-cloud-platform imports conflicting versions for BigTable and Spanner

2020-02-12 Thread Knut Olav Loite (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Knut Olav Loite updated BEAM-9304:
--
Description: 
If I include `beam-sdks-java-io-google-cloud-platform` version 2.18.0 in a 
project and try to use `SpannerIO`, the exception 
`java.lang.NoClassDefFoundError: io/opencensus/trace/Tracestate` is thrown. This seems to 
be caused by conflicting versions of `io.opencensus:opencensus-api` being 
included by the BigTable client and the Spanner client. BigTable imports 
version 0.15.0. Spanner depends on 0.18.0, but as they are at the same level in 
the dependency tree and BigTable is defined first, version 0.15.0 is used.

 

The workaround for this issue is to exclude the BigTable client in the project 
pom in order to be able to use SpannerIO.

 

An example pom and simple Java class are included. If the commented exclusion 
of the BigTable client is removed, the example will run without problems. The 
example will also run without problems on Beam version 2.17 without the 
exclusion.
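
The attachment's XML tags were stripped by the list archive, so the commented-out exclusion is not visible above. A minimal sketch of the workaround described here, with reconstructed element names and assumed BigTable artifact coordinates (`com.google.cloud.bigtable:bigtable-client-core` is an assumption, not copied from the original attachment), could look like:

```xml
<!-- Exclude the BigTable client from the GCP IO module so that the Spanner
     client's newer io.opencensus:opencensus-api (0.18.0) is resolved instead
     of BigTable's 0.15.0. Artifact coordinates are an assumption, not copied
     from the (tag-stripped) original attachment. -->
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
  <version>${apache_beam.version}</version>
  <exclusions>
    <exclusion>
      <groupId>com.google.cloud.bigtable</groupId>
      <artifactId>bigtable-client-core</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Which opencensus version Maven actually resolves can be checked with `mvn dependency:tree -Dincludes=io.opencensus`.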

 

  was:
If I include `beam-sdks-java-io-google-cloud-platform` version 2.18.0 in a 
project and try to use `SpannerIO`, the exception 
`java.lang.NoClassDefFoundError: io/opencensus/trace/Tracestate` is thrown. This seems to 
be caused by conflicting versions of `io.opencensus:opencensus-api` being 
included by the BigTable client and the Spanner client. BigTable imports 
version 0.15.0. Spanner depends on 0.18.0, but as they are at the same level in 
the dependency tree and BigTable is defined first, version 0.15.0 is used.

 

The workaround for this issue is to exclude the BigTable client in the project 
pom in order to be able to use SpannerIO.

 

An example pom and simple Java class are listed below. If the commented 
exclusion of the BigTable client is removed, the example will run without 
problems. The example will also run without problems on Beam version 2.17 
without the exclusion.

 

--- POM File ---

http://maven.apache.org/POM/4.0.0"; 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"; 
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd";>
  4.0.0
  com.google.cloud
  beam-sdk-test
  0.0.1-SNAPSHOT
  Beam SDK Test

  
UTF-8
1.8
1.8
1.8
2.18.0
  
  
  

  org.apache.beam
  beam-sdks-java-io-google-cloud-platform
  ${apache_beam.version}
  



  org.apache.beam
  beam-runners-direct-java
  ${apache_beam.version}

  



--- JAVA FILE ---
/*
 * Copyright 2017 Google Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package com.google.cloud.beamsdk.test;

import com.google.cloud.spanner.Struct;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.spanner.SpannerIO;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.Validation;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Sum;
import org.apache.beam.sdk.transforms.ToString;
import org.apache.beam.sdk.values.PCollection;

/*
This sample demonstrates how to read from a Spanner table.

## Prerequisites
* Maven installed
* Set up GCP default credentials, one of the following:
- export GOOGLE_APPLICATION_CREDENTIALS=path/to/credentials.json
- gcloud auth application-default login
  
[https://developers.google.com/identity/protocols/application-default-credentials]
* Create the Spanner table to read from, you'll need:
- Instance ID
- Database ID
- Any table, preferably populated
  [https://cloud.google.com/spanner/docs/quickstart-console]

## How to run
mvn clean
mvn compile
mvn exec:java \
-Dexec.mainClass=com.example.dataflow.SpannerRead \
-Dexec.args="--instanceId=my-instance-id \
 --databaseId=my-database-id \
 --table=my_table \
 --output=path/to/output_file"
*/
public class SpannerRead {

  public interface Options extends PipelineOptions {

@Description("Spanner instance ID to query from")
@Validation.Required
String getInstanceId();

void setInstanceId(String value);

@Description("Spanner database name to query from")
@Validation.Required
String getDatabaseId();

void setDatabaseId(String value);

@Description(

[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=386050&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386050
 ]

ASF GitHub Bot logged work on BEAM-6522:


Author: ASF GitHub Bot
Created on: 12/Feb/20 16:29
Start Date: 12/Feb/20 16:29
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10838: [BEAM-6522] 
[BEAM-7455] Unskip Avro IO tests that are now passing.
URL: https://github.com/apache/beam/pull/10838#issuecomment-585291567
 
 
   Run Python 3.6 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386050)
Time Spent: 8.5h  (was: 8h 20m)

> Dill fails to pickle avro.RecordSchema classes on Python 3.
> 
>
> Key: BEAM-6522
> URL: https://issues.apache.org/jira/browse/BEAM-6522
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> The avroio module still has 4 failing tests. These are actually the same 2 
> tests duplicated, once for Avro and once for Fastavro.
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform*
>  *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform*
> fail with:
> {code:java}
> Traceback (most recent call last):
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", 
> line 432, in test_sink_transform
> | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line 
> 528, in expand
> return pcoll | beam.io.iobase.Write(self._sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 960, in expand
> return pcoll | WriteImpl(self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 979, in expand
> lambda _, sink: sink.initialize_write(), self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1103, in Map
> pardo = FlatMap(wrapper, *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1054, in FlatMap
> pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 864, in __init__
> super(ParDo, self).__init__(fn, *args, **kwargs)
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 646, in __init__
> self.args = pickler.loads(pickler.dumps(self.args))
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/i

[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=386052&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386052
 ]

ASF GitHub Bot logged work on BEAM-6522:


Author: ASF GitHub Bot
Created on: 12/Feb/20 16:30
Start Date: 12/Feb/20 16:30
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10838: [BEAM-6522] 
[BEAM-7455] Unskip Avro IO tests that are now passing.
URL: https://github.com/apache/beam/pull/10838#issuecomment-585245841
 
 
   Run Python 3.7 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386052)
Time Spent: 8h 50m  (was: 8h 40m)

> Dill fails to pickle avro.RecordSchema classes on Python 3.
> 
>
> Key: BEAM-6522
> URL: https://issues.apache.org/jira/browse/BEAM-6522
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> The avroio module still has 4 failing tests. These are actually the same 2 
> tests duplicated, once for Avro and once for Fastavro.
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform*
>  *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform*
> fail with:
> {code:java}
> Traceback (most recent call last):
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", 
> line 432, in test_sink_transform
> | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line 
> 528, in expand
> return pcoll | beam.io.iobase.Write(self._sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 960, in expand
> return pcoll | WriteImpl(self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 979, in expand
> lambda _, sink: sink.initialize_write(), self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1103, in Map
> pardo = FlatMap(wrapper, *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1054, in FlatMap
> pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 864, in __init__
> super(ParDo, self).__init__(fn, *args, **kwargs)
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 646, in __init__
> self.args = pickler.loads(pickler.dumps(self.args))
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_be

[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=386054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386054
 ]

ASF GitHub Bot logged work on BEAM-6522:


Author: ASF GitHub Bot
Created on: 12/Feb/20 16:30
Start Date: 12/Feb/20 16:30
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10838: [BEAM-6522] 
[BEAM-7455] Unskip Avro IO tests that are now passing.
URL: https://github.com/apache/beam/pull/10838#issuecomment-584992554
 
 
   Run Python 3.7 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386054)
Time Spent: 9h 10m  (was: 9h)

> Dill fails to pickle avro.RecordSchema classes on Python 3.
> 
>
> Key: BEAM-6522
> URL: https://issues.apache.org/jira/browse/BEAM-6522
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> The avroio module still has 4 failing tests. These are actually the same 2 
> tests duplicated, once for Avro and once for Fastavro.
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform*
>  *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform*
> fail with:
> {code:java}
> Traceback (most recent call last):
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", 
> line 432, in test_sink_transform
> | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line 
> 528, in expand
> return pcoll | beam.io.iobase.Write(self._sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 960, in expand
> return pcoll | WriteImpl(self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 979, in expand
> lambda _, sink: sink.initialize_write(), self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1103, in Map
> pardo = FlatMap(wrapper, *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1054, in FlatMap
> pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 864, in __init__
> super(ParDo, self).__init__(fn, *args, **kwargs)
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 646, in __init__
> self.args = pickler.loads(pickler.dumps(self.args))
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/i

[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=386051&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386051
 ]

ASF GitHub Bot logged work on BEAM-6522:


Author: ASF GitHub Bot
Created on: 12/Feb/20 16:30
Start Date: 12/Feb/20 16:30
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10838: [BEAM-6522] 
[BEAM-7455] Unskip Avro IO tests that are now passing.
URL: https://github.com/apache/beam/pull/10838#issuecomment-585049296
 
 
   Run Python 3.7 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386051)
Time Spent: 8h 40m  (was: 8.5h)

> Dill fails to pickle avro.RecordSchema classes on Python 3.
> 
>
> Key: BEAM-6522
> URL: https://issues.apache.org/jira/browse/BEAM-6522
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> The avroio module still has 4 failing tests. These are actually the same 2 
> tests duplicated, once for Avro and once for Fastavro.
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform*
>  *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform*
> fail with:
> {code:java}
> Traceback (most recent call last):
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", 
> line 432, in test_sink_transform
> | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line 
> 528, in expand
> return pcoll | beam.io.iobase.Write(self._sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 960, in expand
> return pcoll | WriteImpl(self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 979, in expand
> lambda _, sink: sink.initialize_write(), self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1103, in Map
> pardo = FlatMap(wrapper, *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1054, in FlatMap
> pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 864, in __init__
> super(ParDo, self).__init__(fn, *args, **kwargs)
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 646, in __init__
> self.args = pickler.loads(pickler.dumps(self.args))
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam

[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=386053&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386053
 ]

ASF GitHub Bot logged work on BEAM-6522:


Author: ASF GitHub Bot
Created on: 12/Feb/20 16:30
Start Date: 12/Feb/20 16:30
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10838: [BEAM-6522] 
[BEAM-7455] Unskip Avro IO tests that are now passing.
URL: https://github.com/apache/beam/pull/10838#issuecomment-584957126
 
 
   Run Python 3.7 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386053)
Time Spent: 9h  (was: 8h 50m)

> Dill fails to pickle avro.RecordSchema classes on Python 3.
> 
>
> Key: BEAM-6522
> URL: https://issues.apache.org/jira/browse/BEAM-6522
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> The avroio module still has 4 failing tests. These are actually the same 2 
> tests duplicated, once for Avro and once for Fastavro.
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform*
>  *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform*
> fail with:
> {code:java}
> Traceback (most recent call last):
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", 
> line 432, in test_sink_transform
> | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line 
> 528, in expand
> return pcoll | beam.io.iobase.Write(self._sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 960, in expand
> return pcoll | WriteImpl(self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 979, in expand
> lambda _, sink: sink.initialize_write(), self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1103, in Map
> pardo = FlatMap(wrapper, *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1054, in FlatMap
> pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 864, in __init__
> super(ParDo, self).__init__(fn, *args, **kwargs)
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 646, in __init__
> self.args = pickler.loads(pickler.dumps(self.args))
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/inter

[jira] [Work logged] (BEAM-7198) Rename ToStringCoder into ToBytesCoder

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7198?focusedWorklogId=386058&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386058
 ]

ASF GitHub Bot logged work on BEAM-7198:


Author: ASF GitHub Bot
Created on: 12/Feb/20 16:32
Start Date: 12/Feb/20 16:32
Worklog Time Spent: 10m 
  Work Description: lazylynx commented on issue #10828: [BEAM-7198] rename 
ToStringCoder to ToBytesCoder for proper representation of its role
URL: https://github.com/apache/beam/pull/10828#issuecomment-585293258
 
 
   @tvalentyn PTAL
   I missed formatting with yapf.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386058)
Time Spent: 1h  (was: 50m)

> Rename ToStringCoder into ToBytesCoder
> --
>
> Key: BEAM-7198
> URL: https://issues.apache.org/jira/browse/BEAM-7198
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Minor
>  Labels: easy-fix, starter
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The name of the ToStringCoder class [1] is confusing, since the output of 
> encode() on Python 3 will be bytes. On Python 2 the output is also bytes, 
> since bytes and str are synonyms there.
> ToBytesCoder would be a better name for this class. 
> Note that this class is not listed in coders that constitute Public APIs [2], 
> so we can treat this as an internal change. As a courtesy to users who happened 
> to reference a non-public coder in their pipelines, we can keep the old class 
> name as an alias, e.g. ToStringCoder = ToBytesCoder, to avoid friction, but 
> clean up the Beam codebase to use the new name.
> [1] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L344
> [2] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L20
> cc: [~yoshiki.obata] [~chamikara]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
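The backward-compatible rename described above can be sketched with a module-level alias. The class body here is illustrative, not Beam's actual coder implementation:

```python
class ToBytesCoder:
    """Illustrative coder whose encode() always returns bytes."""

    def encode(self, value):
        # On Python 3, str must be encoded explicitly; bytes pass through.
        return value if isinstance(value, bytes) else str(value).encode("utf-8")


# Keep the old (non-public) name working for existing pipelines.
ToStringCoder = ToBytesCoder
```

Because the alias binds the same class object rather than a subclass, `isinstance` checks and any registrations keyed on the class keep working under either name after the rename.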


[jira] [Work logged] (BEAM-3221) Model pipeline representation improvements

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3221?focusedWorklogId=386066&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386066
 ]

ASF GitHub Bot logged work on BEAM-3221:


Author: ASF GitHub Bot
Created on: 12/Feb/20 16:49
Start Date: 12/Feb/20 16:49
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10779: [BEAM-3221, 
BEAM-4180] Clarify documentation for StandardTransforms.Primitives, Pipeline, 
and PTransform.
URL: https://github.com/apache/beam/pull/10779
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 386066)
Time Spent: 5h 50m  (was: 5h 40m)

> Model pipeline representation improvements
> --
>
> Key: BEAM-3221
> URL: https://issues.apache.org/jira/browse/BEAM-3221
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model
>Reporter: Henning Rohde
>Priority: Major
>  Labels: portability
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Collections of various (breaking) tweaks to the Runner API, notably the 
> pipeline representation.





[jira] [Work logged] (BEAM-7198) Rename ToStringCoder into ToBytesCoder

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7198?focusedWorklogId=386073&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386073
 ]

ASF GitHub Bot logged work on BEAM-7198:


Author: ASF GitHub Bot
Created on: 12/Feb/20 17:05
Start Date: 12/Feb/20 17:05
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10828: [BEAM-7198] rename 
ToStringCoder to ToBytesCoder for proper representation of its role
URL: https://github.com/apache/beam/pull/10828#issuecomment-585310203
 
 
   test test test
 



Issue Time Tracking
---

Worklog Id: (was: 386073)
Time Spent: 1h 10m  (was: 1h)



[jira] [Work logged] (BEAM-7198) Rename ToStringCoder into ToBytesCoder

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7198?focusedWorklogId=386075&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386075
 ]

ASF GitHub Bot logged work on BEAM-7198:


Author: ASF GitHub Bot
Created on: 12/Feb/20 17:06
Start Date: 12/Feb/20 17:06
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10828: [BEAM-7198] rename 
ToStringCoder to ToBytesCoder for proper representation of its role
URL: https://github.com/apache/beam/pull/10828#issuecomment-585310695
 
 
   Run PythonFormatter PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 386075)
Time Spent: 1.5h  (was: 1h 20m)



[jira] [Work logged] (BEAM-7198) Rename ToStringCoder into ToBytesCoder

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7198?focusedWorklogId=386074&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386074
 ]

ASF GitHub Bot logged work on BEAM-7198:


Author: ASF GitHub Bot
Created on: 12/Feb/20 17:06
Start Date: 12/Feb/20 17:06
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10828: [BEAM-7198] rename 
ToStringCoder to ToBytesCoder for proper representation of its role
URL: https://github.com/apache/beam/pull/10828#issuecomment-585310567
 
 
   Hmm... looks like Jenkins is not listening this time.
 



Issue Time Tracking
---

Worklog Id: (was: 386074)
Time Spent: 1h 20m  (was: 1h 10m)



[jira] [Work logged] (BEAM-7198) Rename ToStringCoder into ToBytesCoder

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7198?focusedWorklogId=386076&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386076
 ]

ASF GitHub Bot logged work on BEAM-7198:


Author: ASF GitHub Bot
Created on: 12/Feb/20 17:07
Start Date: 12/Feb/20 17:07
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10828: [BEAM-7198] rename 
ToStringCoder to ToBytesCoder for proper representation of its role
URL: https://github.com/apache/beam/pull/10828#issuecomment-585310876
 
 
   Yep, not listening :(
 



Issue Time Tracking
---

Worklog Id: (was: 386076)
Time Spent: 1h 40m  (was: 1.5h)



[jira] [Commented] (BEAM-9302) No space left on device - apache-beam-jenkins-7

2020-02-12 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035526#comment-17035526
 ] 

Valentyn Tymofieiev commented on BEAM-9302:
---

cc: [~markliu]

> No space left on device - apache-beam-jenkins-7
> ---
>
> Key: BEAM-9302
> URL: https://issues.apache.org/jira/browse/BEAM-9302
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Michał Walenia
>Priority: Blocker
>
> [https://builds.apache.org/job/beam_PreCommit_SQL_Commit/543/consoleFull] log 
> of a failed job with this error





[jira] [Commented] (BEAM-9302) No space left on device - apache-beam-jenkins-7

2020-02-12 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035524#comment-17035524
 ] 

Valentyn Tymofieiev commented on BEAM-9302:
---

cc: [~yifanzou]



[jira] [Commented] (BEAM-9303) HDFS IT test fails on apache-beam-jenkins-7 : no space left on device

2020-02-12 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035527#comment-17035527
 ] 

Valentyn Tymofieiev commented on BEAM-9303:
---

This is a duplicate of BEAM-9302.

> HDFS IT test fails on apache-beam-jenkins-7 : no space left on device
> -
>
> Key: BEAM-9303
> URL: https://issues.apache.org/jira/browse/BEAM-9303
> Project: Beam
>  Issue Type: Improvement
>  Components: test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Yifan Zou
>Priority: Major
>
> 22:34:34  > Task :sdks:python:test-suites:direct:py37:hdfsIntegrationTest
> 22:34:34  namenode_1  | 20/02/12 06:34:34 WARN 
> namenode.NameNodeResourceChecker: Space available on volume '/dev/sda1' is 
> 20590592, which is below the configured reserved amount 104857600
> 22:34:34  namenode_1  | 20/02/12 06:34:34 WARN namenode.FSNamesystem: 
> NameNode low on available disk space. Entering safe mode.
> 22:34:34  namenode_1  | 20/02/12 06:34:34 INFO namenode.FSEditLog: logSyncAll 
> toSyncToTxId=1 lastSyncedTxid=1 mostRecentTxid=1
> 22:34:34  namenode_1  | 20/02/12 06:34:34 INFO namenode.FSEditLog: Number of 
> transactions: 1 Total time for transactions(ms): 0 Number of transactions 
> batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 9 
> 22:34:34  namenode_1  | 20/02/12 06:34:34 INFO namenode.FSEditLog: Done 
> logSyncAll lastWrittenTxId=1 lastSyncedTxid=1 mostRecentTxid=1
> 22:34:34  namenode_1  | 20/02/12 06:34:34 INFO hdfs.StateChange: STATE* Safe 
> mode is ON. 
> 22:34:34  namenode_1  | Resources are low on NN. Please add or free up more 
> resources then turn off safe mode manually. NOTE:  If you turn off safe mode 
> before adding resources, the NN will immediately return to safe mode. Use 
> "hdfs dfsadmin -safemode leave" to turn safe mode off.
> 22:34:36  test_1  | ERROR: invocation failed (exit code 1), logfile: 
> /app/sdks/python/target/.tox/hdfs_integration_test/log/hdfs_integration_test-2.log
> 22:34:36  test_1  | == log start 
> ===
> 22:34:36  test_1  | Processing 
> ./target/.tox/.tmp/package/1/apache-beam-2.20.0.dev0.zip
> 22:34:36  test_1  | Requirement already satisfied: crcmod<2.0,>=1.7 in 
> ./target/.tox/hdfs_integration_test/lib/python3.7/site-packages (from 
> apache-beam==2.20.0.dev0) (1.7)
> 22:34:36  test_1  | Collecting dill<0.3.2,>=0.3.1.1
> 22:34:36  test_1  |   Downloading dill-0.3.1.1.tar.gz (151 kB)
> 22:34:36  test_1  | Collecting fastavro<0.22,>=0.21.4
> 22:34:36  test_1  |   Downloading 
> fastavro-0.21.24-cp37-cp37m-manylinux1_x86_64.whl (1.2 MB)
> 22:34:36  test_1  | Requirement already satisfied: future<1.0.0,>=0.16.0 
> in ./target/.tox/hdfs_integration_test/lib/python3.7/site-packages (from 
> apache-beam==2.20.0.dev0) (0.16.0)
> 22:34:36  test_1  | Requirement already satisfied: grpcio<2,>=1.12.1 in 
> ./target/.tox/hdfs_integration_test/lib/python3.7/site-packages (from 
> apache-beam==2.20.0.dev0) (1.27.1)
> 22:34:36  test_1  | Collecting hdfs<3.0.0,>=2.1.0
> 22:34:36  test_1  |   Downloading hdfs-2.5.8.tar.gz (41 kB)
> 22:34:36  test_1  | Collecting httplib2<=0.12.0,>=0.8
> 22:34:36  test_1  |   Downloading httplib2-0.12.0.tar.gz (218 kB)
> 22:34:36  test_1  | Requirement already satisfied: mock<3.0.0,>=1.0.1 in 
> ./target/.tox/hdfs_integration_test/lib/python3.7/site-packages (from 
> apache-beam==2.20.0.dev0) (2.0.0)
> 22:34:36  test_1  | Collecting numpy<2,>=1.14.3
> 22:34:36  test_1  |   Downloading 
> numpy-1.18.1-cp37-cp37m-manylinux1_x86_64.whl (20.1 MB)
> 22:34:36  test_1  | Collecting pymongo<4.0.0,>=3.8.0
> 22:34:36  test_1  |   Downloading 
> pymongo-3.10.1-cp37-cp37m-manylinux2014_x86_64.whl (462 kB)
> 22:34:36  test_1  | Collecting oauth2client<4,>=2.0.1
> 22:34:36  test_1  |   Downloading oauth2client-3.0.0.tar.gz (77 kB)
> 22:34:36  test_1  | Requirement already satisfied: 
> protobuf<4,>=3.5.0.post1 in 
> ./target/.tox/hdfs_integration_test/lib/python3.7/site-packages (from 
> apache-beam==2.20.0.dev0) (3.11.3)
> 22:34:36  test_1  | Collecting pydot<2,>=1.2.0
> 22:34:36  test_1  |   Downloading pydot-1.4.1-py2.py3-none-any.whl (19 kB)
> 22:34:36  test_1  | Collecting python-dateutil<3,>=2.8.0
> 22:34:36  test_1  |   Downloading 
> python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
> 22:34:36  test_1  | Collecting pytz>=2018.3
> 22:34:36  test_1  |   Downloading pytz-2019.3-py2.py3-none-any.whl (509 
> kB)
> 22:34:36  test_1  | Collecting typing<3.8.0,>=3.7.0
> 22:34:36  test_1  |   Downloading typing-3.7.4.1-py3-none-any.whl (25 kB)
> 22:34:36  test_1  | Collecting typing-extensions<3.8.0,>=3.7.0
> 22:34:36  t
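The check that tripped safe mode in the log above compares free space on the NameNode's volume against a configured reserved amount (104857600 bytes, i.e. 100 MiB). A rough sketch of that comparison, with the function and constant names invented for illustration:

```python
import shutil

RESERVED_BYTES = 104_857_600  # the 100 MiB reserved amount from the log


def volume_low_on_space(path, reserved_bytes=RESERVED_BYTES):
    # NameNodeResourceChecker-style check: free bytes on the volume
    # holding `path` versus the configured reserved amount.
    free = shutil.disk_usage(path).free
    return free < reserved_bytes


if volume_low_on_space("/"):
    print("low on space: the NameNode would enter safe mode here")
```

Once enough space is freed, safe mode still has to be exited manually with `hdfs dfsadmin -safemode leave`, as the log message itself notes.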

[jira] [Closed] (BEAM-9303) HDFS IT test fails on apache-beam-jenkins-7 : no space left on device

2020-02-12 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev closed BEAM-9303.
-
Fix Version/s: Not applicable
   Resolution: Duplicate

> HDFS IT test fails on apache-beam-jenkins-7 : no space left on device
> -
>
> Key: BEAM-9303
> URL: https://issues.apache.org/jira/browse/BEAM-9303
> Project: Beam
>  Issue Type: Improvement
>  Components: test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Yifan Zou
>Priority: Major
> Fix For: Not applicable
>
>

[jira] [Work logged] (BEAM-8201) clean up the current container API

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8201?focusedWorklogId=386083&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386083
 ]

ASF GitHub Bot logged work on BEAM-8201:


Author: ASF GitHub Bot
Created on: 12/Feb/20 17:21
Start Date: 12/Feb/20 17:21
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10839: [BEAM-8201] Add 
other endpoint fields to provision API.
URL: https://github.com/apache/beam/pull/10839#issuecomment-585317785
 
 
   The test failures are unrelated; this change only adds unused fields to a proto. 
 



Issue Time Tracking
---

Worklog Id: (was: 386083)
Time Spent: 20m  (was: 10m)

> clean up the current container API
> --
>
> Key: BEAM-8201
> URL: https://issues.apache.org/jira/browse/BEAM-8201
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> From [~robertwb]
> As part of this project, I propose we look at and clean up the current 
> container API before we "release" it as public and stable. IIRC, we currently 
> provide the worker arguments through a combination of (1) environment 
> variables (2) command line parameters to docker and (3) via the provisioning 
> API. It would be good to have a more principled approach to specifying 
> arguments (either all the same way, or if they vary, good reason for doing so 
> rather than by historical accident).
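> 
> A sketch of the consolidation being proposed: resolve all worker arguments in one place through a single mechanism (environment variables here), instead of a mix of env vars, docker flags, and the provisioning API. Every name below is hypothetical, not Beam's actual configuration surface:

```python
import os
from dataclasses import dataclass


@dataclass
class WorkerConfig:
    logging_endpoint: str
    control_endpoint: str


def config_from_env(env=os.environ):
    # Hypothetical single source of truth: each argument is read once,
    # here, so adding a new argument means touching one function rather
    # than three delivery mechanisms.
    return WorkerConfig(
        logging_endpoint=env.get("WORKER_LOGGING_ENDPOINT", ""),
        control_endpoint=env.get("WORKER_CONTROL_ENDPOINT", ""),
    )
```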





[jira] [Work logged] (BEAM-8201) clean up the current container API

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8201?focusedWorklogId=386086&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386086
 ]

ASF GitHub Bot logged work on BEAM-8201:


Author: ASF GitHub Bot
Created on: 12/Feb/20 17:26
Start Date: 12/Feb/20 17:26
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10839: [BEAM-8201] 
Add other endpoint fields to provision API.
URL: https://github.com/apache/beam/pull/10839
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 386086)
Time Spent: 0.5h  (was: 20m)



[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=386089&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386089
 ]

ASF GitHub Bot logged work on BEAM-8561:


Author: ASF GitHub Bot
Created on: 12/Feb/20 17:34
Start Date: 12/Feb/20 17:34
Worklog Time Spent: 10m 
  Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] 
Add ThriftIO to support IO for Thrift files
URL: https://github.com/apache/beam/pull/10290#discussion_r378404834
 
 

 ##
 File path: 
sdks/java/io/thrift/src/test/java/org/apache/beam/sdk/io/thrift/TestThriftStruct.java
 ##
 @@ -0,0 +1,1232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
 
 Review comment:
   Sounds good, I'll work on this.
 



Issue Time Tracking
---

Worklog Id: (was: 386089)
Time Spent: 15h  (was: 14h 50m)

> Add ThriftIO to Support IO for Thrift Files
> ---
>
> Key: BEAM-8561
> URL: https://issues.apache.org/jira/browse/BEAM-8561
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-files
>Reporter: Chris Larsen
>Assignee: Chris Larsen
>Priority: Major
>  Time Spent: 15h
>  Remaining Estimate: 0h
>
> Similar to AvroIO it would be very useful to support reading and writing 
> to/from Thrift files with a native connector. 
> Functionality would include:
>  # read() - Reading from one or more Thrift files.
>  # write() - Writing to one or more Thrift files.





[jira] [Work logged] (BEAM-5605) Support Portable SplittableDoFn for batch

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5605?focusedWorklogId=386090&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386090
 ]

ASF GitHub Bot logged work on BEAM-5605:


Author: ASF GitHub Bot
Created on: 12/Feb/20 17:38
Start Date: 12/Feb/20 17:38
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10576: [BEAM-5605] Convert 
all BoundedSources to SplittableDoFns when using beam_fn_api experiment.
URL: https://github.com/apache/beam/pull/10576#issuecomment-585325143
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 386090)
Time Spent: 12h 50m  (was: 12h 40m)

> Support Portable SplittableDoFn for batch
> -
>
> Key: BEAM-5605
> URL: https://issues.apache.org/jira/browse/BEAM-5605
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 12h 50m
>  Remaining Estimate: 0h
>
> Roll-up item tracking work towards supporting portable SplittableDoFn for 
> batch





[jira] [Work logged] (BEAM-5605) Support Portable SplittableDoFn for batch

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5605?focusedWorklogId=386091&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386091
 ]

ASF GitHub Bot logged work on BEAM-5605:


Author: ASF GitHub Bot
Created on: 12/Feb/20 17:38
Start Date: 12/Feb/20 17:38
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10576: [BEAM-5605] Convert 
all BoundedSources to SplittableDoFns when using beam_fn_api experiment.
URL: https://github.com/apache/beam/pull/10576#issuecomment-585325730
 
 
   Run Portable_Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 386091)
Time Spent: 13h  (was: 12h 50m)

> Support Portable SplittableDoFn for batch
> -
>
> Key: BEAM-5605
> URL: https://issues.apache.org/jira/browse/BEAM-5605
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 13h
>  Remaining Estimate: 0h
>
> Roll-up item tracking work towards supporting portable SplittableDoFn for 
> batch





[jira] [Work logged] (BEAM-5605) Support Portable SplittableDoFn for batch

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5605?focusedWorklogId=386092&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386092
 ]

ASF GitHub Bot logged work on BEAM-5605:


Author: ASF GitHub Bot
Created on: 12/Feb/20 17:38
Start Date: 12/Feb/20 17:38
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10576: [BEAM-5605] Convert 
all BoundedSources to SplittableDoFns when using beam_fn_api experiment.
URL: https://github.com/apache/beam/pull/10576#issuecomment-585325987
 
 
   Run Python2_PVR_Flink PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 386092)
Time Spent: 13h 10m  (was: 13h)

> Support Portable SplittableDoFn for batch
> -
>
> Key: BEAM-5605
> URL: https://issues.apache.org/jira/browse/BEAM-5605
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 13h 10m
>  Remaining Estimate: 0h
>
> Roll-up item tracking work towards supporting portable SplittableDoFn for 
> batch





[jira] [Work logged] (BEAM-5605) Support Portable SplittableDoFn for batch

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5605?focusedWorklogId=386094&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386094
 ]

ASF GitHub Bot logged work on BEAM-5605:


Author: ASF GitHub Bot
Created on: 12/Feb/20 17:39
Start Date: 12/Feb/20 17:39
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10576: [BEAM-5605] Convert 
all BoundedSources to SplittableDoFns when using beam_fn_api experiment.
URL: https://github.com/apache/beam/pull/10576#issuecomment-585327146
 
 
   Run Python2_PVR_Flink PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 386094)
Time Spent: 13h 20m  (was: 13h 10m)

> Support Portable SplittableDoFn for batch
> -
>
> Key: BEAM-5605
> URL: https://issues.apache.org/jira/browse/BEAM-5605
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> Roll-up item tracking work towards supporting portable SplittableDoFn for 
> batch





[jira] [Work logged] (BEAM-9269) Set shorter Commit Deadline and handle with backoff/retry

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9269?focusedWorklogId=386100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386100
 ]

ASF GitHub Bot logged work on BEAM-9269:


Author: ASF GitHub Bot
Created on: 12/Feb/20 17:45
Start Date: 12/Feb/20 17:45
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #10752: 
[BEAM-9269] Add commit deadline for Spanner writes.
URL: https://github.com/apache/beam/pull/10752
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 386100)
Time Spent: 4h  (was: 3h 50m)

> Set shorter Commit Deadline and handle with backoff/retry
> -
>
> Key: BEAM-9269
> URL: https://issues.apache.org/jira/browse/BEAM-9269
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0, 2.17.0, 2.18.0, 2.19.0
>Reporter: Niel Markwick
>Assignee: Niel Markwick
>Priority: Major
>  Labels: google-cloud-spanner
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> The default commit deadline in Spanner is 1 hour, which can lead to a variety 
> of issues, including database overload and session expiry.
> A shorter deadline should be set, with backoff/retry when the deadline 
> expires, so that the Spanner database does not become overloaded.
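A minimal sketch of the backoff/retry pattern described above (generic Python, not Beam's actual SpannerIO implementation; the `commit` callable and the limits are hypothetical):

```python
import random
import time


def commit_with_retry(commit, max_attempts=5, base_delay=1.0, max_delay=30.0):
    """Run `commit` (a zero-arg callable that raises TimeoutError when the
    commit deadline expires), retrying with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return commit()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter keeps retries from stampeding
            # the database all at once.
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.0))
```

The jitter factor matters when many workers hit the same deadline simultaneously; without it, all retries land at the same instant and the overload repeats.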





[jira] [Commented] (BEAM-9302) No space left on device - apache-beam-jenkins-7

2020-02-12 Thread Yifan Zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035557#comment-17035557
 ] 

Yifan Zou commented on BEAM-9302:
-

Sorry for jumping into this late. I checked on that machine, and it seems like 
dev/sha1 is full. I'm tracking down where the excess usage is being stored.

> No space left on device - apache-beam-jenkins-7
> ---
>
> Key: BEAM-9302
> URL: https://issues.apache.org/jira/browse/BEAM-9302
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Michał Walenia
>Priority: Blocker
>
> [https://builds.apache.org/job/beam_PreCommit_SQL_Commit/543/consoleFull] log 
> of a failed job with this error





[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=386105&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386105
 ]

ASF GitHub Bot logged work on BEAM-7246:


Author: ASF GitHub Bot
Created on: 12/Feb/20 18:02
Start Date: 12/Feb/20 18:02
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10712: [BEAM-7246] Added 
Google Spanner Write Transform
URL: https://github.com/apache/beam/pull/10712#issuecomment-585337286
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 386105)
Time Spent: 20h 40m  (was: 20.5h)

> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 20h 40m
>  Remaining Estimate: 0h
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.





[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386103&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386103
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 12/Feb/20 18:02
Start Date: 12/Feb/20 18:02
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10764: [BEAM-9146] Integrate 
GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585337024
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 386103)
Time Spent: 7h  (was: 6h 50m)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take either a video's GCS location or video 
> data bytes as input.
> [1] https://cloud.google.com/video-intelligence/
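Dispatching on the two input forms described above can be sketched as follows (a hypothetical helper, not the transform's actual code; the request-dict keys are illustrative):

```python
def normalize_video_input(video):
    """Map a video input to a request fragment: a gs:// URI string is
    treated as a GCS location, raw bytes as inline video content."""
    if isinstance(video, bytes):
        return {"input_content": video}
    if isinstance(video, str) and video.startswith("gs://"):
        return {"input_uri": video}
    raise TypeError("expected a gs:// URI or raw video bytes")
```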





[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386104&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386104
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 12/Feb/20 18:02
Start Date: 12/Feb/20 18:02
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10764: [BEAM-9146] Integrate 
GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585337160
 
 
   Finally, tests are running.
 



Issue Time Tracking
---

Worklog Id: (was: 386104)
Time Spent: 7h 10m  (was: 7h)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take either a video's GCS location or video 
> data bytes as input.
> [1] https://cloud.google.com/video-intelligence/





[jira] [Work logged] (BEAM-8201) clean up the current container API

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8201?focusedWorklogId=386130&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386130
 ]

ASF GitHub Bot logged work on BEAM-8201:


Author: ASF GitHub Bot
Created on: 12/Feb/20 18:50
Start Date: 12/Feb/20 18:50
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10843: [BEAM-8201] 
Pass all other endpoints through provisioning service.
URL: https://github.com/apache/beam/pull/10843
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job

[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=386155&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386155
 ]

ASF GitHub Bot logged work on BEAM-7246:


Author: ASF GitHub Bot
Created on: 12/Feb/20 19:18
Start Date: 12/Feb/20 19:18
Worklog Time Spent: 10m 
  Work Description: mszb commented on issue #10712: [BEAM-7246] Added 
Google Spanner Write Transform
URL: https://github.com/apache/beam/pull/10712#issuecomment-585371923
 
 
   @aaltay All three tests failed due to `ImportError: No module named 
'pycodestyle'` in the `avro-python3` package. 
   
   Portable_Python PreCommit
   
https://scans.gradle.com/s/3qf5sqnettmmq/console-log?task=:sdks:python:test-suites:portable:py35:installGcpTest#L299
   
   PreCommit
   
https://scans.gradle.com/s/c5ncivj7k2pko/console-log?task=:sdks:python:test-suites:dataflow:py37:installGcpTest#L658
   
   PythonFormatter PreCommit
   
https://scans.gradle.com/s/i6nvgyym5tfqk/console-log?task=:sdks:python:test-suites:tox:py37:formatter#L267
   
   Could you please rerun these tests? Possibly that will fix the issue.
 



Issue Time Tracking
---

Worklog Id: (was: 386155)
Time Spent: 20h 50m  (was: 20h 40m)

> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=386156&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386156
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 12/Feb/20 19:21
Start Date: 12/Feb/20 19:21
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10826: [BEAM-8335] 
Modify the TestStreamPayload to accept an argument of output_tags and…
URL: https://github.com/apache/beam/pull/10826#discussion_r378460923
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -576,7 +581,13 @@ service TestStreamService {
   // A TestStream will request for events using this RPC.
   rpc Events(EventsRequest) returns (stream TestStreamPayload.Event) {}
 }
-message EventsRequest {}
+
+message EventsRequest {
+  // The set of tags to read from. These tags are a subset of the
+  // TestStreamPayload's output_tags. This allows Interactive Beam to cache
+  // many PCollections from a pipeline then replay a subset of them.
+  repeated string tags = 1;
 
 Review comment:
   The names `tag` and `tags` are a poor choice. I would suggest 
replacing them with `output_id` and `output_ids`, and please update the comments 
to refer to the [PTransform outputs local 
names](https://github.com/apache/beam/blob/dd9f5d28cc2f55697283058963fd2d8607aa3cd6/model/pipeline/src/main/proto/beam_runner_api.proto#L165).
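The subset-replay idea the comment refers to can be sketched generically (a hypothetical cache shape, not Beam's actual TestStream service):

```python
def replay_events(cache, requested_ids=()):
    """Yield (output_id, event) pairs for the requested output ids only.

    cache: dict mapping an output id to its list of recorded events.
    requested_ids: subset of cache keys; empty means replay every output.
    """
    ids = set(requested_ids) if requested_ids else set(cache)
    for output_id, events in cache.items():
        if output_id in ids:
            for event in events:
                yield output_id, event
```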
 



Issue Time Tracking
---

Worklog Id: (was: 386156)
Time Spent: 56h 40m  (was: 56.5h)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 56h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=386158&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386158
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 12/Feb/20 19:21
Start Date: 12/Feb/20 19:21
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10826: [BEAM-8335] 
Modify the TestStreamPayload to accept an argument of output_tags and…
URL: https://github.com/apache/beam/pull/10826#discussion_r378457573
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -525,6 +525,11 @@ message TestStreamPayload {
   // used to retrieve events.
   ApiServiceDescriptor endpoint = 3;
 
+  // (Optional) The PCollection tags this TestStream will be outputting to. If
+  // empty, this will assume it will be outputting to the single main
+  // output PCollection.
+  repeated string output_tags = 4;
 
 Review comment:
   This information duplicates the "output" map keys on the PTransform that 
contains the TestStreamPayload.
 



Issue Time Tracking
---

Worklog Id: (was: 386158)
Time Spent: 56h 50m  (was: 56h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 56h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=386157&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386157
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 12/Feb/20 19:21
Start Date: 12/Feb/20 19:21
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10826: [BEAM-8335] 
Modify the TestStreamPayload to accept an argument of output_tags and…
URL: https://github.com/apache/beam/pull/10826#discussion_r378459440
 
 

 ##
 File path: model/interactive/src/main/proto/beam_interactive_api.proto
 ##
 @@ -40,18 +40,22 @@ import "google/protobuf/timestamp.proto";
 message TestStreamFileHeader {
   // The PCollection tag this stream is associated with.
   string tag = 1;
+
+  // The file format version. This is used to ensure backwards compatibility
+  // when decoding from file.
+  int32 version = 2;
 }
 
 // A record is a recorded element that a source produced. Its function is to
 // give enough information to create a faithful recreation of the original
 // stream of data.
 message TestStreamFileRecord {
   oneof recorded_event {
-// The recorded element with its event timestamp (when it was produced).
-org.apache.beam.model.pipeline.v1.TestStreamPayload.TimestampedElement 
element = 1;
+// The recorded bundle with its event timestamp (when it was produced).
+org.apache.beam.model.pipeline.v1.TestStreamPayload.Event.AddElements 
element_event = 1;
 
 // Indicating the output watermark of the source changed.
-google.protobuf.Timestamp watermark = 2;
+org.apache.beam.model.pipeline.v1.TestStreamPayload.Event.AdvanceWatermark 
watermark_event = 2;
   }
 
   // The wall-time timestamp of either the new element or watermark change.
 
 Review comment:
   You would still be able to do that "compression" yourself and only add a 
single AdvanceProcessingTime, yet still have the flexibility to record multiple 
AdvanceProcessingTime events if they were ever necessary.
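The "compression" mentioned here — collapsing a run of consecutive processing-time advances into a single one while keeping the option to record them individually — can be sketched like this (the event tuples are a hypothetical representation, not the proto messages):

```python
def compress_processing_time(events):
    """Merge runs of consecutive ("advance_processing_time", duration)
    events into one event whose duration is the run's sum; every other
    event passes through unchanged."""
    out = []
    for kind, value in events:
        if kind == "advance_processing_time" and out and out[-1][0] == kind:
            out[-1] = (kind, out[-1][1] + value)
        else:
            out.append((kind, value))
    return out
```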
 



Issue Time Tracking
---

Worklog Id: (was: 386157)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 56h 40m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-7198) Rename ToStringCoder into ToBytesCoder

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7198?focusedWorklogId=386164&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386164
 ]

ASF GitHub Bot logged work on BEAM-7198:


Author: ASF GitHub Bot
Created on: 12/Feb/20 19:26
Start Date: 12/Feb/20 19:26
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10828: [BEAM-7198] rename 
ToStringCoder to ToBytesCoder for proper representation of its role
URL: https://github.com/apache/beam/pull/10828#issuecomment-585375853
 
 
   Run PythonFormatter PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 386164)
Time Spent: 2h  (was: 1h 50m)

> Rename ToStringCoder into ToBytesCoder
> --
>
> Key: BEAM-7198
> URL: https://issues.apache.org/jira/browse/BEAM-7198
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Minor
>  Labels: easy-fix, starter
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The name of the ToStringCoder class [1] is confusing, since the output of 
> encode() on Python 3 will be bytes. On Python 2 the output is also bytes, 
> since bytes and string are synonyms on Py2.
> ToBytesCoder would be a better name for this class. 
> Note that this class is not listed in coders that constitute Public APIs [2], 
> so we can treat this as an internal change. As a courtesy to users who happened 
> to reference a non-public coder in their pipelines, we can keep the old class 
> name as an alias, e.g. ToStringCoder = ToBytesCoder, to avoid friction, but 
> clean up the Beam codebase to use the new name.
> [1] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L344
> [2] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L20
> cc: [~yoshiki.obata] [~chamikara]
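The aliasing approach described above is straightforward in Python; a minimal sketch (the encode body is illustrative, not the coder's real implementation):

```python
class ToBytesCoder:
    """Coder whose encode() always returns bytes."""

    def encode(self, value):
        # str() then UTF-8 encode: the output is bytes, which is what
        # the new name reflects.
        return str(value).encode("utf-8")


# Keep the old, non-public name as an alias so pipelines that happened
# to reference it keep working.
ToStringCoder = ToBytesCoder
```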





[jira] [Work logged] (BEAM-7198) Rename ToStringCoder into ToBytesCoder

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7198?focusedWorklogId=386163&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386163
 ]

ASF GitHub Bot logged work on BEAM-7198:


Author: ASF GitHub Bot
Created on: 12/Feb/20 19:26
Start Date: 12/Feb/20 19:26
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10828: [BEAM-7198] rename 
ToStringCoder to ToBytesCoder for proper representation of its role
URL: https://github.com/apache/beam/pull/10828#issuecomment-585375732
 
 
   test test test?
 



Issue Time Tracking
---

Worklog Id: (was: 386163)
Time Spent: 1h 50m  (was: 1h 40m)

> Rename ToStringCoder into ToBytesCoder
> --
>
> Key: BEAM-7198
> URL: https://issues.apache.org/jira/browse/BEAM-7198
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Minor
>  Labels: easy-fix, starter
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The name of the ToStringCoder class [1] is confusing, since the output of 
> encode() on Python 3 will be bytes. On Python 2 the output is also bytes, 
> since bytes and string are synonyms on Py2.
> ToBytesCoder would be a better name for this class. 
> Note that this class is not listed in coders that constitute Public APIs [2], 
> so we can treat this as an internal change. As a courtesy to users who happened 
> to reference a non-public coder in their pipelines, we can keep the old class 
> name as an alias, e.g. ToStringCoder = ToBytesCoder, to avoid friction, but 
> clean up the Beam codebase to use the new name.
> [1] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L344
> [2] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L20
> cc: [~yoshiki.obata] [~chamikara]





[jira] [Work logged] (BEAM-7198) Rename ToStringCoder into ToBytesCoder

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7198?focusedWorklogId=386167&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386167
 ]

ASF GitHub Bot logged work on BEAM-7198:


Author: ASF GitHub Bot
Created on: 12/Feb/20 19:27
Start Date: 12/Feb/20 19:27
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10828: [BEAM-7198] rename 
ToStringCoder to ToBytesCoder for proper representation of its role
URL: https://github.com/apache/beam/pull/10828#issuecomment-585376002
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386167)
Time Spent: 2.5h  (was: 2h 20m)

> Rename ToStringCoder into ToBytesCoder
> --
>
> Key: BEAM-7198
> URL: https://issues.apache.org/jira/browse/BEAM-7198
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Minor
>  Labels: easy-fix, starter
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The name of ToStringCoder class [1] is confusing, since the output of 
> encode() on Python3 will be bytes. On Python 2 the output is also bytes, 
> since bytes and string are synonyms on Py2.
> ToBytesCoder would be a better name for this class. 
> Note that this class is not listed in coders that constitute Public APIs [2], 
> so we can treat this as an internal change. As a courtesy to users who happened 
> to reference a non-public coder in their pipelines, we can keep the old class 
> name as an alias, e.g. ToStringCoder = ToBytesCoder, to avoid friction, but 
> clean up the Beam codebase to use the new name.
> [1] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L344
> [2] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L20
> cc: [~yoshiki.obata] [~chamikara]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7198) Rename ToStringCoder into ToBytesCoder

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7198?focusedWorklogId=386166&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386166
 ]

ASF GitHub Bot logged work on BEAM-7198:


Author: ASF GitHub Bot
Created on: 12/Feb/20 19:27
Start Date: 12/Feb/20 19:27
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10828: [BEAM-7198] rename 
ToStringCoder to ToBytesCoder for proper representation of its role
URL: https://github.com/apache/beam/pull/10828#issuecomment-585375960
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386166)
Time Spent: 2h 20m  (was: 2h 10m)

> Rename ToStringCoder into ToBytesCoder
> --
>
> Key: BEAM-7198
> URL: https://issues.apache.org/jira/browse/BEAM-7198
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Minor
>  Labels: easy-fix, starter
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The name of ToStringCoder class [1] is confusing, since the output of 
> encode() on Python3 will be bytes. On Python 2 the output is also bytes, 
> since bytes and string are synonyms on Py2.
> ToBytesCoder would be a better name for this class. 
> Note that this class is not listed in coders that constitute Public APIs [2], 
> so we can treat this as an internal change. As a courtesy to users who happened 
> to reference a non-public coder in their pipelines, we can keep the old class 
> name as an alias, e.g. ToStringCoder = ToBytesCoder, to avoid friction, but 
> clean up the Beam codebase to use the new name.
> [1] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L344
> [2] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L20
> cc: [~yoshiki.obata] [~chamikara]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7198) Rename ToStringCoder into ToBytesCoder

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7198?focusedWorklogId=386165&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386165
 ]

ASF GitHub Bot logged work on BEAM-7198:


Author: ASF GitHub Bot
Created on: 12/Feb/20 19:27
Start Date: 12/Feb/20 19:27
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10828: [BEAM-7198] rename 
ToStringCoder to ToBytesCoder for proper representation of its role
URL: https://github.com/apache/beam/pull/10828#issuecomment-585375921
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386165)
Time Spent: 2h 10m  (was: 2h)

> Rename ToStringCoder into ToBytesCoder
> --
>
> Key: BEAM-7198
> URL: https://issues.apache.org/jira/browse/BEAM-7198
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Minor
>  Labels: easy-fix, starter
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The name of ToStringCoder class [1] is confusing, since the output of 
> encode() on Python3 will be bytes. On Python 2 the output is also bytes, 
> since bytes and string are synonyms on Py2.
> ToBytesCoder would be a better name for this class. 
> Note that this class is not listed in coders that constitute Public APIs [2], 
> so we can treat this as an internal change. As a courtesy to users who happened 
> to reference a non-public coder in their pipelines, we can keep the old class 
> name as an alias, e.g. ToStringCoder = ToBytesCoder, to avoid friction, but 
> clean up the Beam codebase to use the new name.
> [1] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L344
> [2] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L20
> cc: [~yoshiki.obata] [~chamikara]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7198) Rename ToStringCoder into ToBytesCoder

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7198?focusedWorklogId=386169&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386169
 ]

ASF GitHub Bot logged work on BEAM-7198:


Author: ASF GitHub Bot
Created on: 12/Feb/20 19:28
Start Date: 12/Feb/20 19:28
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10828: [BEAM-7198] rename 
ToStringCoder to ToBytesCoder for proper representation of its role
URL: https://github.com/apache/beam/pull/10828#issuecomment-585376513
 
 
   Run PythonLint PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386169)
Time Spent: 2h 40m  (was: 2.5h)

> Rename ToStringCoder into ToBytesCoder
> --
>
> Key: BEAM-7198
> URL: https://issues.apache.org/jira/browse/BEAM-7198
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Minor
>  Labels: easy-fix, starter
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> The name of ToStringCoder class [1] is confusing, since the output of 
> encode() on Python3 will be bytes. On Python 2 the output is also bytes, 
> since bytes and string are synonyms on Py2.
> ToBytesCoder would be a better name for this class. 
> Note that this class is not listed in coders that constitute Public APIs [2], 
> so we can treat this as an internal change. As a courtesy to users who happened 
> to reference a non-public coder in their pipelines, we can keep the old class 
> name as an alias, e.g. ToStringCoder = ToBytesCoder, to avoid friction, but 
> clean up the Beam codebase to use the new name.
> [1] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L344
> [2] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L20
> cc: [~yoshiki.obata] [~chamikara]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=386170&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386170
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 12/Feb/20 19:28
Start Date: 12/Feb/20 19:28
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on pull request #10835: 
[BEAM-8575] Removed MAX_TIMESTAMP from testing data
URL: https://github.com/apache/beam/pull/10835#discussion_r378465184
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util_test.py
 ##
 @@ -551,9 +551,6 @@ def test_reshuffle_preserves_timestamps(self):
   {
   'name': 'bar', 'timestamp': 33
   },
 
 Review comment:
   I'm not a member, I can't run the Jenkins VR test or see the results, so I 
can't say much about BEAM-9003.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386170)
Time Spent: 50h 20m  (was: 50h 10m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 50h 20m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=386172&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386172
 ]

ASF GitHub Bot logged work on BEAM-8537:


Author: ASF GitHub Bot
Created on: 12/Feb/20 19:34
Start Date: 12/Feb/20 19:34
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #10375: [BEAM-8537] 
Provide WatermarkEstimator to track watermark
URL: https://github.com/apache/beam/pull/10375#discussion_r378468113
 
 

 ##
 File path: sdks/python/apache_beam/io/iobase.py
 ##
 @@ -1238,128 +1235,38 @@ def try_claim(self, position):
 raise NotImplementedError
 
 
-class ThreadsafeRestrictionTracker(object):
 
 Review comment:
   Finished rebasing on top of the master branch.
   Filed https://issues.apache.org/jira/browse/BEAM-9296 for adding a type check.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386172)
Time Spent: 14.5h  (was: 14h 20m)

> Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
> 
>
> Key: BEAM-8537
> URL: https://issues.apache.org/jira/browse/BEAM-8537
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core, sdk-py-harness
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>
> This is a follow up for in-progress PR:  
> https://github.com/apache/beam/pull/9794.
> Current implementation in PR9794 provides a default implementation of 
> WatermarkEstimator. For further work, we want to let WatermarkEstimator to be 
> a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to 
> create a custom WatermarkEstimator per windowed value. It should be similar 
> to how we track restriction for SDF: 
> WatermarkEstimator <---> RestrictionTracker 
> WatermarkEstimatorProvider <---> RestrictionTrackerProvider
> WatermarkEstimatorParam <---> RestrictionDoFnParam



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5605) Support Portable SplittableDoFn for batch

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5605?focusedWorklogId=386173&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386173
 ]

ASF GitHub Bot logged work on BEAM-5605:


Author: ASF GitHub Bot
Created on: 12/Feb/20 19:37
Start Date: 12/Feb/20 19:37
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10576: [BEAM-5605] Convert 
all BoundedSources to SplittableDoFns when using beam_fn_api experiment.
URL: https://github.com/apache/beam/pull/10576#issuecomment-585380318
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386173)
Time Spent: 13.5h  (was: 13h 20m)

> Support Portable SplittableDoFn for batch
> -
>
> Key: BEAM-5605
> URL: https://issues.apache.org/jira/browse/BEAM-5605
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>
> Roll-up item tracking work towards supporting portable SplittableDoFn for 
> batch



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5605) Support Portable SplittableDoFn for batch

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5605?focusedWorklogId=386175&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386175
 ]

ASF GitHub Bot logged work on BEAM-5605:


Author: ASF GitHub Bot
Created on: 12/Feb/20 19:37
Start Date: 12/Feb/20 19:37
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10576: [BEAM-5605] Convert 
all BoundedSources to SplittableDoFns when using beam_fn_api experiment.
URL: https://github.com/apache/beam/pull/10576#issuecomment-585380436
 
 
   Run JavaPortabilityApi PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386175)
Time Spent: 13h 50m  (was: 13h 40m)

> Support Portable SplittableDoFn for batch
> -
>
> Key: BEAM-5605
> URL: https://issues.apache.org/jira/browse/BEAM-5605
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 13h 50m
>  Remaining Estimate: 0h
>
> Roll-up item tracking work towards supporting portable SplittableDoFn for 
> batch



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5605) Support Portable SplittableDoFn for batch

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5605?focusedWorklogId=386174&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386174
 ]

ASF GitHub Bot logged work on BEAM-5605:


Author: ASF GitHub Bot
Created on: 12/Feb/20 19:37
Start Date: 12/Feb/20 19:37
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10576: [BEAM-5605] Convert 
all BoundedSources to SplittableDoFns when using beam_fn_api experiment.
URL: https://github.com/apache/beam/pull/10576#issuecomment-585380395
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386174)
Time Spent: 13h 40m  (was: 13.5h)

> Support Portable SplittableDoFn for batch
> -
>
> Key: BEAM-5605
> URL: https://issues.apache.org/jira/browse/BEAM-5605
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 13h 40m
>  Remaining Estimate: 0h
>
> Roll-up item tracking work towards supporting portable SplittableDoFn for 
> batch



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9273) Explicitly fail pipeline with @RequiresTimeSortedInput with unsupported runner

2020-02-12 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-9273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Lukavský updated BEAM-9273:
---
Fix Version/s: 2.20.0

> Explicitly fail pipeline with @RequiresTimeSortedInput with unsupported runner
> --
>
> Key: BEAM-9273
> URL: https://issues.apache.org/jira/browse/BEAM-9273
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Fail pipeline with @RequiresTimeSortedInput annotation in pipeline 
> translation time when being run with unsupported runner. Currently, 
> unsupported runners are:
>  - apex
>  - portable flink
>  - gearpump
>  - dataflow
>  - jet
>  - samza
>  - spark structured streaming
> These runners should reject the pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9273) Explicitly fail pipeline with @RequiresTimeSortedInput with unsupported runner

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9273?focusedWorklogId=386178&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386178
 ]

ASF GitHub Bot logged work on BEAM-9273:


Author: ASF GitHub Bot
Created on: 12/Feb/20 19:50
Start Date: 12/Feb/20 19:50
Worklog Time Spent: 10m 
  Work Description: je-ik commented on pull request #10816: [BEAM-9273] 
Explicitly disable @RequiresTimeSortedInput on unsupported runners
URL: https://github.com/apache/beam/pull/10816#discussion_r378476839
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DoFnFeatures.java
 ##
 @@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction;
+
+import org.apache.beam.sdk.state.BagState;
+import org.apache.beam.sdk.state.MapState;
+import org.apache.beam.sdk.state.SetState;
+import org.apache.beam.sdk.state.State;
+import org.apache.beam.sdk.state.ValueState;
+import org.apache.beam.sdk.state.WatermarkHoldState;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.reflect.DoFnSignatures;
+import org.apache.beam.sdk.values.TypeDescriptor;
+
+/**
+ * Features a {@link DoFn} can possess. Each runner might implement a different 
(sub)set of this
 
 Review comment:
   @kennknowles would you have any suggestions about naming the class? I think 
this code really should not go to `DoFnSignatures`, can we agree on some 
alternative name?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386178)
Time Spent: 3h 50m  (was: 3h 40m)

> Explicitly fail pipeline with @RequiresTimeSortedInput with unsupported runner
> --
>
> Key: BEAM-9273
> URL: https://issues.apache.org/jira/browse/BEAM-9273
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Fail pipeline with @RequiresTimeSortedInput annotation in pipeline 
> translation time when being run with unsupported runner. Currently, 
> unsupported runners are:
>  - apex
>  - portable flink
>  - gearpump
>  - dataflow
>  - jet
>  - samza
>  - spark structured streaming
> These runners should reject the pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9035) Typed options for Row Schema and Fields

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9035?focusedWorklogId=386188&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386188
 ]

ASF GitHub Bot logged work on BEAM-9035:


Author: ASF GitHub Bot
Created on: 12/Feb/20 20:00
Start Date: 12/Feb/20 20:00
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #10413: 
[BEAM-9035] Typed options for Row Schema and Field
URL: https://github.com/apache/beam/pull/10413#discussion_r378479658
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/values/ValueUtils.java
 ##
 @@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.values;
+
+import java.io.Serializable;
+import java.math.BigDecimal;
+import java.nio.ByteBuffer;
+import java.util.List;
+import java.util.Map;
+import java.util.Map.Entry;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.FieldType;
+import org.apache.beam.sdk.schemas.Schema.LogicalType;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps;
+import org.joda.time.Instant;
+import org.joda.time.base.AbstractInstant;
+
+@Experimental
+public abstract class ValueUtils implements Serializable {
 
 Review comment:
   Maybe consider renaming this `RowVerification` or `SchemaVerification` 
and/or giving the public static methods unique descriptive names so they can be 
imported statically. Maybe `verifyRowValues` and `verifyFieldValue` or 
something for the methods.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386188)
Time Spent: 2h  (was: 1h 50m)

> Typed options for Row Schema and Fields
> ---
>
> Key: BEAM-9035
> URL: https://issues.apache.org/jira/browse/BEAM-9035
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> This is the first issue of a multipart commit: this ticket implements the 
> basic infrastructure of options on row and field.
> Full explanation:
> Introduce the concept of Options in Beam Schema’s to add extra context to 
> fields and schema. In contracts to metadata, options would be added to 
> fields, logical types and rows. In the options schema convertors can add 
> options/annotations/decorators that were in the original schema, this context 
> can be used in the rest of the pipeline for specific transformations or 
> augment the end schema in the target output.
> Examples of options are:
>  * informational: like the source of the data, ...
>  * drive decisions further in the pipeline: flatten a row into another, 
> rename a field, ...
>  * influence something in the output: like cluster index, primary key, ...
>  * logical type information
> An option is a key/typed value combination. The advantages of having typed 
> values are: 
>  * Having strongly typed options would give a *portable way of Logical Types* 
> to have structured information that could be shared over different languages.
>  * This could keep the type intact when mapping from formats that have 
> strongly typed options (example: Protobuf).
> This is part of a multi ticket implementation. The following tickets are 
> related:
>  # Typed options for Row Schema and Fields
>  # Convert Proto Options to Beam Schema options
>  # Convert Avro extra information for Beam string options

[jira] [Work logged] (BEAM-9035) Typed options for Row Schema and Fields

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9035?focusedWorklogId=386187&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386187
 ]

ASF GitHub Bot logged work on BEAM-9035:


Author: ASF GitHub Bot
Created on: 12/Feb/20 20:00
Start Date: 12/Feb/20 20:00
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #10413: 
[BEAM-9035] Typed options for Row Schema and Field
URL: https://github.com/apache/beam/pull/10413#discussion_r378479843
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/values/ValueUtils.java
 ##
 @@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.values;
+
+import java.io.Serializable;
+import java.math.BigDecimal;
+import java.nio.ByteBuffer;
+import java.util.List;
+import java.util.Map;
+import java.util.Map.Entry;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.FieldType;
+import org.apache.beam.sdk.schemas.Schema.LogicalType;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps;
+import org.joda.time.Instant;
+import org.joda.time.base.AbstractInstant;
+
+@Experimental
+public abstract class ValueUtils implements Serializable {
+
+  static List<Object> verify(Schema schema, List<Object> values) {
 
 Review comment:
   Should this be public?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386187)
Time Spent: 2h  (was: 1h 50m)

> Typed options for Row Schema and Fields
> ---
>
> Key: BEAM-9035
> URL: https://issues.apache.org/jira/browse/BEAM-9035
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> This is the first issue of a multipart commit: this ticket implements the 
> basic infrastructure of options on row and field.
> Full explanation:
> Introduce the concept of Options in Beam Schemas to add extra context to 
> fields and schemas. In contrast to metadata, options would be added to 
> fields, logical types and rows. In the options, schema converters can add 
> options/annotations/decorators that were in the original schema; this context 
> can be used in the rest of the pipeline for specific transformations or to 
> augment the end schema in the target output.
> Examples of options are:
>  * informational: like the source of the data, ...
>  * drive decisions further in the pipeline: flatten a row into another, 
> rename a field, ...
>  * influence something in the output: like cluster index, primary key, ...
>  * logical type information
> An option is a key/typed value combination. The advantages of having typed 
> values are: 
>  * Having strongly typed options would give a *portable way of Logical Types* 
> to have structured information that could be shared over different languages.
>  * This could keep the type intact when mapping from formats that have 
> strongly typed options (example: Protobuf).
> This is part of a multi ticket implementation. The following tickets are 
> related:
>  # Typed options for Row Schema and Fields
>  # Convert Proto Options to Beam Schema options
>  # Convert Avro extra information for Beam string options
>  # Replace meta data with Logical Type options
>  # Extract meta data in Calcite SQL to Beam options
>  # Extract meta data in Zeta SQL to Beam options
>  # Add java example of u

[jira] [Work logged] (BEAM-9035) Typed options for Row Schema and Fields

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9035?focusedWorklogId=386189&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386189
 ]

ASF GitHub Bot logged work on BEAM-9035:


Author: ASF GitHub Bot
Created on: 12/Feb/20 20:00
Start Date: 12/Feb/20 20:00
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #10413: 
[BEAM-9035] Typed options for Row Schema and Field
URL: https://github.com/apache/beam/pull/10413#discussion_r378480933
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java
 ##
 @@ -950,6 +1007,338 @@ public int hashCode() {
 }
   }
 
+  public static class Options implements Serializable {
+private Map<String, Option> options;
+
+@Override
+public String toString() {
+  TreeMap<String, Option> sorted = new TreeMap<>(options);
+  return "{" + sorted + '}';
+}
+
Map<String, Option> getAllOptions() {
+  return options;
+}
+
+public Set<String> getOptionNames() {
+  return options.keySet();
+}
+
+public boolean hasOptions() {
+  return options.size() > 0;
+}
+
+@Override
+public boolean equals(Object o) {
+  if (this == o) {
+return true;
+  }
+  if (o == null || getClass() != o.getClass()) {
+return false;
+  }
+  Options options1 = (Options) o;
+  if (!options.keySet().equals(options1.options.keySet())) {
+return false;
+  }
+  for (Map.Entry<String, Option> optionEntry : options.entrySet()) {
+Option thisOption = optionEntry.getValue();
+Option otherOption = options1.options.get(optionEntry.getKey());
+if (!thisOption.getType().equals(otherOption.getType())) {
+  return false;
+}
+switch (thisOption.getType().getTypeName()) {
+  case BYTE:
+  case INT16:
+  case INT32:
+  case INT64:
+  case DECIMAL:
+  case FLOAT:
+  case DOUBLE:
+  case STRING:
+  case DATETIME:
+  case BOOLEAN:
+  case ARRAY:
+  case ITERABLE:
+  case MAP:
+  case ROW:
+  case LOGICAL_TYPE:
+if (!thisOption.getValue().equals(otherOption.getValue())) {
+  return false;
+}
+break;
+  case BYTES:
+if (!Arrays.equals((byte[]) thisOption.getValue(), 
otherOption.getValue())) {
+  return false;
+}
+}
+  }
+  return true;
+}
+
+@Override
+public int hashCode() {
+  return Objects.hash(options);
+}
+
+static class Option implements Serializable {
+  Option(FieldType type, Object value) {
+this.type = type;
+this.value = value;
+  }
+
+  private FieldType type;
+  private Object value;
+
+  @SuppressWarnings("TypeParameterUnusedInFormals")
  <T> T getValue() {
+return (T) value;
+  }
+
+  FieldType getType() {
+return type;
+  }
+
+  @Override
+  public String toString() {
+return "Option{type=" + type + ", value=" + value + '}';
+  }
+
+  @Override
+  public boolean equals(Object o) {
+if (this == o) {
+  return true;
+}
+if (o == null || getClass() != o.getClass()) {
+  return false;
+}
+Option option = (Option) o;
+return Objects.equals(type, option.type) && Objects.equals(value, 
option.value);
+  }
+
+  @Override
+  public int hashCode() {
+return Objects.hash(type, value);
+  }
+}
+
+public static class Builder {
+  private Map<String, Option> options;
+
+  Builder(Map<String, Option> init) {
+this.options = new HashMap<>(init);
+  }
+
+  Builder() {
+this(new HashMap<>());
+  }
+
+  public Builder setByteOption(String optionName, Byte value) {
+setOption(optionName, FieldType.BYTE, value);
+return this;
+  }
+
+  public Builder setBytesOption(String optionName, byte[] value) {
+setOption(optionName, FieldType.BYTES, value);
+return this;
+  }
+
+  public Builder setInt16Option(String optionName, Short value) {
+setOption(optionName, FieldType.INT16, value);
+return this;
+  }
+
+  public Builder setInt32Option(String optionName, Integer value) {
+setOption(optionName, FieldType.INT32, value);
+return this;
+  }
+
+  public Builder setInt64Option(String optionName, Long value) {
+setOption(optionName, FieldType.INT64, value);
+return this;
+  }
+
+  public Builder setDecimalOption(String optionName, BigDecimal value) {
+setOption(optionName, FieldType.DECIMAL, value);
+return this;
+  }
+
+  public Builder setFloatOption(String optionName, Float value) {
+setOption(optionName, FieldType.FLOAT, value);
+return this;
+  }
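The `equals` implementation in the diff above special-cases `BYTES` with `Arrays.equals` because Java arrays inherit `Object.equals`, which compares references rather than contents. A minimal standalone demonstration of why that branch is needed:

```java
import java.util.Arrays;
import java.util.Objects;

public class ByteArrayEquality {
  public static void main(String[] args) {
    byte[] a = {1, 2, 3};
    byte[] b = {1, 2, 3};

    // Arrays inherit Object.equals (reference equality), so two distinct
    // arrays with identical contents are "not equal".
    System.out.println(a.equals(b));          // false
    System.out.println(Objects.equals(a, b)); // false

    // Element-wise comparison is what byte[]-valued option equality needs.
    System.out.println(Arrays.equals(a, b));  // true
  }
}
```

Without the `BYTES` branch, two `Options` objects holding equal byte arrays would compare unequal.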

[jira] [Work logged] (BEAM-9035) Typed options for Row Schema and Fields

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9035?focusedWorklogId=386193&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386193
 ]

ASF GitHub Bot logged work on BEAM-9035:


Author: ASF GitHub Bot
Created on: 12/Feb/20 20:00
Start Date: 12/Feb/20 20:00
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #10413: 
[BEAM-9035] Typed options for Row Schema and Field
URL: https://github.com/apache/beam/pull/10413#discussion_r378482011
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/values/ValueUtils.java
 ##
 @@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.values;
+
+import java.io.Serializable;
+import java.math.BigDecimal;
+import java.nio.ByteBuffer;
+import java.util.List;
+import java.util.Map;
+import java.util.Map.Entry;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.FieldType;
+import org.apache.beam.sdk.schemas.Schema.LogicalType;
+import org.apache.beam.sdk.schemas.Schema.TypeName;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Lists;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Maps;
+import org.joda.time.Instant;
+import org.joda.time.base.AbstractInstant;
+
+@Experimental
+public abstract class ValueUtils implements Serializable {
+
+  static List<Object> verify(Schema schema, List<Object> values) {
 
 Review comment:
   Oh whoops I meant to remove this comment. I noticed this doesn't need to be 
public because it's only used in Row right now.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386193)
Time Spent: 2h 10m  (was: 2h)

> Typed options for Row Schema and Fields
> ---
>
> Key: BEAM-9035
> URL: https://issues.apache.org/jira/browse/BEAM-9035
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Alex Van Boxel
>Assignee: Alex Van Boxel
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> This is the first issue of a multipart commit: this ticket implements the 
> basic infrastructure of options on row and field.
> Full explanation:
> Introduce the concept of Options in Beam Schemas to add extra context to 
> fields and schemas. In contrast to metadata, options would be added to 
> fields, logical types and rows. In the options, schema converters can add 
> options/annotations/decorators that were in the original schema; this context 
> can be used in the rest of the pipeline for specific transformations or to 
> augment the end schema in the target output.
> Examples of options are:
>  * informational: like the source of the data, ...
>  * drive decisions further in the pipeline: flatten a row into another, 
> rename a field, ...
>  * influence something in the output: like cluster index, primary key, ...
>  * logical type information
> An option is a key/typed-value combination. The advantages of having typed 
> values are: 
>  * Having strongly typed options would give a *portable way for Logical Types* 
> to have structured information that could be shared across different languages.
>  * This could keep the type intact when mapping from formats that have 
> strongly typed options (example: Protobuf).
> This is part of a multi ticket implementation. The following tickets are 
> related:
>  # Typed options for Row Schema and Fields
>  # Convert Proto Options to Beam Schema options
>  # Convert Avro extra information for Beam string options
>  # Replace meta data with Logical Type options
>  # Extract meta data in

[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=386195&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386195
 ]

ASF GitHub Bot logged work on BEAM-6522:


Author: ASF GitHub Bot
Created on: 12/Feb/20 20:02
Start Date: 12/Feb/20 20:02
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10838: [BEAM-6522] 
[BEAM-7455] Unskip Avro IO tests that are now passing.
URL: https://github.com/apache/beam/pull/10838#issuecomment-585391837
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 386195)
Time Spent: 9h 20m  (was: 9h 10m)

> Dill fails to pickle  avro.RecordSchema classes on Python 3.
> 
>
> Key: BEAM-6522
> URL: https://issues.apache.org/jira/browse/BEAM-6522
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> The avroio module still has 4 failing tests. This is actually 2 times the 
> same 2 tests, both for Avro and Fastavro.
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform*
>  *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform*
> fail with:
> {code:java}
> Traceback (most recent call last):
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", 
> line 432, in test_sink_transform
> | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line 
> 528, in expand
> return pcoll | beam.io.iobase.Write(self._sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 960, in expand
> return pcoll | WriteImpl(self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 979, in expand
> lambda _, sink: sink.initialize_write(), self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1103, in Map
> pardo = FlatMap(wrapper, *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1054, in FlatMap
> pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 864, in __init__
> super(ParDo, self).__init__(fn, *args, **kwargs)
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 646, in __init__
> self.args = pickler.loads(pickler.dumps(self.args))
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/inte

[jira] [Work logged] (BEAM-7198) Rename ToStringCoder into ToBytesCoder

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7198?focusedWorklogId=386197&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386197
 ]

ASF GitHub Bot logged work on BEAM-7198:


Author: ASF GitHub Bot
Created on: 12/Feb/20 20:02
Start Date: 12/Feb/20 20:02
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10828: [BEAM-7198] rename 
ToStringCoder to ToBytesCoder for proper representation of its role
URL: https://github.com/apache/beam/pull/10828#issuecomment-585392120
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 386197)
Time Spent: 2h 50m  (was: 2h 40m)

> Rename ToStringCoder into ToBytesCoder
> --
>
> Key: BEAM-7198
> URL: https://issues.apache.org/jira/browse/BEAM-7198
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Minor
>  Labels: easy-fix, starter
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> The name of the ToStringCoder class [1] is confusing, since the output of 
> encode() on Python 3 will be bytes. On Python 2 the output is also bytes, 
> since bytes and string are synonyms on Py2.
> ToBytesCoder would be a better name for this class. 
> Note that this class is not listed in coders that constitute Public APIs [2], 
> so we can treat this as an internal change. As a courtesy to users who happened 
> to reference a non-public coder in their pipelines, we can keep the old class 
> name as an alias, e.g. ToStringCoder = ToBytesCoder, to avoid friction, but 
> clean up the Beam codebase to use the new name.
> [1] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L344
> [2] 
> https://github.com/apache/beam/blob/ef4b2ef7e5fa2fb87e1491df82d2797947f51be9/sdks/python/apache_beam/coders/coders.py#L20
> cc: [~yoshiki.obata] [~chamikara]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=386196&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386196
 ]

ASF GitHub Bot logged work on BEAM-6522:


Author: ASF GitHub Bot
Created on: 12/Feb/20 20:02
Start Date: 12/Feb/20 20:02
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10838: [BEAM-6522] 
[BEAM-7455] Unskip Avro IO tests that are now passing.
URL: https://github.com/apache/beam/pull/10838#issuecomment-585391932
 
 
   Run Python 3.6 PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 386196)
Time Spent: 9.5h  (was: 9h 20m)

> Dill fails to pickle  avro.RecordSchema classes on Python 3.
> 
>
> Key: BEAM-6522
> URL: https://issues.apache.org/jira/browse/BEAM-6522
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> The avroio module still has 4 failing tests. This is actually 2 times the 
> same 2 tests, both for Avro and Fastavro.
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform*
>  *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform*
> fail with:
> {code:java}
> Traceback (most recent call last):
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", 
> line 432, in test_sink_transform
> | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line 
> 528, in expand
> return pcoll | beam.io.iobase.Write(self._sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 960, in expand
> return pcoll | WriteImpl(self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 979, in expand
> lambda _, sink: sink.initialize_write(), self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1103, in Map
> pardo = FlatMap(wrapper, *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1054, in FlatMap
> pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 864, in __init__
> super(ParDo, self).__init__(fn, *args, **kwargs)
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 646, in __init__
> self.args = pickler.loads(pickler.dumps(self.args))
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/i

[jira] [Work logged] (BEAM-9273) Explicitly fail pipeline with @RequiresTimeSortedInput with unsupported runner

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9273?focusedWorklogId=386219&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386219
 ]

ASF GitHub Bot logged work on BEAM-9273:


Author: ASF GitHub Bot
Created on: 12/Feb/20 20:36
Start Date: 12/Feb/20 20:36
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #10816: 
[BEAM-9273] Explicitly disable @RequiresTimeSortedInput on unsupported runners
URL: https://github.com/apache/beam/pull/10816#discussion_r378498417
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DoFnFeatures.java
 ##
 @@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction;
+
+import org.apache.beam.sdk.state.BagState;
+import org.apache.beam.sdk.state.MapState;
+import org.apache.beam.sdk.state.SetState;
+import org.apache.beam.sdk.state.State;
+import org.apache.beam.sdk.state.ValueState;
+import org.apache.beam.sdk.state.WatermarkHoldState;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.reflect.DoFnSignatures;
+import org.apache.beam.sdk.values.TypeDescriptor;
+
+/**
+ * Features a {@link DoFn} can possess. Each runner might implement a different 
(sub)set of this
 
 Review comment:
   I quite strongly believe it belongs in `DoFnSignatures` and that what you 
describe in the rest belongs in `Pipelines`. Static-method-only utility classes 
tend to be disorganized and undiscoverable. It is better to attach them to the 
thing that they are most related to.
 



Issue Time Tracking
---

Worklog Id: (was: 386219)
Time Spent: 4h  (was: 3h 50m)

> Explicitly fail pipeline with @RequiresTimeSortedInput with unsupported runner
> --
>
> Key: BEAM-9273
> URL: https://issues.apache.org/jira/browse/BEAM-9273
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Fail pipeline with @RequiresTimeSortedInput annotation in pipeline 
> translation time when being run with unsupported runner. Currently, 
> unsupported runners are:
>  - apex
>  - portable flink
>  - gearpump
>  - dataflow
>  - jet
>  - samza
>  - spark structured streaming
> These runners should reject the pipeline.





[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=386230&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386230
 ]

ASF GitHub Bot logged work on BEAM-6522:


Author: ASF GitHub Bot
Created on: 12/Feb/20 21:02
Start Date: 12/Feb/20 21:02
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10838: [BEAM-6522] 
[BEAM-7455] Unskip Avro IO tests that are now passing.
URL: https://github.com/apache/beam/pull/10838#issuecomment-585415828
 
 
   Run PythonLint PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 386230)
Time Spent: 9h 40m  (was: 9.5h)

> Dill fails to pickle  avro.RecordSchema classes on Python 3.
> 
>
> Key: BEAM-6522
> URL: https://issues.apache.org/jira/browse/BEAM-6522
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> The avroio module still has 4 failing tests. This is actually 2 times the 
> same 2 tests, both for Avro and Fastavro.
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform*
>  *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform*
> fail with:
> {code:java}
> Traceback (most recent call last):
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", 
> line 432, in test_sink_transform
> | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line 
> 528, in expand
> return pcoll | beam.io.iobase.Write(self._sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 960, in expand
> return pcoll | WriteImpl(self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 979, in expand
> lambda _, sink: sink.initialize_write(), self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1103, in Map
> pardo = FlatMap(wrapper, *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1054, in FlatMap
> pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 864, in __init__
> super(ParDo, self).__init__(fn, *args, **kwargs)
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 646, in __init__
> self.args = pickler.loads(pickler.dumps(self.args))
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/

[jira] [Work logged] (BEAM-9273) Explicitly fail pipeline with @RequiresTimeSortedInput with unsupported runner

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9273?focusedWorklogId=386232&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386232
 ]

ASF GitHub Bot logged work on BEAM-9273:


Author: ASF GitHub Bot
Created on: 12/Feb/20 21:04
Start Date: 12/Feb/20 21:04
Worklog Time Spent: 10m 
  Work Description: je-ik commented on pull request #10816: [BEAM-9273] 
Explicitly disable @RequiresTimeSortedInput on unsupported runners
URL: https://github.com/apache/beam/pull/10816#discussion_r378511266
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/BatchStatefulParDoOverrides.java
 ##
 @@ -176,7 +176,7 @@ private MultiOutputOverrideFactory(boolean isFnApi) {
public PCollection<OutputT> expand(PCollection<KV<K, InputT>> input) {
  DoFn<KV<K, InputT>, OutputT> fn = originalParDo.getFn();
   verifyFnIsStateful(fn);
-  DataflowRunner.verifyStateSupported(fn);
+  DataflowRunner.verifyDoFnSupported(fn, false);
 
 Review comment:
   I added both versions, although the second one 
`verifyDoFnSupportedStreaming` is not used. I added that so that the methods 
are not imbalanced. The streaming case is called from 
`DataflowRunner.verifyDoFnSupported(fn, 
context.getPipelineOptions().isStreaming())`, where it would be weird to do `if 
(context.isStreaming()) verifyStreaming() else ...`
 



Issue Time Tracking
---

Worklog Id: (was: 386232)
Time Spent: 4h 10m  (was: 4h)

> Explicitly fail pipeline with @RequiresTimeSortedInput with unsupported runner
> --
>
> Key: BEAM-9273
> URL: https://issues.apache.org/jira/browse/BEAM-9273
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Fail pipeline with @RequiresTimeSortedInput annotation in pipeline 
> translation time when being run with unsupported runner. Currently, 
> unsupported runners are:
>  - apex
>  - portable flink
>  - gearpump
>  - dataflow
>  - jet
>  - samza
>  - spark structured streaming
> These runners should reject the pipeline.





[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=386233&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386233
 ]

ASF GitHub Bot logged work on BEAM-7246:


Author: ASF GitHub Bot
Created on: 12/Feb/20 21:04
Start Date: 12/Feb/20 21:04
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10712: [BEAM-7246] Added 
Google Spanner Write Transform
URL: https://github.com/apache/beam/pull/10712#issuecomment-585416739
 
 
   I believe this is fixed with https://github.com/apache/beam/pull/10844, you 
may need to rebase.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386233)
Time Spent: 21h  (was: 20h 50m)

> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 21h
>  Remaining Estimate: 0h
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9273) Explicitly fail pipeline with @RequiresTimeSortedInput with unsupported runner

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9273?focusedWorklogId=386234&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386234
 ]

ASF GitHub Bot logged work on BEAM-9273:


Author: ASF GitHub Bot
Created on: 12/Feb/20 21:06
Start Date: 12/Feb/20 21:06
Worklog Time Spent: 10m 
  Work Description: je-ik commented on pull request #10816: [BEAM-9273] 
Explicitly disable @RequiresTimeSortedInput on unsupported runners
URL: https://github.com/apache/beam/pull/10816#discussion_r378512014
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/DoFnFeatures.java
 ##
 @@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.core.construction;
+
+import org.apache.beam.sdk.state.BagState;
+import org.apache.beam.sdk.state.MapState;
+import org.apache.beam.sdk.state.SetState;
+import org.apache.beam.sdk.state.State;
+import org.apache.beam.sdk.state.ValueState;
+import org.apache.beam.sdk.state.WatermarkHoldState;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.reflect.DoFnSignatures;
+import org.apache.beam.sdk.values.TypeDescriptor;
+
+/**
+ * Features a {@link DoFn} can possess. Each runner might implement a different 
(sub)set of this
 
 Review comment:
   Moved the code there. The reason I didn't like it is that `DoFnSignatures` 
was probably created to build signatures from a `DoFn`, while the code I added 
makes no use of `DoFnSignature` itself. But there seems to be more code like 
that there already, so I'm fine with it.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386234)
Time Spent: 4h 20m  (was: 4h 10m)

> Explicitly fail pipeline with @RequiresTimeSortedInput with unsupported runner
> --
>
> Key: BEAM-9273
> URL: https://issues.apache.org/jira/browse/BEAM-9273
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Fail a pipeline that uses the @RequiresTimeSortedInput annotation at pipeline 
> translation time when it is run with an unsupported runner. Currently, the 
> unsupported runners are:
>  - apex
>  - portable flink
>  - gearpump
>  - dataflow
>  - jet
>  - samza
>  - spark structured streaming
> These runners should reject the pipeline.
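The translation-time rejection the issue asks for can be sketched in pure Python. This is an illustrative mock, not Beam's actual runner API; `verify_supported`, `UnsupportedFeatureError`, and the runner-name strings are assumptions made only for this example:

```python
class UnsupportedFeatureError(Exception):
    """Raised when a pipeline uses a feature its runner cannot execute."""


# Runners that (per the issue) cannot honor time-sorted input.
UNSUPPORTED_RUNNERS = {
    "apex", "portable-flink", "gearpump", "dataflow", "jet", "samza",
    "spark-structured-streaming",
}


def verify_supported(runner_name, requires_time_sorted_input):
    """Fail at translation time, before any work is launched on the cluster."""
    if requires_time_sorted_input and runner_name in UNSUPPORTED_RUNNERS:
        raise UnsupportedFeatureError(
            f"Runner {runner_name!r} does not support @RequiresTimeSortedInput")
```

The point of the pattern is that the check runs while the pipeline graph is being translated, so an unsupported combination surfaces as an immediate construction-time error rather than a runtime failure.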



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386239&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386239
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 12/Feb/20 21:14
Start Date: 12/Feb/20 21:14
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate 
GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585420761
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386239)
Time Spent: 7h 20m  (was: 7h 10m)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take either a video GCS location or video 
> data bytes as input.
> [1] https://cloud.google.com/video-intelligence/
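The description above requires accepting either a GCS URI or raw video bytes. A minimal sketch of that input routing follows, independent of Beam and of the google-cloud-videointelligence client; the `to_annotate_request` helper is hypothetical, and the `input_uri`/`input_content` field names mirror the Video Intelligence request shape but are used here only for illustration:

```python
import base64


def to_annotate_request(video):
    """Route one input element to the right request field.

    Strings are treated as GCS URIs; bytes are sent inline (base64-encoded
    here only so the sketch stays printable).
    """
    if isinstance(video, str):
        if not video.startswith("gs://"):
            raise ValueError(f"expected a gs:// URI, got {video!r}")
        return {"input_uri": video}
    if isinstance(video, (bytes, bytearray)):
        return {"input_content": base64.b64encode(bytes(video)).decode("ascii")}
    raise TypeError(f"unsupported input type: {type(video).__name__}")
```

A transform built this way can consume a single PCollection holding a mix of URIs and raw clips, deciding per element which request field to populate.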



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386240&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386240
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 12/Feb/20 21:15
Start Date: 12/Feb/20 21:15
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate 
GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585421104
 
 
   Some minor lint issues should be fixed with that last commit. But it seemed 
Jenkins was failing on something else: `failed to import pycodestyle` when 
running the tests. Not sure if that is related to this PR.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386240)
Time Spent: 7.5h  (was: 7h 20m)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take either a video GCS location or video 
> data bytes as input.
> [1] https://cloud.google.com/video-intelligence/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386241&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386241
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 12/Feb/20 21:15
Start Date: 12/Feb/20 21:15
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate 
GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585421104
 
 
   Some minor lint issues should be fixed with that last commit. But it seemed 
Jenkins was failing on something else: `failed to import pycodestyle` when 
running the tests. Not sure if that is related to this PR.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386241)
Time Spent: 7h 40m  (was: 7.5h)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take either a video GCS location or video 
> data bytes as input.
> [1] https://cloud.google.com/video-intelligence/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386242&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386242
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 12/Feb/20 21:16
Start Date: 12/Feb/20 21:16
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate 
GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585421104
 
 
   Some minor lint issues should be fixed with that last commit. But it seemed 
Jenkins was failing on something else: `ModuleNotFoundError: No module named 
'pycodestyle'` when running the tests. Not sure if that is related to this PR.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386242)
Time Spent: 7h 50m  (was: 7h 40m)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take either a video GCS location or video 
> data bytes as input.
> [1] https://cloud.google.com/video-intelligence/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9305) Support ValueProvider for BigQuerySource query string

2020-02-12 Thread Elias Djurfeldt (Jira)
Elias Djurfeldt created BEAM-9305:
-

 Summary: Support ValueProvider for BigQuerySource query string
 Key: BEAM-9305
 URL: https://issues.apache.org/jira/browse/BEAM-9305
 Project: Beam
  Issue Type: New Feature
  Components: io-py-gcp
Reporter: Elias Djurfeldt
Assignee: Elias Djurfeldt


Users should be able to use ValueProviders for the query string in 
BigQuerySource.

Ref: 
[https://stackoverflow.com/questions/60146887/expected-eta-to-avail-pipeline-i-o-and-runtime-parameters-in-apache-beam-gcp-dat/60170614?noredirect=1#comment106464448_60170614]
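The feature request rests on Beam's deferred-parameter mechanism, where a value is wrapped so it can be resolved at run time instead of at construction time. A minimal stand-in is sketched below; these classes only mimic the shape of `apache_beam.options.value_provider` rather than import it, and `resolve_query` is a hypothetical helper for the example:

```python
class StaticValueProvider:
    """Wraps a value already known at pipeline-construction time."""

    def __init__(self, value):
        self._value = value

    def is_accessible(self):
        return True

    def get(self):
        return self._value


class RuntimeValueProvider:
    """Wraps a value that only becomes available when the pipeline runs."""

    def __init__(self, option_name, runtime_options=None):
        self.option_name = option_name
        self._runtime_options = runtime_options  # filled in by the runner

    def is_accessible(self):
        return self._runtime_options is not None

    def get(self):
        if not self.is_accessible():
            raise RuntimeError(
                f"{self.option_name} is not available at construction time")
        return self._runtime_options[self.option_name]


def resolve_query(query):
    """A source accepting either a plain string or a ValueProvider, as proposed."""
    return query.get() if hasattr(query, "get") else query
```

With this shape, a templated pipeline can be built once with a `RuntimeValueProvider` and launched many times with different query strings supplied as runtime options.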



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386245&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386245
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 12/Feb/20 21:36
Start Date: 12/Feb/20 21:36
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10764: [BEAM-9146] Integrate 
GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585429958
 
 
   > Some minor lint issues should be fixed with that last commit. But it 
seemed Jenkins was failing on something else: `ModuleNotFoundError: No module 
named 'pycodestyle'` when running the tests. Not sure if that is related to 
this PR.
   
   This is fixed in (https://github.com/apache/beam/pull/10844). You may need 
to rebase.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386245)
Time Spent: 8h  (was: 7h 50m)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take either a video GCS location or video 
> data bytes as input.
> [1] https://cloud.google.com/video-intelligence/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9302) No space left on device - apache-beam-jenkins-7

2020-02-12 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035729#comment-17035729
 ] 

Udi Meiri commented on BEAM-9302:
-

I'm trying to run "ncdu" on node 7 and it's taking more than an hour.

It did slow down considerably on 
~jenkins/.gradle/caches/5.2.1/scripts-remapped/, which is supposed to be 
cleaned up but has files from as early as April 2019 in it. 
(https://github.com/gradle/gradle/issues/1586)

Any luck Yifan?

> No space left on device - apache-beam-jenkins-7
> ---
>
> Key: BEAM-9302
> URL: https://issues.apache.org/jira/browse/BEAM-9302
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Michał Walenia
>Priority: Blocker
>
> [https://builds.apache.org/job/beam_PreCommit_SQL_Commit/543/consoleFull] log 
> of a failed job with this error



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386247&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386247
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 12/Feb/20 21:41
Start Date: 12/Feb/20 21:41
Worklog Time Spent: 10m 
  Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate 
GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585431841
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386247)
Time Spent: 8h 10m  (was: 8h)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take either a video GCS location or video 
> data bytes as input.
> [1] https://cloud.google.com/video-intelligence/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9302) No space left on device - apache-beam-jenkins-7

2020-02-12 Thread Yifan Zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035730#comment-17035730
 ] 

Yifan Zou commented on BEAM-9302:
-

I ran the du command from HOME; it took several hours and is still running.

I can clean the Jenkins workspace, which will release 150G of space. I believe 
this could bring the node back to normal.

 

> No space left on device - apache-beam-jenkins-7
> ---
>
> Key: BEAM-9302
> URL: https://issues.apache.org/jira/browse/BEAM-9302
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Michał Walenia
>Priority: Blocker
>
> [https://builds.apache.org/job/beam_PreCommit_SQL_Commit/543/consoleFull] log 
> of a failed job with this error



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-9302) No space left on device - apache-beam-jenkins-7

2020-02-12 Thread Yifan Zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035730#comment-17035730
 ] 

Yifan Zou edited comment on BEAM-9302 at 2/12/20 9:44 PM:
--

I ran the du command from HOME; it took several hours and is still running.

I can clean the Jenkins workspace, which will release 150G of disk space. I 
believe this could bring the node back to normal.

 


was (Author: yifanzou):
I ran the du command from HOME and it took several hours and still running.

I can clean the Jenkins workspace, which will release 150G space. I believe 
this could bring the node back to normal.

 

> No space left on device - apache-beam-jenkins-7
> ---
>
> Key: BEAM-9302
> URL: https://issues.apache.org/jira/browse/BEAM-9302
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Michał Walenia
>Priority: Blocker
>
> [https://builds.apache.org/job/beam_PreCommit_SQL_Commit/543/consoleFull] log 
> of a failed job with this error



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386258&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386258
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 12/Feb/20 21:54
Start Date: 12/Feb/20 21:54
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10764: [BEAM-9146] Integrate 
GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585437008
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386258)
Time Spent: 8h 20m  (was: 8h 10m)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take either a video GCS location or video 
> data bytes as input.
> [1] https://cloud.google.com/video-intelligence/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9302) No space left on device - apache-beam-jenkins-7

2020-02-12 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035740#comment-17035740
 ] 

Udi Meiri commented on BEAM-9302:
-

Here are my findings:
{code}
--- /home/jenkins/.gradle ---
/..
   39.8 GiB  12,881 [##] /daemon
   20.9 GiB  > 100k [# ] /caches
{code}
The daemon directory is just full of Gradle daemon logs that don't seem to get 
cleaned up: https://github.com/gradle/gradle/issues/2688

{code}
--- /home/jenkins/.gradle/caches/5.2.1 ---
/..
   14.6 GiB  > 100k [##] /scripts-remapped
{code}
The scripts-remapped directory contains ~15G and millions of files. It should 
be cleaned up as well.

/tmp has ~28G.

/var/lib/docker - 225G
/home - 225G
/home/jenkins/jenkins-slave/workspace - 150G

The largest workspaces are the Python ones: 4 versions x 2 tox virtualenvs @ 
0.5G per virtualenv.
Top 6 packages in the virtualenvs:
{code}
--- /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-...s/sdks/python/target/.tox-py37-cython-pytest/py37-cython-pytest/lib/python3.7/site-packages ---
/..
  197.9 MiB    573 [##] /pyarrow
   78.9 MiB    929 [###   ] /numpy
   70.4 MiB     24 [###   ] /grpc_tools
   45.3 MiB  1,700 [##] /pandas
   33.8 MiB  1,428 [# ] /apache_beam
   31.4 MiB    435 [# ] /Cython
{code}
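An ncdu-style per-directory breakdown like the one in the findings can be approximated in pure Python. A rough, illustrative sketch (`dir_sizes` is a hypothetical helper, not a replacement for ncdu or du):

```python
import os


def dir_sizes(root):
    """Return {immediate subdirectory: total bytes}, largest first (ncdu-style)."""
    totals = {}
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            total = 0
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    try:
                        total += os.path.getsize(os.path.join(dirpath, name))
                    except OSError:
                        pass  # file vanished mid-scan; skip it
            totals[entry.name] = total
    # Sort by size, descending, so the biggest offenders come first.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

On directory trees with millions of files (like the caches above) even this scan will be slow, which matches the hours-long du/ncdu runs reported in this thread.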


> No space left on device - apache-beam-jenkins-7
> ---
>
> Key: BEAM-9302
> URL: https://issues.apache.org/jira/browse/BEAM-9302
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Michał Walenia
>Priority: Blocker
>
> [https://builds.apache.org/job/beam_PreCommit_SQL_Commit/543/consoleFull] log 
> of a failed job with this error



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9302) No space left on device - apache-beam-jenkins-7

2020-02-12 Thread Yifan Zou (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035739#comment-17035739
 ] 

Yifan Zou commented on BEAM-9302:
-

Cleaned the workspace; released 147G of disk space.

> No space left on device - apache-beam-jenkins-7
> ---
>
> Key: BEAM-9302
> URL: https://issues.apache.org/jira/browse/BEAM-9302
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Michał Walenia
>Priority: Blocker
>
> [https://builds.apache.org/job/beam_PreCommit_SQL_Commit/543/consoleFull] log 
> of a failed job with this error



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2020-02-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=386264&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386264
 ]

ASF GitHub Bot logged work on BEAM-8399:


Author: ASF GitHub Bot
Created on: 12/Feb/20 22:03
Start Date: 12/Feb/20 22:03
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #10223: [BEAM-8399] Add 
--hdfs_full_urls option
URL: https://github.com/apache/beam/pull/10223#issuecomment-585440374
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 386264)
Time Spent: 3h  (was: 2h 50m)

> Python HDFS implementation should support filenames of the format 
> "hdfs://namenodehost/parent/child"
> 
>
> Key: BEAM-8399
> URL: https://issues.apache.org/jira/browse/BEAM-8399
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> "hdfs://namenodehost/parent/child" and "/parent/child" seems to be the 
> correct filename formats for HDFS based on [1] but we currently support 
> format "hdfs://parent/child".
> To not break existing users, we have to either (1) somehow support both 
> versions by default (based on [2] seems like HDFS does not allow colons in 
> file path so this might be possible) (2) make  
> "hdfs://namenodehost/parent/child" optional for now and change it to default 
> after a few versions.
> We should also make sure that Beam Java and Python HDFS file-system 
> implementations are consistent in this regard.
>  
> [1][https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html]
> [2] https://issues.apache.org/jira/browse/HDFS-13
>  
> cc: [~udim]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

