[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386436&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386436 ]

ASF GitHub Bot logged work on BEAM-9146:

Author: ASF GitHub Bot
Created on: 13/Feb/20 08:11
Start Date: 13/Feb/20 08:11
Worklog Time Spent: 10m

Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585603649

retest this please

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 386436)
Time Spent: 8h 50m (was: 8h 40m)

> [Python] PTransform that integrates Video Intelligence functionality
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
> Issue Type: Sub-task
> Components: io-py-gcp
> Reporter: Kamil Wasilewski
> Assignee: Kamil Wasilewski
> Priority: Major
> Time Spent: 8h 50m
> Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video Intelligence functionality [1].
> The transform should be able to take either a video's GCS location or video data bytes as input.
>
> [1] https://cloud.google.com/video-intelligence/

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386435&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386435 ]

ASF GitHub Bot logged work on BEAM-9146:

Author: ASF GitHub Bot
Created on: 13/Feb/20 08:11
Start Date: 13/Feb/20 08:11
Worklog Time Spent: 10m

Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585603580

I cannot seem to make the tests fail locally so this is proving to be quite a struggle 🙄 But hopefully this should fix it.

Issue Time Tracking
---
Worklog Id: (was: 386435)
Time Spent: 8h 40m (was: 8.5h)

> [Python] PTransform that integrates Video Intelligence functionality
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
> Issue Type: Sub-task
> Components: io-py-gcp
> Reporter: Kamil Wasilewski
> Assignee: Kamil Wasilewski
> Priority: Major
> Time Spent: 8h 40m
> Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video Intelligence functionality [1].
> The transform should be able to take either a video's GCS location or video data bytes as input.
>
> [1] https://cloud.google.com/video-intelligence/
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386437&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386437 ]

ASF GitHub Bot logged work on BEAM-9146:

Author: ASF GitHub Bot
Created on: 13/Feb/20 08:12
Start Date: 13/Feb/20 08:12
Worklog Time Spent: 10m

Work Description: mwalenia commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585603857

retest this please

Issue Time Tracking
---
Worklog Id: (was: 386437)
Time Spent: 9h (was: 8h 50m)

> [Python] PTransform that integrates Video Intelligence functionality
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
> Issue Type: Sub-task
> Components: io-py-gcp
> Reporter: Kamil Wasilewski
> Assignee: Kamil Wasilewski
> Priority: Major
> Time Spent: 9h
> Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video Intelligence functionality [1].
> The transform should be able to take either a video's GCS location or video data bytes as input.
>
> [1] https://cloud.google.com/video-intelligence/
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386440&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386440 ]

ASF GitHub Bot logged work on BEAM-9146:

Author: ASF GitHub Bot
Created on: 13/Feb/20 08:15
Start Date: 13/Feb/20 08:15
Worklog Time Spent: 10m

Work Description: mwalenia commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585605111

retest this please

Issue Time Tracking
---
Worklog Id: (was: 386440)
Time Spent: 9h 10m (was: 9h)

> [Python] PTransform that integrates Video Intelligence functionality
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
> Issue Type: Sub-task
> Components: io-py-gcp
> Reporter: Kamil Wasilewski
> Assignee: Kamil Wasilewski
> Priority: Major
> Time Spent: 9h 10m
> Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video Intelligence functionality [1].
> The transform should be able to take either a video's GCS location or video data bytes as input.
>
> [1] https://cloud.google.com/video-intelligence/
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386441&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386441 ]

ASF GitHub Bot logged work on BEAM-9146:

Author: ASF GitHub Bot
Created on: 13/Feb/20 08:16
Start Date: 13/Feb/20 08:16
Worklog Time Spent: 10m

Work Description: mwalenia commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-585603857

retest this please

Issue Time Tracking
---
Worklog Id: (was: 386441)
Time Spent: 9h 20m (was: 9h 10m)

> [Python] PTransform that integrates Video Intelligence functionality
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
> Issue Type: Sub-task
> Components: io-py-gcp
> Reporter: Kamil Wasilewski
> Assignee: Kamil Wasilewski
> Priority: Major
> Time Spent: 9h 20m
> Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video Intelligence functionality [1].
> The transform should be able to take either a video's GCS location or video data bytes as input.
>
> [1] https://cloud.google.com/video-intelligence/
[jira] [Updated] (BEAM-9302) No space left on device - apache-beam-jenkins-7
[ https://issues.apache.org/jira/browse/BEAM-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismaël Mejía updated BEAM-9302:
---
Status: Open (was: Triage Needed)

> No space left on device - apache-beam-jenkins-7
>
> Key: BEAM-9302
> URL: https://issues.apache.org/jira/browse/BEAM-9302
> Project: Beam
> Issue Type: Bug
> Components: build-system
> Reporter: Michał Walenia
> Priority: Blocker
>
> [https://builds.apache.org/job/beam_PreCommit_SQL_Commit/543/consoleFull] log of a failed job with this error
[jira] [Assigned] (BEAM-9302) No space left on device - apache-beam-jenkins-7
[ https://issues.apache.org/jira/browse/BEAM-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismaël Mejía reassigned BEAM-9302:
---
Assignee: Udi Meiri

> No space left on device - apache-beam-jenkins-7
>
> Key: BEAM-9302
> URL: https://issues.apache.org/jira/browse/BEAM-9302
> Project: Beam
> Issue Type: Bug
> Components: build-system
> Reporter: Michał Walenia
> Assignee: Udi Meiri
> Priority: Blocker
>
> [https://builds.apache.org/job/beam_PreCommit_SQL_Commit/543/consoleFull] log of a failed job with this error
[jira] [Resolved] (BEAM-9302) No space left on device - apache-beam-jenkins-7
[ https://issues.apache.org/jira/browse/BEAM-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismaël Mejía resolved BEAM-9302.
---
Fix Version/s: Not applicable
Resolution: Fixed

> No space left on device - apache-beam-jenkins-7
>
> Key: BEAM-9302
> URL: https://issues.apache.org/jira/browse/BEAM-9302
> Project: Beam
> Issue Type: Bug
> Components: build-system
> Reporter: Michał Walenia
> Assignee: Udi Meiri
> Priority: Blocker
> Fix For: Not applicable
>
> [https://builds.apache.org/job/beam_PreCommit_SQL_Commit/543/consoleFull] log of a failed job with this error
[jira] [Commented] (BEAM-9305) Support ValueProvider for BigQuerySource query string
[ https://issues.apache.org/jira/browse/BEAM-9305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036040#comment-17036040 ]

Elias Djurfeldt commented on BEAM-9305:
---
[~udim] is this something that we'd like to support? If so, I can implement it.

> Support ValueProvider for BigQuerySource query string
>
> Key: BEAM-9305
> URL: https://issues.apache.org/jira/browse/BEAM-9305
> Project: Beam
> Issue Type: New Feature
> Components: io-py-gcp
> Reporter: Elias Djurfeldt
> Assignee: Elias Djurfeldt
> Priority: Minor
>
> Users should be able to use ValueProviders for the query string in BigQuerySource.
>
> Ref: [https://stackoverflow.com/questions/60146887/expected-eta-to-avail-pipeline-i-o-and-runtime-parameters-in-apache-beam-gcp-dat/60170614?noredirect=1#comment106464448_60170614]
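The request above hinges on Beam's ValueProvider contract: a pipeline (typically a Dataflow template) is constructed without the concrete query string, and the value is only resolved via `.get()` at run time. Below is a minimal, Beam-independent sketch of that contract; the class names mirror Beam's `StaticValueProvider`/`RuntimeValueProvider`, but this is an illustration of the deferred-resolution idea, not Beam's actual implementation.

```python
class ValueProvider:
    """Interface for values resolved at pipeline run time, not construction time."""

    def get(self):
        raise NotImplementedError


class StaticValueProvider(ValueProvider):
    """Wraps a value already known at pipeline construction time."""

    def __init__(self, value):
        self.value = value

    def get(self):
        return self.value


class RuntimeValueProvider(ValueProvider):
    """Resolved from runtime options; raises if accessed before they exist."""

    runtime_options = None  # populated only when the templated pipeline runs

    def __init__(self, option_name, default=None):
        self.option_name = option_name
        self.default = default

    def get(self):
        if RuntimeValueProvider.runtime_options is None:
            raise RuntimeError(
                '%s is not accessible at pipeline construction time' % self.option_name)
        return RuntimeValueProvider.runtime_options.get(self.option_name, self.default)


def resolve_query(query):
    # A source supporting this feature would accept either a plain string or a
    # ValueProvider, deferring the .get() call until execution.
    return query.get() if isinstance(query, ValueProvider) else query
```

The point for BigQuerySource is the last function: the source must hold on to the ValueProvider and only call `.get()` when the read is actually executed.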
[jira] [Work logged] (BEAM-9265) @RequiresTimeSortedInput does not respect allowedLateness
[ https://issues.apache.org/jira/browse/BEAM-9265?focusedWorklogId=386453&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386453 ]

ASF GitHub Bot logged work on BEAM-9265:

Author: ASF GitHub Bot
Created on: 13/Feb/20 09:02
Start Date: 13/Feb/20 09:02
Worklog Time Spent: 10m

Work Description: dmvk commented on pull request #10795: [BEAM-9265] @RequiresTimeSortedInput respects allowedLateness
URL: https://github.com/apache/beam/pull/10795#discussion_r378724102

## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java
##
@@ -2409,6 +2410,39 @@ public void testRequiresTimeSortedInputWithTestStream() {
   testTimeSortedInput(numElements, pipeline.apply(stream.advanceWatermarkToInfinity()));
 }
+@Test
+@Category({
+  ValidatesRunner.class,
+  UsesStatefulParDo.class,
+  UsesRequiresTimeSortedInput.class,
+  UsesStrictTimerOrdering.class,
+  UsesTestStream.class
+})
+public void testRequiresTimeSortedInputWithLateDataAndAllowedLateness() {
+  // generate list long enough to rule out random shuffle in sorted order
+  int numElements = 1000;
+  List<Long> eventStamps =
+      LongStream.range(0, numElements)
+          .mapToObj(i -> numElements - i)
+          .collect(Collectors.toList());
+  TestStream.Builder<Long> input = TestStream.create(VarLongCoder.of());
+  for (Long stamp : eventStamps) {
+    input = input.addElements(TimestampedValue.of(stamp, Instant.ofEpochMilli(stamp)));
+    if (stamp == 100) {
+      // advance watermark when we have 100 remaining elements

Review comment: 900 remaining elements?

Issue Time Tracking
---
Worklog Id: (was: 386453)
Time Spent: 2h 50m (was: 2h 40m)

> @RequiresTimeSortedInput does not respect allowedLateness
>
> Key: BEAM-9265
> URL: https://issues.apache.org/jira/browse/BEAM-9265
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.20.0
> Reporter: Jan Lukavský
> Assignee: Jan Lukavský
> Priority: Major
> Fix For: 2.20.0
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> Currently, @RequiresTimeSortedInput drops data with respect to allowedLateness, but timers are triggered without respecting it. We have to:
> - drop data that is too late (after allowedLateness)
> - setup timer for _minTimestamp + allowedLateness_
> - hold output watermark at _minTimestamp_
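The three bullet points in the issue description above amount to a buffer-and-flush scheme. Here is a plain-Python sketch of that logic under the stated rules; it is illustrative only (the real implementation lives in Java's `StatefulDoFnRunner` and uses timers and watermark holds rather than an explicit flush call), and all names are hypothetical.

```python
class TimeSortedBuffer:
    """Sketch of @RequiresTimeSortedInput semantics with allowed lateness.

    An arriving element is dropped if its timestamp is behind
    (watermark - allowed_lateness). A flush emits, in timestamp order,
    everything up to (watermark - allowed_lateness): only those elements
    can no longer be preceded by admissible late input.
    """

    def __init__(self, allowed_lateness):
        self.allowed_lateness = allowed_lateness
        self.buffer = []  # (timestamp, element) pairs
        self.watermark = float('-inf')

    def add(self, timestamp, element):
        # Drop data that is too late (after allowedLateness).
        if timestamp < self.watermark - self.allowed_lateness:
            return False
        self.buffer.append((timestamp, element))
        return True

    def advance_watermark(self, new_watermark):
        # Stands in for the flush timer set at minTimestamp + allowedLateness;
        # in the real runner a hold keeps the output watermark at minTimestamp
        # until this fires.
        self.watermark = new_watermark
        flush_bound = new_watermark - self.allowed_lateness
        ready = sorted(t for t in self.buffer if t[0] <= flush_bound)
        self.buffer = [t for t in self.buffer if t[0] > flush_bound]
        return [element for _, element in ready]
```

An element with timestamp `t` is only emitted once the watermark passes `t + allowed_lateness`, since until then late data could still sort before it; that is exactly why the fix sets the flush timer at _minTimestamp + allowedLateness_ instead of _minTimestamp_.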
[jira] [Work logged] (BEAM-9265) @RequiresTimeSortedInput does not respect allowedLateness
[ https://issues.apache.org/jira/browse/BEAM-9265?focusedWorklogId=386454&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386454 ]

ASF GitHub Bot logged work on BEAM-9265:

Author: ASF GitHub Bot
Created on: 13/Feb/20 09:02
Start Date: 13/Feb/20 09:02
Worklog Time Spent: 10m

Work Description: dmvk commented on pull request #10795: [BEAM-9265] @RequiresTimeSortedInput respects allowedLateness
URL: https://github.com/apache/beam/pull/10795#discussion_r378725413

## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/StatefulDoFnRunner.java
##
@@ -252,18 +255,29 @@ private void onSortFlushTimer(BoundedWindow window, Instant timestamp) {
   keep.forEach(sortBuffer::add);
   minStampState.write(newMinStamp);
   if (newMinStamp.isBefore(BoundedWindow.TIMESTAMP_MAX_VALUE)) {
-    setupFlushTimerAndWatermarkHold(namespace, newMinStamp);
+    setupFlushTimerAndWatermarkHold(namespace, window, newMinStamp);
   } else {
     clearWatermarkHold(namespace);
   }
 }
-private void setupFlushTimerAndWatermarkHold(StateNamespace namespace, Instant flush) {
+private void setupFlushTimerAndWatermarkHold(

Review comment: Can you add a javadoc for this method?

Issue Time Tracking
---
Worklog Id: (was: 386454)
Time Spent: 2h 50m (was: 2h 40m)

> @RequiresTimeSortedInput does not respect allowedLateness
>
> Key: BEAM-9265
> URL: https://issues.apache.org/jira/browse/BEAM-9265
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.20.0
> Reporter: Jan Lukavský
> Assignee: Jan Lukavský
> Priority: Major
> Fix For: 2.20.0
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> Currently, @RequiresTimeSortedInput drops data with respect to allowedLateness, but timers are triggered without respecting it. We have to:
> - drop data that is too late (after allowedLateness)
> - setup timer for _minTimestamp + allowedLateness_
> - hold output watermark at _minTimestamp_
[jira] [Work logged] (BEAM-9160) Update AWS SDK to support Kubernetes Pod Level Identity
[ https://issues.apache.org/jira/browse/BEAM-9160?focusedWorklogId=386477&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386477 ]

ASF GitHub Bot logged work on BEAM-9160:

Author: ASF GitHub Bot
Created on: 13/Feb/20 09:24
Start Date: 13/Feb/20 09:24
Worklog Time Spent: 10m

Work Description: iemejia commented on pull request #10846: [BEAM-9160] Removed WebIdentityTokenCredentialsProvider explicit json (de)serialization in AWS2 module
URL: https://github.com/apache/beam/pull/10846

Issue Time Tracking
---
Worklog Id: (was: 386477)
Time Spent: 3.5h (was: 3h 20m)

> Update AWS SDK to support Kubernetes Pod Level Identity
>
> Key: BEAM-9160
> URL: https://issues.apache.org/jira/browse/BEAM-9160
> Project: Beam
> Issue Type: Improvement
> Components: dependencies
> Affects Versions: 2.17.0
> Reporter: Mohamed Noah
> Priority: Major
> Fix For: 2.20.0
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> Many organizations have started leveraging pod-level identity in Kubernetes. The current version of the AWS SDK packaged with Beam 2.17.0 is out of date and doesn't provide native support for pod-level identity access management.
>
> It is recommended that we introduce support to access AWS resources such as S3 using pod-level identity.
>
> Current Version of the AWS Java SDK in Beam:
> def aws_java_sdk_version = "1.11.519"
>
> Proposed AWS Java SDK Version:
> <dependency>
>   <groupId>com.amazonaws</groupId>
>   <artifactId>aws-java-sdk</artifactId>
>   <version>1.11.710</version>
> </dependency>
[jira] [Work logged] (BEAM-9160) Update AWS SDK to support Kubernetes Pod Level Identity
[ https://issues.apache.org/jira/browse/BEAM-9160?focusedWorklogId=386476&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386476 ]

ASF GitHub Bot logged work on BEAM-9160:

Author: ASF GitHub Bot
Created on: 13/Feb/20 09:24
Start Date: 13/Feb/20 09:24
Worklog Time Spent: 10m

Work Description: iemejia commented on issue #10846: [BEAM-9160] Removed WebIdentityTokenCredentialsProvider explicit json (de)serialization in AWS2 module
URL: https://github.com/apache/beam/pull/10846#issuecomment-585631045

Merged manually to squash the commits

Issue Time Tracking
---
Worklog Id: (was: 386476)
Time Spent: 3h 20m (was: 3h 10m)

> Update AWS SDK to support Kubernetes Pod Level Identity
>
> Key: BEAM-9160
> URL: https://issues.apache.org/jira/browse/BEAM-9160
> Project: Beam
> Issue Type: Improvement
> Components: dependencies
> Affects Versions: 2.17.0
> Reporter: Mohamed Noah
> Priority: Major
> Fix For: 2.20.0
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> Many organizations have started leveraging pod-level identity in Kubernetes. The current version of the AWS SDK packaged with Beam 2.17.0 is out of date and doesn't provide native support for pod-level identity access management.
>
> It is recommended that we introduce support to access AWS resources such as S3 using pod-level identity.
>
> Current Version of the AWS Java SDK in Beam:
> def aws_java_sdk_version = "1.11.519"
>
> Proposed AWS Java SDK Version:
> <dependency>
>   <groupId>com.amazonaws</groupId>
>   <artifactId>aws-java-sdk</artifactId>
>   <version>1.11.710</version>
> </dependency>
[jira] [Work logged] (BEAM-9292) Provide an ability to specify additional maven repositories for published POMs
[ https://issues.apache.org/jira/browse/BEAM-9292?focusedWorklogId=386481&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386481 ]

ASF GitHub Bot logged work on BEAM-9292:

Author: ASF GitHub Bot
Created on: 13/Feb/20 09:28
Start Date: 13/Feb/20 09:28
Worklog Time Spent: 10m

Work Description: iemejia commented on pull request #10832: [BEAM-9292] Provide an ability to specify additional maven repositories for published POMs
URL: https://github.com/apache/beam/pull/10832

Issue Time Tracking
---
Worklog Id: (was: 386481)
Time Spent: 1h 10m (was: 1h)

> Provide an ability to specify additional maven repositories for published POMs
>
> Key: BEAM-9292
> URL: https://issues.apache.org/jira/browse/BEAM-9292
> Project: Beam
> Issue Type: Improvement
> Components: build-system, io-java-kafka
> Reporter: Alexey Romanenko
> Assignee: Alexey Romanenko
> Priority: Major
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> To support Confluent Schema Registry, KafkaIO has a dependency on {{io.confluent:kafka-avro-serializer}} from the https://packages.confluent.io/maven/ repository. In this case, it should add this repository into the published KafkaIO POM file. Otherwise, it will fail with the following error during building a user pipeline:
> {code}
> [ERROR] Failed to execute goal on project kafka-io: Could not resolve dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in central (https://repo.maven.apache.org/maven2) -> [Help 1]
> {code}
[jira] [Work logged] (BEAM-9292) Provide an ability to specify additional maven repositories for published POMs
[ https://issues.apache.org/jira/browse/BEAM-9292?focusedWorklogId=386483&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386483 ]

ASF GitHub Bot logged work on BEAM-9292:

Author: ASF GitHub Bot
Created on: 13/Feb/20 09:29
Start Date: 13/Feb/20 09:29
Worklog Time Spent: 10m

Work Description: iemejia commented on issue #10832: [BEAM-9292] Provide an ability to specify additional maven repositories for published POMs
URL: https://github.com/apache/beam/pull/10832#issuecomment-585632888

Merged eagerly because this is breaking the SNAPSHOTs + linkage error analysis

Issue Time Tracking
---
Worklog Id: (was: 386483)
Time Spent: 1h 20m (was: 1h 10m)

> Provide an ability to specify additional maven repositories for published POMs
>
> Key: BEAM-9292
> URL: https://issues.apache.org/jira/browse/BEAM-9292
> Project: Beam
> Issue Type: Improvement
> Components: build-system, io-java-kafka
> Reporter: Alexey Romanenko
> Assignee: Alexey Romanenko
> Priority: Major
> Fix For: 2.20.0
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> To support Confluent Schema Registry, KafkaIO has a dependency on {{io.confluent:kafka-avro-serializer}} from the https://packages.confluent.io/maven/ repository. In this case, it should add this repository into the published KafkaIO POM file. Otherwise, it will fail with the following error during building a user pipeline:
> {code}
> [ERROR] Failed to execute goal on project kafka-io: Could not resolve dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in central (https://repo.maven.apache.org/maven2) -> [Help 1]
> {code}
[jira] [Resolved] (BEAM-9292) Provide an ability to specify additional maven repositories for published POMs
[ https://issues.apache.org/jira/browse/BEAM-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ismaël Mejía resolved BEAM-9292.
---
Fix Version/s: 2.20.0
Resolution: Fixed

> Provide an ability to specify additional maven repositories for published POMs
>
> Key: BEAM-9292
> URL: https://issues.apache.org/jira/browse/BEAM-9292
> Project: Beam
> Issue Type: Improvement
> Components: build-system, io-java-kafka
> Reporter: Alexey Romanenko
> Assignee: Alexey Romanenko
> Priority: Major
> Fix For: 2.20.0
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> To support Confluent Schema Registry, KafkaIO has a dependency on {{io.confluent:kafka-avro-serializer}} from the https://packages.confluent.io/maven/ repository. In this case, it should add this repository into the published KafkaIO POM file. Otherwise, it will fail with the following error during building a user pipeline:
> {code}
> [ERROR] Failed to execute goal on project kafka-io: Could not resolve dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in central (https://repo.maven.apache.org/maven2) -> [Help 1]
> {code}
[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK
[ https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=386486&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386486 ]

ASF GitHub Bot logged work on BEAM-9291:

Author: ASF GitHub Bot
Created on: 13/Feb/20 09:32
Start Date: 13/Feb/20 09:32
Worklog Time Spent: 10m

Work Description: stankiewicz commented on issue #10829: [BEAM-9291] Upload graph option in dataflow's python sdk
URL: https://github.com/apache/beam/pull/10829#issuecomment-585634276

retest this please

Issue Time Tracking
---
Worklog Id: (was: 386486)
Time Spent: 1h 50m (was: 1h 40m)

> upload_graph support in Dataflow Python SDK
>
> Key: BEAM-9291
> URL: https://issues.apache.org/jira/browse/BEAM-9291
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Reporter: Radosław Stankiewicz
> Assignee: Radosław Stankiewicz
> Priority: Minor
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> upload_graph option is not supported in Dataflow's Python SDK so there is no workaround for large graphs.
[jira] [Work logged] (BEAM-9265) @RequiresTimeSortedInput does not respect allowedLateness
[ https://issues.apache.org/jira/browse/BEAM-9265?focusedWorklogId=386497&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386497 ]

ASF GitHub Bot logged work on BEAM-9265:

Author: ASF GitHub Bot
Created on: 13/Feb/20 09:49
Start Date: 13/Feb/20 09:49
Worklog Time Spent: 10m

Work Description: je-ik commented on pull request #10795: [BEAM-9265] @RequiresTimeSortedInput respects allowedLateness
URL: https://github.com/apache/beam/pull/10795#discussion_r378753051

## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java
##
@@ -2409,6 +2410,39 @@ public void testRequiresTimeSortedInputWithTestStream() {
   testTimeSortedInput(numElements, pipeline.apply(stream.advanceWatermarkToInfinity()));
 }
+@Test
+@Category({
+  ValidatesRunner.class,
+  UsesStatefulParDo.class,
+  UsesRequiresTimeSortedInput.class,
+  UsesStrictTimerOrdering.class,
+  UsesTestStream.class
+})
+public void testRequiresTimeSortedInputWithLateDataAndAllowedLateness() {
+  // generate list long enough to rule out random shuffle in sorted order
+  int numElements = 1000;
+  List<Long> eventStamps =
+      LongStream.range(0, numElements)
+          .mapToObj(i -> numElements - i)
+          .collect(Collectors.toList());
+  TestStream.Builder<Long> input = TestStream.create(VarLongCoder.of());
+  for (Long stamp : eventStamps) {
+    input = input.addElements(TimestampedValue.of(stamp, Instant.ofEpochMilli(stamp)));
+    if (stamp == 100) {
+      // advance watermark when we have 100 remaining elements

Review comment: 100 because the stamp is descending. The watermark advances past the last 100 elements which should get dropped.

Issue Time Tracking
---
Worklog Id: (was: 386497)
Time Spent: 3h (was: 2h 50m)

> @RequiresTimeSortedInput does not respect allowedLateness
>
> Key: BEAM-9265
> URL: https://issues.apache.org/jira/browse/BEAM-9265
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.20.0
> Reporter: Jan Lukavský
> Assignee: Jan Lukavský
> Priority: Major
> Fix For: 2.20.0
> Time Spent: 3h
> Remaining Estimate: 0h
>
> Currently, @RequiresTimeSortedInput drops data with respect to allowedLateness, but timers are triggered without respecting it. We have to:
> - drop data that is too late (after allowedLateness)
> - setup timer for _minTimestamp + allowedLateness_
> - hold output watermark at _minTimestamp_
[jira] [Updated] (BEAM-9292) Provide an ability to specify additional maven repositories for published POMs
[ https://issues.apache.org/jira/browse/BEAM-9292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexey Romanenko updated BEAM-9292:
---
Description:
To support Confluent Schema Registry, KafkaIO has a dependency on {{io.confluent:kafka-avro-serializer}} from the https://packages.confluent.io/maven/ repository. In this case, it should add this repository into the published KafkaIO POM file. Otherwise, it will fail with the following error during building a user pipeline:

{code}
[ERROR] Failed to execute goal on project kafka-io: Could not resolve dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in central (https://repo.maven.apache.org/maven2) -> [Help 1]
{code}

The repositories for publishing can be added via the {{mavenRepositories}} argument in the build script for Java configuration. For example (KafkaIO):

{code}
$ cat sdks/java/io/kafka/build.gradle
...
applyJavaNature(
  ...
  mavenRepositories: [
    [id: 'io.confluent', url: 'https://packages.confluent.io/maven/']
  ]
)
...
{code}

It will generate the following XML snippet in the POM file of the {{beam-sdks-java-io-kafka}} artifact after publishing:

{code}
<repositories>
  <repository>
    <id>io.confluent</id>
    <url>https://packages.confluent.io/maven/</url>
  </repository>
</repositories>
{code}

was:
To support Confluent Schema Registry, KafkaIO has a dependency on {{io.confluent:kafka-avro-serializer}} from the https://packages.confluent.io/maven/ repository. In this case, it should add this repository into the published KafkaIO POM file. Otherwise, it will fail with the following error during building a user pipeline:

{code}
[ERROR] Failed to execute goal on project kafka-io: Could not resolve dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in central (https://repo.maven.apache.org/maven2) -> [Help 1]
{code}

> Provide an ability to specify additional maven repositories for published POMs
>
> Key: BEAM-9292
> URL: https://issues.apache.org/jira/browse/BEAM-9292
> Project: Beam
> Issue Type: Improvement
> Components: build-system, io-java-kafka
> Reporter: Alexey Romanenko
> Assignee: Alexey Romanenko
> Priority: Major
> Fix For: 2.20.0
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> To support Confluent Schema Registry, KafkaIO has a dependency on {{io.confluent:kafka-avro-serializer}} from the https://packages.confluent.io/maven/ repository. In this case, it should add this repository into the published KafkaIO POM file. Otherwise, it will fail with the following error during building a user pipeline:
> {code}
> [ERROR] Failed to execute goal on project kafka-io: Could not resolve dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in central (https://repo.maven.apache.org/maven2) -> [Help 1]
> {code}
> The repositories for publishing can be added via the {{mavenRepositories}} argument in the build script for Java configuration.
> For example (KafkaIO):
> {code}
> $ cat sdks/java/io/kafka/build.gradle
> ...
> applyJavaNature(
>   ...
>   mavenRepositories: [
>     [id: 'io.confluent', url: 'https://packages.confluent.io/maven/']
>   ]
> )
> ...
> {code}
> It will generate the following XML snippet in the POM file of the {{beam-sdks-java-io-kafka}} artifact after publishing:
> {code}
> <repositories>
>   <repository>
>     <id>io.confluent</id>
>     <url>https://packages.confluent.io/maven/</url>
>   </repository>
> </repositories>
> {code}
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386507&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386507 ] ASF GitHub Bot logged work on BEAM-9146: Author: ASF GitHub Bot Created on: 13/Feb/20 10:07 Start Date: 13/Feb/20 10:07 Worklog Time Spent: 10m Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK URL: https://github.com/apache/beam/pull/10764#issuecomment-585648977 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386507) Time Spent: 9.5h (was: 9h 20m) > [Python] PTransform that integrates Video Intelligence functionality > > > Key: BEAM-9146 > URL: https://issues.apache.org/jira/browse/BEAM-9146 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 9.5h > Remaining Estimate: 0h > > The goal is to create a PTransform that integrates Google Cloud Video > Intelligence functionality [1]. > The transform should be able to take both video GCS location or video data > bytes as an input. > [1] https://cloud.google.com/video-intelligence/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality
[ https://issues.apache.org/jira/browse/BEAM-9247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036100#comment-17036100 ] Kamil Wasilewski commented on BEAM-9247: I think in some cases it makes sense to accept both image and image_context as an input, in the form of a tuple. On the other hand, only a few Video Intelligence and Cloud Vision features require additional parameters; in those cases, video_context/image_context is always empty. What about creating two public PTransforms? The first transform could let the user specify one unique context per element, and the second could accept the simpler form of input (without context). The user could then choose between finer control over the input data and simplicity. In terms of implementation, these transforms could use the same DoFn underneath. Let me know what you think. > [Python] PTransform that integrates Cloud Vision functionality > -- > > Key: BEAM-9247 > URL: https://issues.apache.org/jira/browse/BEAM-9247 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Elias Djurfeldt >Priority: Major > > The goal is to create a PTransform that integrates Google Cloud Vision API > functionality [1]. > [1] https://cloud.google.com/vision/docs/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
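The two-transform design proposed in the comment above can be sketched as follows. All names here are hypothetical stand-ins (plain functions rather than real Beam PTransform/DoFn classes); this is only the shape of the idea, not the Beam API.

```python
def _annotate(element, context):
    """Shared per-element logic, analogous to the single DoFn both
    public transforms would wrap. Here it just records its inputs."""
    return {"input": element, "context": context}


def annotate_with_context(pairs):
    """Richer entry point: each element is an (element, context) tuple,
    letting the user specify one unique context per element."""
    return [_annotate(element, context) for element, context in pairs]


def annotate(elements):
    """Simpler entry point: bare elements with an empty context, covering
    the common case where no additional parameters are needed."""
    return [_annotate(element, None) for element in elements]
```

Both entry points funnel into `_annotate`, mirroring the suggestion that the two public PTransforms could share the same DoFn underneath.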
[jira] [Work logged] (BEAM-9292) Provide an ability to specify additional maven repositories for published POMs
[ https://issues.apache.org/jira/browse/BEAM-9292?focusedWorklogId=386509&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386509 ] ASF GitHub Bot logged work on BEAM-9292: Author: ASF GitHub Bot Created on: 13/Feb/20 10:15 Start Date: 13/Feb/20 10:15 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #10832: [BEAM-9292] Provide an ability to specify additional maven repositories for published POMs URL: https://github.com/apache/beam/pull/10832#issuecomment-585652309 Thanks, @iemejia ! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386509) Time Spent: 1.5h (was: 1h 20m) > Provide an ability to specify additional maven repositories for published POMs > -- > > Key: BEAM-9292 > URL: https://issues.apache.org/jira/browse/BEAM-9292 > Project: Beam > Issue Type: Improvement > Components: build-system, io-java-kafka >Reporter: Alexey Romanenko >Assignee: Alexey Romanenko >Priority: Major > Fix For: 2.20.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > To support Confluent Schema Registry, KafkaIO has a dependency on > {{io.confluent:kafka-avro-serializer}} from > https://packages.confluent.io/maven/ repository. In this case, it should add > this repository into published KafkaIO POM file. 
Otherwise, it will fail with > the following error during building a user pipeline: > {code} > [ERROR] Failed to execute goal on project kafka-io: Could not resolve > dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: > Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in > central (https://repo.maven.apache.org/maven2) -> [Help 1] > {code} > The repositories for publishing can be added via the {{mavenRepositories}} > argument in the Java build configuration. > For example (KafkaIO): > {code} > $ cat sdks/java/io/kafka/build.gradle > ... > applyJavaNature( > ... > mavenRepositories: [ > [id: 'io.confluent', url: 'https://packages.confluent.io/maven/'] > ] > ) > ... > {code} > This generates the following XML snippet in the POM file of the > {{beam-sdks-java-io-kafka}} artifact after publishing: > {code} > <repositories> > <repository> > <id>io.confluent</id> > <url>https://packages.confluent.io/maven/</url> > </repository> > </repositories> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8
[ https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386523&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386523 ] ASF GitHub Bot logged work on BEAM-9281: Author: ASF GitHub Bot Created on: 13/Feb/20 10:30 Start Date: 13/Feb/20 10:30 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] Update commons-csv to version 1.8 URL: https://github.com/apache/beam/pull/10818#issuecomment-585658529 Run PythonLint PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386523) Time Spent: 50m (was: 40m) > Update commons-csv to version 1.8 > - > > Key: BEAM-9281 > URL: https://issues.apache.org/jira/browse/BEAM-9281 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8
[ https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386525&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386525 ] ASF GitHub Bot logged work on BEAM-9281: Author: ASF GitHub Bot Created on: 13/Feb/20 10:31 Start Date: 13/Feb/20 10:31 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] Update commons-csv to version 1.8 URL: https://github.com/apache/beam/pull/10818#issuecomment-585658639 Run Python2_PVR_Flink PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386525) Time Spent: 1h 10m (was: 1h) > Update commons-csv to version 1.8 > - > > Key: BEAM-9281 > URL: https://issues.apache.org/jira/browse/BEAM-9281 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8
[ https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386524&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386524 ] ASF GitHub Bot logged work on BEAM-9281: Author: ASF GitHub Bot Created on: 13/Feb/20 10:31 Start Date: 13/Feb/20 10:31 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] Update commons-csv to version 1.8 URL: https://github.com/apache/beam/pull/10818#issuecomment-585658584 Run PythonFormatter PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386524) Time Spent: 1h (was: 50m) > Update commons-csv to version 1.8 > - > > Key: BEAM-9281 > URL: https://issues.apache.org/jira/browse/BEAM-9281 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8
[ https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386526&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386526 ] ASF GitHub Bot logged work on BEAM-9281: Author: ASF GitHub Bot Created on: 13/Feb/20 10:31 Start Date: 13/Feb/20 10:31 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] Update commons-csv to version 1.8 URL: https://github.com/apache/beam/pull/10818#issuecomment-585658720 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386526) Time Spent: 1h 20m (was: 1h 10m) > Update commons-csv to version 1.8 > - > > Key: BEAM-9281 > URL: https://issues.apache.org/jira/browse/BEAM-9281 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9298) Drop support for Flink 1.7
[ https://issues.apache.org/jira/browse/BEAM-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036121#comment-17036121 ] Maximilian Michels commented on BEAM-9298: -- +1 to removing 1.7. It's a good idea to announce this on the mailing list prior to the removal; it will help communicate the change in case any users are still relying on the build. > Drop support for Flink 1.7 > --- > > Key: BEAM-9298 > URL: https://issues.apache.org/jira/browse/BEAM-9298 > Project: Beam > Issue Type: Task > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > > With Flink 1.10 around the corner (more detail can be found in BEAM-9295), we > should consider dropping support for Flink 1.7. Dropping 1.7 will also > decrease the build time. > What do you think? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386530&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386530 ] ASF GitHub Bot logged work on BEAM-9146: Author: ASF GitHub Bot Created on: 13/Feb/20 10:33 Start Date: 13/Feb/20 10:33 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK URL: https://github.com/apache/beam/pull/10764#issuecomment-585659566 @EDjur unfortunately, your trigger phrases won't start the tests - there are Jenkins restrictions that prevent it This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386530) Time Spent: 9h 50m (was: 9h 40m) > [Python] PTransform that integrates Video Intelligence functionality > > > Key: BEAM-9146 > URL: https://issues.apache.org/jira/browse/BEAM-9146 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 9h 50m > Remaining Estimate: 0h > > The goal is to create a PTransform that integrates Google Cloud Video > Intelligence functionality [1]. > The transform should be able to take both video GCS location or video data > bytes as an input. > [1] https://cloud.google.com/video-intelligence/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386529&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386529 ] ASF GitHub Bot logged work on BEAM-9146: Author: ASF GitHub Bot Created on: 13/Feb/20 10:33 Start Date: 13/Feb/20 10:33 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK URL: https://github.com/apache/beam/pull/10764#issuecomment-585659326 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386529) Time Spent: 9h 40m (was: 9.5h) > [Python] PTransform that integrates Video Intelligence functionality > > > Key: BEAM-9146 > URL: https://issues.apache.org/jira/browse/BEAM-9146 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 9h 40m > Remaining Estimate: 0h > > The goal is to create a PTransform that integrates Google Cloud Video > Intelligence functionality [1]. > The transform should be able to take both video GCS location or video data > bytes as an input. > [1] https://cloud.google.com/video-intelligence/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386532&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386532 ] ASF GitHub Bot logged work on BEAM-9146: Author: ASF GitHub Bot Created on: 13/Feb/20 10:37 Start Date: 13/Feb/20 10:37 Worklog Time Spent: 10m Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK URL: https://github.com/apache/beam/pull/10764#issuecomment-585661147 > @EDjur your phrase triggers unfortunately won't trigger the tests - there are Jenkins restrictions that won't let you do it Gotcha. Thanks for triggering them! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386532) Time Spent: 10h (was: 9h 50m) > [Python] PTransform that integrates Video Intelligence functionality > > > Key: BEAM-9146 > URL: https://issues.apache.org/jira/browse/BEAM-9146 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 10h > Remaining Estimate: 0h > > The goal is to create a PTransform that integrates Google Cloud Video > Intelligence functionality [1]. > The transform should be able to take both video GCS location or video data > bytes as an input. > [1] https://cloud.google.com/video-intelligence/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386534&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386534 ] ASF GitHub Bot logged work on BEAM-9146: Author: ASF GitHub Bot Created on: 13/Feb/20 10:37 Start Date: 13/Feb/20 10:37 Worklog Time Spent: 10m Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK URL: https://github.com/apache/beam/pull/10764#issuecomment-585603649 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386534) Time Spent: 10h 20m (was: 10h 10m) > [Python] PTransform that integrates Video Intelligence functionality > > > Key: BEAM-9146 > URL: https://issues.apache.org/jira/browse/BEAM-9146 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 10h 20m > Remaining Estimate: 0h > > The goal is to create a PTransform that integrates Google Cloud Video > Intelligence functionality [1]. > The transform should be able to take both video GCS location or video data > bytes as an input. > [1] https://cloud.google.com/video-intelligence/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386533&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386533 ] ASF GitHub Bot logged work on BEAM-9146: Author: ASF GitHub Bot Created on: 13/Feb/20 10:37 Start Date: 13/Feb/20 10:37 Worklog Time Spent: 10m Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK URL: https://github.com/apache/beam/pull/10764#issuecomment-585648977 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386533) Time Spent: 10h 10m (was: 10h) > [Python] PTransform that integrates Video Intelligence functionality > > > Key: BEAM-9146 > URL: https://issues.apache.org/jira/browse/BEAM-9146 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 10h 10m > Remaining Estimate: 0h > > The goal is to create a PTransform that integrates Google Cloud Video > Intelligence functionality [1]. > The transform should be able to take both video GCS location or video data > bytes as an input. > [1] https://cloud.google.com/video-intelligence/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386535&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386535 ] ASF GitHub Bot logged work on BEAM-9146: Author: ASF GitHub Bot Created on: 13/Feb/20 10:37 Start Date: 13/Feb/20 10:37 Worklog Time Spent: 10m Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK URL: https://github.com/apache/beam/pull/10764#issuecomment-585431841 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386535) Time Spent: 10.5h (was: 10h 20m) > [Python] PTransform that integrates Video Intelligence functionality > > > Key: BEAM-9146 > URL: https://issues.apache.org/jira/browse/BEAM-9146 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 10.5h > Remaining Estimate: 0h > > The goal is to create a PTransform that integrates Google Cloud Video > Intelligence functionality [1]. > The transform should be able to take both video GCS location or video data > bytes as an input. > [1] https://cloud.google.com/video-intelligence/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8
[ https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386540&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386540 ] ASF GitHub Bot logged work on BEAM-9281: Author: ASF GitHub Bot Created on: 13/Feb/20 10:47 Start Date: 13/Feb/20 10:47 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] Update commons-csv to version 1.8 URL: https://github.com/apache/beam/pull/10818#issuecomment-585658639 Run Python2_PVR_Flink PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386540) Time Spent: 2h (was: 1h 50m) > Update commons-csv to version 1.8 > - > > Key: BEAM-9281 > URL: https://issues.apache.org/jira/browse/BEAM-9281 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8
[ https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386537&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386537 ] ASF GitHub Bot logged work on BEAM-9281: Author: ASF GitHub Bot Created on: 13/Feb/20 10:47 Start Date: 13/Feb/20 10:47 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] Update commons-csv to version 1.8 URL: https://github.com/apache/beam/pull/10818#issuecomment-585665000 Seems like this dependency is used only in `sdks/java/extensions/sql`. Can someone from Beam SQL developers take a look? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386537) Time Spent: 1.5h (was: 1h 20m) > Update commons-csv to version 1.8 > - > > Key: BEAM-9281 > URL: https://issues.apache.org/jira/browse/BEAM-9281 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8
[ https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386541&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386541 ] ASF GitHub Bot logged work on BEAM-9281: Author: ASF GitHub Bot Created on: 13/Feb/20 10:47 Start Date: 13/Feb/20 10:47 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] Update commons-csv to version 1.8 URL: https://github.com/apache/beam/pull/10818#issuecomment-585658720 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386541) Time Spent: 2h 10m (was: 2h) > Update commons-csv to version 1.8 > - > > Key: BEAM-9281 > URL: https://issues.apache.org/jira/browse/BEAM-9281 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8
[ https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386539&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386539 ] ASF GitHub Bot logged work on BEAM-9281: Author: ASF GitHub Bot Created on: 13/Feb/20 10:47 Start Date: 13/Feb/20 10:47 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] Update commons-csv to version 1.8 URL: https://github.com/apache/beam/pull/10818#issuecomment-585658584 Run PythonFormatter PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386539) Time Spent: 1h 50m (was: 1h 40m) > Update commons-csv to version 1.8 > - > > Key: BEAM-9281 > URL: https://issues.apache.org/jira/browse/BEAM-9281 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8
[ https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386538&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386538 ] ASF GitHub Bot logged work on BEAM-9281: Author: ASF GitHub Bot Created on: 13/Feb/20 10:47 Start Date: 13/Feb/20 10:47 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #10818: [BEAM-9281] Update commons-csv to version 1.8 URL: https://github.com/apache/beam/pull/10818#issuecomment-585658529 Run PythonLint PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386538) Time Spent: 1h 40m (was: 1.5h) > Update commons-csv to version 1.8 > - > > Key: BEAM-9281 > URL: https://issues.apache.org/jira/browse/BEAM-9281 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9258) [Python] PTransform that connects to Cloud DLP deidentification service
[ https://issues.apache.org/jira/browse/BEAM-9258?focusedWorklogId=386544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386544 ] ASF GitHub Bot logged work on BEAM-9258: Author: ASF GitHub Bot Created on: 13/Feb/20 10:51 Start Date: 13/Feb/20 10:51 Worklog Time Spent: 10m Work Description: mwalenia commented on pull request #10849: [BEAM-9258] Integrate Google Cloud Data loss prevention functionality for Python SDK URL: https://github.com/apache/beam/pull/10849 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Work logged] (BEAM-9258) [Python] PTransform that connects to Cloud DLP deidentification service
[ https://issues.apache.org/jira/browse/BEAM-9258?focusedWorklogId=386545&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386545 ] ASF GitHub Bot logged work on BEAM-9258: Author: ASF GitHub Bot Created on: 13/Feb/20 10:52 Start Date: 13/Feb/20 10:52 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #10849: [BEAM-9258] Integrate Google Cloud Data loss prevention functionality for Python SDK URL: https://github.com/apache/beam/pull/10849#issuecomment-585666957 R: @aaltay Can you take a look at this? Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386545) Time Spent: 20m (was: 10m) > [Python] PTransform that connects to Cloud DLP deidentification service > --- > > Key: BEAM-9258 > URL: https://issues.apache.org/jira/browse/BEAM-9258 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Michał Walenia >Assignee: Michał Walenia >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386556&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386556 ] ASF GitHub Bot logged work on BEAM-9146: Author: ASF GitHub Bot Created on: 13/Feb/20 11:21 Start Date: 13/Feb/20 11:21 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK URL: https://github.com/apache/beam/pull/10764#issuecomment-585678000 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386556) Time Spent: 10h 40m (was: 10.5h) > [Python] PTransform that integrates Video Intelligence functionality > > > Key: BEAM-9146 > URL: https://issues.apache.org/jira/browse/BEAM-9146 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 10h 40m > Remaining Estimate: 0h > > The goal is to create a PTransform that integrates Google Cloud Video > Intelligence functionality [1]. > The transform should be able to take either a video GCS location or video data > bytes as input. > [1] https://cloud.google.com/video-intelligence/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
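The dual input described in the issue (either a GCS location or raw video bytes) can be sketched as plain dispatch logic. This is a hedged illustration only: the helper name `to_video_request` is hypothetical, and the `input_uri`/`input_content` dict keys mirror the Video Intelligence API request fields, not the Beam transform itself.

```python
def to_video_request(element):
    """Map a PCollection element to a Video Intelligence request payload.

    Accepts either raw video bytes or a `gs://` URI string, as the issue
    description requires. Names and dict shape are illustrative assumptions.
    """
    if isinstance(element, bytes):
        # Raw video content passed inline.
        return {'input_content': element}
    if isinstance(element, str) and element.startswith('gs://'):
        # Video stored in GCS, referenced by URI.
        return {'input_uri': element}
    raise ValueError(
        'expected gs:// URI or raw video bytes, got %r' % (element,))
```

A DoFn wrapping the client would call such a helper once per element and forward the resulting request.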
[jira] [Commented] (BEAM-9085) Investigate performance difference between Python 2/3 on Dataflow
[ https://issues.apache.org/jira/browse/BEAM-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036145#comment-17036145 ] Kamil Wasilewski commented on BEAM-9085: Thanks [~tvalentyn], great work! I'll take a look at this in the next couple of days. I'll keep you posted on any progress. > Investigate performance difference between Python 2/3 on Dataflow > - > > Key: BEAM-9085 > URL: https://issues.apache.org/jira/browse/BEAM-9085 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Kamil Wasilewski >Assignee: Valentyn Tymofieiev >Priority: Major > > Tests show that the performance of core Beam operations in Python 3.x on > Dataflow can be a few times slower than in Python 2.7. We should investigate > the cause of the problem. > Currently, we have one ParDo test that is run both in Py3 and Py2 [1]. A > dashboard with runtime results can be found here [2]. > [1] sdks/python/apache_beam/testing/load_tests/pardo_test.py > [2] https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9265) @RequiresTimeSortedInput does not respect allowedLateness
[ https://issues.apache.org/jira/browse/BEAM-9265?focusedWorklogId=386577&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386577 ] ASF GitHub Bot logged work on BEAM-9265: Author: ASF GitHub Bot Created on: 13/Feb/20 11:45 Start Date: 13/Feb/20 11:45 Worklog Time Spent: 10m Work Description: dmvk commented on pull request #10795: [BEAM-9265] @RequiresTimeSortedInput respects allowedLateness URL: https://github.com/apache/beam/pull/10795#discussion_r378811377 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java ## @@ -2409,6 +2410,39 @@ public void testRequiresTimeSortedInputWithTestStream() { testTimeSortedInput(numElements, pipeline.apply(stream.advanceWatermarkToInfinity())); } +@Test +@Category({ + ValidatesRunner.class, + UsesStatefulParDo.class, + UsesRequiresTimeSortedInput.class, + UsesStrictTimerOrdering.class, + UsesTestStream.class +}) +public void testRequiresTimeSortedInputWithLateDataAndAllowedLateness() { + // generate list long enough to rule out random shuffle in sorted order + int numElements = 1000; + List<Long> eventStamps = + LongStream.range(0, numElements) + .mapToObj(i -> numElements - i) + .collect(Collectors.toList()); + TestStream.Builder<Long> input = TestStream.create(VarLongCoder.of()); + for (Long stamp : eventStamps) { +input = input.addElements(TimestampedValue.of(stamp, Instant.ofEpochMilli(stamp))); +if (stamp == 100) { + // advance watermark when we have 100 remaining elements Review comment: 🤦‍♂️ makes sense, thanks for clarification This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386577) Time Spent: 3h 10m (was: 3h) > @RequiresTimeSortedInput does not respect allowedLateness > - > > Key: BEAM-9265 > URL: https://issues.apache.org/jira/browse/BEAM-9265 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.20.0 >Reporter: Jan Lukavský >Assignee: Jan Lukavský >Priority: Major > Fix For: 2.20.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Currently, @RequiresTimeSortedInput drops data with respect to > allowedLateness, but timers are triggered without respecting it. We have to: > - drop data that is too late (after allowedLateness) > - setup timer for _minTimestamp + allowedLateness_ > - hold output watermark at _minTimestamp_ -- This message was sent by Atlassian Jira (v8.3.4#803005)
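The three fixes listed in the issue description (drop data that is too late, set a timer at minTimestamp + allowedLateness, hold the output watermark at minTimestamp) can be sketched as plain timestamp arithmetic. This is a hedged illustration under stated assumptions: function names are hypothetical, timestamps are epoch millis, and this is not the actual Beam Java implementation.

```python
def is_droppably_late(element_ts, input_watermark, allowed_lateness):
    # An element is dropped once the input watermark has passed its
    # timestamp by more than the allowed lateness.
    return element_ts + allowed_lateness < input_watermark

def flush_timer_ts(min_buffered_ts, allowed_lateness):
    # The sort buffer may only flush its earliest entry once no data that
    # is still within allowed lateness can arrive before it.
    return min_buffered_ts + allowed_lateness

def output_watermark_hold(min_buffered_ts):
    # The output watermark is held at the minimum buffered timestamp so
    # that flushed elements are not themselves late downstream.
    return min_buffered_ts
```

With, say, allowedLateness of 200ms, an element stamped 100 is droppable only after the watermark passes 300, and the buffer holding it flushes at 300 while the output watermark stays held at 100.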
[jira] [Work logged] (BEAM-9265) @RequiresTimeSortedInput does not respect allowedLateness
[ https://issues.apache.org/jira/browse/BEAM-9265?focusedWorklogId=386578&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386578 ] ASF GitHub Bot logged work on BEAM-9265: Author: ASF GitHub Bot Created on: 13/Feb/20 11:45 Start Date: 13/Feb/20 11:45 Worklog Time Spent: 10m Work Description: dmvk commented on pull request #10795: [BEAM-9265] @RequiresTimeSortedInput respects allowedLateness URL: https://github.com/apache/beam/pull/10795#discussion_r378811377 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/ParDoTest.java ## @@ -2409,6 +2410,39 @@ public void testRequiresTimeSortedInputWithTestStream() { testTimeSortedInput(numElements, pipeline.apply(stream.advanceWatermarkToInfinity())); } +@Test +@Category({ + ValidatesRunner.class, + UsesStatefulParDo.class, + UsesRequiresTimeSortedInput.class, + UsesStrictTimerOrdering.class, + UsesTestStream.class +}) +public void testRequiresTimeSortedInputWithLateDataAndAllowedLateness() { + // generate list long enough to rule out random shuffle in sorted order + int numElements = 1000; + List<Long> eventStamps = + LongStream.range(0, numElements) + .mapToObj(i -> numElements - i) + .collect(Collectors.toList()); + TestStream.Builder<Long> input = TestStream.create(VarLongCoder.of()); + for (Long stamp : eventStamps) { +input = input.addElements(TimestampedValue.of(stamp, Instant.ofEpochMilli(stamp))); +if (stamp == 100) { + // advance watermark when we have 100 remaining elements Review comment: 🤦‍♂️ makes sense, thanks for clarification This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386578) Time Spent: 3h 20m (was: 3h 10m) > @RequiresTimeSortedInput does not respect allowedLateness > - > > Key: BEAM-9265 > URL: https://issues.apache.org/jira/browse/BEAM-9265 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.20.0 >Reporter: Jan Lukavský >Assignee: Jan Lukavský >Priority: Major > Fix For: 2.20.0 > > Time Spent: 3h 20m > Remaining Estimate: 0h > > Currently, @RequiresTimeSortedInput drops data with respect to > allowedLateness, but timers are triggered without respecting it. We have to: > - drop data that is too late (after allowedLateness) > - setup timer for _minTimestamp + allowedLateness_ > - hold output watermark at _minTimestamp_ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9298) Drop support for Flink 1.7
[ https://issues.apache.org/jira/browse/BEAM-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036154#comment-17036154 ] sunjincheng commented on BEAM-9298: --- Thank you all! I will do it accordingly after adding the Flink 1.10 build target. :) > Drop support for Flink 1.7 > --- > > Key: BEAM-9298 > URL: https://issues.apache.org/jira/browse/BEAM-9298 > Project: Beam > Issue Type: Task > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > > With Flink 1.10 around the corner (more detail can be found in BEAM-9295), we > should consider dropping support for Flink 1.7. Dropping 1.7 will also > decrease the build time. > What do you think? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9298) Drop support for Flink 1.7
[ https://issues.apache.org/jira/browse/BEAM-9298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036154#comment-17036154 ] sunjincheng edited comment on BEAM-9298 at 2/13/20 11:46 AM: - I see, thank you all! I will do it accordingly after adding the Flink 1.10 build target. :) was (Author: sunjincheng121): Thank you all! I will do it accordingly after adding the Flink 1.10 build target. :) > Drop support for Flink 1.7 > --- > > Key: BEAM-9298 > URL: https://issues.apache.org/jira/browse/BEAM-9298 > Project: Beam > Issue Type: Task > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > > With Flink 1.10 around the corner (more detail can be found in BEAM-9295), we > should consider dropping support for Flink 1.7. Dropping 1.7 will also > decrease the build time. > What do you think? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9299) Upgrade Flink Runner to 1.8.3 and 1.9.2
[ https://issues.apache.org/jira/browse/BEAM-9299?focusedWorklogId=386587&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386587 ] ASF GitHub Bot logged work on BEAM-9299: Author: ASF GitHub Bot Created on: 13/Feb/20 11:53 Start Date: 13/Feb/20 11:53 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on pull request #10850: [BEAM-9299] Upgrade Flink Runner from 1.8.2 to 1.8.3 URL: https://github.com/apache/beam/pull/10850 Upgrade Flink Runner 1.8.x to 1.8.3 [Standard Beam PR-template table of post-commit test status badges omitted]
[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8
[ https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386605&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386605 ] ASF GitHub Bot logged work on BEAM-9281: Author: ASF GitHub Bot Created on: 13/Feb/20 12:56 Start Date: 13/Feb/20 12:56 Worklog Time Spent: 10m Work Description: iemejia commented on issue #10818: [BEAM-9281] Update commons-csv to version 1.8 URL: https://github.com/apache/beam/pull/10818#issuecomment-585741925 The PR is passing 17 checks in total, including both `Run SQL PreCommit` and `SQL Post Commit Tests`. If there is an error, then we have bad test coverage; what about merging as it is? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386605) Time Spent: 2h 20m (was: 2h 10m) > Update commons-csv to version 1.8 > - > > Key: BEAM-9281 > URL: https://issues.apache.org/jira/browse/BEAM-9281 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality
[ https://issues.apache.org/jira/browse/BEAM-9247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036209#comment-17036209 ] Elias Djurfeldt commented on BEAM-9247: --- How about something like this? [https://github.com/EDjur/beam/commit/cae4f82957017df31a68de36288b74db2616fa78#diff-0cd6945eb0fdc8a93cb9d7fef0859ca0R94] This keeps a single PTransform and DoFn, but accepts {code:java} @typehints.with_input_types( Union[Tuple[Union[text_type, binary_type], vision.types.ImageContext], Union[text_type, binary_type]]){code} Defaults the `image_context` to `None` and unpacks the tuple if needed. Only concern I have is it might be confusing having a PTransform that accepts either a tuple of (image, ImageContext) or just string of image. But this reduces code duplication quite a lot. > [Python] PTransform that integrates Cloud Vision functionality > -- > > Key: BEAM-9247 > URL: https://issues.apache.org/jira/browse/BEAM-9247 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Elias Djurfeldt >Priority: Major > > The goal is to create a PTransform that integrates Google Cloud Vision API > functionality [1]. > [1] https://cloud.google.com/vision/docs/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
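The dispatch described in the comment — accepting either a bare image or an (image, ImageContext) tuple, with `image_context` defaulting to `None` — can be sketched in isolation. The helper below is a hypothetical stand-in for the unpacking inside the DoFn; it is not the actual PR code.

```python
def unpack_image_and_context(element):
    """Normalize the transform's input to an (image, image_context) pair.

    `element` is either (image, image_context) or a bare image, where image
    may be a GCS URI string or raw bytes, per the Union type hint quoted in
    the comment.
    """
    if isinstance(element, tuple):
        image, image_context = element
    else:
        image, image_context = element, None
    return image, image_context
```

A single `process` method can then call this once and pass both values to the Vision client, which is what makes one PTransform suffice for both input shapes.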
[jira] [Comment Edited] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality
[ https://issues.apache.org/jira/browse/BEAM-9247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036209#comment-17036209 ] Elias Djurfeldt edited comment on BEAM-9247 at 2/13/20 1:12 PM: How about something like this? [https://github.com/EDjur/beam/commit/cae4f82957017df31a68de36288b74db2616fa78#diff-0cd6945eb0fdc8a93cb9d7fef0859ca0R94] Have a look at the tests too to see how I envision it working. This keeps a single PTransform and DoFn, but accepts {code:java} @typehints.with_input_types( Union[Tuple[Union[text_type, binary_type], vision.types.ImageContext], Union[text_type, binary_type]]){code} Defaults the `image_context` to `None` and unpacks the tuple if needed. Only concern I have is it might be confusing having a PTransform that accepts either a tuple of (image, ImageContext) or just string of image. But this reduces code duplication quite a lot. was (Author: eliasdjur): How about something like this? [https://github.com/EDjur/beam/commit/cae4f82957017df31a68de36288b74db2616fa78#diff-0cd6945eb0fdc8a93cb9d7fef0859ca0R94] This keeps a single PTransform and DoFn, but accepts {code:java} @typehints.with_input_types( Union[Tuple[Union[text_type, binary_type], vision.types.ImageContext], Union[text_type, binary_type]]){code} Defaults the `image_context` to `None` and unpacks the tuple if needed. Only concern I have is it might be confusing having a PTransform that accepts either a tuple of (image, ImageContext) or just string of image. But this reduces code duplication quite a lot. > [Python] PTransform that integrates Cloud Vision functionality > -- > > Key: BEAM-9247 > URL: https://issues.apache.org/jira/browse/BEAM-9247 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Elias Djurfeldt >Priority: Major > > The goal is to create a PTransform that integrates Google Cloud Vision API > functionality [1]. 
> [1] https://cloud.google.com/vision/docs/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=386614&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386614 ] ASF GitHub Bot logged work on BEAM-9146: Author: ASF GitHub Bot Created on: 13/Feb/20 13:19 Start Date: 13/Feb/20 13:19 Worklog Time Spent: 10m Work Description: EDjur commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK URL: https://github.com/apache/beam/pull/10764#issuecomment-585750091 I'm having a small discussion regarding the `video_context` with @kamilwu here: https://issues.apache.org/jira/browse/BEAM-9247 Perhaps this is something we should look at integrating into this PR before merging too (or after, considering the use case seems small). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386614) Time Spent: 10h 50m (was: 10h 40m) > [Python] PTransform that integrates Video Intelligence functionality > > > Key: BEAM-9146 > URL: https://issues.apache.org/jira/browse/BEAM-9146 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 10h 50m > Remaining Estimate: 0h > > The goal is to create a PTransform that integrates Google Cloud Video > Intelligence functionality [1]. > The transform should be able to take either a video GCS location or video data > bytes as input. > [1] https://cloud.google.com/video-intelligence/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality
[ https://issues.apache.org/jira/browse/BEAM-9247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036209#comment-17036209 ] Elias Djurfeldt edited comment on BEAM-9247 at 2/13/20 1:31 PM: How about something like this? [https://github.com/EDjur/beam/commit/cae4f82957017df31a68de36288b74db2616fa78#diff-0cd6945eb0fdc8a93cb9d7fef0859ca0R94] Have a look at the tests too to see how I envision it working. This keeps a single PTransform and DoFn, but accepts {code:java} @typehints.with_input_types( Union[Tuple[Union[text_type, binary_type], vision.types.ImageContext], Union[text_type, binary_type]]){code} Defaults the `image_context` to `None` and unpacks the tuple if needed. Only concern I have is it might be confusing having a PTransform that accepts either a tuple of (image, ImageContext) or just Union[string, bytes] of image. But this reduces code duplication quite a lot. was (Author: eliasdjur): How about something like this? [https://github.com/EDjur/beam/commit/cae4f82957017df31a68de36288b74db2616fa78#diff-0cd6945eb0fdc8a93cb9d7fef0859ca0R94] Have a look at the tests too to see how I envision it working. This keeps a single PTransform and DoFn, but accepts {code:java} @typehints.with_input_types( Union[Tuple[Union[text_type, binary_type], vision.types.ImageContext], Union[text_type, binary_type]]){code} Defaults the `image_context` to `None` and unpacks the tuple if needed. Only concern I have is it might be confusing having a PTransform that accepts either a tuple of (image, ImageContext) or just string of image. But this reduces code duplication quite a lot. 
> [Python] PTransform that integrates Cloud Vision functionality > -- > > Key: BEAM-9247 > URL: https://issues.apache.org/jira/browse/BEAM-9247 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Elias Djurfeldt >Priority: Major > > The goal is to create a PTransform that integrates Google Cloud Vision API > functionality [1]. > [1] https://cloud.google.com/vision/docs/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality
[ https://issues.apache.org/jira/browse/BEAM-9247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036209#comment-17036209 ] Elias Djurfeldt edited comment on BEAM-9247 at 2/13/20 1:31 PM: How about something like this? [https://github.com/EDjur/beam/commit/cae4f82957017df31a68de36288b74db2616fa78#diff-0cd6945eb0fdc8a93cb9d7fef0859ca0R94] Have a look at the tests too to see how I envision it working. This keeps a single PTransform and DoFn, but accepts {code:java} @typehints.with_input_types( Union[Tuple[Union[text_type, binary_type], vision.types.ImageContext], Union[text_type, binary_type]]){code} Defaults the `image_context` to `None` and unpacks the tuple if needed. Only concern I have is it might be confusing having a PTransform that accepts either a tuple of (Union[string, bytes], ImageContext) or just Union[string, bytes] of image. But this reduces code duplication quite a lot. was (Author: eliasdjur): How about something like this? [https://github.com/EDjur/beam/commit/cae4f82957017df31a68de36288b74db2616fa78#diff-0cd6945eb0fdc8a93cb9d7fef0859ca0R94] Have a look at the tests too to see how I envision it working. This keeps a single PTransform and DoFn, but accepts {code:java} @typehints.with_input_types( Union[Tuple[Union[text_type, binary_type], vision.types.ImageContext], Union[text_type, binary_type]]){code} Defaults the `image_context` to `None` and unpacks the tuple if needed. Only concern I have is it might be confusing having a PTransform that accepts either a tuple of (image, ImageContext) or just Union[string, bytes] of image. But this reduces code duplication quite a lot. 
> [Python] PTransform that integrates Cloud Vision functionality > -- > > Key: BEAM-9247 > URL: https://issues.apache.org/jira/browse/BEAM-9247 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Elias Djurfeldt >Priority: Major > > The goal is to create a PTransform that integrates Google Cloud Vision API > functionality [1]. > [1] https://cloud.google.com/vision/docs/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9281) Update commons-csv to version 1.8
[ https://issues.apache.org/jira/browse/BEAM-9281?focusedWorklogId=386638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386638 ] ASF GitHub Bot logged work on BEAM-9281: Author: ASF GitHub Bot Created on: 13/Feb/20 13:57 Start Date: 13/Feb/20 13:57 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on pull request #10818: [BEAM-9281] Update commons-csv to version 1.8 URL: https://github.com/apache/beam/pull/10818 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386638) Time Spent: 2.5h (was: 2h 20m) > Update commons-csv to version 1.8 > - > > Key: BEAM-9281 > URL: https://issues.apache.org/jira/browse/BEAM-9281 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9281) Update commons-csv to version 1.8
[ https://issues.apache.org/jira/browse/BEAM-9281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Romanenko resolved BEAM-9281. Resolution: Fixed > Update commons-csv to version 1.8 > - > > Key: BEAM-9281 > URL: https://issues.apache.org/jira/browse/BEAM-9281 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality
[ https://issues.apache.org/jira/browse/BEAM-9247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036296#comment-17036296 ] Kamil Wasilewski commented on BEAM-9247: If it reduces code duplication significantly, then I'm fine with having only one PTransform. Please make sure that the PTransform has clear documentation on what the expected input types are and what the difference between them is. > [Python] PTransform that integrates Cloud Vision functionality > -- > > Key: BEAM-9247 > URL: https://issues.apache.org/jira/browse/BEAM-9247 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Elias Djurfeldt >Priority: Major > > The goal is to create a PTransform that integrates Google Cloud Vision API > functionality [1]. > [1] https://cloud.google.com/vision/docs/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-9248) [Python] PTransform that integrates Cloud Natural Language functionality
[ https://issues.apache.org/jira/browse/BEAM-9248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamil Wasilewski reassigned BEAM-9248: -- Assignee: Michał Walenia > [Python] PTransform that integrates Cloud Natural Language functionality > > > Key: BEAM-9248 > URL: https://issues.apache.org/jira/browse/BEAM-9248 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Michał Walenia >Priority: Major > > The goal is to create a PTransform that integrates Google Cloud Natural > Language API functionality [1]. > [1] https://cloud.google.com/natural-language/docs/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9301) Check in beam-linkage-check.sh
[ https://issues.apache.org/jira/browse/BEAM-9301?focusedWorklogId=386702&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386702 ] ASF GitHub Bot logged work on BEAM-9301: Author: ASF GitHub Bot Created on: 13/Feb/20 15:24 Start Date: 13/Feb/20 15:24 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10841: [BEAM-9301] Check in beam-linkage-check.sh URL: https://github.com/apache/beam/pull/10841#discussion_r378930051 ## File path: sdks/java/build-tools/beam-linkage-check.sh ## @@ -0,0 +1,99 @@ +#!/bin/bash Review comment: No issues, the shebang should cover it so ok for me This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386702) Time Spent: 40m (was: 0.5h) > Check in beam-linkage-check.sh > -- > > Key: BEAM-9301 > URL: https://issues.apache.org/jira/browse/BEAM-9301 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > https://github.com/apache/beam/pull/10769#issuecomment-584571787 > bq. @suztomo can you contribute this script maybe into Beam's build-tools > directory so we can improve it a bit for further use? > This is a temporary solution before exclusion rules in Linkage Checker > (BEAM-9206) are implemented. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.
[ https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=386705&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386705 ] ASF GitHub Bot logged work on BEAM-6522: Author: ASF GitHub Bot Created on: 13/Feb/20 15:26 Start Date: 13/Feb/20 15:26 Worklog Time Spent: 10m Work Description: lazylynx commented on pull request #10838: [BEAM-6522] [BEAM-7455] Unskip Avro IO tests that are now passing. URL: https://github.com/apache/beam/pull/10838#discussion_r378930566 ## File path: sdks/python/apache_beam/examples/fastavro_it_test.py ## @@ -181,7 +175,7 @@ def assertEqual(l, r): if l != r: raise BeamAssertException('Assertion failed: %s == %s' % (l, r)) -assertEqual(v.keys(), ['avro', 'fastavro']) +assertEqual(list(v.keys()), ['avro', 'fastavro']) Review comment: We should use `sorted` instead of `list` if tests run in Python 2.7/3.5/3.6. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386705) Time Spent: 10h (was: 9h 50m) > Dill fails to pickle avro.RecordSchema classes on Python 3. > > > Key: BEAM-6522 > URL: https://issues.apache.org/jira/browse/BEAM-6522 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: 2.16.0 > > Time Spent: 10h > Remaining Estimate: 0h > > The avroio module still has 4 failing tests. This is actually 2 times the > same 2 tests, both for Avro and Fastavro. 
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform* > *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform* > fail with: > {code:java} > Traceback (most recent call last): > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", > line 432, in test_sink_transform > | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line > 528, in expand > return pcoll | beam.io.iobase.Write(self._sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line > 960, in expand > return pcoll | WriteImpl(self.sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File 
"/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line > 979, in expand > lambda _, sink: sink.initialize_write(), self.sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 1103, in Map > pardo = FlatMap(wrapper, *args, **kwargs) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 1054, in FlatMap > pardo = ParDo(Callable
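The review exchange above (use `sorted` instead of `list` when the tests may run on Python 2.7/3.5/3.6) can be illustrated with a small standalone sketch. The helper below mirrors the test's `assertEqual`, and the sample dict is invented for illustration, not taken from the actual test data:

```python
def assert_equal(left, right):
    # Mirrors the assertEqual helper in fastavro_it_test.py.
    if left != right:
        raise AssertionError('Assertion failed: %s == %s' % (left, right))

# On Python versions without guaranteed dict ordering (< 3.7), comparing
# list(v.keys()) against a fixed list can fail depending on hashing or
# insertion order. Sorting first makes the comparison deterministic on
# any interpreter.
v = {'fastavro': ['rec2'], 'avro': ['rec1']}
assert_equal(sorted(v.keys()), ['avro', 'fastavro'])
```

On CPython 3.7+, `list(v.keys())` here would be `['fastavro', 'avro']` (insertion order), so the `sorted` call is what keeps the assertion stable.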
[jira] [Work logged] (BEAM-9301) Check in beam-linkage-check.sh
[ https://issues.apache.org/jira/browse/BEAM-9301?focusedWorklogId=386712&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386712 ] ASF GitHub Bot logged work on BEAM-9301: Author: ASF GitHub Bot Created on: 13/Feb/20 15:31 Start Date: 13/Feb/20 15:31 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #10841: [BEAM-9301] Check in beam-linkage-check.sh URL: https://github.com/apache/beam/pull/10841 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386712) Time Spent: 50m (was: 40m) > Check in beam-linkage-check.sh > -- > > Key: BEAM-9301 > URL: https://issues.apache.org/jira/browse/BEAM-9301 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > https://github.com/apache/beam/pull/10769#issuecomment-584571787 > bq. @suztomo can you contribute this script maybe into Beam's build-tools > directory so we can improve it a bit for further use? > This is a temporary solution before exclusion rules in Linkage Checker > (BEAM-9206) are implemented. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9301) Check in beam-linkage-check.sh
[ https://issues.apache.org/jira/browse/BEAM-9301?focusedWorklogId=386713&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386713 ] ASF GitHub Bot logged work on BEAM-9301: Author: ASF GitHub Bot Created on: 13/Feb/20 15:32 Start Date: 13/Feb/20 15:32 Worklog Time Spent: 10m Work Description: iemejia commented on issue #10841: [BEAM-9301] Check in beam-linkage-check.sh URL: https://github.com/apache/beam/pull/10841#issuecomment-585817360 Thanks it will be really useful to have this here so we can evolve it together if needed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386713) Time Spent: 1h (was: 50m) > Check in beam-linkage-check.sh > -- > > Key: BEAM-9301 > URL: https://issues.apache.org/jira/browse/BEAM-9301 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > https://github.com/apache/beam/pull/10769#issuecomment-584571787 > bq. @suztomo can you contribute this script maybe into Beam's build-tools > directory so we can improve it a bit for further use? > This is a temporary solution before exclusion rules in Linkage Checker > (BEAM-9206) are implemented. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9301) Check in beam-linkage-check.sh
[ https://issues.apache.org/jira/browse/BEAM-9301?focusedWorklogId=386714&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386714 ] ASF GitHub Bot logged work on BEAM-9301: Author: ASF GitHub Bot Created on: 13/Feb/20 15:33 Start Date: 13/Feb/20 15:33 Worklog Time Spent: 10m Work Description: iemejia commented on issue #10841: [BEAM-9301] Check in beam-linkage-check.sh URL: https://github.com/apache/beam/pull/10841#issuecomment-585817896 Oops, first suggestion of improvement: detect eagerly whether master is checked out, to not waste time before failing This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386714) Time Spent: 1h 10m (was: 1h) > Check in beam-linkage-check.sh > -- > > Key: BEAM-9301 > URL: https://issues.apache.org/jira/browse/BEAM-9301 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > https://github.com/apache/beam/pull/10769#issuecomment-584571787 > bq. @suztomo can you contribute this script maybe into Beam's build-tools > directory so we can improve it a bit for further use? > This is a temporary solution before exclusion rules in Linkage Checker > (BEAM-9206) are implemented. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK
[ https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=386717&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386717 ] ASF GitHub Bot logged work on BEAM-9291: Author: ASF GitHub Bot Created on: 13/Feb/20 15:41 Start Date: 13/Feb/20 15:41 Worklog Time Spent: 10m Work Description: stankiewicz commented on issue #10829: [BEAM-9291] Upload graph option in dataflow's python sdk URL: https://github.com/apache/beam/pull/10829#issuecomment-585821888 I've squashed commits. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386717) Time Spent: 2h 10m (was: 2h) > upload_graph support in Dataflow Python SDK > --- > > Key: BEAM-9291 > URL: https://issues.apache.org/jira/browse/BEAM-9291 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Radosław Stankiewicz >Assignee: Radosław Stankiewicz >Priority: Minor > Time Spent: 2h 10m > Remaining Estimate: 0h > > upload_graph option is not supported in Dataflow's Python SDK so there is no > workaround for large graphs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
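For context, `upload_graph` is passed to Dataflow like any other `--experiments` value. A minimal sketch of assembling such pipeline arguments — the helper function, project id, and bucket below are placeholders for illustration, not part of the SDK:

```python
def dataflow_args(project, temp_location, experiments):
    """Build a Dataflow argv list with the given experiments enabled."""
    args = [
        '--runner=DataflowRunner',
        '--project=%s' % project,              # placeholder project id
        '--temp_location=%s' % temp_location,  # placeholder GCS bucket
    ]
    # Each experiment is passed as its own --experiments flag.
    args += ['--experiments=%s' % e for e in experiments]
    return args

args = dataflow_args('my-project', 'gs://my-bucket/tmp', ['upload_graph'])
```

With `upload_graph` enabled, the intent (per this issue) is that the runner uploads the job graph separately rather than embedding it in the job request, which matters for very large graphs.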
[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.
[ https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=386718&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386718 ] ASF GitHub Bot logged work on BEAM-6522: Author: ASF GitHub Bot Created on: 13/Feb/20 15:41 Start Date: 13/Feb/20 15:41 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #10838: [BEAM-6522] [BEAM-7455] Unskip Avro IO tests that are now passing. URL: https://github.com/apache/beam/pull/10838#discussion_r378941516 ## File path: sdks/python/apache_beam/examples/fastavro_it_test.py ## @@ -181,7 +175,7 @@ def assertEqual(l, r): if l != r: raise BeamAssertException('Assertion failed: %s == %s' % (l, r)) -assertEqual(v.keys(), ['avro', 'fastavro']) +assertEqual(list(v.keys()), ['avro', 'fastavro']) Review comment: Good point, fixed. Thanks! PTAL. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386718) Time Spent: 10h 10m (was: 10h) > Dill fails to pickle avro.RecordSchema classes on Python 3. > > > Key: BEAM-6522 > URL: https://issues.apache.org/jira/browse/BEAM-6522 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: 2.16.0 > > Time Spent: 10h 10m > Remaining Estimate: 0h > > The avroio module still has 4 failing tests. This is actually 2 times the > same 2 tests, both for Avro and Fastavro. 
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform* > *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform* > fail with: > {code:java} > Traceback (most recent call last): > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", > line 432, in test_sink_transform > | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line > 528, in expand > return pcoll | beam.io.iobase.Write(self._sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line > 960, in expand > return pcoll | WriteImpl(self.sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File 
"/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line > 979, in expand > lambda _, sink: sink.initialize_write(), self.sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 1103, in Map > pardo = FlatMap(wrapper, *args, **kwargs) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 1054, in FlatMap > pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs) > Fi
[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK
[ https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=386716&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386716 ] ASF GitHub Bot logged work on BEAM-9291: Author: ASF GitHub Bot Created on: 13/Feb/20 15:41 Start Date: 13/Feb/20 15:41 Worklog Time Spent: 10m Work Description: stankiewicz commented on issue #10829: [BEAM-9291] Upload graph option in dataflow's python sdk URL: https://github.com/apache/beam/pull/10829#issuecomment-585821810 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386716) Time Spent: 2h (was: 1h 50m) > upload_graph support in Dataflow Python SDK > --- > > Key: BEAM-9291 > URL: https://issues.apache.org/jira/browse/BEAM-9291 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Radosław Stankiewicz >Assignee: Radosław Stankiewicz >Priority: Minor > Time Spent: 2h > Remaining Estimate: 0h > > upload_graph option is not supported in Dataflow's Python SDK so there is no > workaround for large graphs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.
[ https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=386721&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386721 ] ASF GitHub Bot logged work on BEAM-6522: Author: ASF GitHub Bot Created on: 13/Feb/20 15:54 Start Date: 13/Feb/20 15:54 Worklog Time Spent: 10m Work Description: lazylynx commented on issue #10838: [BEAM-6522] [BEAM-7455] Unskip Avro IO tests that are now passing. URL: https://github.com/apache/beam/pull/10838#issuecomment-585829022 LGTM! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386721) Time Spent: 10h 20m (was: 10h 10m) > Dill fails to pickle avro.RecordSchema classes on Python 3. > > > Key: BEAM-6522 > URL: https://issues.apache.org/jira/browse/BEAM-6522 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: 2.16.0 > > Time Spent: 10h 20m > Remaining Estimate: 0h > > The avroio module still has 4 failing tests. This is actually 2 times the > same 2 tests, both for Avro and Fastavro. 
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform* > *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform* > fail with: > {code:java} > Traceback (most recent call last): > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", > line 432, in test_sink_transform > | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line > 528, in expand > return pcoll | beam.io.iobase.Write(self._sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line > 960, in expand > return pcoll | WriteImpl(self.sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File 
"/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line > 979, in expand > lambda _, sink: sink.initialize_write(), self.sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 1103, in Map > pardo = FlatMap(wrapper, *args, **kwargs) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 1054, in FlatMap > pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 864, in __init__ > super(ParDo, self).__init__(fn, *args, **kwargs) > File > "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py", > line 646, in __init__ > self.args = pickler.loads(pickler.dumps(self.args)) > File > "/home/robbe/workspace/beam/sdks/python/apache_beam/internal/pickle
[jira] [Created] (BEAM-9307) new spark runner: put windows inside the key to avoid having all values for the same key in memory
Etienne Chauchot created BEAM-9307: -- Summary: new spark runner: put windows inside the key to avoid having all values for the same key in memory Key: BEAM-9307 URL: https://issues.apache.org/jira/browse/BEAM-9307 Project: Beam Issue Type: Improvement Components: runner-spark Reporter: Etienne Chauchot Like it was done for the current runner. See: [https://www.youtube.com/watch?v=ZIFtmx8nBow&t=721s] min 10 -- This message was sent by Atlassian Jira (v8.3.4#803005)
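The proposed change — moving the window into the grouping key — can be sketched in plain Python. This is a toy model of the idea, not the Spark runner's actual code: grouping by the composite `(key, window)` means each group only holds one window's values, instead of every window's values for a key at once.

```python
from collections import defaultdict

def group_by_key_and_window(elements):
    """Group (key, window, value) triples by the composite (key, window).

    Keying by (key, window) instead of key alone bounds each group to a
    single window's values, so a grouping step never needs all values
    for a key across all windows in memory at the same time.
    """
    grouped = defaultdict(list)
    for key, window, value in elements:
        grouped[(key, window)].append(value)
    return dict(grouped)

groups = group_by_key_and_window([('k', 0, 'a'), ('k', 1, 'b'), ('k', 0, 'c')])
```

Here the two windows of key `'k'` land in separate groups, `('k', 0)` and `('k', 1)`, rather than one combined group for `'k'`.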
[jira] [Created] (BEAM-9308) Optimize state cleanup at end-of-window
Steve Niemitz created BEAM-9308: --- Summary: Optimize state cleanup at end-of-window Key: BEAM-9308 URL: https://issues.apache.org/jira/browse/BEAM-9308 Project: Beam Issue Type: Improvement Components: runner-dataflow Reporter: Steve Niemitz Assignee: Steve Niemitz When using state with a large keyspace, you can end up with a large amount of state cleanup timers set to fire all 1ms after the end of a window. This can cause a momentary (I've observed 1-3 minute) lag in processing while windmill and the java harness fire and process these cleanup timers. By spreading the firing over a short period after the end of the window, we can decorrelate the firing of the timers and smooth the load out, resulting in much less impact from state cleanup. -- This message was sent by Atlassian Jira (v8.3.4#803005)
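The decorrelation idea described above can be sketched as a deterministic per-key offset added to the end-of-window cleanup timestamp. The function and parameter names (and the hashing scheme) are illustrative, not Dataflow's actual implementation:

```python
import hashlib

def cleanup_timer_ts(window_end_ms, key, spread_ms=60000):
    """Spread state-cleanup timers over spread_ms after window end.

    Instead of firing every key's cleanup timer at window_end + 1ms,
    derive a stable per-key offset so firings are spread out and the
    load on the runner is smoothed. Hashing keeps the offset
    deterministic for a given key.
    """
    digest = hashlib.sha1(key.encode('utf-8')).digest()
    offset = int.from_bytes(digest[:4], 'big') % spread_ms
    return window_end_ms + 1 + offset
```

Because the offset is derived from the key, the same key always gets the same firing time, so retries and replays stay consistent while the aggregate timer load is spread over the window-end-plus-`spread_ms` interval.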
[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.
[ https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=386734&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386734 ] ASF GitHub Bot logged work on BEAM-6522: Author: ASF GitHub Bot Created on: 13/Feb/20 16:33 Start Date: 13/Feb/20 16:33 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #10838: [BEAM-6522] [BEAM-7455] Unskip Avro IO tests that are now passing. URL: https://github.com/apache/beam/pull/10838 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386734) Time Spent: 10.5h (was: 10h 20m) > Dill fails to pickle avro.RecordSchema classes on Python 3. > > > Key: BEAM-6522 > URL: https://issues.apache.org/jira/browse/BEAM-6522 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: 2.16.0 > > Time Spent: 10.5h > Remaining Estimate: 0h > > The avroio module still has 4 failing tests. This is actually 2 times the > same 2 tests, both for Avro and Fastavro. 
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform* > *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform* > fail with: > {code:java} > Traceback (most recent call last): > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", > line 432, in test_sink_transform > | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line > 528, in expand > return pcoll | beam.io.iobase.Write(self._sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line > 960, in expand > return pcoll | WriteImpl(self.sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File 
"/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line > 979, in expand > lambda _, sink: sink.initialize_write(), self.sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 1103, in Map > pardo = FlatMap(wrapper, *args, **kwargs) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 1054, in FlatMap > pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 864, in __init__ > super(ParDo, self).__init__(fn, *args, **kwargs) > File > "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py", > line 646, in __init__ > self.args = pickler.loads(pickler.dumps(self.args)) > File > "/home/robbe/workspace/beam/sdks/python/apache_beam/internal/pickler.py", > line 247, in l
[jira] [Commented] (BEAM-7455) Improve Avro IO integration test coverage on Python 3.
[ https://issues.apache.org/jira/browse/BEAM-7455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036345#comment-17036345 ] Valentyn Tymofieiev commented on BEAM-7455: --- With #9466, Py3 integration test coverage of Avro IO matches Py2 test coverage. This is done via fastavro_it_test.py, which runs a pipeline using Avro and FastAvro. If/when we drop the avro dependency, we may need to update this test. > Improve Avro IO integration test coverage on Python 3. > -- > > Key: BEAM-7455 > URL: https://issues.apache.org/jira/browse/BEAM-7455 > Project: Beam > Issue Type: Sub-task > Components: io-py-avro >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 6h 10m > Remaining Estimate: 0h > > It seems that we don't have an integration test for Avro IO on Python 3: > fastavro_it_test [1] depends on both avro and fastavro, however avro package > currently does not work with Beam on Python 3, so we don't have an > integration test that exercises Avro IO on Python 3. > We should add an integration test for Avro IO that does not need both > libraries at the same time, and instead can run using either library. > [~frederik] is this something you could help with? > cc: [~chamikara] [~Juta] > [1] > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/fastavro_it_test.py -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-7455) Improve Avro IO integration test coverage on Python 3.
[ https://issues.apache.org/jira/browse/BEAM-7455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev resolved BEAM-7455. --- Fix Version/s: Not applicable Resolution: Fixed > Improve Avro IO integration test coverage on Python 3. > -- > > Key: BEAM-7455 > URL: https://issues.apache.org/jira/browse/BEAM-7455 > Project: Beam > Issue Type: Sub-task > Components: io-py-avro >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: Not applicable > > Time Spent: 6h 10m > Remaining Estimate: 0h > > It seems that we don't have an integration test for Avro IO on Python 3: > fastavro_it_test [1] depends on both avro and fastavro, however avro package > currently does not work with Beam on Python 3, so we don't have an > integration test that exercises Avro IO on Python 3. > We should add an integration test for Avro IO that does not need both > libraries at the same time, and instead can run using either library. > [~frederik] is this something you could help with? > cc: [~chamikara] [~Juta] > [1] > https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/fastavro_it_test.py -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK
[ https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=386739&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386739 ] ASF GitHub Bot logged work on BEAM-9291: Author: ASF GitHub Bot Created on: 13/Feb/20 16:44 Start Date: 13/Feb/20 16:44 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10829: [BEAM-9291] Upload graph option in dataflow's python sdk URL: https://github.com/apache/beam/pull/10829#issuecomment-585853970 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386739) Time Spent: 2h 20m (was: 2h 10m) > upload_graph support in Dataflow Python SDK > --- > > Key: BEAM-9291 > URL: https://issues.apache.org/jira/browse/BEAM-9291 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Radosław Stankiewicz >Assignee: Radosław Stankiewicz >Priority: Minor > Time Spent: 2h 20m > Remaining Estimate: 0h > > upload_graph option is not supported in Dataflow's Python SDK so there is no > workaround for large graphs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=386738&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386738 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 13/Feb/20 16:44 Start Date: 13/Feb/20 16:44 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#issuecomment-585853706 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386738) Time Spent: 21h 20m (was: 21h 10m) > Create a Spanner IO for Python > -- > > Key: BEAM-7246 > URL: https://issues.apache.org/jira/browse/BEAM-7246 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 21h 20m > Remaining Estimate: 0h > > Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only). > Testing in this work item will be in the form of DirectRunner tests and > manual testing. > Integration and performance tests are a separate work item (not included > here). > See https://beam.apache.org/documentation/io/built-in/. The goal is to add > Google Cloud Spanner to the Database column for the Python/Batch row. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=386741&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386741 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 13/Feb/20 16:49 Start Date: 13/Feb/20 16:49 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#issuecomment-585856253 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386741) Time Spent: 21.5h (was: 21h 20m) > Create a Spanner IO for Python > -- > > Key: BEAM-7246 > URL: https://issues.apache.org/jira/browse/BEAM-7246 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 21.5h > Remaining Estimate: 0h > > Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only). > Testing in this work item will be in the form of DirectRunner tests and > manual testing. > Integration and performance tests are a separate work item (not included > here). > See https://beam.apache.org/documentation/io/built-in/. The goal is to add > Google Cloud Spanner to the Database column for the Python/Batch row. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK
[ https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=386742&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386742 ] ASF GitHub Bot logged work on BEAM-9291: Author: ASF GitHub Bot Created on: 13/Feb/20 16:49 Start Date: 13/Feb/20 16:49 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10829: [BEAM-9291] Upload graph option in dataflow's python sdk URL: https://github.com/apache/beam/pull/10829#issuecomment-585856275 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386742) Time Spent: 2.5h (was: 2h 20m) > upload_graph support in Dataflow Python SDK > --- > > Key: BEAM-9291 > URL: https://issues.apache.org/jira/browse/BEAM-9291 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Radosław Stankiewicz >Assignee: Radosław Stankiewicz >Priority: Minor > Time Spent: 2.5h > Remaining Estimate: 0h > > upload_graph option is not supported in Dataflow's Python SDK so there is no > workaround for large graphs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9247) [Python] PTransform that integrates Cloud Vision functionality
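The two input shapes proposed in the comment above can be sketched with plain type aliases (all names here are hypothetical; only the input types come from the comment, and a dict stands in for vision.types.ImageContext):

```python
from typing import Optional, Tuple, Union

# Option 1: each element pairs an image (GCS path or raw bytes) with its own
# per-element ImageContext.
PerElementInput = Tuple[Union[str, bytes], Optional[dict]]

# Option 2: elements are bare images; a single ImageContext arrives as a
# side input, provided statically or dynamically.
SideInputElement = Union[str, bytes]

def which_transform(element) -> str:
    """Toy dispatcher: which of the two proposed transforms fits an element."""
    if isinstance(element, tuple):
        return "per-element-context"     # option 1
    return "context-as-side-input"       # option 2
```

The dispatcher is only illustrative; the point is that splitting the API avoids one transform having to accept both shapes.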
[ https://issues.apache.org/jira/browse/BEAM-9247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036348#comment-17036348 ] Ahmet Altay commented on BEAM-9247: --- I think this will be a confusing API. I will suggest 2 PTransforms of the form # input type : Union[text_type, binary_type], vision.types.ImageContext] # input type: Union[text_type, binary_type] and a side input with: vision.types.ImageContext (side input could be provided statically or dynamically.) If it is possible to build a unified PTransform with reduced code, offering the above two could be done with minimal wrapper code on top of the first. > [Python] PTransform that integrates Cloud Vision functionality > -- > > Key: BEAM-9247 > URL: https://issues.apache.org/jira/browse/BEAM-9247 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Elias Djurfeldt >Priority: Major > > The goal is to create a PTransform that integrates Google Cloud Vision API > functionality [1]. > [1] https://cloud.google.com/vision/docs/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9308) Optimize state cleanup at end-of-window
[ https://issues.apache.org/jira/browse/BEAM-9308?focusedWorklogId=386743&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386743 ] ASF GitHub Bot logged work on BEAM-9308: Author: ASF GitHub Bot Created on: 13/Feb/20 16:50 Start Date: 13/Feb/20 16:50 Worklog Time Spent: 10m Work Description: steveniemitz commented on pull request #10852: [BEAM-9308] Decorrelate state cleanup timers URL: https://github.com/apache/beam/pull/10852 In our larger streaming pipelines, we generally observe a short blip (1-3 minutes) in event processing, as well as an increase in lag following window closing. One reason for this is the state cleanup timers all firing once a window closes. We've been running this PR in our dev environment for a few days now, and the results are impressive. By decorrelating (jittering) the state cleanup timer, we spread the timer load across a short period of time, with the trade-off of holding state for a slightly longer period of time. In practice though, I've actually noticed our state cleans up QUICKER with this change, because the timers don't all compete with each other. I'd like to contribute this back (and could modify the core StatefulDoFn runner as well) if we agree this is something useful. There are a couple of points for discussion: - I chose 3 minutes arbitrarily based on some experimentation; should this be configurable somehow? - I use the "user" key (from their KV input) to derive a consistent jitter amount. The only real reason for this is to prevent the timer from moving around with each element (as it would if we used a random amount each time instead). I'm not sure if this actually matters in practice, since timers are supposed to be cheap to reset? - I added a counter which has been useful for debugging (and seeing how many keys are active each window), but could be removed. Interested to hear thoughts here. 
Here's a before and after of our pubsub latency: before: ![image](https://user-images.githubusercontent.com/1882981/74457778-bb9cf080-4e56-11ea-900c-69f2a4a28613.png) after: ![image](https://user-images.githubusercontent.com/1882981/74457812-c788b280-4e56-11ea-801e-a4b69a84a10b.png) Based on the counter I added, we're firing ~20 million timers, across 50 workers = ~400,000 timers / worker. So rather than having them all fire in one shot, we can spread them over 3 minutes, for only ~2,000 timers / sec, which is much more reasonable. cc @lukecwik @pabloem Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
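A minimal, runner-agnostic sketch of the jitter derivation described in the PR (names are hypothetical; the actual code is in #10852): hashing the user key yields a stable offset in [0, 180) seconds, so the cleanup timers for different keys spread out instead of all firing when the window closes.

```python
import hashlib

MAX_JITTER_SECONDS = 180  # ~3 minutes, the window the author chose experimentally

def cleanup_jitter(user_key: bytes, max_jitter: int = MAX_JITTER_SECONDS) -> int:
    """Return a stable per-key offset, in seconds, to add to the cleanup timer.

    Deriving the offset from the key (rather than drawing a fresh random
    number each time) keeps the timer target stable when it is re-set for
    the same key on later elements.
    """
    digest = hashlib.sha256(user_key).digest()
    return int.from_bytes(digest[:8], "big") % max_jitter
```

Per the numbers in the PR description, ~400,000 timers per worker spread over 180 seconds works out to roughly 2,200 timer firings per worker per second, versus all of them at once.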
[jira] [Work logged] (BEAM-9258) [Python] PTransform that connects to Cloud DLP deidentification service
[ https://issues.apache.org/jira/browse/BEAM-9258?focusedWorklogId=386759&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386759 ] ASF GitHub Bot logged work on BEAM-9258: Author: ASF GitHub Bot Created on: 13/Feb/20 17:10 Start Date: 13/Feb/20 17:10 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10849: [BEAM-9258] Integrate Google Cloud Data loss prevention functionality for Python SDK URL: https://github.com/apache/beam/pull/10849#discussion_r378995313 ## File path: sdks/python/apache_beam/ml/gcp/cloud_dlp.py ## @@ -0,0 +1,224 @@ +# /* +# * Licensed to the Apache Software Foundation (ASF) under one +# * or more contributor license agreements. See the NOTICE file +# * distributed with this work for additional information +# * regarding copyright ownership. The ASF licenses this file +# * to you under the Apache License, Version 2.0 (the +# * "License"); you may not use this file except in compliance +# * with the License. You may obtain a copy of the License at +# * +# * http://www.apache.org/licenses/LICENSE-2.0 +# * +# * Unless required by applicable law or agreed to in writing, software +# * distributed under the License is distributed on an "AS IS" BASIS, +# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# * See the License for the specific language governing permissions and +# * limitations under the License. +# */ + +"""``PTransforms`` that implement Google Cloud Data Loss Prevention +functionality. +""" + +from __future__ import absolute_import + +import logging + +from google.cloud import dlp_v2 + +import apache_beam as beam +from apache_beam.utils import retry +from apache_beam.utils.annotations import experimental + +__all__ = ['MaskDetectedDetails', 'InspectForDetails'] + +_LOGGER = logging.getLogger(__name__) + + +@experimental() +class MaskDetectedDetails(beam.PTransform): + """Scrubs sensitive information detected in text. 
+ The ``PTransform`` returns a ``PCollection`` of ``str`` + Example usage:: +pipeline | MaskDetectedDetails(project='example-gcp-project', + deidentification_config={ + 'info_type_transformations: { + 'transformations': [{ + 'primitive_transformation': { + 'character_mask_config': { + 'masking_character': '#' + } + } + }] + } + }, inspection_config={'info_types': [{'name': 'EMAIL_ADDRESS'}]}) + """ + def __init__( + self, + project=None, + deidentification_template_name=None, + deidentification_config=None, + inspection_template_name=None, + inspection_config=None, + timeout=None): +"""Initializes a :class:`MaskDetectedDetails` transform. +Args: + project (str): Required. GCP project in which the data processing is +to be done + deidentification_template_name (str): Either this or +`deidentification_config` required. Name of +deidentification template to be used on detected sensitive information +instances in text. + deidentification_config +(``Union[dict, google.cloud.dlp_v2.types.DeidentifyConfig]``): +Configuration for the de-identification of the content item. + inspection_template_name (str): This or `inspection_config` required. +Name of inspection template to be used +to detect sensitive data in text. + inspection_config +(``Union[dict, google.cloud.dlp_v2.types.InspectConfig]``): +Configuration for the inspector used to detect sensitive data in text. + timeout (float): Optional. The amount of time, in seconds, to wait for +the request to complete. +""" +self.config = {} +self.project = project +self.timeout = timeout +if project is None: + raise ValueError( + 'GCP project name needs to be specified in "project" property') +if deidentification_template_name is not None \ +and deidentification_config is not None: + raise ValueError( + 'Both deidentification_template_name and ' + 'deidentification_config were specified.' 
+ ' Please specify only one of these.') +elif deidentification_template_name is None \ +and deidentification_config is None: + raise ValueError( + 'deidentification_template_name or ' + 'deidentification_config must be specified.') +elif deidentification_template_name is not None: + self.config['deidentify_template_name'] = deidentification_template_name +else: + self.config['deidentify_config'] = deidentification_config + +if inspection_template_name is not None and inspection_config is not None: + raise ValueError( + 'Both inspe
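As quoted in the diff above, the docstring's example usage drops the closing quote on the 'info_type_transformations' key. The two configuration dicts from that example, written out as valid Python:

```python
# Corrected version of the configs from the quoted MaskDetectedDetails
# docstring example: mask detected email addresses with '#'.
deidentification_config = {
    'info_type_transformations': {
        'transformations': [{
            'primitive_transformation': {
                'character_mask_config': {
                    'masking_character': '#'
                }
            }
        }]
    }
}
inspection_config = {'info_types': [{'name': 'EMAIL_ADDRESS'}]}
```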
[jira] [Work logged] (BEAM-9258) [Python] PTransform that connects to Cloud DLP deidentification service
[ https://issues.apache.org/jira/browse/BEAM-9258?focusedWorklogId=386760&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386760 ] ASF GitHub Bot logged work on BEAM-9258: Author: ASF GitHub Bot Created on: 13/Feb/20 17:10 Start Date: 13/Feb/20 17:10 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10849: [BEAM-9258] Integrate Google Cloud Data loss prevention functionality for Python SDK URL: https://github.com/apache/beam/pull/10849#discussion_r378994491 ## File path: sdks/python/apache_beam/ml/gcp/cloud_dlp.py ## @@ -0,0 +1,224 @@ +# /* +# * Licensed to the Apache Software Foundation (ASF) under one +# * or more contributor license agreements. See the NOTICE file +# * distributed with this work for additional information +# * regarding copyright ownership. The ASF licenses this file +# * to you under the Apache License, Version 2.0 (the +# * "License"); you may not use this file except in compliance +# * with the License. You may obtain a copy of the License at +# * +# * http://www.apache.org/licenses/LICENSE-2.0 +# * +# * Unless required by applicable law or agreed to in writing, software +# * distributed under the License is distributed on an "AS IS" BASIS, +# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# * See the License for the specific language governing permissions and +# * limitations under the License. +# */ + +"""``PTransforms`` that implement Google Cloud Data Loss Prevention +functionality. +""" + +from __future__ import absolute_import + +import logging + +from google.cloud import dlp_v2 + +import apache_beam as beam +from apache_beam.utils import retry +from apache_beam.utils.annotations import experimental + +__all__ = ['MaskDetectedDetails', 'InspectForDetails'] + +_LOGGER = logging.getLogger(__name__) + + +@experimental() +class MaskDetectedDetails(beam.PTransform): + """Scrubs sensitive information detected in text. 
[jira] [Work logged] (BEAM-9258) [Python] PTransform that connects to Cloud DLP deidentification service
[ https://issues.apache.org/jira/browse/BEAM-9258?focusedWorklogId=386755&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386755 ] ASF GitHub Bot logged work on BEAM-9258: Author: ASF GitHub Bot Created on: 13/Feb/20 17:10 Start Date: 13/Feb/20 17:10 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10849: [BEAM-9258] Integrate Google Cloud Data loss prevention functionality for Python SDK URL: https://github.com/apache/beam/pull/10849#discussion_r378987762 ## File path: sdks/python/apache_beam/ml/gcp/cloud_dlp.py ## @@ -0,0 +1,224 @@ +# /* +# * Licensed to the Apache Software Foundation (ASF) under one +# * or more contributor license agreements. See the NOTICE file +# * distributed with this work for additional information +# * regarding copyright ownership. The ASF licenses this file +# * to you under the Apache License, Version 2.0 (the +# * "License"); you may not use this file except in compliance +# * with the License. You may obtain a copy of the License at +# * +# * http://www.apache.org/licenses/LICENSE-2.0 +# * +# * Unless required by applicable law or agreed to in writing, software +# * distributed under the License is distributed on an "AS IS" BASIS, +# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# * See the License for the specific language governing permissions and +# * limitations under the License. +# */ + +"""``PTransforms`` that implement Google Cloud Data Loss Prevention +functionality. +""" + +from __future__ import absolute_import + +import logging + +from google.cloud import dlp_v2 + +import apache_beam as beam +from apache_beam.utils import retry +from apache_beam.utils.annotations import experimental + +__all__ = ['MaskDetectedDetails', 'InspectForDetails'] + +_LOGGER = logging.getLogger(__name__) + + +@experimental() +class MaskDetectedDetails(beam.PTransform): + """Scrubs sensitive information detected in text. 
+ The ``PTransform`` returns a ``PCollection`` of ``str`` + Example usage:: +pipeline | MaskDetectedDetails(project='example-gcp-project', + deidentification_config={ + 'info_type_transformations: { + 'transformations': [{ + 'primitive_transformation': { + 'character_mask_config': { + 'masking_character': '#' + } + } + }] + } + }, inspection_config={'info_types': [{'name': 'EMAIL_ADDRESS'}]}) + """ + def __init__( + self, + project=None, + deidentification_template_name=None, + deidentification_config=None, + inspection_template_name=None, + inspection_config=None, + timeout=None): +"""Initializes a :class:`MaskDetectedDetails` transform. +Args: + project (str): Required. GCP project in which the data processing is Review comment: Would it make sense to default to the project from gcp options? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386755) Time Spent: 0.5h (was: 20m) > [Python] PTransform that connects to Cloud DLP deidentification service > --- > > Key: BEAM-9258 > URL: https://issues.apache.org/jira/browse/BEAM-9258 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Michał Walenia >Assignee: Michał Walenia >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9258) [Python] PTransform that connects to Cloud DLP deidentification service
[ https://issues.apache.org/jira/browse/BEAM-9258?focusedWorklogId=386756&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386756 ] ASF GitHub Bot logged work on BEAM-9258: Author: ASF GitHub Bot Created on: 13/Feb/20 17:10 Start Date: 13/Feb/20 17:10 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10849: [BEAM-9258] Integrate Google Cloud Data loss prevention functionality for Python SDK URL: https://github.com/apache/beam/pull/10849#discussion_r378989261 ## File path: sdks/python/apache_beam/ml/gcp/cloud_dlp.py ## @@ -0,0 +1,224 @@ +# /* +# * Licensed to the Apache Software Foundation (ASF) under one +# * or more contributor license agreements. See the NOTICE file +# * distributed with this work for additional information +# * regarding copyright ownership. The ASF licenses this file +# * to you under the Apache License, Version 2.0 (the +# * "License"); you may not use this file except in compliance +# * with the License. You may obtain a copy of the License at +# * +# * http://www.apache.org/licenses/LICENSE-2.0 +# * +# * Unless required by applicable law or agreed to in writing, software +# * distributed under the License is distributed on an "AS IS" BASIS, +# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# * See the License for the specific language governing permissions and +# * limitations under the License. +# */ + +"""``PTransforms`` that implement Google Cloud Data Loss Prevention +functionality. +""" + +from __future__ import absolute_import + +import logging + +from google.cloud import dlp_v2 + +import apache_beam as beam Review comment: Let's do more explicit imports here e.g. from apache_beam.transforms import ParDo from apache_beam.transforms import PTransform then use them directly like `PTransform` instead of the `beam.PTransform` style. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386756) Time Spent: 40m (was: 0.5h) > [Python] PTransform that connects to Cloud DLP deidentification service > --- > > Key: BEAM-9258 > URL: https://issues.apache.org/jira/browse/BEAM-9258 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Michał Walenia >Assignee: Michał Walenia >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9258) [Python] PTransform that connects to Cloud DLP deidentification service
[ https://issues.apache.org/jira/browse/BEAM-9258?focusedWorklogId=386758&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386758 ] ASF GitHub Bot logged work on BEAM-9258: Author: ASF GitHub Bot Created on: 13/Feb/20 17:10 Start Date: 13/Feb/20 17:10 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10849: [BEAM-9258] Integrate Google Cloud Data loss prevention functionality for Python SDK URL: https://github.com/apache/beam/pull/10849#discussion_r378998869 ## File path: sdks/python/setup.py ## @@ -203,6 +203,7 @@ def get_version(): 'google-cloud-bigquery>=1.6.0,<1.18.0', 'google-cloud-core>=0.28.1,<2', 'google-cloud-bigtable>=0.31.1,<1.1.0', +'google-cloud-dlp >=0.12.0,<=0.13.0', Review comment: Versions after 0.13 will not support py2. (notice at the top: https://googleapis.dev/python/dlp/latest/gapic/v2/api.html) I wonder if we need to add a comment note here for the person who will upgrade version ranges next? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386758) Time Spent: 50m (was: 40m) > [Python] PTransform that connects to Cloud DLP deidentification service > --- > > Key: BEAM-9258 > URL: https://issues.apache.org/jira/browse/BEAM-9258 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Michał Walenia >Assignee: Michał Walenia >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
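A sketch of the kind of note the reviewer suggests, against the quoted setup.py hunk (the surrounding list is abbreviated and the wording is illustrative, not the actual Beam source):

```python
# Excerpt-style sketch of the dependency list from sdks/python/setup.py, with
# the suggested note for whoever bumps version ranges next:
REQUIRED_PACKAGES = [
    "google-cloud-bigtable>=0.31.1,<1.1.0",
    # NOTE: google-cloud-dlp releases after 0.13 drop Python 2 support;
    # keep the upper bound at 0.13.x while Beam still supports py2.
    "google-cloud-dlp>=0.12.0,<=0.13.0",
]
```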
[jira] [Work logged] (BEAM-9258) [Python] PTransform that connects to Cloud DLP deidentification service
[ https://issues.apache.org/jira/browse/BEAM-9258?focusedWorklogId=386757&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386757 ] ASF GitHub Bot logged work on BEAM-9258: Author: ASF GitHub Bot Created on: 13/Feb/20 17:10 Start Date: 13/Feb/20 17:10 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10849: [BEAM-9258] Integrate Google Cloud Data loss prevention functionality for Python SDK URL: https://github.com/apache/beam/pull/10849#discussion_r378990795 ## File path: sdks/python/apache_beam/ml/gcp/cloud_dlp.py ## @@ -0,0 +1,224 @@ +# /* +# * Licensed to the Apache Software Foundation (ASF) under one +# * or more contributor license agreements. See the NOTICE file +# * distributed with this work for additional information +# * regarding copyright ownership. The ASF licenses this file +# * to you under the Apache License, Version 2.0 (the +# * "License"); you may not use this file except in compliance +# * with the License. You may obtain a copy of the License at +# * +# * http://www.apache.org/licenses/LICENSE-2.0 +# * +# * Unless required by applicable law or agreed to in writing, software +# * distributed under the License is distributed on an "AS IS" BASIS, +# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# * See the License for the specific language governing permissions and +# * limitations under the License. +# */ + +"""``PTransforms`` that implement Google Cloud Data Loss Prevention +functionality. +""" + +from __future__ import absolute_import + +import logging + +from google.cloud import dlp_v2 + +import apache_beam as beam +from apache_beam.utils import retry +from apache_beam.utils.annotations import experimental + +__all__ = ['MaskDetectedDetails', 'InspectForDetails'] + +_LOGGER = logging.getLogger(__name__) + + +@experimental() +class MaskDetectedDetails(beam.PTransform): + """Scrubs sensitive information detected in text. 
[jira] [Work logged] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
[ https://issues.apache.org/jira/browse/BEAM-9228?focusedWorklogId=386765&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386765 ] ASF GitHub Bot logged work on BEAM-9228: Author: ASF GitHub Bot Created on: 13/Feb/20 17:23 Start Date: 13/Feb/20 17:23 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on issue #10847: [BEAM-9228] Support further partition for FnApi ListBuffer URL: https://github.com/apache/beam/pull/10847#issuecomment-585873015 errors: https://builds.apache.org/job/beam_PreCommit_Python_Commit/11130/testReport/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386765) Time Spent: 50m (was: 40m) > _SDFBoundedSourceWrapper doesn't distribute data to multiple workers > > > Key: BEAM-9228 > URL: https://issues.apache.org/jira/browse/BEAM-9228 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.16.0, 2.18.0 >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > A user reported the following issue. > - > I have a set of tfrecord files, obtained by converting parquet files with > Spark. Each file is roughly 1GB and I have 11 of those. > I would expect simple statistics gathering (i.e. counting the number of items across > all files) to scale linearly with respect to the number of cores on my system. 
> I am able to reproduce the issue with the minimal snippet below > {code:java} > import apache_beam as beam > from apache_beam.options.pipeline_options import PipelineOptions > from apache_beam.runners.portability import fn_api_runner > from apache_beam.portability.api import beam_runner_api_pb2 > from apache_beam.portability import python_urns > import sys > pipeline_options = PipelineOptions(['--direct_num_workers', '4']) > file_pattern = 'part-r-00* > runner=fn_api_runner.FnApiRunner( > default_environment=beam_runner_api_pb2.Environment( > urn=python_urns.SUBPROCESS_SDK, > payload=b'%s -m apache_beam.runners.worker.sdk_worker_main' > % sys.executable.encode('ascii'))) > p = beam.Pipeline(runner=runner, options=pipeline_options) > lines = (p | 'read' >> beam.io.tfrecordio.ReadFromTFRecord(file_pattern) > | beam.combiners.Count.Globally() > | beam.io.WriteToText('/tmp/output')) > p.run() > {code} > Only one combination of apache_beam revision / worker type seems to work (I > refer to https://beam.apache.org/documentation/runners/direct/ for the worker > types) > * beam 2.16; neither multithread nor multiprocess achieve high cpu usage on > multiple cores > * beam 2.17: able to achieve high cpu usage on all 4 cores > * beam 2.18: not tested the mulithreaded mode but the multiprocess mode fails > when trying to serialize the Environment instance most likely because of a > change from 2.17 to 2.18. > I also tried briefly SparkRunner with version 2.16 but was no able to achieve > any throughput. > What is the recommnended way to achieve what I am trying to ? How can I > troubleshoot ? > -- > This is caused by [this > PR|https://github.com/apache/beam/commit/02f8ad4eee3ec0ea8cbdc0f99c1dad29f00a9f60]. > A [workaround|https://github.com/apache/beam/pull/10729] is tried, which is > rolling back iobase.py not to use _SDFBoundedSourceWrapper. This confirmed > that data is distributed to multiple workers, however, there are some > regressions with SDF wrapper tests. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
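The scaling expectation in the report above (per-shard counting should parallelize across cores) can be sketched without Beam; the counting function and shard contents below are hypothetical stand-ins for TFRecord files:

```python
def count_records(shard):
    # Stand-in for "count the items in one tfrecord shard".
    return len(shard)

def count_all(shards, mapper=map):
    # With the default builtin `map`, everything runs in one process,
    # which is the behavior the bug report observes. Passing
    # multiprocessing.Pool(n).map instead fans the shards out across
    # processes, which is the roughly-linear scaling the user expected:
    #
    #   with Pool(4) as pool:
    #       total = count_all(shards, pool.map)
    return sum(mapper(count_records, shards))
```

The bug is that `_SDFBoundedSourceWrapper` effectively pins the whole read to one worker, so the `mapper` never receives more than a single unit of work.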
[jira] [Work logged] (BEAM-9292) Provide an ability to specify additional maven repositories for published POMs
[ https://issues.apache.org/jira/browse/BEAM-9292?focusedWorklogId=386772&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386772 ] ASF GitHub Bot logged work on BEAM-9292: Author: ASF GitHub Bot Created on: 13/Feb/20 17:26 Start Date: 13/Feb/20 17:26 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #10832: [BEAM-9292] Provide an ability to specify additional maven repositories for published POMs URL: https://github.com/apache/beam/pull/10832#issuecomment-585874564 Sorry for the slowness on review - LGTM for sure. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386772) Time Spent: 1h 40m (was: 1.5h) > Provide an ability to specify additional maven repositories for published POMs > -- > > Key: BEAM-9292 > URL: https://issues.apache.org/jira/browse/BEAM-9292 > Project: Beam > Issue Type: Improvement > Components: build-system, io-java-kafka >Reporter: Alexey Romanenko >Assignee: Alexey Romanenko >Priority: Major > Fix For: 2.20.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > To support Confluent Schema Registry, KafkaIO has a dependency on > {{io.confluent:kafka-avro-serializer}} from > https://packages.confluent.io/maven/ repository. In this case, it should add > this repository into published KafkaIO POM file. 
Otherwise, it will fail with > the following error while building a user pipeline: > {code} > [ERROR] Failed to execute goal on project kafka-io: Could not resolve > dependencies for project org.apache.beam.issues:kafka-io:jar:1.0.0-SNAPSHOT: > Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.2 in > central (https://repo.maven.apache.org/maven2) -> [Help 1] > {code} > The repositories for publishing can be added via the {{mavenRepositories}} > argument in the build script for Java configuration. > For example (KafkaIO): > {code} > $ cat sdks/java/io/kafka/build.gradle > ... > applyJavaNature( > ... > mavenRepositories: [ > [id: 'io.confluent', url: 'https://packages.confluent.io/maven/'] > ] > ) > ... > {code} > It will generate the following xml snippet in the pom file of the > {{beam-sdks-java-io-kafka}} artifact after publishing: > {code} > <repositories> > <repository> > <id>io.confluent</id> > <url>https://packages.confluent.io/maven/</url> > </repository> > </repositories> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chamikara Madhusanka Jayalath reassigned BEAM-3788: --- Assignee: Chamikara Madhusanka Jayalath > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > > This will be implemented using the Splittable DoFn framework. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chamikara Madhusanka Jayalath updated BEAM-3788: Description: Java KafkaIO will be made available to Python users as a cross-language transform. (was: This will be implemented using the Splittable DoFn framework.) > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > > Java KafkaIO will be made available to Python users as a cross-language > transform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read
[ https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=386776&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386776 ] ASF GitHub Bot logged work on BEAM-8382: Author: ASF GitHub Bot Created on: 13/Feb/20 17:32 Start Date: 13/Feb/20 17:32 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #9765: [WIP][BEAM-8382] Add rate limit policy to KinesisIO.Read URL: https://github.com/apache/beam/pull/9765#issuecomment-585877558 @jfarr Thank you very much for detailed testing and explanation. In the end, what do you think would be the best solution in this case? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386776) Time Spent: 12h 20m (was: 12h 10m) > Add polling interval to KinesisIO.Read > -- > > Key: BEAM-8382 > URL: https://issues.apache.org/jira/browse/BEAM-8382 > Project: Beam > Issue Type: Improvement > Components: io-java-kinesis >Affects Versions: 2.13.0, 2.14.0, 2.15.0 >Reporter: Jonothan Farr >Assignee: Jonothan Farr >Priority: Major > Time Spent: 12h 20m > Remaining Estimate: 0h > > With the current implementation we are observing Kinesis throttling due to > ReadProvisionedThroughputExceeded on the order of hundreds of times per > second, regardless of the actual Kinesis throughput. This is because the > ShardReadersPool readLoop() method is polling getRecords() as fast as > possible. > From the KDS documentation: > {quote}Each shard can support up to five read transactions per second. > {quote} > and > {quote}For best results, sleep for at least 1 second (1,000 milliseconds) > between calls to getRecords to avoid exceeding the limit on getRecords > frequency. 
> {quote} > [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html] > [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
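The fix being discussed amounts to spacing getRecords() calls by a minimum interval. A minimal, Beam-independent sketch with an injectable clock so the logic is testable (class and method names are illustrative, not KinesisIO's actual API):

```python
class FixedDelayPolicy:
    """Ensure successive polls are at least `interval` seconds apart.

    `clock` returns the current time in seconds; `sleep` blocks for the
    given number of seconds. Injecting both keeps the policy testable
    without real waiting.
    """

    def __init__(self, interval, clock, sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self._last = None  # time of the previous poll, None before first

    def before_poll(self):
        # Call this immediately before each getRecords() request.
        now = self.clock()
        if self._last is not None:
            wait = self.interval - (now - self._last)
            if wait > 0:
                self.sleep(wait)
        self._last = self.clock()
```

With `interval=1.0` this matches the ~1s spacing AWS suggests; the hard KDS limit of 5 read transactions per second per shard would correspond to `interval=0.2`.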
[jira] [Resolved] (BEAM-9059) Migrate PTransformTranslation to use string constants
[ https://issues.apache.org/jira/browse/BEAM-9059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Cwik resolved BEAM-9059. - Fix Version/s: 2.20.0 Resolution: Fixed > Migrate PTransformTranslation to use string constants > - > > Key: BEAM-9059 > URL: https://issues.apache.org/jira/browse/BEAM-9059 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: Trivial > Fix For: 2.20.0 > > Time Spent: 3.5h > Remaining Estimate: 0h > > This allows for the values to be used within switch case statements. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9309) Remove deprecated primitive beam:transform:read:v1 and beam_fn_api_use_deprecated_read experiment everywhere
Luke Cwik created BEAM-9309: --- Summary: Remove deprecated primitive beam:transform:read:v1 and beam_fn_api_use_deprecated_read experiment everywhere Key: BEAM-9309 URL: https://issues.apache.org/jira/browse/BEAM-9309 Project: Beam Issue Type: Bug Components: beam-model, sdk-java-core Reporter: Luke Cwik Remove all references and usages of the experiment beam_fn_api_use_deprecated_read and also the deprecated transform beam:transform:read:v1 everywhere. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-3788) Implement a Kafka IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036376#comment-17036376 ] Chamikara Madhusanka Jayalath commented on BEAM-3788: - We are actively working on making Java KafkaIO available to Python users as a cross-language transform for all runners. The current solution [1] makes the Java Kafka source, which is based on the UnboundedSource framework, available through the cross-language transforms framework. But this will not really work for portable runners without transform overrides, since the UnboundedSource framework is not available to portable runners (this currently works for Flink because FlinkRunner overrides this source with a native source implementation). We need a Kafka cross-language transform based on an SDF version of KafkaIO which can be made available to Python users. In parallel, we are working on an UnboundedSource-to-SDF converter, so this can simply be a cross-language transform wrapping an SDF generated from the existing Kafka UnboundedSource through this converter. [1] [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/external/kafka.py] > Implement a Kafka IO for Python SDK > --- > > Key: BEAM-3788 > URL: https://issues.apache.org/jira/browse/BEAM-3788 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > > Java KafkaIO will be made available to Python users as a cross-language > transform. -- This message was sent by Atlassian Jira (v8.3.4#803005)
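For intuition, the SDF machinery the comment refers to revolves around claimable, splittable restrictions. A toy offset tracker (heavily simplified; not Beam's actual RestrictionTracker API) shows the two operations an UnboundedSource-to-SDF converter must provide so a runner can parallelize and checkpoint the read:

```python
class OffsetRestrictionTracker:
    """Tracks work over the half-open offset range [start, stop)."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
        self.position = start  # next offset that may be claimed

    def try_claim(self, offset):
        # The DoFn must claim each offset before processing it; a False
        # return means the runner has taken that work away.
        if offset < self.position or offset >= self.stop:
            return False
        self.position = offset + 1
        return True

    def try_split(self, at):
        # Split off [at, stop) as residual work for another worker;
        # keep [start, at) as the primary. Returns None if `at` is not
        # strictly between the current position and stop.
        if at <= self.position or at >= self.stop:
            return None
        residual = (at, self.stop)
        self.stop = at
        return residual
```

It is this splitting step that `_SDFBoundedSourceWrapper`-style reads rely on to distribute data across workers.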
[jira] [Assigned] (BEAM-9085) Investigate performance difference between Python 2/3 on Dataflow
[ https://issues.apache.org/jira/browse/BEAM-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev reassigned BEAM-9085: - Assignee: Kamil Wasilewski (was: Valentyn Tymofieiev) > Investigate performance difference between Python 2/3 on Dataflow > - > > Key: BEAM-9085 > URL: https://issues.apache.org/jira/browse/BEAM-9085 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > > Tests show that the performance of core Beam operations in Python 3.x on > Dataflow can be a few times slower than in Python 2.7. We should investigate > the cause of the problem. > Currently, we have one ParDo test that is run both in Py3 and Py2 [1]. A > dashboard with runtime results can be found here [2]. > [1] sdks/python/apache_beam/testing/load_tests/pardo_test.py > [2] https://apache-beam-testing.appspot.com/explore?dashboard=5678187241537536 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9085) Investigate performance difference between Python 2/3 on Dataflow
[ https://issues.apache.org/jira/browse/BEAM-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036377#comment-17036377 ] Valentyn Tymofieiev commented on BEAM-9085: --- Thanks, [~kamilwu]. I'll reassign the issue to you for now, for the follow-up changes in the test infra. > Investigate performance difference between Python 2/3 on Dataflow > - > > Key: BEAM-9085 > URL: https://issues.apache.org/jira/browse/BEAM-9085 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Kamil Wasilewski >Assignee: Valentyn Tymofieiev >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9085) Performance regression in np.random.RandomState() skews performance test results across Python 2/3 on Dataflow
[ https://issues.apache.org/jira/browse/BEAM-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev updated BEAM-9085: -- Summary: Performance regression in np.random.RandomState() skews performance test results across Python 2/3 on Dataflow (was: Investigate performance difference between Python 2/3 on Dataflow) > Performance regression in np.random.RandomState() skews performance test > results across Python 2/3 on Dataflow > -- > > Key: BEAM-9085 > URL: https://issues.apache.org/jira/browse/BEAM-9085 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
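One way to isolate a constructor-cost regression like the one named in the retitled issue is a microbenchmark that times object construction separately from reuse of an existing instance. A stdlib-only sketch of such a harness (using `random.Random` purely as a stand-in for `np.random.RandomState`, which is what the issue actually measures):

```python
import random
import timeit

def per_call_cost(stmt, setup='pass', number=1000):
    """Average wall-clock seconds per execution of `stmt`."""
    total = timeit.timeit(stmt, setup=setup, number=number, globals=globals())
    return total / number

# Construction cost: pay the RNG setup price on every call.
construction = per_call_cost('random.Random()')

# Reuse cost: construct once in setup, then only draw numbers.
reuse = per_call_cost('rng.random()', setup='rng = random.Random()')
```

If construction dominates and a hot code path constructs a fresh RNG per element or per bundle, a library-level regression in the constructor shows up directly as per-element overhead, skewing cross-version comparisons exactly as described.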
[jira] [Work logged] (BEAM-9279) Make HBase.ReadAll based on Reads instead of HBaseQuery
[ https://issues.apache.org/jira/browse/BEAM-9279?focusedWorklogId=386782&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386782 ] ASF GitHub Bot logged work on BEAM-9279: Author: ASF GitHub Bot Created on: 13/Feb/20 17:49 Start Date: 13/Feb/20 17:49 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #10815: [BEAM-9279] Make HBase.ReadAll based on Reads instead of HBaseQuery URL: https://github.com/apache/beam/pull/10815#issuecomment-585885902 Thank you for the contribution! Sorry for the delay with the review - I'll be off next week and will get back to this PR after that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386782) Time Spent: 50m (was: 40m) > Make HBase.ReadAll based on Reads instead of HBaseQuery > --- > > Key: BEAM-9279 > URL: https://issues.apache.org/jira/browse/BEAM-9279 > Project: Beam > Issue Type: Improvement > Components: io-java-hbase >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > HBaseIO support for SplittableDoFn introduced a new request type, HBaseQuery; > however, the attributes defined in that class are already available in > HBaseIO.Read. Letting users define pipelines based on HBaseIO.Read makes it > possible to read from multiple clusters, because the Configuration is now > part of the request object. -- This message was sent by Atlassian Jira (v8.3.4#803005)
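The design point here (each read request carries its own connection Configuration, so one ReadAll-style transform can span clusters) can be sketched as follows; class and function names are hypothetical illustrations, not HBaseIO's real classes:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReadRequest:
    # `cluster` stands in for the per-request HBase Configuration that
    # BEAM-9279 moves into the request object itself.
    cluster: str
    table_id: str

def read_all(requests, connect):
    """Process a stream of ReadRequests, connecting per request.

    `connect` maps a cluster identifier to a callable client, so two
    requests naming different clusters are served by different
    connections within the same transform.
    """
    results = []
    for req in requests:
        client = connect(req.cluster)
        results.extend(client(req.table_id))
    return results
```

Under the old HBaseQuery design the configuration lived on the transform, so every element of the input PCollection was forced onto the same cluster.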
[jira] [Updated] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
[ https://issues.apache.org/jira/browse/BEAM-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hannah Jiang updated BEAM-9228: --- Fix Version/s: 2.20.0 > _SDFBoundedSourceWrapper doesn't distribute data to multiple workers > > > Key: BEAM-9228 > URL: https://issues.apache.org/jira/browse/BEAM-9228 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.16.0, 2.18.0 >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.20.0 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
[ https://issues.apache.org/jira/browse/BEAM-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hannah Jiang updated BEAM-9228: --- Affects Version/s: 2.19.0 > _SDFBoundedSourceWrapper doesn't distribute data to multiple workers > > > Key: BEAM-9228 > URL: https://issues.apache.org/jira/browse/BEAM-9228 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.16.0, 2.18.0, 2.19.0 >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.20.0 > > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9279) Make HBase.ReadAll based on Reads instead of HBaseQuery
[ https://issues.apache.org/jira/browse/BEAM-9279?focusedWorklogId=386784&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386784 ] ASF GitHub Bot logged work on BEAM-9279: Author: ASF GitHub Bot Created on: 13/Feb/20 17:59 Start Date: 13/Feb/20 17:59 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on pull request #10815: [BEAM-9279] Make HBase.ReadAll based on Reads instead of HBaseQuery URL: https://github.com/apache/beam/pull/10815#discussion_r379025287 File path: sdks/java/io/hbase/src/main/java/org/apache/beam/sdk/io/hbase/HBaseIO.java
{code:java}
@@ -240,63 +245,109 @@ private Read(
     @Override
     public void populateDisplayData(DisplayData.Builder builder) {
       super.populateDisplayData(builder);
-      builder.add(DisplayData.item("configuration", serializableConfiguration.get().toString()));
+      builder.add(DisplayData.item("configuration", configuration.toString()));
       builder.add(DisplayData.item("tableId", tableId));
-      builder.addIfNotNull(DisplayData.item("scan", serializableScan.get().toString()));
+      builder.addIfNotNull(DisplayData.item("scan", scan.toString()));
     }

     public Configuration getConfiguration() {
-      return serializableConfiguration.get();
+      return configuration;
     }

     public String getTableId() {
       return tableId;
     }

     public Scan getScan() {
-      return serializableScan.get();
+      return scan;
     }

     /** Returns the range of keys that will be read from the table. */
     public ByteKeyRange getKeyRange() {
-      byte[] startRow = serializableScan.get().getStartRow();
-      byte[] stopRow = serializableScan.get().getStopRow();
+      byte[] startRow = scan.getStartRow();
+      byte[] stopRow = scan.getStopRow();
       return ByteKeyRange.of(ByteKey.copyFrom(startRow), ByteKey.copyFrom(stopRow));
     }

-    private final SerializableConfiguration serializableConfiguration;
+    @Override
+    public boolean equals(Object o) {
+      if (this == o) {
+        return true;
+      }
+      if (o == null || getClass() != o.getClass()) {
+        return false;
+      }
+      Read read = (Read) o;
+      return configuration.toString().equals(read.configuration.toString())
+          && Objects.equals(tableId, read.tableId)
+          && scan.toString().equals(read.scan.toString());
+    }
+
+    @Override
+    public int hashCode() {
+      return Objects.hash(configuration, tableId, scan);
+    }
+
+    private Object writeReplace() {
+      return new SerializationProxy(this);
+    }
+
+    private static class SerializationProxy implements Serializable {
+      public SerializationProxy() {}
+
+      public SerializationProxy(Read read) {
+        configuration = read.configuration;
+        tableId = read.tableId;
+        scan = read.scan;
+      }
+
+      private void writeObject(ObjectOutputStream out) throws IOException {
+        SerializableCoder.of(SerializableConfiguration.class)
+            .encode(new SerializableConfiguration(this.configuration), out);
+        StringUtf8Coder.of().encode(this.tableId, out);
+        ProtobufUtil.toScan(this.scan).writeDelimitedTo(out);
+      }
+
+      private void readObject(ObjectInputStream in) throws IOException {
+        this.configuration = SerializableCoder.of(SerializableConfiguration.class).decode(in).get();
+        this.tableId = StringUtf8Coder.of().decode(in);
+        this.scan = ProtobufUtil.toScan(ClientProtos.Scan.parseDelimitedFrom(in));
+      }
+
+      Object readResolve() {
+        return HBaseIO.read().withConfiguration(configuration).withTableId(tableId).withScan(scan);
+      }
+
+      private Configuration configuration;
+      private String tableId;
+      private Scan scan;
+    }
+
+    @SuppressFBWarnings("SE_BAD_FIELD")
+    private final Configuration configuration;
+
     private final String tableId;
-    private final SerializableScan serializableScan;
+
+    @SuppressFBWarnings("SE_BAD_FIELD")
+    private final Scan scan;
   }

   /**
    * A {@link PTransform} that works like {@link #read}, but executes read operations coming from a
-   * {@link PCollection} of {@link HBaseQuery}.
+   * {@link PCollection} of {@link Read}.
    */
   public static ReadAll readAll() {
-    return new ReadAll(null);
+    return new ReadAll();
   }

   /** Implementation of {@link #readAll}. */
-  public static class ReadAll extends PTransform<PCollection<Read>, PCollection<Result>> {
-
-    private ReadAll(SerializableConfiguration serializableConfiguration) {
-      this.serializableConfiguration = serializableConfiguration;
-    }
-
-    /** Reads from the HBase instance indicated by the given configuration. */
-    public ReadAll withConfiguration(Configuration configuration) {
-      checkArgument(configuration != null, "configuration can not be null");
-      return
{code}
[jira] [Work logged] (BEAM-8758) Beam Dependency Update Request: com.google.cloud:google-cloud-spanner
[ https://issues.apache.org/jira/browse/BEAM-8758?focusedWorklogId=386785&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386785 ] ASF GitHub Bot logged work on BEAM-8758: Author: ASF GitHub Bot Created on: 13/Feb/20 18:08 Start Date: 13/Feb/20 18:08 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10765: [BEAM-8758] Google-cloud-spanner upgrade to 1.49.1 URL: https://github.com/apache/beam/pull/10765#issuecomment-585894780 Going to implement Luke's suggestion. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386785) Time Spent: 4h 20m (was: 4h 10m) > Beam Dependency Update Request: com.google.cloud:google-cloud-spanner > - > > Key: BEAM-8758 > URL: https://issues.apache.org/jira/browse/BEAM-8758 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Tomo Suzuki >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > - 2019-11-19 21:05:29.289016 > - > Please consider upgrading the dependency > com.google.cloud:google-cloud-spanner. > The current version is 1.6.0. The latest version is 1.46.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-02 12:11:08.926875 > - > Please consider upgrading the dependency > com.google.cloud:google-cloud-spanner. > The current version is 1.6.0. The latest version is 1.46.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-09 12:10:16.400168 > - > Please consider upgrading the dependency > com.google.cloud:google-cloud-spanner. 
> The current version is 1.6.0. The latest version is 1.47.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-23 12:10:17.656471 > - > Please consider upgrading the dependency > com.google.cloud:google-cloud-spanner. > The current version is 1.6.0. The latest version is 1.47.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-30 14:05:49.080960 > - > Please consider upgrading the dependency > com.google.cloud:google-cloud-spanner. > The current version is 1.6.0. The latest version is 1.47.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2020-01-06 12:09:23.346857 > - > Please consider upgrading the dependency > com.google.cloud:google-cloud-spanner. > The current version is 1.6.0. The latest version is 1.47.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2020-01-13 12:09:02.023131 > - > Please consider upgrading the dependency > com.google.cloud:google-cloud-spanner. > The current version is 1.6.0. The latest version is 1.47.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2020-01-20 12:08:38.419575 > - > Please consider upgrading the dependency > com.google.cloud:google-cloud-spanner. > The current version is 1.6.0. The latest version is 1.48.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2020-01-27 12:09:44.29834
[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage
[ https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=386794&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386794 ] ASF GitHub Bot logged work on BEAM-8889: Author: ASF GitHub Bot Created on: 13/Feb/20 18:34 Start Date: 13/Feb/20 18:34 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10769: [BEAM-8889] Upgrades gcsio to 2.0.0 URL: https://github.com/apache/beam/pull/10769#issuecomment-585905517 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386794) Remaining Estimate: 153h (was: 153h 10m) Time Spent: 15h (was: 14h 50m) > Make GcsUtil use GoogleCloudStorage > --- > > Key: BEAM-8889 > URL: https://issues.apache.org/jira/browse/BEAM-8889 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Esun Kim >Assignee: VASU NORI >Priority: Major > Labels: gcs > Original Estimate: 168h > Time Spent: 15h > Remaining Estimate: 153h > > [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java] > is a primary class to access Google Cloud Storage on Apache Beam. Current > implementation directly creates GoogleCloudStorageReadChannel and > GoogleCloudStorageWriteChannel by itself to read and write GCS data rather > than using > [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java] > which is an abstract class providing basic IO capability which eventually > creates channel objects. 
This request is about updating GcsUtil to use > GoogleCloudStorage to create read and write channels, which is expected to be > more flexible because it can easily pick up new changes, e.g. a new channel > implementation using a new protocol, without code changes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
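The refactor asks GcsUtil to depend on the GoogleCloudStorage abstraction rather than constructing channels itself. A language-agnostic sketch of that dependency inversion, in Python with illustrative names (the real classes are Java):

```python
from abc import ABC, abstractmethod

class Storage(ABC):
    """Plays the role of the GoogleCloudStorage abstraction."""

    @abstractmethod
    def open_read(self, path):
        ...

    @abstractmethod
    def open_write(self, path):
        ...

class InMemoryStorage(Storage):
    # One possible implementation; a new protocol would be another
    # subclass, picked up by StorageUtil without any code change there.
    def __init__(self):
        self.blobs = {}

    def open_read(self, path):
        return self.blobs[path]

    def open_write(self, path):
        def write(data):
            self.blobs[path] = data
        return write

class StorageUtil:
    """Depends only on the Storage interface, as GcsUtil should depend
    only on GoogleCloudStorage instead of building channels directly."""

    def __init__(self, storage):
        self.storage = storage

    def copy(self, src, dst):
        self.storage.open_write(dst)(self.storage.open_read(src))
```

The payoff is exactly the one the issue describes: swapping in a new channel implementation touches only the `Storage` subclass, never the utility layer.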