[jira] [Work logged] (BEAM-9287) Python Validates runner tests for Unified Worker
[ https://issues.apache.org/jira/browse/BEAM-9287?focusedWorklogId=387194&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387194 ] ASF GitHub Bot logged work on BEAM-9287: Author: ASF GitHub Bot Created on: 14/Feb/20 07:56 Start Date: 14/Feb/20 07:56 Worklog Time Spent: 10m Work Description: ananvay commented on issue #10863: [BEAM-9287] Add Python streaming Validates runner tests for Unified Worker URL: https://github.com/apache/beam/pull/10863#issuecomment-586141508 Thanks Ankur! LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387194) Time Spent: 0.5h (was: 20m) > Python Validates runner tests for Unified Worker > > > Key: BEAM-9287 > URL: https://issues.apache.org/jira/browse/BEAM-9287 > Project: Beam > Issue Type: Test > Components: runner-dataflow, testing >Reporter: Ankur Goenka >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=387187&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387187 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 14/Feb/20 07:40 Start Date: 14/Feb/20 07:40 Worklog Time Spent: 10m Work Description: amoght commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-586136772 @lukecwik we are working on all the suggestions provided by you, will be updating the PR in a few days. Thank you for your patience. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387187) Time Spent: 8h 40m (was: 8.5h) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 8h 40m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9299) Upgrade Flink Runner to 1.8.3 and 1.9.2
[ https://issues.apache.org/jira/browse/BEAM-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036700#comment-17036700 ] sunjincheng edited comment on BEAM-9299 at 2/14/20 5:42 AM: I found that the issue FLINK-15844 also exists in Flink 1.8.3. I noticed that FLINK-15844 has provided a fix for 1.9.3. However, AFAIK, there will be no new 1.8 releases any more according to FLink's [release policy|#update-policy-for-old-releases]]. There are two solutions in my mind: - Solution1: As the signature of `JobWithJars.buildUserCodeClassLoader `has changed in both Flink 1.8.3 and 1.9.2 and Flink 1.7.2 still uses the old signature, we could drop the Flink 1.7 support firstly and then update the implementation of `FlinkExecutionEnvironments` to use the new signature. - Solution2: We could make a copy of `FlinkExecutionEnvironments` in each version of Flink runner and update the implementation for each copy according to the Flink version. This solution decouples the drop of Flink 1.7 support and the upgrades of 1.8 and 1.9. Besides, Flink community has made big change for the job submission path(FLINK-13954) in 1.10, e.g. `JobWithJars` has been removed in 1.10. It means that the job submission logic will be separate for 1.8/1.9 and 1.10 anyway. Personally I prefer solution2 and what's your thought? :) [~iemejia] [~mxm] [~thw] was (Author: sunjincheng121): I found that the issue FLINK-15844 also exists in Flink 1.8.3. I noticed that FLINK-15844 has provided a fix for 1.9.3. However, AFAIK, there will be no new 1.8 releases any more according to FLink's [release policy|#update-policy-for-old-releases]]. There are two solutions in my mind: - Solution1: As the signature of `JobWithJars.buildUserCodeClassLoader `has changed in both Flink 1.8.3 and 1.9.2 and Flink 1.7.2 still uses the old signature, we could drop the Flink 1.7 support firstly and then update the implementation of `FlinkExecutionEnvironments` to use the new signature. - Solution2: We could make a copy of `FlinkExecutionEnvironments` in each version of Flink runner and update the implementation for each copy according to the Flink version. This solution decouples the drop of Flink 1.7 support and the upgrades of 1.8 and 1.9. Besides, Flink community has made big change for the job submission path(FLINK-13954) in 1.10, e.g. `JobWithJars` has been removed in 1.10. It means that the job submission logic will be separate for 1.8/1.9 and 1.10 anyway. Personally I prefer solution2 and what's your thought? :) > Upgrade Flink Runner to 1.8.3 and 1.9.2 > --- > > Key: BEAM-9299 > URL: https://issues.apache.org/jira/browse/BEAM-9299 > Project: Beam > Issue Type: Task > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > Time Spent: 40m > Remaining Estimate: 0h > > I would like to Upgrade Flink Runner to 18.3 and 1.9.2 due to both the Apache > Flink 1.8.3 and Apache Flink 1.9.2 have been released [1]. > What do you think? > [1] https://dist.apache.org/repos/dist/release/flink/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9299) Upgrade Flink Runner to 1.8.3 and 1.9.2
[ https://issues.apache.org/jira/browse/BEAM-9299?focusedWorklogId=387147&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387147 ] ASF GitHub Bot logged work on BEAM-9299: Author: ASF GitHub Bot Created on: 14/Feb/20 05:39 Start Date: 14/Feb/20 05:39 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on issue #10850: [BEAM-9299] Upgrade Flink Runner from 1.8.2 to 1.8.3 URL: https://github.com/apache/beam/pull/10850#issuecomment-586106747 Hi @angoenka Thanks for your comment, I found that the issue FLINK-15844 also exists in Flink 1.8.3. I noticed that FLINK-15844 has provided a fix for 1.9.3. However, AFAIK, there will be no new 1.8 releases any more according to Flink's release policy]. There are two solutions I have left in the JIRA. I appreciate if you have a look at it :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387147) Time Spent: 40m (was: 0.5h) > Upgrade Flink Runner to 1.8.3 and 1.9.2 > --- > > Key: BEAM-9299 > URL: https://issues.apache.org/jira/browse/BEAM-9299 > Project: Beam > Issue Type: Task > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > Time Spent: 40m > Remaining Estimate: 0h > > I would like to Upgrade Flink Runner to 18.3 and 1.9.2 due to both the Apache > Flink 1.8.3 and Apache Flink 1.9.2 have been released [1]. > What do you think? > [1] https://dist.apache.org/repos/dist/release/flink/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9299) Upgrade Flink Runner to 1.8.3 and 1.9.2
[ https://issues.apache.org/jira/browse/BEAM-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036700#comment-17036700 ] sunjincheng edited comment on BEAM-9299 at 2/14/20 5:35 AM: I found that the issue FLINK-15844 also exists in Flink 1.8.3. I noticed that FLINK-15844 has provided a fix for 1.9.3. However, AFAIK, there will be no new 1.8 releases any more according to FLink's [release policy|#update-policy-for-old-releases]]. There are two solutions in my mind: - Solution1: As the signature of `JobWithJars.buildUserCodeClassLoader `has changed in both Flink 1.8.3 and 1.9.2 and Flink 1.7.2 still uses the old signature, we could drop the Flink 1.7 support firstly and then update the implementation of `FlinkExecutionEnvironments` to use the new signature. - Solution2: We could make a copy of `FlinkExecutionEnvironments` in each version of Flink runner and update the implementation for each copy according to the Flink version. This solution decouples the drop of Flink 1.7 support and the upgrades of 1.8 and 1.9. Besides, Flink community has made big change for the job submission path(FLINK-13954) in 1.10, e.g. `JobWithJars` has been removed in 1.10. It means that the job submission logic will be separate for 1.8/1.9 and 1.10 anyway. Personally I prefer solution2 and what's your thought? :) was (Author: sunjincheng121): I found that the issue FLINK-15844 also exists in Flink 1.8.3. I noticed that FLINK-15844 has provided a fix for 1.9.3. However, AFAIK, there will be no new 1.8 releases any more according to FLink's [release policy|[https://flink.apache.org/downloads.html#update-policy-for-old-releases]]. There are two solutions in my mind: - Solution1: As the signature of `JobWithJars.buildUserCodeClassLoader `has changed in both Flink 1.8.3 and 1.9.2 and Flink 1.7.2 still uses the old signature, we could drop the Flink 1.7 support firstly and then update the implementation of `FlinkExecutionEnvironments` to use the new signature. -Solution2: We could make a copy of `FlinkExecutionEnvironments` in each version of Flink runner and update the implementation for each copy according to the Flink version. This solution decouples the drop of Flink 1.7 support and the upgrades of 1.8 and 1.9. Besides, Flink community has made big change for the job submission path(FLINK-13954) in 1.10, e.g. `JobWithJars` has been removed in 1.10. It means that the job submission logic will be separate for 1.8/1.9 and 1.10 anyway. Personally I prefer solution2 and what's your thought? :) > Upgrade Flink Runner to 1.8.3 and 1.9.2 > --- > > Key: BEAM-9299 > URL: https://issues.apache.org/jira/browse/BEAM-9299 > Project: Beam > Issue Type: Task > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > I would like to Upgrade Flink Runner to 18.3 and 1.9.2 due to both the Apache > Flink 1.8.3 and Apache Flink 1.9.2 have been released [1]. > What do you think? > [1] https://dist.apache.org/repos/dist/release/flink/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9299) Upgrade Flink Runner to 1.8.3 and 1.9.2
[ https://issues.apache.org/jira/browse/BEAM-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036700#comment-17036700 ] sunjincheng commented on BEAM-9299: --- I found that the issue FLINK-15844 also exists in Flink 1.8.3. I noticed that FLINK-15844 has provided a fix for 1.9.3. However, AFAIK, there will be no new 1.8 releases any more according to FLink's [release policy|[https://flink.apache.org/downloads.html#update-policy-for-old-releases]]. There are two solutions in my mind: - Solution1: As the signature of `JobWithJars.buildUserCodeClassLoader `has changed in both Flink 1.8.3 and 1.9.2 and Flink 1.7.2 still uses the old signature, we could drop the Flink 1.7 support firstly and then update the implementation of `FlinkExecutionEnvironments` to use the new signature. -Solution2: We could make a copy of `FlinkExecutionEnvironments` in each version of Flink runner and update the implementation for each copy according to the Flink version. This solution decouples the drop of Flink 1.7 support and the upgrades of 1.8 and 1.9. Besides, Flink community has made big change for the job submission path(FLINK-13954) in 1.10, e.g. `JobWithJars` has been removed in 1.10. It means that the job submission logic will be separate for 1.8/1.9 and 1.10 anyway. Personally I prefer solution2 and what's your thought? :) > Upgrade Flink Runner to 1.8.3 and 1.9.2 > --- > > Key: BEAM-9299 > URL: https://issues.apache.org/jira/browse/BEAM-9299 > Project: Beam > Issue Type: Task > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > I would like to Upgrade Flink Runner to 18.3 and 1.9.2 due to both the Apache > Flink 1.8.3 and Apache Flink 1.9.2 have been released [1]. > What do you think? > [1] https://dist.apache.org/repos/dist/release/flink/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage
[ https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=387097&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387097 ] ASF GitHub Bot logged work on BEAM-8889: Author: ASF GitHub Bot Created on: 14/Feb/20 04:05 Start Date: 14/Feb/20 04:05 Worklog Time Spent: 10m Work Description: veblush commented on pull request #10857: [BEAM-8889] Upgrade guava to 28.0-jre URL: https://github.com/apache/beam/pull/10857#discussion_r379241854 ## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy ## @@ -383,7 +383,7 @@ class BeamModulePlugin implements Plugin { def google_cloud_spanner_version = "1.49.1" def google_http_clients_version = "1.34.0" def grpc_version = "1.25.0" -def guava_version = "25.1-jre" +def guava_version = "28.0-jre" Review comment: This was because `28.0-jre` is the first version having missing symbols. `28.2-jre` would work too. It's up to Beam's team to decide which version to go. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387097) Remaining Estimate: 150.5h (was: 150h 40m) Time Spent: 17.5h (was: 17h 20m) > Make GcsUtil use GoogleCloudStorage > --- > > Key: BEAM-8889 > URL: https://issues.apache.org/jira/browse/BEAM-8889 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Esun Kim >Assignee: VASU NORI >Priority: Major > Labels: gcs > Original Estimate: 168h > Time Spent: 17.5h > Remaining Estimate: 150.5h > > [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java] > is a primary class to access Google Cloud Storage on Apache Beam. Current > implementation directly creates GoogleCloudStorageReadChannel and > GoogleCloudStorageWriteChannel by itself to read and write GCS data rather > than using > [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java] > which is an abstract class providing basic IO capability which eventually > creates channel objects. This request is about updating GcsUtil to use > GoogleCloudStorage to create read and write channel, which is expected > flexible because it can easily pick up the new change; e.g. new channel > implementation using new protocol without code change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9301) Check in beam-linkage-check.sh
[ https://issues.apache.org/jira/browse/BEAM-9301?focusedWorklogId=387092&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387092 ] ASF GitHub Bot logged work on BEAM-9301: Author: ASF GitHub Bot Created on: 14/Feb/20 03:28 Start Date: 14/Feb/20 03:28 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10841: [BEAM-9301] Check in beam-linkage-check.sh URL: https://github.com/apache/beam/pull/10841#issuecomment-586080883 > master is checked eagerly What is it? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387092) Time Spent: 1h 20m (was: 1h 10m) > Check in beam-linkage-check.sh > -- > > Key: BEAM-9301 > URL: https://issues.apache.org/jira/browse/BEAM-9301 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > https://github.com/apache/beam/pull/10769#issuecomment-584571787 > bq. @suztomo can you contribute this script maybe into Beam's build-tools > directory so we can improve it a bit for further use? > This is a temporary solution before exclusion rules in Linkage Checker > (BEAM-9206) are implemented. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage
[ https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=387088&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387088 ] ASF GitHub Bot logged work on BEAM-8889: Author: ASF GitHub Bot Created on: 14/Feb/20 03:17 Start Date: 14/Feb/20 03:17 Worklog Time Spent: 10m Work Description: medb commented on pull request #10857: [BEAM-8889] Upgrade guava to 28.0-jre URL: https://github.com/apache/beam/pull/10857#discussion_r379233105 ## File path: buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy ## @@ -383,7 +383,7 @@ class BeamModulePlugin implements Plugin { def google_cloud_spanner_version = "1.49.1" def google_http_clients_version = "1.34.0" def grpc_version = "1.25.0" -def guava_version = "25.1-jre" +def guava_version = "28.0-jre" Review comment: Why to not use the latest `28.2-jre` version? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387088) Remaining Estimate: 150h 40m (was: 150h 50m) Time Spent: 17h 20m (was: 17h 10m) > Make GcsUtil use GoogleCloudStorage > --- > > Key: BEAM-8889 > URL: https://issues.apache.org/jira/browse/BEAM-8889 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Esun Kim >Assignee: VASU NORI >Priority: Major > Labels: gcs > Original Estimate: 168h > Time Spent: 17h 20m > Remaining Estimate: 150h 40m > > [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java] > is a primary class to access Google Cloud Storage on Apache Beam. Current > implementation directly creates GoogleCloudStorageReadChannel and > GoogleCloudStorageWriteChannel by itself to read and write GCS data rather > than using > [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java] > which is an abstract class providing basic IO capability which eventually > creates channel objects. This request is about updating GcsUtil to use > GoogleCloudStorage to create read and write channel, which is expected > flexible because it can easily pick up the new change; e.g. new channel > implementation using new protocol without code change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9211) Spark portable jar test script is missing
[ https://issues.apache.org/jira/browse/BEAM-9211?focusedWorklogId=387086&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387086 ] ASF GitHub Bot logged work on BEAM-9211: Author: ASF GitHub Bot Created on: 14/Feb/20 03:12 Start Date: 14/Feb/20 03:12 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #10723: [BEAM-9211] upload missing Spark portable jar test script URL: https://github.com/apache/beam/pull/10723#discussion_r379231967 ## File path: runners/portability/test_pipeline_jar.sh ## @@ -96,11 +101,17 @@ result = pipeline.run() result.wait_until_finish() " +if [[ "$RUNNER" -eq "FlinkRunner" ]]; then Review comment: ```suggestion if [[ "$RUNNER" = "FlinkRunner" ]]; then ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387086) Time Spent: 3h 10m (was: 3h) > Spark portable jar test script is missing > - > > Key: BEAM-9211 > URL: https://issues.apache.org/jira/browse/BEAM-9211 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 3h 10m > Remaining Estimate: 0h > > beam_PostCommit_PortableJar_Spark has been failing since its creation because > I forgot to upload the test script it calls. Whoops. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
[ https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=387082&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387082 ] ASF GitHub Bot logged work on BEAM-8537: Author: ASF GitHub Bot Created on: 14/Feb/20 03:05 Start Date: 14/Feb/20 03:05 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #10375: [BEAM-8537] Provide WatermarkEstimator to track watermark URL: https://github.com/apache/beam/pull/10375#issuecomment-586075968 Run PythonFormatter PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387082) Time Spent: 16h 20m (was: 16h 10m) > Provide WatermarkEstimatorProvider for different types of WatermarkEstimator > > > Key: BEAM-8537 > URL: https://issues.apache.org/jira/browse/BEAM-8537 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 16h 20m > Remaining Estimate: 0h > > This is a follow up for in-progress PR: > https://github.com/apache/beam/pull/9794. > Current implementation in PR9794 provides a default implementation of > WatermarkEstimator. For further work, we want to let WatermarkEstimator to be > a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to > create a custom WatermarkEstimator per windowed value. It should be similar > to how we track restriction for SDF: > WatermarkEstimator <---> RestrictionTracker > WatermarkEstimatorProvider <---> RestrictionTrackerProvider > WatermarkEstimatorParam <---> RestrictionDoFnParam -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9211) Spark portable jar test script is missing
[ https://issues.apache.org/jira/browse/BEAM-9211?focusedWorklogId=387084&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387084 ] ASF GitHub Bot logged work on BEAM-9211: Author: ASF GitHub Bot Created on: 14/Feb/20 03:05 Start Date: 14/Feb/20 03:05 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10723: [BEAM-9211] upload missing Spark portable jar test script URL: https://github.com/apache/beam/pull/10723#issuecomment-586075989 Run PortableJar_Spark PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387084) Time Spent: 3h (was: 2h 50m) > Spark portable jar test script is missing > - > > Key: BEAM-9211 > URL: https://issues.apache.org/jira/browse/BEAM-9211 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 3h > Remaining Estimate: 0h > > beam_PostCommit_PortableJar_Spark has been failing since its creation because > I forgot to upload the test script it calls. Whoops. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9211) Spark portable jar test script is missing
[ https://issues.apache.org/jira/browse/BEAM-9211?focusedWorklogId=387083&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387083 ] ASF GitHub Bot logged work on BEAM-9211: Author: ASF GitHub Bot Created on: 14/Feb/20 03:05 Start Date: 14/Feb/20 03:05 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10723: [BEAM-9211] upload missing Spark portable jar test script URL: https://github.com/apache/beam/pull/10723#issuecomment-586075970 Run PortableJar_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387083) Time Spent: 2h 50m (was: 2h 40m) > Spark portable jar test script is missing > - > > Key: BEAM-9211 > URL: https://issues.apache.org/jira/browse/BEAM-9211 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 2h 50m > Remaining Estimate: 0h > > beam_PostCommit_PortableJar_Spark has been failing since its creation because > I forgot to upload the test script it calls. Whoops. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9211) Spark portable jar test script is missing
[ https://issues.apache.org/jira/browse/BEAM-9211?focusedWorklogId=387081&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387081 ] ASF GitHub Bot logged work on BEAM-9211: Author: ASF GitHub Bot Created on: 14/Feb/20 03:05 Start Date: 14/Feb/20 03:05 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #10723: [BEAM-9211] upload missing Spark portable jar test script URL: https://github.com/apache/beam/pull/10723#discussion_r379230854 ## File path: runners/portability/test_pipeline_jar.sh ## @@ -28,6 +28,16 @@ case $key in shift # past argument shift # past value ;; +--spark_job_server_jar) Review comment: I changed it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387081) Time Spent: 2h 40m (was: 2.5h) > Spark portable jar test script is missing > - > > Key: BEAM-9211 > URL: https://issues.apache.org/jira/browse/BEAM-9211 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 2h 40m > Remaining Estimate: 0h > > beam_PostCommit_PortableJar_Spark has been failing since its creation because > I forgot to upload the test script it calls. Whoops. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9287) Python Validates runner tests for Unified Worker
[ https://issues.apache.org/jira/browse/BEAM-9287?focusedWorklogId=387079&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387079 ] ASF GitHub Bot logged work on BEAM-9287: Author: ASF GitHub Bot Created on: 14/Feb/20 03:03 Start Date: 14/Feb/20 03:03 Worklog Time Spent: 10m Work Description: angoenka commented on issue #10863: [BEAM-9287] Add Python streaming Validates runner tests for Unified Worker URL: https://github.com/apache/beam/pull/10863#issuecomment-586075421 R: @markflyhigh @ananvay This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387079) Time Spent: 20m (was: 10m) > Python Validates runner tests for Unified Worker > > > Key: BEAM-9287 > URL: https://issues.apache.org/jira/browse/BEAM-9287 > Project: Beam > Issue Type: Test > Components: runner-dataflow, testing >Reporter: Ankur Goenka >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9287) Python Validates runner tests for Unified Worker
[ https://issues.apache.org/jira/browse/BEAM-9287?focusedWorklogId=387078&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387078 ] ASF GitHub Bot logged work on BEAM-9287: Author: ASF GitHub Bot Created on: 14/Feb/20 03:02 Start Date: 14/Feb/20 03:02 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #10863: [BEAM-9287] Add Python streaming Validates runner tests for Unified Worker URL: https://github.com/apache/beam/pull/10863 **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Pyth
[jira] [Work logged] (BEAM-9211) Spark portable jar test script is missing
[ https://issues.apache.org/jira/browse/BEAM-9211?focusedWorklogId=387077&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387077 ] ASF GitHub Bot logged work on BEAM-9211: Author: ASF GitHub Bot Created on: 14/Feb/20 03:00 Start Date: 14/Feb/20 03:00 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #10723: [BEAM-9211] upload missing Spark portable jar test script URL: https://github.com/apache/beam/pull/10723#discussion_r379229678 ## File path: runners/portability/test_pipeline_jar.sh ## @@ -28,6 +28,16 @@ case $key in shift # past argument shift # past value ;; +--spark_job_server_jar) Review comment: I see, but will it not be a bit confusing to only specify one of these option at a time. Like this gives the user option to provide flink runner but also provide spark_job_server_jar. I don;t have a strong opinion on this so its upto you. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387077) Time Spent: 2.5h (was: 2h 20m) > Spark portable jar test script is missing > - > > Key: BEAM-9211 > URL: https://issues.apache.org/jira/browse/BEAM-9211 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 2.5h > Remaining Estimate: 0h > > beam_PostCommit_PortableJar_Spark has been failing since its creation because > I forgot to upload the test script it calls. Whoops. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9211) Spark portable jar test script is missing
[ https://issues.apache.org/jira/browse/BEAM-9211?focusedWorklogId=387075&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387075 ] ASF GitHub Bot logged work on BEAM-9211: Author: ASF GitHub Bot Created on: 14/Feb/20 02:55 Start Date: 14/Feb/20 02:55 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #10723: [BEAM-9211] upload missing Spark portable jar test script URL: https://github.com/apache/beam/pull/10723#discussion_r379228919 ## File path: runners/portability/test_pipeline_jar.sh ## @@ -28,6 +28,16 @@ case $key in shift # past argument shift # past value ;; +--spark_job_server_jar) Review comment: Actually... I remember the reason I kept the flink_ and spark_ prefixes was so that the gradle build script would match the actual python args. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387075) Time Spent: 2h 20m (was: 2h 10m) > Spark portable jar test script is missing > - > > Key: BEAM-9211 > URL: https://issues.apache.org/jira/browse/BEAM-9211 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 2h 20m > Remaining Estimate: 0h > > beam_PostCommit_PortableJar_Spark has been failing since its creation because > I forgot to upload the test script it calls. Whoops. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9211) Spark portable jar test script is missing
[ https://issues.apache.org/jira/browse/BEAM-9211?focusedWorklogId=387074&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387074 ] ASF GitHub Bot logged work on BEAM-9211: Author: ASF GitHub Bot Created on: 14/Feb/20 02:53 Start Date: 14/Feb/20 02:53 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #10723: [BEAM-9211] upload missing Spark portable jar test script URL: https://github.com/apache/beam/pull/10723#discussion_r379228475 ## File path: runners/portability/test_pipeline_jar.sh ## @@ -28,6 +28,16 @@ case $key in shift # past argument shift # past value ;; +--spark_job_server_jar) Review comment: Sure. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387074) Time Spent: 2h 10m (was: 2h) > Spark portable jar test script is missing > - > > Key: BEAM-9211 > URL: https://issues.apache.org/jira/browse/BEAM-9211 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 2h 10m > Remaining Estimate: 0h > > beam_PostCommit_PortableJar_Spark has been failing since its creation because > I forgot to upload the test script it calls. Whoops. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9211) Spark portable jar test script is missing
[ https://issues.apache.org/jira/browse/BEAM-9211?focusedWorklogId=387073&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387073 ] ASF GitHub Bot logged work on BEAM-9211: Author: ASF GitHub Bot Created on: 14/Feb/20 02:49 Start Date: 14/Feb/20 02:49 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #10723: [BEAM-9211] upload missing Spark portable jar test script URL: https://github.com/apache/beam/pull/10723#discussion_r379227548 ## File path: runners/portability/test_pipeline_jar.sh ## @@ -28,6 +28,16 @@ case $key in shift # past argument shift # past value ;; +--spark_job_server_jar) Review comment: Can we just call it `JOB_SERVER_JAR`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387073) Time Spent: 2h (was: 1h 50m) > Spark portable jar test script is missing > - > > Key: BEAM-9211 > URL: https://issues.apache.org/jira/browse/BEAM-9211 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 2h > Remaining Estimate: 0h > > beam_PostCommit_PortableJar_Spark has been failing since its creation because > I forgot to upload the test script it calls. Whoops. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
[ https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=387070&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387070 ] ASF GitHub Bot logged work on BEAM-8537: Author: ASF GitHub Bot Created on: 14/Feb/20 02:44 Start Date: 14/Feb/20 02:44 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #10375: [BEAM-8537] Provide WatermarkEstimator to track watermark URL: https://github.com/apache/beam/pull/10375#discussion_r379226662 ## File path: sdks/python/apache_beam/runners/common.py ## @@ -765,40 +802,54 @@ def _invoke_process_per_window(self, # ProcessSizedElementAndRestriction. self.threadsafe_restriction_tracker.check_done() deferred_status = self.threadsafe_restriction_tracker.deferred_status() - current_watermark = None - if self.watermark_estimator: -current_watermark = self.watermark_estimator.current_watermark() if deferred_status: deferred_restriction, deferred_timestamp = deferred_status element = windowed_value.value size = self.signature.get_restriction_provider().restriction_size( element, deferred_restriction) -residual_value = ((element, deferred_restriction), size) -return SplitResultResidual( -residual_value=windowed_value.with_value(residual_value), -current_watermark=current_watermark, -deferred_timestamp=deferred_timestamp) -return None +if self.threadsafe_watermark_estimator: + current_watermark = ( + self.threadsafe_watermark_estimator.current_watermark()) + estimator_state = ( + self.threadsafe_watermark_estimator.get_estimator_state()) + residual_value = ( + (element, (deferred_restriction, estimator_state)), size) + return SplitResultResidual( + residual_value=windowed_value.with_value(residual_value), + current_watermark=current_watermark, + deferred_timestamp=deferred_timestamp) +else: Review comment: Yes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387070) Time Spent: 16h (was: 15h 50m) > Provide WatermarkEstimatorProvider for different types of WatermarkEstimator > > > Key: BEAM-8537 > URL: https://issues.apache.org/jira/browse/BEAM-8537 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 16h > Remaining Estimate: 0h > > This is a follow up for in-progress PR: > https://github.com/apache/beam/pull/9794. > Current implementation in PR9794 provides a default implementation of > WatermarkEstimator. For further work, we want to let WatermarkEstimator to be > a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to > create a custom WatermarkEstimator per windowed value. It should be similar > to how we track restriction for SDF: > WatermarkEstimator <---> RestrictionTracker > WatermarkEstimatorProvider <---> RestrictionTrackerProvider > WatermarkEstimatorParam <---> RestrictionDoFnParam -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
[ https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=387071&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387071 ] ASF GitHub Bot logged work on BEAM-8537: Author: ASF GitHub Bot Created on: 14/Feb/20 02:44 Start Date: 14/Feb/20 02:44 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #10375: [BEAM-8537] Provide WatermarkEstimator to track watermark URL: https://github.com/apache/beam/pull/10375#discussion_r379226674 ## File path: sdks/python/apache_beam/runners/common.py ## @@ -321,6 +333,21 @@ def is_splittable_dofn(self): # type: () -> bool return self.get_restriction_provider() is not None + def get_restriction_coder(self): +"""Get coder for a restriction when processing an SDF. + +If the SDF uses an WatermarkEstimatorProvider, the restriction coder is a +tuple coder of (restriction_coder, estimator_state_coder). Otherwise, +returns the SDFs restriction_coder. +""" +restriction_coder = None Review comment: Changed to `if ... else` block. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387071) Time Spent: 16h 10m (was: 16h) > Provide WatermarkEstimatorProvider for different types of WatermarkEstimator > > > Key: BEAM-8537 > URL: https://issues.apache.org/jira/browse/BEAM-8537 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 16h 10m > Remaining Estimate: 0h > > This is a follow up for in-progress PR: > https://github.com/apache/beam/pull/9794. > Current implementation in PR9794 provides a default implementation of > WatermarkEstimator. For further work, we want to let WatermarkEstimator to be > a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to > create a custom WatermarkEstimator per windowed value. It should be similar > to how we track restriction for SDF: > WatermarkEstimator <---> RestrictionTracker > WatermarkEstimatorProvider <---> RestrictionTrackerProvider > WatermarkEstimatorParam <---> RestrictionDoFnParam -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
[ https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=387068&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387068 ] ASF GitHub Bot logged work on BEAM-8537: Author: ASF GitHub Bot Created on: 14/Feb/20 02:44 Start Date: 14/Feb/20 02:44 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #10375: [BEAM-8537] Provide WatermarkEstimator to track watermark URL: https://github.com/apache/beam/pull/10375#discussion_r379226640 ## File path: sdks/python/apache_beam/io/watermark_estimators.py ## @@ -0,0 +1,101 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""A collection of WatermarkEstimator implementations that SplittableDoFns +can use.""" + +# pytype: skip-file + +from __future__ import absolute_import + +from apache_beam.io.iobase import WatermarkEstimator +from apache_beam.utils.timestamp import Timestamp + + +class MonotonicWatermarkEstimator(WatermarkEstimator): + """A WatermarkEstimator which assumes that timestamps of all ouput records + are increasing monotonically. + """ + def __init__(self, timestamp): +"""For a new pair, the initial value is None. When +resuming processing, the initial timestamp will be the last reported +watermark. +""" +self._watermark = timestamp + + def observe_timestamp(self, timestamp): +if self._watermark is None: + self._watermark = timestamp +else: + if timestamp < self._watermark: +raise ValueError('A MonotonicWatermarkEstimator expects output ' + 'timestamp to be increasing monotonically.') + self._watermark = timestamp + + def current_watermark(self): +return self._watermark + + def get_estimator_state(self): +return self._watermark + + +class WalltimeWatermarkEstimator(WatermarkEstimator): + """A WatermarkEstimator which uses processing time as the estimated watermark. + """ + def __init__(self, timestamp=None): +if timestamp: Review comment: Done. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387068) Time Spent: 15h 40m (was: 15.5h) > Provide WatermarkEstimatorProvider for different types of WatermarkEstimator > > > Key: BEAM-8537 > URL: https://issues.apache.org/jira/browse/BEAM-8537 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 15h 40m > Remaining Estimate: 0h > > This is a follow up for in-progress PR: > https://github.com/apache/beam/pull/9794. > Current implementation in PR9794 provides a default implementation of > WatermarkEstimator. For further work, we want to let WatermarkEstimator to be > a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to > create a custom WatermarkEstimator per windowed value. It should be similar > to how we track restriction for SDF: > WatermarkEstimator <---> RestrictionTracker > WatermarkEstimatorProvider <---> RestrictionTrackerProvider > WatermarkEstimatorParam <---> RestrictionDoFnParam -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
[ https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=387069&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387069 ] ASF GitHub Bot logged work on BEAM-8537: Author: ASF GitHub Bot Created on: 14/Feb/20 02:44 Start Date: 14/Feb/20 02:44 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #10375: [BEAM-8537] Provide WatermarkEstimator to track watermark URL: https://github.com/apache/beam/pull/10375#discussion_r379226651 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py ## @@ -486,19 +486,28 @@ def process( _ = (p | beam.Create(data) | beam.ParDo(ExpandingStringsDoFn())) def test_sdf_with_watermark_tracking(self): +class ManualWatermarkEstimatorProvider( Review comment: Yes we should! Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387069) Time Spent: 15h 50m (was: 15h 40m) > Provide WatermarkEstimatorProvider for different types of WatermarkEstimator > > > Key: BEAM-8537 > URL: https://issues.apache.org/jira/browse/BEAM-8537 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 15h 50m > Remaining Estimate: 0h > > This is a follow up for in-progress PR: > https://github.com/apache/beam/pull/9794. > Current implementation in PR9794 provides a default implementation of > WatermarkEstimator. For further work, we want to let WatermarkEstimator to be > a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to > create a custom WatermarkEstimator per windowed value. It should be similar > to how we track restriction for SDF: > WatermarkEstimator <---> RestrictionTracker > WatermarkEstimatorProvider <---> RestrictionTrackerProvider > WatermarkEstimatorParam <---> RestrictionDoFnParam -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9211) Spark portable jar test script is missing
[ https://issues.apache.org/jira/browse/BEAM-9211?focusedWorklogId=387067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387067 ] ASF GitHub Bot logged work on BEAM-9211: Author: ASF GitHub Bot Created on: 14/Feb/20 02:42 Start Date: 14/Feb/20 02:42 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10723: [BEAM-9211] upload missing Spark portable jar test script URL: https://github.com/apache/beam/pull/10723#issuecomment-586070657 @mxm I'm now reusing the original script. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387067) Time Spent: 1h 50m (was: 1h 40m) > Spark portable jar test script is missing > - > > Key: BEAM-9211 > URL: https://issues.apache.org/jira/browse/BEAM-9211 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 1h 50m > Remaining Estimate: 0h > > beam_PostCommit_PortableJar_Spark has been failing since its creation because > I forgot to upload the test script it calls. Whoops. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9211) Spark portable jar test script is missing
[ https://issues.apache.org/jira/browse/BEAM-9211?focusedWorklogId=387064&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387064 ] ASF GitHub Bot logged work on BEAM-9211: Author: ASF GitHub Bot Created on: 14/Feb/20 02:42 Start Date: 14/Feb/20 02:42 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10723: [BEAM-9211] upload missing Spark portable jar test script URL: https://github.com/apache/beam/pull/10723#issuecomment-586070544 Run PortableJar_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387064) Time Spent: 1.5h (was: 1h 20m) > Spark portable jar test script is missing > - > > Key: BEAM-9211 > URL: https://issues.apache.org/jira/browse/BEAM-9211 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 1.5h > Remaining Estimate: 0h > > beam_PostCommit_PortableJar_Spark has been failing since its creation because > I forgot to upload the test script it calls. Whoops. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9211) Spark portable jar test script is missing
[ https://issues.apache.org/jira/browse/BEAM-9211?focusedWorklogId=387065&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387065 ] ASF GitHub Bot logged work on BEAM-9211: Author: ASF GitHub Bot Created on: 14/Feb/20 02:42 Start Date: 14/Feb/20 02:42 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10723: [BEAM-9211] upload missing Spark portable jar test script URL: https://github.com/apache/beam/pull/10723#issuecomment-586070563 Run PortableJar_Spark PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387065) Time Spent: 1h 40m (was: 1.5h) > Spark portable jar test script is missing > - > > Key: BEAM-9211 > URL: https://issues.apache.org/jira/browse/BEAM-9211 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 1h 40m > Remaining Estimate: 0h > > beam_PostCommit_PortableJar_Spark has been failing since its creation because > I forgot to upload the test script it calls. Whoops. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=387066&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387066 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 14/Feb/20 02:42 Start Date: 14/Feb/20 02:42 Worklog Time Spent: 10m Work Description: chrlarsen commented on issue #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#issuecomment-586070601 @pabloem @steveniemitz @iemejia comments have been addressed and updated. Also I will add `inferBeamSchema` to the future plans :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387066) Time Spent: 16h (was: 15h 50m) > Add ThriftIO to Support IO for Thrift Files > --- > > Key: BEAM-8561 > URL: https://issues.apache.org/jira/browse/BEAM-8561 > Project: Beam > Issue Type: New Feature > Components: io-java-files >Reporter: Chris Larsen >Assignee: Chris Larsen >Priority: Major > Time Spent: 16h > Remaining Estimate: 0h > > Similar to AvroIO it would be very useful to support reading and writing > to/from Thrift files with a native connector. > Functionality would include: > # read() - Reading from one or more Thrift files. > # write() - Writing to one or more Thrift files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage
[ https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=387063&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387063 ] ASF GitHub Bot logged work on BEAM-8889: Author: ASF GitHub Bot Created on: 14/Feb/20 02:41 Start Date: 14/Feb/20 02:41 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10769: [BEAM-8889] Upgrades gcsio to 2.0.0 URL: https://github.com/apache/beam/pull/10769#issuecomment-586070331 Run Java11 PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387063) Remaining Estimate: 150h 50m (was: 151h) Time Spent: 17h 10m (was: 17h) > Make GcsUtil use GoogleCloudStorage > --- > > Key: BEAM-8889 > URL: https://issues.apache.org/jira/browse/BEAM-8889 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Esun Kim >Assignee: VASU NORI >Priority: Major > Labels: gcs > Original Estimate: 168h > Time Spent: 17h 10m > Remaining Estimate: 150h 50m > > [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java] > is a primary class to access Google Cloud Storage on Apache Beam. Current > implementation directly creates GoogleCloudStorageReadChannel and > GoogleCloudStorageWriteChannel by itself to read and write GCS data rather > than using > [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java] > which is an abstract class providing basic IO capability which eventually > creates channel objects. This request is about updating GcsUtil to use > GoogleCloudStorage to create read and write channel, which is expected > flexible because it can easily pick up the new change; e.g. new channel > implementation using new protocol without code change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage
[ https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=387062&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387062 ] ASF GitHub Bot logged work on BEAM-8889: Author: ASF GitHub Bot Created on: 14/Feb/20 02:40 Start Date: 14/Feb/20 02:40 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10769: [BEAM-8889] Upgrades gcsio to 2.0.0 URL: https://github.com/apache/beam/pull/10769#issuecomment-586070248 Run Java11 PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387062) Remaining Estimate: 151h (was: 151h 10m) Time Spent: 17h (was: 16h 50m) > Make GcsUtil use GoogleCloudStorage > --- > > Key: BEAM-8889 > URL: https://issues.apache.org/jira/browse/BEAM-8889 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Esun Kim >Assignee: VASU NORI >Priority: Major > Labels: gcs > Original Estimate: 168h > Time Spent: 17h > Remaining Estimate: 151h > > [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java] > is a primary class to access Google Cloud Storage on Apache Beam. Current > implementation directly creates GoogleCloudStorageReadChannel and > GoogleCloudStorageWriteChannel by itself to read and write GCS data rather > than using > [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java] > which is an abstract class providing basic IO capability which eventually > creates channel objects. This request is about updating GcsUtil to use > GoogleCloudStorage to create read and write channel, which is expected > flexible because it can easily pick up the new change; e.g. new channel > implementation using new protocol without code change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage
[ https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=387061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387061 ] ASF GitHub Bot logged work on BEAM-8889: Author: ASF GitHub Bot Created on: 14/Feb/20 02:39 Start Date: 14/Feb/20 02:39 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10769: [BEAM-8889] Upgrades gcsio to 2.0.0 URL: https://github.com/apache/beam/pull/10769#issuecomment-586070039 Run Java PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387061) Remaining Estimate: 151h 10m (was: 151h 20m) Time Spent: 16h 50m (was: 16h 40m) > Make GcsUtil use GoogleCloudStorage > --- > > Key: BEAM-8889 > URL: https://issues.apache.org/jira/browse/BEAM-8889 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Esun Kim >Assignee: VASU NORI >Priority: Major > Labels: gcs > Original Estimate: 168h > Time Spent: 16h 50m > Remaining Estimate: 151h 10m > > [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java] > is a primary class to access Google Cloud Storage on Apache Beam. Current > implementation directly creates GoogleCloudStorageReadChannel and > GoogleCloudStorageWriteChannel by itself to read and write GCS data rather > than using > [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java] > which is an abstract class providing basic IO capability which eventually > creates channel objects. This request is about updating GcsUtil to use > GoogleCloudStorage to create read and write channel, which is expected > flexible because it can easily pick up the new change; e.g. new channel > implementation using new protocol without code change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage
[ https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=387060&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387060 ] ASF GitHub Bot logged work on BEAM-8889: Author: ASF GitHub Bot Created on: 14/Feb/20 02:39 Start Date: 14/Feb/20 02:39 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10769: [BEAM-8889] Upgrades gcsio to 2.0.0 URL: https://github.com/apache/beam/pull/10769#issuecomment-586069879 Run CommunityMetrics PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387060) Remaining Estimate: 151h 20m (was: 151.5h) Time Spent: 16h 40m (was: 16.5h) > Make GcsUtil use GoogleCloudStorage > --- > > Key: BEAM-8889 > URL: https://issues.apache.org/jira/browse/BEAM-8889 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Esun Kim >Assignee: VASU NORI >Priority: Major > Labels: gcs > Original Estimate: 168h > Time Spent: 16h 40m > Remaining Estimate: 151h 20m > > [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java] > is a primary class to access Google Cloud Storage on Apache Beam. Current > implementation directly creates GoogleCloudStorageReadChannel and > GoogleCloudStorageWriteChannel by itself to read and write GCS data rather > than using > [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java] > which is an abstract class providing basic IO capability which eventually > creates channel objects. This request is about updating GcsUtil to use > GoogleCloudStorage to create read and write channel, which is expected > flexible because it can easily pick up the new change; e.g. new channel > implementation using new protocol without code change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage
[ https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=387059&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387059 ] ASF GitHub Bot logged work on BEAM-8889: Author: ASF GitHub Bot Created on: 14/Feb/20 02:38 Start Date: 14/Feb/20 02:38 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10769: [BEAM-8889] Upgrades gcsio to 2.0.0 URL: https://github.com/apache/beam/pull/10769#issuecomment-586069795 Run Dataflow ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387059) Remaining Estimate: 151.5h (was: 151h 40m) Time Spent: 16.5h (was: 16h 20m) > Make GcsUtil use GoogleCloudStorage > --- > > Key: BEAM-8889 > URL: https://issues.apache.org/jira/browse/BEAM-8889 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Esun Kim >Assignee: VASU NORI >Priority: Major > Labels: gcs > Original Estimate: 168h > Time Spent: 16.5h > Remaining Estimate: 151.5h > > [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java] > is a primary class to access Google Cloud Storage on Apache Beam. Current > implementation directly creates GoogleCloudStorageReadChannel and > GoogleCloudStorageWriteChannel by itself to read and write GCS data rather > than using > [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java] > which is an abstract class providing basic IO capability which eventually > creates channel objects. This request is about updating GcsUtil to use > GoogleCloudStorage to create read and write channel, which is expected > flexible because it can easily pick up the new change; e.g. new channel > implementation using new protocol without code change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9211) Spark portable jar test script is missing
[ https://issues.apache.org/jira/browse/BEAM-9211?focusedWorklogId=387058&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387058 ] ASF GitHub Bot logged work on BEAM-9211: Author: ASF GitHub Bot Created on: 14/Feb/20 02:35 Start Date: 14/Feb/20 02:35 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10723: [BEAM-9211] upload missing Spark portable jar test script URL: https://github.com/apache/beam/pull/10723#issuecomment-586069106 Run PortableJar_Spark PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387058) Time Spent: 1h 20m (was: 1h 10m) > Spark portable jar test script is missing > - > > Key: BEAM-9211 > URL: https://issues.apache.org/jira/browse/BEAM-9211 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 1h 20m > Remaining Estimate: 0h > > beam_PostCommit_PortableJar_Spark has been failing since its creation because > I forgot to upload the test script it calls. Whoops. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9211) Spark portable jar test script is missing
[ https://issues.apache.org/jira/browse/BEAM-9211?focusedWorklogId=387057&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387057 ] ASF GitHub Bot logged work on BEAM-9211: Author: ASF GitHub Bot Created on: 14/Feb/20 02:34 Start Date: 14/Feb/20 02:34 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10723: [BEAM-9211] upload missing Spark portable jar test script URL: https://github.com/apache/beam/pull/10723#issuecomment-586068992 Run PortableJar_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387057) Time Spent: 1h 10m (was: 1h) > Spark portable jar test script is missing > - > > Key: BEAM-9211 > URL: https://issues.apache.org/jira/browse/BEAM-9211 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 1h 10m > Remaining Estimate: 0h > > beam_PostCommit_PortableJar_Spark has been failing since its creation because > I forgot to upload the test script it calls. Whoops. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto
[ https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=387054&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387054 ] ASF GitHub Bot logged work on BEAM-9229: Author: ASF GitHub Bot Created on: 14/Feb/20 02:25 Start Date: 14/Feb/20 02:25 Worklog Time Spent: 10m Work Description: ihji commented on pull request #10733: [BEAM-9229] Adding dependency information to Environment proto URL: https://github.com/apache/beam/pull/10733#discussion_r379222644 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1093,6 +1093,72 @@ message SideInput { FunctionSpec window_mapping_fn = 3; } +message StandardArtifacts { + enum Types { +// A URN for artifacts stored in a local directory. +// payload: ArtifactFilePayload. +FILE = 0 [(beam_urn) = "beam:artifact:file:v1"]; + +// A URN for artifacts embedded in ArtifactInformation proto. +// payload: EmbeddedFilePayload. +EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"]; + +// A URN for artifacts described by remote URLs. +// payload: a string for an artifact URL e.g. "https://.../foo.jar"; +REMOTE = 2 [(beam_urn) = "beam:artifact:remote:v1"]; Review comment: `FILE` and `REMOTE` are merged and renamed to `URL`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387054) Time Spent: 5.5h (was: 5h 20m) > Adding dependency information to Environment proto > -- > > Key: BEAM-9229 > URL: https://issues.apache.org/jira/browse/BEAM-9229 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 5.5h > Remaining Estimate: 0h > > Adding dependency information to Environment proto. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9299) Upgrade Flink Runner to 1.8.3 and 1.9.2
[ https://issues.apache.org/jira/browse/BEAM-9299?focusedWorklogId=387053&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387053 ] ASF GitHub Bot logged work on BEAM-9299: Author: ASF GitHub Bot Created on: 14/Feb/20 02:25 Start Date: 14/Feb/20 02:25 Worklog Time Spent: 10m Work Description: angoenka commented on issue #10850: [BEAM-9299] Upgrade Flink Runner from 1.8.2 to 1.8.3 URL: https://github.com/apache/beam/pull/10850#issuecomment-586066831 Unfortunately, jenkins run the tests against latest Flink supported by beam (1.9) Reference https://github.com/apache/beam/blob/cec1094adba3dd20f382fc07409fbf7fb58fbbc6/sdks/python/test-suites/portable/common.gradle#L57 Can you please test it separately to make sure the version update passes the tests. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387053) Time Spent: 0.5h (was: 20m) > Upgrade Flink Runner to 1.8.3 and 1.9.2 > --- > > Key: BEAM-9299 > URL: https://issues.apache.org/jira/browse/BEAM-9299 > Project: Beam > Issue Type: Task > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > I would like to Upgrade Flink Runner to 18.3 and 1.9.2 due to both the Apache > Flink 1.8.3 and Apache Flink 1.9.2 have been released [1]. > What do you think? > [1] https://dist.apache.org/repos/dist/release/flink/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto
[ https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=387051&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387051 ] ASF GitHub Bot logged work on BEAM-9229: Author: ASF GitHub Bot Created on: 14/Feb/20 02:23 Start Date: 14/Feb/20 02:23 Worklog Time Spent: 10m Work Description: ihji commented on pull request #10733: [BEAM-9229] Adding dependency information to Environment proto URL: https://github.com/apache/beam/pull/10733#discussion_r379222180 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1093,6 +1093,72 @@ message SideInput { FunctionSpec window_mapping_fn = 3; } +message StandardArtifacts { + enum Types { +// A URN for artifacts stored in a local directory. +// payload: ArtifactFilePayload. +FILE = 0 [(beam_urn) = "beam:artifact:file:v1"]; + +// A URN for artifacts embedded in ArtifactInformation proto. +// payload: EmbeddedFilePayload. +EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"]; + +// A URN for artifacts described by remote URLs. +// payload: a string for an artifact URL e.g. "https://.../foo.jar"; +REMOTE = 2 [(beam_urn) = "beam:artifact:remote:v1"]; + +// A URN for Python artifacts hosted on PYPI. +// payload: PypiPayload +PYPI = 3 [(beam_urn) = "beam:artifact:pypi:v1"]; + +// A URN for Java artifacts hosted on Maven central. +// payload: MavenPayload +MAVEN= 4 [(beam_urn) = "beam:artifact:maven:v1"]; + } +} + +message ArtifactFilePayload { + // A path to an artifact file on a local system. + string local_path = 1; + + // A generated staged name (no path). + string staged_name = 2; +} + +message EmbeddedFilePayload { + // raw data bytes for an embedded artifact + bytes data = 1; + + // A generated staged name (no path). Review comment: Ack. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387051) Time Spent: 5h 20m (was: 5h 10m) > Adding dependency information to Environment proto > -- > > Key: BEAM-9229 > URL: https://issues.apache.org/jira/browse/BEAM-9229 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 5h 20m > Remaining Estimate: 0h > > Adding dependency information to Environment proto. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto
[ https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=387050&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387050 ] ASF GitHub Bot logged work on BEAM-9229: Author: ASF GitHub Bot Created on: 14/Feb/20 02:22 Start Date: 14/Feb/20 02:22 Worklog Time Spent: 10m Work Description: ihji commented on pull request #10733: [BEAM-9229] Adding dependency information to Environment proto URL: https://github.com/apache/beam/pull/10733#discussion_r379222112 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1093,6 +1093,72 @@ message SideInput { FunctionSpec window_mapping_fn = 3; } +message StandardArtifacts { + enum Types { +// A URN for artifacts stored in a local directory. +// payload: ArtifactFilePayload. +FILE = 0 [(beam_urn) = "beam:artifact:file:v1"]; + +// A URN for artifacts embedded in ArtifactInformation proto. +// payload: EmbeddedFilePayload. +EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"]; + +// A URN for artifacts described by remote URLs. +// payload: a string for an artifact URL e.g. "https://.../foo.jar"; +REMOTE = 2 [(beam_urn) = "beam:artifact:remote:v1"]; + +// A URN for Python artifacts hosted on PYPI. +// payload: PypiPayload +PYPI = 3 [(beam_urn) = "beam:artifact:pypi:v1"]; + +// A URN for Java artifacts hosted on Maven central. +// payload: MavenPayload +MAVEN= 4 [(beam_urn) = "beam:artifact:maven:v1"]; + } +} + +message ArtifactFilePayload { + // A path to an artifact file on a local system. + string local_path = 1; + + // A generated staged name (no path). + string staged_name = 2; +} + +message EmbeddedFilePayload { + // raw data bytes for an embedded artifact + bytes data = 1; + + // A generated staged name (no path). + string staged_name = 2; +} + +message PypiPayload { + // Pypi compatible artifact id e.g. "apache-beam" + string pypi_artifact_id = 1; + + // Pypi compatible version string. + string pypi_version_range = 2; +} + +message MavenPayload { + // Maven compatible group id e.g. "org.apache.beam" + string maven_group_id = 1; Review comment: Merged into a single field `artifact_specifier` which expects the format of `groupId:artifactId:version[:packaging[:classifier]]` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387050) Time Spent: 5h 10m (was: 5h) > Adding dependency information to Environment proto > -- > > Key: BEAM-9229 > URL: https://issues.apache.org/jira/browse/BEAM-9229 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 5h 10m > Remaining Estimate: 0h > > Adding dependency information to Environment proto. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto
[ https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=387049&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387049 ] ASF GitHub Bot logged work on BEAM-9229: Author: ASF GitHub Bot Created on: 14/Feb/20 02:21 Start Date: 14/Feb/20 02:21 Worklog Time Spent: 10m Work Description: ihji commented on pull request #10733: [BEAM-9229] Adding dependency information to Environment proto URL: https://github.com/apache/beam/pull/10733#discussion_r379221800 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1093,6 +1093,72 @@ message SideInput { FunctionSpec window_mapping_fn = 3; } +message StandardArtifacts { + enum Types { +// A URN for artifacts stored in a local directory. +// payload: ArtifactFilePayload. +FILE = 0 [(beam_urn) = "beam:artifact:file:v1"]; + +// A URN for artifacts embedded in ArtifactInformation proto. +// payload: EmbeddedFilePayload. +EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"]; + +// A URN for artifacts described by remote URLs. +// payload: a string for an artifact URL e.g. "https://.../foo.jar"; +REMOTE = 2 [(beam_urn) = "beam:artifact:remote:v1"]; + +// A URN for Python artifacts hosted on PYPI. +// payload: PypiPayload +PYPI = 3 [(beam_urn) = "beam:artifact:pypi:v1"]; + +// A URN for Java artifacts hosted on Maven central. Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387049) Time Spent: 5h (was: 4h 50m) > Adding dependency information to Environment proto > -- > > Key: BEAM-9229 > URL: https://issues.apache.org/jira/browse/BEAM-9229 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > Adding dependency information to Environment proto. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto
[ https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=387048&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387048 ] ASF GitHub Bot logged work on BEAM-9229: Author: ASF GitHub Bot Created on: 14/Feb/20 02:21 Start Date: 14/Feb/20 02:21 Worklog Time Spent: 10m Work Description: ihji commented on pull request #10733: [BEAM-9229] Adding dependency information to Environment proto URL: https://github.com/apache/beam/pull/10733#discussion_r379221747 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1093,6 +1093,72 @@ message SideInput { FunctionSpec window_mapping_fn = 3; } +message StandardArtifacts { + enum Types { +// A URN for artifacts stored in a local directory. +// payload: ArtifactFilePayload. +FILE = 0 [(beam_urn) = "beam:artifact:file:v1"]; + +// A URN for artifacts embedded in ArtifactInformation proto. +// payload: EmbeddedFilePayload. +EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"]; + +// A URN for artifacts described by remote URLs. +// payload: a string for an artifact URL e.g. "https://.../foo.jar"; +REMOTE = 2 [(beam_urn) = "beam:artifact:remote:v1"]; + +// A URN for Python artifacts hosted on PYPI. +// payload: PypiPayload +PYPI = 3 [(beam_urn) = "beam:artifact:pypi:v1"]; + +// A URN for Java artifacts hosted on Maven central. +// payload: MavenPayload +MAVEN= 4 [(beam_urn) = "beam:artifact:maven:v1"]; + } +} + +message ArtifactFilePayload { + // A path to an artifact file on a local system. + string local_path = 1; + + // A generated staged name (no path). + string staged_name = 2; +} + +message EmbeddedFilePayload { + // raw data bytes for an embedded artifact + bytes data = 1; + + // A generated staged name (no path). + string staged_name = 2; +} + +message PypiPayload { + // Pypi compatible artifact id e.g. "apache-beam" + string pypi_artifact_id = 1; + + // Pypi compatible version string. + string pypi_version_range = 2; +} + +message MavenPayload { + // Maven compatible group id e.g. "org.apache.beam" Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387048) Time Spent: 4h 50m (was: 4h 40m) > Adding dependency information to Environment proto > -- > > Key: BEAM-9229 > URL: https://issues.apache.org/jira/browse/BEAM-9229 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > > Adding dependency information to Environment proto. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto
[ https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=387046&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387046 ] ASF GitHub Bot logged work on BEAM-9229: Author: ASF GitHub Bot Created on: 14/Feb/20 02:20 Start Date: 14/Feb/20 02:20 Worklog Time Spent: 10m Work Description: ihji commented on pull request #10733: [BEAM-9229] Adding dependency information to Environment proto URL: https://github.com/apache/beam/pull/10733#discussion_r379221573 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1093,6 +1093,72 @@ message SideInput { FunctionSpec window_mapping_fn = 3; } +message StandardArtifacts { + enum Types { +// A URN for artifacts stored in a local directory. +// payload: ArtifactFilePayload. +FILE = 0 [(beam_urn) = "beam:artifact:file:v1"]; + +// A URN for artifacts embedded in ArtifactInformation proto. +// payload: EmbeddedFilePayload. +EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"]; + +// A URN for artifacts described by remote URLs. +// payload: a string for an artifact URL e.g. "https://.../foo.jar"; +REMOTE = 2 [(beam_urn) = "beam:artifact:remote:v1"]; + +// A URN for Python artifacts hosted on PYPI. +// payload: PypiPayload +PYPI = 3 [(beam_urn) = "beam:artifact:pypi:v1"]; + +// A URN for Java artifacts hosted on Maven central. +// payload: MavenPayload +MAVEN= 4 [(beam_urn) = "beam:artifact:maven:v1"]; + } +} + +message ArtifactFilePayload { + // A path to an artifact file on a local system. + string local_path = 1; + + // A generated staged name (no path). + string staged_name = 2; +} + +message EmbeddedFilePayload { + // raw data bytes for an embedded artifact + bytes data = 1; + + // A generated staged name (no path). + string staged_name = 2; Review comment: Ack. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387046) Time Spent: 4.5h (was: 4h 20m) > Adding dependency information to Environment proto > -- > > Key: BEAM-9229 > URL: https://issues.apache.org/jira/browse/BEAM-9229 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > > Adding dependency information to Environment proto. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto
[ https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=387047&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387047 ] ASF GitHub Bot logged work on BEAM-9229: Author: ASF GitHub Bot Created on: 14/Feb/20 02:20 Start Date: 14/Feb/20 02:20 Worklog Time Spent: 10m Work Description: ihji commented on pull request #10733: [BEAM-9229] Adding dependency information to Environment proto URL: https://github.com/apache/beam/pull/10733#discussion_r379221603 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1093,6 +1093,72 @@ message SideInput { FunctionSpec window_mapping_fn = 3; } +message StandardArtifacts { + enum Types { +// A URN for artifacts stored in a local directory. +// payload: ArtifactFilePayload. +FILE = 0 [(beam_urn) = "beam:artifact:file:v1"]; + +// A URN for artifacts embedded in ArtifactInformation proto. +// payload: EmbeddedFilePayload. +EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"]; + +// A URN for artifacts described by remote URLs. +// payload: a string for an artifact URL e.g. "https://.../foo.jar"; +REMOTE = 2 [(beam_urn) = "beam:artifact:remote:v1"]; + +// A URN for Python artifacts hosted on PYPI. +// payload: PypiPayload +PYPI = 3 [(beam_urn) = "beam:artifact:pypi:v1"]; + +// A URN for Java artifacts hosted on Maven central. +// payload: MavenPayload +MAVEN= 4 [(beam_urn) = "beam:artifact:maven:v1"]; + } +} + +message ArtifactFilePayload { + // A path to an artifact file on a local system. + string local_path = 1; + + // A generated staged name (no path). + string staged_name = 2; +} + +message EmbeddedFilePayload { + // raw data bytes for an embedded artifact + bytes data = 1; + + // A generated staged name (no path). + string staged_name = 2; +} + +message PypiPayload { Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387047) Time Spent: 4h 40m (was: 4.5h) > Adding dependency information to Environment proto > -- > > Key: BEAM-9229 > URL: https://issues.apache.org/jira/browse/BEAM-9229 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 4h 40m > Remaining Estimate: 0h > > Adding dependency information to Environment proto. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9299) Upgrade Flink Runner to 1.8.3 and 1.9.2
[ https://issues.apache.org/jira/browse/BEAM-9299?focusedWorklogId=387043&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387043 ] ASF GitHub Bot logged work on BEAM-9299: Author: ASF GitHub Bot Created on: 14/Feb/20 02:12 Start Date: 14/Feb/20 02:12 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on issue #10850: [BEAM-9299] Upgrade Flink Runner from 1.8.2 to 1.8.3 URL: https://github.com/apache/beam/pull/10850#issuecomment-586063831 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387043) Time Spent: 20m (was: 10m) > Upgrade Flink Runner to 1.8.3 and 1.9.2 > --- > > Key: BEAM-9299 > URL: https://issues.apache.org/jira/browse/BEAM-9299 > Project: Beam > Issue Type: Task > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > Time Spent: 20m > Remaining Estimate: 0h > > I would like to Upgrade Flink Runner to 18.3 and 1.9.2 due to both the Apache > Flink 1.8.3 and Apache Flink 1.9.2 have been released [1]. > What do you think? > [1] https://dist.apache.org/repos/dist/release/flink/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Issue Comment Deleted] (BEAM-9003) test_reshuffle_preserves_timestamps (apache_beam.transforms.util_test.ReshuffleTest) does not work in Streaming VR suite on Dataflow
[ https://issues.apache.org/jira/browse/BEAM-9003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liu Wang updated BEAM-9003: --- Comment: was deleted (was: Run the test again after removing MAX_TIMESTAMP from the the data set. The test passed and the pipeline succeeded. [https://pantheon.corp.google.com/dataflow/jobs/us-central1/2020-02-13_15_52_16-2857015966414984604?project=google.com:clouddfe] [https://screenshot.googleplex.com/JHWnHT828eB] ) > test_reshuffle_preserves_timestamps > (apache_beam.transforms.util_test.ReshuffleTest) does not work in Streaming > VR suite on Dataflow > > > Key: BEAM-9003 > URL: https://issues.apache.org/jira/browse/BEAM-9003 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Liu Wang >Priority: Major > > Per investigation in https://issues.apache.org/jira/browse/BEAM-8877, the > test times out and was recently added to VR test suite. > [~liumomo315], I will sickbay this test for streaming, could you please help > triage the failure? > Thank you! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9003) test_reshuffle_preserves_timestamps (apache_beam.transforms.util_test.ReshuffleTest) does not work in Streaming VR suite on Dataflow
[ https://issues.apache.org/jira/browse/BEAM-9003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036635#comment-17036635 ] Liu Wang commented on BEAM-9003: Run the test again after removing MAX_TIMESTAMP from the the data set. The test passed and the pipeline succeeded. [https://pantheon.corp.google.com/dataflow/jobs/us-central1/2020-02-13_15_52_16-2857015966414984604?project=google.com:clouddfe] [https://screenshot.googleplex.com/JHWnHT828eB] > test_reshuffle_preserves_timestamps > (apache_beam.transforms.util_test.ReshuffleTest) does not work in Streaming > VR suite on Dataflow > > > Key: BEAM-9003 > URL: https://issues.apache.org/jira/browse/BEAM-9003 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Liu Wang >Priority: Major > > Per investigation in https://issues.apache.org/jira/browse/BEAM-8877, the > test times out and was recently added to VR test suite. > [~liumomo315], I will sickbay this test for streaming, could you please help > triage the failure? > Thank you! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
[ https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=387024&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387024 ] ASF GitHub Bot logged work on BEAM-8537: Author: ASF GitHub Bot Created on: 14/Feb/20 01:48 Start Date: 14/Feb/20 01:48 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #10375: [BEAM-8537] Provide WatermarkEstimator to track watermark URL: https://github.com/apache/beam/pull/10375#issuecomment-586058417 Re: https://github.com/apache/beam/pull/10375#discussion_r379142182 We still need to check whether `watermark_estimator` is None in the `_OutputProcessor.process_output().` because there are other non-sdf dofns calling this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387024) Time Spent: 15.5h (was: 15h 20m) > Provide WatermarkEstimatorProvider for different types of WatermarkEstimator > > > Key: BEAM-8537 > URL: https://issues.apache.org/jira/browse/BEAM-8537 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 15.5h > Remaining Estimate: 0h > > This is a follow up for in-progress PR: > https://github.com/apache/beam/pull/9794. > Current implementation in PR9794 provides a default implementation of > WatermarkEstimator. For further work, we want to let WatermarkEstimator to be > a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to > create a custom WatermarkEstimator per windowed value. It should be similar > to how we track restriction for SDF: > WatermarkEstimator <---> RestrictionTracker > WatermarkEstimatorProvider <---> RestrictionTrackerProvider > WatermarkEstimatorParam <---> RestrictionDoFnParam -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"
[ https://issues.apache.org/jira/browse/BEAM-8399?focusedWorklogId=387017&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387017 ] ASF GitHub Bot logged work on BEAM-8399: Author: ASF GitHub Bot Created on: 14/Feb/20 01:17 Start Date: 14/Feb/20 01:17 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10223: [BEAM-8399] Add --hdfs_full_urls option URL: https://github.com/apache/beam/pull/10223#issuecomment-586051119 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387017) Time Spent: 3h 10m (was: 3h) > Python HDFS implementation should support filenames of the format > "hdfs://namenodehost/parent/child" > > > Key: BEAM-8399 > URL: https://issues.apache.org/jira/browse/BEAM-8399 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Udi Meiri >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > "hdfs://namenodehost/parent/child" and "/parent/child" seems to be the > correct filename formats for HDFS based on [1] but we currently support > format "hdfs://parent/child". > To not break existing users, we have to either (1) somehow support both > versions by default (based on [2] seems like HDFS does not allow colons in > file path so this might be possible) (2) make > "hdfs://namenodehost/parent/child" optional for now and change it to default > after few versions. > We should also make sure that Beam Java and Python HDFS file-system > implementations are consistent in this regard. > > [1][https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html] > [2] https://issues.apache.org/jira/browse/HDFS-13 > > cc: [~udim] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8201) clean up the current container API
[ https://issues.apache.org/jira/browse/BEAM-8201?focusedWorklogId=387016&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387016 ] ASF GitHub Bot logged work on BEAM-8201: Author: ASF GitHub Bot Created on: 14/Feb/20 01:14 Start Date: 14/Feb/20 01:14 Worklog Time Spent: 10m Work Description: robertwb commented on issue #10843: [BEAM-8201] Pass all other endpoints through provisioning service. URL: https://github.com/apache/beam/pull/10843#issuecomment-586050328 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387016) Time Spent: 50m (was: 40m) > clean up the current container API > -- > > Key: BEAM-8201 > URL: https://issues.apache.org/jira/browse/BEAM-8201 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Hannah Jiang >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > From [~robertwb] > As part of this project, I propose we look at and clean up the current > container API before we "release" it as public and stable. IIRC, we currently > provide the worker arguments through a combination of (1) environment > variables (2) command line parameters to docker and (3) via the provisioning > API. It would be good to have a more principled approach to specifying > arguments (either all the same way, or if they vary, good reason for doing so > rather than by historical accident). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9250) Improve beam release script based on 2.19.0 release experience
[ https://issues.apache.org/jira/browse/BEAM-9250?focusedWorklogId=387011&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387011 ] ASF GitHub Bot logged work on BEAM-9250: Author: ASF GitHub Bot Created on: 14/Feb/20 01:00 Start Date: 14/Feb/20 01:00 Worklog Time Spent: 10m Work Description: markflyhigh commented on issue #10772: [BEAM-9250] Re-structure python release candidate target. URL: https://github.com/apache/beam/pull/10772#issuecomment-586046427 Thank you for looking into release improvement! This gives people a lot more flexibility to (re)run single test. However, I have few concerns about the split. Running 16 jobs add extra manual work for release manager in triggering and tracking. Also our Jenkins cluster has limited slots, run them at same time may increase the waiting queue in a short amount of time, especially some mobile-gaming tests require 30+ mins. I feel it's a really hard trade off. What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387011) Time Spent: 2h (was: 1h 50m) > Improve beam release script based on 2.19.0 release experience > -- > > Key: BEAM-9250 > URL: https://issues.apache.org/jira/browse/BEAM-9250 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.20.0 > > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow
[ https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=387012&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387012 ] ASF GitHub Bot logged work on BEAM-7926: Author: ASF GitHub Bot Created on: 14/Feb/20 01:01 Start Date: 14/Feb/20 01:01 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #10731: [BEAM-7926] Data-centric Interactive Part3 URL: https://github.com/apache/beam/pull/10731#discussion_r379202827 ## File path: sdks/python/apache_beam/runners/interactive/interactive_beam.py ## @@ -82,7 +89,100 @@ def run_pipeline(self): ie.current_env().watch(watchable) -def visualize(pcoll): - """Visualizes a PCollection.""" - # TODO(BEAM-7926) - pass +def show(*pcolls): + """Visualizes given PCollections in an interactive exploratory way if used + within a notebook, or prints a heading sampled data if used within an ipython + shell. Noop if used in a non-interactive environment. + + Ad hoc builds a pipeline fragment including only transforms that are + necessary to produce data for given PCollections pcolls, runs the pipeline + fragment to compute data for those pcolls and then visualizes the data. + + The function is always blocking. If used within a notebook, the data + visualized might be dynamically updated before the function returns as more + and more data could getting processed and emitted when the pipeline fragment + is being executed. If used within an ipython shell, there will be no dynamic + plotting but a static plotting in the end of pipeline fragment execution. + + The PCollections given must belong to the same pipeline and be watched by + Interactive Beam (PCollections defined in __main__ are automatically watched). + +For example:: + + p = beam.Pipeline(InteractiveRunner()) + init = p | 'Init' >> beam.Create(range(1000)) + square = init | 'Square' >> beam.Map(lambda x: x * x) + cube = init | 'Cube' >> beam.Map(lambda x: x ** 3) + + # Below builds a pipeline fragment from the defined pipeline `p` that + # contains only applied transforms of `Init` and `Square`. Then the + # interactive runner runs the pipeline fragment implicitly to compute data + # represented by PCollection `square` and visualizes it. + show(square) + + # This is equivalent to `show(square)` because `square` depends on `init` + # and `init` is included in the pipeline fragment and computed anyway. + show(init, square) + + # Below is similar to running `p.run()`. It computes data for both + # PCollection `square` and PCollection `cube`, then visualizes them. + show(square, cube) + """ + assert len(pcolls) > 0, ( + 'Need at least 1 PCollection to show data visualization.') + for pcoll in pcolls: +assert isinstance(pcoll, beam.pvalue.PCollection), ( +'{} is not an apache_beam.pvalue.PCollection.'.format(pcoll)) + user_pipeline = pcolls[0].pipeline + for pcoll in pcolls: +assert pcoll.pipeline is user_pipeline, ( +'{} belongs to a different user-defined pipeline ({}) than that of' +' other PCollections ({}).'.format( +pcoll, pcoll.pipeline, user_pipeline)) + runner = user_pipeline.runner + if isinstance(runner, ir.InteractiveRunner): +runner = runner._underlying_runner + + # Make sure that all PCollections to be shown are watched. If a PCollection + # has not been watched, make up a variable name for that PCollection and watch + # it. No validation is needed here because the watch logic can handle + # arbitrary variables. + watched_pcollections = set() + for watching in ie.current_env().watching(): +for key, val in watching: + if hasattr(val, '__class__') and isinstance(val, beam.pvalue.PCollection): +watched_pcollections.add(val) + for pcoll in pcolls: +if pcoll not in watched_pcollections: + watch({re.sub(r'[\[\]\(\)]', '_', str(pcoll)): pcoll}) + + # Attempt to run background caching job since we have the reference to the + # user-defined pipeline. + bcj.attempt_to_run_background_caching_job(runner, user_pipeline) Review comment: This is to discern instances from modules. For example, given a `background_caching_job`, it feels more like an instance of `BackgroundCachingJob`. For `pipeline_instrument`, when importing the module, we normally rename it to `instr` or `inst`. Because `pipeline_instrument = instr.build_pipeline_instrument(...)`. The abbreviation is to avoid name conflicts. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking
[jira] [Work logged] (BEAM-9250) Improve beam release script based on 2.19.0 release experience
[ https://issues.apache.org/jira/browse/BEAM-9250?focusedWorklogId=387008&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387008 ] ASF GitHub Bot logged work on BEAM-9250: Author: ASF GitHub Bot Created on: 14/Feb/20 00:59 Start Date: 14/Feb/20 00:59 Worklog Time Spent: 10m Work Description: markflyhigh commented on issue #10772: [BEAM-9250] Re-structure python release candidate target. URL: https://github.com/apache/beam/pull/10772#issuecomment-586046427 Thank you for looking into release improvement! This gives people a lot more flexibility to (re)run single test. However, I have few concerns about split the split. Running 16 jobs add extra manual work for release manager in triggering and tracking. Also our Jenkins cluster has limited slots, run them at same time may increase the waiting queue, especially some mobile-gaming tests require 30+ mins. I feel it's a really hard trade off. What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387008) Time Spent: 1h 40m (was: 1.5h) > Improve beam release script based on 2.19.0 release experience > -- > > Key: BEAM-9250 > URL: https://issues.apache.org/jira/browse/BEAM-9250 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.20.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9250) Improve beam release script based on 2.19.0 release experience
[ https://issues.apache.org/jira/browse/BEAM-9250?focusedWorklogId=387009&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387009 ] ASF GitHub Bot logged work on BEAM-9250: Author: ASF GitHub Bot Created on: 14/Feb/20 00:59 Start Date: 14/Feb/20 00:59 Worklog Time Spent: 10m Work Description: markflyhigh commented on issue #10772: [BEAM-9250] Re-structure python release candidate target. URL: https://github.com/apache/beam/pull/10772#issuecomment-586046427 Thank you for looking into release improvement! This gives people a lot more flexibility to (re)run single test. However, I have few concerns about the split. Running 16 jobs add extra manual work for release manager in triggering and tracking. Also our Jenkins cluster has limited slots, run them at same time may increase the waiting queue, especially some mobile-gaming tests require 30+ mins. I feel it's a really hard trade off. What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387009) Time Spent: 1h 50m (was: 1h 40m) > Improve beam release script based on 2.19.0 release experience > -- > > Key: BEAM-9250 > URL: https://issues.apache.org/jira/browse/BEAM-9250 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.20.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage
[ https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=387010&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387010 ] ASF GitHub Bot logged work on BEAM-8889: Author: ASF GitHub Bot Created on: 14/Feb/20 00:59 Start Date: 14/Feb/20 00:59 Worklog Time Spent: 10m Work Description: veblush commented on issue #10857: [BEAM-8889] Upgrade guava to 28.0-jre URL: https://github.com/apache/beam/pull/10857#issuecomment-586046533 [beam-linkage-check](https://gist.github.com/veblush/fb47ada6e671e87cb1002e170f17db9c) showed no difference. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387010) Remaining Estimate: 151h 40m (was: 151h 50m) Time Spent: 16h 20m (was: 16h 10m) > Make GcsUtil use GoogleCloudStorage > --- > > Key: BEAM-8889 > URL: https://issues.apache.org/jira/browse/BEAM-8889 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Esun Kim >Assignee: VASU NORI >Priority: Major > Labels: gcs > Original Estimate: 168h > Time Spent: 16h 20m > Remaining Estimate: 151h 40m > > [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java] > is a primary class to access Google Cloud Storage on Apache Beam. Current > implementation directly creates GoogleCloudStorageReadChannel and > GoogleCloudStorageWriteChannel by itself to read and write GCS data rather > than using > [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java] > which is an abstract class providing basic IO capability which eventually > creates channel objects. This request is about updating GcsUtil to use > GoogleCloudStorage to create read and write channel, which is expected > flexible because it can easily pick up the new change; e.g. new channel > implementation using new protocol without code change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow
[ https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=387007&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387007 ] ASF GitHub Bot logged work on BEAM-7926: Author: ASF GitHub Bot Created on: 14/Feb/20 00:58 Start Date: 14/Feb/20 00:58 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #10731: [BEAM-7926] Data-centric Interactive Part3 URL: https://github.com/apache/beam/pull/10731#discussion_r379202106 ## File path: sdks/python/apache_beam/runners/interactive/pipeline_fragment.py ## @@ -100,17 +100,23 @@ def deduce_fragment(self): self._runner_pipeline.runner, self._options) - def run(self, display_pipeline_graph=False, use_cache=True): + def run(self, + display_pipeline_graph=False, + use_cache=True, + blocking_run=False): Review comment: Renamed `blocking_run` to `blocking`. Renamed all `try-finally` preserve-and-reset variables with `preserved_` prefix. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387007) Time Spent: 46h 40m (was: 46.5h) > Show PCollection with Interactive Beam in a data-centric user flow > -- > > Key: BEAM-7926 > URL: https://issues.apache.org/jira/browse/BEAM-7926 > Project: Beam > Issue Type: New Feature > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 46h 40m > Remaining Estimate: 0h > > Support auto plotting / charting of materialized data of a given PCollection > with Interactive Beam. > Say an Interactive Beam pipeline defined as > > {code:java} > p = beam.Pipeline(InteractiveRunner()) > pcoll = p | 'Transform' >> transform() > pcoll2 = ... > pcoll3 = ...{code} > The use can call a single function and get auto-magical charting of the data. > e.g., > {code:java} > show(pcoll, pcoll2) > {code} > Throughout the process, a pipeline fragment is built to include only > transforms necessary to produce the desired pcolls (pcoll and pcoll2) and > execute that fragment. > This makes the Interactive Beam user flow data-centric. > > Detailed > [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow
[ https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=387006&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387006 ] ASF GitHub Bot logged work on BEAM-7926: Author: ASF GitHub Bot Created on: 14/Feb/20 00:57 Start Date: 14/Feb/20 00:57 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #10731: [BEAM-7926] Data-centric Interactive Part3 URL: https://github.com/apache/beam/pull/10731#discussion_r379201741 ## File path: sdks/python/apache_beam/runners/interactive/interactive_beam_test.py ## @@ -67,6 +77,31 @@ def test_watch_class_instance(self): test_env.watch(self) self.assertEqual(ie.current_env().watching(), test_env.watching()) + def test_show_always_watch_given_pcolls(self): +p = beam.Pipeline(ir.InteractiveRunner()) +# pylint: disable=range-builtin-not-iterating +pcoll = p | 'Create' >> beam.Create(range(10)) +# The pcoll is not watched since watch(locals()) is not explicitly called. +self.assertFalse( +pcoll in _get_watched_pcollections_with_variable_names()) +# The call of show watches pcoll. +ib.show(pcoll) +self.assertTrue( +pcoll in _get_watched_pcollections_with_variable_names()) +# The name of pcoll is made up by show. +self.assertEqual( +'PCollection_Create/Map_decode_.None_', Review comment: Removed the str assertion. Kept the existence check. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387006) Time Spent: 46.5h (was: 46h 20m) > Show PCollection with Interactive Beam in a data-centric user flow > -- > > Key: BEAM-7926 > URL: https://issues.apache.org/jira/browse/BEAM-7926 > Project: Beam > Issue Type: New Feature > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 46.5h > Remaining Estimate: 0h > > Support auto plotting / charting of materialized data of a given PCollection > with Interactive Beam. > Say an Interactive Beam pipeline defined as > > {code:java} > p = beam.Pipeline(InteractiveRunner()) > pcoll = p | 'Transform' >> transform() > pcoll2 = ... > pcoll3 = ...{code} > The use can call a single function and get auto-magical charting of the data. > e.g., > {code:java} > show(pcoll, pcoll2) > {code} > Throughout the process, a pipeline fragment is built to include only > transforms necessary to produce the desired pcolls (pcoll and pcoll2) and > execute that fragment. > This makes the Interactive Beam user flow data-centric. > > Detailed > [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow
[ https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=387005&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387005 ] ASF GitHub Bot logged work on BEAM-7926: Author: ASF GitHub Bot Created on: 14/Feb/20 00:56 Start Date: 14/Feb/20 00:56 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #10731: [BEAM-7926] Data-centric Interactive Part3 URL: https://github.com/apache/beam/pull/10731#discussion_r379201648 ## File path: sdks/python/apache_beam/runners/interactive/interactive_beam.py ## @@ -82,7 +89,100 @@ def run_pipeline(self): ie.current_env().watch(watchable) -def visualize(pcoll): - """Visualizes a PCollection.""" - # TODO(BEAM-7926) - pass +def show(*pcolls): + """Visualizes given PCollections in an interactive exploratory way if used + within a notebook, or prints a heading sampled data if used within an ipython + shell. Noop if used in a non-interactive environment. + + Ad hoc builds a pipeline fragment including only transforms that are + necessary to produce data for given PCollections pcolls, runs the pipeline + fragment to compute data for those pcolls and then visualizes the data. + + The function is always blocking. If used within a notebook, the data + visualized might be dynamically updated before the function returns as more + and more data could getting processed and emitted when the pipeline fragment + is being executed. If used within an ipython shell, there will be no dynamic + plotting but a static plotting in the end of pipeline fragment execution. + + The PCollections given must belong to the same pipeline and be watched by + Interactive Beam (PCollections defined in __main__ are automatically watched). + +For example:: + + p = beam.Pipeline(InteractiveRunner()) + init = p | 'Init' >> beam.Create(range(1000)) + square = init | 'Square' >> beam.Map(lambda x: x * x) + cube = init | 'Cube' >> beam.Map(lambda x: x ** 3) + + # Below builds a pipeline fragment from the defined pipeline `p` that + # contains only applied transforms of `Init` and `Square`. Then the + # interactive runner runs the pipeline fragment implicitly to compute data + # represented by PCollection `square` and visualizes it. + show(square) + + # This is equivalent to `show(square)` because `square` depends on `init` + # and `init` is included in the pipeline fragment and computed anyway. + show(init, square) + + # Below is similar to running `p.run()`. It computes data for both + # PCollection `square` and PCollection `cube`, then visualizes them. + show(square, cube) + """ + assert len(pcolls) > 0, ( + 'Need at least 1 PCollection to show data visualization.') + for pcoll in pcolls: +assert isinstance(pcoll, beam.pvalue.PCollection), ( +'{} is not an apache_beam.pvalue.PCollection.'.format(pcoll)) + user_pipeline = pcolls[0].pipeline + for pcoll in pcolls: +assert pcoll.pipeline is user_pipeline, ( +'{} belongs to a different user-defined pipeline ({}) than that of' +' other PCollections ({}).'.format( +pcoll, pcoll.pipeline, user_pipeline)) + runner = user_pipeline.runner + if isinstance(runner, ir.InteractiveRunner): +runner = runner._underlying_runner + + # Make sure that all PCollections to be shown are watched. If a PCollection + # has not been watched, make up a variable name for that PCollection and watch + # it. No validation is needed here because the watch logic can handle + # arbitrary variables. + watched_pcollections = set() + for watching in ie.current_env().watching(): +for key, val in watching: + if hasattr(val, '__class__') and isinstance(val, beam.pvalue.PCollection): +watched_pcollections.add(val) + for pcoll in pcolls: +if pcoll not in watched_pcollections: + watch({re.sub(r'[\[\]\(\)]', '_', str(pcoll)): pcoll}) + + # Attempt to run background caching job since we have the reference to the + # user-defined pipeline. + bcj.attempt_to_run_background_caching_job(runner, user_pipeline) + + # Build a pipeline fragment for the PCollections and run it. + result = pf.PipelineFragment(list(pcolls)).run() + ie.current_env().set_pipeline_result( + user_pipeline, + result, + is_main_job=True) + + # If in notebook, dynamic plotting as computation goes. + if ie.current_env().is_in_notebook: +for pcoll in pcolls: + visualize(pcoll, dynamic_plotting_interval=1) Review comment: Yes, its unit is second. The underlying implementation uses a `datetime.timedelta`. This is to simplify the `visualize` interface and its usages. This is an automated message from the Apache Git Service. To respond to the message, please log on to
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=387002&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387002 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 14/Feb/20 00:54 Start Date: 14/Feb/20 00:54 Worklog Time Spent: 10m Work Description: robertwb commented on issue #10835: [BEAM-8575] Removed MAX_TIMESTAMP from testing data URL: https://github.com/apache/beam/pull/10835#issuecomment-586045114 This looks good to me. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387002) Time Spent: 51.5h (was: 51h 20m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 51.5h > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow
[ https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=387003&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387003 ] ASF GitHub Bot logged work on BEAM-7926: Author: ASF GitHub Bot Created on: 14/Feb/20 00:54 Start Date: 14/Feb/20 00:54 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #10731: [BEAM-7926] Data-centric Interactive Part3 URL: https://github.com/apache/beam/pull/10731#discussion_r379200933 ## File path: sdks/python/apache_beam/runners/interactive/interactive_beam.py ## @@ -82,7 +89,100 @@ def run_pipeline(self): ie.current_env().watch(watchable) -def visualize(pcoll): - """Visualizes a PCollection.""" - # TODO(BEAM-7926) - pass +def show(*pcolls): + """Visualizes given PCollections in an interactive exploratory way if used + within a notebook, or prints a heading sampled data if used within an ipython + shell. Noop if used in a non-interactive environment. + + Ad hoc builds a pipeline fragment including only transforms that are + necessary to produce data for given PCollections pcolls, runs the pipeline + fragment to compute data for those pcolls and then visualizes the data. + + The function is always blocking. If used within a notebook, the data + visualized might be dynamically updated before the function returns as more + and more data could getting processed and emitted when the pipeline fragment + is being executed. If used within an ipython shell, there will be no dynamic + plotting but a static plotting in the end of pipeline fragment execution. + + The PCollections given must belong to the same pipeline and be watched by + Interactive Beam (PCollections defined in __main__ are automatically watched). + +For example:: + + p = beam.Pipeline(InteractiveRunner()) + init = p | 'Init' >> beam.Create(range(1000)) + square = init | 'Square' >> beam.Map(lambda x: x * x) + cube = init | 'Cube' >> beam.Map(lambda x: x ** 3) + + # Below builds a pipeline fragment from the defined pipeline `p` that + # contains only applied transforms of `Init` and `Square`. Then the + # interactive runner runs the pipeline fragment implicitly to compute data + # represented by PCollection `square` and visualizes it. + show(square) + + # This is equivalent to `show(square)` because `square` depends on `init` + # and `init` is included in the pipeline fragment and computed anyway. + show(init, square) + + # Below is similar to running `p.run()`. It computes data for both + # PCollection `square` and PCollection `cube`, then visualizes them. + show(square, cube) + """ + assert len(pcolls) > 0, ( + 'Need at least 1 PCollection to show data visualization.') + for pcoll in pcolls: +assert isinstance(pcoll, beam.pvalue.PCollection), ( +'{} is not an apache_beam.pvalue.PCollection.'.format(pcoll)) + user_pipeline = pcolls[0].pipeline + for pcoll in pcolls: +assert pcoll.pipeline is user_pipeline, ( +'{} belongs to a different user-defined pipeline ({}) than that of' +' other PCollections ({}).'.format( +pcoll, pcoll.pipeline, user_pipeline)) + runner = user_pipeline.runner + if isinstance(runner, ir.InteractiveRunner): +runner = runner._underlying_runner + + # Make sure that all PCollections to be shown are watched. If a PCollection + # has not been watched, make up a variable name for that PCollection and watch + # it. No validation is needed here because the watch logic can handle + # arbitrary variables. + watched_pcollections = set() + for watching in ie.current_env().watching(): +for key, val in watching: + if hasattr(val, '__class__') and isinstance(val, beam.pvalue.PCollection): +watched_pcollections.add(val) + for pcoll in pcolls: +if pcoll not in watched_pcollections: + watch({re.sub(r'[\[\]\(\)]', '_', str(pcoll)): pcoll}) Review comment: Removed the `must be watched` statement from the docstrings. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387003) Time Spent: 46h 10m (was: 46h) > Show PCollection with Interactive Beam in a data-centric user flow > -- > > Key: BEAM-7926 > URL: https://issues.apache.org/jira/browse/BEAM-7926 > Project: Beam > Issue Type: New Feature > Components: runner-py-interactive >Reporter: Ning Kang >
[jira] [Work logged] (BEAM-9313) beam_PostRelease_NightlySnapshot failure due to ClassNotFoundException: org.apache.beam.model.pipeline.v1.StandardWindowFns$SessionsPayload$Enum
[ https://issues.apache.org/jira/browse/BEAM-9313?focusedWorklogId=387004&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387004 ] ASF GitHub Bot logged work on BEAM-9313: Author: ASF GitHub Bot Created on: 14/Feb/20 00:54 Start Date: 14/Feb/20 00:54 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #10861: [BEAM-9313] Bump dataflow container version URL: https://github.com/apache/beam/pull/10861 Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Pyt
[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto
[ https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=387001&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387001 ] ASF GitHub Bot logged work on BEAM-9229: Author: ASF GitHub Bot Created on: 14/Feb/20 00:52 Start Date: 14/Feb/20 00:52 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10733: [BEAM-9229] Adding dependency information to Environment proto URL: https://github.com/apache/beam/pull/10733#discussion_r379187541 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1093,6 +1093,72 @@ message SideInput { FunctionSpec window_mapping_fn = 3; } +message StandardArtifacts { + enum Types { +// A URN for artifacts stored in a local directory. +// payload: ArtifactFilePayload. +FILE = 0 [(beam_urn) = "beam:artifact:file:v1"]; + +// A URN for artifacts embedded in ArtifactInformation proto. +// payload: EmbeddedFilePayload. +EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"]; + +// A URN for artifacts described by remote URLs. +// payload: a string for an artifact URL e.g. "https://.../foo.jar"; +REMOTE = 2 [(beam_urn) = "beam:artifact:remote:v1"]; Review comment: I would call this one URL rather than REMOTE as most of these are remote. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387001) Time Spent: 4h 20m (was: 4h 10m) > Adding dependency information to Environment proto > -- > > Key: BEAM-9229 > URL: https://issues.apache.org/jira/browse/BEAM-9229 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > Adding dependency information to Environment proto. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto
[ https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=386999&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386999 ] ASF GitHub Bot logged work on BEAM-9229: Author: ASF GitHub Bot Created on: 14/Feb/20 00:52 Start Date: 14/Feb/20 00:52 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10733: [BEAM-9229] Adding dependency information to Environment proto URL: https://github.com/apache/beam/pull/10733#discussion_r379198236 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1087,6 +1087,44 @@ message SideInput { FunctionSpec window_mapping_fn = 3; } +message StandardArtifacts { + enum Types { +// A URN for artifacts stored in a local directory. +// payload: ArtifactFilePayload. +FILE = 0 [(beam_urn) = "beam:artifact:file:v1"]; +// A URN for artifacts embedded in ArtifactInformation proto. +// payload: raw data bytes. +EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"]; +// A URN for artifacts described by HTTP links. +// payload: a string for an artifact HTTP URL +HTTP = 2 [(beam_urn) = "beam:artifact:http:v1"]; +// A URN for artifacts hosted on PYPI. +// artifact_id: a PYPI project name +// version_range: a PYPI compatible version string +// payload: None +PYPI = 3 [(beam_urn) = "beam:artifact:pypi:v1"]; +// A URN for artifacts hosted on Maven central. +// artifact_id: [maven group id]:[maven artifact id] +// version_range: a Maven compatible version string +// payload: None +MAVEN= 4 [(beam_urn) = "beam:artifact:maven:v1"]; + } +} + +message ArtifactFilePayload { + // A path to an artifact file on a local system. + string local_path = 1; + // A generated staged name (no path). + string staged_name = 2; +} + +message ArtifactInformation { + string urn = 1; + bytes payload = 2; + string artifact_id = 3; + string version_range = 4; +} Review comment: Role would be an opaque string. (Well, likely a URN, but the runner wouldn't do anything about it.) It would be provided by, and consumed by, the SDK. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386999) Time Spent: 4h 10m (was: 4h) > Adding dependency information to Environment proto > -- > > Key: BEAM-9229 > URL: https://issues.apache.org/jira/browse/BEAM-9229 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > Adding dependency information to Environment proto. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto
[ https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=387000&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-387000 ] ASF GitHub Bot logged work on BEAM-9229: Author: ASF GitHub Bot Created on: 14/Feb/20 00:52 Start Date: 14/Feb/20 00:52 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10733: [BEAM-9229] Adding dependency information to Environment proto URL: https://github.com/apache/beam/pull/10733#discussion_r379187475 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1087,6 +1087,44 @@ message SideInput { FunctionSpec window_mapping_fn = 3; } +message StandardArtifacts { + enum Types { +// A URN for artifacts stored in a local directory. +// payload: ArtifactFilePayload. +FILE = 0 [(beam_urn) = "beam:artifact:file:v1"]; +// A URN for artifacts embedded in ArtifactInformation proto. +// payload: raw data bytes. +EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"]; Review comment: Yes, I'm saying that we should have a new type here for that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 387000) > Adding dependency information to Environment proto > -- > > Key: BEAM-9229 > URL: https://issues.apache.org/jira/browse/BEAM-9229 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > Adding dependency information to Environment proto. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage
[ https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=386998&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386998 ] ASF GitHub Bot logged work on BEAM-8889: Author: ASF GitHub Bot Created on: 14/Feb/20 00:50 Start Date: 14/Feb/20 00:50 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10857: [BEAM-8889] Upgrade guava to 28.0-jre URL: https://github.com/apache/beam/pull/10857#issuecomment-586044332 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386998) Remaining Estimate: 151h 50m (was: 152h) Time Spent: 16h 10m (was: 16h) > Make GcsUtil use GoogleCloudStorage > --- > > Key: BEAM-8889 > URL: https://issues.apache.org/jira/browse/BEAM-8889 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Esun Kim >Assignee: VASU NORI >Priority: Major > Labels: gcs > Original Estimate: 168h > Time Spent: 16h 10m > Remaining Estimate: 151h 50m > > [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java] > is a primary class to access Google Cloud Storage on Apache Beam. Current > implementation directly creates GoogleCloudStorageReadChannel and > GoogleCloudStorageWriteChannel by itself to read and write GCS data rather > than using > [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java] > which is an abstract class providing basic IO capability which eventually > creates channel objects. This request is about updating GcsUtil to use > GoogleCloudStorage to create read and write channel, which is expected > flexible because it can easily pick up the new change; e.g. new channel > implementation using new protocol without code change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1833) Restructure Python pipeline construction to better follow the Runner API
[ https://issues.apache.org/jira/browse/BEAM-1833?focusedWorklogId=386997&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386997 ] ASF GitHub Bot logged work on BEAM-1833: Author: ASF GitHub Bot Created on: 14/Feb/20 00:46 Start Date: 14/Feb/20 00:46 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #10860: [BEAM-1833] Fixes BEAM-1833 URL: https://github.com/apache/beam/pull/10860 * This changes the "add_output" interface to require a PCollection tag when adding an output to a PTransform. * This also changes the replacement algorithm's to propogate the PCollection tag when doing replacements. * This also moves the DirectRunner's TestStream implementation to a replacement transform. This is because the TestStream relies on getting the output_tags from the PTransform. Change-Id: Ibd80b0d25cd8cc5ff5c28e127f7313638e6664da **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastComp
[jira] [Work logged] (BEAM-9280) Update commons-compress to version 1.20
[ https://issues.apache.org/jira/browse/BEAM-9280?focusedWorklogId=386996&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386996 ] ASF GitHub Bot logged work on BEAM-9280: Author: ASF GitHub Bot Created on: 14/Feb/20 00:45 Start Date: 14/Feb/20 00:45 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10817: [BEAM-9280] Update commons-compress to version 1.20 URL: https://github.com/apache/beam/pull/10817#issuecomment-586043136 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386996) Time Spent: 2h 20m (was: 2h 10m) > Update commons-compress to version 1.20 > --- > > Key: BEAM-9280 > URL: https://issues.apache.org/jira/browse/BEAM-9280 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Fix For: 2.20.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9250) Improve beam release script based on 2.19.0 release experience
[ https://issues.apache.org/jira/browse/BEAM-9250?focusedWorklogId=386995&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386995 ] ASF GitHub Bot logged work on BEAM-9250: Author: ASF GitHub Bot Created on: 14/Feb/20 00:42 Start Date: 14/Feb/20 00:42 Worklog Time Spent: 10m Work Description: markflyhigh commented on pull request #10772: [BEAM-9250] Re-structure python release candidate target. URL: https://github.com/apache/beam/pull/10772#discussion_r379197751 ## File path: .test-infra/jenkins/job_ReleaseCandidate_DataflowRunner_MobileGame_Py2.groovy ## @@ -0,0 +1,37 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import CommonJobProperties as commonJobProperties + +job('beam_PostRelease_Python2_Candidate_MobileGame_Dataflow') { +description('Runs mobile game verification of the Python2 release candidate with dataflow runner by using tar and wheel.') + +// Set common parameters. +commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 360) + +// Allows triggering this build against pull requests. +commonJobProperties.enablePhraseTriggeringFromPullRequest( +delegate, +'Run Py2 ReleaseCandidate Dataflow MobileGame') + +// Execute shell command to test Python SDK. +steps { +shell('cd ' + commonJobProperties.checkoutDir + +' && bash release/src/main/python-release/python_release_automation.sh 2.7 dataflow mobile_game') +} +} Review comment: please add an empty line at the end. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386995) Time Spent: 1.5h (was: 1h 20m) > Improve beam release script based on 2.19.0 release experience > -- > > Key: BEAM-9250 > URL: https://issues.apache.org/jira/browse/BEAM-9250 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: 2.20.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=386994&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386994 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 14/Feb/20 00:41 Start Date: 14/Feb/20 00:41 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10856: [BEAM-8335] Update StreamingCache with new Protos URL: https://github.com/apache/beam/pull/10856#discussion_r379196764 ## File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py ## @@ -79,23 +76,23 @@ def _test_stream_events_before_target(self, target_timestamp): if self._stream_times[tag] >= target_timestamp: continue try: - record = next(r) - records.append((tag, record)) - self._stream_times[tag] = Timestamp.from_proto(record.processing_time) + record = next(r).recorded_event + if record.HasField('processing_time_event'): +self._stream_times[tag] += timestamp.Duration( +micros=record.processing_time_event.advance_duration) + records.append((tag, record, self._stream_times[tag])) Review comment: this would be much more readable if we used `typing.NamedTuple` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386994) Time Spent: 58h 40m (was: 58.5h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 58h 40m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=386993&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386993 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 14/Feb/20 00:39 Start Date: 14/Feb/20 00:39 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10856: [BEAM-8335] Update StreamingCache with new Protos URL: https://github.com/apache/beam/pull/10856#issuecomment-586041696 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386993) Time Spent: 58.5h (was: 58h 20m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 58.5h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=386986&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386986 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 14/Feb/20 00:23 Start Date: 14/Feb/20 00:23 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #10856: [BEAM-8335] Update StreamingCache with new Protos URL: https://github.com/apache/beam/pull/10856#issuecomment-586037668 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386986) Time Spent: 58h 20m (was: 58h 10m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 58h 20m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage
[ https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=386985&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386985 ] ASF GitHub Bot logged work on BEAM-8889: Author: ASF GitHub Bot Created on: 14/Feb/20 00:20 Start Date: 14/Feb/20 00:20 Worklog Time Spent: 10m Work Description: veblush commented on pull request #10857: [BEAM-8889] Upgrade guava to 28.0-jre URL: https://github.com/apache/beam/pull/10857 This is a follow-up of #10769 about guava upgrade. Technically this is not required but it'd make a better dependency of beam by providing missing symbols to gcsio. Although these missing symbols are not being used through Beam as of today, chances are that it could cause unexpected problems in future. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=386984&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386984 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 14/Feb/20 00:19 Start Date: 14/Feb/20 00:19 Worklog Time Spent: 10m Work Description: davidyan74 commented on issue #10856: [BEAM-8335] Update StreamingCache with new Protos URL: https://github.com/apache/beam/pull/10856#issuecomment-586036490 Looks like you need to run yapf? https://cwiki.apache.org/confluence/display/BEAM/Python+Tips 27 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386984) Time Spent: 58h 10m (was: 58h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 58h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
[ https://issues.apache.org/jira/browse/BEAM-9228?focusedWorklogId=386983&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386983 ] ASF GitHub Bot logged work on BEAM-9228: Author: ASF GitHub Bot Created on: 14/Feb/20 00:18 Start Date: 14/Feb/20 00:18 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on pull request #10847: [BEAM-9228] Support further partition for FnApi ListBuffer URL: https://github.com/apache/beam/pull/10847#discussion_r379191540 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py ## @@ -907,6 +961,12 @@ def get_buffer(buffer_id): if kind in ('materialize', 'timers'): # If `buffer_id` is not a key in `pcoll_buffers`, it will be added by # the `defaultdict`. +if buffer_id not in pcoll_buffers: + coder_id = beam_fn_api_pb2.RemoteGrpcPort.FromString( + process_bundle_descriptor.transforms[transform_id].spec.payload + ).coder_id + coder = context.coders[coder_id] Review comment: I have checked code for safe_coders, but still don't have clear ideas about it. What is the purpose of it? How do we handle for some coder_ids not part of safe_coders? Previously, I tried using safe_coders if the coder_id is in safe_coder, else return `coders[coder_id]`. This also pass tests, is it an option here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386983) Time Spent: 1h 40m (was: 1.5h) > _SDFBoundedSourceWrapper doesn't distribute data to multiple workers > > > Key: BEAM-9228 > URL: https://issues.apache.org/jira/browse/BEAM-9228 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.16.0, 2.18.0, 2.19.0 >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.20.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > A user reported following issue. > - > I have a set of tfrecord files, obtained by converting parquet files with > Spark. Each file is roughly 1GB and I have 11 of those. > I would expect simple statistics gathering (ie counting number of items of > all files) to scale linearly with respect to the number of cores on my system. > I am able to reproduce the issue with the minimal snippet below > {code:java} > import apache_beam as beam > from apache_beam.options.pipeline_options import PipelineOptions > from apache_beam.runners.portability import fn_api_runner > from apache_beam.portability.api import beam_runner_api_pb2 > from apache_beam.portability import python_urns > import sys > pipeline_options = PipelineOptions(['--direct_num_workers', '4']) > file_pattern = 'part-r-00* > runner=fn_api_runner.FnApiRunner( > default_environment=beam_runner_api_pb2.Environment( > urn=python_urns.SUBPROCESS_SDK, > payload=b'%s -m apache_beam.runners.worker.sdk_worker_main' > % sys.executable.encode('ascii'))) > p = beam.Pipeline(runner=runner, options=pipeline_options) > lines = (p | 'read' >> beam.io.tfrecordio.ReadFromTFRecord(file_pattern) > | beam.combiners.Count.Globally() > | beam.io.WriteToText('/tmp/output')) > p.run() > {code} > Only one combination of apache_beam revision / worker type seems to work (I > refer to https://beam.apache.org/documentation/runners/direct/ for the worker > types) > * beam 2.16; neither multithread nor multiprocess achieve high cpu usage on > multiple cores > * beam 2.17: able to achieve high cpu usage on all 4 cores > * beam 2.18: not tested the mulithreaded mode but the multiprocess mode fails > when trying to serialize the Environment instance most likely because of a > change from 2.17 to 2.18. > I also tried briefly SparkRunner with version 2.16 but was no able to achieve > any throughput. > What is the recommnended way to achieve what I am trying to ? How can I > troubleshoot ? > -- > This is caused by [this > PR|https://github.com/apache/beam/commit/02f8ad4eee3ec0ea8cbdc0f99c1dad29f00a9f60]. > A [workaround|https://github.com/apache/beam/pull/10729] is tried, which is > rolling back iobase.py not to use _SDFBoundedSourceWra
[jira] [Created] (BEAM-9313) beam_PostRelease_NightlySnapshot failure due to ClassNotFoundException: org.apache.beam.model.pipeline.v1.StandardWindowFns$SessionsPayload$Enum
Brian Hulette created BEAM-9313: --- Summary: beam_PostRelease_NightlySnapshot failure due to ClassNotFoundException: org.apache.beam.model.pipeline.v1.StandardWindowFns$SessionsPayload$Enum Key: BEAM-9313 URL: https://issues.apache.org/jira/browse/BEAM-9313 Project: Beam Issue Type: Bug Components: test-failures Reporter: Brian Hulette Assignee: Brian Hulette Jenkins: https://builds.apache.org/job/beam_PostRelease_NightlySnapshot/885/ Gradle: https://scans.gradle.com/s/wbwr4nzluxtlc :runners:google-cloud-dataflow-java:runQuickstartJavaDataflow and :runners:google-cloud-dataflow-java:runMobileGamingJavaDataflow are broken with ClassDefNotFound errors like: {code} INFO: 2020-02-08T11:05:44.038Z: Finished operation WordCount.CountWords/Count.PerElement/Combine.perKey(Count)/GroupByKey/Close Feb 08, 2020 11:05:45 AM org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process INFO: 2020-02-08T11:05:44.119Z: Executing operation WordCount.CountWords/Count.PerElement/Combine.perKey(Count)/GroupByKey/Read+WordCount.CountWords/Count.PerElement/Combine.perKey(Count)/Combine.GroupedValues+WordCount.CountWords/Count.PerElement/Combine.perKey(Count)/Combine.GroupedValues/Extract+MapElements/Map+WriteCounts/WriteFiles/RewindowIntoGlobal/Window.Assign+WriteCounts/WriteFiles/WriteUnshardedBundlesToTempFiles/WriteUnshardedBundles+WriteCounts/WriteFiles/GatherTempFileResults/View.AsList/ParDo(ToIsmRecordForGlobalWindow)+WriteCounts/WriteFiles/WriteUnshardedBundlesToTempFiles/GroupUnwritten/Reify+WriteCounts/WriteFiles/WriteUnshardedBundlesToTempFiles/GroupUnwritten/Write Feb 08, 2020 11:05:47 AM org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process SEVERE: 2020-02-08T11:05:46.096Z: java.lang.NoClassDefFoundError: org/apache/beam/model/pipeline/v1/StandardWindowFns$SessionsPayload$Enum at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.construction.WindowingStrategyTranslation.(WindowingStrategyTranslation.java:211) at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowParDoFnFactory.deserializeWindowingStrategy(GroupAlsoByWindowParDoFnFactory.java:234) at org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowParDoFnFactory.create(GroupAlsoByWindowParDoFnFactory.java:99) at org.apache.beam.runners.dataflow.worker.DefaultParDoFnFactory.create(DefaultParDoFnFactory.java:75) at org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.createParDoOperation(IntrinsicMapTaskExecutorFactory.java:264) at org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.access$000(IntrinsicMapTaskExecutorFactory.java:86) at org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$1.typedApply(IntrinsicMapTaskExecutorFactory.java:183) at org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$1.typedApply(IntrinsicMapTaskExecutorFactory.java:165) at org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:63) at org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:50) at org.apache.beam.runners.dataflow.worker.graph.Networks.replaceDirectedNetworkNodes(Networks.java:87) at org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.create(IntrinsicMapTaskExecutorFactory.java:125) at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:353) at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:306) at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140) at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120) at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.ClassNotFoundException: org.apache.beam.model.pipeline.v1.StandardWindowFns$SessionsPayload$Enum at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
[jira] [Work logged] (BEAM-8561) Add ThriftIO to Support IO for Thrift Files
[ https://issues.apache.org/jira/browse/BEAM-8561?focusedWorklogId=386979&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386979 ] ASF GitHub Bot logged work on BEAM-8561: Author: ASF GitHub Bot Created on: 14/Feb/20 00:11 Start Date: 14/Feb/20 00:11 Worklog Time Spent: 10m Work Description: chrlarsen commented on pull request #10290: [BEAM-8561] Add ThriftIO to support IO for Thrift files URL: https://github.com/apache/beam/pull/10290#discussion_r379189609 ## File path: sdks/java/io/thrift/src/main/java/org/apache/beam/sdk/io/thrift/ThriftIO.java ## @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.io.thrift; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument; +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkNotNull; + +import com.google.auto.value.AutoValue; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.io.InputStream; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Experimental; +import org.apache.beam.sdk.coders.Coder; +import org.apache.beam.sdk.io.FileIO; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.transforms.display.DisplayData; +import org.apache.beam.sdk.values.PCollection; +import org.apache.thrift.TBase; +import org.apache.thrift.TException; +import org.apache.thrift.protocol.TProtocol; +import org.apache.thrift.protocol.TProtocolFactory; +import org.apache.thrift.transport.TIOStreamTransport; +import org.apache.thrift.transport.TTransportException; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * {@link PTransform}s for reading and writing files containing Thrift encoded data. + * + * Reading Thrift Files + * + * For simple reading, use {@link ThriftIO#} with the desired file pattern to read from. + * + * For example: + * + * {@code + * PCollection examples = pipeline.apply(ThriftIO.read().from("/foo/bar/*")); + * ... + * } + * + * For more advanced use cases, like reading each file in a {@link PCollection} of {@link + * FileIO.ReadableFile}, use the {@link ReadFiles} transform. + * + * For example: + * + * {@code + * PCollection files = pipeline + * .apply(FileIO.match().filepattern(options.getInputFilepattern()) + * .apply(FileIO.readMatches()); + * + * PCollection examples = files.apply(ThriftIO.readFiles(ExampleType.class).withProtocol(thriftProto); + * } + * + * Writing Thrift Files + * + * {@link ThriftIO.Sink} allows for a {@link PCollection} of {@link TBase} to be written to + * Thrift files. It can be used with the general-purpose {@link FileIO} transforms with + * FileIO.write/writeDynamic specifically. + * + * For example: + * + * {@code + * pipeline + * .apply(...) // PCollection + * .apply(FileIO + * .write() + * .via(ThriftIO.sink(thriftProto)) + * .to("destination/path"); + * } + * + * This IO API is considered experimental and may break or receive backwards-incompatible changes + * in future versions of the Apache Beam SDK. + */ +@Experimental(Experimental.Kind.SOURCE_SINK) +public class ThriftIO { + + private static final Logger LOG = LoggerFactory.getLogger(ThriftIO.class); + + /** Disable construction of utility class. */ + private ThriftIO() {} + + /** + * Reads each file in a {@link PCollection} of {@link org.apache.beam.sdk.io.FileIO.ReadableFile}, + * which allows more flexible usage. + */ + public static ReadFiles readFiles(Class recordClass) { +return new AutoValue_ThriftIO_ReadFiles.Builder().setRecordClass(recordClass).build(); + } + + // + + /** Creates a {@link Sink} for use with {@link FileIO#write} and {@link FileIO#writeDynamic}. */ + public static > Sink si
[jira] [Work logged] (BEAM-9212) ZetaSQL structs always cause exception
[ https://issues.apache.org/jira/browse/BEAM-9212?focusedWorklogId=386978&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386978 ] ASF GitHub Bot logged work on BEAM-9212: Author: ASF GitHub Bot Created on: 14/Feb/20 00:02 Start Date: 14/Feb/20 00:02 Worklog Time Spent: 10m Work Description: ibzib commented on issue #10707: [BEAM-9212] fix zetasql struct exception URL: https://github.com/apache/beam/pull/10707#issuecomment-586031921 Rebased this. Struct parameters still unusable however bc of https://issues.apache.org/jira/browse/BEAM-9300 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386978) Time Spent: 1h (was: 50m) > ZetaSQL structs always cause exception > -- > > Key: BEAM-9212 > URL: https://issues.apache.org/jira/browse/BEAM-9212 > Project: Beam > Issue Type: Bug > Components: dsl-sql-zetasql >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: Not applicable > > Time Spent: 1h > Remaining Estimate: 0h > > Because of an unimplemented method in ZetaSQL [1], StructType.toString always > throws an exception. This is bad because we always call toString implicitly > in a String.format [2]. > [1] https://github.com/google/zetasql/issues/24 > [2] > https://github.com/apache/beam/blob/b02a325409d55f1ecb7f9fb6ecc4f60a974c810d/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/ExpressionConverter.java#L1010-L1012 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto
[ https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=386977&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386977 ] ASF GitHub Bot logged work on BEAM-9229: Author: ASF GitHub Bot Created on: 14/Feb/20 00:02 Start Date: 14/Feb/20 00:02 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10733: [BEAM-9229] Adding dependency information to Environment proto URL: https://github.com/apache/beam/pull/10733#discussion_r379186995 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1093,6 +1093,72 @@ message SideInput { FunctionSpec window_mapping_fn = 3; } +message StandardArtifacts { + enum Types { +// A URN for artifacts stored in a local directory. +// payload: ArtifactFilePayload. +FILE = 0 [(beam_urn) = "beam:artifact:file:v1"]; + +// A URN for artifacts embedded in ArtifactInformation proto. +// payload: EmbeddedFilePayload. +EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"]; + +// A URN for artifacts described by remote URLs. +// payload: a string for an artifact URL e.g. "https://.../foo.jar"; +REMOTE = 2 [(beam_urn) = "beam:artifact:remote:v1"]; + +// A URN for Python artifacts hosted on PYPI. +// payload: PypiPayload +PYPI = 3 [(beam_urn) = "beam:artifact:pypi:v1"]; + +// A URN for Java artifacts hosted on Maven central. +// payload: MavenPayload +MAVEN= 4 [(beam_urn) = "beam:artifact:maven:v1"]; + } +} + +message ArtifactFilePayload { + // A path to an artifact file on a local system. + string local_path = 1; + + // A generated staged name (no path). + string staged_name = 2; +} + +message EmbeddedFilePayload { + // raw data bytes for an embedded artifact + bytes data = 1; + + // A generated staged name (no path). + string staged_name = 2; +} + +message PypiPayload { + // Pypi compatible artifact id e.g. "apache-beam" + string pypi_artifact_id = 1; + + // Pypi compatible version string. + string pypi_version_range = 2; +} + +message MavenPayload { + // Maven compatible group id e.g. "org.apache.beam" + string maven_group_id = 1; Review comment: If there's a standard string format, let's just use that rather than breaking this up. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386977) Time Spent: 4h (was: 3h 50m) > Adding dependency information to Environment proto > -- > > Key: BEAM-9229 > URL: https://issues.apache.org/jira/browse/BEAM-9229 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > > Adding dependency information to Environment proto. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9312) Make MonotonicWatermarkEstimator configurable for whether to ignore late timestamp.
Boyuan Zhang created BEAM-9312: -- Summary: Make MonotonicWatermarkEstimator configurable for whether to ignore late timestamp. Key: BEAM-9312 URL: https://issues.apache.org/jira/browse/BEAM-9312 Project: Beam Issue Type: Improvement Components: sdk-py-core, sdk-py-harness Reporter: Boyuan Zhang Current implementation of MonotonicWatermarkEstimator throws error and stop the pipeline when there is a late timestamp. But there are more potential options like: (1) Suppress the error and emit the item as possibly late data. (2) Move the timestamp forward to respect the watermark. We should consider making MonotonicWatermarkEstimator configurable with these options, or providing different types of MonotonicWatermarkEstimator to handle different options. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK
[ https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=386975&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386975 ] ASF GitHub Bot logged work on BEAM-9291: Author: ASF GitHub Bot Created on: 14/Feb/20 00:01 Start Date: 14/Feb/20 00:01 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10829: [BEAM-9291] Upload graph option in dataflow's python sdk URL: https://github.com/apache/beam/pull/10829#issuecomment-586031714 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386975) Time Spent: 3h 40m (was: 3.5h) > upload_graph support in Dataflow Python SDK > --- > > Key: BEAM-9291 > URL: https://issues.apache.org/jira/browse/BEAM-9291 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Radosław Stankiewicz >Assignee: Radosław Stankiewicz >Priority: Minor > Time Spent: 3h 40m > Remaining Estimate: 0h > > upload_graph option is not supported in Dataflow's Python SDK so there is no > workaround for large graphs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9291) upload_graph support in Dataflow Python SDK
[ https://issues.apache.org/jira/browse/BEAM-9291?focusedWorklogId=386976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386976 ] ASF GitHub Bot logged work on BEAM-9291: Author: ASF GitHub Bot Created on: 14/Feb/20 00:01 Start Date: 14/Feb/20 00:01 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10829: [BEAM-9291] Upload graph option in dataflow's python sdk URL: https://github.com/apache/beam/pull/10829#issuecomment-586031756 Run Python2_PVR_Flink PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386976) Time Spent: 3h 50m (was: 3h 40m) > upload_graph support in Dataflow Python SDK > --- > > Key: BEAM-9291 > URL: https://issues.apache.org/jira/browse/BEAM-9291 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Radosław Stankiewicz >Assignee: Radosław Stankiewicz >Priority: Minor > Time Spent: 3h 50m > Remaining Estimate: 0h > > upload_graph option is not supported in Dataflow's Python SDK so there is no > workaround for large graphs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto
[ https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=386973&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386973 ] ASF GitHub Bot logged work on BEAM-9229: Author: ASF GitHub Bot Created on: 13/Feb/20 23:59 Start Date: 13/Feb/20 23:59 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10733: [BEAM-9229] Adding dependency information to Environment proto URL: https://github.com/apache/beam/pull/10733#discussion_r379186220 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1093,6 +1093,72 @@ message SideInput { FunctionSpec window_mapping_fn = 3; } +message StandardArtifacts { + enum Types { +// A URN for artifacts stored in a local directory. +// payload: ArtifactFilePayload. +FILE = 0 [(beam_urn) = "beam:artifact:file:v1"]; + +// A URN for artifacts embedded in ArtifactInformation proto. +// payload: EmbeddedFilePayload. +EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"]; + +// A URN for artifacts described by remote URLs. +// payload: a string for an artifact URL e.g. "https://.../foo.jar"; +REMOTE = 2 [(beam_urn) = "beam:artifact:remote:v1"]; + +// A URN for Python artifacts hosted on PYPI. +// payload: PypiPayload +PYPI = 3 [(beam_urn) = "beam:artifact:pypi:v1"]; + +// A URN for Java artifacts hosted on Maven central. +// payload: MavenPayload +MAVEN= 4 [(beam_urn) = "beam:artifact:maven:v1"]; + } +} + +message ArtifactFilePayload { + // A path to an artifact file on a local system. + string local_path = 1; + + // A generated staged name (no path). + string staged_name = 2; +} + +message EmbeddedFilePayload { + // raw data bytes for an embedded artifact + bytes data = 1; + + // A generated staged name (no path). Review comment: +1 to allowing (relative) paths to some stagingDir root. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386973) Time Spent: 3h 50m (was: 3h 40m) > Adding dependency information to Environment proto > -- > > Key: BEAM-9229 > URL: https://issues.apache.org/jira/browse/BEAM-9229 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 3h 50m > Remaining Estimate: 0h > > Adding dependency information to Environment proto. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9229) Adding dependency information to Environment proto
[ https://issues.apache.org/jira/browse/BEAM-9229?focusedWorklogId=386971&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386971 ] ASF GitHub Bot logged work on BEAM-9229: Author: ASF GitHub Bot Created on: 13/Feb/20 23:58 Start Date: 13/Feb/20 23:58 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10733: [BEAM-9229] Adding dependency information to Environment proto URL: https://github.com/apache/beam/pull/10733#discussion_r379185959 ## File path: model/pipeline/src/main/proto/beam_runner_api.proto ## @@ -1093,6 +1093,72 @@ message SideInput { FunctionSpec window_mapping_fn = 3; } +message StandardArtifacts { + enum Types { +// A URN for artifacts stored in a local directory. +// payload: ArtifactFilePayload. +FILE = 0 [(beam_urn) = "beam:artifact:file:v1"]; + +// A URN for artifacts embedded in ArtifactInformation proto. +// payload: EmbeddedFilePayload. +EMBEDDED = 1 [(beam_urn) = "beam:artifact:embedded:v1"]; + +// A URN for artifacts described by remote URLs. +// payload: a string for an artifact URL e.g. "https://.../foo.jar"; +REMOTE = 2 [(beam_urn) = "beam:artifact:remote:v1"]; Review comment: Regarding the file payload, I think we may want to stage this under a different name than its "real" name. However, I think the stage name needs to be lifted to a higher level and apply to all artifact types. One reason to keep file separate than URL is that we may want to set the space of files be those understood by (and fetched via) the beam filesystems packages (including remote, non-(public?)-url ones like gcs or aws). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386971) Time Spent: 3h 40m (was: 3.5h) > Adding dependency information to Environment proto > -- > > Key: BEAM-9229 > URL: https://issues.apache.org/jira/browse/BEAM-9229 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > Adding dependency information to Environment proto. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
[ https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=386967&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386967 ] ASF GitHub Bot logged work on BEAM-8537: Author: ASF GitHub Bot Created on: 13/Feb/20 23:55 Start Date: 13/Feb/20 23:55 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10375: [BEAM-8537] Provide WatermarkEstimator to track watermark URL: https://github.com/apache/beam/pull/10375#discussion_r379184461 ## File path: sdks/python/apache_beam/io/watermark_estimators.py ## @@ -0,0 +1,101 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""A collection of WatermarkEstimator implementations that SplittableDoFns +can use.""" + +# pytype: skip-file + +from __future__ import absolute_import + +from apache_beam.io.iobase import WatermarkEstimator +from apache_beam.utils.timestamp import Timestamp + + +class MonotonicWatermarkEstimator(WatermarkEstimator): + """A WatermarkEstimator which assumes that timestamps of all ouput records + are increasing monotonically. + """ + def __init__(self, timestamp): +"""For a new pair, the initial value is None. When +resuming processing, the initial timestamp will be the last reported +watermark. +""" +self._watermark = timestamp + + def observe_timestamp(self, timestamp): +if self._watermark is None: + self._watermark = timestamp +else: + if timestamp < self._watermark: +raise ValueError('A MonotonicWatermarkEstimator expects output ' + 'timestamp to be increasing monotonically.') + self._watermark = timestamp + + def current_watermark(self): +return self._watermark + + def get_estimator_state(self): +return self._watermark + + +class WalltimeWatermarkEstimator(WatermarkEstimator): + """A WatermarkEstimator which uses processing time as the estimated watermark. + """ + def __init__(self, timestamp=None): +if timestamp: Review comment: FWIW, This can be written `self._timestamp = timestamp or Timestamp.now()`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386967) > Provide WatermarkEstimatorProvider for different types of WatermarkEstimator > > > Key: BEAM-8537 > URL: https://issues.apache.org/jira/browse/BEAM-8537 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 15h 10m > Remaining Estimate: 0h > > This is a follow up for in-progress PR: > https://github.com/apache/beam/pull/9794. > Current implementation in PR9794 provides a default implementation of > WatermarkEstimator. For further work, we want to let WatermarkEstimator to be > a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to > create a custom WatermarkEstimator per windowed value. It should be similar > to how we track restriction for SDF: > WatermarkEstimator <---> RestrictionTracker > WatermarkEstimatorProvider <---> RestrictionTrackerProvider > WatermarkEstimatorParam <---> RestrictionDoFnParam -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
[ https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=386969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386969 ] ASF GitHub Bot logged work on BEAM-8537: Author: ASF GitHub Bot Created on: 13/Feb/20 23:55 Start Date: 13/Feb/20 23:55 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10375: [BEAM-8537] Provide WatermarkEstimator to track watermark URL: https://github.com/apache/beam/pull/10375#discussion_r379142973 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py ## @@ -486,19 +486,28 @@ def process( _ = (p | beam.Create(data) | beam.ParDo(ExpandingStringsDoFn())) def test_sdf_with_watermark_tracking(self): +class ManualWatermarkEstimatorProvider( Review comment: Hmm... should we provide these as well? (Perhaps via a static default_provider() method on the corresponding watermark estimators?) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386969) Time Spent: 15h 20m (was: 15h 10m) > Provide WatermarkEstimatorProvider for different types of WatermarkEstimator > > > Key: BEAM-8537 > URL: https://issues.apache.org/jira/browse/BEAM-8537 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 15h 20m > Remaining Estimate: 0h > > This is a follow up for in-progress PR: > https://github.com/apache/beam/pull/9794. > Current implementation in PR9794 provides a default implementation of > WatermarkEstimator. For further work, we want to let WatermarkEstimator to be > a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to > create a custom WatermarkEstimator per windowed value. It should be similar > to how we track restriction for SDF: > WatermarkEstimator <---> RestrictionTracker > WatermarkEstimatorProvider <---> RestrictionTrackerProvider > WatermarkEstimatorParam <---> RestrictionDoFnParam -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
[ https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=386968&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386968 ] ASF GitHub Bot logged work on BEAM-8537: Author: ASF GitHub Bot Created on: 13/Feb/20 23:55 Start Date: 13/Feb/20 23:55 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10375: [BEAM-8537] Provide WatermarkEstimator to track watermark URL: https://github.com/apache/beam/pull/10375#discussion_r379141998 ## File path: sdks/python/apache_beam/runners/common.py ## @@ -765,40 +802,54 @@ def _invoke_process_per_window(self, # ProcessSizedElementAndRestriction. self.threadsafe_restriction_tracker.check_done() deferred_status = self.threadsafe_restriction_tracker.deferred_status() - current_watermark = None - if self.watermark_estimator: -current_watermark = self.watermark_estimator.current_watermark() if deferred_status: deferred_restriction, deferred_timestamp = deferred_status element = windowed_value.value size = self.signature.get_restriction_provider().restriction_size( element, deferred_restriction) -residual_value = ((element, deferred_restriction), size) -return SplitResultResidual( -residual_value=windowed_value.with_value(residual_value), -current_watermark=current_watermark, -deferred_timestamp=deferred_timestamp) -return None +if self.threadsafe_watermark_estimator: + current_watermark = ( + self.threadsafe_watermark_estimator.current_watermark()) + estimator_state = ( + self.threadsafe_watermark_estimator.get_estimator_state()) + residual_value = ( + (element, (deferred_restriction, estimator_state)), size) + return SplitResultResidual( + residual_value=windowed_value.with_value(residual_value), + current_watermark=current_watermark, + deferred_timestamp=deferred_timestamp) +else: Review comment: I think this can be removed as self.threadsafe_watermark_estimator will always be something, right? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386968) Time Spent: 15h 20m (was: 15h 10m) > Provide WatermarkEstimatorProvider for different types of WatermarkEstimator > > > Key: BEAM-8537 > URL: https://issues.apache.org/jira/browse/BEAM-8537 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 15h 20m > Remaining Estimate: 0h > > This is a follow up for in-progress PR: > https://github.com/apache/beam/pull/9794. > Current implementation in PR9794 provides a default implementation of > WatermarkEstimator. For further work, we want to let WatermarkEstimator to be > a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to > create a custom WatermarkEstimator per windowed value. It should be similar > to how we track restriction for SDF: > WatermarkEstimator <---> RestrictionTracker > WatermarkEstimatorProvider <---> RestrictionTrackerProvider > WatermarkEstimatorParam <---> RestrictionDoFnParam -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
[ https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=386964&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386964 ] ASF GitHub Bot logged work on BEAM-8537: Author: ASF GitHub Bot Created on: 13/Feb/20 23:55 Start Date: 13/Feb/20 23:55 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10375: [BEAM-8537] Provide WatermarkEstimator to track watermark URL: https://github.com/apache/beam/pull/10375#discussion_r379142182 ## File path: sdks/python/apache_beam/runners/common.py ## @@ -1035,6 +1284,8 @@ def process_outputs(self, windowed_input_element, results): windowed_value.windows *= len(windowed_input_element.windows) else: windowed_value = windowed_input_element.with_value(result) + if watermark_estimator: Review comment: Fix? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386964) Time Spent: 15h (was: 14h 50m) > Provide WatermarkEstimatorProvider for different types of WatermarkEstimator > > > Key: BEAM-8537 > URL: https://issues.apache.org/jira/browse/BEAM-8537 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 15h > Remaining Estimate: 0h > > This is a follow up for in-progress PR: > https://github.com/apache/beam/pull/9794. > Current implementation in PR9794 provides a default implementation of > WatermarkEstimator. For further work, we want to let WatermarkEstimator to be > a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to > create a custom WatermarkEstimator per windowed value. It should be similar > to how we track restriction for SDF: > WatermarkEstimator <---> RestrictionTracker > WatermarkEstimatorProvider <---> RestrictionTrackerProvider > WatermarkEstimatorParam <---> RestrictionDoFnParam -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
[ https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=386963&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386963 ] ASF GitHub Bot logged work on BEAM-8537: Author: ASF GitHub Bot Created on: 13/Feb/20 23:55 Start Date: 13/Feb/20 23:55 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10375: [BEAM-8537] Provide WatermarkEstimator to track watermark URL: https://github.com/apache/beam/pull/10375#discussion_r379136355 ## File path: sdks/python/apache_beam/runners/common.py ## @@ -321,6 +333,21 @@ def is_splittable_dofn(self): # type: () -> bool return self.get_restriction_provider() is not None + def get_restriction_coder(self): +"""Get coder for a restriction when processing an SDF. + +If the SDF uses an WatermarkEstimatorProvider, the restriction coder is a +tuple coder of (restriction_coder, estimator_state_coder). Otherwise, +returns the SDFs restriction_coder. +""" +restriction_coder = None Review comment: Nit: https://engdoc.corp.google.com/eng/doc/devguide/py/totw/026.md This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386963) Time Spent: 15h (was: 14h 50m) > Provide WatermarkEstimatorProvider for different types of WatermarkEstimator > > > Key: BEAM-8537 > URL: https://issues.apache.org/jira/browse/BEAM-8537 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 15h > Remaining Estimate: 0h > > This is a follow up for in-progress PR: > https://github.com/apache/beam/pull/9794. > Current implementation in PR9794 provides a default implementation of > WatermarkEstimator. For further work, we want to let WatermarkEstimator to be > a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to > create a custom WatermarkEstimator per windowed value. It should be similar > to how we track restriction for SDF: > WatermarkEstimator <---> RestrictionTracker > WatermarkEstimatorProvider <---> RestrictionTrackerProvider > WatermarkEstimatorParam <---> RestrictionDoFnParam -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
[ https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=386966&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386966 ] ASF GitHub Bot logged work on BEAM-8537: Author: ASF GitHub Bot Created on: 13/Feb/20 23:55 Start Date: 13/Feb/20 23:55 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10375: [BEAM-8537] Provide WatermarkEstimator to track watermark URL: https://github.com/apache/beam/pull/10375#discussion_r379136495 ## File path: sdks/python/apache_beam/runners/common.py ## @@ -323,6 +330,27 @@ def is_splittable_dofn(self): # type: () -> bool return self.get_restriction_provider() is not None + def is_tracking_watermark(self): +return self.get_watermark_estimator_provider() is not None + + def get_restriction_coder(self): +"""Get coder for a restriction when processing an SDF. + +If the SDF uses an WatermarkEstimatorProvider, the restriction coder is a Review comment: This code assumes a watermark estimator is always available, if so update the docstring (and simplify the code elsewhere to not have a different branch for the no-watermark-estimator case). (Yes, this should be feasible for other SDKs as well, but each SDK can do their own thing.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386966) Time Spent: 15h 10m (was: 15h) > Provide WatermarkEstimatorProvider for different types of WatermarkEstimator > > > Key: BEAM-8537 > URL: https://issues.apache.org/jira/browse/BEAM-8537 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 15h 10m > Remaining Estimate: 0h > > This is a follow up for in-progress PR: > https://github.com/apache/beam/pull/9794. > Current implementation in PR9794 provides a default implementation of > WatermarkEstimator. For further work, we want to let WatermarkEstimator to be > a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to > create a custom WatermarkEstimator per windowed value. It should be similar > to how we track restriction for SDF: > WatermarkEstimator <---> RestrictionTracker > WatermarkEstimatorProvider <---> RestrictionTrackerProvider > WatermarkEstimatorParam <---> RestrictionDoFnParam -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9290) runner_harness_container_image experiment is not honored in python released sdks.
[ https://issues.apache.org/jira/browse/BEAM-9290?focusedWorklogId=386962&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386962 ] ASF GitHub Bot logged work on BEAM-9290: Author: ASF GitHub Bot Created on: 13/Feb/20 23:55 Start Date: 13/Feb/20 23:55 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #10827: [BEAM-9290] Support runner_harness_container_image in released python… URL: https://github.com/apache/beam/pull/10827 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386962) Time Spent: 1h 10m (was: 1h) > runner_harness_container_image experiment is not honored in python released > sdks. > - > > Key: BEAM-9290 > URL: https://issues.apache.org/jira/browse/BEAM-9290 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-py-core >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Fix For: 2.20.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > > {code:java} > --experiments=runner_harness_container_image=foo_image{code} > does not have any affect on the job. > > > cc: [~tvalentyn] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8537) Provide WatermarkEstimatorProvider for different types of WatermarkEstimator
[ https://issues.apache.org/jira/browse/BEAM-8537?focusedWorklogId=386965&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386965 ] ASF GitHub Bot logged work on BEAM-8537: Author: ASF GitHub Bot Created on: 13/Feb/20 23:55 Start Date: 13/Feb/20 23:55 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10375: [BEAM-8537] Provide WatermarkEstimator to track watermark URL: https://github.com/apache/beam/pull/10375#discussion_r379138864 ## File path: sdks/python/apache_beam/runners/common.py ## @@ -527,11 +734,11 @@ def __init__(self, signature.is_stateful_dofn()) self.user_state_context = user_state_context self.is_splittable = signature.is_splittable_dofn() -self.watermark_estimator = self.signature.get_watermark_estimator() -self.watermark_estimator_param = ( -self.signature.process_method.watermark_estimator_arg_name -if self.watermark_estimator else None) -self.threadsafe_restriction_tracker = None # type: Optional[iobase.ThreadsafeRestrictionTracker] +self.threadsafe_restriction_tracker = None +self.threadsafe_watermark_estimator = None +# The lock which guarantee synchronization for both +# ThreadsafeRestrictionTracker and ThreadsafeWatermarkEstimator. +self._synchronized_lock = threading.Lock() Review comment: Would an in-person discussion on this be helpful @lukecwik @boyuanzz ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386965) Time Spent: 15h 10m (was: 15h) > Provide WatermarkEstimatorProvider for different types of WatermarkEstimator > > > Key: BEAM-8537 > URL: https://issues.apache.org/jira/browse/BEAM-8537 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core, sdk-py-harness >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 15h 10m > Remaining Estimate: 0h > > This is a follow up for in-progress PR: > https://github.com/apache/beam/pull/9794. > Current implementation in PR9794 provides a default implementation of > WatermarkEstimator. For further work, we want to let WatermarkEstimator to be > a pure Interface. We'll provide a WatermarkEstimatorProvider to be able to > create a custom WatermarkEstimator per windowed value. It should be similar > to how we track restriction for SDF: > WatermarkEstimator <---> RestrictionTracker > WatermarkEstimatorProvider <---> RestrictionTrackerProvider > WatermarkEstimatorParam <---> RestrictionDoFnParam -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9003) test_reshuffle_preserves_timestamps (apache_beam.transforms.util_test.ReshuffleTest) does not work in Streaming VR suite on Dataflow
[ https://issues.apache.org/jira/browse/BEAM-9003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036592#comment-17036592 ] Valentyn Tymofieiev commented on BEAM-9003: --- It is concerning that you are using a worker jar for 2.19.0 snapshot(aka dev) and python sdk from 2.18.0 dev version - you should try to build both artifacts from the same version of the SDK that you are running the test, and to reflect the current state, you should be looking at current SDK code at Beam master, so you'd have beam-runners-google-cloud-dataflow-java-fn-api-worker-2.20.0-SNAPSHOT.jar and apache-beam-2.20.0.dev0.tar.gz > test_reshuffle_preserves_timestamps > (apache_beam.transforms.util_test.ReshuffleTest) does not work in Streaming > VR suite on Dataflow > > > Key: BEAM-9003 > URL: https://issues.apache.org/jira/browse/BEAM-9003 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Liu Wang >Priority: Major > > Per investigation in https://issues.apache.org/jira/browse/BEAM-8877, the > test times out and was recently added to VR test suite. > [~liumomo315], I will sickbay this test for streaming, could you please help > triage the failure? > Thank you! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9290) runner_harness_container_image experiment is not honored in python released sdks.
[ https://issues.apache.org/jira/browse/BEAM-9290?focusedWorklogId=386961&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386961 ] ASF GitHub Bot logged work on BEAM-9290: Author: ASF GitHub Bot Created on: 13/Feb/20 23:55 Start Date: 13/Feb/20 23:55 Worklog Time Spent: 10m Work Description: angoenka commented on issue #10827: [BEAM-9290] Support runner_harness_container_image in released python… URL: https://github.com/apache/beam/pull/10827#issuecomment-586029907 thanks for the review This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386961) Time Spent: 1h (was: 50m) > runner_harness_container_image experiment is not honored in python released > sdks. > - > > Key: BEAM-9290 > URL: https://issues.apache.org/jira/browse/BEAM-9290 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-py-core >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Fix For: 2.20.0 > > Time Spent: 1h > Remaining Estimate: 0h > > > {code:java} > --experiments=runner_harness_container_image=foo_image{code} > does not have any affect on the job. > > > cc: [~tvalentyn] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9003) test_reshuffle_preserves_timestamps (apache_beam.transforms.util_test.ReshuffleTest) does not work in Streaming VR suite on Dataflow
[ https://issues.apache.org/jira/browse/BEAM-9003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036590#comment-17036590 ] Liu Wang edited comment on BEAM-9003 at 2/13/20 11:48 PM: -- Run the test on VR with command: python setup.py nosetests -test-pipeline-options="runner=TestDataflowRunner --dataflow_worker_jar='./../../runners/google-cloud-dataflow-java/worker/build/libs/beam-runners-google-cloud-dataflow-java-fn-api-worker-2.19.0-SNAPSHOT.jar' --project=google.com:clouddfe --temp_location=gs://clouddfe-test/staging$USER --output=gs://world-readable-mkcq69tkcu/$USER/result.txt --sdk_location=./build/apache-beam-2.18.0.dev0.tar.gz --num_workers=1 --sleep_secs=20 --streaming " --tests=apache_beam.transforms.util_test.ReshuffleTest --attr=ValidatesRunner --nocapture The Error message shows TIMESTAMP_MAX_VALUE is one day larger than the end of the global window. --- ^Error message from worker: java.lang.IllegalStateException: TimestampCombiner moved element from 294247-01-10T04:00:54.775Z (TIMESTAMP_MAX_VALUE) to earlier time 294247-01-09T04:00:54.775Z (end of global window) for window org.apache.beam.sdk.transforms.windowing.GlobalWindow@68b7f410 org.apache.beam.runners.core.WatermarkHold.shift(WatermarkHold.java:117) org.apache.beam.runners.core.WatermarkHold.addElementHold(WatermarkHold.java:154) org.apache.beam.runners.core.WatermarkHold.addHolds(WatermarkHold.java:98) org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:605) org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349) org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:94) org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:42) org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner.invokeProcessElement(GroupAlsoByWindowFnRunner.java:115) org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner.processElement(GroupAlsoByWindowFnRunner.java:73) org.apache.beam.runners.core.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:80) org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn.processElement(GroupAlsoByWindowsParDoFn.java:134) org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:44) org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:49) org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:201) org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159) org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77) org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125) org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1324) org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:151) org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1053) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) java.lang.Thread.run(Thread.java:748) java.lang.IllegalStateException: TimestampCombiner moved element from 294247-01-10T04:00:54.775Z (TIMESTAMP_MAX_VALUE) to earlier time 294247-01-09T04:00:54.775Z (end of global window) for window org.apache.beam.sdk.transforms.windowing.GlobalWindow@68b7f410 org.apache.beam.runners.core.WatermarkHold.shift(WatermarkHold.java:117) org.apache.beam.runners.core.WatermarkHold.addElementHold(WatermarkHold.java:154) org.apache.beam.runners.core.WatermarkHold.addHolds(WatermarkHold.java:98) org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:605) org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349) org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:94) org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:42) org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner.invokeProcessElement(GroupAlsoByWindowFnRunner.java:115) org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRu
[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage
[ https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=386959&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386959 ] ASF GitHub Bot logged work on BEAM-8889: Author: ASF GitHub Bot Created on: 13/Feb/20 23:48 Start Date: 13/Feb/20 23:48 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10769: [BEAM-8889] Upgrades gcsio to 2.0.0 URL: https://github.com/apache/beam/pull/10769#issuecomment-586028106 Run CommunityMetrics PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386959) Remaining Estimate: 152h 10m (was: 152h 20m) Time Spent: 15h 50m (was: 15h 40m) > Make GcsUtil use GoogleCloudStorage > --- > > Key: BEAM-8889 > URL: https://issues.apache.org/jira/browse/BEAM-8889 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Esun Kim >Assignee: VASU NORI >Priority: Major > Labels: gcs > Original Estimate: 168h > Time Spent: 15h 50m > Remaining Estimate: 152h 10m > > [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java] > is a primary class to access Google Cloud Storage on Apache Beam. Current > implementation directly creates GoogleCloudStorageReadChannel and > GoogleCloudStorageWriteChannel by itself to read and write GCS data rather > than using > [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java] > which is an abstract class providing basic IO capability which eventually > creates channel objects. This request is about updating GcsUtil to use > GoogleCloudStorage to create read and write channel, which is expected > flexible because it can easily pick up the new change; e.g. new channel > implementation using new protocol without code change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage
[ https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=386958&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386958 ] ASF GitHub Bot logged work on BEAM-8889: Author: ASF GitHub Bot Created on: 13/Feb/20 23:47 Start Date: 13/Feb/20 23:47 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10769: [BEAM-8889] Upgrades gcsio to 2.0.0 URL: https://github.com/apache/beam/pull/10769#issuecomment-586027938 Run Dataflow ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386958) Remaining Estimate: 152h 20m (was: 152.5h) Time Spent: 15h 40m (was: 15.5h) > Make GcsUtil use GoogleCloudStorage > --- > > Key: BEAM-8889 > URL: https://issues.apache.org/jira/browse/BEAM-8889 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Esun Kim >Assignee: VASU NORI >Priority: Major > Labels: gcs > Original Estimate: 168h > Time Spent: 15h 40m > Remaining Estimate: 152h 20m > > [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java] > is a primary class to access Google Cloud Storage on Apache Beam. Current > implementation directly creates GoogleCloudStorageReadChannel and > GoogleCloudStorageWriteChannel by itself to read and write GCS data rather > than using > [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java] > which is an abstract class providing basic IO capability which eventually > creates channel objects. This request is about updating GcsUtil to use > GoogleCloudStorage to create read and write channel, which is expected > flexible because it can easily pick up the new change; e.g. new channel > implementation using new protocol without code change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9003) test_reshuffle_preserves_timestamps (apache_beam.transforms.util_test.ReshuffleTest) does not work in Streaming VR suite on Dataflow
[ https://issues.apache.org/jira/browse/BEAM-9003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17036590#comment-17036590 ] Liu Wang commented on BEAM-9003: Run the test on VR with command: ^python setup.py nosetests --test-pipeline-options="--runner=TestDataflowRunner --dataflow_worker_jar='./../../runners/google-cloud-dataflow-java/worker/build/libs/beam-runners-google-cloud-dataflow-java-fn-api-worker-2.19.0-SNAPSHOT.jar' --project=google.com:clouddfe --temp_location=gs://clouddfe-test/staging-$USER --output=gs://world-readable-mkcq69tkcu/$USER/result.txt --sdk_location=./build/apache-beam-2.18.0.dev0.tar.gz --num_workers=1 --sleep_secs=20 --streaming " --tests=apache_beam.transforms.util_test.ReshuffleTest --attr=ValidatesRunner --nocapture^ The Error message shows TIMESTAMP_MAX_VALUE is one day larger than the end of the global window. --- ^Error message from worker: java.lang.IllegalStateException: TimestampCombiner moved element from 294247-01-10T04:00:54.775Z (TIMESTAMP_MAX_VALUE) to earlier time 294247-01-09T04:00:54.775Z (end of global window) for window org.apache.beam.sdk.transforms.windowing.GlobalWindow@68b7f410 org.apache.beam.runners.core.WatermarkHold.shift(WatermarkHold.java:117) org.apache.beam.runners.core.WatermarkHold.addElementHold(WatermarkHold.java:154) org.apache.beam.runners.core.WatermarkHold.addHolds(WatermarkHold.java:98) org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:605) org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349) org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:94) org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:42) org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner.invokeProcessElement(GroupAlsoByWindowFnRunner.java:115) org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner.processElement(GroupAlsoByWindowFnRunner.java:73) org.apache.beam.runners.core.LateDataDroppingDoFnRunner.processElement(LateDataDroppingDoFnRunner.java:80) org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowsParDoFn.processElement(GroupAlsoByWindowsParDoFn.java:134) org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:44) org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:49) org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:201) org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159) org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77) org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125) org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1324) org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:151) org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1053) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) java.lang.Thread.run(Thread.java:748) java.lang.IllegalStateException: TimestampCombiner moved element from 294247-01-10T04:00:54.775Z (TIMESTAMP_MAX_VALUE) to earlier time 294247-01-09T04:00:54.775Z (end of global window) for window org.apache.beam.sdk.transforms.windowing.GlobalWindow@68b7f410 org.apache.beam.runners.core.WatermarkHold.shift(WatermarkHold.java:117) org.apache.beam.runners.core.WatermarkHold.addElementHold(WatermarkHold.java:154) org.apache.beam.runners.core.WatermarkHold.addHolds(WatermarkHold.java:98) org.apache.beam.runners.core.ReduceFnRunner.processElement(ReduceFnRunner.java:605) org.apache.beam.runners.core.ReduceFnRunner.processElements(ReduceFnRunner.java:349) org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:94) org.apache.beam.runners.dataflow.worker.StreamingGroupAlsoByWindowViaWindowSetFn.processElement(StreamingGroupAlsoByWindowViaWindowSetFn.java:42) org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner.invokeProcessElement(GroupAlsoByWindowFnRunner.java:115) org.apache.beam.runners.dataflow.worker.GroupAlsoByWindowFnRunner.processElement(GroupAlsoByWindowFnRunner
[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage
[ https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=386957&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386957 ] ASF GitHub Bot logged work on BEAM-8889: Author: ASF GitHub Bot Created on: 13/Feb/20 23:45 Start Date: 13/Feb/20 23:45 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10769: [BEAM-8889] Upgrades gcsio to 2.0.0 URL: https://github.com/apache/beam/pull/10769#issuecomment-586027249 > @lukecwik This version 2.0 doesn't introduce a new dependency against gRPC. The next version will. sgtm, will let any committer merge once tests are green This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386957) Remaining Estimate: 152.5h (was: 152h 40m) Time Spent: 15.5h (was: 15h 20m) > Make GcsUtil use GoogleCloudStorage > --- > > Key: BEAM-8889 > URL: https://issues.apache.org/jira/browse/BEAM-8889 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Esun Kim >Assignee: VASU NORI >Priority: Major > Labels: gcs > Original Estimate: 168h > Time Spent: 15.5h > Remaining Estimate: 152.5h > > [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java] > is a primary class to access Google Cloud Storage on Apache Beam. Current > implementation directly creates GoogleCloudStorageReadChannel and > GoogleCloudStorageWriteChannel by itself to read and write GCS data rather > than using > [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java] > which is an abstract class providing basic IO capability which eventually > creates channel objects. This request is about updating GcsUtil to use > GoogleCloudStorage to create read and write channel, which is expected > flexible because it can easily pick up the new change; e.g. new channel > implementation using new protocol without code change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=386956&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386956 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 13/Feb/20 23:44 Start Date: 13/Feb/20 23:44 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10856: [BEAM-8335] Update StreamingCache with new Protos URL: https://github.com/apache/beam/pull/10856#issuecomment-586027007 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386956) Time Spent: 58h (was: 57h 50m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 58h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=386954&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386954 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 13/Feb/20 23:41 Start Date: 13/Feb/20 23:41 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #10856: [BEAM-8335] Update StreamingCache with new Protos URL: https://github.com/apache/beam/pull/10856#issuecomment-586026276 R: @davidyan74 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 386954) Time Spent: 57h 50m (was: 57h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 57h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=386953&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-386953 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 13/Feb/20 23:40 Start Date: 13/Feb/20 23:40 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #10856: [BEAM-8335] Update StreamingCache with new Protos URL: https://github.com/apache/beam/pull/10856 This fixes the breakage from https://github.com/apache/beam/pull/10826. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCom