[jira] [Work logged] (BEAM-6583) Audit Python 3 version support and refine compatibility spec.
[ https://issues.apache.org/jira/browse/BEAM-6583?focusedWorklogId=198506&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198506 ]

ASF GitHub Bot logged work on BEAM-6583:
Author: ASF GitHub Bot
Created on: 14/Feb/19 05:57
Worklog Time Spent: 10m

Work Description: aaltay commented on pull request #7839: [BEAM-6583] Update minimal Python 3 requirements.
URL: https://github.com/apache/beam/pull/7839

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 198506)
Time Spent: 1h (was: 50m)

> Audit Python 3 version support and refine compatibility spec.
> -------------------------------------------------------------
>
> Key: BEAM-6583
> URL: https://issues.apache.org/jira/browse/BEAM-6583
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Affects Versions: 2.11.0
> Reporter: Charles Chen
> Assignee: Valentyn Tymofieiev
> Priority: Blocker
> Labels: triaged
> Fix For: 2.11.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Currently, we have a compatibility spec of >= 2.7 <= 3.7 for development.
> However, we have not validated this (especially Python 3.7) and need to do
> so and refine this before release.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Work logged] (BEAM-6602) Support schemas in BigQueryIO.Write
[ https://issues.apache.org/jira/browse/BEAM-6602?focusedWorklogId=198466&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198466 ]

ASF GitHub Bot logged work on BEAM-6602:
Author: ASF GitHub Bot
Created on: 14/Feb/19 04:43
Worklog Time Spent: 10m

Work Description: reuvenlax commented on pull request #7840: [BEAM-6602] BigQueryIO.write natively understands Beam schemas
URL: https://github.com/apache/beam/pull/7840

If the input PCollection has a schema, BigQueryIO can automatically infer a BigQuery table schema and automatically convert the input type into a TableRow. A new option, useBeamSchema, is introduced to enable this behavior. This PR still needs more unit tests and an integration test, but it is being sent for review now for the BigQueryIO changes.

Issue Time Tracking
-------------------
Worklog Id: (was: 198466)
Time Spent: 1h 50m (was: 1h 40m)

> Support schemas in BigQueryIO.Write
> -----------------------------------
>
> Key: BEAM-6602
> URL: https://issues.apache.org/jira/browse/BEAM-6602
> Project: Beam
> Issue Type: Sub-task
> Components: io-java-gcp
> Reporter: Reuven Lax
> Assignee: Reuven Lax
> Priority: Major
> Labels: triaged
> Time Spent: 1h 50m
> Remaining Estimate: 0h
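The worklog above describes a Java-side feature. As a language-neutral illustration of the idea behind useBeamSchema (infer a table schema from a typed record and convert instances to rows), here is a hedged Python sketch. The names (`User`, `infer_table_schema`, `to_table_row`) and the type mapping are hypothetical, not Beam's actual implementation.

```python
# Illustrative sketch only: given a record type with named, typed fields,
# derive a BigQuery-style column list and turn an instance into a row dict.
# The real feature lives in Java's BigQueryIO and Beam's schema machinery.
from typing import NamedTuple

# Hypothetical mapping from Python types to BigQuery column types.
_BQ_TYPES = {int: "INT64", float: "FLOAT64", str: "STRING", bool: "BOOL"}

class User(NamedTuple):  # stands in for a PCollection element type with a schema
    name: str
    age: int

def infer_table_schema(record_type):
    """Map each schema field to a BigQuery-style column description."""
    return [{"name": field, "type": _BQ_TYPES[typ]}
            for field, typ in record_type.__annotations__.items()]

def to_table_row(record):
    """Convert a schema'd record into a TableRow-like dict."""
    return dict(record._asdict())
```

In the actual Java API, the analogous conversion is triggered by calling useBeamSchema on a BigQueryIO.write transform whose input PCollection carries a schema.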
[jira] [Work logged] (BEAM-6583) Audit Python 3 version support and refine compatibility spec.
[ https://issues.apache.org/jira/browse/BEAM-6583?focusedWorklogId=198457&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198457 ]

ASF GitHub Bot logged work on BEAM-6583:
Author: ASF GitHub Bot
Created on: 14/Feb/19 04:18
Worklog Time Spent: 10m

Work Description: aaltay commented on issue #7839: [BEAM-6583] Update minimal Python 3 requirements.
URL: https://github.com/apache/beam/pull/7839#issuecomment-463483283

Run Python PreCommit

Issue Time Tracking
-------------------
Worklog Id: (was: 198457)
Time Spent: 50m (was: 40m)
[jira] [Work logged] (BEAM-6583) Audit Python 3 version support and refine compatibility spec.
[ https://issues.apache.org/jira/browse/BEAM-6583?focusedWorklogId=198440&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198440 ]

ASF GitHub Bot logged work on BEAM-6583:
Author: ASF GitHub Bot
Created on: 14/Feb/19 03:19
Worklog Time Spent: 10m

Work Description: tvalentyn commented on issue #7839: [BEAM-6583] Update minimal Python 3 requirements.
URL: https://github.com/apache/beam/pull/7839#issuecomment-463471804

Run Dataflow Python ValidatesRunner

Issue Time Tracking
-------------------
Worklog Id: (was: 198440)
Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (BEAM-6583) Audit Python 3 version support and refine compatibility spec.
[ https://issues.apache.org/jira/browse/BEAM-6583?focusedWorklogId=198441&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198441 ]

ASF GitHub Bot logged work on BEAM-6583:
Author: ASF GitHub Bot
Created on: 14/Feb/19 03:20
Worklog Time Spent: 10m

Work Description: tvalentyn commented on issue #7839: [BEAM-6583] Update minimal Python 3 requirements.
URL: https://github.com/apache/beam/pull/7839#issuecomment-463472007

Run Python Dataflow ValidatesRunner

Issue Time Tracking
-------------------
Worklog Id: (was: 198441)
Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (BEAM-6392) Add support for new BigQuery streaming read API to BigQueryIO
[ https://issues.apache.org/jira/browse/BEAM-6392?focusedWorklogId=198439&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198439 ]

ASF GitHub Bot logged work on BEAM-6392:
Author: ASF GitHub Bot
Created on: 14/Feb/19 03:04
Worklog Time Spent: 10m

Work Description: chamikaramj commented on pull request #7441: [BEAM-6392] Add support for the BigQuery read API to BigQueryIO.
URL: https://github.com/apache/beam/pull/7441

Issue Time Tracking
-------------------
Worklog Id: (was: 198439)
Time Spent: 4h 10m (was: 4h)

> Add support for new BigQuery streaming read API to BigQueryIO
> -------------------------------------------------------------
>
> Key: BEAM-6392
> URL: https://issues.apache.org/jira/browse/BEAM-6392
> Project: Beam
> Issue Type: New Feature
> Components: io-java-gcp
> Reporter: Kenneth Jung
> Assignee: Kenneth Jung
> Priority: Major
> Labels: triaged
> Time Spent: 4h 10m
> Remaining Estimate: 0h
>
> BigQuery has developed a new streaming egress API which will soon reach
> public availability. Add support for the new API in BigQueryIO.
[jira] [Work logged] (BEAM-6583) Audit Python 3 version support and refine compatibility spec.
[ https://issues.apache.org/jira/browse/BEAM-6583?focusedWorklogId=198428&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198428 ]

ASF GitHub Bot logged work on BEAM-6583:
Author: ASF GitHub Bot
Created on: 14/Feb/19 02:29
Worklog Time Spent: 10m

Work Description: tvalentyn commented on issue #7839: [BEAM-6583] Update minimal Python 3 requirements.
URL: https://github.com/apache/beam/pull/7839#issuecomment-463461306

R: @aaltay @charlesccychen

Issue Time Tracking
-------------------
Worklog Id: (was: 198428)
Time Spent: 20m (was: 10m)
[jira] [Commented] (BEAM-6583) Audit Python 3 version support and refine compatibility spec.
[ https://issues.apache.org/jira/browse/BEAM-6583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767819#comment-16767819 ]

Valentyn Tymofieiev commented on BEAM-6583:
-------------------------------------------

I propose:
- Make 2.11 installable on Python 3.5 or higher.
- Add the 3.5 version classifier as 'supported' in setup.py (which will be reflected on PyPI); 3.5 is the only version that has been reasonably tested on Jenkins.
- Runners can have additional restrictions based on the Py3 versions they support.

This is reflected in https://github.com/apache/beam/pull/7839.
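The proposed policy (Python 2.7, plus 3.5 and higher) can be sketched as a simple version check. The `PYTHON_REQUIRES` string and `is_supported` helper below are illustrative; the exact specifier used in PR #7839 may differ.

```python
# Sketch of the version policy proposed above: accept Python 2.7 or 3.5+.
# PYTHON_REQUIRES shows how such a policy is typically spelled in the
# python_requires field of setup.py; the string actually adopted in
# apache/beam PR #7839 may differ.
PYTHON_REQUIRES = ">=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*"

def is_supported(major, minor):
    """Return True if interpreter version (major, minor) matches the policy."""
    if major == 2:
        return minor >= 7
    if major == 3:
        return minor >= 5
    return False
```

Per the third bullet, individual runners could layer stricter checks on top of this SDK-wide floor.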
[jira] [Work logged] (BEAM-6583) Audit Python 3 version support and refine compatibility spec.
[ https://issues.apache.org/jira/browse/BEAM-6583?focusedWorklogId=198424&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198424 ]

ASF GitHub Bot logged work on BEAM-6583:
Author: ASF GitHub Bot
Created on: 14/Feb/19 01:56
Worklog Time Spent: 10m

Work Description: tvalentyn commented on pull request #7839: [BEAM-6583] Update minimal Python 3 requirements.
URL: https://github.com/apache/beam/pull/7839

Post-Commit Tests Status (on master branch): [the pull request description contained only the standard matrix of Jenkins build-status badges for the Go, Java, and Python SDKs across the Apex, Dataflow, Flink, Gearpump, Samza, and Spark runners; badge links omitted.]

Issue Time Tracking
-------------------
Worklog Id: (was: 198424)
Time Spent: 10m
Remaining Estimate: 0h
[jira] [Commented] (BEAM-5846) inconsistent result from pylint
[ https://issues.apache.org/jira/browse/BEAM-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767787#comment-16767787 ]

Heejong Lee commented on BEAM-5846:
-----------------------------------

The workaround is to always use `./gradlew lint` instead of `run_pylint.sh`. There seems to be nothing to do on the Beam side. Closing.

> inconsistent result from pylint
> -------------------------------
>
> Key: BEAM-5846
> URL: https://issues.apache.org/jira/browse/BEAM-5846
> Project: Beam
> Issue Type: Bug
> Components: testing
> Reporter: Heejong Lee
> Assignee: Heejong Lee
> Priority: Major
> Labels: triaged
> Attachments: lint_100.txt, lint_j1_100.txt
>
> scripts/run_pylint.sh returns inconsistent results.
> {code:java}
> beam/sdks/python$ for i in `seq 100`; do scripts/run_pylint.sh; done > lint_100.txt
> beam/sdks/python$ grep slots-on-old-class lint_100.txt | wc -l
> 100
> beam/sdks/python$ grep no-self-argument lint_100.txt | wc -l
> 42
> {code}
>
> Tested @ 0c8ccae9aa608f4d64b22c08d57b9aaa8724bfee
> {code:java}
> beam/sdks/python$ pip freeze
> apache-beam==2.9.0.dev0
> avro==1.8.2
> cachetools==2.1.0
> certifi==2018.8.24
> chardet==3.0.4
> crcmod==1.7
> dill==0.2.8.2
> docopt==0.6.2
> enum34==1.1.6
> fastavro==0.21.9
> fasteners==0.14.1
> funcsigs==1.0.2
> future==0.16.0
> futures==3.2.0
> gapic-google-cloud-pubsub-v1==0.15.4
> google-apitools==0.5.20
> google-auth==1.5.1
> google-auth-httplib2==0.0.3
> google-cloud-bigquery==0.25.0
> google-cloud-core==0.25.0
> google-cloud-pubsub==0.26.0
> google-gax==0.15.16
> googleapis-common-protos==1.5.3
> googledatastore==7.0.1
> grpc-google-iam-v1==0.11.4
> grpcio==1.15.0
> hdfs==2.1.0
> httplib2==0.11.3
> idna==2.7
> mock==2.0.0
> monotonic==1.5
> nose==1.3.7
> numpy==1.15.2
> oauth2client==4.1.3
> pbr==4.3.0
> ply==3.8
> proto-google-cloud-datastore-v1==0.90.4
> proto-google-cloud-pubsub-v1==0.15.4
> protobuf==3.6.1
> pyarrow==0.11.0
> pyasn1==0.4.4
> pyasn1-modules==0.2.2
> pydot==1.2.4
> PyHamcrest==1.9.0
> pyparsing==2.2.2
> pytz==2018.4
> PyVCF==0.6.8
> PyYAML==3.13
> requests==2.19.1
> rsa==4.0
> six==1.11.0
> typing==3.6.6
> urllib3==1.23
> beam/sdks/python$ python --version
> Python 2.7.13
> {code}
[jira] [Updated] (BEAM-6658) Add kms_key to BigQuery transforms, pass to Dataflow
[ https://issues.apache.org/jira/browse/BEAM-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Udi Meiri updated BEAM-6658:
----------------------------
Fix Version/s: (was: Not applicable)
               2.11.0

> Add kms_key to BigQuery transforms, pass to Dataflow
> ----------------------------------------------------
>
> Key: BEAM-6658
> URL: https://issues.apache.org/jira/browse/BEAM-6658
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Affects Versions: 2.11.0
> Reporter: Udi Meiri
> Assignee: Udi Meiri
> Priority: Blocker
> Fix For: 2.11.0
>
> Time Spent: 50m
> Remaining Estimate: 0h
[jira] [Resolved] (BEAM-6488) Portable Flink runner support for running cross-language transforms
[ https://issues.apache.org/jira/browse/BEAM-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Heejong Lee resolved BEAM-6488.
-------------------------------
Resolution: Fixed
Fix Version/s: 2.11.0

Merged.

> Portable Flink runner support for running cross-language transforms
> -------------------------------------------------------------------
>
> Key: BEAM-6488
> URL: https://issues.apache.org/jira/browse/BEAM-6488
> Project: Beam
> Issue Type: New Feature
> Components: beam-model, runner-core, runner-flink, sdk-java-core, sdk-py-core
> Reporter: Chamikara Jayalath
> Assignee: Heejong Lee
> Priority: Major
> Labels: triaged
> Fix For: 2.11.0
>
> Time Spent: 4.5h
> Remaining Estimate: 0h
>
> To support running cross-language transforms, the Portable Flink runner needs
> to support executing pipelines whose steps are defined to execute in
> different environments.
> I believe this support is already there. If that is the case, we should
> validate it and add any missing tests.
> If there are missing pieces, we should figure out the details and create more
> JIRAs as needed.
> CC: [~angoenka]
[jira] [Commented] (BEAM-6636) [beam_PostCommit_Java] testGcsWriteWithKmsKey test fails
[ https://issues.apache.org/jira/browse/BEAM-6636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767784#comment-16767784 ]

Udi Meiri commented on BEAM-6636:
---------------------------------

I believe this is fixed. Java post-commits are currently green.

> [beam_PostCommit_Java] testGcsWriteWithKmsKey test fails
> --------------------------------------------------------
>
> Key: BEAM-6636
> URL: https://issues.apache.org/jira/browse/BEAM-6636
> Project: Beam
> Issue Type: New Feature
> Components: test-failures
> Reporter: Mikhail Gryzykhin
> Assignee: Udi Meiri
> Priority: Major
> Labels: currently-failing
> Time Spent: 40m
> Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PostCommit_Java/2563/]
> Test failed; no detailed logs available.
> Relevant logs:
> | org.apache.beam.sdk.io.gcp.storage.GcsKmsKeyIT > testGcsWriteWithKmsKey > FAILED |
> | java.lang.IllegalArgumentException at GcsKmsKeyIT.java:73 |
[jira] [Closed] (BEAM-5846) inconsistent result from pylint
[ https://issues.apache.org/jira/browse/BEAM-5846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Heejong Lee closed BEAM-5846.
-----------------------------
Resolution: Won't Fix
Fix Version/s: Not applicable
[jira] [Assigned] (BEAM-5173) org.apache.beam.runners.fnexecution.control.RemoteExecutionTest.testExecutionWithMultipleStages is flaky
[ https://issues.apache.org/jira/browse/BEAM-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brian Hulette reassigned BEAM-5173:
-----------------------------------
Assignee: Brian Hulette

> org.apache.beam.runners.fnexecution.control.RemoteExecutionTest.testExecutionWithMultipleStages is flaky
> --------------------------------------------------------------------------------------------------------
>
> Key: BEAM-5173
> URL: https://issues.apache.org/jira/browse/BEAM-5173
> Project: Beam
> Issue Type: Bug
> Components: runner-core
> Reporter: Valentyn Tymofieiev
> Assignee: Brian Hulette
> Priority: Major
>
> Hi [~lcwik], this test failed in a [recent postcommit build|https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1285/testReport/junit/org.apache.beam.runners.fnexecution.control/RemoteExecutionTest/testExecutionWithMultipleStages].
> Could you please take a look or help triage to the right owner? Thank you.
> Stack trace:
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.beam.vendor.grpc.v1.io.grpc.StatusRuntimeException: CANCELLED: Runner closed connection
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at org.apache.beam.runners.fnexecution.control.RemoteExecutionTest.tearDown(RemoteExecutionTest.java:198)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
>   at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106)
>   at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>   at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>   at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66)
>   at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>   at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
>   at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:109)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.re
[jira] [Commented] (BEAM-6636) [beam_PostCommit_Java] testGcsWriteWithKmsKey test fails
[ https://issues.apache.org/jira/browse/BEAM-6636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767785#comment-16767785 ]

Udi Meiri commented on BEAM-6636:
---------------------------------

I believe this is fixed. Java post-commits are currently green.
[jira] [Resolved] (BEAM-6636) [beam_PostCommit_Java] testGcsWriteWithKmsKey test fails
[ https://issues.apache.org/jira/browse/BEAM-6636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Udi Meiri resolved BEAM-6636.
-----------------------------
Resolution: Fixed
Fix Version/s: Not applicable
[jira] [Assigned] (BEAM-2801) Implement a BigQuery custom sink
[ https://issues.apache.org/jira/browse/BEAM-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Udi Meiri reassigned BEAM-2801:
-------------------------------
Assignee: Pablo Estrada (was: Udi Meiri)

> Implement a BigQuery custom sink
> --------------------------------
>
> Key: BEAM-2801
> URL: https://issues.apache.org/jira/browse/BEAM-2801
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Chamikara Jayalath
> Assignee: Pablo Estrada
> Priority: Major
> Labels: triaged
>
> Currently, the Python SDK has a native (Dataflow) BigQuery sink. We need to
> implement a custom BigQuery sink to support the following:
> * Overcome BigQuery per-load-job quotas by executing multiple load jobs.
> * Support SDK-level features such as data-dependent writes.
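The first bullet above, splitting one logical write into multiple load jobs so no single job exceeds a per-job quota, reduces to simple batching. The sketch below is illustrative only: the quota constant and function name are hypothetical, not Beam's actual code, and the real per-job limit should be taken from BigQuery's quota documentation.

```python
# Illustrative sketch: group staged files into batches, one batch per
# BigQuery load job, so each job stays under a per-job file-count quota.
# MAX_FILES_PER_LOAD_JOB is an example value, not the real BigQuery limit.
MAX_FILES_PER_LOAD_JOB = 10000

def partition_for_load_jobs(files, max_files=MAX_FILES_PER_LOAD_JOB):
    """Split the file list into consecutive batches of at most max_files."""
    return [files[i:i + max_files] for i in range(0, len(files), max_files)]
```

A custom sink would then issue one load job per batch and wait for all of them, rather than failing when a single job hits the quota.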
[jira] [Updated] (BEAM-5959) Add Cloud KMS support to GCS copies
[ https://issues.apache.org/jira/browse/BEAM-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Udi Meiri updated BEAM-5959:
----------------------------
Summary: Add Cloud KMS support to GCS copies (was: Add Cloud KMS support to GCS creates and copies)

> Add Cloud KMS support to GCS copies
> -----------------------------------
>
> Key: BEAM-5959
> URL: https://issues.apache.org/jira/browse/BEAM-5959
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp, sdk-py-core
> Reporter: Udi Meiri
> Assignee: Udi Meiri
> Priority: Major
> Labels: triaged
> Time Spent: 30h 40m
> Remaining Estimate: 0h
>
> The Beam SDK currently uses the CopyTo GCS API call, which doesn't support
> copying objects that use Customer-Managed Encryption Keys (CMEK).
> CMEKs are managed in Cloud KMS.
> Items (for the Java and Python SDKs):
> - Update clients to versions that support KMS keys.
> - Change copyTo API calls to use rewriteTo (Python: directly; Java: possibly
>   convert the copyTo API call to use the client library).
> - Add unit tests.
> - Add basic tests (DirectRunner and GCS buckets with CMEK).
[jira] [Resolved] (BEAM-5959) Add Cloud KMS support to GCS copies
[ https://issues.apache.org/jira/browse/BEAM-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri resolved BEAM-5959. - Resolution: Fixed Fix Version/s: 2.11.0 > Add Cloud KMS support to GCS copies > --- > > Key: BEAM-5959 > URL: https://issues.apache.org/jira/browse/BEAM-5959 > Project: Beam > Issue Type: Bug > Components: io-java-gcp, sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Labels: triaged > Fix For: 2.11.0 > > Time Spent: 30h 40m > Remaining Estimate: 0h > > Beam SDK currently uses the CopyTo GCS API call, which doesn't support > copying objects that use Customer Managed Encryption Keys (CMEKs). > CMEKs are managed in Cloud KMS. > Items (for Java and Python SDKs): > - Update clients to versions that support KMS keys. > - Change copyTo API calls to use rewriteTo (Python - directly, Java - > possibly convert copyTo API call to use client library) > - Add unit tests. > - Add basic tests (DirectRunner and GCS buckets with CMEK). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5959) Add Cloud KMS support to GCS creates and copies
[ https://issues.apache.org/jira/browse/BEAM-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1676#comment-1676 ] Udi Meiri commented on BEAM-5959: - Copy support is done (implemented as a rewrite call). Create support has been delayed. It might be added later as a --gcsKmsKey flag or similar. KMS can still be used on GCS via bucket default keys: objects created in a bucket will inherit the bucket's default KMS key. > Add Cloud KMS support to GCS creates and copies > --- > > Key: BEAM-5959 > URL: https://issues.apache.org/jira/browse/BEAM-5959 > Project: Beam > Issue Type: Bug > Components: io-java-gcp, sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Labels: triaged > Time Spent: 30h 40m > Remaining Estimate: 0h > > Beam SDK currently uses the CopyTo GCS API call, which doesn't support > copying objects that use Customer Managed Encryption Keys (CMEKs). > CMEKs are managed in Cloud KMS. > Items (for Java and Python SDKs): > - Update clients to versions that support KMS keys. > - Change copyTo API calls to use rewriteTo (Python - directly, Java - > possibly convert copyTo API call to use client library) > - Add unit tests. > - Add basic tests (DirectRunner and GCS buckets with CMEK). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
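The "Change copyTo API calls to use rewriteTo" item above hinges on one detail: a single rewrite call may not finish, returning a rewrite token instead (KMS-encrypted copies are one such case), so callers must loop until the service reports completion. The sketch below shows only that loop shape against a fake client; the real request signatures in the GCS client libraries differ.

```python
# Hedged sketch of the rewrite loop. FakeStorageClient is a stand-in that
# simulates a rewrite needing two round trips; it is not a real GCS client.
class FakeRewriteResponse(object):
    def __init__(self, done, rewrite_token=None, resource=None):
        self.done = done
        self.rewrite_token = rewrite_token
        self.resource = resource

class FakeStorageClient(object):
    """Simulates a rewrite that needs two round trips to complete."""
    def __init__(self):
        self.calls = 0

    def rewrite(self, src, dst, rewrite_token=None):
        self.calls += 1
        if rewrite_token is None:
            # First call: not done yet, hand back a continuation token.
            return FakeRewriteResponse(done=False, rewrite_token='token-1')
        return FakeRewriteResponse(done=True, resource=dst)

def rewrite_until_done(client, src, dst):
    """Drive the rewrite loop until the service reports completion."""
    token = None
    while True:
        resp = client.rewrite(src, dst, rewrite_token=token)
        if resp.done:
            return resp.resource
        token = resp.rewrite_token

client = FakeStorageClient()
result = rewrite_until_done(client, 'gs://src/obj', 'gs://dst/obj')
```

A plain copyTo call has no such continuation semantics, which is why the issue calls for switching to rewriteTo rather than patching the copy path.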
[jira] [Commented] (BEAM-6380) apache_beam.examples.wordcount_it_test.WordCountIT with DirectRunner failed
[ https://issues.apache.org/jira/browse/BEAM-6380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767773#comment-16767773 ] Udi Meiri commented on BEAM-6380: - Reference doc: https://googleapis.github.io/google-resumable-media-python/latest/google.resumable_media.requests.html#resumable-uploads > apache_beam.examples.wordcount_it_test.WordCountIT with DirectRunner failed > --- > > Key: BEAM-6380 > URL: https://issues.apache.org/jira/browse/BEAM-6380 > Project: Beam > Issue Type: Test > Components: test-failures >Reporter: Boyuan Zhang >Assignee: Udi Meiri >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > wordcount test in :pythonPostCommit failed owing to RuntimeError: > NotImplementedError [while running 'write/Write/WriteImpl/WriteBundles'] > > https://builds.apache.org/job/beam_PostCommit_Python_Verify/7001/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6668) use add experiment methods (Java and Python)
[ https://issues.apache.org/jira/browse/BEAM-6668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri updated BEAM-6668: Labels: beginner easyfix newbie (was: ) > use add experiment methods (Java and Python) > > > Key: BEAM-6668 > URL: https://issues.apache.org/jira/browse/BEAM-6668 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp, sdk-py-core >Reporter: Udi Meiri >Priority: Minor > Labels: beginner, easyfix, newbie > > Python: > Convert instances of experiments.append(...) > to debug_options.add_experiment(...) > Java: > Use ExperimentalOptions.addExperiment(...) > instead of getExperiments(), modify, setExperiments() pattern. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6669) revert service_default_cmek_config experiment flag
Udi Meiri created BEAM-6669: --- Summary: revert service_default_cmek_config experiment flag Key: BEAM-6669 URL: https://issues.apache.org/jira/browse/BEAM-6669 Project: Beam Issue Type: Improvement Components: io-java-gcp, sdk-py-core Reporter: Udi Meiri Assignee: Udi Meiri Do this when --dataflowKmsKey is supported on Dataflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6668) use add experiment methods (Java and Python)
Udi Meiri created BEAM-6668: --- Summary: use add experiment methods (Java and Python) Key: BEAM-6668 URL: https://issues.apache.org/jira/browse/BEAM-6668 Project: Beam Issue Type: Improvement Components: io-java-gcp, sdk-py-core Reporter: Udi Meiri Python: Convert instances of experiments.append(...) to debug_options.add_experiment(...) Java: Use ExperimentalOptions.addExperiment(...) instead of getExperiments(), modify, setExperiments() pattern. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
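The refactor described above can be sketched as follows. This is a hedged illustration of the `add_experiment(...)` pattern, not Beam's actual implementation; the `DebugOptions` class and its `experiments` field here are stand-ins modeled on the issue text.

```python
# Illustrative sketch of the add_experiment(...) method the issue asks for.
class DebugOptions(object):
    def __init__(self):
        self.experiments = None  # assumed to be unset until first use

    def add_experiment(self, experiment):
        # Unlike raw experiments.append(...), this lazily creates the list
        # and skips duplicates, so callers need no None/membership checks.
        if self.experiments is None:
            self.experiments = []
        if experiment not in self.experiments:
            self.experiments.append(experiment)

opts = DebugOptions()
opts.add_experiment('beam_fn_api')
opts.add_experiment('beam_fn_api')  # duplicate call is a no-op
```

The Java side described in the issue (`ExperimentalOptions.addExperiment(...)` replacing the get/modify/set pattern) centralizes the same None-handling and de-duplication concerns.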
[jira] [Assigned] (BEAM-6583) Audit Python 3 version support and refine compatibility spec.
[ https://issues.apache.org/jira/browse/BEAM-6583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Altay reassigned BEAM-6583: - Assignee: Valentyn Tymofieiev (was: Ahmet Altay) > Audit Python 3 version support and refine compatibility spec. > - > > Key: BEAM-6583 > URL: https://issues.apache.org/jira/browse/BEAM-6583 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.11.0 >Reporter: Charles Chen >Assignee: Valentyn Tymofieiev >Priority: Blocker > Labels: triaged > Fix For: 2.11.0 > > > Currently, we have a compatibility spec of >= 2.7 <= 3.7 for development. > However, we have not validated this (especially Python 3.7) and need to do so > and refine this before release. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6583) Audit Python 3 version support and refine compatibility spec.
[ https://issues.apache.org/jira/browse/BEAM-6583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Altay updated BEAM-6583: -- Fix Version/s: 2.11.0 > Audit Python 3 version support and refine compatibility spec. > - > > Key: BEAM-6583 > URL: https://issues.apache.org/jira/browse/BEAM-6583 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.11.0 >Reporter: Charles Chen >Assignee: Ahmet Altay >Priority: Blocker > Labels: triaged > Fix For: 2.11.0 > > > Currently, we have a compatibility spec of >= 2.7 <= 3.7 for development. > However, we have not validated this (especially Python 3.7) and need to do so > and refine this before release. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6583) Audit Python 3 version support and refine compatibility spec.
[ https://issues.apache.org/jira/browse/BEAM-6583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767767#comment-16767767 ] Ahmet Altay commented on BEAM-6583: --- What needs to be done for 2.11 ? > Audit Python 3 version support and refine compatibility spec. > - > > Key: BEAM-6583 > URL: https://issues.apache.org/jira/browse/BEAM-6583 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.11.0 >Reporter: Charles Chen >Assignee: Valentyn Tymofieiev >Priority: Blocker > Labels: triaged > Fix For: 2.11.0 > > > Currently, we have a compatibility spec of >= 2.7 <= 3.7 for development. > However, we have not validated this (especially Python 3.7) and need to do so > and refine this before release. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-6664) Temporarily convert dataflowKmsKey flag to experimental flag for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Altay resolved BEAM-6664. --- Resolution: Fixed Fix Version/s: 2.11.0 > Temporarily convert dataflowKmsKey flag to experimental flag for Dataflow > - > > Key: BEAM-6664 > URL: https://issues.apache.org/jira/browse/BEAM-6664 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp, sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Fix For: 2.11.0 > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5173) org.apache.beam.runners.fnexecution.control.RemoteExecutionTest.testExecutionWithMultipleStages is flaky
[ https://issues.apache.org/jira/browse/BEAM-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767764#comment-16767764 ] Brian Hulette commented on BEAM-5173: - Ok I'll try that out locally > org.apache.beam.runners.fnexecution.control.RemoteExecutionTest.testExecutionWithMultipleStages > is flaky > > > Key: BEAM-5173 > URL: https://issues.apache.org/jira/browse/BEAM-5173 > Project: Beam > Issue Type: Bug > Components: runner-core >Reporter: Valentyn Tymofieiev >Priority: Major > > Hi [~lcwik], this test failed in a [recent postcommit > build|https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1285/testReport/junit/org.apache.beam.runners.fnexecution.control/RemoteExecutionTest/testExecutionWithMultipleStages]. > Could you please take a look or help triage to the right owner? Thank you. > Stack trace: > java.util.concurrent.ExecutionException: java.lang.RuntimeException: > java.util.concurrent.ExecutionException: > org.apache.beam.vendor.grpc.v1.io.grpc.StatusRuntimeException: CANCELLED: > Runner closed connection > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.beam.runners.fnexecution.control.RemoteExecutionTest.tearDown(RemoteExecutionTest.java:198) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33) > at 
org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) > at > org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66) > at > org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > at > org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32) > at > 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93) > at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) > at > org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:109) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >
[jira] [Created] (BEAM-6667) portableWordCount test doesn't properly cleanup python docker instance
Heejong Lee created BEAM-6667: - Summary: portableWordCount test doesn't properly cleanup python docker instance Key: BEAM-6667 URL: https://issues.apache.org/jira/browse/BEAM-6667 Project: Beam Issue Type: Bug Components: java-fn-execution, runner-flink Reporter: Heejong Lee Assignee: Heejong Lee The portableWordCount test cleans up the Flink job server docker instance but not the Python FnHarness docker instance. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6666) subprocess.Popen hangs after use of gRPC channel
[ https://issues.apache.org/jira/browse/BEAM-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767757#comment-16767757 ] Heejong Lee commented on BEAM-6666: --- Multiple related issues have been reported to gRPC since 2017, but the problem does not appear to be fundamentally fixed. * [https://github.com/grpc/grpc/issues/17986] * [https://github.com/grpc/grpc/issues/13873] * [https://github.com/grpc/grpc/issues/13998] * [https://github.com/grpc/grpc/issues/15334] * [https://github.com/grpc/grpc/issues/15557] > subprocess.Popen hangs after use of gRPC channel > > > Key: BEAM-6666 > URL: https://issues.apache.org/jira/browse/BEAM-6666 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > > subprocess.Popen randomly hangs after use of a gRPC channel. This makes > the cross-language wordcount test fail because the test uses Popen to launch > a Dockerized Flink job server in `PortableRunner.run_pipeline` after using a > gRPC channel for the expansion service in `ExternalTransform.expand`. A few > symptoms are listed below: > * Hanging at `docker_path = check_output(['which', 'docker']).strip()` > * Hanging at `self.docker_process = Popen(cmd)` > * Crashing with an `assertion failed: pthread_mutex_lock(mu) == 0` message > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6666) subprocess.Popen hangs after use of gRPC channel
Heejong Lee created BEAM-6666: - Summary: subprocess.Popen hangs after use of gRPC channel Key: BEAM-6666 URL: https://issues.apache.org/jira/browse/BEAM-6666 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Heejong Lee Assignee: Heejong Lee subprocess.Popen randomly hangs after use of a gRPC channel. This makes the cross-language wordcount test fail because the test uses Popen to launch a Dockerized Flink job server in `PortableRunner.run_pipeline` after using a gRPC channel for the expansion service in `ExternalTransform.expand`. A few symptoms are listed below: * Hanging at `docker_path = check_output(['which', 'docker']).strip()` * Hanging at `self.docker_process = Popen(cmd)` * Crashing with an `assertion failed: pthread_mutex_lock(mu) == 0` message -- This message was sent by Atlassian JIRA (v7.6.3#76005)
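One commonly suggested mitigation for this class of fork-after-gRPC hang — hedged here as a general pattern, not a fix the issue adopted — is to perform all subprocess launches before any gRPC channel is created, so that fork() never runs while gRPC's background threads hold internal locks:

```python
# Sketch of the ordering workaround: fork helper processes first, open gRPC
# channels only afterwards. The helper command below is a placeholder.
import subprocess

def launch_helper(cmd):
    """Start a helper process; call this before any gRPC channel exists."""
    return subprocess.Popen(cmd, stdout=subprocess.PIPE)

# 1) Fork helpers (e.g. a Dockerized job server) first.
proc = launch_helper(['echo', 'job-server-ready'])
out, _ = proc.communicate()
# 2) Only then create channels, e.g. grpc.insecure_channel(...).
```

gRPC also exposes fork-handling controls (for example, the `GRPC_ENABLE_FORK_SUPPORT` environment variable in the Python bindings), though whether those help depends on the gRPC version in use.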
[jira] [Work logged] (BEAM-6664) Temporarily convert dataflowKmsKey flag to experimental flag for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-6664?focusedWorklogId=198418&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198418 ] ASF GitHub Bot logged work on BEAM-6664: Author: ASF GitHub Bot Created on: 14/Feb/19 00:55 Start Date: 14/Feb/19 00:55 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #7835: [BEAM-6664] Temporarily convert dataflowKmsKey flag to experimental URL: https://github.com/apache/beam/pull/7835 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198418) Time Spent: 1h (was: 50m) > Temporarily convert dataflowKmsKey flag to experimental flag for Dataflow > - > > Key: BEAM-6664 > URL: https://issues.apache.org/jira/browse/BEAM-6664 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp, sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5173) org.apache.beam.runners.fnexecution.control.RemoteExecutionTest.testExecutionWithMultipleStages is flaky
[ https://issues.apache.org/jira/browse/BEAM-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767752#comment-16767752 ] Valentyn Tymofieiev commented on BEAM-5173: --- I recommend running this test 1000 times to see if it is still failing. > org.apache.beam.runners.fnexecution.control.RemoteExecutionTest.testExecutionWithMultipleStages > is flaky > > > Key: BEAM-5173 > URL: https://issues.apache.org/jira/browse/BEAM-5173 > Project: Beam > Issue Type: Bug > Components: runner-core >Reporter: Valentyn Tymofieiev >Priority: Major > > Hi [~lcwik], this test failed in a [recent postcommit > build|https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1285/testReport/junit/org.apache.beam.runners.fnexecution.control/RemoteExecutionTest/testExecutionWithMultipleStages]. > Could you please take a look or help triage to the right owner? Thank you. > Stack trace: > java.util.concurrent.ExecutionException: java.lang.RuntimeException: > java.util.concurrent.ExecutionException: > org.apache.beam.vendor.grpc.v1.io.grpc.StatusRuntimeException: CANCELLED: > Runner closed connection > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.beam.runners.fnexecution.control.RemoteExecutionTest.tearDown(RemoteExecutionTest.java:198) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33) > at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) > at > org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66) > at > org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > at > 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32) > at > org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93) > at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) > at > org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:109) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:
[jira] [Work logged] (BEAM-6553) A BigQuery sink that is SDK-implemented and supports file loads in Python
[ https://issues.apache.org/jira/browse/BEAM-6553?focusedWorklogId=198417&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198417 ] ASF GitHub Bot logged work on BEAM-6553: Author: ASF GitHub Bot Created on: 14/Feb/19 00:51 Start Date: 14/Feb/19 00:51 Worklog Time Spent: 10m Work Description: pabloem commented on issue #7655: [BEAM-6553] A Python SDK sink that supports File Loads into BQ URL: https://github.com/apache/beam/pull/7655#issuecomment-463438577 Run Python PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198417) Time Spent: 9.5h (was: 9h 20m) > A BigQuery sink that is SDK-implemented and supports file loads in Python > - > > Key: BEAM-6553 > URL: https://issues.apache.org/jira/browse/BEAM-6553 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Labels: triaged > Time Spent: 9.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6380) apache_beam.examples.wordcount_it_test.WordCountIT with DirectRunner failed
[ https://issues.apache.org/jira/browse/BEAM-6380?focusedWorklogId=198414&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198414 ] ASF GitHub Bot logged work on BEAM-6380: Author: ASF GitHub Bot Created on: 14/Feb/19 00:47 Start Date: 14/Feb/19 00:47 Worklog Time Spent: 10m Work Description: udim commented on issue #7837: [BEAM-6380] Add debugging output to PipeStream URL: https://github.com/apache/beam/pull/7837#issuecomment-463437472 R: @charlesccychen This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198414) Time Spent: 20m (was: 10m) > apache_beam.examples.wordcount_it_test.WordCountIT with DirectRunner failed > --- > > Key: BEAM-6380 > URL: https://issues.apache.org/jira/browse/BEAM-6380 > Project: Beam > Issue Type: Test > Components: test-failures >Reporter: Boyuan Zhang >Assignee: Udi Meiri >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > wordcount test in :pythonPostCommit failed owing to RuntimeError: > NotImplementedError [while running 'write/Write/WriteImpl/WriteBundles'] > > https://builds.apache.org/job/beam_PostCommit_Python_Verify/7001/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6380) apache_beam.examples.wordcount_it_test.WordCountIT with DirectRunner failed
[ https://issues.apache.org/jira/browse/BEAM-6380?focusedWorklogId=198415&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198415 ] ASF GitHub Bot logged work on BEAM-6380: Author: ASF GitHub Bot Created on: 14/Feb/19 00:48 Start Date: 14/Feb/19 00:48 Worklog Time Spent: 10m Work Description: charlesccychen commented on issue #7837: [BEAM-6380] Add debugging output to PipeStream URL: https://github.com/apache/beam/pull/7837#issuecomment-463437793 Thanks, this LGTM. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198415) Time Spent: 0.5h (was: 20m) > apache_beam.examples.wordcount_it_test.WordCountIT with DirectRunner failed > --- > > Key: BEAM-6380 > URL: https://issues.apache.org/jira/browse/BEAM-6380 > Project: Beam > Issue Type: Test > Components: test-failures >Reporter: Boyuan Zhang >Assignee: Udi Meiri >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > wordcount test in :pythonPostCommit failed owing to RuntimeError: > NotImplementedError [while running 'write/Write/WriteImpl/WriteBundles'] > > https://builds.apache.org/job/beam_PostCommit_Python_Verify/7001/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6664) Temporarily convert dataflowKmsKey flag to experimental flag for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-6664?focusedWorklogId=198411&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198411 ] ASF GitHub Bot logged work on BEAM-6664: Author: ASF GitHub Bot Created on: 14/Feb/19 00:13 Start Date: 14/Feb/19 00:13 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #7835: [BEAM-6664] Temporarily convert dataflowKmsKey flag to experimental URL: https://github.com/apache/beam/pull/7835#discussion_r256644550 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -590,6 +590,12 @@ def _add_argparse_args(cls, parser): 'enabled with this flag. Please sync with the owners of the runner ' 'before enabling any experiments.')) + def add_experiment(self, experiment): Review comment: This is great. Please file a JIRA to convert existing uses of `self.debug_options.experiments.append(...)` across the codebase to use this. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198411) Time Spent: 50m (was: 40m) > Temporarily convert dataflowKmsKey flag to experimental flag for Dataflow > - > > Key: BEAM-6664 > URL: https://issues.apache.org/jira/browse/BEAM-6664 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp, sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5173) org.apache.beam.runners.fnexecution.control.RemoteExecutionTest.testExecutionWithMultipleStages is flaky
[ https://issues.apache.org/jira/browse/BEAM-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767726#comment-16767726 ] Brian Hulette commented on BEAM-5173: - I think this may already be resolved. The last time it failed in Java_PreCommit_Cron was #623 on Nov 23, 2018: https://builds.apache.org/job/beam_PreCommit_Java_Cron/623/ Cron Job History https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/testReport/junit/org.apache.beam.runners.fnexecution.control/RemoteExecutionTest/testExecutionWithMultipleStages/history/ [~tvalentyn] would you be ok closing, or do you think it may just be infrequent? > org.apache.beam.runners.fnexecution.control.RemoteExecutionTest.testExecutionWithMultipleStages > is flaky > > > Key: BEAM-5173 > URL: https://issues.apache.org/jira/browse/BEAM-5173 > Project: Beam > Issue Type: Bug > Components: runner-core >Reporter: Valentyn Tymofieiev >Priority: Major > > Hi [~lcwik], this test failed in a [recent postcommit > build|https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1285/testReport/junit/org.apache.beam.runners.fnexecution.control/RemoteExecutionTest/testExecutionWithMultipleStages]. > Could you please take a look or help triage to the right owner? Thank you. 
> Stack trace: > java.util.concurrent.ExecutionException: java.lang.RuntimeException: > java.util.concurrent.ExecutionException: > org.apache.beam.vendor.grpc.v1.io.grpc.StatusRuntimeException: CANCELLED: > Runner closed connection > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.beam.runners.fnexecution.control.RemoteExecutionTest.tearDown(RemoteExecutionTest.java:198) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33) > at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) > at > org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66) > at > org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > at > org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32) > at > org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93) >
[jira] [Work logged] (BEAM-6380) apache_beam.examples.wordcount_it_test.WordCountIT with DirectRunner failed
[ https://issues.apache.org/jira/browse/BEAM-6380?focusedWorklogId=198408&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198408 ] ASF GitHub Bot logged work on BEAM-6380: Author: ASF GitHub Bot Created on: 14/Feb/19 00:04 Start Date: 14/Feb/19 00:04 Worklog Time Spent: 10m Work Description: udim commented on pull request #7837: [BEAM-6380] Add some debugging output. URL: https://github.com/apache/beam/pull/7837 IIUC, offset should be None (which would be a bug in apitools). Otherwise, I'd like to verify that `position - last_position <= chunkSize` Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). 
Post-Commit Tests Status (on master branch)

Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
--- | --- | --- | --- | --- | --- | --- | ---
Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | --- | --- | --- | ---
Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)
Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/) | --- | --- | ---

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Wo
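The comment above proposes verifying that `offset` is `None` and that the transfer never advances by more than one chunk, i.e. `position - last_position <= chunkSize`. A minimal sketch of such a debug check; all names (`offset`, `position`, `last_position`, `chunk_size`) are illustrative, not actual apitools attributes:

```python
def check_transfer_progress(offset, position, last_position, chunk_size):
    # Per the comment above, a non-None offset would indicate an apitools bug.
    if offset is not None:
        raise AssertionError('expected offset to be None, got %r' % (offset,))
    # A well-behaved download advances by at most one chunk between
    # reported positions.
    if position - last_position > chunk_size:
        raise AssertionError(
            'advanced %d bytes in one step, more than the chunk size %d'
            % (position - last_position, chunk_size))

# Normal case: advanced half a chunk since the last reported position.
check_transfer_progress(None, 1024, 512, 1024)
```

Such a check would only produce debugging output (or fail loudly) when the suspected apitools invariant is violated, which is what the linked PR's extra logging aims to pin down.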
[jira] [Assigned] (BEAM-6380) apache_beam.examples.wordcount_it_test.WordCountIT with DirectRunner failed
[ https://issues.apache.org/jira/browse/BEAM-6380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri reassigned BEAM-6380: --- Assignee: Udi Meiri > apache_beam.examples.wordcount_it_test.WordCountIT with DirectRunner failed > --- > > Key: BEAM-6380 > URL: https://issues.apache.org/jira/browse/BEAM-6380 > Project: Beam > Issue Type: Test > Components: test-failures >Reporter: Boyuan Zhang >Assignee: Udi Meiri >Priority: Major > > wordcount test in :pythonPostCommit failed owing to RuntimeError: > NotImplementedError [while running 'write/Write/WriteImpl/WriteBundles'] > > https://builds.apache.org/job/beam_PostCommit_Python_Verify/7001/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6553) A BigQuery sink that is SDK-implemented and supports file loads in Python
[ https://issues.apache.org/jira/browse/BEAM-6553?focusedWorklogId=198396&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198396 ] ASF GitHub Bot logged work on BEAM-6553: Author: ASF GitHub Bot Created on: 13/Feb/19 23:32 Start Date: 13/Feb/19 23:32 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #7655: [BEAM-6553] A Python SDK sink that supports File Loads into BQ URL: https://github.com/apache/beam/pull/7655#discussion_r256635014 ## File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py ## @@ -0,0 +1,589 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +""" +Functionality to perform file loads into BigQuery for Batch and Streaming +pipelines. + +This source is able to work around BigQuery load quotas and limitations. When +destinations are dynamic, or when data for a single job is too large, the data +will be split into multiple jobs. + +NOTHING IN THIS FILE HAS BACKWARDS COMPATIBILITY GUARANTEES. 
+""" + +from __future__ import absolute_import + +import datetime +import hashlib +import logging +import random +import time +import uuid + +from future.utils import iteritems + +import apache_beam as beam +from apache_beam import pvalue +from apache_beam.io import filesystems as fs +from apache_beam.io.gcp import bigquery_tools +from apache_beam.io.gcp.internal.clients import bigquery as bigquery_api +from apache_beam.options import value_provider as vp +from apache_beam.transforms.combiners import Count + +ONE_TERABYTE = (1 << 40) + +# The maximum file size for imports is 5TB. We keep our files under that. +_DEFAULT_MAX_FILE_SIZE = 4 * ONE_TERABYTE + +_DEFAULT_MAX_WRITERS_PER_BUNDLE = 20 + +# The maximum size for a single load job is one terabyte +_MAXIMUM_LOAD_SIZE = 15 * ONE_TERABYTE + +# Big query only supports up to 10 thousand URIs for a single load job. +_MAXIMUM_SOURCE_URIS = 10*1000 + + +def _generate_load_job_name(): + datetime_component = datetime.datetime.now().strftime("%Y_%m_%d_%H%M%S") + # TODO(pabloem): include job id / pipeline component? + return 'beam_load_%s_%s' % (datetime_component, random.randint(0, 100)) + + +def _generate_file_prefix(pipeline_gcs_location): + # If a gcs location is provided to the pipeline, then we shall use that. + # Otherwise, we shall use the temp_location from pipeline options. 
+ gcs_base = str(pipeline_gcs_location or + vp.RuntimeValueProvider.get_value('temp_location', str, '')) + prefix_uuid = _bq_uuid() + return fs.FileSystems.join(gcs_base, 'bq_load', prefix_uuid) + + +def _make_new_file_writer(file_prefix, destination): + if isinstance(destination, bigquery_api.TableReference): +destination = '%s:%s.%s' % ( +destination.projectId, destination.datasetId, destination.tableId) + + directory = fs.FileSystems.join(file_prefix, destination) + + if not fs.FileSystems.exists(directory): +fs.FileSystems.mkdirs(directory) + + file_name = str(uuid.uuid4()) + file_path = fs.FileSystems.join(file_prefix, destination, file_name) + + return file_path, fs.FileSystems.create(file_path, 'application/text') + + +def _bq_uuid(seed=None): + if not seed: +return str(uuid.uuid4()).replace("-", "") + else: +return str(hashlib.md5(seed).hexdigest()) + + +class _AppendDestinationsFn(beam.DoFn): + """Adds the destination to an element, making it a KV pair. + + Outputs a PCollection of KV-pairs where the key is a TableReference for the + destination, and the value is the record itself. + + Experimental; no backwards compatibility guarantees. + """ + + def __init__(self, destination): +if callable(destination): + self.destination = destination +else: + self.destination = lambda x: destination + + def process(self, element): +yield (self.destination(element), element) + + +class _ShardDestinations(beam.DoFn): + """Adds a shard number to the key of the KV element. + + Experimental; no backwards compatibility guarantees.""" + DEFAULT_SHARDING_FACTOR = 10 + + def __init__(self, sharding_factor=DEFAULT_SHARDING_FACTOR): +self.sharding_factor = sharding_factor + +
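The `_AppendDestinationsFn` excerpt above keys each record by its destination, accepting either a fixed destination or a callable that computes one per record. A standalone sketch of that pairing logic, without the Beam `DoFn` machinery (the table names are made up):

```python
def append_destination(destination, elements):
    # Mirror of _AppendDestinationsFn.__init__: normalize a static
    # destination into a per-element callable.
    dest_fn = destination if callable(destination) else (lambda x: destination)
    for element in elements:
        # Mirror of process(): emit (destination, record) KV pairs.
        yield (dest_fn(element), element)

rows = [{'user': 'a', 'region': 'us'}, {'user': 'b', 'region': 'eu'}]

# Static destination: every record is keyed by the same table.
static = list(append_destination('project:dataset.events', rows))

# Dynamic destination: route each record by one of its own fields.
dynamic = list(append_destination(
    lambda r: 'project:dataset.events_%s' % r['region'], rows))
```

Keying by destination is what lets the downstream grouping write each dynamic destination's records into its own set of load files.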
[jira] [Work logged] (BEAM-6553) A BigQuery sink that is SDK-implemented and supports file loads in Python
[ https://issues.apache.org/jira/browse/BEAM-6553?focusedWorklogId=198400&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198400 ] ASF GitHub Bot logged work on BEAM-6553: Author: ASF GitHub Bot Created on: 13/Feb/19 23:32 Start Date: 13/Feb/19 23:32 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #7655: [BEAM-6553] A Python SDK sink that supports File Loads into BQ URL: https://github.com/apache/beam/pull/7655#discussion_r256635339 ## File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py ## @@ -0,0 +1,533 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +""" +Functionality to perform file loads into BigQuery for Batch and Streaming +pipelines. + +NOTHING IN THIS FILE HAS BACKWARDS COMPATIBILITY GUARANTEES. 
+""" + +from __future__ import absolute_import + +import datetime +import logging +import random +import time +import uuid + +from future.utils import iteritems + +import apache_beam as beam +from apache_beam import pvalue +from apache_beam.io import filesystems as fs +from apache_beam.io.gcp import bigquery_tools +from apache_beam.io.gcp.internal.clients import bigquery as bigquery_api +from apache_beam.options import value_provider as vp +from apache_beam.transforms.combiners import Count + +ONE_TERABYTE = (1 << 40) + +# The maximum file size for imports is 5TB. We keep our files under that. +_DEFAULT_MAX_FILE_SIZE = 4 * ONE_TERABYTE + +_DEFAULT_MAX_WRITERS_PER_BUNDLE = 20 + +# The maximum size for a single load job is one terabyte +_MAXIMUM_LOAD_SIZE = 15 * ONE_TERABYTE + + +def _generate_load_job_name(): + datetime_component = datetime.datetime.now().strftime("%Y_%m_%d_%H%M%S") + # TODO(pabloem): include job id / pipeline component? + return 'beam_load_%s_%s' % (datetime_component, random.randint(0, 100)) + + +def _generate_file_prefix(pipeline_gcs_location): + # If a gcs location is provided to the pipeline, then we shall use that. + # Otherwise, we shall use the temp_location from pipeline options. 
+ gcs_base = str(pipeline_gcs_location or + vp.RuntimeValueProvider.get_value('temp_location', str, '')) + return fs.FileSystems.join(gcs_base, 'bq_load') + + +def _make_new_file_writer(file_prefix, destination): + if isinstance(destination, bigquery_api.TableReference): +destination = '%s:%s.%s' % ( +destination.projectId, destination.datasetId, destination.tableId) + + directory = fs.FileSystems.join(file_prefix, destination) + + if not fs.FileSystems.exists(directory): +fs.FileSystems.mkdirs(directory) + + file_name = str(uuid.uuid4()) + file_path = fs.FileSystems.join(file_prefix, destination, file_name) + + return file_path, fs.FileSystems.create(file_path, 'application/text') + + +def _bq_uuid(): + return str(uuid.uuid4()).replace("-", "") + + +class _AppendDestinationsFn(beam.DoFn): + """Adds the destination to an element, making it a KV pair. + + Outputs a PCollection of KV-pairs where the key is a TableReference for the + destination, and the value is the record itself. + + Experimental; no backwards compatibility guarantees. + """ + + def __init__(self, destination): +if callable(destination): + self.destination = destination +else: + self.destination = lambda x: destination + + def process(self, element): +yield (self.destination(element), element) + + +class _ShardDestinations(beam.DoFn): + """Adds a shard number to the key of the KV element. + + Experimental; no backwards compatibility guarantees.""" + DEFAULT_SHARDING_FACTOR = 10 + + def __init__(self, sharding_factor=DEFAULT_SHARDING_FACTOR): +self.sharding_factor = sharding_factor + + def start_bundle(self): +self._shard_count = random.randrange(self.sharding_factor) + + def process(self, element): +destination = element[0] +row = element[1] + +sharded_destination = (destination, + self._shard_count % self.sharding_factor) +self._shard_count += 1 +yield (sharded_destination, row) + + +class WriteRecordsToFile(beam.DoFn): + """Write input records to files before trigger
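The `_ShardDestinations` DoFn shown above spreads a hot destination's records across `sharding_factor` sub-keys by cycling a counter seeded at a random offset per bundle. The rotation can be sketched standalone like this (Beam plumbing omitted; the table name is illustrative):

```python
import random

def shard_destinations(elements, sharding_factor=10):
    # Mirror of start_bundle(): start at a random shard so concurrent
    # bundles do not all begin writing to shard 0.
    shard_count = random.randrange(sharding_factor)
    for destination, row in elements:
        # Mirror of process(): extend the key with a rotating shard number.
        yield ((destination, shard_count % sharding_factor), row)
        shard_count += 1

pairs = [('project:dataset.table', i) for i in range(25)]
sharded = list(shard_destinations(pairs, sharding_factor=10))
# 25 consecutive counter values modulo 10 touch every shard at least once.
shards_used = {key[1] for key, _ in sharded}
```

The effect is that a subsequent group-by-key distributes one destination's writes over up to `sharding_factor` workers instead of funneling them through one.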
[jira] [Work logged] (BEAM-6553) A BigQuery sink that is SDK-implemented and supports file loads in Python
[ https://issues.apache.org/jira/browse/BEAM-6553?focusedWorklogId=198397&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198397 ] ASF GitHub Bot logged work on BEAM-6553: Author: ASF GitHub Bot Created on: 13/Feb/19 23:32 Start Date: 13/Feb/19 23:32 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #7655: [BEAM-6553] A Python SDK sink that supports File Loads into BQ URL: https://github.com/apache/beam/pull/7655#discussion_r256634728 ## File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py ## @@ -0,0 +1,589 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +""" +Functionality to perform file loads into BigQuery for Batch and Streaming +pipelines. + +This source is able to work around BigQuery load quotas and limitations. When +destinations are dynamic, or when data for a single job is too large, the data +will be split into multiple jobs. + +NOTHING IN THIS FILE HAS BACKWARDS COMPATIBILITY GUARANTEES. 
+""" + +from __future__ import absolute_import + +import datetime +import hashlib +import logging +import random +import time +import uuid + +from future.utils import iteritems + +import apache_beam as beam +from apache_beam import pvalue +from apache_beam.io import filesystems as fs +from apache_beam.io.gcp import bigquery_tools +from apache_beam.io.gcp.internal.clients import bigquery as bigquery_api +from apache_beam.options import value_provider as vp +from apache_beam.transforms.combiners import Count + +ONE_TERABYTE = (1 << 40) + +# The maximum file size for imports is 5TB. We keep our files under that. +_DEFAULT_MAX_FILE_SIZE = 4 * ONE_TERABYTE + +_DEFAULT_MAX_WRITERS_PER_BUNDLE = 20 + +# The maximum size for a single load job is one terabyte +_MAXIMUM_LOAD_SIZE = 15 * ONE_TERABYTE + +# Big query only supports up to 10 thousand URIs for a single load job. +_MAXIMUM_SOURCE_URIS = 10*1000 + + +def _generate_load_job_name(): + datetime_component = datetime.datetime.now().strftime("%Y_%m_%d_%H%M%S") + # TODO(pabloem): include job id / pipeline component? + return 'beam_load_%s_%s' % (datetime_component, random.randint(0, 100)) + + +def _generate_file_prefix(pipeline_gcs_location): + # If a gcs location is provided to the pipeline, then we shall use that. Review comment: Hm? GCS is what it says : ) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198397) Time Spent: 9h (was: 8h 50m) > A BigQuery sink that is SDK-implemented and supports file loads in Python > - > > Key: BEAM-6553 > URL: https://issues.apache.org/jira/browse/BEAM-6553 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Labels: triaged > Time Spent: 9h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6553) A BigQuery sink that is SDK-implemented and supports file loads in Python
[ https://issues.apache.org/jira/browse/BEAM-6553?focusedWorklogId=198398&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198398 ] ASF GitHub Bot logged work on BEAM-6553: Author: ASF GitHub Bot Created on: 13/Feb/19 23:32 Start Date: 13/Feb/19 23:32 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #7655: [BEAM-6553] A Python SDK sink that supports File Loads into BQ URL: https://github.com/apache/beam/pull/7655#discussion_r256634913 ## File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py ## @@ -0,0 +1,589 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +""" +Functionality to perform file loads into BigQuery for Batch and Streaming +pipelines. + +This source is able to work around BigQuery load quotas and limitations. When +destinations are dynamic, or when data for a single job is too large, the data +will be split into multiple jobs. + +NOTHING IN THIS FILE HAS BACKWARDS COMPATIBILITY GUARANTEES. 
+""" + +from __future__ import absolute_import + +import datetime +import hashlib +import logging +import random +import time +import uuid + +from future.utils import iteritems + +import apache_beam as beam +from apache_beam import pvalue +from apache_beam.io import filesystems as fs +from apache_beam.io.gcp import bigquery_tools +from apache_beam.io.gcp.internal.clients import bigquery as bigquery_api +from apache_beam.options import value_provider as vp +from apache_beam.transforms.combiners import Count + +ONE_TERABYTE = (1 << 40) + +# The maximum file size for imports is 5TB. We keep our files under that. +_DEFAULT_MAX_FILE_SIZE = 4 * ONE_TERABYTE + +_DEFAULT_MAX_WRITERS_PER_BUNDLE = 20 + +# The maximum size for a single load job is one terabyte +_MAXIMUM_LOAD_SIZE = 15 * ONE_TERABYTE + +# Big query only supports up to 10 thousand URIs for a single load job. +_MAXIMUM_SOURCE_URIS = 10*1000 + + +def _generate_load_job_name(): + datetime_component = datetime.datetime.now().strftime("%Y_%m_%d_%H%M%S") + # TODO(pabloem): include job id / pipeline component? + return 'beam_load_%s_%s' % (datetime_component, random.randint(0, 100)) + + +def _generate_file_prefix(pipeline_gcs_location): + # If a gcs location is provided to the pipeline, then we shall use that. + # Otherwise, we shall use the temp_location from pipeline options. 
+ gcs_base = str(pipeline_gcs_location or + vp.RuntimeValueProvider.get_value('temp_location', str, '')) + prefix_uuid = _bq_uuid() + return fs.FileSystems.join(gcs_base, 'bq_load', prefix_uuid) + + +def _make_new_file_writer(file_prefix, destination): + if isinstance(destination, bigquery_api.TableReference): +destination = '%s:%s.%s' % ( +destination.projectId, destination.datasetId, destination.tableId) + + directory = fs.FileSystems.join(file_prefix, destination) + + if not fs.FileSystems.exists(directory): +fs.FileSystems.mkdirs(directory) + + file_name = str(uuid.uuid4()) + file_path = fs.FileSystems.join(file_prefix, destination, file_name) + + return file_path, fs.FileSystems.create(file_path, 'application/text') + + +def _bq_uuid(seed=None): + if not seed: +return str(uuid.uuid4()).replace("-", "") + else: +return str(hashlib.md5(seed).hexdigest()) + + +class _AppendDestinationsFn(beam.DoFn): + """Adds the destination to an element, making it a KV pair. + + Outputs a PCollection of KV-pairs where the key is a TableReference for the + destination, and the value is the record itself. + + Experimental; no backwards compatibility guarantees. + """ + + def __init__(self, destination): +if callable(destination): + self.destination = destination +else: + self.destination = lambda x: destination + + def process(self, element): +yield (self.destination(element), element) + + +class _ShardDestinations(beam.DoFn): + """Adds a shard number to the key of the KV element. + + Experimental; no backwards compatibility guarantees.""" + DEFAULT_SHARDING_FACTOR = 10 + + def __init__(self, sharding_factor=DEFAULT_SHARDING_FACTOR): +self.sharding_factor = sharding_factor + +
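The `_make_new_file_writer` helper above derives one directory per destination under the load prefix and writes each bundle to a fresh UUID-named file. The path construction can be sketched with `posixpath` standing in for Beam's `FileSystems` API (the bucket name and prefix are made up):

```python
import posixpath
import uuid

def make_file_path(file_prefix, destination):
    # Stand-in for the TableReference flattening in the real code: a
    # (project, dataset, table) triple becomes 'project:dataset.table'.
    if isinstance(destination, tuple):
        destination = '%s:%s.%s' % destination
    # One directory per destination, one UUID-named file per writer.
    return posixpath.join(file_prefix, destination, str(uuid.uuid4()))

path = make_file_path('gs://bucket/bq_load/abc123', ('proj', 'ds', 'tbl'))
```

Random file names mean concurrent writers for the same destination never collide, and the per-destination directory makes it cheap to enumerate a destination's files when issuing its load job.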
[jira] [Work logged] (BEAM-6553) A BigQuery sink that is SDK-implemented and supports file loads in Python
[ https://issues.apache.org/jira/browse/BEAM-6553?focusedWorklogId=198399&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198399 ] ASF GitHub Bot logged work on BEAM-6553: Author: ASF GitHub Bot Created on: 13/Feb/19 23:32 Start Date: 13/Feb/19 23:32 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #7655: [BEAM-6553] A Python SDK sink that supports File Loads into BQ URL: https://github.com/apache/beam/pull/7655#discussion_r256634572 ## File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads.py ## @@ -0,0 +1,533 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +""" +Functionality to perform file loads into BigQuery for Batch and Streaming +pipelines. + +NOTHING IN THIS FILE HAS BACKWARDS COMPATIBILITY GUARANTEES. 
+""" + +from __future__ import absolute_import + +import datetime +import logging +import random +import time +import uuid + +from future.utils import iteritems + +import apache_beam as beam +from apache_beam import pvalue +from apache_beam.io import filesystems as fs +from apache_beam.io.gcp import bigquery_tools +from apache_beam.io.gcp.internal.clients import bigquery as bigquery_api +from apache_beam.options import value_provider as vp +from apache_beam.transforms.combiners import Count + +ONE_TERABYTE = (1 << 40) + +# The maximum file size for imports is 5TB. We keep our files under that. +_DEFAULT_MAX_FILE_SIZE = 4 * ONE_TERABYTE Review comment: I've handled this now. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198399) Time Spent: 9h 20m (was: 9h 10m) > A BigQuery sink that is SDK-implemented and supports file loads in Python > - > > Key: BEAM-6553 > URL: https://issues.apache.org/jira/browse/BEAM-6553 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Pablo Estrada >Assignee: Pablo Estrada >Priority: Major > Labels: triaged > Time Spent: 9h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6664) Temporarily convert dataflowKmsKey flag to experimental flag for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-6664?focusedWorklogId=198390&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198390 ] ASF GitHub Bot logged work on BEAM-6664: Author: ASF GitHub Bot Created on: 13/Feb/19 23:05 Start Date: 13/Feb/19 23:05 Worklog Time Spent: 10m Work Description: udim commented on issue #7835: [BEAM-6664] Temporarily convert dataflowKmsKey flag to experimental URL: https://github.com/apache/beam/pull/7835#issuecomment-463412346 run python postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198390) Time Spent: 20m (was: 10m) > Temporarily convert dataflowKmsKey flag to experimental flag for Dataflow > - > > Key: BEAM-6664 > URL: https://issues.apache.org/jira/browse/BEAM-6664 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp, sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6664) Temporarily convert dataflowKmsKey flag to experimental flag for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-6664?focusedWorklogId=198391&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198391 ] ASF GitHub Bot logged work on BEAM-6664: Author: ASF GitHub Bot Created on: 13/Feb/19 23:05 Start Date: 13/Feb/19 23:05 Worklog Time Spent: 10m Work Description: udim commented on issue #7835: [BEAM-6664] Temporarily convert dataflowKmsKey flag to experimental URL: https://github.com/apache/beam/pull/7835#issuecomment-463412375 run java postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198391) Time Spent: 0.5h (was: 20m) > Temporarily convert dataflowKmsKey flag to experimental flag for Dataflow > - > > Key: BEAM-6664 > URL: https://issues.apache.org/jira/browse/BEAM-6664 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp, sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6664) Temporarily convert dataflowKmsKey flag to experimental flag for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-6664?focusedWorklogId=198392&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198392 ] ASF GitHub Bot logged work on BEAM-6664: Author: ASF GitHub Bot Created on: 13/Feb/19 23:06 Start Date: 13/Feb/19 23:06 Worklog Time Spent: 10m Work Description: udim commented on issue #7835: [BEAM-6664] Temporarily convert dataflowKmsKey flag to experimental URL: https://github.com/apache/beam/pull/7835#issuecomment-463412595 R: @aaltay This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198392) Time Spent: 40m (was: 0.5h) > Temporarily convert dataflowKmsKey flag to experimental flag for Dataflow > - > > Key: BEAM-6664 > URL: https://issues.apache.org/jira/browse/BEAM-6664 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp, sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-5816) Flink runner starts new bundles while disposing operator
[ https://issues.apache.org/jira/browse/BEAM-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Weise resolved BEAM-5816. Resolution: Fixed Fix Version/s: 2.11.0 > Flink runner starts new bundles while disposing operator > - > > Key: BEAM-5816 > URL: https://issues.apache.org/jira/browse/BEAM-5816 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Micah Wylde >Assignee: Maximilian Michels >Priority: Major > Labels: portability-flink, triaged > Fix For: 2.11.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > We sometimes see exceptions when shutting down portable flink pipelines > (either due to cancellation or failure): > {code} > 2018-10-19 15:54:52,905 ERROR > org.apache.flink.streaming.runtime.tasks.StreamTask - Error during > disposal of stream operator. > java.lang.RuntimeException: Failed to finish remote bundle > at > org.apache.beam.runners.flink.translation.wrappers.streaming.ExecutableStageDoFnOperator$SdkHarnessDoFnRunner.finishBundle(ExecutableStageDoFnOperator.java:241) > at > org.apache.beam.runners.flink.metrics.DoFnRunnerWithMetricsUpdate.finishBundle(DoFnRunnerWithMetricsUpdate.java:87) > at > org.apache.beam.runners.core.SimplePushbackSideInputDoFnRunner.finishBundle(SimplePushbackSideInputDoFnRunner.java:118) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator.invokeFinishBundle(DoFnOperator.java:674) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator.dispose(DoFnOperator.java:391) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.ExecutableStageDoFnOperator.dispose(ExecutableStageDoFnOperator.java:166) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:473) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:374) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703) > at java.lang.Thread.run(Thread.java:748) > Caused by: 
java.lang.IllegalStateException: Already closed. > at > org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:95) > at > org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:251) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.ExecutableStageDoFnOperator$SdkHarnessDoFnRunner.finishBundle(ExecutableStageDoFnOperator.java:238) > ... 9 more > Suppressed: java.lang.IllegalStateException: Processing bundle failed, > TODO: [BEAM-3962] abort bundle. > at > org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:266) > ... 10 more > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
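The "Already closed" failure quoted above occurs because DoFnOperator.dispose() invokes finishBundle on a bundle that has already been finished, and the second close() raises. The fix referenced by this issue ("Finish Flink bundles exactly once") amounts to making bundle finishing idempotent. A minimal sketch of that close-once guard pattern, in Python for illustration only (the real fix is in the Java Flink runner; BundleCloser and finish_fn are hypothetical names):

```python
class BundleCloser:
    """Close-once guard: finish() flushes on the first call and is a safe
    no-op afterwards, instead of raising on a second close."""

    def __init__(self, finish_fn):
        self._finish_fn = finish_fn
        self._closed = False

    def finish(self):
        if self._closed:
            # Already finished (e.g. a later dispose() call): do nothing.
            return False
        self._closed = True
        self._finish_fn()
        return True


calls = []
bundle = BundleCloser(lambda: calls.append("flushed"))
bundle.finish()  # flushes the bundle
bundle.finish()  # second call is now a no-op rather than an error
```

With this pattern, tearing down an operator that already finished its bundle no longer throws.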
[jira] [Work logged] (BEAM-5816) Flink runner starts new bundles while disposing operator
[ https://issues.apache.org/jira/browse/BEAM-5816?focusedWorklogId=198381&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198381 ] ASF GitHub Bot logged work on BEAM-5816: Author: ASF GitHub Bot Created on: 13/Feb/19 22:48 Start Date: 13/Feb/19 22:48 Worklog Time Spent: 10m Work Description: tweise commented on pull request #7719: [BEAM-5816] Finish Flink bundles exactly once URL: https://github.com/apache/beam/pull/7719 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198381) Time Spent: 2h 50m (was: 2h 40m) > Flink runner starts new bundles while disposing operator > - > > Key: BEAM-5816 > URL: https://issues.apache.org/jira/browse/BEAM-5816 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Micah Wylde >Assignee: Maximilian Michels >Priority: Major > Labels: portability-flink, triaged > Time Spent: 2h 50m > Remaining Estimate: 0h > > We sometimes see exceptions when shutting down portable flink pipelines > (either due to cancellation or failure): > {code} > 2018-10-19 15:54:52,905 ERROR > org.apache.flink.streaming.runtime.tasks.StreamTask - Error during > disposal of stream operator. 
> java.lang.RuntimeException: Failed to finish remote bundle (full stack trace identical to the one quoted in the preceding resolution notice) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3306) Consider: Go coder registry
[ https://issues.apache.org/jira/browse/BEAM-3306?focusedWorklogId=198380&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198380 ] ASF GitHub Bot logged work on BEAM-3306: Author: ASF GitHub Bot Created on: 13/Feb/19 22:46 Start Date: 13/Feb/19 22:46 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #7834: [BEAM-3306] Encapsulate coder details. URL: https://github.com/apache/beam/pull/7834 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198380) Time Spent: 2h 50m (was: 2h 40m) > Consider: Go coder registry > --- > > Key: BEAM-3306 > URL: https://issues.apache.org/jira/browse/BEAM-3306 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Henning Rohde >Assignee: Robert Burke >Priority: Minor > Labels: triaged > Time Spent: 2h 50m > Remaining Estimate: 0h > > Add coder registry to allow easier overwrite of default coders. We may also > allow otherwise un-encodable types, but that would require that function > analysis depends on it. > If we're hardcoding support for proto/avro, then there may be little need for > such a feature. Conversely, this may be how we implement such support. > > Proposal Doc: > [https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit#|https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
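The registry this issue proposes can be pictured as a type-indexed table of encode/decode pairs with a fallback default coder. A minimal Python sketch of the idea (CoderRegistry, register, and coder_for are illustrative names, not the Beam Go SDK's actual API):

```python
import json
import pickle


class CoderRegistry:
    """Type-indexed table of (encode, decode) pairs, falling back to a
    generic default coder (pickle here) when no override is registered."""

    def __init__(self):
        self._coders = {}

    def register(self, typ, encode, decode):
        # Override the default coder for values of type `typ`.
        self._coders[typ] = (encode, decode)

    def coder_for(self, typ):
        # Unregistered types get the default coder.
        return self._coders.get(typ, (pickle.dumps, pickle.loads))


reg = CoderRegistry()
# Override the default coder for dicts with a JSON coder.
reg.register(dict, lambda v: json.dumps(v).encode(), lambda b: json.loads(b))
encode, decode = reg.coder_for(dict)
round_tripped = decode(encode({"a": 1}))  # JSON coder used instead of pickle
```

The open question in the issue maps onto the fallback: hardcoded proto/avro support could simply be pre-registered entries in such a table.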
[jira] [Work logged] (BEAM-3306) Consider: Go coder registry
[ https://issues.apache.org/jira/browse/BEAM-3306?focusedWorklogId=198368&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198368 ] ASF GitHub Bot logged work on BEAM-3306: Author: ASF GitHub Bot Created on: 13/Feb/19 22:26 Start Date: 13/Feb/19 22:26 Worklog Time Spent: 10m Work Description: lostluck commented on issue #7834: [BEAM-3306] Encapsulate coder details. URL: https://github.com/apache/beam/pull/7834#issuecomment-463400846 R: @aaltay Review Please! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198368) Time Spent: 2h 40m (was: 2.5h) > Consider: Go coder registry > --- > > Key: BEAM-3306 > URL: https://issues.apache.org/jira/browse/BEAM-3306 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Henning Rohde >Assignee: Robert Burke >Priority: Minor > Labels: triaged > Time Spent: 2h 40m > Remaining Estimate: 0h > > Add coder registry to allow easier overwrite of default coders. We may also > allow otherwise un-encodable types, but that would require that function > analysis depends on it. > If we're hardcoding support for proto/avro, then there may be little need for > such a feature. Conversely, this may be how we implement such support. > > Proposal Doc: > [https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit#|https://docs.google.com/document/d/1kQwx4Ah6PzG8z2ZMuNsNEXkGsLXm6gADOZaIO7reUOg/edit] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6664) Temporarily convert dataflowKmsKey flag to experimental flag for Dataflow
[ https://issues.apache.org/jira/browse/BEAM-6664?focusedWorklogId=198365&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198365 ] ASF GitHub Bot logged work on BEAM-6664: Author: ASF GitHub Bot Created on: 13/Feb/19 22:19 Start Date: 13/Feb/19 22:19 Worklog Time Spent: 10m Work Description: udim commented on pull request #7835: [BEAM-6664] Temporarily convert dataflowKmsKey flag to experimental URL: https://github.com/apache/beam/pull/7835 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). 
Post-Commit Tests Status (on master branch): table of build-status badge links for the Go, Java, and Python SDK post-commit and ValidatesRunner jobs on Apex, Dataflow, Flink, Gearpump, Samza, and Spark (https://builds.apache.org/). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198365) Time Spent: 10m Remaining Estimate: 0h > Temporarily convert dataflowKmsKey flag to experimental flag for Dataflow
[jira] [Work logged] (BEAM-3306) Consider: Go coder registry
[ https://issues.apache.org/jira/browse/BEAM-3306?focusedWorklogId=198350&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198350 ] ASF GitHub Bot logged work on BEAM-3306: Author: ASF GitHub Bot Created on: 13/Feb/19 21:59 Start Date: 13/Feb/19 21:59 Worklog Time Spent: 10m Work Description: lostluck commented on pull request #7834: [BEAM-3306] Encapsulate coder details. URL: https://github.com/apache/beam/pull/7834 This change encapsulates internal implementation details that most users should not need. In particular, it removes the need to wrap values in the exec.FullValue type before encoding them, and it hides the internal coder.Coder abstraction. Previously, users needed to do the following to use a Beam coder to encode an element of a given type:
{code}
var t reflect.Type // from somewhere
coder := beam.NewCoder(typex.New(t))
enc := exec.MakeElementEncoder(beam.UnwrapCoder(coder))
var buf bytes.Buffer
err := enc.Encode(exec.FullValue{Elm: value}, &buf)
...
{code}
This required users to access Beam implementation details directly: coder.Coder, the typex package, the exec package, and the FullValue type. Following this PR:
{code}
var t reflect.Type // from somewhere
enc := beam.NewElementEncoder(t)
var buf bytes.Buffer
err := enc.Encode(value, &buf)
...
{code}
which doesn't leak such details. Similarly for the decoders. This PR makes the breaking change of removing the beam.UnwrapCoder method. There is an open question of whether users will ever need to encode KVs or similar appropriately for Beam themselves, rather than via the framework. TODO in a later PR: allow users to register coders that implement beam.ElementEncoder and beam.ElementDecoder. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Work logged] (BEAM-6392) Add support for new BigQuery streaming read API to BigQueryIO
[ https://issues.apache.org/jira/browse/BEAM-6392?focusedWorklogId=198344&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198344 ] ASF GitHub Bot logged work on BEAM-6392: Author: ASF GitHub Bot Created on: 13/Feb/19 21:55 Start Date: 13/Feb/19 21:55 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #7441: [BEAM-6392] Add support for the BigQuery read API to BigQueryIO. URL: https://github.com/apache/beam/pull/7441#issuecomment-463390681 Run Java PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198344) Time Spent: 4h (was: 3h 50m) > Add support for new BigQuery streaming read API to BigQueryIO > - > > Key: BEAM-6392 > URL: https://issues.apache.org/jira/browse/BEAM-6392 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Kenneth Jung >Assignee: Kenneth Jung >Priority: Major > Labels: triaged > Time Spent: 4h > Remaining Estimate: 0h > > BigQuery has developed a new streaming egress API which will soon reach > public availability. Add support for the new API in BigQueryIO. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-6623) Dataflow ValidatesRunner test suite should also exercise ValidatesRunner tests under Python 3.
[ https://issues.apache.org/jira/browse/BEAM-6623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu reassigned BEAM-6623: -- Assignee: Mark Liu (was: Robbe) > Dataflow ValidatesRunner test suite should also exercise ValidatesRunner > tests under Python 3. > -- > > Key: BEAM-6623 > URL: https://issues.apache.org/jira/browse/BEAM-6623 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Valentyn Tymofieiev >Assignee: Mark Liu >Priority: Critical > Labels: triaged > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6665) SDK source tarball is different when created on Python 2 and Python 3
[ https://issues.apache.org/jira/browse/BEAM-6665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767535#comment-16767535 ] Valentyn Tymofieiev commented on BEAM-6665: --- Verified that the contents of tarballs generated on Py2 and Py3 are the same, save for a few insignificant differences in setup.cfg:
{noformat}
@@ -7,21 +7,19 @@
 branch = True
 source = apache_beam
 omit =
-  # Omit auto-generated files by the protocol buffer compiler.
   apache_beam/portability/api/*_pb2.py
   apache_beam/portability/api/*_pb2_grpc.py

 [coverage:report]
 exclude_lines =
-  # Have to re-enable the standard pragma
   pragma: no cover
   abc.abstractmethod
-  # Don't complain about missing debug-only code:
+
   def __repr__
   if self\.debug
-  # Don't complain if tests don't hit defensive assertion code:
+
   raise NotImplementedError
-  # Don't complain if non-runnable code isn't run:
+
   if __name__ == .__main__.:
{noformat}
> SDK source tarball is different when created on Python 2 and Python 3 > - > > Key: BEAM-6665 > URL: https://issues.apache.org/jira/browse/BEAM-6665 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Priority: Blocker > Fix For: 2.11.0 > > > When we create a source tarball via `python setup.py sdist` on Python 2, the > generated protocol buffer code includes relative imports. > Because of that, a tarball created on Python 2 interpreter cannot be used on > Python 3. > AFAIK, we release only one source tarball to PyPi, so if possible we should > make source distribution of Beam compatible both with Python 2 and Python 3. > When we create a source tarball on Python 3, we call futurize on generated > _.pb2.py files[1]. Looks like this does not happen on Python 2, which we can > fix in this issue. 
> [1] > [[https://github.com/apache/beam/blob/d6dc3602c808ce1961f46098deb898e5c7480ad4/sdks/python/gen_protos.py#L122] > cc [~altay], [~charlesche...@yahoo.com], [~RobbeSneyders] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
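Parity between tarballs built under the two interpreters can be checked mechanically by diffing member contents. A small sketch of such a check (the function names are illustrative, not part of the Beam build):

```python
import io
import tarfile


def tarball_members(data: bytes) -> dict:
    """Map each regular-file member of an in-memory .tar[.gz] to its bytes."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tf:
        return {m.name: tf.extractfile(m).read() for m in tf if m.isfile()}


def differing_members(a: bytes, b: bytes) -> list:
    """Member names whose presence or contents differ between two tarballs."""
    ma, mb = tarball_members(a), tarball_members(b)
    return sorted(n for n in set(ma) | set(mb) if ma.get(n) != mb.get(n))
```

Running such a comparison over sdists built on Python 2 and Python 3 should, per the comment above, report at most setup.cfg.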
[jira] [Resolved] (BEAM-6665) SDK source tarball is different when created on Python 2 and Python 3
[ https://issues.apache.org/jira/browse/BEAM-6665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev resolved BEAM-6665. --- Resolution: Fixed Assignee: Valentyn Tymofieiev > SDK source tarball is different when created on Python 2 and Python 3 > - > > Key: BEAM-6665 > URL: https://issues.apache.org/jira/browse/BEAM-6665 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Blocker > Fix For: 2.11.0 > > > When we create a source tarball via `python setup.py sdist` on Python 2, the > generated protocol buffer code includes relative imports. > Because of that, a tarball created on Python 2 interpreter cannot be used on > Python 3. > AFAIK, we release only one source tarball to PyPi, so if possible we should > make source distribution of Beam compatible both with Python 2 and Python 3. > When we create a source tarball on Python 3, we call futurize on generated > _.pb2.py files[1]. Looks like this does not happen on Python 2, which we can > fix in this issue. > [1] > [[https://github.com/apache/beam/blob/d6dc3602c808ce1961f46098deb898e5c7480ad4/sdks/python/gen_protos.py#L122] > cc [~altay], [~charlesche...@yahoo.com], [~RobbeSneyders] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-6623) Dataflow ValidatesRunner test suite should also exercise ValidatesRunner tests under Python 3.
[ https://issues.apache.org/jira/browse/BEAM-6623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu resolved BEAM-6623. Resolution: Done Fix Version/s: Not applicable > Dataflow ValidatesRunner test suite should also exercise ValidatesRunner > tests under Python 3. > -- > > Key: BEAM-6623 > URL: https://issues.apache.org/jira/browse/BEAM-6623 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Valentyn Tymofieiev >Assignee: Mark Liu >Priority: Critical > Labels: triaged > Fix For: Not applicable > > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6623) Dataflow ValidatesRunner test suite should also exercise ValidatesRunner tests under Python 3.
[ https://issues.apache.org/jira/browse/BEAM-6623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767524#comment-16767524 ] Mark Liu commented on BEAM-6623: #7825 is merged. ValidatesRunner suite is now running in beam_PostCommit_Python3_Verify. > Dataflow ValidatesRunner test suite should also exercise ValidatesRunner > tests under Python 3. > -- > > Key: BEAM-6623 > URL: https://issues.apache.org/jira/browse/BEAM-6623 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Valentyn Tymofieiev >Assignee: Mark Liu >Priority: Critical > Labels: triaged > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6665) SDK source tarball is different when created on Python 2 and Python 3
[ https://issues.apache.org/jira/browse/BEAM-6665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767516#comment-16767516 ] Valentyn Tymofieiev commented on BEAM-6665: --- Looking at gen_protos code, futurization of protobufs happens both on Python 2 and Python 3 versions, and in successful scenario it does. I encountered a corner case where protocol bufferes were generated, but futurization failed due to a pip error. The next time I re-ran sdist, the gen_protos did not re-generate protocol buffers because they were already generated, and futurization codepath was not exercised: https://github.com/apache/beam/blob/d6dc3602c808ce1961f46098deb898e5c7480ad4/sdks/python/gen_protos.py#L76 > SDK source tarball is different when created on Python 2 and Python 3 > - > > Key: BEAM-6665 > URL: https://issues.apache.org/jira/browse/BEAM-6665 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Priority: Blocker > Fix For: 2.11.0 > > > When we create a source tarball via `python setup.py sdist` on Python 2, the > generated protocol buffer code includes relative imports. > Because of that, a tarball created on Python 2 interpreter cannot be used on > Python 3. > AFAIK, we release only one source tarball to PyPi, so if possible we should > make source distribution of Beam compatible both with Python 2 and Python 3. > When we create a source tarball on Python 3, we call futurize on generated > _.pb2.py files[1]. Looks like this does not happen on Python 2, which we can > fix in this issue. > [1] > [[https://github.com/apache/beam/blob/d6dc3602c808ce1961f46098deb898e5c7480ad4/sdks/python/gen_protos.py#L122] > cc [~altay], [~charlesche...@yahoo.com], [~RobbeSneyders] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
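The corner case described here — codegen succeeded, futurize failed, and the next run skipped both because output already existed — is the classic stale-marker problem: completion is recorded before the last step runs. One way to avoid it is to treat generation and fix-up as a single atomic step, publishing the output directory only after both succeed. A hedged Python sketch of that pattern (generate and futurize are injected stand-ins, not gen_protos' real structure):

```python
import os
import tempfile


def generate_protos(out_dir, generate, futurize):
    """Run codegen and the futurize fix-up as one atomic step: out_dir only
    appears once both succeed, so a futurize failure cannot leave behind
    'already generated' files that skip the fix-up on the next run."""
    if os.path.isdir(out_dir):
        # A complete output from a previous run; nothing to do.
        return
    tmp = tempfile.mkdtemp(dir=os.path.dirname(out_dir) or ".")
    generate(tmp)
    futurize(tmp)  # if this raises, out_dir is never created
    os.rename(tmp, out_dir)  # atomic publish within the same filesystem
```

A rerun after a mid-step failure then regenerates from scratch instead of silently skipping the futurize pass.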
[jira] [Commented] (BEAM-6649) beam11 worker fails all jobs
[ https://issues.apache.org/jira/browse/BEAM-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767509#comment-16767509 ] yifan zou commented on BEAM-6649: - The VM was reset, waiting for reconnection. https://issues.apache.org/jira/browse/INFRA-17837 > beam11 worker fails all jobs > > > Key: BEAM-6649 > URL: https://issues.apache.org/jira/browse/BEAM-6649 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Mikhail Gryzykhin >Assignee: yifan zou >Priority: Major > Labels: currently-failing > > _Use this form to file an issue for test failure:_ > * [Jenkins > Job|https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/288/console] > Initial investigation: > Github plugin crashed fetching data from git. > According to [stackoverflow error > 128|https://stackoverflow.com/questions/16721629/jenkins-returned-status-code-128-with-github] > this means a wrong ssh key. > Relevant logs: > *04:01:08* FATAL: Could not checkout > d1200202c8e98d39dc8422b1255954b31a4341cb*04:01:08* > hudson.plugins.git.GitException: Command "git checkout -f > d1200202c8e98d39dc8422b1255954b31a4341cb" returned status code 128:*04:01:08* > stdout: *04:01:08* stderr: fatal: unable to create threaded lstat > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6665) SDK source tarball is different when created on Python 2 and Python 3
[ https://issues.apache.org/jira/browse/BEAM-6665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev updated BEAM-6665: -- Description: When we create a source tarball via `python setup.py sdist` on Python 2, the generated protocol buffer code includes relative imports. Because of that, a tarball created on Python 2 interpreter cannot be used on Python 3. AFAIK, we release only one source tarball to PyPi, so if possible we should make source distribution of Beam compatible both with Python 2 and Python 3. When we create a source tarball on Python 3, we call futurize on generated _.pb2.py files[1]. Looks like this does not happen on Python 2, which we can fix in this issue. [1] [[https://github.com/apache/beam/blob/d6dc3602c808ce1961f46098deb898e5c7480ad4/sdks/python/gen_protos.py#L122] cc [~altay], [~charlesche...@yahoo.com], [~RobbeSneyders] was: When we create a source tarball via `python setup.py sdist` on Python 2, the generated protocol buffer code includes relative imports. Because of that, a tarball created on Python 2 interpreter cannot be used on Python 3. AFAIK, we release only one source tarball to PyPi, so if possible we should make source distribution of Beam compatible both with Python 2 and Python 3. When we create a source tarball on Python 3, we call futurize on generated _.pb2.py files[1]. Looks like this does not happen on Python 2, which we can fix in this issue. 
[1] [[https://github.com/apache/beam/blob/d6dc3602c808ce1961f46098deb898e5c7480ad4/sdks/python/gen_protos.py#L122]|https://github.com/apache/beam/blob/d6dc3602c808ce1961f46098deb898e5c7480ad4/sdks/python/gen_protos.py#L122] cc [~altay], [~charlesche...@yahoo.com], [~RobbeSneyders] > SDK source tarball is different when created on Python 2 and Python 3 > - > > Key: BEAM-6665 > URL: https://issues.apache.org/jira/browse/BEAM-6665 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Priority: Blocker > Fix For: 2.11.0 > > > When we create a source tarball via `python setup.py sdist` on Python 2, the > generated protocol buffer code includes relative imports. > Because of that, a tarball created on Python 2 interpreter cannot be used on > Python 3. > AFAIK, we release only one source tarball to PyPi, so if possible we should > make source distribution of Beam compatible both with Python 2 and Python 3. > When we create a source tarball on Python 3, we call futurize on generated > _.pb2.py files[1]. Looks like this does not happen on Python 2, which we can > fix in this issue. > [1] > [[https://github.com/apache/beam/blob/d6dc3602c808ce1961f46098deb898e5c7480ad4/sdks/python/gen_protos.py#L122] > cc [~altay], [~charlesche...@yahoo.com], [~RobbeSneyders] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6665) SDK source tarball is different when created on Python 2 and Python 3
[ https://issues.apache.org/jira/browse/BEAM-6665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev updated BEAM-6665: -- Description: When we create a source tarball via `python setup.py sdist` on Python 2, the generated protocol buffer code includes relative imports. Because of that, a tarball created on a Python 2 interpreter cannot be used on Python 3. AFAIK, we release only one source tarball to PyPI, so if possible we should make the source distribution of Beam compatible with both Python 2 and Python 3. When we create a source tarball on Python 3, we call futurize on the generated *_pb2.py files[1]. It looks like this does not happen on Python 2, which we can fix in this issue. [1] https://github.com/apache/beam/blob/d6dc3602c808ce1961f46098deb898e5c7480ad4/sdks/python/gen_protos.py#L122 cc [~altay], [~charlesche...@yahoo.com], [~RobbeSneyders] was: When we create a source tarball via `python setup.py sdist` on Python 2, the generated protocol buffer code includes relative imports. Because of that, a tarball created on Python 2 interpreter cannot be used on Python 3. AFAIK, we release only one source tarball to PyPi, so if possible we should make source distribution of Beam compatible both with Python 2 and Python 3. When we create a source tarball on Python 3, we call futurize on generated *_pb2.py files[1]. Looks like this does not happen on Python 2, which we can fix in this issue.
[1] https://github.com/apache/beam/blob/d6dc3602c808ce1961f46098deb898e5c7480ad4/sdks/python/gen_protos.py#L122 cc [~altay], [~charlesche...@yahoo.com], [~RobbeSneyders] > SDK source tarball is different when created on Python 2 and Python 3 > - > > Key: BEAM-6665 > URL: https://issues.apache.org/jira/browse/BEAM-6665 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Priority: Blocker > Fix For: 2.11.0 > > > When we create a source tarball via `python setup.py sdist` on Python 2, the > generated protocol buffer code includes relative imports. > Because of that, a tarball created on a Python 2 interpreter cannot be used on > Python 3. > AFAIK, we release only one source tarball to PyPI, so if possible we should > make the source distribution of Beam compatible with both Python 2 and Python 3. > When we create a source tarball on Python 3, we call futurize on the generated > *_pb2.py files[1]. It looks like this does not happen on Python 2, which we can > fix in this issue. > [1] > https://github.com/apache/beam/blob/d6dc3602c808ce1961f46098deb898e5c7480ad4/sdks/python/gen_protos.py#L122 > > cc [~altay], [~charlesche...@yahoo.com], [~RobbeSneyders]
[jira] [Assigned] (BEAM-6572) Dataflow Python runner should use a Python-3 compatible container when starting a Python 3 pipeline.
[ https://issues.apache.org/jira/browse/BEAM-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev reassigned BEAM-6572: - Assignee: Valentyn Tymofieiev > Dataflow Python runner should use a Python-3 compatible container when > starting a Python 3 pipeline. > > > Key: BEAM-6572 > URL: https://issues.apache.org/jira/browse/BEAM-6572 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Blocker > Fix For: 2.11.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Container choice will be affected by: > - SDK version (dev/released) > - Execution mode (FnApi/Legacy) > - Python interpreter version. > cc: [~ccy] [~markflyhigh] [~altay] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-6572) Dataflow Python runner should use a Python-3 compatible container when starting a Python 3 pipeline.
[ https://issues.apache.org/jira/browse/BEAM-6572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev resolved BEAM-6572. --- Resolution: Fixed Fix Version/s: 2.11.0 > Dataflow Python runner should use a Python-3 compatible container when > starting a Python 3 pipeline. > > > Key: BEAM-6572 > URL: https://issues.apache.org/jira/browse/BEAM-6572 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow >Reporter: Valentyn Tymofieiev >Priority: Blocker > Fix For: 2.11.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Container choice will be affected by: > - SDK version (dev/released) > - Execution mode (FnApi/Legacy) > - Python interpreter version. > cc: [~ccy] [~markflyhigh] [~altay] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6665) SDK source tarball is different when created on Python 2 and Python 3
[ https://issues.apache.org/jira/browse/BEAM-6665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev updated BEAM-6665: -- Fix Version/s: 2.11.0 > SDK source tarball is different when created on Python 2 and Python 3 > - > > Key: BEAM-6665 > URL: https://issues.apache.org/jira/browse/BEAM-6665 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Priority: Blocker > Fix For: 2.11.0 > > > When we create a source tarball via `python setup.py sdist` on Python 2, the > generated protocol buffer code includes relative imports. > Because of that, a tarball created on a Python 2 interpreter cannot be used on > Python 3. > AFAIK, we release only one source tarball to PyPI, so if possible we should > make the source distribution of Beam compatible with both Python 2 and Python 3. > When we create a source tarball on Python 3, we call futurize on the generated > *_pb2.py files[1]. It looks like this does not happen on Python 2, which we can > fix in this issue. > [1] > https://github.com/apache/beam/blob/d6dc3602c808ce1961f46098deb898e5c7480ad4/sdks/python/gen_protos.py#L122 > > cc [~altay], [~charlesche...@yahoo.com], [~RobbeSneyders]
[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable Python connector
[ https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=198268&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198268 ] ASF GitHub Bot logged work on BEAM-3342: Author: ASF GitHub Bot Created on: 13/Feb/19 18:39 Start Date: 13/Feb/19 18:39 Worklog Time Spent: 10m Work Description: angoenka commented on issue #7737: [BEAM-3342] Create a Cloud Bigtable Python connector Read URL: https://github.com/apache/beam/pull/7737#issuecomment-463315783 @chamikaramj can you please take a look. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198268) Time Spent: 15h 40m (was: 15.5h) > Create a Cloud Bigtable Python connector > > > Key: BEAM-3342 > URL: https://issues.apache.org/jira/browse/BEAM-3342 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Solomon Duskis >Assignee: Solomon Duskis >Priority: Major > Labels: triaged > Time Spent: 15h 40m > Remaining Estimate: 0h > > I would like to create a Cloud Bigtable python connector. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6623) Dataflow ValidatesRunner test suite should also exercise ValidatesRunner tests under Python 3.
[ https://issues.apache.org/jira/browse/BEAM-6623?focusedWorklogId=198272&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198272 ] ASF GitHub Bot logged work on BEAM-6623: Author: ASF GitHub Bot Created on: 13/Feb/19 18:43 Start Date: 13/Feb/19 18:43 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #7825: [BEAM-6623] Exercise ValidatesRunner batch tests in Python 3 URL: https://github.com/apache/beam/pull/7825 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198272) Time Spent: 1h 10m (was: 1h) > Dataflow ValidatesRunner test suite should also exercise ValidatesRunner > tests under Python 3. > -- > > Key: BEAM-6623 > URL: https://issues.apache.org/jira/browse/BEAM-6623 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Valentyn Tymofieiev >Assignee: Robbe >Priority: Critical > Labels: triaged > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6572) Dataflow Python runner should use a Python-3 compatible container when starting a Python 3 pipeline.
[ https://issues.apache.org/jira/browse/BEAM-6572?focusedWorklogId=198270&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198270 ] ASF GitHub Bot logged work on BEAM-6572: Author: ASF GitHub Bot Created on: 13/Feb/19 18:42 Start Date: 13/Feb/19 18:42 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #7829: [BEAM-6572] Makes Dataflow runner use python3-fnapi container when applicable. URL: https://github.com/apache/beam/pull/7829 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198270) Time Spent: 0.5h (was: 20m) > Dataflow Python runner should use a Python-3 compatible container when > starting a Python 3 pipeline. > > > Key: BEAM-6572 > URL: https://issues.apache.org/jira/browse/BEAM-6572 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow >Reporter: Valentyn Tymofieiev >Priority: Blocker > Time Spent: 0.5h > Remaining Estimate: 0h > > Container choice will be affected by: > - SDK version (dev/released) > - Execution mode (FnApi/Legacy) > - Python interpreter version. > cc: [~ccy] [~markflyhigh] [~altay] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
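The container choice enumerated in BEAM-6572 (dev vs. released SDK, FnApi vs. legacy execution, interpreter version) amounts to a small lookup. The sketch below only illustrates that decision structure — the image names and function are placeholders, not the actual Dataflow container names or runner code:

```python
def choose_container_image(sdk_is_dev, use_fnapi, py_major):
    """Hypothetical sketch of picking a worker container from the three
    inputs named in the issue. All names here are placeholders."""
    base = "python%d" % py_major if py_major >= 3 else "python"
    variant = "-fnapi" if use_fnapi else ""
    tag = "dev" if sdk_is_dev else "release"
    return "dataflow/%s%s:%s" % (base, variant, tag)

image = choose_container_image(sdk_is_dev=False, use_fnapi=True, py_major=3)
```

The fix merged in PR #7829 makes the runner select a python3-fnapi container when both the Python 3 and FnApi conditions hold, rather than defaulting to the Python 2 image.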
[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module
[ https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=198269&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198269 ] ASF GitHub Bot logged work on BEAM-5315: Author: ASF GitHub Bot Created on: 13/Feb/19 18:42 Start Date: 13/Feb/19 18:42 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #7804: [BEAM-5315] python 3 datastore fail gracefully URL: https://github.com/apache/beam/pull/7804 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198269) Time Spent: 19.5h (was: 19h 20m) > Finish Python 3 porting for io module > - > > Key: BEAM-5315 > URL: https://issues.apache.org/jira/browse/BEAM-5315 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Robbe >Priority: Major > Labels: triaged > Time Spent: 19.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6392) Add support for new BigQuery streaming read API to BigQueryIO
[ https://issues.apache.org/jira/browse/BEAM-6392?focusedWorklogId=198267&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198267 ] ASF GitHub Bot logged work on BEAM-6392: Author: ASF GitHub Bot Created on: 13/Feb/19 18:38 Start Date: 13/Feb/19 18:38 Worklog Time Spent: 10m Work Description: kmjung commented on issue #7441: [BEAM-6392] Add support for the BigQuery read API to BigQueryIO. URL: https://github.com/apache/beam/pull/7441#issuecomment-463315433 @chamikaramj we should be good to go here. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198267) Time Spent: 3h 50m (was: 3h 40m) > Add support for new BigQuery streaming read API to BigQueryIO > - > > Key: BEAM-6392 > URL: https://issues.apache.org/jira/browse/BEAM-6392 > Project: Beam > Issue Type: New Feature > Components: io-java-gcp >Reporter: Kenneth Jung >Assignee: Kenneth Jung >Priority: Major > Labels: triaged > Time Spent: 3h 50m > Remaining Estimate: 0h > > BigQuery has developed a new streaming egress API which will soon reach > public availability. Add support for the new API in BigQueryIO. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4461) Create a library of useful transforms that use schemas
[ https://issues.apache.org/jira/browse/BEAM-4461?focusedWorklogId=198266&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198266 ] ASF GitHub Bot logged work on BEAM-4461: Author: ASF GitHub Bot Created on: 13/Feb/19 18:38 Start Date: 13/Feb/19 18:38 Worklog Time Spent: 10m Work Description: kanterov commented on issue #7353: [BEAM-4461] Support inner and outer style joins in CoGroup. URL: https://github.com/apache/beam/pull/7353#issuecomment-463315320 @reuvenlax I'm going on vacation, but I review on the week of 25th February This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198266) Time Spent: 19.5h (was: 19h 20m) > Create a library of useful transforms that use schemas > -- > > Key: BEAM-4461 > URL: https://issues.apache.org/jira/browse/BEAM-4461 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Labels: triaged > Time Spent: 19.5h > Remaining Estimate: 0h > > e.g. JoinBy(fields). Project, Filter, etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6650) FlinkRunner fails to checkpoint elements emitted during finishBundle
[ https://issues.apache.org/jira/browse/BEAM-6650?focusedWorklogId=198263&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198263 ] ASF GitHub Bot logged work on BEAM-6650: Author: ASF GitHub Bot Created on: 13/Feb/19 18:33 Start Date: 13/Feb/19 18:33 Worklog Time Spent: 10m Work Description: mxm commented on issue #7810: [BEAM-6650] Add bundle test with checkpointing for keyed processing URL: https://github.com/apache/beam/pull/7810#issuecomment-463313948 @aljoscha I'm assuming insertion order at the state backend at the moment. Not sure if that assumption holds. At least for the tests it works. Issue Time Tracking --- Worklog Id: (was: 198263) Time Spent: 1h 40m (was: 1.5h) > FlinkRunner fails to checkpoint elements emitted during finishBundle > > > Key: BEAM-6650 > URL: https://issues.apache.org/jira/browse/BEAM-6650 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.11.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Elements emitted during the finishBundle call in snapshotState are lost > after the pipeline is restored. This only happens when the operator is keyed.
[jira] [Work logged] (BEAM-5816) Flink runner starts new bundles while disposing operator
[ https://issues.apache.org/jira/browse/BEAM-5816?focusedWorklogId=198254&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198254 ] ASF GitHub Bot logged work on BEAM-5816: Author: ASF GitHub Bot Created on: 13/Feb/19 18:16 Start Date: 13/Feb/19 18:16 Worklog Time Spent: 10m Work Description: mxm commented on issue #7719: [BEAM-5816] Finish Flink bundles exactly once URL: https://github.com/apache/beam/pull/7719#issuecomment-463307702 >1. After an exception or cancel, we don't want any more "normal" processing (looks like this is now addressed) Yes, this is addressed. Bundle will only be finalized in case of normal operation. Though my original intention here was to fix the race condition between the timer thread for finishing the bundle and a regular bundle finish >2. Best effort resource cleanup for user code (not sure about that one, would this happen after an exception)? Cleanup is performed via DoFn#teardown which will be called in any case in `dispose()`. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198254) Time Spent: 2h 40m (was: 2.5h) > Flink runner starts new bundles while disposing operator > - > > Key: BEAM-5816 > URL: https://issues.apache.org/jira/browse/BEAM-5816 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Micah Wylde >Assignee: Maximilian Michels >Priority: Major > Labels: portability-flink, triaged > Time Spent: 2h 40m > Remaining Estimate: 0h > > We sometimes see exceptions when shutting down portable flink pipelines > (either due to cancellation or failure): > {code} > 2018-10-19 15:54:52,905 ERROR > org.apache.flink.streaming.runtime.tasks.StreamTask - Error during > disposal of stream operator. 
> java.lang.RuntimeException: Failed to finish remote bundle > at > org.apache.beam.runners.flink.translation.wrappers.streaming.ExecutableStageDoFnOperator$SdkHarnessDoFnRunner.finishBundle(ExecutableStageDoFnOperator.java:241) > at > org.apache.beam.runners.flink.metrics.DoFnRunnerWithMetricsUpdate.finishBundle(DoFnRunnerWithMetricsUpdate.java:87) > at > org.apache.beam.runners.core.SimplePushbackSideInputDoFnRunner.finishBundle(SimplePushbackSideInputDoFnRunner.java:118) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator.invokeFinishBundle(DoFnOperator.java:674) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator.dispose(DoFnOperator.java:391) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.ExecutableStageDoFnOperator.dispose(ExecutableStageDoFnOperator.java:166) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:473) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:374) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.IllegalStateException: Already closed. > at > org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:95) > at > org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:251) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.ExecutableStageDoFnOperator$SdkHarnessDoFnRunner.finishBundle(ExecutableStageDoFnOperator.java:238) > ... 9 more > Suppressed: java.lang.IllegalStateException: Processing bundle failed, > TODO: [BEAM-3962] abort bundle. > at > org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:266) > ... 10 more > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5315) Finish Python 3 porting for io module
[ https://issues.apache.org/jira/browse/BEAM-5315?focusedWorklogId=198255&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198255 ] ASF GitHub Bot logged work on BEAM-5315: Author: ASF GitHub Bot Created on: 13/Feb/19 18:20 Start Date: 13/Feb/19 18:20 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #7804: [BEAM-5315] python 3 datastore fail gracefully URL: https://github.com/apache/beam/pull/7804#discussion_r256525248 ## File path: sdks/python/tox.ini ## @@ -93,7 +93,7 @@ setenv = RUN_SKIPPED_PY3_TESTS=0 extras = test,gcp modules = - apache_beam.typehints,apache_beam.coders,apache_beam.options,apache_beam.tools,apache_beam.utils,apache_beam.internal,apache_beam.metrics,apache_beam.portability,apache_beam.pipeline_test,apache_beam.pvalue_test,apache_beam.runners,apache_beam.io.hadoopfilesystem_test,apache_beam.io.gcp.tests.utils_test,apache_beam.io.gcp.big_query_query_to_table_it_test,apache_beam.io.gcp.bigquery_io_read_it_test,apache_beam.io.gcp.bigquery_test,apache_beam.io.gcp.pubsub_integration_test,apache_beam.io.hdfs_integration_test,apache_beam.io.gcp.internal,apache_beam.io.filesystem_test,apache_beam.io.filesystems_test,apache_beam.io.sources_test,apache_beam.transforms,apache_beam.testing,apache_beam.io.filesystemio_test,apache_beam.io.localfilesystem_test,apache_beam.io.range_trackers_test,apache_beam.io.restriction_trackers_test,apache_beam.io.source_test_utils_test,apache_beam.io.concat_source_test,apache_beam.io.filebasedsink_test,apache_beam.io.filebasedsource_test,apache_beam.io.textio_test,apache_beam.io.tfrecordio_test,apache_beam.examples.wordcount_debugging_test,apache_beam.examples.wordcount_minimal_test,apache_beam.examples.wordcount_test,apache_beam.io.parquetio_test,apache_beam.io.gcp.gcsfilesystem_test,apache_beam.io.gcp.gcsio_integration_test,apache_beam.io.gcp.bigquery_test,apache_beam.io.gcp.big_query_query_to_table_it_test,apache_beam.io.gcp.bigquery_io_read_it_test,apache_beam.io.
gcp.bigquery_tools_test,apache_beam.io.gcp.pubsub_integration_test,apache_beam.io.gcp.pubsub_test + apache_beam.typehints,apache_beam.coders,apache_beam.options,apache_beam.tools,apache_beam.utils,apache_beam.internal,apache_beam.metrics,apache_beam.portability,apache_beam.pipeline_test,apache_beam.pvalue_test,apache_beam.runners,apache_beam.io.hadoopfilesystem_test,apache_beam.io.gcp.tests.utils_test,apache_beam.io.gcp.big_query_query_to_table_it_test,apache_beam.io.gcp.bigquery_io_read_it_test,apache_beam.io.gcp.bigquery_test,apache_beam.io.gcp.pubsub_integration_test,apache_beam.io.hdfs_integration_test,apache_beam.io.gcp.internal,apache_beam.io.filesystem_test,apache_beam.io.filesystems_test,apache_beam.io.sources_test,apache_beam.transforms,apache_beam.testing,apache_beam.io.filesystemio_test,apache_beam.io.localfilesystem_test,apache_beam.io.range_trackers_test,apache_beam.io.restriction_trackers_test,apache_beam.io.source_test_utils_test,apache_beam.io.concat_source_test,apache_beam.io.filebasedsink_test,apache_beam.io.filebasedsource_test,apache_beam.io.textio_test,apache_beam.io.tfrecordio_test,apache_beam.examples.wordcount_debugging_test,apache_beam.examples.wordcount_minimal_test,apache_beam.examples.wordcount_test,apache_beam.io.parquetio_test,apache_beam.io.gcp.gcsfilesystem_test,apache_beam.io.gcp.gcsio_integration_test,apache_beam.io.gcp.bigquery_test,apache_beam.io.gcp.big_query_query_to_table_it_test,apache_beam.io.gcp.bigquery_io_read_it_test,apache_beam.io.gcp.bigquery_tools_test,apache_beam.io.gcp.pubsub_integration_test,apache_beam.io.gcp.pubsub_test,apache_beam.io.gcp.datastore,apache_beam.io.gcp.datastore_write_it_test Review comment: Adding Datastore tests doesn't change anything since we skip them anyway, but it's a harmless change, so we can merge it. Let's try to simplify this to run all tests in a future PR. This is an automated message from the Apache Git Service. 
To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198255) Time Spent: 19h 20m (was: 19h 10m) > Finish Python 3 porting for io module > - > > Key: BEAM-5315 > URL: https://issues.apache.org/jira/browse/BEAM-5315 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Robbe >Priority: Major > Labels: triaged > Time Spent: 19h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6623) Dataflow ValidatesRunner test suite should also exercise ValidatesRunner tests under Python 3.
[ https://issues.apache.org/jira/browse/BEAM-6623?focusedWorklogId=198251&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198251 ] ASF GitHub Bot logged work on BEAM-6623: Author: ASF GitHub Bot Created on: 13/Feb/19 18:08 Start Date: 13/Feb/19 18:08 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #7825: [BEAM-6623] Exercise ValidatesRunner batch tests in Python 3 URL: https://github.com/apache/beam/pull/7825#issuecomment-463304891 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198251) Time Spent: 50m (was: 40m) > Dataflow ValidatesRunner test suite should also exercise ValidatesRunner > tests under Python 3. > -- > > Key: BEAM-6623 > URL: https://issues.apache.org/jira/browse/BEAM-6623 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Valentyn Tymofieiev >Assignee: Robbe >Priority: Critical > Labels: triaged > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6665) SDK source tarball is different when created on Python 2 and Python 3
Valentyn Tymofieiev created BEAM-6665: - Summary: SDK source tarball is different when created on Python 2 and Python 3 Key: BEAM-6665 URL: https://issues.apache.org/jira/browse/BEAM-6665 Project: Beam Issue Type: Sub-task Components: sdk-py-core Reporter: Valentyn Tymofieiev When we create a source tarball via `python setup.py sdist` on Python 2, the generated protocol buffer code includes relative imports. Because of that, a tarball created on a Python 2 interpreter cannot be used on Python 3. AFAIK, we release only one source tarball to PyPI, so if possible we should make the source distribution of Beam compatible with both Python 2 and Python 3. When we create a source tarball on Python 3, we call futurize on the generated *_pb2.py files[1]. It looks like this does not happen on Python 2, which we can fix in this issue. [1] https://github.com/apache/beam/blob/d6dc3602c808ce1961f46098deb898e5c7480ad4/sdks/python/gen_protos.py#L122 cc [~altay], [~charlesche...@yahoo.com], [~RobbeSneyders]
[jira] [Work logged] (BEAM-5816) Flink runner starts new bundles while disposing operator
[ https://issues.apache.org/jira/browse/BEAM-5816?focusedWorklogId=198253&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198253 ] ASF GitHub Bot logged work on BEAM-5816: Author: ASF GitHub Bot Created on: 13/Feb/19 18:12 Start Date: 13/Feb/19 18:12 Worklog Time Spent: 10m Work Description: mxm commented on pull request #7719: [BEAM-5816] Finish Flink bundles exactly once URL: https://github.com/apache/beam/pull/7719#discussion_r256522068 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java ## @@ -178,7 +179,7 @@ private transient PushedBackElementsHandler> pushedBackElementsHandler; // bundle control - private transient boolean bundleStarted = false; + private transient AtomicBoolean bundleStarted; Review comment: Yes, there is a timer thread for the "finish bundle by time" feature, see `FlinkPipelineOptions#setMaxBundleTimeMills`. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198253) Time Spent: 2.5h (was: 2h 20m) > Flink runner starts new bundles while disposing operator > - > > Key: BEAM-5816 > URL: https://issues.apache.org/jira/browse/BEAM-5816 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Micah Wylde >Assignee: Maximilian Michels >Priority: Major > Labels: portability-flink, triaged > Time Spent: 2.5h > Remaining Estimate: 0h > > We sometimes see exceptions when shutting down portable flink pipelines > (either due to cancellation or failure): > {code} > 2018-10-19 15:54:52,905 ERROR > org.apache.flink.streaming.runtime.tasks.StreamTask - Error during > disposal of stream operator. 
> java.lang.RuntimeException: Failed to finish remote bundle > at > org.apache.beam.runners.flink.translation.wrappers.streaming.ExecutableStageDoFnOperator$SdkHarnessDoFnRunner.finishBundle(ExecutableStageDoFnOperator.java:241) > at > org.apache.beam.runners.flink.metrics.DoFnRunnerWithMetricsUpdate.finishBundle(DoFnRunnerWithMetricsUpdate.java:87) > at > org.apache.beam.runners.core.SimplePushbackSideInputDoFnRunner.finishBundle(SimplePushbackSideInputDoFnRunner.java:118) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator.invokeFinishBundle(DoFnOperator.java:674) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator.dispose(DoFnOperator.java:391) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.ExecutableStageDoFnOperator.dispose(ExecutableStageDoFnOperator.java:166) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:473) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:374) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.IllegalStateException: Already closed. > at > org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:95) > at > org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:251) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.ExecutableStageDoFnOperator$SdkHarnessDoFnRunner.finishBundle(ExecutableStageDoFnOperator.java:238) > ... 9 more > Suppressed: java.lang.IllegalStateException: Processing bundle failed, > TODO: [BEAM-3962] abort bundle. > at > org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:266) > ... 10 more > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
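The race mxm describes in the PR #7719 review — a timer thread for "finish bundle by time" finishing a bundle concurrently with the regular processing path — is what changing `bundleStarted` from a plain boolean to an `AtomicBoolean` guards against. A language-neutral sketch of the same compare-and-set pattern, written here in Python with an explicit lock standing in for the Java `AtomicBoolean`; the class and names are illustrative, not Beam's actual code:

```python
import threading

class BundleTracker:
    """Sketch (not Beam's DoFnOperator): ensure that the timer thread and
    the main processing thread cannot both run finish-bundle logic for
    the same bundle."""

    def __init__(self):
        self._lock = threading.Lock()
        self._bundle_started = False
        self.finish_count = 0

    def start_bundle(self):
        with self._lock:
            self._bundle_started = True

    def try_finish_bundle(self):
        # Analogous to bundleStarted.compareAndSet(true, false) in Java:
        # only the first caller observes started == True and runs the finish.
        with self._lock:
            if not self._bundle_started:
                return False
            self._bundle_started = False
        self.finish_count += 1
        return True

tracker = BundleTracker()
tracker.start_bundle()
results = [tracker.try_finish_bundle(), tracker.try_finish_bundle()]
```

Whichever thread wins the compare-and-set finishes the bundle exactly once; the loser sees the flag already cleared and returns without duplicating the work.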
[jira] [Work logged] (BEAM-6623) Dataflow ValidatesRunner test suite should also exercise ValidatesRunner tests under Python 3.
[ https://issues.apache.org/jira/browse/BEAM-6623?focusedWorklogId=198252&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198252 ] ASF GitHub Bot logged work on BEAM-6623: Author: ASF GitHub Bot Created on: 13/Feb/19 18:08 Start Date: 13/Feb/19 18:08 Worklog Time Spent: 10m Work Description: markflyhigh commented on issue #7825: [BEAM-6623] Exercise ValidatesRunner batch tests in Python 3 URL: https://github.com/apache/beam/pull/7825#issuecomment-462945640 Run Seed Job This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198252) Time Spent: 1h (was: 50m) > Dataflow ValidatesRunner test suite should also exercise ValidatesRunner > tests under Python 3. > -- > > Key: BEAM-6623 > URL: https://issues.apache.org/jira/browse/BEAM-6623 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Valentyn Tymofieiev >Assignee: Robbe >Priority: Critical > Labels: triaged > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6623) Dataflow ValidatesRunner test suite should also exercise ValidatesRunner tests under Python 3.
[ https://issues.apache.org/jira/browse/BEAM-6623?focusedWorklogId=198250&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198250 ] ASF GitHub Bot logged work on BEAM-6623: Author: ASF GitHub Bot Created on: 13/Feb/19 18:08 Start Date: 13/Feb/19 18:08 Worklog Time Spent: 10m Work Description: markflyhigh commented on issue #7825: [BEAM-6623] Exercise ValidatesRunner batch tests in Python 3 URL: https://github.com/apache/beam/pull/7825#issuecomment-463304817 A Java unit test failed the Java PreCommit, but it is definitely unrelated to this PR, since the changes are only in Gradle scripts and don't touch any Java code. Do I need to rebase? Issue Time Tracking --- Worklog Id: (was: 198250) Time Spent: 40m (was: 0.5h) > Dataflow ValidatesRunner test suite should also exercise ValidatesRunner > tests under Python 3. > -- > > Key: BEAM-6623 > URL: https://issues.apache.org/jira/browse/BEAM-6623 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Valentyn Tymofieiev >Assignee: Robbe >Priority: Critical > Labels: triaged > Time Spent: 40m > Remaining Estimate: 0h
[jira] [Commented] (BEAM-3310) Push metrics to a backend in a runner-agnostic way
[ https://issues.apache.org/jira/browse/BEAM-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767420#comment-16767420 ] Ismaël Mejía commented on BEAM-3310: This ticket is still open because not all of the sub-tasks have been resolved, but you received a notification because I updated the core ticket to link some duplicate Jira issues while doing triage. However, we found an interesting regression related to this: since the move to Gradle, a metrics test of the Spark ValidatesRunner kind takes 10 minutes where it previously took seconds. Can you please take a look at BEAM-6633? > Push metrics to a backend in a runner-agnostic way > --- > > Key: BEAM-3310 > URL: https://issues.apache.org/jira/browse/BEAM-3310 > Project: Beam > Issue Type: New Feature > Components: runner-extensions-metrics, sdk-java-core >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Time Spent: 18h 50m > Remaining Estimate: 0h > > The idea is to avoid relying on the runners to provide access to the metrics > (either at the end of the pipeline or while it runs) because they don't all > have the same metrics capabilities (e.g. the Spark runner configures sinks > like CSV, Graphite, or in-memory sinks using the Spark engine conf). The > target is to push the metrics in the common runner code so that no matter the > chosen runner, a user can get his metrics out of Beam. > Here is the link to the discussion thread on the dev ML: > https://lists.apache.org/thread.html/01a80d62f2df6b84bfa41f05e15fda900178f882877c294fed8be91e@%3Cdev.beam.apache.org%3E > And the design doc: > https://s.apache.org/runner_independent_metrics_extraction -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6650) FlinkRunner fails to checkpoint elements emitted during finishBundle
[ https://issues.apache.org/jira/browse/BEAM-6650?focusedWorklogId=198249&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198249 ] ASF GitHub Bot logged work on BEAM-6650: Author: ASF GitHub Bot Created on: 13/Feb/19 18:06 Start Date: 13/Feb/19 18:06 Worklog Time Spent: 10m Work Description: mxm commented on issue #7810: [BEAM-6650] Add bundle test with checkpointing for keyed processing URL: https://github.com/apache/beam/pull/7810#issuecomment-463304417 Run Flink ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198249) Time Spent: 1.5h (was: 1h 20m) > FlinkRunner fails to checkpoint elements emitted during finishBundle > > > Key: BEAM-6650 > URL: https://issues.apache.org/jira/browse/BEAM-6650 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.11.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Elements emitted during the finishBundle call in snapshotState are lost > after the pipeline is restored. This only happens when the operator is keyed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5816) Flink runner starts new bundles while disposing operator
[ https://issues.apache.org/jira/browse/BEAM-5816?focusedWorklogId=198244&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198244 ] ASF GitHub Bot logged work on BEAM-5816: Author: ASF GitHub Bot Created on: 13/Feb/19 18:00 Start Date: 13/Feb/19 18:00 Worklog Time Spent: 10m Work Description: tweise commented on issue #7719: [BEAM-5816] Finish Flink bundles exactly once URL: https://github.com/apache/beam/pull/7719#issuecomment-463302036 I think there are two things we want to accomplish: 1) After an exception or cancel, we don't want any more "normal" processing (looks like this is now addressed) 2) Best effort resource cleanup for user code (not sure about that one, would this happen after an exception)? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198244) Time Spent: 2h 20m (was: 2h 10m) > Flink runner starts new bundles while disposing operator > - > > Key: BEAM-5816 > URL: https://issues.apache.org/jira/browse/BEAM-5816 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Micah Wylde >Assignee: Maximilian Michels >Priority: Major > Labels: portability-flink, triaged > Time Spent: 2h 20m > Remaining Estimate: 0h > > We sometimes see exceptions when shutting down portable flink pipelines > (either due to cancellation or failure): > {code} > 2018-10-19 15:54:52,905 ERROR > org.apache.flink.streaming.runtime.tasks.StreamTask - Error during > disposal of stream operator.
> java.lang.RuntimeException: Failed to finish remote bundle > at > org.apache.beam.runners.flink.translation.wrappers.streaming.ExecutableStageDoFnOperator$SdkHarnessDoFnRunner.finishBundle(ExecutableStageDoFnOperator.java:241) > at > org.apache.beam.runners.flink.metrics.DoFnRunnerWithMetricsUpdate.finishBundle(DoFnRunnerWithMetricsUpdate.java:87) > at > org.apache.beam.runners.core.SimplePushbackSideInputDoFnRunner.finishBundle(SimplePushbackSideInputDoFnRunner.java:118) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator.invokeFinishBundle(DoFnOperator.java:674) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator.dispose(DoFnOperator.java:391) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.ExecutableStageDoFnOperator.dispose(ExecutableStageDoFnOperator.java:166) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:473) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:374) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.IllegalStateException: Already closed. > at > org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:95) > at > org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:251) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.ExecutableStageDoFnOperator$SdkHarnessDoFnRunner.finishBundle(ExecutableStageDoFnOperator.java:238) > ... 9 more > Suppressed: java.lang.IllegalStateException: Processing bundle failed, > TODO: [BEAM-3962] abort bundle. > at > org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:266) > ... 10 more > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-5816) Flink runner starts new bundles while disposing operator
[ https://issues.apache.org/jira/browse/BEAM-5816?focusedWorklogId=198241&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198241 ] ASF GitHub Bot logged work on BEAM-5816: Author: ASF GitHub Bot Created on: 13/Feb/19 17:57 Start Date: 13/Feb/19 17:57 Worklog Time Spent: 10m Work Description: tweise commented on pull request #7719: [BEAM-5816] Finish Flink bundles exactly once URL: https://github.com/apache/beam/pull/7719#discussion_r256516130 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java ## @@ -178,7 +179,7 @@ private transient PushedBackElementsHandler> pushedBackElementsHandler; // bundle control - private transient boolean bundleStarted = false; + private transient AtomicBoolean bundleStarted; Review comment: Is there a scenario of multiple threads accessing this? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198241) Time Spent: 2h 10m (was: 2h) > Flink runner starts new bundles while disposing operator > - > > Key: BEAM-5816 > URL: https://issues.apache.org/jira/browse/BEAM-5816 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Micah Wylde >Assignee: Maximilian Michels >Priority: Major > Labels: portability-flink, triaged > Time Spent: 2h 10m > Remaining Estimate: 0h > > We sometimes see exceptions when shutting down portable flink pipelines > (either due to cancellation or failure): > {code} > 2018-10-19 15:54:52,905 ERROR > org.apache.flink.streaming.runtime.tasks.StreamTask - Error during > disposal of stream operator. 
> java.lang.RuntimeException: Failed to finish remote bundle > at > org.apache.beam.runners.flink.translation.wrappers.streaming.ExecutableStageDoFnOperator$SdkHarnessDoFnRunner.finishBundle(ExecutableStageDoFnOperator.java:241) > at > org.apache.beam.runners.flink.metrics.DoFnRunnerWithMetricsUpdate.finishBundle(DoFnRunnerWithMetricsUpdate.java:87) > at > org.apache.beam.runners.core.SimplePushbackSideInputDoFnRunner.finishBundle(SimplePushbackSideInputDoFnRunner.java:118) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator.invokeFinishBundle(DoFnOperator.java:674) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.DoFnOperator.dispose(DoFnOperator.java:391) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.ExecutableStageDoFnOperator.dispose(ExecutableStageDoFnOperator.java:166) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:473) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:374) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.IllegalStateException: Already closed. > at > org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:95) > at > org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:251) > at > org.apache.beam.runners.flink.translation.wrappers.streaming.ExecutableStageDoFnOperator$SdkHarnessDoFnRunner.finishBundle(ExecutableStageDoFnOperator.java:238) > ... 9 more > Suppressed: java.lang.IllegalStateException: Processing bundle failed, > TODO: [BEAM-3962] abort bundle. > at > org.apache.beam.runners.fnexecution.control.SdkHarnessClient$ActiveBundle.close(SdkHarnessClient.java:266) > ... 10 more > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
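The review question above asks whether the `bundleStarted` flag is accessed by multiple threads. A minimal, self-contained sketch (hypothetical class and field names, not the actual `DoFnOperator` code) of why an `AtomicBoolean` with `compareAndSet` helps "finish bundles exactly once": even if both the normal close path and the dispose path try to flush the current bundle, only one caller wins the flip from true to false.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the exactly-once guard (not the real DoFnOperator):
// compareAndSet ensures the bundle-finishing work runs at most once, even if
// close() and dispose() race or the method is called twice in sequence.
class BundleGuard {
    private final AtomicBoolean bundleStarted = new AtomicBoolean(false);
    int finishCount = 0; // stand-in for "the bundle was flushed once"

    void startBundle() {
        bundleStarted.set(true);
    }

    void finishBundleOnce() {
        // Atomically flips true -> false; any later or concurrent caller
        // observes the flag already cleared and skips the flush.
        if (bundleStarted.compareAndSet(true, false)) {
            finishCount++; // the real code would flush pending elements here
        }
    }
}
```

With a plain `boolean`, a second caller could re-enter the flush after the remote bundle was already closed, which is exactly the "Already closed" failure in the stack trace above.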
[jira] [Commented] (BEAM-2185) KafkaIO bounded source
[ https://issues.apache.org/jira/browse/BEAM-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767411#comment-16767411 ] Raghu Angadi commented on BEAM-2185: Yes. The whole batch use case should document and/or log a big caveat listing all these concerns. When a user sets `enable.auto.commit = true`, the user is essentially introducing parallel checkpoint-like functionality outside of Apache Beam's control. As with 'commitOffsetsInFinalize()', I think it can help with resuming from a reasonable point on restart, but it does not guarantee exactly-once (in fact, only 'update' guarantees exactly-once in Beam; no restart of a pipeline does). > KafkaIO bounded source > -- > > Key: BEAM-2185 > URL: https://issues.apache.org/jira/browse/BEAM-2185 > Project: Beam > Issue Type: New Feature > Components: io-java-kafka >Reporter: Raghu Angadi >Priority: Major > > KafkaIO could be a useful source for batch applications as well. It could > implement a bounded source. The primary question is how the bounds are > specified. > One option: the source specifies a time period (say 9am-10am), and KafkaIO > fetches appropriate start and end offsets based on the time index in Kafka. This > would suit many batch applications that are launched on a schedule. > Another option is to always read till the end and commit the offsets to > Kafka. Handling failures and multiple runs of a task might be complicated. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-5375) KafkaIO reader should handle runtime exceptions from the Kafka client
[ https://issues.apache.org/jira/browse/BEAM-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi closed BEAM-5375. -- Resolution: Fixed > KafkaIO reader should handle runtime exceptions from the Kafka client > > > Key: BEAM-5375 > URL: https://issues.apache.org/jira/browse/BEAM-5375 > Project: Beam > Issue Type: Bug > Components: io-java-kafka >Affects Versions: 2.7.0 >Reporter: Raghu Angadi >Assignee: Raghu Angadi >Priority: Major > Labels: triaged > Fix For: 2.7.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > KafkaIO reader might stop reading from Kafka without any explicit error > message if KafkaConsumer throws a runtime exception while polling for > messages. One of the Dataflow customers encountered this issue (see [user@ > thread|https://lists.apache.org/thread.html/c0cf8f45f567a0623592e2d8340f5288e3e774b59bca985aec410a81@%3Cuser.beam.apache.org%3E]). > 'consumerPollThread()' in KafkaIO deliberately avoided catching runtime > exceptions. It should handle them; stuff happens at runtime. > It should result in 'IOException' from start()/advance(). The runners will > properly handle reporting and closing the readers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-5375) KafkaIO reader should handle runtime exceptions from the Kafka client
[ https://issues.apache.org/jira/browse/BEAM-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated BEAM-5375: --- Fix Version/s: 2.7.0 > KafkaIO reader should handle runtime exceptions from the Kafka client > > > Key: BEAM-5375 > URL: https://issues.apache.org/jira/browse/BEAM-5375 > Project: Beam > Issue Type: Bug > Components: io-java-kafka >Affects Versions: 2.7.0 >Reporter: Raghu Angadi >Assignee: Raghu Angadi >Priority: Major > Labels: triaged > Fix For: 2.7.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > KafkaIO reader might stop reading from Kafka without any explicit error > message if KafkaConsumer throws a runtime exception while polling for > messages. One of the Dataflow customers encountered this issue (see [user@ > thread|https://lists.apache.org/thread.html/c0cf8f45f567a0623592e2d8340f5288e3e774b59bca985aec410a81@%3Cuser.beam.apache.org%3E]). > 'consumerPollThread()' in KafkaIO deliberately avoided catching runtime > exceptions. It should handle them; stuff happens at runtime. > It should result in 'IOException' from start()/advance(). The runners will > properly handle reporting and closing the readers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
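The fix the ticket describes can be sketched with a self-contained pattern (class and method names are hypothetical, not the actual KafkaIO internals): the background poll thread catches any RuntimeException instead of dying silently, stores it, and the reader rethrows it as an IOException from start()/advance() so the runner can report the failure and close the reader.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch only: a poll loop that never lets a RuntimeException
// silently kill the consumer thread, and a reader that surfaces the stored
// failure as IOException, as the ticket asks of start()/advance().
class PollingReader {
    private final AtomicReference<RuntimeException> pollError = new AtomicReference<>();

    // Runs on the consumer poll thread; captures the first failure.
    void pollOnce(Runnable poll) {
        try {
            poll.run();
        } catch (RuntimeException e) {
            pollError.compareAndSet(null, e); // keep the first error only
        }
    }

    // Runs on the reader thread; rethrows any captured poll failure.
    boolean advance() throws IOException {
        RuntimeException e = pollError.get();
        if (e != null) {
            throw new IOException("Kafka consumer poll failed", e);
        }
        return true; // in real code: return whether a record was fetched
    }
}
```

The AtomicReference hand-off matters because the poll loop and the reader run on different threads; without it, the poll thread's exception would be lost and the reader would appear to hang with no error message, which is exactly the symptom reported.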
[jira] [Comment Edited] (BEAM-2466) Add Kafka Streams runner
[ https://issues.apache.org/jira/browse/BEAM-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766129#comment-16766129 ] Alexey Romanenko edited comment on BEAM-2466 at 2/13/19 5:31 PM: - [~teabot] 1) I'm not very familiar with the KStreams API. Does it require a Kafka topic as input in every case? Is the KStreams source heavily tied to Kafka topics? 2) "Only streaming support" - well, this goes against the initial Beam idea of easily switching between batch and streaming pipelines (since it won't be possible to use the KafkaRunner for other pipelines written in the traditional Beam way). So it's quite a serious limitation, imho, but perhaps there could be some workarounds for that. was (Author: aromanenko): [~teabot] 1) I'm not very familiar with KStreams API. Does it require to have Kafka topic as input in any case? Does KStreams source heavily linked with Kafka topics? 2) "Only streaming support" - well, this is kind of against initial Beam idea of easily switching between batch and streaming pipelines (since it won't be possible to use KafkaRunner for other pipelines, written in traditional Beam way). So, it's quite serious limitation, imho, but, perhaps, there could be some workarounds for that. > Add Kafka Streams runner > > > Key: BEAM-2466 > URL: https://issues.apache.org/jira/browse/BEAM-2466 > Project: Beam > Issue Type: Wish > Components: runner-ideas >Reporter: Lorand Peter Kasler >Assignee: Kai Jiang >Priority: Minor > Labels: triaged > > Kafka Streams (https://kafka.apache.org/documentation/streams) has more and > more features that could make it a viable candidate for a streaming runner. > It uses a Dataflow-like model -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6664) Temporarily convert dataflowKmsKey flag to experimental flag for Dataflow
Udi Meiri created BEAM-6664: --- Summary: Temporarily convert dataflowKmsKey flag to experimental flag for Dataflow Key: BEAM-6664 URL: https://issues.apache.org/jira/browse/BEAM-6664 Project: Beam Issue Type: Improvement Components: io-java-gcp, sdk-py-core Reporter: Udi Meiri Assignee: Udi Meiri -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-2466) Add Kafka Streams runner
[ https://issues.apache.org/jira/browse/BEAM-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767394#comment-16767394 ] Kai Jiang commented on BEAM-2466: - [~teabot] IMHO, there should not be a 'batch processing' concept for KStreams. I think we need to limit Beam to streaming mode only for the KafkaStreamsRunner. [~aromanenko] I think it requires a Kafka topic as streaming input. Internally, the KStreams source uses Kafka topics as input. PoC branch: https://github.com/vectorijk/beam/tree/kafka-stream Welcome any ideas! > Add Kafka Streams runner > > > Key: BEAM-2466 > URL: https://issues.apache.org/jira/browse/BEAM-2466 > Project: Beam > Issue Type: Wish > Components: runner-ideas >Reporter: Lorand Peter Kasler >Assignee: Kai Jiang >Priority: Minor > Labels: triaged > > Kafka Streams (https://kafka.apache.org/documentation/streams) has more and > more features that could make it a viable candidate for a streaming runner. > It uses a Dataflow-like model -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6650) FlinkRunner fails to checkpoint elements emitted during finishBundle
[ https://issues.apache.org/jira/browse/BEAM-6650?focusedWorklogId=198205&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198205 ] ASF GitHub Bot logged work on BEAM-6650: Author: ASF GitHub Bot Created on: 13/Feb/19 16:50 Start Date: 13/Feb/19 16:50 Worklog Time Spent: 10m Work Description: mxm commented on issue #7810: [BEAM-6650] Add bundle test with checkpointing for keyed processing URL: https://github.com/apache/beam/pull/7810#issuecomment-463275855 Run Flink ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198205) Time Spent: 1h 10m (was: 1h) > FlinkRunner fails to checkpoint elements emitted during finishBundle > > > Key: BEAM-6650 > URL: https://issues.apache.org/jira/browse/BEAM-6650 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.11.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Elements emitted during the finishBundle call in snapshotState are lost > after the pipeline is restored. This only happens when the operator is keyed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6650) FlinkRunner fails to checkpoint elements emitted during finishBundle
[ https://issues.apache.org/jira/browse/BEAM-6650?focusedWorklogId=198206&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198206 ] ASF GitHub Bot logged work on BEAM-6650: Author: ASF GitHub Bot Created on: 13/Feb/19 16:50 Start Date: 13/Feb/19 16:50 Worklog Time Spent: 10m Work Description: mxm commented on issue #7810: [BEAM-6650] Add bundle test with checkpointing for keyed processing URL: https://github.com/apache/beam/pull/7810#issuecomment-463275894 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198206) Time Spent: 1h 20m (was: 1h 10m) > FlinkRunner fails to checkpoint elements emitted during finishBundle > > > Key: BEAM-6650 > URL: https://issues.apache.org/jira/browse/BEAM-6650 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.11.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Elements emitted during the finishBundle call in snapshotState are lost > after the pipeline is restored. This only happens when the operator is keyed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6650) FlinkRunner fails to checkpoint elements emitted during finishBundle
[ https://issues.apache.org/jira/browse/BEAM-6650?focusedWorklogId=198204&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198204 ] ASF GitHub Bot logged work on BEAM-6650: Author: ASF GitHub Bot Created on: 13/Feb/19 16:50 Start Date: 13/Feb/19 16:50 Worklog Time Spent: 10m Work Description: mxm commented on issue #7810: [BEAM-6650] Add bundle test with checkpointing for keyed processing URL: https://github.com/apache/beam/pull/7810#issuecomment-463275837 Batch execution fails for ValidatesRunner due to memory issues on Jenkins. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198204) Time Spent: 1h (was: 50m) > FlinkRunner fails to checkpoint elements emitted during finishBundle > > > Key: BEAM-6650 > URL: https://issues.apache.org/jira/browse/BEAM-6650 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.11.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Elements emitted during the finishBundle call in snapshotState are lost > after the pipeline is restored. This only happens when the operator is keyed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4461) Create a library of useful transforms that use schemas
[ https://issues.apache.org/jira/browse/BEAM-4461?focusedWorklogId=198196&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198196 ] ASF GitHub Bot logged work on BEAM-4461: Author: ASF GitHub Bot Created on: 13/Feb/19 16:26 Start Date: 13/Feb/19 16:26 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #7353: [BEAM-4461] Support inner and outer style joins in CoGroup. URL: https://github.com/apache/beam/pull/7353#issuecomment-463265969 @kanterov do you have any time to help review this PR? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198196) Time Spent: 19h 20m (was: 19h 10m) > Create a library of useful transforms that use schemas > -- > > Key: BEAM-4461 > URL: https://issues.apache.org/jira/browse/BEAM-4461 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Reuven Lax >Assignee: Reuven Lax >Priority: Major > Labels: triaged > Time Spent: 19h 20m > Remaining Estimate: 0h > > e.g. JoinBy(fields). Project, Filter, etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4076) Schema followups
[ https://issues.apache.org/jira/browse/BEAM-4076?focusedWorklogId=198194&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198194 ] ASF GitHub Bot logged work on BEAM-4076: Author: ASF GitHub Bot Created on: 13/Feb/19 16:25 Start Date: 13/Feb/19 16:25 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #7635: [BEAM-4076] Generalize schema inputs to ParDo URL: https://github.com/apache/beam/pull/7635#issuecomment-463265598 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198194) Time Spent: 18h 40m (was: 18.5h) > Schema followups > > > Key: BEAM-4076 > URL: https://issues.apache.org/jira/browse/BEAM-4076 > Project: Beam > Issue Type: Improvement > Components: beam-model, dsl-sql, sdk-java-core >Reporter: Kenneth Knowles >Priority: Major > Time Spent: 18h 40m > Remaining Estimate: 0h > > This umbrella bug contains subtasks with followups for Beam schemas, which > were moved from SQL to the core Java SDK and made to be type-name-based > rather than coder based. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-4076) Schema followups
[ https://issues.apache.org/jira/browse/BEAM-4076?focusedWorklogId=198195&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-198195 ] ASF GitHub Bot logged work on BEAM-4076: Author: ASF GitHub Bot Created on: 13/Feb/19 16:25 Start Date: 13/Feb/19 16:25 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #7635: [BEAM-4076] Generalize schema inputs to ParDo URL: https://github.com/apache/beam/pull/7635#issuecomment-463265698 @kennknowles any comments on this PR? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 198195) Time Spent: 18h 50m (was: 18h 40m) > Schema followups > > > Key: BEAM-4076 > URL: https://issues.apache.org/jira/browse/BEAM-4076 > Project: Beam > Issue Type: Improvement > Components: beam-model, dsl-sql, sdk-java-core >Reporter: Kenneth Knowles >Priority: Major > Time Spent: 18h 50m > Remaining Estimate: 0h > > This umbrella bug contains subtasks with followups for Beam schemas, which > were moved from SQL to the core Java SDK and made to be type-name-based > rather than coder based. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6663) SerializablePipelineOptions should override toString()
Etienne Chauchot created BEAM-6663: -- Summary: SerializablePipelineOptions should override toString() Key: BEAM-6663 URL: https://issues.apache.org/jira/browse/BEAM-6663 Project: Beam Issue Type: Improvement Components: runner-core Reporter: Etienne Chauchot Assignee: Etienne Chauchot Many runners store both Options and SerializableOptions in their context, whereas SerializableOptions already stores both the Options and the JSON-serialized Options. The Options can be obtained via SerializableOptions.get(), but the JSON cannot be obtained. I propose to use only SerializableOptions inside the runners (as they all have serialization issues) and simply override toString() to return the JSON version. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
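The proposal above can be sketched with a simplified, self-contained stand-in (not the actual Beam SerializablePipelineOptions class; the field and constructor are illustrative): a serializable wrapper that keeps the JSON form of the options and exposes it via toString(), so runners that log the wrapper see the full JSON instead of the default Object.toString() output.

```java
// Simplified sketch of the BEAM-6663 idea (hypothetical class, not the
// real org.apache.beam.runners.core.construction.SerializablePipelineOptions):
// keep the JSON form alongside the deserialized options and override
// toString() so the JSON becomes reachable.
class SerializableOptions implements java.io.Serializable {
    private final String optionsJson;  // JSON-serialized options, survives serialization
    private transient Object options;  // deserialized options instance

    SerializableOptions(Object options, String optionsJson) {
        this.options = options;
        this.optionsJson = optionsJson;
    }

    Object get() {
        return options; // what runners already use today
    }

    @Override
    public String toString() {
        return optionsJson; // the JSON that was previously unobtainable
    }
}
```

The design choice is that toString() is the one hook every logging and debugging path already calls, so overriding it makes the JSON visible without adding a new accessor to the runner-facing API.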