[jira] [Work logged] (BEAM-2857) Create FileIO in Python
[ https://issues.apache.org/jira/browse/BEAM-2857?focusedWorklogId=254222&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254222 ] ASF GitHub Bot logged work on BEAM-2857: Author: ASF GitHub Bot Created on: 05/Jun/19 06:40 Start Date: 05/Jun/19 06:40 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #8394: [BEAM-2857] Implementing WriteToFiles transform for fileio (Python) URL: https://github.com/apache/beam/pull/8394 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254222) Time Spent: 7h 20m (was: 7h 10m) > Create FileIO in Python > --- > > Key: BEAM-2857 > URL: https://issues.apache.org/jira/browse/BEAM-2857 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Eugene Kirpichov >Assignee: Pablo Estrada >Priority: Major > Labels: gsoc, gsoc2019, mentor > Time Spent: 7h 20m > Remaining Estimate: 0h > > Beam Java has a FileIO with operations: match()/matchAll(), readMatches(), > which together cover the majority of needs for general-purpose file > ingestion. Beam Python should have something similar. > An early design document for this: https://s.apache.org/fileio-beam-python -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6620) Do not relocate guava
[ https://issues.apache.org/jira/browse/BEAM-6620?focusedWorklogId=254207&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254207 ] ASF GitHub Bot logged work on BEAM-6620: Author: ASF GitHub Bot Created on: 05/Jun/19 05:56 Start Date: 05/Jun/19 05:56 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #8762: [BEAM-6620, BEAM-3606] Stop shading by default. URL: https://github.com/apache/beam/pull/8762#issuecomment-498948924 Run Dataflow ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254207) Time Spent: 1h 40m (was: 1.5h) > Do not relocate guava > - > > Key: BEAM-6620 > URL: https://issues.apache.org/jira/browse/BEAM-6620 > Project: Beam > Issue Type: Sub-task > Components: build-system >Affects Versions: 2.11.0 >Reporter: Ismaël Mejía >Assignee: Luke Cwik >Priority: Critical > Time Spent: 1h 40m > Remaining Estimate: 0h > > Once guava use is vendored we have to remove the automatic relocation of > guava so modules who use guava in their API (e.g. Cassandra or Kinesis) can > work correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6620) Do not relocate guava
[ https://issues.apache.org/jira/browse/BEAM-6620?focusedWorklogId=254209&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254209 ] ASF GitHub Bot logged work on BEAM-6620: Author: ASF GitHub Bot Created on: 05/Jun/19 05:56 Start Date: 05/Jun/19 05:56 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #8762: [BEAM-6620, BEAM-3606] Stop shading by default. URL: https://github.com/apache/beam/pull/8762#issuecomment-498949048 Run Spark ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254209) Time Spent: 2h (was: 1h 50m) > Do not relocate guava > - > > Key: BEAM-6620 > URL: https://issues.apache.org/jira/browse/BEAM-6620 > Project: Beam > Issue Type: Sub-task > Components: build-system >Affects Versions: 2.11.0 >Reporter: Ismaël Mejía >Assignee: Luke Cwik >Priority: Critical > Time Spent: 2h > Remaining Estimate: 0h > > Once guava use is vendored we have to remove the automatic relocation of > guava so modules who use guava in their API (e.g. Cassandra or Kinesis) can > work correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6620) Do not relocate guava
[ https://issues.apache.org/jira/browse/BEAM-6620?focusedWorklogId=254208&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254208 ] ASF GitHub Bot logged work on BEAM-6620: Author: ASF GitHub Bot Created on: 05/Jun/19 05:56 Start Date: 05/Jun/19 05:56 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #8762: [BEAM-6620, BEAM-3606] Stop shading by default. URL: https://github.com/apache/beam/pull/8762#issuecomment-498948565 Run Dataflow PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254208) Time Spent: 1h 50m (was: 1h 40m) > Do not relocate guava > - > > Key: BEAM-6620 > URL: https://issues.apache.org/jira/browse/BEAM-6620 > Project: Beam > Issue Type: Sub-task > Components: build-system >Affects Versions: 2.11.0 >Reporter: Ismaël Mejía >Assignee: Luke Cwik >Priority: Critical > Time Spent: 1h 50m > Remaining Estimate: 0h > > Once guava use is vendored we have to remove the automatic relocation of > guava so modules who use guava in their API (e.g. Cassandra or Kinesis) can > work correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6620) Do not relocate guava
[ https://issues.apache.org/jira/browse/BEAM-6620?focusedWorklogId=254206&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254206 ] ASF GitHub Bot logged work on BEAM-6620: Author: ASF GitHub Bot Created on: 05/Jun/19 05:55 Start Date: 05/Jun/19 05:55 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #8762: [BEAM-6620, BEAM-3606] Stop shading by default. URL: https://github.com/apache/beam/pull/8762#issuecomment-498948815 Run Flink ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254206) Time Spent: 1.5h (was: 1h 20m) > Do not relocate guava > - > > Key: BEAM-6620 > URL: https://issues.apache.org/jira/browse/BEAM-6620 > Project: Beam > Issue Type: Sub-task > Components: build-system >Affects Versions: 2.11.0 >Reporter: Ismaël Mejía >Assignee: Luke Cwik >Priority: Critical > Time Spent: 1.5h > Remaining Estimate: 0h > > Once guava use is vendored we have to remove the automatic relocation of > guava so modules who use guava in their API (e.g. Cassandra or Kinesis) can > work correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6620) Do not relocate guava
[ https://issues.apache.org/jira/browse/BEAM-6620?focusedWorklogId=254203&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254203 ] ASF GitHub Bot logged work on BEAM-6620: Author: ASF GitHub Bot Created on: 05/Jun/19 05:55 Start Date: 05/Jun/19 05:55 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #8762: [BEAM-6620, BEAM-3606] Stop shading by default. URL: https://github.com/apache/beam/pull/8762#issuecomment-498948610 Run Spark PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254203) Time Spent: 1h 10m (was: 1h) > Do not relocate guava > - > > Key: BEAM-6620 > URL: https://issues.apache.org/jira/browse/BEAM-6620 > Project: Beam > Issue Type: Sub-task > Components: build-system >Affects Versions: 2.11.0 >Reporter: Ismaël Mejía >Assignee: Luke Cwik >Priority: Critical > Time Spent: 1h 10m > Remaining Estimate: 0h > > Once guava use is vendored we have to remove the automatic relocation of > guava so modules who use guava in their API (e.g. Cassandra or Kinesis) can > work correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6620) Do not relocate guava
[ https://issues.apache.org/jira/browse/BEAM-6620?focusedWorklogId=254205&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254205 ] ASF GitHub Bot logged work on BEAM-6620: Author: ASF GitHub Bot Created on: 05/Jun/19 05:55 Start Date: 05/Jun/19 05:55 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #8762: [BEAM-6620, BEAM-3606] Stop shading by default. URL: https://github.com/apache/beam/pull/8762#issuecomment-498948696 Run Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254205) Time Spent: 1h 20m (was: 1h 10m) > Do not relocate guava > - > > Key: BEAM-6620 > URL: https://issues.apache.org/jira/browse/BEAM-6620 > Project: Beam > Issue Type: Sub-task > Components: build-system >Affects Versions: 2.11.0 >Reporter: Ismaël Mejía >Assignee: Luke Cwik >Priority: Critical > Time Spent: 1h 20m > Remaining Estimate: 0h > > Once guava use is vendored we have to remove the automatic relocation of > guava so modules who use guava in their API (e.g. Cassandra or Kinesis) can > work correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6620) Do not relocate guava
[ https://issues.apache.org/jira/browse/BEAM-6620?focusedWorklogId=254202&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254202 ] ASF GitHub Bot logged work on BEAM-6620: Author: ASF GitHub Bot Created on: 05/Jun/19 05:55 Start Date: 05/Jun/19 05:55 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #8762: [BEAM-6620, BEAM-3606] Stop shading by default. URL: https://github.com/apache/beam/pull/8762#issuecomment-498948565 Run Dataflow PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254202) Time Spent: 1h (was: 50m) > Do not relocate guava > - > > Key: BEAM-6620 > URL: https://issues.apache.org/jira/browse/BEAM-6620 > Project: Beam > Issue Type: Sub-task > Components: build-system >Affects Versions: 2.11.0 >Reporter: Ismaël Mejía >Assignee: Luke Cwik >Priority: Critical > Time Spent: 1h > Remaining Estimate: 0h > > Once guava use is vendored we have to remove the automatic relocation of > guava so modules who use guava in their API (e.g. Cassandra or Kinesis) can > work correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6620) Do not relocate guava
[ https://issues.apache.org/jira/browse/BEAM-6620?focusedWorklogId=254201&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254201 ] ASF GitHub Bot logged work on BEAM-6620: Author: ASF GitHub Bot Created on: 05/Jun/19 05:55 Start Date: 05/Jun/19 05:55 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #8762: [BEAM-6620, BEAM-3606] Stop shading by default. URL: https://github.com/apache/beam/pull/8762#issuecomment-498948696 Run Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254201) Time Spent: 50m (was: 40m) > Do not relocate guava > - > > Key: BEAM-6620 > URL: https://issues.apache.org/jira/browse/BEAM-6620 > Project: Beam > Issue Type: Sub-task > Components: build-system >Affects Versions: 2.11.0 >Reporter: Ismaël Mejía >Assignee: Luke Cwik >Priority: Critical > Time Spent: 50m > Remaining Estimate: 0h > > Once guava use is vendored we have to remove the automatic relocation of > guava so modules who use guava in their API (e.g. Cassandra or Kinesis) can > work correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6620) Do not relocate guava
[ https://issues.apache.org/jira/browse/BEAM-6620?focusedWorklogId=254200&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254200 ] ASF GitHub Bot logged work on BEAM-6620: Author: ASF GitHub Bot Created on: 05/Jun/19 05:54 Start Date: 05/Jun/19 05:54 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #8762: [BEAM-6620, BEAM-3606] Stop shading by default. URL: https://github.com/apache/beam/pull/8762#issuecomment-498948610 Run Spark PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254200) Time Spent: 40m (was: 0.5h) > Do not relocate guava > - > > Key: BEAM-6620 > URL: https://issues.apache.org/jira/browse/BEAM-6620 > Project: Beam > Issue Type: Sub-task > Components: build-system >Affects Versions: 2.11.0 >Reporter: Ismaël Mejía >Assignee: Luke Cwik >Priority: Critical > Time Spent: 40m > Remaining Estimate: 0h > > Once guava use is vendored we have to remove the automatic relocation of > guava so modules who use guava in their API (e.g. Cassandra or Kinesis) can > work correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6620) Do not relocate guava
[ https://issues.apache.org/jira/browse/BEAM-6620?focusedWorklogId=254198&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254198 ] ASF GitHub Bot logged work on BEAM-6620: Author: ASF GitHub Bot Created on: 05/Jun/19 05:54 Start Date: 05/Jun/19 05:54 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #8762: [BEAM-6620, BEAM-3606] Stop shading by default. URL: https://github.com/apache/beam/pull/8762#issuecomment-498948503 Run Java PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254198) Time Spent: 20m (was: 10m) > Do not relocate guava > - > > Key: BEAM-6620 > URL: https://issues.apache.org/jira/browse/BEAM-6620 > Project: Beam > Issue Type: Sub-task > Components: build-system >Affects Versions: 2.11.0 >Reporter: Ismaël Mejía >Assignee: Luke Cwik >Priority: Critical > Time Spent: 20m > Remaining Estimate: 0h > > Once guava use is vendored we have to remove the automatic relocation of > guava so modules who use guava in their API (e.g. Cassandra or Kinesis) can > work correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6620) Do not relocate guava
[ https://issues.apache.org/jira/browse/BEAM-6620?focusedWorklogId=254199&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254199 ] ASF GitHub Bot logged work on BEAM-6620: Author: ASF GitHub Bot Created on: 05/Jun/19 05:54 Start Date: 05/Jun/19 05:54 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #8762: [BEAM-6620, BEAM-3606] Stop shading by default. URL: https://github.com/apache/beam/pull/8762#issuecomment-498948565 Run Data flow PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254199) Time Spent: 0.5h (was: 20m) > Do not relocate guava > - > > Key: BEAM-6620 > URL: https://issues.apache.org/jira/browse/BEAM-6620 > Project: Beam > Issue Type: Sub-task > Components: build-system >Affects Versions: 2.11.0 >Reporter: Ismaël Mejía >Assignee: Luke Cwik >Priority: Critical > Time Spent: 0.5h > Remaining Estimate: 0h > > Once guava use is vendored we have to remove the automatic relocation of > guava so modules who use guava in their API (e.g. Cassandra or Kinesis) can > work correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7463) Bigquery IO ITs are flaky: incorrect checksum
[ https://issues.apache.org/jira/browse/BEAM-7463?focusedWorklogId=254191&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254191 ] ASF GitHub Bot logged work on BEAM-7463: Author: ASF GitHub Bot Created on: 05/Jun/19 05:00 Start Date: 05/Jun/19 05:00 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #8751: [BEAM-7463] parallelize BQ IT tests URL: https://github.com/apache/beam/pull/8751#issuecomment-498938989 Hi @Juta, could you please comment which variables are being shared by test scenarios? This change should improve test parallelism in test modules that have multiple test cases, however I believe it does not remove side-effects in existing test scenarios: the tests you modify use test-case level fixtures (`setUp()`). For every test method a new instance of TestCase is created, and setUp will be called on this instance. From unittest docs: https://docs.python.org/3/library/unittest.html#class-and-module-fixtures > A new TestCase instance is created as a unique test fixture used to execute each individual test method. This change would indeed take effect if we used module-level or class-level fixtures (e.g. `setUpClass()`): with `_multiprocesses_can_split_` module-level fixtures would be called multiple times. However I don't see usages of those types of fixtures in integration tests, and we should not use them since they can create side-effects. So it is not clear to me which variables are shared in the tests, that will not be shared with this change. If you think I am missing something, could you post a simplified code-snippet that demonstrates a potential race/side effect in existing tests? Thank you. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254191) Time Spent: 40m (was: 0.5h) > Bigquery IO ITs are flaky: incorrect checksum > - > > Key: BEAM-7463 > URL: https://issues.apache.org/jira/browse/BEAM-7463 > Project: Beam > Issue Type: Bug > Components: io-python-gcp >Reporter: Valentyn Tymofieiev >Assignee: Juta Staes >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > {noformat} > 15:03:38 FAIL: test_big_query_new_types > (apache_beam.io.gcp.big_query_query_to_table_it_test.BigQueryQueryToTableIT) > 15:03:38 > -- > 15:03:38 Traceback (most recent call last): > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py", > line 211, in test_big_query_new_types > 15:03:38 big_query_query_to_table_pipeline.run_bq_pipeline(options) > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/io/gcp/big_query_query_to_table_pipeline.py", > line 82, in run_bq_pipeline > 15:03:38 result = p.run() > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/testing/test_pipeline.py", > line 107, in run > 15:03:38 else test_runner_api)) > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py", > line 406, in run > 15:03:38 self._options).run(False) > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/pipeline.py", > line 419, in run > 15:03:38 return self.runner.run_pipeline(self, self._options) > 15:03:38 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python3_Verify/src/sdks/python/apache_beam/runners/direct/test_direct_runner.py", > line 51, in run_pipeline > 15:03:38 hc_assert_that(self.result, pickler.loads(on_success_matcher)) > 15:03:38 AssertionError: > 15:03:38 Expected: (Test pipeline expected terminated in state: DONE and > Expected checksum is 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214) > 15:03:38 but: Expected checksum is > 24de460c4d344a4b77ccc4cc1acb7b7ffc11a214 Actual checksum is > da39a3ee5e6b4b0d3255bfef95601890afd80709 > {noformat} > [~Juta] could this be caused by changes to Bigquery matcher? > https://github.com/apache/beam/pull/8621/files#diff-f1ec7e3a3e7e2e5082ddb7043954c108R134 > > cc: [~pabloem] [~chamikara] [~apilloud] > A recent postcommit run has BQ failures in other tests as well: > https://builds.apach
[jira] [Commented] (BEAM-7493) beam-sdks-testing-nexmark produces corrupt pom.xml
[ https://issues.apache.org/jira/browse/BEAM-7493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856347#comment-16856347 ] Kenneth Knowles commented on BEAM-7493: --- Do you have time to look at this? > beam-sdks-testing-nexmark produces corrupt pom.xml > -- > > Key: BEAM-7493 > URL: https://issues.apache.org/jira/browse/BEAM-7493 > Project: Beam > Issue Type: Bug > Components: examples-nexmark >Affects Versions: 2.13.0 >Reporter: Kenneth Knowles >Assignee: Michael Luckey >Priority: Blocker > > Steps to reproduce: > {code} > ./gradlew -Prelease -Ppublishing > :sdks:java:testing:nexmark:publishMavenJavaPublicationTotestPublicationLocalRepository > {code} > and if you inspect the pom you will see > {code} > > org.apache.beam > beam-sdks-java-testing-test-utils > ... > > It appears to be constructed from the directories, but this (and also > load-tests) in the {{testing}} directory have their archive base name > overridden: > https://github.com/apache/beam/blob/master/sdks/java/testing/test-utils/build.gradle#L20 > So after publication, this will result in a dependency not found error. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7493) beam-sdks-testing-nexmark produces corrupt pom.xml
Kenneth Knowles created BEAM-7493: - Summary: beam-sdks-testing-nexmark produces corrupt pom.xml Key: BEAM-7493 URL: https://issues.apache.org/jira/browse/BEAM-7493 Project: Beam Issue Type: Bug Components: examples-nexmark Affects Versions: 2.13.0 Reporter: Kenneth Knowles Assignee: Michael Luckey Steps to reproduce: {code} ./gradlew -Prelease -Ppublishing :sdks:java:testing:nexmark:publishMavenJavaPublicationTotestPublicationLocalRepository {code} and if you inspect the pom you will see {code} org.apache.beam beam-sdks-java-testing-test-utils ... It appears to be constructed from the directories, but this (and also load-tests) in the {{testing}} directory have their archive base name overridden: https://github.com/apache/beam/blob/master/sdks/java/testing/test-utils/build.gradle#L20 So after publication, this will result in a dependency not found error. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7493) beam-sdks-testing-nexmark produces corrupt pom.xml
[ https://issues.apache.org/jira/browse/BEAM-7493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-7493: -- Description: Steps to reproduce: {code} ./gradlew -Prelease -Ppublishing :sdks:java:testing:nexmark:publishMavenJavaPublicationTotestPublicationLocalRepository {code} and if you inspect the pom you will see {code} org.apache.beam beam-sdks-java-testing-test-utils ... {code} It appears to be constructed from the directories, but this (and also load-tests) in the {{testing}} directory have their archive base name overridden: https://github.com/apache/beam/blob/master/sdks/java/testing/test-utils/build.gradle#L20 So after publication, this will result in a dependency not found error. was: Steps to reproduce: {code} ./gradlew -Prelease -Ppublishing :sdks:java:testing:nexmark:publishMavenJavaPublicationTotestPublicationLocalRepository {code} and if you inspect the pom you will see {code} org.apache.beam beam-sdks-java-testing-test-utils ... It appears to be constructed from the directories, but this (and also load-tests) in the {{testing}} directory have their archive base name overridden: https://github.com/apache/beam/blob/master/sdks/java/testing/test-utils/build.gradle#L20 So after publication, this will result in a dependency not found error. > beam-sdks-testing-nexmark produces corrupt pom.xml > -- > > Key: BEAM-7493 > URL: https://issues.apache.org/jira/browse/BEAM-7493 > Project: Beam > Issue Type: Bug > Components: examples-nexmark >Affects Versions: 2.13.0 >Reporter: Kenneth Knowles >Assignee: Michael Luckey >Priority: Blocker > > Steps to reproduce: > {code} > ./gradlew -Prelease -Ppublishing > :sdks:java:testing:nexmark:publishMavenJavaPublicationTotestPublicationLocalRepository > {code} > and if you inspect the pom you will see > {code} > > org.apache.beam > beam-sdks-java-testing-test-utils > ... > {code} > It appears to be constructed from the directories, but this (and also > load-tests) in the {{testing}} directory have their archive base name > overridden: > https://github.com/apache/beam/blob/master/sdks/java/testing/test-utils/build.gradle#L20 > So after publication, this will result in a dependency not found error. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-529) Check immutability violations in DirectPipelineRunner
[ https://issues.apache.org/jira/browse/BEAM-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856338#comment-16856338 ] Innocent commented on BEAM-529: --- [~myffi...@gmail.com], I am trying to get as much as context on this issue as I can. I am curious about what your proposal was. I am sorry I do not always follow all discussions. > Check immutability violations in DirectPipelineRunner > - > > Key: BEAM-529 > URL: https://issues.apache.org/jira/browse/BEAM-529 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Priority: Minor > Labels: newbie, starter > > Users are going to mutate inputs and outputs of DoFn inappropriately. We > should help their tests fail to catch such mistakes. (Similar to the > DirectPipelineRunner in Java SDK) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7490) TfIdfTest should not be marked ValidatesRunner
[ https://issues.apache.org/jira/browse/BEAM-7490?focusedWorklogId=254127&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254127 ] ASF GitHub Bot logged work on BEAM-7490: Author: ASF GitHub Bot Created on: 05/Jun/19 02:14 Start Date: 05/Jun/19 02:14 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #8761: [BEAM-7490] TfIdfTest should be marked ValidatesRunner URL: https://github.com/apache/beam/pull/8761#issuecomment-498912060 Oh, yes, good catch. The commit also has the wrong description. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254127) Time Spent: 0.5h (was: 20m) > TfIdfTest should not be marked ValidatesRunner > -- > > Key: BEAM-7490 > URL: https://issues.apache.org/jira/browse/BEAM-7490 > Project: Beam > Issue Type: Bug > Components: examples-java >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-529) Check immutability violations in DirectPipelineRunner
[ https://issues.apache.org/jira/browse/BEAM-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856227#comment-16856227 ] Yifan Mai commented on BEAM-529: No, not planning to work on this in the near future. As discussed elsewhere, I am not confident that my proposal will actually work. > Check immutability violations in DirectPipelineRunner > - > > Key: BEAM-529 > URL: https://issues.apache.org/jira/browse/BEAM-529 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Priority: Minor > Labels: newbie, starter > > Users are going to mutate inputs and outputs of DoFn inappropriately. We > should help their tests fail to catch such mistakes. (Similar to the > DirectPipelineRunner in Java SDK) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7492) Add Spark runner to Go SDK
Kyle Weaver created BEAM-7492: - Summary: Add Spark runner to Go SDK Key: BEAM-7492 URL: https://issues.apache.org/jira/browse/BEAM-7492 Project: Beam Issue Type: New Feature Components: sdk-go Reporter: Kyle Weaver Assignee: Kyle Weaver I imagine this should be as easy as copying [1] and s/flink/spark. [1] [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/flink/flink.go] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7491) Add Go SDK post commit test on the Spark runner
Kyle Weaver created BEAM-7491: - Summary: Add Go SDK post commit test on the Spark runner Key: BEAM-7491 URL: https://issues.apache.org/jira/browse/BEAM-7491 Project: Beam Issue Type: Test Components: sdk-go Reporter: Kyle Weaver Assignee: Kyle Weaver We already did this for Flink (BEAM-6959), so should be pretty trivial to add Spark. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-2857) Create FileIO in Python
[ https://issues.apache.org/jira/browse/BEAM-2857?focusedWorklogId=254113&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254113 ] ASF GitHub Bot logged work on BEAM-2857: Author: ASF GitHub Bot Created on: 05/Jun/19 00:50 Start Date: 05/Jun/19 00:50 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8394: [BEAM-2857] Implementing WriteToFiles transform for fileio (Python) URL: https://github.com/apache/beam/pull/8394#issuecomment-498896124 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254113) Time Spent: 7h (was: 6h 50m) > Create FileIO in Python > --- > > Key: BEAM-2857 > URL: https://issues.apache.org/jira/browse/BEAM-2857 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Eugene Kirpichov >Assignee: Pablo Estrada >Priority: Major > Labels: gsoc, gsoc2019, mentor > Time Spent: 7h > Remaining Estimate: 0h > > Beam Java has a FileIO with operations: match()/matchAll(), readMatches(), > which together cover the majority of needs for general-purpose file > ingestion. Beam Python should have something similar. > An early design document for this: https://s.apache.org/fileio-beam-python -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-2857) Create FileIO in Python
[ https://issues.apache.org/jira/browse/BEAM-2857?focusedWorklogId=254114&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254114 ] ASF GitHub Bot logged work on BEAM-2857: Author: ASF GitHub Bot Created on: 05/Jun/19 00:50 Start Date: 05/Jun/19 00:50 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8394: [BEAM-2857] Implementing WriteToFiles transform for fileio (Python) URL: https://github.com/apache/beam/pull/8394#issuecomment-498896188 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254114) Time Spent: 7h 10m (was: 7h) > Create FileIO in Python > --- > > Key: BEAM-2857 > URL: https://issues.apache.org/jira/browse/BEAM-2857 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Eugene Kirpichov >Assignee: Pablo Estrada >Priority: Major > Labels: gsoc, gsoc2019, mentor > Time Spent: 7h 10m > Remaining Estimate: 0h > > Beam Java has a FileIO with operations: match()/matchAll(), readMatches(), > which together cover the majority of needs for general-purpose file > ingestion. Beam Python should have something similar. > An early design document for this: https://s.apache.org/fileio-beam-python -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7351) Failure in Python streaming wordcount test: unexpected messages received on output topic.
[ https://issues.apache.org/jira/browse/BEAM-7351?focusedWorklogId=254112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254112 ] ASF GitHub Bot logged work on BEAM-7351: Author: ASF GitHub Bot Created on: 05/Jun/19 00:49 Start Date: 05/Jun/19 00:49 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8752: [BEAM-7351] increment ack deadline for flaky pubsub test URL: https://github.com/apache/beam/pull/8752#issuecomment-498895993 Thanks @tvalentyn @Juta This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254112) Time Spent: 2h 10m (was: 2h) > Failure in Python streaming wordcount test: unexpected messages received on > output topic. > - > > Key: BEAM-7351 > URL: https://issues.apache.org/jira/browse/BEAM-7351 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > Saw this in a PostCommit test today. Likely a flake, but it's a strange > failure mode and we may need to investigate this. > {noformat} > 13:32:02 > == > 13:32:02 FAIL: test_streaming_wordcount_it > (apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT) > 13:32:02 > -- > 13:32:02 Traceback (most recent call last): > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py", > line 110, in test_streaming_wordcount_it > 13:32:02 self.test_pipeline.get_full_options_as_args(**extra_opts)) > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/streaming_wordcount.py", > line 101, in run > 13:32:02 result = p.run() > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/pipeline.py", > line 419, in run > 13:32:02 return self.runner.run_pipeline(self, self._options) > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", > line 68, in run_pipeline > 13:32:02 hc_assert_that(self.result, pickler.loads(on_success_matcher)) > 13:32:02 AssertionError: > 13:32:02 Expected: (Test pipeline expected terminated in state: RUNNING and > Expected 500 messages.) > 13:32:02 but: Expected 500 messages. Got 508 messages. Diffs (item, > count): > 13:32:02 Expected but not in actual: [] > 13:32:02 Unexpected: [(u'476: 1', 1), (u'416: 1', 1), (u'245: 1', 1), > (u'478: 1', 1), (u'58: 1', 1), (u'364: 1', 1), (u'77: 1', 1), (u'283: 1', 1)] > 13:32:02 > 13:32:02 >> begin captured logging << > > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://169.254.169.254 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/project/project-id > 13:32:02 google.auth.transport.requests: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true > 13:32:02 urllib3.connectionpool: DEBUG: Starting new HTTP connection (1): > metadata.google.internal:80 > 13:32:02 urllib3.connectionpool: DEBUG: http://metadata.google.internal:80 > "GET /computeMetadata/v1/instance/service-accounts/default/?recursive=true > HTTP/1.1" 200 144 > 13:32:02 google.auth.transport.requests: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token > 13:32:02 urllib3.connectionpool: DEBUG: http://metadata.google.internal:80 > "GET > /computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token > HTTP/1.1" 200 176 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://169.254.169.254 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/project/project-id > 13:32:02 google.auth.transport.
[jira] [Work logged] (BEAM-7351) Failure in Python streaming wordcount test: unexpected messages received on output topic.
[ https://issues.apache.org/jira/browse/BEAM-7351?focusedWorklogId=254111&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254111 ] ASF GitHub Bot logged work on BEAM-7351: Author: ASF GitHub Bot Created on: 05/Jun/19 00:49 Start Date: 05/Jun/19 00:49 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #8752: [BEAM-7351] increment ack deadline for flaky pubsub test URL: https://github.com/apache/beam/pull/8752 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254111) Time Spent: 2h (was: 1h 50m) > Failure in Python streaming wordcount test: unexpected messages received on > output topic. > - > > Key: BEAM-7351 > URL: https://issues.apache.org/jira/browse/BEAM-7351 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > Saw this in a PostCommit test today. Likely a flake, but it's a strange > failure mode and we may need to investigate this. > {noformat} > 13:32:02 > == > 13:32:02 FAIL: test_streaming_wordcount_it > (apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT) > 13:32:02 > -- > 13:32:02 Traceback (most recent call last): > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py", > line 110, in test_streaming_wordcount_it > 13:32:02 self.test_pipeline.get_full_options_as_args(**extra_opts)) > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/streaming_wordcount.py", > line 101, in run > 13:32:02 result = p.run() > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/pipeline.py", > line 419, in run > 13:32:02 return self.runner.run_pipeline(self, self._options) > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", > line 68, in run_pipeline > 13:32:02 hc_assert_that(self.result, pickler.loads(on_success_matcher)) > 13:32:02 AssertionError: > 13:32:02 Expected: (Test pipeline expected terminated in state: RUNNING and > Expected 500 messages.) > 13:32:02 but: Expected 500 messages. Got 508 messages. Diffs (item, > count): > 13:32:02 Expected but not in actual: [] > 13:32:02 Unexpected: [(u'476: 1', 1), (u'416: 1', 1), (u'245: 1', 1), > (u'478: 1', 1), (u'58: 1', 1), (u'364: 1', 1), (u'77: 1', 1), (u'283: 1', 1)] > 13:32:02 > 13:32:02 >> begin captured logging << > > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://169.254.169.254 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/project/project-id > 13:32:02 google.auth.transport.requests: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true > 13:32:02 urllib3.connectionpool: DEBUG: Starting new HTTP connection (1): > metadata.google.internal:80 > 13:32:02 urllib3.connectionpool: DEBUG: http://metadata.google.internal:80 > "GET /computeMetadata/v1/instance/service-accounts/default/?recursive=true > HTTP/1.1" 200 144 > 13:32:02 google.auth.transport.requests: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token > 13:32:02 urllib3.connectionpool: DEBUG: http://metadata.google.internal:80 > "GET > /computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token > HTTP/1.1" 200 176 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://169.254.169.254 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/project/project-id > 13:32:02 google.auth.transport.requests: DEBUG: Making request: GET > http
[jira] [Work logged] (BEAM-6620) Do not relocate guava
[ https://issues.apache.org/jira/browse/BEAM-6620?focusedWorklogId=254103&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254103 ] ASF GitHub Bot logged work on BEAM-6620: Author: ASF GitHub Bot Created on: 04/Jun/19 23:52 Start Date: 04/Jun/19 23:52 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #8762: [WIP] [BEAM-6620, BEAM-3606] Stop shading by default. URL: https://github.com/apache/beam/pull/8762 This stops shading guava by default in all modules that don't have a custom shading configuration which is all modules except for: * sdks/java/core * sdks/java/extensions/kryo * sdks/java/extensions/sql * sdks/java/extensions/sql/jdbc * sdks/java/harness * runners/spark/job-server * runners/direct-java * runners/samza/job-server * runners/google-cloud-dataflow-java/worker * runners/google-cloud-dataflow-java/worker/legacy-worker This migrates all modules except for the ones listed above so that: * publishing is based off of the unshaded jar and the unshaded test jar. * all intra module dependences use the compile (default) or testCompile configuration. * all dependencies within the module to use compile/testCompile/testRuntimeClasspath instead of shadow/shadowTest/shadowTestRuntimeClasspath **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![
[jira] [Work logged] (BEAM-7351) Failure in Python streaming wordcount test: unexpected messages received on output topic.
[ https://issues.apache.org/jira/browse/BEAM-7351?focusedWorklogId=254098&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254098 ] ASF GitHub Bot logged work on BEAM-7351: Author: ASF GitHub Bot Created on: 04/Jun/19 23:17 Start Date: 04/Jun/19 23:17 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #8752: [BEAM-7351] increment ack deadline for flaky pubsub test URL: https://github.com/apache/beam/pull/8752#issuecomment-498877913 Kicked off another run of postcommits. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254098) Time Spent: 1h 50m (was: 1h 40m) > Failure in Python streaming wordcount test: unexpected messages received on > output topic. > - > > Key: BEAM-7351 > URL: https://issues.apache.org/jira/browse/BEAM-7351 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > Saw this in a PostCommit test today. Likely a flake, but it's a strange > failure mode and we may need to investigate this. > {noformat} > 13:32:02 > == > 13:32:02 FAIL: test_streaming_wordcount_it > (apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT) > 13:32:02 > -- > 13:32:02 Traceback (most recent call last): > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py", > line 110, in test_streaming_wordcount_it > 13:32:02 self.test_pipeline.get_full_options_as_args(**extra_opts)) > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/streaming_wordcount.py", > line 101, in run > 13:32:02 result = p.run() > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/pipeline.py", > line 419, in run > 13:32:02 return self.runner.run_pipeline(self, self._options) > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", > line 68, in run_pipeline > 13:32:02 hc_assert_that(self.result, pickler.loads(on_success_matcher)) > 13:32:02 AssertionError: > 13:32:02 Expected: (Test pipeline expected terminated in state: RUNNING and > Expected 500 messages.) > 13:32:02 but: Expected 500 messages. Got 508 messages. Diffs (item, > count): > 13:32:02 Expected but not in actual: [] > 13:32:02 Unexpected: [(u'476: 1', 1), (u'416: 1', 1), (u'245: 1', 1), > (u'478: 1', 1), (u'58: 1', 1), (u'364: 1', 1), (u'77: 1', 1), (u'283: 1', 1)] > 13:32:02 > 13:32:02 >> begin captured logging << > > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://169.254.169.254 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/project/project-id > 13:32:02 google.auth.transport.requests: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true > 13:32:02 urllib3.connectionpool: DEBUG: Starting new HTTP connection (1): > metadata.google.internal:80 > 13:32:02 urllib3.connectionpool: DEBUG: http://metadata.google.internal:80 > "GET /computeMetadata/v1/instance/service-accounts/default/?recursive=true > HTTP/1.1" 200 144 > 13:32:02 google.auth.transport.requests: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token > 13:32:02 urllib3.connectionpool: DEBUG: http://metadata.google.internal:80 > "GET > /computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token > HTTP/1.1" 200 176 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://169.254.169.254 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/project/project-id > 13:32:02 go
[jira] [Work logged] (BEAM-7351) Failure in Python streaming wordcount test: unexpected messages received on output topic.
[ https://issues.apache.org/jira/browse/BEAM-7351?focusedWorklogId=254097&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254097 ] ASF GitHub Bot logged work on BEAM-7351: Author: ASF GitHub Bot Created on: 04/Jun/19 23:17 Start Date: 04/Jun/19 23:17 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #8752: [BEAM-7351] increment ack deadline for flaky pubsub test URL: https://github.com/apache/beam/pull/8752#issuecomment-498877806 We might not fix all of the tests in other PRs, but we may have more luck with another re-run. But as I said, the tests affected by this PR passed last time I looked into the logs, so I think it's safe to merge. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254097) Time Spent: 1h 40m (was: 1.5h) > Failure in Python streaming wordcount test: unexpected messages received on > output topic. > - > > Key: BEAM-7351 > URL: https://issues.apache.org/jira/browse/BEAM-7351 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > Saw this in a PostCommit test today. Likely a flake, but it's a strange > failure mode and we may need to investigate this. > {noformat} > 13:32:02 > == > 13:32:02 FAIL: test_streaming_wordcount_it > (apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT) > 13:32:02 > -- > 13:32:02 Traceback (most recent call last): > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py", > line 110, in test_streaming_wordcount_it > 13:32:02 self.test_pipeline.get_full_options_as_args(**extra_opts)) > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/streaming_wordcount.py", > line 101, in run > 13:32:02 result = p.run() > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/pipeline.py", > line 419, in run > 13:32:02 return self.runner.run_pipeline(self, self._options) > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", > line 68, in run_pipeline > 13:32:02 hc_assert_that(self.result, pickler.loads(on_success_matcher)) > 13:32:02 AssertionError: > 13:32:02 Expected: (Test pipeline expected terminated in state: RUNNING and > Expected 500 messages.) > 13:32:02 but: Expected 500 messages. Got 508 messages. Diffs (item, > count): > 13:32:02 Expected but not in actual: [] > 13:32:02 Unexpected: [(u'476: 1', 1), (u'416: 1', 1), (u'245: 1', 1), > (u'478: 1', 1), (u'58: 1', 1), (u'364: 1', 1), (u'77: 1', 1), (u'283: 1', 1)] > 13:32:02 > 13:32:02 >> begin captured logging << > > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://169.254.169.254 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/project/project-id > 13:32:02 google.auth.transport.requests: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true > 13:32:02 urllib3.connectionpool: DEBUG: Starting new HTTP connection (1): > metadata.google.internal:80 > 13:32:02 urllib3.connectionpool: DEBUG: http://metadata.google.internal:80 > "GET /computeMetadata/v1/instance/service-accounts/default/?recursive=true > HTTP/1.1" 200 144 > 13:32:02 google.auth.transport.requests: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token > 13:32:02 urllib3.connectionpool: DEBUG: http://metadata.google.internal:80 > "GET > /computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token > HTTP/1.1" 200 176 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://169
[jira] [Work logged] (BEAM-7351) Failure in Python streaming wordcount test: unexpected messages received on output topic.
[ https://issues.apache.org/jira/browse/BEAM-7351?focusedWorklogId=254096&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254096 ] ASF GitHub Bot logged work on BEAM-7351: Author: ASF GitHub Bot Created on: 04/Jun/19 23:17 Start Date: 04/Jun/19 23:17 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #8752: [BEAM-7351] increment ack deadline for flaky pubsub test URL: https://github.com/apache/beam/pull/8752#issuecomment-498877821 Run Python PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254096) Time Spent: 1.5h (was: 1h 20m) > Failure in Python streaming wordcount test: unexpected messages received on > output topic. > - > > Key: BEAM-7351 > URL: https://issues.apache.org/jira/browse/BEAM-7351 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > Saw this in a PostCommit test today. Likely a flake, but it's a strange > failure mode and we may need to investigate this. > {noformat} > 13:32:02 > == > 13:32:02 FAIL: test_streaming_wordcount_it > (apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT) > 13:32:02 > -- > 13:32:02 Traceback (most recent call last): > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py", > line 110, in test_streaming_wordcount_it > 13:32:02 self.test_pipeline.get_full_options_as_args(**extra_opts)) > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/streaming_wordcount.py", > line 101, in run > 13:32:02 result = p.run() > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/pipeline.py", > line 419, in run > 13:32:02 return self.runner.run_pipeline(self, self._options) > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", > line 68, in run_pipeline > 13:32:02 hc_assert_that(self.result, pickler.loads(on_success_matcher)) > 13:32:02 AssertionError: > 13:32:02 Expected: (Test pipeline expected terminated in state: RUNNING and > Expected 500 messages.) > 13:32:02 but: Expected 500 messages. Got 508 messages. Diffs (item, > count): > 13:32:02 Expected but not in actual: [] > 13:32:02 Unexpected: [(u'476: 1', 1), (u'416: 1', 1), (u'245: 1', 1), > (u'478: 1', 1), (u'58: 1', 1), (u'364: 1', 1), (u'77: 1', 1), (u'283: 1', 1)] > 13:32:02 > 13:32:02 >> begin captured logging << > > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://169.254.169.254 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/project/project-id > 13:32:02 google.auth.transport.requests: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true > 13:32:02 urllib3.connectionpool: DEBUG: Starting new HTTP connection (1): > metadata.google.internal:80 > 13:32:02 urllib3.connectionpool: DEBUG: http://metadata.google.internal:80 > "GET /computeMetadata/v1/instance/service-accounts/default/?recursive=true > HTTP/1.1" 200 144 > 13:32:02 google.auth.transport.requests: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token > 13:32:02 urllib3.connectionpool: DEBUG: http://metadata.google.internal:80 > "GET > /computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token > HTTP/1.1" 200 176 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://169.254.169.254 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/project/project-id > 13:32:02 google.auth.transport.r
[jira] [Commented] (BEAM-7395) Check immutability violations in Dataflow Runner (as an option)
[ https://issues.apache.org/jira/browse/BEAM-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856186#comment-16856186 ] Ahmet Altay commented on BEAM-7395: --- Hi [~evindj] if you are planning to work on Beam's direct runner, that issue is tracked here: https://issues.apache.org/jira/browse/BEAM-529 There is no DirectPipelineRunner because that name was changed on previous refactorings. Code is here: https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/direct The idea here is that, elements passed to the process method should not be mutated by process(). Direct runner can enforce this by serializing elements before process call and checking after the call. Doing this in distributed runners like Dataflow will have a significant cost, but we can potentially do it based on sampling (e.g. check on 1 every N elements.) > Check immutability violations in Dataflow Runner (as an option) > --- > > Key: BEAM-7395 > URL: https://issues.apache.org/jira/browse/BEAM-7395 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Innocent >Priority: Minor > Labels: newbie, starter > > Users are going to mutate inputs and outputs of DoFn inappropriately. We > should help their tests fail to catch such mistakes. (Similar to the > DirectPipelineRunner in Java SDK) This should be offered as an option and > worked based on sampling because of the cost of these checks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-529) Check immutability violations in DirectPipelineRunner
[ https://issues.apache.org/jira/browse/BEAM-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856185#comment-16856185 ] Ahmet Altay commented on BEAM-529: -- [~myffi...@gmail.com] are you planning to work on this? > Check immutability violations in DirectPipelineRunner > - > > Key: BEAM-529 > URL: https://issues.apache.org/jira/browse/BEAM-529 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Priority: Minor > Labels: newbie, starter > > Users are going to mutate inputs and outputs of DoFn inappropriately. We > should help their tests fail to catch such mistakes. (Similar to the > DirectPipelineRunner in Java SDK) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7357) Kinesis IO.write throws LimitExceededException
[ https://issues.apache.org/jira/browse/BEAM-7357?focusedWorklogId=254087&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254087 ] ASF GitHub Bot logged work on BEAM-7357: Author: ASF GitHub Bot Created on: 04/Jun/19 22:57 Start Date: 04/Jun/19 22:57 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #8758: [BEAM-7357] Remove stream existence check URL: https://github.com/apache/beam/pull/8758 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254087) Time Spent: 50m (was: 40m) > Kinesis IO.write throws LimitExceededException > -- > > Key: BEAM-7357 > URL: https://issues.apache.org/jira/browse/BEAM-7357 > Project: Beam > Issue Type: Bug > Components: io-java-kinesis >Affects Versions: 2.11.0 >Reporter: Brachi Packter >Assignee: Alexey Romanenko >Priority: Major > Fix For: 2.14.0 > > Time Spent: 50m > Remaining Estimate: 0h > > I used Kinesis IO to write to kinesis. I get very quickly many exceptions > like: > [shard_map.cc:150] Shard map update for stream "***" failed. Code: > LimitExceededException Message: Rate exceeded for stream *** under account > ***; retrying in .. > Also, I see many exceptions like: > Caused by: java.lang.IllegalArgumentException: Stream ** does not exist at > org.apache.beam.vendor.guava.v20_0.com.google.common.base.Preconditions.checkArgument(Preconditions.java:191) > at > org.apache.beam.sdk.io.kinesis.KinesisIO$Write$KinesisWriterFn.setup(KinesisIO.java:515) > I'm sure this stream exists because I can see some data from my pipeline that > was successfully ingested to it. > > Here is my code: > > > {code:java} > .apply(KinesisIO.write() > .withStreamName("**") > .withPartitioner(new KinesisPartitioner() { > @Override > public String getPartitionKey(byte[] value) { > return UUID.randomUUID().toString() > } > @Override > public String getExplicitHashKey(byte[] value) { > return null; > } > }) >.withAWSClientsProvider("**","***",Regions.US_EAST_1));{code} > > I tried to not use the Kinesis IO. and everything works well, I can't figure > out what went wrong. > I tried using the same API as the library did. > > {code:java} > .apply( > ParDo.of(new DoFn() { > private transient IKinesisProducer inlineProducer; > @Setup > public void setup(){ > KinesisProducerConfiguration config = > KinesisProducerConfiguration.fromProperties(new Properties()); > config.setRegion(Regions.US_EAST_1.getName()); > config.setCredentialsProvider(new AWSStaticCredentialsProvider(new > BasicAWSCredentials("***", "***"))); > inlineProducer = new KinesisProducer(config); > } > @ProcessElement > public void processElement(ProcessContext c) throws Exception { > ByteBuffer data = ByteBuffer.wrap(c.element()); > String partitionKey =UUID.randomUUID().toString(); > ListenableFuture f = > getProducer().addUserRecord("***", partitionKey, data); >Futures.addCallback(f, new UserRecordResultFutureCallback()); > } > class UserRecordResultFutureCallback implements > FutureCallback { > @Override > public void onFailure(Throwable cause) { >throw new RuntimeException("failed produce:"+cause); > } > @Override > public void onSuccess(UserRecordResult result) { > } > } > }) > ); > > {code} > > Any idea what I did wrong? or what the error in the KinesisIO? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-5148) Implement MongoDB IO for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856179#comment-16856179 ] Ahmet Altay commented on BEAM-5148: --- Hi [~GeoloeG_IsT], we would like to complete this implementation in Beam. Would it be fine if [~yichi] works on this? Are you still planning to work on this in Beam? We could base the implementation in the beam extended repo you shared already. > Implement MongoDB IO for Python SDK > --- > > Key: BEAM-5148 > URL: https://issues.apache.org/jira/browse/BEAM-5148 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Affects Versions: 3.0.0 >Reporter: Pascal Gula >Assignee: Pascal Gula >Priority: Major > Fix For: Not applicable > > > Currently Java SDK has MongoDB support but Python SDK does not. With current > portability efforts other runners may soon be able to use Python SDK. Having > mongoDB support will allow these runners to execute large scale jobs using it. > Since we need this IO components @ Peat, we started working on a PyPi package > available at this repository: [https://github.com/PEAT-AI/beam-extended] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7388) Reify PTransform for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-7388?focusedWorklogId=254076&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254076 ] ASF GitHub Bot logged work on BEAM-7388: Author: ASF GitHub Bot Created on: 04/Jun/19 22:37 Start Date: 04/Jun/19 22:37 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8717: [BEAM-7388] Reify PTransform for Python SDK URL: https://github.com/apache/beam/pull/8717#issuecomment-498868730 Thanks Tanay! These are great. Sorru about the delay. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254076) Time Spent: 50m (was: 40m) > Reify PTransform for Python SDK > --- > > Key: BEAM-7388 > URL: https://issues.apache.org/jira/browse/BEAM-7388 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Affects Versions: 2.12.0 >Reporter: Tanay Tummalapalli >Assignee: Tanay Tummalapalli >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > The Java SDK has a nice [Reify > PTransform|https://beam.apache.org/releases/javadoc/2.4.0/org/apache/beam/sdk/transforms/Reify.html] > that can be used to add timestamp, window info to elements of a PCollection. > This makes adding timestamp, window info reusable and easy instead of > defining DoFns every time. > Also, this can make a pipeline look really neat. Eg: [This > PR|https://github.com/apache/beam/pull/8589]. > This can be added to the util module of the Python SDK. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7388) Reify PTransform for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-7388?focusedWorklogId=254077&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254077 ] ASF GitHub Bot logged work on BEAM-7388: Author: ASF GitHub Bot Created on: 04/Jun/19 22:37 Start Date: 04/Jun/19 22:37 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #8717: [BEAM-7388] Reify PTransform for Python SDK URL: https://github.com/apache/beam/pull/8717 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254077) Time Spent: 1h (was: 50m) > Reify PTransform for Python SDK > --- > > Key: BEAM-7388 > URL: https://issues.apache.org/jira/browse/BEAM-7388 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Affects Versions: 2.12.0 >Reporter: Tanay Tummalapalli >Assignee: Tanay Tummalapalli >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > The Java SDK has a nice [Reify > PTransform|https://beam.apache.org/releases/javadoc/2.4.0/org/apache/beam/sdk/transforms/Reify.html] > that can be used to add timestamp, window info to elements of a PCollection. > This makes adding timestamp, window info reusable and easy instead of > defining DoFns every time. > Also, this can make a pipeline look really neat. Eg: [This > PR|https://github.com/apache/beam/pull/8589]. > This can be added to the util module of the Python SDK. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7490) TfIdfTest should not be marked ValidatesRunner
[ https://issues.apache.org/jira/browse/BEAM-7490?focusedWorklogId=254074&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254074 ] ASF GitHub Bot logged work on BEAM-7490: Author: ASF GitHub Bot Created on: 04/Jun/19 22:34 Start Date: 04/Jun/19 22:34 Worklog Time Spent: 10m Work Description: kennknowles commented on issue #8761: [BEAM-7490] TfIdfTest should be marked ValidatesRunner URL: https://github.com/apache/beam/pull/8761#issuecomment-498868212 run java postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254074) Time Spent: 20m (was: 10m) > TfIdfTest should not be marked ValidatesRunner > -- > > Key: BEAM-7490 > URL: https://issues.apache.org/jira/browse/BEAM-7490 > Project: Beam > Issue Type: Bug > Components: examples-java >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7490) TfIdfTest should not be marked ValidatesRunner
[ https://issues.apache.org/jira/browse/BEAM-7490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles updated BEAM-7490: -- Status: Open (was: Triage Needed) > TfIdfTest should not be marked ValidatesRunner > -- > > Key: BEAM-7490 > URL: https://issues.apache.org/jira/browse/BEAM-7490 > Project: Beam > Issue Type: Bug > Components: examples-java >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7490) TfIdfTest should not be marked ValidatesRunner
[ https://issues.apache.org/jira/browse/BEAM-7490?focusedWorklogId=254068&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254068 ] ASF GitHub Bot logged work on BEAM-7490: Author: ASF GitHub Bot Created on: 04/Jun/19 22:29 Start Date: 04/Jun/19 22:29 Worklog Time Spent: 10m Work Description: kennknowles commented on pull request #8761: [BEAM-7490] TfIdfTest should be marked ValidatesRunner URL: https://github.com/apache/beam/pull/8761 This is a test of an example, not a model compliance test. It does not need any annotation at all, in fact. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.a
[jira] [Created] (BEAM-7490) TfIdfTest should not be marked ValidatesRunner
Kenneth Knowles created BEAM-7490: - Summary: TfIdfTest should not be marked ValidatesRunner Key: BEAM-7490 URL: https://issues.apache.org/jira/browse/BEAM-7490 Project: Beam Issue Type: Bug Components: examples-java Reporter: Kenneth Knowles Assignee: Kenneth Knowles -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3606) Improve our relationship, or lack thereof, with Guava
[ https://issues.apache.org/jira/browse/BEAM-3606?focusedWorklogId=254061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254061 ] ASF GitHub Bot logged work on BEAM-3606: Author: ASF GitHub Bot Created on: 04/Jun/19 22:17 Start Date: 04/Jun/19 22:17 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #8760: [BEAM-3606] Remove unused guava_testlib from dependency sets. URL: https://github.com/apache/beam/pull/8760 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254061) Time Spent: 2h (was: 1h 50m) > Improve our relationship, or lack thereof, with Guava > - > > Key: BEAM-3606 > URL: https://issues.apache.org/jira/browse/BEAM-3606 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp, runner-core, sdk-java-core >Reporter: Kenneth Knowles >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > This is an umbrella task for things that might move us off Guava, such as > replicating our own little utilities or moving to Java 8 features. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-7175) Add Spark Portable ValidatesRunner Batch postcommit test
[ https://issues.apache.org/jira/browse/BEAM-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía resolved BEAM-7175. Resolution: Fixed Fix Version/s: 2.14.0 > Add Spark Portable ValidatesRunner Batch postcommit test > > > Key: BEAM-7175 > URL: https://issues.apache.org/jira/browse/BEAM-7175 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: 2.14.0 > > Time Spent: 9h 40m > Remaining Estimate: 0h > > when it's ready -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7175) Add Spark Portable ValidatesRunner Batch postcommit test
[ https://issues.apache.org/jira/browse/BEAM-7175?focusedWorklogId=254055&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254055 ] ASF GitHub Bot logged work on BEAM-7175: Author: ASF GitHub Bot Created on: 04/Jun/19 22:12 Start Date: 04/Jun/19 22:12 Worklog Time Spent: 10m Work Description: iemejia commented on issue #8432: [BEAM-7175] add Spark validatesPortableRunner batch test postcommit URL: https://github.com/apache/beam/pull/8432#issuecomment-498862624 Merged manually to change the commit message and squash. Great job @ibzib ! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254055) Time Spent: 9.5h (was: 9h 20m) > Add Spark Portable ValidatesRunner Batch postcommit test > > > Key: BEAM-7175 > URL: https://issues.apache.org/jira/browse/BEAM-7175 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 9.5h > Remaining Estimate: 0h > > when it's ready -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7175) Add Spark Portable ValidatesRunner Batch postcommit test
[ https://issues.apache.org/jira/browse/BEAM-7175?focusedWorklogId=254056&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254056 ] ASF GitHub Bot logged work on BEAM-7175: Author: ASF GitHub Bot Created on: 04/Jun/19 22:12 Start Date: 04/Jun/19 22:12 Worklog Time Spent: 10m Work Description: iemejia commented on pull request #8432: [BEAM-7175] add Spark validatesPortableRunner batch test postcommit URL: https://github.com/apache/beam/pull/8432 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254056) Time Spent: 9h 40m (was: 9.5h) > Add Spark Portable ValidatesRunner Batch postcommit test > > > Key: BEAM-7175 > URL: https://issues.apache.org/jira/browse/BEAM-7175 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 9h 40m > Remaining Estimate: 0h > > when it's ready -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (BEAM-6620) Do not relocate guava
[ https://issues.apache.org/jira/browse/BEAM-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Cwik reassigned BEAM-6620: --- Assignee: Luke Cwik (was: Kenneth Knowles) > Do not relocate guava > - > > Key: BEAM-6620 > URL: https://issues.apache.org/jira/browse/BEAM-6620 > Project: Beam > Issue Type: Sub-task > Components: build-system >Affects Versions: 2.11.0 >Reporter: Ismaël Mejía >Assignee: Luke Cwik >Priority: Critical > > Once guava use is vendored we have to remove the automatic relocation of > guava so modules who use guava in their API (e.g. Cassandra or Kinesis) can > work correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-6249) Vendored gRPC doesn't seem to work with dataflow
[ https://issues.apache.org/jira/browse/BEAM-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Cwik resolved BEAM-6249. - Resolution: Fixed Assignee: Kenneth Knowles (was: Tyler Akidau) Fix Version/s: 2.10.0 > Vendored gRPC doesn't seem to work with dataflow > > > Key: BEAM-6249 > URL: https://issues.apache.org/jira/browse/BEAM-6249 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Affects Versions: 2.9.0 >Reporter: Steve Niemitz >Assignee: Kenneth Knowles >Priority: Major > Fix For: 2.10.0 > > Time Spent: 1h > Remaining Estimate: 0h > > I attempted to migrate an existing pipeline (that worked in 2.8.0) to 2.9.0. > This pipeline is using the experimental streaming engine > (–experiments=enable_streaming_engine). > The pipeline fails to start with these logs: > {code:java} > D Unable to load the library > 'org_apache_beam_vendor_grpc_v1_13_1_netty_tcnative_linux_x86_64', trying > other loading mechanism. > D org_apache_beam_vendor_grpc_v1_13_1_netty_tcnative_linux_x86_64 cannot be > loaded from java.libary.path, now trying export to -Dio.netty.native.workdir: > /tmp > D Unable to load the library > '/tmp/liborg_apache_beam_vendor_grpc_v1_13_1_netty_tcnative_linux_x86_646918605450681921540.so', > trying other loading mechanism. > D Unable to load the library 'netty_tcnative_linux_x86_64', trying next > name... > D Unable to load the library > 'org_apache_beam_vendor_grpc_v1_13_1_netty_tcnative_linux_x86_64_fedora', > trying other loading mechanism. > D org_apache_beam_vendor_grpc_v1_13_1_netty_tcnative_linux_x86_64_fedora > cannot be loaded from java.libary.path, now trying export to > -Dio.netty.native.workdir: /tmp > D Unable to load the library 'netty_tcnative_linux_x86_64_fedora', trying > next name... > D Unable to load the library > 'org_apache_beam_vendor_grpc_v1_13_1_netty_tcnative_x86_64', trying other > loading mechanism. > D org_apache_beam_vendor_grpc_v1_13_1_netty_tcnative_x86_64 cannot be loaded > from java.libary.path, now trying export to -Dio.netty.native.workdir: /tmp > D Unable to load the library 'netty_tcnative_x86_64', trying next name... > D Unable to load the library > 'org_apache_beam_vendor_grpc_v1_13_1_netty_tcnative', trying other loading > mechanism. > D org_apache_beam_vendor_grpc_v1_13_1_netty_tcnative cannot be loaded from > java.libary.path, now trying export to -Dio.netty.native.workdir: /tmp > D Unable to load the library 'netty_tcnative', trying next name... > D Failed to load netty-tcnative; OpenSslEngine will be unavailable, unless > the application has already loaded the symbols by some other means. See > http://netty.io/wiki/forked-tomcat-native.html for more information. > D Failed to initialize netty-tcnative; OpenSslEngine will be unavailable. > See http://netty.io/wiki/forked-tomcat-native.html for more information. > I netty-tcnative unavailable (this may be normal) > I Conscrypt not found (this may be normal) > I Jetty ALPN unavailable (this may be normal) > E Uncaught exception in main thread. Exiting with status code 1. > W Please use a logger instead of System.out or System.err. > Please switch to using org.slf4j.Logger. > See: https://cloud.google.com/dataflow/pipelines/logging > E Uncaught exception in main thread. Exiting with status code 1. > E java.lang.IllegalStateException: Could not find TLS ALPN provider; no > working netty-tcnative, Conscrypt, or Jetty NPN/ALPN available > E at > org.apache.beam.vendor.grpc.v1_13_1.io.grpc.netty.GrpcSslContexts.defaultSslProvider(GrpcSslContexts.java:256) > > E at > org.apache.beam.vendor.grpc.v1_13_1.io.grpc.netty.GrpcSslContexts.configure(GrpcSslContexts.java:171) > > E at > org.apache.beam.vendor.grpc.v1_13_1.io.grpc.netty.GrpcSslContexts.forClient(GrpcSslContexts.java:120) > > E at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.remoteChannel(GrpcWindmillServer.java:343) > > E at > org.apache.beam.runners.dataflow.worker.windmill.GrpcWindmillServer.initializeWindmillService(GrpcWindmillServer.java:312) > > {code} > > The interesting part is in the netty load failure, the stack trace is: > {code:java} > exception: "java.lang.UnsatisfiedLinkError at > org.apache.beam.vendor.grpc.v1_13_1.io.netty.util.internal.NativeLibraryLoader.loadLibraryByHelper(NativeLibraryLoader.java:276) > at > org.apache.beam.vendor.grpc.v1_13_1.io.netty.util.internal.NativeLibraryLoader.loadLibrary(NativeLibraryLoader.java:233) > at > org.apache.beam.vendor.grpc.v1_13_1.io.netty.util.internal.NativeLibraryLoader.load(NativeLibraryLoader.java:187) > at > org.apache.beam.vendor.grpc.v1_13_1.io.netty.util.internal.NativeLibraryLoader.loadF
[jira] [Commented] (BEAM-5826) Vendor slf4j API
[ https://issues.apache.org/jira/browse/BEAM-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856155#comment-16856155 ] Luke Cwik commented on BEAM-5826: - I concur with Thomas, the non shaded SLF4J api is meant to be used by users so that we can capture the logs in the SDK harness. > Vendor slf4j API > > > Key: BEAM-5826 > URL: https://issues.apache.org/jira/browse/BEAM-5826 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-harness >Reporter: Kenneth Knowles >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-5826) Vendor slf4j API
[ https://issues.apache.org/jira/browse/BEAM-5826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Cwik resolved BEAM-5826. - Resolution: Won't Do Fix Version/s: Not applicable > Vendor slf4j API > > > Key: BEAM-5826 > URL: https://issues.apache.org/jira/browse/BEAM-5826 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-harness >Reporter: Kenneth Knowles >Priority: Major > Fix For: Not applicable > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7175) Add Spark Portable ValidatesRunner Batch postcommit test
[ https://issues.apache.org/jira/browse/BEAM-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-7175: --- Summary: Add Spark Portable ValidatesRunner Batch postcommit test (was: Run Spark validatesPortableRunner test as a postcommit) > Add Spark Portable ValidatesRunner Batch postcommit test > > > Key: BEAM-7175 > URL: https://issues.apache.org/jira/browse/BEAM-7175 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 9h 20m > Remaining Estimate: 0h > > when it's ready -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6683) Add an integration test suite for cross-language transforms for Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6683?focusedWorklogId=254033&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254033 ] ASF GitHub Bot logged work on BEAM-6683: Author: ASF GitHub Bot Created on: 04/Jun/19 21:17 Start Date: 04/Jun/19 21:17 Worklog Time Spent: 10m Work Description: ihji commented on issue #8174: [BEAM-6683] add createCrossLanguageValidatesRunner task URL: https://github.com/apache/beam/pull/8174#issuecomment-498847354 @mxm Sure. It would be great if you can look into it. I also wonder why #8693 breaks it since the change #8693 introduced looks pretty trivial. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254033) Time Spent: 13h 20m (was: 13h 10m) > Add an integration test suite for cross-language transforms for Flink runner > > > Key: BEAM-6683 > URL: https://issues.apache.org/jira/browse/BEAM-6683 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Major > Time Spent: 13h 20m > Remaining Estimate: 0h > > We should add an integration test suite that covers following. > (1) Currently available Java IO connectors that do not use UDFs work for > Python SDK on Flink runner. > (2) Currently available Python IO connectors that do not use UDFs work for > Java SDK on Flink runner. > (3) Currently available Java/Python pipelines work in a scalable manner for > cross-language pipelines (for example, try 10GB, 100GB input for > textio/avroio for Java and Python). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3606) Improve our relationship, or lack thereof, with Guava
[ https://issues.apache.org/jira/browse/BEAM-3606?focusedWorklogId=254027&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254027 ] ASF GitHub Bot logged work on BEAM-3606: Author: ASF GitHub Bot Created on: 04/Jun/19 20:58 Start Date: 04/Jun/19 20:58 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #8759: [BEAM-3606] Remove guava-testlib from sdks/java/core main dependency set URL: https://github.com/apache/beam/pull/8759#discussion_r290494236 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/fs/ResourceIdTester.java ## @@ -77,31 +80,58 @@ private static void validateResolvingIds( allResourceIds.add(dir2); // ResourceIds in equality groups. -new EqualsTester() -.addEqualityGroup(file1) -.addEqualityGroup(file2, file2a) -.addEqualityGroup(dir1, dir1.getCurrentDirectory()) -.addEqualityGroup(dir2, dir2a, dir2.getCurrentDirectory()) -.addEqualityGroup(baseDirectory, file1.getCurrentDirectory(), file2.getCurrentDirectory()) -.testEquals(); +assertEqualityGroups( +Arrays.asList( +Arrays.asList(file1), +Arrays.asList(file2, file2a), +Arrays.asList(dir1, dir1.getCurrentDirectory()), +Arrays.asList(dir2, dir2a, dir2.getCurrentDirectory()), +Arrays.asList( +baseDirectory, file1.getCurrentDirectory(), file2.getCurrentDirectory(; // ResourceId toString() in equality groups. -new EqualsTester() -.addEqualityGroup(file1.toString()) -.addEqualityGroup(file2.toString(), file2a.toString()) -.addEqualityGroup(dir1.toString(), dir1.getCurrentDirectory().toString()) -.addEqualityGroup(dir2.toString(), dir2a.toString(), dir2.getCurrentDirectory().toString()) -.addEqualityGroup( -baseDirectory.toString(), -file1.getCurrentDirectory().toString(), -file2.getCurrentDirectory().toString()) -.testEquals(); +assertEqualityGroups( +Arrays.asList( +Arrays.asList(file1.toString()), +Arrays.asList(file2.toString(), file2a.toString()), +Arrays.asList(dir1.toString(), dir1.getCurrentDirectory().toString()), +Arrays.asList(dir2.toString(), dir2a.toString(), dir2.getCurrentDirectory().toString()), +Arrays.asList( +baseDirectory.toString(), +file1.getCurrentDirectory().toString(), +file2.getCurrentDirectory().toString(; // TODO: test resolving strings that need to be escaped. // Possible spec: https://tools.ietf.org/html/rfc3986#section-2 // May need options to be filesystem-independent, e.g., if filesystems ban certain chars. } + /** + * Asserts that all elements in each group are equal to each other but not equal to any other + * element in another group. + */ + private static void assertEqualityGroups(List> equalityGroups) { +for (int i = 0; i < equalityGroups.size(); ++i) { + List current = equalityGroups.get(i); + for (int j = 0; j < current.size(); ++j) { +for (int k = 0; k < current.size(); ++k) { Review comment: Ah, I didn't consider that case. Ok that makes sense now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254027) Time Spent: 1h 50m (was: 1h 40m) > Improve our relationship, or lack thereof, with Guava > - > > Key: BEAM-3606 > URL: https://issues.apache.org/jira/browse/BEAM-3606 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp, runner-core, sdk-java-core >Reporter: Kenneth Knowles >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > This is an umbrella task for things that might move us off Guava, such as > replicating our own little utilities or moving to Java 8 features. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-7176) Spark validatesPortableRunner test OOM
[ https://issues.apache.org/jira/browse/BEAM-7176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver resolved BEAM-7176. --- Resolution: Fixed Fix Version/s: 2.14.0 > Spark validatesPortableRunner test OOM > -- > > Key: BEAM-7176 > URL: https://issues.apache.org/jira/browse/BEAM-7176 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: 2.14.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > First ~50 tests run okay, and then this happens: > {{org.apache.beam.sdk.transforms.CombineTest$WindowingTests > > testGlobalCombineWithDefaultsAndTriggers FAILED}} > {{ java.lang.AssertionError at > [CombineTest.java:1209|http://combinetest.java:1209/]}} > {{org.apache.beam.sdk.transforms.CombineTest$WindowingTests > > testCombineGloballyLambda FAILED}} > {{ java.lang.RuntimeException at > [CombineTest.java:1391|http://combinetest.java:1391/]}} > {{ Caused by: java.lang.RuntimeException at > [CombineTest.java:1391|http://combinetest.java:1391/]}} > {{ Caused by: java.util.concurrent.ExecutionException at > [CombineTest.java:1391|http://combinetest.java:1391/]}} > {{ Caused by: java.lang.OutOfMemoryError}} > {{}} > > Most subsequent tests OOM after that. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-7412) portable spark: thread/memory leak in local mode
[ https://issues.apache.org/jira/browse/BEAM-7412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver resolved BEAM-7412. --- Resolution: Fixed Fix Version/s: 2.14.0 > portable spark: thread/memory leak in local mode > > > Key: BEAM-7412 > URL: https://issues.apache.org/jira/browse/BEAM-7412 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: 2.14.0 > > Attachments: Screenshot from 2019-05-20 14-26-03.png > > Time Spent: 2h > Remaining Estimate: 0h > > When running validatesPortableRunnerBatch test on local mode, the portable > Spark runner creates ~18k threads before becoming unable to create any more > threads, at which point it crashes. This does not happen on standalone > cluster mode, where ephemeral executors (independent JVMs) are used and then > shut down, preventing whatever leakage is occurring from becoming too much of > a problem. > Sample stack trace: > {{"pool-3965-thread-3" #16059 daemon prio=5 os_prio=0 tid=0x7f9f85a81000 > nid=0x32a0 waiting on condition [0x7f9f8b9f1000]}} > {{ java.lang.Thread.State: TIMED_WAITING (parking)}} > {{ at (C/C++) 0x7fa2f32c2dae (Unknown Source)}} > {{ at (C/C++) 0x7fa2f2851351 (Unknown Source)}} > {{ at sun.misc.Unsafe.park(Native Method)}} > {{ - parking to wait for <0x000727bda848> (a > java.util.concurrent.SynchronousQueue$TransferStack)}} > {{ at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)}} > {{ at > java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)}} > {{ at > java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)}} > {{ at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)}} > {{ at > java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)}} > {{ at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)}} > {{ at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)}} > {{ at java.lang.Thread.run(Thread.java:748)}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-6959) Run Go SDK Post Commit tests against the Flink Runner.
[ https://issues.apache.org/jira/browse/BEAM-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver resolved BEAM-6959. --- Resolution: Fixed Fix Version/s: 2.14.0 > Run Go SDK Post Commit tests against the Flink Runner. > --- > > Key: BEAM-6959 > URL: https://issues.apache.org/jira/browse/BEAM-6959 > Project: Beam > Issue Type: Sub-task > Components: runner-flink, sdk-go, testing >Reporter: Robert Burke >Assignee: Kyle Weaver >Priority: Minor > Fix For: 2.14.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > See parent task BEAM-6958 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3606) Improve our relationship, or lack thereof, with Guava
[ https://issues.apache.org/jira/browse/BEAM-3606?focusedWorklogId=254013&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254013 ] ASF GitHub Bot logged work on BEAM-3606: Author: ASF GitHub Bot Created on: 04/Jun/19 20:26 Start Date: 04/Jun/19 20:26 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #8760: [BEAM-3606] Remove unused guava_testlib from dependency sets. URL: https://github.com/apache/beam/pull/8760#issuecomment-498829802 R: @Ardagan This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254013) Time Spent: 1h 40m (was: 1.5h) > Improve our relationship, or lack thereof, with Guava > - > > Key: BEAM-3606 > URL: https://issues.apache.org/jira/browse/BEAM-3606 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp, runner-core, sdk-java-core >Reporter: Kenneth Knowles >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > This is an umbrella task for things that might move us off Guava, such as > replicating our own little utilities or moving to Java 8 features. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3606) Improve our relationship, or lack thereof, with Guava
[ https://issues.apache.org/jira/browse/BEAM-3606?focusedWorklogId=254012&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254012 ] ASF GitHub Bot logged work on BEAM-3606: Author: ASF GitHub Bot Created on: 04/Jun/19 20:25 Start Date: 04/Jun/19 20:25 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #8760: [BEAM-3606] Remove unused guava_testlib from dependency sets. URL: https://github.com/apache/beam/pull/8760 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/) | --- | --- | --- Pre-Commit Tests Status (on master branch) -
[jira] [Work logged] (BEAM-3606) Improve our relationship, or lack thereof, with Guava
[ https://issues.apache.org/jira/browse/BEAM-3606?focusedWorklogId=254010&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254010 ] ASF GitHub Bot logged work on BEAM-3606: Author: ASF GitHub Bot Created on: 04/Jun/19 20:22 Start Date: 04/Jun/19 20:22 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #8759: [BEAM-3606] Remove guava-testlib from sdks/java/core main dependency set URL: https://github.com/apache/beam/pull/8759 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254010) Time Spent: 1h 20m (was: 1h 10m) > Improve our relationship, or lack thereof, with Guava > - > > Key: BEAM-3606 > URL: https://issues.apache.org/jira/browse/BEAM-3606 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp, runner-core, sdk-java-core >Reporter: Kenneth Knowles >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > This is an umbrella task for things that might move us off Guava, such as > replicating our own little utilities or moving to Java 8 features. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-2857) Create FileIO in Python
[ https://issues.apache.org/jira/browse/BEAM-2857?focusedWorklogId=254007&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254007 ] ASF GitHub Bot logged work on BEAM-2857: Author: ASF GitHub Bot Created on: 04/Jun/19 20:19 Start Date: 04/Jun/19 20:19 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #8394: [BEAM-2857] Implementing WriteToFiles transform for fileio (Python) URL: https://github.com/apache/beam/pull/8394#issuecomment-498827277 Also, please fix the conflict. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254007) Time Spent: 6h 50m (was: 6h 40m) > Create FileIO in Python > --- > > Key: BEAM-2857 > URL: https://issues.apache.org/jira/browse/BEAM-2857 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Eugene Kirpichov >Assignee: Pablo Estrada >Priority: Major > Labels: gsoc, gsoc2019, mentor > Time Spent: 6h 50m > Remaining Estimate: 0h > > Beam Java has a FileIO with operations: match()/matchAll(), readMatches(), > which together cover the majority of needs for general-purpose file > ingestion. Beam Python should have something similar. > An early design document for this: https://s.apache.org/fileio-beam-python -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7351) Failure in Python streaming wordcount test: unexpected messages received on output topic.
[ https://issues.apache.org/jira/browse/BEAM-7351?focusedWorklogId=254004&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254004 ] ASF GitHub Bot logged work on BEAM-7351: Author: ASF GitHub Bot Created on: 04/Jun/19 20:15 Start Date: 04/Jun/19 20:15 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8752: [BEAM-7351] increment ack deadline for flaky pubsub test URL: https://github.com/apache/beam/pull/8752#issuecomment-498825923 hm I see what you mean. There's many broken tests. You're saying that there's fixes for the other tests in other PRs then? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254004) Time Spent: 1h 20m (was: 1h 10m) > Failure in Python streaming wordcount test: unexpected messages received on > output topic. > - > > Key: BEAM-7351 > URL: https://issues.apache.org/jira/browse/BEAM-7351 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > Saw this in a PostCommit test today. Likely a flake, but it's a strange > failure mode and we may need to investigate this. > {noformat} > 13:32:02 > == > 13:32:02 FAIL: test_streaming_wordcount_it > (apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT) > 13:32:02 > -- > 13:32:02 Traceback (most recent call last): > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py", > line 110, in test_streaming_wordcount_it > 13:32:02 self.test_pipeline.get_full_options_as_args(**extra_opts)) > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/streaming_wordcount.py", > line 101, in run > 13:32:02 result = p.run() > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/pipeline.py", > line 419, in run > 13:32:02 return self.runner.run_pipeline(self, self._options) > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", > line 68, in run_pipeline > 13:32:02 hc_assert_that(self.result, pickler.loads(on_success_matcher)) > 13:32:02 AssertionError: > 13:32:02 Expected: (Test pipeline expected terminated in state: RUNNING and > Expected 500 messages.) > 13:32:02 but: Expected 500 messages. Got 508 messages. Diffs (item, > count): > 13:32:02 Expected but not in actual: [] > 13:32:02 Unexpected: [(u'476: 1', 1), (u'416: 1', 1), (u'245: 1', 1), > (u'478: 1', 1), (u'58: 1', 1), (u'364: 1', 1), (u'77: 1', 1), (u'283: 1', 1)] > 13:32:02 > 13:32:02 >> begin captured logging << > > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://169.254.169.254 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/project/project-id > 13:32:02 google.auth.transport.requests: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true > 13:32:02 urllib3.connectionpool: DEBUG: Starting new HTTP connection (1): > metadata.google.internal:80 > 13:32:02 urllib3.connectionpool: DEBUG: http://metadata.google.internal:80 > "GET /computeMetadata/v1/instance/service-accounts/default/?recursive=true > HTTP/1.1" 200 144 > 13:32:02 google.auth.transport.requests: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token > 13:32:02 urllib3.connectionpool: DEBUG: http://metadata.google.internal:80 > "GET > /computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token > HTTP/1.1" 200 176 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://169.254.169.254 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET >
[jira] [Work logged] (BEAM-3606) Improve our relationship, or lack thereof, with Guava
[ https://issues.apache.org/jira/browse/BEAM-3606?focusedWorklogId=254002&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254002 ] ASF GitHub Bot logged work on BEAM-3606: Author: ASF GitHub Bot Created on: 04/Jun/19 20:13 Start Date: 04/Jun/19 20:13 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #8759: [BEAM-3606] Remove guava-testlib from sdks/java/core main dependency set URL: https://github.com/apache/beam/pull/8759#discussion_r290477000 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/fs/ResourceIdTester.java ## @@ -77,31 +80,58 @@ private static void validateResolvingIds( allResourceIds.add(dir2); // ResourceIds in equality groups. -new EqualsTester() -.addEqualityGroup(file1) -.addEqualityGroup(file2, file2a) -.addEqualityGroup(dir1, dir1.getCurrentDirectory()) -.addEqualityGroup(dir2, dir2a, dir2.getCurrentDirectory()) -.addEqualityGroup(baseDirectory, file1.getCurrentDirectory(), file2.getCurrentDirectory()) -.testEquals(); +assertEqualityGroups( +Arrays.asList( +Arrays.asList(file1), +Arrays.asList(file2, file2a), +Arrays.asList(dir1, dir1.getCurrentDirectory()), +Arrays.asList(dir2, dir2a, dir2.getCurrentDirectory()), +Arrays.asList( +baseDirectory, file1.getCurrentDirectory(), file2.getCurrentDirectory(; // ResourceId toString() in equality groups. -new EqualsTester() -.addEqualityGroup(file1.toString()) -.addEqualityGroup(file2.toString(), file2a.toString()) -.addEqualityGroup(dir1.toString(), dir1.getCurrentDirectory().toString()) -.addEqualityGroup(dir2.toString(), dir2a.toString(), dir2.getCurrentDirectory().toString()) -.addEqualityGroup( -baseDirectory.toString(), -file1.getCurrentDirectory().toString(), -file2.getCurrentDirectory().toString()) -.testEquals(); +assertEqualityGroups( +Arrays.asList( +Arrays.asList(file1.toString()), +Arrays.asList(file2.toString(), file2a.toString()), +Arrays.asList(dir1.toString(), dir1.getCurrentDirectory().toString()), +Arrays.asList(dir2.toString(), dir2a.toString(), dir2.getCurrentDirectory().toString()), +Arrays.asList( +baseDirectory.toString(), +file1.getCurrentDirectory().toString(), +file2.getCurrentDirectory().toString(; // TODO: test resolving strings that need to be escaped. // Possible spec: https://tools.ietf.org/html/rfc3986#section-2 // May need options to be filesystem-independent, e.g., if filesystems ban certain chars. } + /** + * Asserts that all elements in each group are equal to each other but not equal to any other + * element in another group. + */ + private static void assertEqualityGroups(List> equalityGroups) { +for (int i = 0; i < equalityGroups.size(); ++i) { + List current = equalityGroups.get(i); + for (int j = 0; j < current.size(); ++j) { +for (int k = 0; k < current.size(); ++k) { Review comment: Since equals is implemented within the class a.equals(b) is not the same thing as b.equals(a) if a and b are two different classes, we also test a.equals(a) for posterity. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254002) Time Spent: 1h (was: 50m) > Improve our relationship, or lack thereof, with Guava > - > > Key: BEAM-3606 > URL: https://issues.apache.org/jira/browse/BEAM-3606 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp, runner-core, sdk-java-core >Reporter: Kenneth Knowles >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > This is an umbrella task for things that might move us off Guava, such as > replicating our own little utilities or moving to Java 8 features. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3606) Improve our relationship, or lack thereof, with Guava
[ https://issues.apache.org/jira/browse/BEAM-3606?focusedWorklogId=254003&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254003 ] ASF GitHub Bot logged work on BEAM-3606: Author: ASF GitHub Bot Created on: 04/Jun/19 20:13 Start Date: 04/Jun/19 20:13 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #8759: [BEAM-3606] Remove guava-testlib from sdks/java/core main dependency set URL: https://github.com/apache/beam/pull/8759#discussion_r290477276 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/fs/ResourceIdTester.java ## @@ -77,31 +80,58 @@ private static void validateResolvingIds( allResourceIds.add(dir2); // ResourceIds in equality groups. -new EqualsTester() -.addEqualityGroup(file1) -.addEqualityGroup(file2, file2a) -.addEqualityGroup(dir1, dir1.getCurrentDirectory()) -.addEqualityGroup(dir2, dir2a, dir2.getCurrentDirectory()) -.addEqualityGroup(baseDirectory, file1.getCurrentDirectory(), file2.getCurrentDirectory()) -.testEquals(); +assertEqualityGroups( +Arrays.asList( +Arrays.asList(file1), +Arrays.asList(file2, file2a), +Arrays.asList(dir1, dir1.getCurrentDirectory()), +Arrays.asList(dir2, dir2a, dir2.getCurrentDirectory()), +Arrays.asList( +baseDirectory, file1.getCurrentDirectory(), file2.getCurrentDirectory(; // ResourceId toString() in equality groups. -new EqualsTester() -.addEqualityGroup(file1.toString()) -.addEqualityGroup(file2.toString(), file2a.toString()) -.addEqualityGroup(dir1.toString(), dir1.getCurrentDirectory().toString()) -.addEqualityGroup(dir2.toString(), dir2a.toString(), dir2.getCurrentDirectory().toString()) -.addEqualityGroup( -baseDirectory.toString(), -file1.getCurrentDirectory().toString(), -file2.getCurrentDirectory().toString()) -.testEquals(); +assertEqualityGroups( +Arrays.asList( +Arrays.asList(file1.toString()), +Arrays.asList(file2.toString(), file2a.toString()), +Arrays.asList(dir1.toString(), dir1.getCurrentDirectory().toString()), +Arrays.asList(dir2.toString(), dir2a.toString(), dir2.getCurrentDirectory().toString()), +Arrays.asList( +baseDirectory.toString(), +file1.getCurrentDirectory().toString(), +file2.getCurrentDirectory().toString(; // TODO: test resolving strings that need to be escaped. // Possible spec: https://tools.ietf.org/html/rfc3986#section-2 // May need options to be filesystem-independent, e.g., if filesystems ban certain chars. } + /** + * Asserts that all elements in each group are equal to each other but not equal to any other + * element in another group. + */ + private static void assertEqualityGroups(List> equalityGroups) { +for (int i = 0; i < equalityGroups.size(); ++i) { + List current = equalityGroups.get(i); + for (int j = 0; j < current.size(); ++j) { +for (int k = 0; k < current.size(); ++k) { + assertEquals( + "Value at " + j + " should equal value at " + k + " in equality group " + i, + current.get(j), + current.get(k)); +} + } + for (int j = 0; j < equalityGroups.size(); ++j) { Review comment: Same reasoning about the equals as above, we ensure that they are not equals both ways as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254003) Time Spent: 1h 10m (was: 1h) > Improve our relationship, or lack thereof, with Guava > - > > Key: BEAM-3606 > URL: https://issues.apache.org/jira/browse/BEAM-3606 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp, runner-core, sdk-java-core >Reporter: Kenneth Knowles >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > This is an umbrella task for things that might move us off Guava, such as > replicating our own little utilities or moving to Java 8 features. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-2857) Create FileIO in Python
[ https://issues.apache.org/jira/browse/BEAM-2857?focusedWorklogId=254000&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-254000 ] ASF GitHub Bot logged work on BEAM-2857: Author: ASF GitHub Bot Created on: 04/Jun/19 20:07 Start Date: 04/Jun/19 20:07 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8394: [BEAM-2857] Implementing WriteToFiles transform for fileio (Python) URL: https://github.com/apache/beam/pull/8394#issuecomment-498823034 Run Python_PVR_Flink PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 254000) Time Spent: 6h 40m (was: 6.5h) > Create FileIO in Python > --- > > Key: BEAM-2857 > URL: https://issues.apache.org/jira/browse/BEAM-2857 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Eugene Kirpichov >Assignee: Pablo Estrada >Priority: Major > Labels: gsoc, gsoc2019, mentor > Time Spent: 6h 40m > Remaining Estimate: 0h > > Beam Java has a FileIO with operations: match()/matchAll(), readMatches(), > which together cover the majority of needs for general-purpose file > ingestion. Beam Python should have something similar. > An early design document for this: https://s.apache.org/fileio-beam-python -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6683) Add an integration test suite for cross-language transforms for Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6683?focusedWorklogId=253978&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-253978 ] ASF GitHub Bot logged work on BEAM-6683: Author: ASF GitHub Bot Created on: 04/Jun/19 19:27 Start Date: 04/Jun/19 19:27 Worklog Time Spent: 10m Work Description: mxm commented on issue #8174: [BEAM-6683] add createCrossLanguageValidatesRunner task URL: https://github.com/apache/beam/pull/8174#issuecomment-498809192 @ihji I can take a look if you want. Curious why this breaks because #8693 didn't break any other integration tests. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 253978) Time Spent: 13h 10m (was: 13h) > Add an integration test suite for cross-language transforms for Flink runner > > > Key: BEAM-6683 > URL: https://issues.apache.org/jira/browse/BEAM-6683 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Major > Time Spent: 13h 10m > Remaining Estimate: 0h > > We should add an integration test suite that covers following. > (1) Currently available Java IO connectors that do not use UDFs work for > Python SDK on Flink runner. > (2) Currently available Python IO connectors that do not use UDFs work for > Java SDK on Flink runner. > (3) Currently available Java/Python pipelines work in a scalable manner for > cross-language pipelines (for example, try 10GB, 100GB input for > textio/avroio for Java and Python). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7489) High CPU and memory usage in the case of SqlTransform with complex triggers
[ https://issues.apache.org/jira/browse/BEAM-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated BEAM-7489: --- Description: http://mail-archives.apache.org/mod_mbox/beam-user/201906.mbox/%3ccapjhctfmimy067uzudqejf-+zud1dqb+fkl8wbfopcz79bm...@mail.gmail.com%3E Code that expose this JIRA: https://pastebin.com/nNtc9ZaG was:http://mail-archives.apache.org/mod_mbox/beam-user/201906.mbox/%3ccapjhctfmimy067uzudqejf-+zud1dqb+fkl8wbfopcz79bm...@mail.gmail.com%3E > High CPU and memory usage in the case of SqlTransform with complex triggers > --- > > Key: BEAM-7489 > URL: https://issues.apache.org/jira/browse/BEAM-7489 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Rui Wang >Priority: Major > > http://mail-archives.apache.org/mod_mbox/beam-user/201906.mbox/%3ccapjhctfmimy067uzudqejf-+zud1dqb+fkl8wbfopcz79bm...@mail.gmail.com%3E > Code that expose this JIRA: https://pastebin.com/nNtc9ZaG -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-7489) High CPU and memory usage in the case of SqlTransform with complex triggers
[ https://issues.apache.org/jira/browse/BEAM-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856045#comment-16856045 ] Rui Wang commented on BEAM-7489: Note that this JIRA was found by a SqlTransform query with session windowing, which is less tested before. > High CPU and memory usage in the case of SqlTransform with complex triggers > --- > > Key: BEAM-7489 > URL: https://issues.apache.org/jira/browse/BEAM-7489 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Rui Wang >Priority: Major > > http://mail-archives.apache.org/mod_mbox/beam-user/201906.mbox/%3ccapjhctfmimy067uzudqejf-+zud1dqb+fkl8wbfopcz79bm...@mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7489) High CPU and memory usage in the case of SqlTransform with complex triggers
[ https://issues.apache.org/jira/browse/BEAM-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated BEAM-7489: --- Description: http://mail-archives.apache.org/mod_mbox/beam-user/201906.mbox/%3ccapjhctfmimy067uzudqejf-+zud1dqb+fkl8wbfopcz79bm...@mail.gmail.com%3E > High CPU and memory usage in the case of SqlTransform with complex triggers > --- > > Key: BEAM-7489 > URL: https://issues.apache.org/jira/browse/BEAM-7489 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Rui Wang >Priority: Major > > http://mail-archives.apache.org/mod_mbox/beam-user/201906.mbox/%3ccapjhctfmimy067uzudqejf-+zud1dqb+fkl8wbfopcz79bm...@mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7489) High CPU and memory usage in the case of SqlTransform with complex triggers
Rui Wang created BEAM-7489: -- Summary: High CPU and memory usage in the case of SqlTransform with complex triggers Key: BEAM-7489 URL: https://issues.apache.org/jira/browse/BEAM-7489 Project: Beam Issue Type: Bug Components: dsl-sql Reporter: Rui Wang -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7348) Option to expire SDK worker environments
[ https://issues.apache.org/jira/browse/BEAM-7348?focusedWorklogId=253963&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-253963 ] ASF GitHub Bot logged work on BEAM-7348: Author: ASF GitHub Bot Created on: 04/Jun/19 19:19 Start Date: 04/Jun/19 19:19 Worklog Time Spent: 10m Work Description: mxm commented on pull request #8714: [BEAM-7348] Support environment expiration URL: https://github.com/apache/beam/pull/8714 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 253963) Time Spent: 2h 40m (was: 2.5h) > Option to expire SDK worker environments > > > Key: BEAM-7348 > URL: https://issues.apache.org/jira/browse/BEAM-7348 > Project: Beam > Issue Type: Improvement > Components: runner-core >Affects Versions: 2.12.0 >Reporter: Thomas Weise >Assignee: Thomas Weise >Priority: Major > Labels: portability, portability-flink > Fix For: 2.14.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > We discovered that Python SDK workers are susceptible to memory leaks that > are quite hard to identify and/or fix. This becomes an issue in streaming > pipelines, where the workers run "forever". It would be good if the user has > an option to recycle the workers when there is no other practical way to > address (slow) resource leaks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-6407) regression: FileIO.writeDynamic() with side inputs fails in DirectRunner
[ https://issues.apache.org/jira/browse/BEAM-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856033#comment-16856033 ] Kenneth Knowles commented on BEAM-6407: --- Yea, makes sense. Did the rolled back thing get rolled forwards again? Unfortunately the commit messages in the revert do not specify the hash of the commit(s) nor PR number. I don't currently have the time/appetite for search PR by subject line. > regression: FileIO.writeDynamic() with side inputs fails in DirectRunner > > > Key: BEAM-6407 > URL: https://issues.apache.org/jira/browse/BEAM-6407 > Project: Beam > Issue Type: Bug > Components: runner-direct >Affects Versions: 2.9.0 >Reporter: Niel Markwick >Assignee: Niel Markwick >Priority: Blocker > Labels: regression > Fix For: 2.10.0 > > Attachments: beam-filewriter-demo.tgz > > Time Spent: 2.5h > Remaining Estimate: 0h > > When FileIO.writeDynamic is used with automatic sharding and a Contextful.Fn > that uses side inputs for the file naming, DirectRunner (and TestPipeline) > fail with: > {{java.lang.IllegalStateException: All PCollectionViews that are consumed > must be written by some WriteView PTransform: Missing [ > [RunnerPCollectionView]]}} > > Example code: > {code:java} > PCollectionView outputFileName = > pipeline.apply( > "outputDir", > Create.of("/tmp/testout")).apply(View.asSingleton()); > Contextful.Fn manifestNaming = > (element, c) -> > (window, pane, numShards, shardIndex, compression) -> > c.sideInput(outputFileName)+shardIndex; > pipeline.apply(FileIO.writeDynamic() > .by(SerializableFunctions.constant("")) > .withDestinationCoder(StringUtf8Coder.of()) > .via(TextIO.sink()) > .withTempDirectory("/tmp") > .withNaming(Contextful.of( > manifestNaming, > Requirements.requiresSideInputs(outputFileName; > {code} > > This does not occur in Dataflow-runner > It does not occur if the ContextFul.Fn is not given side inputs. > It does not occur if withNumShards(1) is set. > It did not occur in 2.8.0, and does in 2.9.0 and 2.10.0-SNAPSHOT (as of today) > > The cause appears to be due to the DirectRunner using TransformOverrides > re-writing FileIO sinks to use runner-determined-sharding > ( see [DirectRunner.java line > 226|https://github.com/apache/beam/blob/master/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectRunner.java#L226] > ) > but I do not know why this started occuring in 2.9.0... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (BEAM-6407) regression: FileIO.writeDynamic() with side inputs fails in DirectRunner
[ https://issues.apache.org/jira/browse/BEAM-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenneth Knowles reopened BEAM-6407: --- Did it roll forward? > regression: FileIO.writeDynamic() with side inputs fails in DirectRunner > > > Key: BEAM-6407 > URL: https://issues.apache.org/jira/browse/BEAM-6407 > Project: Beam > Issue Type: Bug > Components: runner-direct >Affects Versions: 2.9.0 >Reporter: Niel Markwick >Assignee: Niel Markwick >Priority: Blocker > Labels: regression > Fix For: 2.10.0 > > Attachments: beam-filewriter-demo.tgz > > Time Spent: 2.5h > Remaining Estimate: 0h > > When FileIO.writeDynamic is used with automatic sharding and a Contextful.Fn > that uses side inputs for the file naming, DirectRunner (and TestPipeline) > fail with: > {{java.lang.IllegalStateException: All PCollectionViews that are consumed > must be written by some WriteView PTransform: Missing [ > [RunnerPCollectionView]]}} > > Example code: > {code:java} > PCollectionView outputFileName = > pipeline.apply( > "outputDir", > Create.of("/tmp/testout")).apply(View.asSingleton()); > Contextful.Fn manifestNaming = > (element, c) -> > (window, pane, numShards, shardIndex, compression) -> > c.sideInput(outputFileName)+shardIndex; > pipeline.apply(FileIO.writeDynamic() > .by(SerializableFunctions.constant("")) > .withDestinationCoder(StringUtf8Coder.of()) > .via(TextIO.sink()) > .withTempDirectory("/tmp") > .withNaming(Contextful.of( > manifestNaming, > Requirements.requiresSideInputs(outputFileName; > {code} > > This does not occur in Dataflow-runner > It does not occur if the ContextFul.Fn is not given side inputs. > It does not occur if withNumShards(1) is set. > It did not occur in 2.8.0, and does in 2.9.0 and 2.10.0-SNAPSHOT (as of today) > > The cause appears to be due to the DirectRunner using TransformOverrides > re-writing FileIO sinks to use runner-determined-sharding > ( see [DirectRunner.java line > 226|https://github.com/apache/beam/blob/master/runners/direct-java/src/main/java/org/apache/beam/runners/direct/DirectRunner.java#L226] > ) > but I do not know why this started occuring in 2.9.0... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3606) Improve our relationship, or lack thereof, with Guava
[ https://issues.apache.org/jira/browse/BEAM-3606?focusedWorklogId=253956&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-253956 ] ASF GitHub Bot logged work on BEAM-3606: Author: ASF GitHub Bot Created on: 04/Jun/19 19:06 Start Date: 04/Jun/19 19:06 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #8759: [BEAM-3606] Remove guava-testlib from sdks/java/core main dependency set URL: https://github.com/apache/beam/pull/8759#discussion_r290450318 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/fs/ResourceIdTester.java ## @@ -77,31 +80,58 @@ private static void validateResolvingIds( allResourceIds.add(dir2); // ResourceIds in equality groups. -new EqualsTester() -.addEqualityGroup(file1) -.addEqualityGroup(file2, file2a) -.addEqualityGroup(dir1, dir1.getCurrentDirectory()) -.addEqualityGroup(dir2, dir2a, dir2.getCurrentDirectory()) -.addEqualityGroup(baseDirectory, file1.getCurrentDirectory(), file2.getCurrentDirectory()) -.testEquals(); +assertEqualityGroups( +Arrays.asList( +Arrays.asList(file1), +Arrays.asList(file2, file2a), +Arrays.asList(dir1, dir1.getCurrentDirectory()), +Arrays.asList(dir2, dir2a, dir2.getCurrentDirectory()), +Arrays.asList( +baseDirectory, file1.getCurrentDirectory(), file2.getCurrentDirectory(; // ResourceId toString() in equality groups. -new EqualsTester() -.addEqualityGroup(file1.toString()) -.addEqualityGroup(file2.toString(), file2a.toString()) -.addEqualityGroup(dir1.toString(), dir1.getCurrentDirectory().toString()) -.addEqualityGroup(dir2.toString(), dir2a.toString(), dir2.getCurrentDirectory().toString()) -.addEqualityGroup( -baseDirectory.toString(), -file1.getCurrentDirectory().toString(), -file2.getCurrentDirectory().toString()) -.testEquals(); +assertEqualityGroups( +Arrays.asList( +Arrays.asList(file1.toString()), +Arrays.asList(file2.toString(), file2a.toString()), +Arrays.asList(dir1.toString(), dir1.getCurrentDirectory().toString()), +Arrays.asList(dir2.toString(), dir2a.toString(), dir2.getCurrentDirectory().toString()), +Arrays.asList( +baseDirectory.toString(), +file1.getCurrentDirectory().toString(), +file2.getCurrentDirectory().toString(; // TODO: test resolving strings that need to be escaped. // Possible spec: https://tools.ietf.org/html/rfc3986#section-2 // May need options to be filesystem-independent, e.g., if filesystems ban certain chars. } + /** + * Asserts that all elements in each group are equal to each other but not equal to any other + * element in another group. + */ + private static void assertEqualityGroups(List> equalityGroups) { +for (int i = 0; i < equalityGroups.size(); ++i) { + List current = equalityGroups.get(i); + for (int j = 0; j < current.size(); ++j) { +for (int k = 0; k < current.size(); ++k) { + assertEquals( + "Value at " + j + " should equal value at " + k + " in equality group " + i, + current.get(j), + current.get(k)); +} + } + for (int j = 0; j < equalityGroups.size(); ++j) { Review comment: Here too, it seems like you could just do `int j = i + 1` and avoid duplicate comparisons (and self comparisons) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 253956) Time Spent: 50m (was: 40m) > Improve our relationship, or lack thereof, with Guava > - > > Key: BEAM-3606 > URL: https://issues.apache.org/jira/browse/BEAM-3606 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp, runner-core, sdk-java-core >Reporter: Kenneth Knowles >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > This is an umbrella task for things that might move us off Guava, such as > replicating our own little utilities or moving to Java 8 features. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3606) Improve our relationship, or lack thereof, with Guava
[ https://issues.apache.org/jira/browse/BEAM-3606?focusedWorklogId=253955&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-253955 ] ASF GitHub Bot logged work on BEAM-3606: Author: ASF GitHub Bot Created on: 04/Jun/19 19:06 Start Date: 04/Jun/19 19:06 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #8759: [BEAM-3606] Remove guava-testlib from sdks/java/core main dependency set URL: https://github.com/apache/beam/pull/8759#discussion_r290448829 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/fs/ResourceIdTester.java ## @@ -77,31 +80,58 @@ private static void validateResolvingIds( allResourceIds.add(dir2); // ResourceIds in equality groups. -new EqualsTester() -.addEqualityGroup(file1) -.addEqualityGroup(file2, file2a) -.addEqualityGroup(dir1, dir1.getCurrentDirectory()) -.addEqualityGroup(dir2, dir2a, dir2.getCurrentDirectory()) -.addEqualityGroup(baseDirectory, file1.getCurrentDirectory(), file2.getCurrentDirectory()) -.testEquals(); +assertEqualityGroups( +Arrays.asList( +Arrays.asList(file1), +Arrays.asList(file2, file2a), +Arrays.asList(dir1, dir1.getCurrentDirectory()), +Arrays.asList(dir2, dir2a, dir2.getCurrentDirectory()), +Arrays.asList( +baseDirectory, file1.getCurrentDirectory(), file2.getCurrentDirectory(; // ResourceId toString() in equality groups. -new EqualsTester() -.addEqualityGroup(file1.toString()) -.addEqualityGroup(file2.toString(), file2a.toString()) -.addEqualityGroup(dir1.toString(), dir1.getCurrentDirectory().toString()) -.addEqualityGroup(dir2.toString(), dir2a.toString(), dir2.getCurrentDirectory().toString()) -.addEqualityGroup( -baseDirectory.toString(), -file1.getCurrentDirectory().toString(), -file2.getCurrentDirectory().toString()) -.testEquals(); +assertEqualityGroups( +Arrays.asList( +Arrays.asList(file1.toString()), +Arrays.asList(file2.toString(), file2a.toString()), +Arrays.asList(dir1.toString(), dir1.getCurrentDirectory().toString()), +Arrays.asList(dir2.toString(), dir2a.toString(), dir2.getCurrentDirectory().toString()), +Arrays.asList( +baseDirectory.toString(), +file1.getCurrentDirectory().toString(), +file2.getCurrentDirectory().toString(; // TODO: test resolving strings that need to be escaped. // Possible spec: https://tools.ietf.org/html/rfc3986#section-2 // May need options to be filesystem-independent, e.g., if filesystems ban certain chars. } + /** + * Asserts that all elements in each group are equal to each other but not equal to any other + * element in another group. + */ + private static void assertEqualityGroups(List> equalityGroups) { +for (int i = 0; i < equalityGroups.size(); ++i) { + List current = equalityGroups.get(i); + for (int j = 0; j < current.size(); ++j) { +for (int k = 0; k < current.size(); ++k) { Review comment: Could you do `int k = j` instead here to avoid duplicate comparisons? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 253955) Time Spent: 40m (was: 0.5h) > Improve our relationship, or lack thereof, with Guava > - > > Key: BEAM-3606 > URL: https://issues.apache.org/jira/browse/BEAM-3606 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp, runner-core, sdk-java-core >Reporter: Kenneth Knowles >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > This is an umbrella task for things that might move us off Guava, such as > replicating our own little utilities or moving to Java 8 features. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7232) Python test_external_transforms fails on Spark runner
[ https://issues.apache.org/jira/browse/BEAM-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-7232: -- Description: test_external_transforms should theoretically work, but it instead throws a cryptic error: <_Rendezvous of RPC that terminated with: status = StatusCode.UNKNOWN details = "" debug_error_string = "\{"created":"@1557171218.145369401","description":"Error received from peer ipv4:127.0.0.1:48159","file":"src/core/lib/surface/call.cc","file_line":1041,"grpc_message":"","grpc_status":2}" > I could just be overlooking some obvious config issue. EDIT: the previous issue has been fixed. now it seems to fail because the read source is unbounded [https://github.com/apache/beam/blob/23714a335e8a6e0d91106a366e470a3a4820ae27/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ReadTranslation.java#L92] was: test_external_transforms should theoretically work, but it instead throws a cryptic error: <_Rendezvous of RPC that terminated with: status = StatusCode.UNKNOWN details = "" debug_error_string = "\{"created":"@1557171218.145369401","description":"Error received from peer ipv4:127.0.0.1:48159","file":"src/core/lib/surface/call.cc","file_line":1041,"grpc_message":"","grpc_status":2}" > I could just be overlooking some obvious config issue. EDIT: the previous issue has been fixed. now the issue seems to be because the read source is unbounded [https://github.com/apache/beam/blob/23714a335e8a6e0d91106a366e470a3a4820ae27/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ReadTranslation.java#L92] > Python test_external_transforms fails on Spark runner > - > > Key: BEAM-7232 > URL: https://issues.apache.org/jira/browse/BEAM-7232 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > > test_external_transforms should theoretically work, but it instead throws a > cryptic error: > <_Rendezvous of RPC that terminated with: > status = StatusCode.UNKNOWN > details = "" > debug_error_string = > "\{"created":"@1557171218.145369401","description":"Error received from peer > ipv4:127.0.0.1:48159","file":"src/core/lib/surface/call.cc","file_line":1041,"grpc_message":"","grpc_status":2}" > > > I could just be overlooking some obvious config issue. > > EDIT: the previous issue has been fixed. now it seems to fail because the > read source is unbounded > [https://github.com/apache/beam/blob/23714a335e8a6e0d91106a366e470a3a4820ae27/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/ReadTranslation.java#L92] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-7401) beam_PreCommit_CommunityMetrics_Cron consistently failing
[ https://issues.apache.org/jira/browse/BEAM-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Gryzykhin resolved BEAM-7401. - Resolution: Fixed Fix Version/s: Not applicable > beam_PreCommit_CommunityMetrics_Cron consistently failing > - > > Key: BEAM-7401 > URL: https://issues.apache.org/jira/browse/BEAM-7401 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Michael Luckey >Assignee: Mikhail Gryzykhin >Priority: Major > Fix For: Not applicable > > > Since a while, community metrics Jenkins job [1] is broken. > [1] > https://builds.apache.org/job/beam_PreCommit_CommunityMetrics_Cron/buildTimeTrend -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-7401) beam_PreCommit_CommunityMetrics_Cron consistently failing
[ https://issues.apache.org/jira/browse/BEAM-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856003#comment-16856003 ] Mikhail Gryzykhin commented on BEAM-7401: - Missed this ticket. The issue was with docker credentials that was common across all our jenkins builds. Resolved by now. Relevant ticket: https://issues.apache.org/jira/browse/BEAM-7381 > beam_PreCommit_CommunityMetrics_Cron consistently failing > - > > Key: BEAM-7401 > URL: https://issues.apache.org/jira/browse/BEAM-7401 > Project: Beam > Issue Type: Bug > Components: testing >Reporter: Michael Luckey >Assignee: Mikhail Gryzykhin >Priority: Major > > Since a while, community metrics Jenkins job [1] is broken. > [1] > https://builds.apache.org/job/beam_PreCommit_CommunityMetrics_Cron/buildTimeTrend -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-6284) [FLAKE][beam_PostCommit_Java_ValidatesRunner_Dataflow] TestRunner fails with result UNKNOWN on succeeded job and checks passed
[ https://issues.apache.org/jira/browse/BEAM-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Gryzykhin resolved BEAM-6284. - Resolution: Fixed Fix Version/s: 2.13.0 > [FLAKE][beam_PostCommit_Java_ValidatesRunner_Dataflow] TestRunner fails with > result UNKNOWN on succeeded job and checks passed > -- > > Key: BEAM-6284 > URL: https://issues.apache.org/jira/browse/BEAM-6284 > Project: Beam > Issue Type: Bug > Components: test-failures, testing >Reporter: Mikhail Gryzykhin >Assignee: Mikhail Gryzykhin >Priority: Major > Labels: currently-failing > Fix For: 2.13.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > _Use this form to file an issue for test failure:_ > * > https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/testReport/junit/org.apache.beam.sdk.transforms/ViewTest/testWindowedSideInputFixedToGlobal/ > Initial investigation: > According to logs all test-relevant checks have passed and it seem to be > testing framework failure. > > _After you've filled out the above details, please [assign the issue to an > individual|https://beam.apache.org/contribute/postcommits-guides/index.html#find_specialist]. > Assignee should [treat test failures as > high-priority|https://beam.apache.org/contribute/postcommits-policies/#assigned-failing-test], > helping to fix the issue or find a more appropriate owner. See [Apache Beam > Post-Commit > Policies|https://beam.apache.org/contribute/postcommits-policies]._ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-2857) Create FileIO in Python
[ https://issues.apache.org/jira/browse/BEAM-2857?focusedWorklogId=253934&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-253934 ] ASF GitHub Bot logged work on BEAM-2857: Author: ASF GitHub Bot Created on: 04/Jun/19 18:32 Start Date: 04/Jun/19 18:32 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #8394: [BEAM-2857] Implementing WriteToFiles transform for fileio (Python) URL: https://github.com/apache/beam/pull/8394#issuecomment-498790363 LGTM. Thanks. Please fix-up and self-merge after tests pass. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 253934) Time Spent: 6.5h (was: 6h 20m) > Create FileIO in Python > --- > > Key: BEAM-2857 > URL: https://issues.apache.org/jira/browse/BEAM-2857 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Eugene Kirpichov >Assignee: Pablo Estrada >Priority: Major > Labels: gsoc, gsoc2019, mentor > Time Spent: 6.5h > Remaining Estimate: 0h > > Beam Java has a FileIO with operations: match()/matchAll(), readMatches(), > which together cover the majority of needs for general-purpose file > ingestion. Beam Python should have something similar. > An early design document for this: https://s.apache.org/fileio-beam-python -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-2857) Create FileIO in Python
[ https://issues.apache.org/jira/browse/BEAM-2857?focusedWorklogId=253913&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-253913 ] ASF GitHub Bot logged work on BEAM-2857: Author: ASF GitHub Bot Created on: 04/Jun/19 18:26 Start Date: 04/Jun/19 18:26 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #8394: [BEAM-2857] Implementing WriteToFiles transform for fileio (Python) URL: https://github.com/apache/beam/pull/8394#discussion_r290433622 ## File path: sdks/python/apache_beam/io/fileio.py ## @@ -170,3 +237,487 @@ def __init__(self, compression=None, skip_directories=True): def expand(self, pcoll): return pcoll | beam.ParDo(_ReadMatchesFn(self._compression, self._skip_directories)) + + +class FileSink(object): + """Specifies how to write elements to individual files in ``WriteToFiles``. + + **NOTE: THIS CLASS IS EXPERIMENTAL.** + + A Sink class must implement the following: + + - The ``open`` method, which initializes writing to a file handler (it is not + responsible for opening the file handler itself). + - The ``write`` method, which writes an element to the file that was passed + in ``open``. + - The ``flush`` method, which flushes any buffered state. This is most often + called before closing a file (but not exclusively called in that + situation). The sink is not responsible for closing the file handler. + """ + + def open(self, fh): +raise NotImplementedError + + def write(self, record): +raise NotImplementedError + + def flush(self): +raise NotImplementedError + + +@beam.typehints.with_input_types(str) +class TextSink(FileSink): + """A sink that encodes utf8 elements, and writes to file handlers. + + **NOTE: THIS CLASS IS EXPERIMENTAL.** + + This sink simply calls file_handler.write(record.encode('utf8') + '\n') on all + records that come into it. + """ + + def open(self, fh): +self._fh = fh + + def write(self, record): +self._fh.write(record.encode('utf8')) +self._fh.write(b'\n') + + def flush(self): +self._fh.flush() + + +def prefix_naming(prefix): + return default_file_naming(prefix) + + +_DEFAULT_FILE_NAME_TEMPLATE = ( +'{prefix}-{start}-{end}-{pane}-' +'{shard:05d}-{total_shards:05d}' +'{suffix}{compression}') + + +def destination_prefix_naming(): + + def _inner(window, pane, shard_index, total_shards, compression, destination): +kwargs = {'prefix': str(destination), + 'start': '', + 'end': '', + 'pane': '', + 'shard': 0, + 'total_shards': 0, + 'suffix': '', + 'compression': ''} +if total_shards is not None and shard_index is not None: + kwargs['shard'] = int(shard_index) + kwargs['total_shards'] = int(total_shards) + +if window != GlobalWindow(): + kwargs['start'] = window.start.to_utc_datetime().isoformat() + kwargs['end'] = window.end.to_utc_datetime().isoformat() + +# TODO(pabloem): Add support for PaneInfo +# If the PANE is the ONLY firing in the window, we don't add it. +#if pane and not (pane.is_first and pane.is_last): +# kwargs['pane'] = pane.index + +if compression: + kwargs['compression'] = '.%s' % compression + +return _DEFAULT_FILE_NAME_TEMPLATE.format(**kwargs) + + return _inner + + +def default_file_naming(prefix, suffix=None): + + def _inner(window, pane, shard_index, total_shards, compression, destination): +kwargs = {'prefix': prefix, + 'start': '', + 'end': '', + 'pane': '', + 'shard': 0, + 'total_shards': 0, + 'suffix': '', + 'compression': ''} +if total_shards is not None and shard_index is not None: + kwargs['shard'] = int(shard_index) + kwargs['total_shards'] = int(total_shards) + +if window != GlobalWindow(): + kwargs['start'] = window.start.to_utc_datetime().isoformat() + kwargs['end'] = window.end.to_utc_datetime().isoformat() + +# TODO(pabloem): Add support for PaneInfo Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 253913) Time Spent: 6h (was: 5h 50m) > Create FileIO in Python > --- > > Key: BEAM-2857 > URL: https://issues.apache.org/jira/browse/BEAM-2857 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >
[jira] [Work logged] (BEAM-2857) Create FileIO in Python
[ https://issues.apache.org/jira/browse/BEAM-2857?focusedWorklogId=253914&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-253914 ] ASF GitHub Bot logged work on BEAM-2857: Author: ASF GitHub Bot Created on: 04/Jun/19 18:26 Start Date: 04/Jun/19 18:26 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #8394: [BEAM-2857] Implementing WriteToFiles transform for fileio (Python) URL: https://github.com/apache/beam/pull/8394#discussion_r290433584 ## File path: sdks/python/apache_beam/io/fileio.py ## @@ -502,12 +502,13 @@ def expand(self, pcoll): file_results = (files_by_destination_pc | beam.ParDo( _MoveTempFilesIntoFinalDestinationFn( -self.path, self.file_naming_fn))) +self.path, self.file_naming_fn, +self._temp_directory))) return file_results -def _create_writer(base_path, file_name): +def _create_writer(base_path, writer_key): # TODO(pabloem) Is this a good place to create a temporary directory? Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 253914) Time Spent: 6h 10m (was: 6h) > Create FileIO in Python > --- > > Key: BEAM-2857 > URL: https://issues.apache.org/jira/browse/BEAM-2857 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Eugene Kirpichov >Assignee: Pablo Estrada >Priority: Major > Labels: gsoc, gsoc2019, mentor > Time Spent: 6h 10m > Remaining Estimate: 0h > > Beam Java has a FileIO with operations: match()/matchAll(), readMatches(), > which together cover the majority of needs for general-purpose file > ingestion. Beam Python should have something similar. > An early design document for this: https://s.apache.org/fileio-beam-python -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-2857) Create FileIO in Python
[ https://issues.apache.org/jira/browse/BEAM-2857?focusedWorklogId=253915&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-253915 ] ASF GitHub Bot logged work on BEAM-2857: Author: ASF GitHub Bot Created on: 04/Jun/19 18:26 Start Date: 04/Jun/19 18:26 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #8394: [BEAM-2857] Implementing WriteToFiles transform for fileio (Python) URL: https://github.com/apache/beam/pull/8394#discussion_r290433653 ## File path: sdks/python/apache_beam/io/fileio.py ## @@ -23,17 +23,83 @@ metadata records, and produces a ``PCollection`` of ``ReadableFile`` objects. These transforms currently do not support splitting by themselves. +Writing to Files + + +The transforms in this file include ``WriteToFiles``, which allows you to write +a ``beam.PCollection`` to files, and gives you many options to customize how to +do this. + +File Naming +--- +One of the parameters received by ``WriteToFiles`` is a function specifying how +to name the files that are written. This is a function that takes in the Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 253915) Time Spent: 6h 20m (was: 6h 10m) > Create FileIO in Python > --- > > Key: BEAM-2857 > URL: https://issues.apache.org/jira/browse/BEAM-2857 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Eugene Kirpichov >Assignee: Pablo Estrada >Priority: Major > Labels: gsoc, gsoc2019, mentor > Time Spent: 6h 20m > Remaining Estimate: 0h > > Beam Java has a FileIO with operations: match()/matchAll(), readMatches(), > which together cover the majority of needs for general-purpose file > ingestion. Beam Python should have something similar. > An early design document for this: https://s.apache.org/fileio-beam-python -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-2857) Create FileIO in Python
[ https://issues.apache.org/jira/browse/BEAM-2857?focusedWorklogId=253912&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-253912 ] ASF GitHub Bot logged work on BEAM-2857: Author: ASF GitHub Bot Created on: 04/Jun/19 18:25 Start Date: 04/Jun/19 18:25 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #8394: [BEAM-2857] Implementing WriteToFiles transform for fileio (Python) URL: https://github.com/apache/beam/pull/8394#discussion_r290433597 ## File path: sdks/python/apache_beam/io/fileio.py ## @@ -562,16 +569,21 @@ def process(self, element): r.pane, destination) -logging.info('Cautiously removing temporary files for destination %s', - destination) -self._remove_temporary_files([r.file_name for r in file_results]) +logging.info('Cautiously removing temporary files for' + ' destination %s and window %s', destination, w) +writer_key = (destination, w) +self._remove_temporary_files(writer_key) - @staticmethod - def _remove_temporary_files(files): + def _remove_temporary_files(self, writer_key): try: - filesystems.FileSystems.delete(files) + prefix = filesystems.FileSystems.join( + self.temporary_directory.get(), str(abs(hash(writer_key + match_result = filesystems.FileSystems.match(['%s*' % prefix]) + orphaned_files = [m.path for m in match_result[0].metadata_list] + + logging.debug('Deleting orphaned files: %s', orphaned_files) + filesystems.FileSystems.delete(orphaned_files) Review comment: Yes. All we're doing is logging the failures (in debug level) to be aware of them. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 253912) Time Spent: 5h 50m (was: 5h 40m) > Create FileIO in Python > --- > > Key: BEAM-2857 > URL: https://issues.apache.org/jira/browse/BEAM-2857 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Eugene Kirpichov >Assignee: Pablo Estrada >Priority: Major > Labels: gsoc, gsoc2019, mentor > Time Spent: 5h 50m > Remaining Estimate: 0h > > Beam Java has a FileIO with operations: match()/matchAll(), readMatches(), > which together cover the majority of needs for general-purpose file > ingestion. Beam Python should have something similar. > An early design document for this: https://s.apache.org/fileio-beam-python -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-6683) Add an integration test suite for cross-language transforms for Flink runner
[ https://issues.apache.org/jira/browse/BEAM-6683?focusedWorklogId=253899&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-253899 ] ASF GitHub Bot logged work on BEAM-6683: Author: ASF GitHub Bot Created on: 04/Jun/19 18:11 Start Date: 04/Jun/19 18:11 Worklog Time Spent: 10m Work Description: ihji commented on issue #8174: [BEAM-6683] add createCrossLanguageValidatesRunner task URL: https://github.com/apache/beam/pull/8174#issuecomment-498782464 Found that XVR_Flink PostCommit is broken because of https://github.com/apache/beam/pull/8693 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 253899) Time Spent: 13h (was: 12h 50m) > Add an integration test suite for cross-language transforms for Flink runner > > > Key: BEAM-6683 > URL: https://issues.apache.org/jira/browse/BEAM-6683 > Project: Beam > Issue Type: Test > Components: testing >Reporter: Chamikara Jayalath >Assignee: Heejong Lee >Priority: Major > Time Spent: 13h > Remaining Estimate: 0h > > We should add an integration test suite that covers following. > (1) Currently available Java IO connectors that do not use UDFs work for > Python SDK on Flink runner. > (2) Currently available Python IO connectors that do not use UDFs work for > Java SDK on Flink runner. > (3) Currently available Java/Python pipelines work in a scalable manner for > cross-language pipelines (for example, try 10GB, 100GB input for > textio/avroio for Java and Python). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7351) Failure in Python streaming wordcount test: unexpected messages received on output topic.
[ https://issues.apache.org/jira/browse/BEAM-7351?focusedWorklogId=253893&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-253893 ] ASF GitHub Bot logged work on BEAM-7351: Author: ASF GitHub Bot Created on: 04/Jun/19 17:58 Start Date: 04/Jun/19 17:58 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8752: [BEAM-7351] increment ack deadline for flaky pubsub test URL: https://github.com/apache/beam/pull/8752#issuecomment-498777859 I'd feel more comfortable having a passing run. I'll merge after tests pass. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 253893) Time Spent: 1h (was: 50m) > Failure in Python streaming wordcount test: unexpected messages received on > output topic. > - > > Key: BEAM-7351 > URL: https://issues.apache.org/jira/browse/BEAM-7351 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > Saw this in a PostCommit test today. Likely a flake, but it's a strange > failure mode and we may need to investigate this. > {noformat} > 13:32:02 > == > 13:32:02 FAIL: test_streaming_wordcount_it > (apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT) > 13:32:02 > -- > 13:32:02 Traceback (most recent call last): > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py", > line 110, in test_streaming_wordcount_it > 13:32:02 self.test_pipeline.get_full_options_as_args(**extra_opts)) > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/streaming_wordcount.py", > line 101, in run > 13:32:02 result = p.run() > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/pipeline.py", > line 419, in run > 13:32:02 return self.runner.run_pipeline(self, self._options) > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", > line 68, in run_pipeline > 13:32:02 hc_assert_that(self.result, pickler.loads(on_success_matcher)) > 13:32:02 AssertionError: > 13:32:02 Expected: (Test pipeline expected terminated in state: RUNNING and > Expected 500 messages.) > 13:32:02 but: Expected 500 messages. Got 508 messages. Diffs (item, > count): > 13:32:02 Expected but not in actual: [] > 13:32:02 Unexpected: [(u'476: 1', 1), (u'416: 1', 1), (u'245: 1', 1), > (u'478: 1', 1), (u'58: 1', 1), (u'364: 1', 1), (u'77: 1', 1), (u'283: 1', 1)] > 13:32:02 > 13:32:02 >> begin captured logging << > > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://169.254.169.254 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/project/project-id > 13:32:02 google.auth.transport.requests: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true > 13:32:02 urllib3.connectionpool: DEBUG: Starting new HTTP connection (1): > metadata.google.internal:80 > 13:32:02 urllib3.connectionpool: DEBUG: http://metadata.google.internal:80 > "GET /computeMetadata/v1/instance/service-accounts/default/?recursive=true > HTTP/1.1" 200 144 > 13:32:02 google.auth.transport.requests: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token > 13:32:02 urllib3.connectionpool: DEBUG: http://metadata.google.internal:80 > "GET > /computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token > HTTP/1.1" 200 176 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://169.254.169.254 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/project
[jira] [Work logged] (BEAM-7351) Failure in Python streaming wordcount test: unexpected messages received on output topic.
[ https://issues.apache.org/jira/browse/BEAM-7351?focusedWorklogId=253894&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-253894 ] ASF GitHub Bot logged work on BEAM-7351: Author: ASF GitHub Bot Created on: 04/Jun/19 17:58 Start Date: 04/Jun/19 17:58 Worklog Time Spent: 10m Work Description: pabloem commented on issue #8752: [BEAM-7351] increment ack deadline for flaky pubsub test URL: https://github.com/apache/beam/pull/8752#issuecomment-498777897 Run Python PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 253894) Time Spent: 1h 10m (was: 1h) > Failure in Python streaming wordcount test: unexpected messages received on > output topic. > - > > Key: BEAM-7351 > URL: https://issues.apache.org/jira/browse/BEAM-7351 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > Saw this in a PostCommit test today. Likely a flake, but it's a strange > failure mode and we may need to investigate this. > {noformat} > 13:32:02 > == > 13:32:02 FAIL: test_streaming_wordcount_it > (apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT) > 13:32:02 > -- > 13:32:02 Traceback (most recent call last): > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py", > line 110, in test_streaming_wordcount_it > 13:32:02 self.test_pipeline.get_full_options_as_args(**extra_opts)) > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/streaming_wordcount.py", > line 101, in run > 13:32:02 result = p.run() > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/pipeline.py", > line 419, in run > 13:32:02 return self.runner.run_pipeline(self, self._options) > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", > line 68, in run_pipeline > 13:32:02 hc_assert_that(self.result, pickler.loads(on_success_matcher)) > 13:32:02 AssertionError: > 13:32:02 Expected: (Test pipeline expected terminated in state: RUNNING and > Expected 500 messages.) > 13:32:02 but: Expected 500 messages. Got 508 messages. Diffs (item, > count): > 13:32:02 Expected but not in actual: [] > 13:32:02 Unexpected: [(u'476: 1', 1), (u'416: 1', 1), (u'245: 1', 1), > (u'478: 1', 1), (u'58: 1', 1), (u'364: 1', 1), (u'77: 1', 1), (u'283: 1', 1)] > 13:32:02 > 13:32:02 >> begin captured logging << > > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://169.254.169.254 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/project/project-id > 13:32:02 google.auth.transport.requests: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true > 13:32:02 urllib3.connectionpool: DEBUG: Starting new HTTP connection (1): > metadata.google.internal:80 > 13:32:02 urllib3.connectionpool: DEBUG: http://metadata.google.internal:80 > "GET /computeMetadata/v1/instance/service-accounts/default/?recursive=true > HTTP/1.1" 200 144 > 13:32:02 google.auth.transport.requests: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token > 13:32:02 urllib3.connectionpool: DEBUG: http://metadata.google.internal:80 > "GET > /computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token > HTTP/1.1" 200 176 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://169.254.169.254 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/project/project-id > 13:32:02 google.auth.transport.req
[jira] [Work logged] (BEAM-7351) Failure in Python streaming wordcount test: unexpected messages received on output topic.
[ https://issues.apache.org/jira/browse/BEAM-7351?focusedWorklogId=253874&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-253874 ] ASF GitHub Bot logged work on BEAM-7351: Author: ASF GitHub Bot Created on: 04/Jun/19 17:40 Start Date: 04/Jun/19 17:40 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #8752: [BEAM-7351] increment ack deadline for flaky pubsub test URL: https://github.com/apache/beam/pull/8752#issuecomment-498771499 Thanks, @Juta ! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 253874) Time Spent: 50m (was: 40m) > Failure in Python streaming wordcount test: unexpected messages received on > output topic. > - > > Key: BEAM-7351 > URL: https://issues.apache.org/jira/browse/BEAM-7351 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Saw this in a PostCommit test today. Likely a flake, but it's a strange > failure mode and we may need to investigate this. > {noformat} > 13:32:02 > == > 13:32:02 FAIL: test_streaming_wordcount_it > (apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT) > 13:32:02 > -- > 13:32:02 Traceback (most recent call last): > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py", > line 110, in test_streaming_wordcount_it > 13:32:02 self.test_pipeline.get_full_options_as_args(**extra_opts)) > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/streaming_wordcount.py", > line 101, in run > 13:32:02 result = p.run() > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/pipeline.py", > line 419, in run > 13:32:02 return self.runner.run_pipeline(self, self._options) > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", > line 68, in run_pipeline > 13:32:02 hc_assert_that(self.result, pickler.loads(on_success_matcher)) > 13:32:02 AssertionError: > 13:32:02 Expected: (Test pipeline expected terminated in state: RUNNING and > Expected 500 messages.) > 13:32:02 but: Expected 500 messages. Got 508 messages. Diffs (item, > count): > 13:32:02 Expected but not in actual: [] > 13:32:02 Unexpected: [(u'476: 1', 1), (u'416: 1', 1), (u'245: 1', 1), > (u'478: 1', 1), (u'58: 1', 1), (u'364: 1', 1), (u'77: 1', 1), (u'283: 1', 1)] > 13:32:02 > 13:32:02 >> begin captured logging << > > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://169.254.169.254 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/project/project-id > 13:32:02 google.auth.transport.requests: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true > 13:32:02 urllib3.connectionpool: DEBUG: Starting new HTTP connection (1): > metadata.google.internal:80 > 13:32:02 urllib3.connectionpool: DEBUG: http://metadata.google.internal:80 > "GET /computeMetadata/v1/instance/service-accounts/default/?recursive=true > HTTP/1.1" 200 144 > 13:32:02 google.auth.transport.requests: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token > 13:32:02 urllib3.connectionpool: DEBUG: http://metadata.google.internal:80 > "GET > /computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token > HTTP/1.1" 200 176 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://169.254.169.254 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/project/project-id > 13:32:02 google.auth.transport.requests: DE
[jira] [Work logged] (BEAM-7351) Failure in Python streaming wordcount test: unexpected messages received on output topic.
[ https://issues.apache.org/jira/browse/BEAM-7351?focusedWorklogId=253873&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-253873 ] ASF GitHub Bot logged work on BEAM-7351: Author: ASF GitHub Bot Created on: 04/Jun/19 17:39 Start Date: 04/Jun/19 17:39 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #8752: [BEAM-7351] increment ack deadline for flaky pubsub test URL: https://github.com/apache/beam/pull/8752#issuecomment-498771302 I checked that streaming wordcount tests passed in all Py2 and Py3 test suites. ``` 03:41:46 test_streaming_wordcount_it (apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT) ... ok ``` We have other PRs in flight to address flakiness in other tests, so I think we can go ahead with the merge. @pabloem, could you please help with the merge? Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 253873) Time Spent: 40m (was: 0.5h) > Failure in Python streaming wordcount test: unexpected messages received on > output topic. > - > > Key: BEAM-7351 > URL: https://issues.apache.org/jira/browse/BEAM-7351 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > Saw this in a PostCommit test today. Likely a flake, but it's a strange > failure mode and we may need to investigate this. > {noformat} > 13:32:02 > == > 13:32:02 FAIL: test_streaming_wordcount_it > (apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT) > 13:32:02 > -- > 13:32:02 Traceback (most recent call last): > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py", > line 110, in test_streaming_wordcount_it > 13:32:02 self.test_pipeline.get_full_options_as_args(**extra_opts)) > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/streaming_wordcount.py", > line 101, in run > 13:32:02 result = p.run() > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/pipeline.py", > line 419, in run > 13:32:02 return self.runner.run_pipeline(self, self._options) > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", > line 68, in run_pipeline > 13:32:02 hc_assert_that(self.result, pickler.loads(on_success_matcher)) > 13:32:02 AssertionError: > 13:32:02 Expected: (Test pipeline expected terminated in state: RUNNING and > Expected 500 messages.) > 13:32:02 but: Expected 500 messages. Got 508 messages. Diffs (item, > count): > 13:32:02 Expected but not in actual: [] > 13:32:02 Unexpected: [(u'476: 1', 1), (u'416: 1', 1), (u'245: 1', 1), > (u'478: 1', 1), (u'58: 1', 1), (u'364: 1', 1), (u'77: 1', 1), (u'283: 1', 1)] > 13:32:02 > 13:32:02 >> begin captured logging << > > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://169.254.169.254 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/project/project-id > 13:32:02 google.auth.transport.requests: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true > 13:32:02 urllib3.connectionpool: DEBUG: Starting new HTTP connection (1): > metadata.google.internal:80 > 13:32:02 urllib3.connectionpool: DEBUG: http://metadata.google.internal:80 > "GET /computeMetadata/v1/instance/service-accounts/default/?recursive=true > HTTP/1.1" 200 144 > 13:32:02 google.auth.transport.requests: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token > 13:32:02 urllib3.connectionpool: DEBUG: http://metadata.google.internal:80 > "GET > /computeMetadata/v1/instance/service-accounts/
[jira] [Commented] (BEAM-7351) Failure in Python streaming wordcount test: unexpected messages received on output topic.
[ https://issues.apache.org/jira/browse/BEAM-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855937#comment-16855937 ] Valentyn Tymofieiev commented on BEAM-7351: --- Thanks, [~Juta]. This explanation makes sense to me, and we do randomize the topic with a random uuid suffix, so a topic collision is unlikely. > Failure in Python streaming wordcount test: unexpected messages received on > output topic. > - > > Key: BEAM-7351 > URL: https://issues.apache.org/jira/browse/BEAM-7351 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Valentyn Tymofieiev >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Saw this in a PostCommit test today. Likely a flake, but it's a strange > failure mode and we may need to investigate this. > {noformat} > 13:32:02 > == > 13:32:02 FAIL: test_streaming_wordcount_it > (apache_beam.examples.streaming_wordcount_it_test.StreamingWordCountIT) > 13:32:02 > -- > 13:32:02 Traceback (most recent call last): > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/streaming_wordcount_it_test.py", > line 110, in test_streaming_wordcount_it > 13:32:02 self.test_pipeline.get_full_options_as_args(**extra_opts)) > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/examples/streaming_wordcount.py", > line 101, in run > 13:32:02 result = p.run() > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/pipeline.py", > line 419, in run > 13:32:02 return self.runner.run_pipeline(self, self._options) > 13:32:02 File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify_PR/src/sdks/python/apache_beam/runners/dataflow/test_dataflow_runner.py", > line 68, in run_pipeline > 13:32:02 hc_assert_that(self.result, pickler.loads(on_success_matcher)) > 13:32:02 AssertionError: > 13:32:02 Expected: (Test pipeline expected terminated in state: RUNNING and > Expected 500 messages.) > 13:32:02 but: Expected 500 messages. Got 508 messages. Diffs (item, > count): > 13:32:02 Expected but not in actual: [] > 13:32:02 Unexpected: [(u'476: 1', 1), (u'416: 1', 1), (u'245: 1', 1), > (u'478: 1', 1), (u'58: 1', 1), (u'364: 1', 1), (u'77: 1', 1), (u'283: 1', 1)] > 13:32:02 > 13:32:02 >> begin captured logging << > > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://169.254.169.254 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/project/project-id > 13:32:02 google.auth.transport.requests: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true > 13:32:02 urllib3.connectionpool: DEBUG: Starting new HTTP connection (1): > metadata.google.internal:80 > 13:32:02 urllib3.connectionpool: DEBUG: http://metadata.google.internal:80 > "GET /computeMetadata/v1/instance/service-accounts/default/?recursive=true > HTTP/1.1" 200 144 > 13:32:02 google.auth.transport.requests: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token > 13:32:02 urllib3.connectionpool: DEBUG: http://metadata.google.internal:80 > "GET > /computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token > HTTP/1.1" 200 176 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://169.254.169.254 > 13:32:02 google.auth.transport._http_client: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/project/project-id > 13:32:02 google.auth.transport.requests: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true > 13:32:02 urllib3.connectionpool: DEBUG: Starting new HTTP connection (1): > metadata.google.internal:80 > 13:32:02 urllib3.connectionpool: DEBUG: http://metadata.google.internal:80 > "GET /computeMetadata/v1/instance/service-accounts/default/?recursive=true > HTTP/1.1" 200 144 > 13:32:02 google.auth.transport.requests: DEBUG: Making request: GET > http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/844138762903-comp...@developer.gserviceaccount.com/token > 13:32:02
[jira] [Work logged] (BEAM-3606) Improve our relationship, or lack thereof, with Guava
[ https://issues.apache.org/jira/browse/BEAM-3606?focusedWorklogId=253861&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-253861 ] ASF GitHub Bot logged work on BEAM-3606: Author: ASF GitHub Bot Created on: 04/Jun/19 17:31 Start Date: 04/Jun/19 17:31 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #8759: [BEAM-3606] Remove guava-testlib from sdks/java/core main dependency set URL: https://github.com/apache/beam/pull/8759#issuecomment-498768132 R: @youngoli @Ardagan This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 253861) Time Spent: 0.5h (was: 20m) > Improve our relationship, or lack thereof, with Guava > - > > Key: BEAM-3606 > URL: https://issues.apache.org/jira/browse/BEAM-3606 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp, runner-core, sdk-java-core >Reporter: Kenneth Knowles >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > This is an umbrella task for things that might move us off Guava, such as > replicating our own little utilities or moving to Java 8 features. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3606) Improve our relationship, or lack thereof, with Guava
[ https://issues.apache.org/jira/browse/BEAM-3606?focusedWorklogId=253860&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-253860 ] ASF GitHub Bot logged work on BEAM-3606: Author: ASF GitHub Bot Created on: 04/Jun/19 17:31 Start Date: 04/Jun/19 17:31 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #8759: [BEAM-3606] Remove guava-testlib from sdks/java/core main dependency set URL: https://github.com/apache/beam/pull/8759#issuecomment-498768132 R: @youngoli This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 253860) Time Spent: 20m (was: 10m) > Improve our relationship, or lack thereof, with Guava > - > > Key: BEAM-3606 > URL: https://issues.apache.org/jira/browse/BEAM-3606 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp, runner-core, sdk-java-core >Reporter: Kenneth Knowles >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > This is an umbrella task for things that might move us off Guava, such as > replicating our own little utilities or moving to Java 8 features. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-3606) Improve our relationship, or lack thereof, with Guava
[ https://issues.apache.org/jira/browse/BEAM-3606?focusedWorklogId=253858&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-253858 ] ASF GitHub Bot logged work on BEAM-3606: Author: ASF GitHub Bot Created on: 04/Jun/19 17:27 Start Date: 04/Jun/19 17:27 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #8759: [BEAM-3606] Remove guava-testlib from sdks/java/core main dependency set URL: https://github.com/apache/beam/pull/8759 This replaces the usage of an EqualsTester with our own implementation. The eventual goal is to stop shading guava as a default and then eventually to stop shading modules by default. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.or
[jira] [Updated] (BEAM-7259) Add monthly aggregated view for "Stability critical jobs status - Greenness per week"
[ https://issues.apache.org/jira/browse/BEAM-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira updated BEAM-7259: -- Status: Open (was: Triage Needed) > Add monthly aggregated view for "Stability critical jobs status - Greenness > per week" > - > > Key: BEAM-7259 > URL: https://issues.apache.org/jira/browse/BEAM-7259 > Project: Beam > Issue Type: New Feature > Components: project-management, testing >Reporter: Daniel Oliveira >Assignee: Mikhail Gryzykhin >Priority: Minor > > Hey Mikhail, could you create a view of the Greenness metric in Grafana > ([Stability Critical Jobs > Status|http://104.154.241.245/d/McTAiu0ik/stability-critical-jobs-status?orgId=1]) > that's aggregated per month? Specifically, as an average greenness for an > entire calendar month (ex. April 1 - April 30). > I need to track this metric and seeing the weekly greenness makes it hard to > average over a month, especially because different weeks may have different > numbers of runs, and they don't line up perfectly over a calendar month. I > already tried changing the timespan in the upper right, but when I set it to > a month range the graph's points are still one point per week instead of a > monthly average. > Also just to be clear, this shouldn't replace the existing graph, just be a > new one placed wherever makes the most sense. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-7259) Add monthly aggregated view for "Stability critical jobs status - Greenness per week"
[ https://issues.apache.org/jira/browse/BEAM-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Oliveira resolved BEAM-7259. --- Resolution: Implemented Fix Version/s: Not applicable > Add monthly aggregated view for "Stability critical jobs status - Greenness > per week" > - > > Key: BEAM-7259 > URL: https://issues.apache.org/jira/browse/BEAM-7259 > Project: Beam > Issue Type: New Feature > Components: project-management, testing >Reporter: Daniel Oliveira >Assignee: Mikhail Gryzykhin >Priority: Minor > Fix For: Not applicable > > > Hey Mikhail, could you create a view of the Greenness metric in Grafana > ([Stability Critical Jobs > Status|http://104.154.241.245/d/McTAiu0ik/stability-critical-jobs-status?orgId=1]) > that's aggregated per month? Specifically, as an average greenness for an > entire calendar month (ex. April 1 - April 30). > I need to track this metric and seeing the weekly greenness makes it hard to > average over a month, especially because different weeks may have different > numbers of runs, and they don't line up perfectly over a calendar month. I > already tried changing the timespan in the upper right, but when I set it to > a month range the graph's points are still one point per week instead of a > monthly average. > Also just to be clear, this shouldn't replace the existing graph, just be a > new one placed wherever makes the most sense. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-7259) Add monthly aggregated view for "Stability critical jobs status - Greenness per week"
[ https://issues.apache.org/jira/browse/BEAM-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855920#comment-16855920 ] Daniel Oliveira commented on BEAM-7259: --- This was added: http://104.154.241.245/d/McTAiu0ik/stability-critical-jobs-status?orgId=1 https://github.com/apache/beam/pull/8630 Forgot to update this bug. I'll mark it closed. > Add monthly aggregated view for "Stability critical jobs status - Greenness > per week" > - > > Key: BEAM-7259 > URL: https://issues.apache.org/jira/browse/BEAM-7259 > Project: Beam > Issue Type: New Feature > Components: project-management, testing >Reporter: Daniel Oliveira >Assignee: Mikhail Gryzykhin >Priority: Minor > > Hey Mikhail, could you create a view of the Greenness metric in Grafana > ([Stability Critical Jobs > Status|http://104.154.241.245/d/McTAiu0ik/stability-critical-jobs-status?orgId=1]) > that's aggregated per month? Specifically, as an average greenness for an > entire calendar month (ex. April 1 - April 30). > I need to track this metric and seeing the weekly greenness makes it hard to > average over a month, especially because different weeks may have different > numbers of runs, and they don't line up perfectly over a calendar month. I > already tried changing the timespan in the upper right, but when I set it to > a month range the graph's points are still one point per week instead of a > monthly average. > Also just to be clear, this shouldn't replace the existing graph, just be a > new one placed wherever makes the most sense. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7488) Grafana request: Add monthly aggregated view for "Precommit Test Latency - Time In Queue"
Daniel Oliveira created BEAM-7488: - Summary: Grafana request: Add monthly aggregated view for "Precommit Test Latency - Time In Queue" Key: BEAM-7488 URL: https://issues.apache.org/jira/browse/BEAM-7488 Project: Beam Issue Type: New Feature Components: project-management, testing Reporter: Daniel Oliveira Assignee: Mikhail Gryzykhin Similar to [BEAM-7259|https://issues.apache.org/jira/browse/BEAM-7259], was hoping to get this metric in a view that's aggregated per month. In this case averaging all the queue times seems like it would work well. Even better if you could average only the 95th percentile. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7357) Kinesis IO.write throws LimitExceededException
[ https://issues.apache.org/jira/browse/BEAM-7357?focusedWorklogId=253855&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-253855 ] ASF GitHub Bot logged work on BEAM-7357: Author: ASF GitHub Bot Created on: 04/Jun/19 17:16 Start Date: 04/Jun/19 17:16 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on pull request #8758: [BEAM-7357] Remove stream existence check URL: https://github.com/apache/beam/pull/8758 AWS Kinesis has a limit of 10 transactions of `DescribeStream` (which is used to check a stream) call per second per account. It will fail in case if we run a pipeline with many (>10) workers. So, we move the responsibility of this checking to AWS Kinesis. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompleted
[jira] [Work logged] (BEAM-7141) Expose kv and window parameters for on_timer
[ https://issues.apache.org/jira/browse/BEAM-7141?focusedWorklogId=253837&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-253837 ] ASF GitHub Bot logged work on BEAM-7141: Author: ASF GitHub Bot Created on: 04/Jun/19 16:27 Start Date: 04/Jun/19 16:27 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #8739: BEAM-7141: add key value timer callback URL: https://github.com/apache/beam/pull/8739#discussion_r290385191 ## File path: sdks/python/apache_beam/runners/common.py ## @@ -183,6 +184,8 @@ def __init__(self, obj_to_invoke, method_name): self.timestamp_arg_name = kw elif v == core.DoFn.WindowParam: self.window_arg_name = kw + elif v == core.DoFn.KeyParam: Review comment: You have a valid a point. There may not always be a key that can be passed to process() method. Although sometimes there might be. Now that we have a KeyParam, a user might right a process method using that e.g. (process(mykey=KeyParam). What would be the reasonable thing to do in this case: 1. We can do nothing and mykey will literally have the value KeyParam. 2. We can fail with an error saying that KeyParam is not a valid parameter for process method. 3. We can try to pass the key (e.g. k, v = element) and set mykey = k. And if that fails (i.e. element is not a K,V) we can fail. I was leaning towards the third option. I think second option would also be fine and the first option will be confusing. What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 253837) Time Spent: 5h (was: 4h 50m) > Expose kv and window parameters for on_timer > > > Key: BEAM-7141 > URL: https://issues.apache.org/jira/browse/BEAM-7141 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Affects Versions: 2.12.0 >Reporter: Thomas Weise >Assignee: Rakesh Kumar >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > We would like to have access to key and window inside the timer callback. > Without, it is also difficult to debug. We run into this while working on > BEAM-7112 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7427) JmsCheckpointMark AVRO Serialisation issue with unbounded Source
[ https://issues.apache.org/jira/browse/BEAM-7427?focusedWorklogId=253835&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-253835 ] ASF GitHub Bot logged work on BEAM-7427: Author: ASF GitHub Bot Created on: 04/Jun/19 16:20 Start Date: 04/Jun/19 16:20 Worklog Time Spent: 10m Work Description: zouabimourad commented on pull request #8757: [BEAM-7427] Fix JmsCheckpointMark Avro Encoding URL: https://github.com/apache/beam/pull/8757 `JmsCheckpointMark` can't be serialized with AvroCoder beause of the inner class `State` Avro schema generation of `JmsCheckpointMark` fails the externalisation of the inner class fixes the issue. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | --- Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_Verify/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python3_Verify/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/) [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/bea
[jira] [Resolved] (BEAM-6695) Latest transform for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-6695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanay Tummalapalli resolved BEAM-6695. -- Resolution: Fixed PR merged. > Latest transform for Python SDK > --- > > Key: BEAM-6695 > URL: https://issues.apache.org/jira/browse/BEAM-6695 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Tanay Tummalapalli >Priority: Minor > Fix For: 2.14.0 > > Time Spent: 10h 50m > Remaining Estimate: 0h > > Add a PTransform} and Combine.CombineFn for computing the latest element in a > PCollection. > It should offer the same API as its Java counterpart: > https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Latest.java -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6695) Latest transform for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-6695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tanay Tummalapalli updated BEAM-6695: - Fix Version/s: 2.14.0 > Latest transform for Python SDK > --- > > Key: BEAM-6695 > URL: https://issues.apache.org/jira/browse/BEAM-6695 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Tanay Tummalapalli >Priority: Minor > Fix For: 2.14.0 > > Time Spent: 10h 50m > Remaining Estimate: 0h > > Add a PTransform} and Combine.CombineFn for computing the latest element in a > PCollection. > It should offer the same API as its Java counterpart: > https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Latest.java -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7437) Integration Test for BQ streaming inserts for streaming pipelines
[ https://issues.apache.org/jira/browse/BEAM-7437?focusedWorklogId=253757&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-253757 ] ASF GitHub Bot logged work on BEAM-7437: Author: ASF GitHub Bot Created on: 04/Jun/19 14:09 Start Date: 04/Jun/19 14:09 Worklog Time Spent: 10m Work Description: ttanay commented on issue #8748: [BEAM-7437] BQ integration test for streaming inserts in streaming URL: https://github.com/apache/beam/pull/8748#issuecomment-498688272 R: @pabloem This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 253757) Time Spent: 50m (was: 40m) > Integration Test for BQ streaming inserts for streaming pipelines > - > > Key: BEAM-7437 > URL: https://issues.apache.org/jira/browse/BEAM-7437 > Project: Beam > Issue Type: Test > Components: io-python-gcp >Affects Versions: 2.12.0 >Reporter: Tanay Tummalapalli >Assignee: Tanay Tummalapalli >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > Integration Test for BigQuery Sink using Streaming Inserts for streaming > pipelines. > Integration tests currently exist for batch pipelines, it can also be added > for streaming pipelines using TestStream. This will be a precursor to the > failing integration test to be added for [BEAM-6611| > https://issues.apache.org/jira/browse/BEAM-6611]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (BEAM-7437) Integration Test for BQ streaming inserts for streaming pipelines
[ https://issues.apache.org/jira/browse/BEAM-7437?focusedWorklogId=253755&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-253755 ] ASF GitHub Bot logged work on BEAM-7437: Author: ASF GitHub Bot Created on: 04/Jun/19 14:09 Start Date: 04/Jun/19 14:09 Worklog Time Spent: 10m Work Description: ttanay commented on issue #8748: [BEAM-7437] BQ integration test for streaming inserts in streaming URL: https://github.com/apache/beam/pull/8748#issuecomment-498688165 The test - `test_multiple_destinations_transform_streaming` in `BigQueryStreamingInsertTransformIntegrationTests` passes on the TestDirectRunner and skips on TestDataflowRunner as expected. The failure is not because of the test added in this PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 253755) Time Spent: 40m (was: 0.5h) > Integration Test for BQ streaming inserts for streaming pipelines > - > > Key: BEAM-7437 > URL: https://issues.apache.org/jira/browse/BEAM-7437 > Project: Beam > Issue Type: Test > Components: io-python-gcp >Affects Versions: 2.12.0 >Reporter: Tanay Tummalapalli >Assignee: Tanay Tummalapalli >Priority: Minor > Time Spent: 40m > Remaining Estimate: 0h > > Integration Test for BigQuery Sink using Streaming Inserts for streaming > pipelines. > Integration tests currently exist for batch pipelines, it can also be added > for streaming pipelines using TestStream. This will be a precursor to the > failing integration test to be added for [BEAM-6611| > https://issues.apache.org/jira/browse/BEAM-6611]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)