[jira] [Work logged] (BEAM-8996) Auto-generate pipeline options documentation for FlinkRunner
[ https://issues.apache.org/jira/browse/BEAM-8996?focusedWorklogId=362081&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-362081 ] ASF GitHub Bot logged work on BEAM-8996: Author: ASF GitHub Bot Created on: 21/Dec/19 06:23 Start Date: 21/Dec/19 06:23 Worklog Time Spent: 10m Work Description: lgajowy commented on issue #10434: [BEAM-8996] Improvements to the Flink runner page URL: https://github.com/apache/beam/pull/10434#issuecomment-568156779 Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 362081) Time Spent: 3h 50m (was: 3h 40m) > Auto-generate pipeline options documentation for FlinkRunner > > > Key: BEAM-8996 > URL: https://issues.apache.org/jira/browse/BEAM-8996 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.19.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > The documentation on the pipeline options on the [runner > page|https://beam.apache.org/documentation/runners/flink/] easily becomes > outdated. In order for them to stay up to date, we should auto-generate the > documentation from the {{FlinkPipelineOptions}} class. This should be done > for both Java and Python. -- This message was sent by Atlassian Jira (v8.3.4#803005)
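The issue proposes generating the options table from the options class so that it stays in sync with the code. Here is a minimal Python sketch of that idea. It assumes that each option's help text lives in `dataclasses.field` metadata. `FlinkOptionsSketch` is hypothetical, and its field names and defaults are only illustrative; it is not the real `FlinkPipelineOptions`.

```python
from dataclasses import dataclass, field, fields


@dataclass
class FlinkOptionsSketch:
    """Hypothetical options class; names and defaults are illustrative only."""
    flink_master: str = field(
        default="[auto]", metadata={"help": "Address of the Flink master."})
    parallelism: int = field(
        default=-1, metadata={"help": "Degree of parallelism for the job."})


def options_table(options_cls) -> str:
    """Render one Markdown table row per declared option, in declaration order."""
    rows = ["| Option | Default | Description |", "|---|---|---|"]
    for fld in fields(options_cls):
        rows.append(
            f"| `{fld.name}` | `{fld.default}` | {fld.metadata.get('help', '')} |")
    return "\n".join(rows)


print(options_table(FlinkOptionsSketch))
```

Adding a field to the class adds a row to the table, so the documentation cannot drift from the declared options.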
[jira] [Work logged] (BEAM-8996) Auto-generate pipeline options documentation for FlinkRunner
[ https://issues.apache.org/jira/browse/BEAM-8996?focusedWorklogId=362080&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-362080 ] ASF GitHub Bot logged work on BEAM-8996: Author: ASF GitHub Bot Created on: 21/Dec/19 06:22 Start Date: 21/Dec/19 06:22 Worklog Time Spent: 10m Work Description: lgajowy commented on pull request #10434: [BEAM-8996] Improvements to the Flink runner page URL: https://github.com/apache/beam/pull/10434 Issue Time Tracking --- Worklog Id: (was: 362080) Time Spent: 3h 40m (was: 3.5h) > Auto-generate pipeline options documentation for FlinkRunner > > > Key: BEAM-8996 > URL: https://issues.apache.org/jira/browse/BEAM-8996 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.19.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > The documentation on the pipeline options on the [runner > page|https://beam.apache.org/documentation/runners/flink/] easily becomes > outdated. In order for them to stay up to date, we should auto-generate the > documentation from the {{FlinkPipelineOptions}} class. This should be done > for both Java and Python.
[jira] [Work logged] (BEAM-8676) Beam Dependency Update Request: com.google.api:gax-grpc
[ https://issues.apache.org/jira/browse/BEAM-8676?focusedWorklogId=362019&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-362019 ] ASF GitHub Bot logged work on BEAM-8676: Author: ASF GitHub Bot Created on: 21/Dec/19 03:58 Start Date: 21/Dec/19 03:58 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10438: [BEAM-8676] sdks/java: gax and grpc upgrades URL: https://github.com/apache/beam/pull/10438#issuecomment-568149841 R: @kennknowles Issue Time Tracking --- Worklog Id: (was: 362019) Time Spent: 3h 10m (was: 3h) > Beam Dependency Update Request: com.google.api:gax-grpc > --- > > Key: BEAM-8676 > URL: https://issues.apache.org/jira/browse/BEAM-8676 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Tomo Suzuki >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > - 2019-11-15 19:38:32.410774 > - > Please consider upgrading the dependency com.google.api:gax-grpc. > The current version is 1.38.0. The latest version is 1.50.1 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-11-19 21:03:23.809273 > - > Please consider upgrading the dependency com.google.api:gax-grpc. > The current version is 1.38.0. The latest version is 1.50.1 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-02 12:08:16.165687 > - > Please consider upgrading the dependency com.google.api:gax-grpc. > The current version is 1.38.0. 
The latest version is 1.50.1 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-09 12:07:17.894174 > - > Please consider upgrading the dependency com.google.api:gax-grpc. > The current version is 1.38.0. The latest version is 1.51.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above.
[jira] [Commented] (BEAM-9005) Go SDK post-commit failures due to https://github.com/apache/beam/pull/10183
[ https://issues.apache.org/jira/browse/BEAM-9005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001616#comment-17001616 ] Chamikara Madhusanka Jayalath commented on BEAM-9005: - Regarding the Flink and Spark VR failures, this seems to be due to the environment ID not being set for some of the ParDo transforms in the generated runner API proto. I set the environment ID here: [https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L272] But I think there are other locations where the Go SDK generates ParDo transforms that do not go through this location during translation. Because of this, Spark and Flink fail, since some ParDos do not have an environment set. [~lostluck] and [~danoliveira], any idea? Is there any other location where the Go SDK generates ParDos? I suspect COGBK but I'm not sure. > Go SDK post-commit failures due to https://github.com/apache/beam/pull/10183 > - > > Key: BEAM-9005 > URL: https://issues.apache.org/jira/browse/BEAM-9005 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Critical > Time Spent: 1.5h > Remaining Estimate: 0h > > Looking into this. > > cc: [~bhulette] [~lostluck] [~danoliveira]
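To find ParDo transforms that were emitted without an environment ID, you can scan the translated pipeline's transforms. Here is a minimal Python sketch. It assumes the transforms are available as plain dicts keyed by transform ID, each with `urn` and `environment_id` entries. That dict layout is a simplification of the runner API proto, and `pardos_missing_environment` is a hypothetical helper, not part of any Beam SDK.

```python
PAR_DO_URN = "beam:transform:pardo:v1"


def pardos_missing_environment(transforms):
    """Return the sorted IDs of ParDo transforms with an empty or absent environment_id."""
    return sorted(
        tid for tid, t in transforms.items()
        if t.get("urn") == PAR_DO_URN and not t.get("environment_id"))


example = {
    "s1": {"urn": PAR_DO_URN, "environment_id": "go"},
    "s2": {"urn": PAR_DO_URN, "environment_id": ""},
    "s3": {"urn": "beam:transform:group_by_key:v1"},
}
print(pardos_missing_environment(example))  # ['s2']
```

Running a check like this after translation would show which code path, such as a CoGBK expansion, produced the offending ParDos.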
[jira] [Work logged] (BEAM-8951) Stop using nose in load tests
[ https://issues.apache.org/jira/browse/BEAM-8951?focusedWorklogId=362017&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-362017 ] ASF GitHub Bot logged work on BEAM-8951: Author: ASF GitHub Bot Created on: 21/Dec/19 03:13 Start Date: 21/Dec/19 03:13 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10435: [BEAM-8951] Stop using nose in load tests URL: https://github.com/apache/beam/pull/10435#issuecomment-568147273 I didn't have time to take a look today, and I am planning to be out next week. @udim has converted several suites to pytest recently and may have some feedback here. With nose, I think we had to configure output collectors via XML files; see https://github.com/apache/beam/blob/754b64b4a59f717d84032570acb8ed4cad87b227/sdks/python/scripts/run_integration_test.sh#L248 for an example. I have not yet learned how to change output collection with pytest. Issue Time Tracking --- Worklog Id: (was: 362017) Time Spent: 1h 40m (was: 1.5h) > Stop using nose in load tests > - > > Key: BEAM-8951 > URL: https://issues.apache.org/jira/browse/BEAM-8951 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Fix For: Not applicable > > Time Spent: 1h 40m > Remaining Estimate: 0h > > The community is considering moving away from nose to pytest: > https://issues.apache.org/jira/browse/BEAM-3713. We should change the way of > running Python load tests: instead of being subclasses of > `unittest.TestCase`, they could be plain Python scripts, just like wordcount > examples.
This will bring one additional benefit: the _LOAD_TEST_ENABLED_ guard > will no longer be needed and can be safely removed.
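BEAM-8951 proposes turning load tests into plain scripts, like the wordcount examples. A rough sketch of that shape follows. `run`, `--input_size`, and `--publish_metrics` are hypothetical names, not the real load-test code. The point is that a `main` entry point replaces the `unittest.TestCase` subclass. A unittest-style collector then has no `TestCase` to discover, which is presumably why the guard would become unnecessary.

```python
import argparse


def run(argv=None):
    """Entry point for a hypothetical load-test script; returns parsed options."""
    parser = argparse.ArgumentParser(
        description="Hypothetical load test", allow_abbrev=False)
    parser.add_argument("--input_size", type=int, default=1000,
                        help="Number of synthetic elements to generate.")
    parser.add_argument("--publish_metrics", action="store_true",
                        help="Publish results instead of only logging them.")
    # Unknown flags (e.g. --runner) are left for the pipeline to consume.
    known, pipeline_args = parser.parse_known_args(argv)
    # A real script would build and run a Beam pipeline from pipeline_args
    # here, then collect and publish its metrics.
    known.pipeline_args = pipeline_args
    return known


if __name__ == "__main__":
    run()
```

You run the test the way you run an example, e.g. `python my_load_test.py --input_size=1000 --runner=DirectRunner`, with no test runner involved.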
[jira] [Work logged] (BEAM-9010) BigQuery TableRow's size is toString().length() ?
[ https://issues.apache.org/jira/browse/BEAM-9010?focusedWorklogId=362015&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-362015 ] ASF GitHub Bot logged work on BEAM-9010: Author: ASF GitHub Bot Created on: 21/Dec/19 03:00 Start Date: 21/Dec/19 03:00 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10444: [BEAM-9010] Proper TableRow size calculation via TableRowJsonCoder URL: https://github.com/apache/beam/pull/10444#issuecomment-568146219 R: @reuvenlax Issue Time Tracking --- Worklog Id: (was: 362015) Time Spent: 1h 50m (was: 1h 40m) > BigQuery TableRow's size is toString().length() ? > - > > Key: BEAM-9010 > URL: https://issues.apache.org/jira/browse/BEAM-9010 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Minor > Attachments: TableRowJsonCoder_behavior_remains_same.png > > Time Spent: 1h 50m > Remaining Estimate: 0h > > The following tests failed when I tried to upgrade google-http-client to 1.34.0 > from 1.28.0: > {noformat} > org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithoutStreamingBuffer > org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithStreamingBuffer > org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtilTest.testInsertAll > {noformat} > [https://builds.apache.org/job/beam_PreCommit_Java_Commit/9288/#showFailuresLink] > h3.
Reason for the test failures > [org.apache.beam.sdk.io.gcp.testing.TableContainer|https://github.com/apache/beam/blob/6fa94c9/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/TableContainer.java#L43] > and > [org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl|https://github.com/apache/beam/blob/c2f0d28/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L758] > rely on {{TableRow.toString().length()}} to calculate the size. Example: > {code:java} > dataSize += row.toString().length(); > if (dataSize >= maxRowBatchSize > || rows.size() >= maxRowsPerBatch > || i == rowsToPublish.size() - 1) { > {code} > However, with [google-http-client's > PR#589|https://github.com/googleapis/google-http-java-client/pull/589/files#diff-914cd7ff18143b3d2398149e1cfb4f45R218], > the GenericData.toString output has changed since v1.29.0. > In the old google-http-client 1.28.0, an example row's toString returned: > {noformat} > {f=[{v=foo}, {v=1234}]} > {noformat} > In google-http-client 1.29.0 and higher, the same row's toString returns: > {noformat} > GenericData{classInfo=[f], {f=[GenericData{classInfo=[v], {v=foo}}, > GenericData{classInfo=[v], {v=1234}}]}} > {noformat} > h1. Question: > Is it the right thing to rely on {{toString().length()}} in the BigQuery > classes?
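The fragility is not specific to Java. Any size computed from debug output shifts whenever a library changes its formatting. The PR title points to the alternative: measure the bytes of a real encoding. Here is a Python sketch of the difference. `json.dumps` stands in for `TableRowJsonCoder`, the row is modeled as a plain dict (an assumption), and the two strings are the `toString()` outputs quoted in the issue.

```python
import json

# The same row, as rendered by the two toString() formats quoted in the issue.
OLD_TO_STRING = "{f=[{v=foo}, {v=1234}]}"
NEW_TO_STRING = ("GenericData{classInfo=[f], {f=[GenericData{classInfo=[v], "
                 "{v=foo}}, GenericData{classInfo=[v], {v=1234}}]}}")

# The row's data, modeled as a plain dict (an assumption for this sketch).
ROW = {"f": [{"v": "foo"}, {"v": "1234"}]}


def encoded_size(row) -> int:
    """Size in UTF-8 bytes of a compact, deterministic JSON encoding of the row."""
    encoded = json.dumps(row, separators=(",", ":"), sort_keys=True)
    return len(encoded.encode("utf-8"))


# A size based on debug output changes when the library's formatting does...
print(len(OLD_TO_STRING), len(NEW_TO_STRING))
# ...while the encoded size depends only on the data.
print(encoded_size(ROW))
```

Any deterministic encoding works here. What matters is that the size comes from the data format, not from a dependency's debug output.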
[jira] [Commented] (BEAM-8974) apache_beam.runners.worker.log_handler_test.FnApiLogRecordHandlerTest.test_exc_info is flaky
[ https://issues.apache.org/jira/browse/BEAM-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001606#comment-17001606 ] Valentyn Tymofieiev commented on BEAM-8974: --- Thanks, everyone. We can reopen if this comes up again. > apache_beam.runners.worker.log_handler_test.FnApiLogRecordHandlerTest.test_exc_info > is flaky > > > Key: BEAM-8974 > URL: https://issues.apache.org/jira/browse/BEAM-8974 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: Valentyn Tymofieiev >Assignee: Robert Bradshaw >Priority: Major > Fix For: 2.18.0 > > Time Spent: 50m > Remaining Estimate: 0h > > The test is failing at apache_beam/runners/worker/log_handler_test.py:110: > IndexError > Added in https://github.com/apache/beam/pull/10292 > Sample job: [https://builds.apache.org/job/beam_PreCommit_Python_Cron/2160/] > Console logs: > {noformat} > 06:37:37 === FAILURES > === > 06:37:37 ___ FnApiLogRecordHandlerTest.test_exc_info > > 06:37:37 [gw1] linux2 -- Python 2.7.12 > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/target/.tox-py27-gcp-pytest/py27-gcp-pytest/bin/python > 06:37:37 > 06:37:37 self = > testMethod=test_exc_info> > 06:37:37 > 06:37:37 def test_exc_info(self): > 06:37:37 try: > 06:37:37 raise ValueError('some message') > 06:37:37 except ValueError: > 06:37:37 _LOGGER.error('some error', exc_info=True) > 06:37:37 > 06:37:37 self.fn_log_handler.close() > 06:37:37 > 06:37:37 > log_entry = > self.test_logging_service.log_records_received[0].log_entries[0] > 06:37:37 E IndexError: list index out of range > 06:37:37 > 06:37:37 apache_beam/runners/worker/log_handler_test.py:110: IndexError > 06:37:37 - Captured stderr call > - > 06:37:37 ERROR:apache_beam.runners.worker.log_handler_test:some error > 06:37:37 Traceback (most recent call last): > 06:37:37 File >
"/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/log_handler_test.py", > line 104, in test_exc_info > 06:37:37 raise ValueError('some message') > 06:37:37 ValueError: some message > 06:37:37 -- Captured log call > --- > 06:37:37 ERROR > apache_beam.runners.worker.log_handler_test:log_handler_test.py:106 some error > 06:37:37 Traceback (most recent call last): > 06:37:37 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/log_handler_test.py", > line 104, in test_exc_info > 06:37:37 raise ValueError('some message') > 06:37:37 ValueError: some message > {noformat}
[jira] [Resolved] (BEAM-8974) apache_beam.runners.worker.log_handler_test.FnApiLogRecordHandlerTest.test_exc_info is flaky
[ https://issues.apache.org/jira/browse/BEAM-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev resolved BEAM-8974. --- Resolution: Fixed
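In BEAM-8974, the `IndexError` at log_handler_test.py:110 shows that `log_records_received` was still empty when the test indexed it after `close()`. The issue text does not say what the fix was. One common way to make such a test robust against timing is to poll with a deadline before indexing. Here is a minimal sketch; `wait_for` is a hypothetical helper, not part of Beam.

```python
import time


def wait_for(predicate, timeout_s=5.0, interval_s=0.01):
    """Poll predicate() until it is truthy (return True) or the timeout expires (return False)."""
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Hypothetical use inside the test, before indexing:
#   assert wait_for(lambda: self.test_logging_service.log_records_received)
#   log_entry = self.test_logging_service.log_records_received[0].log_entries[0]
```

Polling turns an intermittent `IndexError` into either a pass or a clear timeout failure. It only helps if the records do eventually arrive, though, so it complements a real flush-on-close guarantee rather than replacing it.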
[jira] [Work logged] (BEAM-8671) Migrate Python version to 3.7
[ https://issues.apache.org/jira/browse/BEAM-8671?focusedWorklogId=362013&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-362013 ] ASF GitHub Bot logged work on BEAM-8671: Author: ASF GitHub Bot Created on: 21/Dec/19 02:52 Start Date: 21/Dec/19 02:52 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #10125: [BEAM-8671] Added ParDo test running on Python 3.7 URL: https://github.com/apache/beam/pull/10125 Issue Time Tracking --- Worklog Id: (was: 362013) Time Spent: 11h 50m (was: 11h 40m) > Migrate Python version to 3.7 > - > > Key: BEAM-8671 > URL: https://issues.apache.org/jira/browse/BEAM-8671 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 11h 50m > Remaining Estimate: 0h > > Currently, load tests run on Python 2.7. We should migrate to 3.7
[jira] [Work logged] (BEAM-7949) Add time-based cache threshold support in the data service of the Python SDK harness
[ https://issues.apache.org/jira/browse/BEAM-7949?focusedWorklogId=362011&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-362011 ] ASF GitHub Bot logged work on BEAM-7949: Author: ASF GitHub Bot Created on: 21/Dec/19 01:58 Start Date: 21/Dec/19 01:58 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on issue #10246: [BEAM-7949] Add time-based cache threshold support in the data service of the Python SDK harness URL: https://github.com/apache/beam/pull/10246#issuecomment-568142289 Thanks for your great comments. I have updated the PR accordingly. ;) Issue Time Tracking --- Worklog Id: (was: 362011) Time Spent: 3h 20m (was: 3h 10m) > Add time-based cache threshold support in the data service of the Python SDK > harness > > > Key: BEAM-7949 > URL: https://issues.apache.org/jira/browse/BEAM-7949 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-harness >Reporter: sunjincheng >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > Currently, only a size-based cache threshold is supported in the data service of > the Python SDK harness. It should also support a time-based cache threshold. > This is important especially for streaming jobs, which are sensitive to > latency.
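The requested behavior is to flush buffered data when either a size limit or a time limit is reached. Here is a hypothetical sketch, not the harness's data-channel code. `ThresholdBuffer`, its parameters, and the injectable `clock` are assumptions. A real implementation would also need a background timer that calls `maybe_flush()`, so that an idle buffer is flushed without waiting for the next write.

```python
import time


class ThresholdBuffer:
    """Buffers byte chunks and flushes on a size OR an elapsed-time threshold."""

    def __init__(self, flush_fn, size_limit, time_limit_s, clock=time.monotonic):
        self._flush_fn = flush_fn
        self._size_limit = size_limit
        self._time_limit_s = time_limit_s
        self._clock = clock
        self._chunks = []
        self._size = 0
        self._first_write = None  # time at which the oldest buffered chunk arrived

    def write(self, data: bytes):
        if not self._chunks:
            self._first_write = self._clock()
        self._chunks.append(data)
        self._size += len(data)
        self.maybe_flush()

    def maybe_flush(self):
        """Flush if either threshold is reached; also safe to call from a timer."""
        if not self._chunks:
            return
        too_big = self._size >= self._size_limit
        too_old = self._clock() - self._first_write >= self._time_limit_s
        if too_big or too_old:
            self.flush()

    def flush(self):
        if self._chunks:
            self._flush_fn(b"".join(self._chunks))
        self._chunks, self._size, self._first_write = [], 0, None
```

The time limit is measured from the oldest buffered chunk. This bounds how long any single element can wait, which is the latency guarantee that streaming jobs care about.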
[jira] [Created] (BEAM-9016) Select PTransform result order is not predictable
Yang Zhang created BEAM-9016: Summary: Select PTransform result order is not predictable Key: BEAM-9016 URL: https://issues.apache.org/jira/browse/BEAM-9016 Project: Beam Issue Type: Bug Components: beam-community Reporter: Yang Zhang Assignee: Aizhamal Nurmamat kyzy pipeline.apply(Select.fieldNames("x", "y")) pipeline.apply(Select.fieldNames("a", "b")) The order of the returned fields is not predictable. In the two examples above, field `x` may be returned first, while field `a` (also requested first) may be returned second. Should we add `withOrderByFieldInsertionOrder` to the FieldAccessDescriptor in the Select PTransform, so that the return order is predictable?
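The predictable ordering the reporter asks for means that output fields follow the order in which they were requested, not the order of the input schema. A hypothetical Python helper illustrates this. `select_fields` is an assumption for illustration only; it is neither Beam's `Select` transform nor the proposed `withOrderByFieldInsertionOrder` option.

```python
def select_fields(row: dict, *names: str) -> dict:
    """Project row onto names, keeping the order in which names were requested.

    Python dicts preserve insertion order (3.7+), so building the result by
    iterating over `names` fixes the output order regardless of `row`'s order.
    """
    missing = [n for n in names if n not in row]
    if missing:
        raise KeyError(f"unknown fields: {missing}")
    return {n: row[n] for n in names}


example_row = {"y": 2, "b": 4, "x": 1, "a": 3}
print(list(select_fields(example_row, "x", "y")))  # ['x', 'y']
print(list(select_fields(example_row, "a", "b")))  # ['a', 'b']
```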
[jira] [Work logged] (BEAM-8988) apache_beam.io.gcp.bigquery_read_it_test failing with: NotImplementedError: BigQuery source must be split before being read
[ https://issues.apache.org/jira/browse/BEAM-8988?focusedWorklogId=362008&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-362008 ] ASF GitHub Bot logged work on BEAM-8988: Author: ASF GitHub Bot Created on: 21/Dec/19 01:44 Start Date: 21/Dec/19 01:44 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #10412: [BEAM-8988] RangeTracker for _CustomBigQuerySource URL: https://github.com/apache/beam/pull/10412#issuecomment-568141169 Run Python PreCommit Issue Time Tracking --- Worklog Id: (was: 362008) Time Spent: 2h (was: 1h 50m) > apache_beam.io.gcp.bigquery_read_it_test failing with: NotImplementedError: > BigQuery source must be split before being read > --- > > Key: BEAM-8988 > URL: https://issues.apache.org/jira/browse/BEAM-8988 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Valentyn Tymofieiev >Assignee: Kamil Wasilewski >Priority: Critical > Time Spent: 2h > Remaining Estimate: 0h > > Sample failure: https://builds.apache.org/job/beam_PostCommit_Python37_PR/58/ > Triggered by https://github.com/apache/beam/pull/9772.
> Stacktrace: > {noformat} > Pipeline > BeamApp-jenkins-1217231928-2108ede4_7476773b-6b06-4536-a0d5-c5fafb6c0935 > failed in state FAILED: java.lang.RuntimeException: Error received from SDK > harness for instruction 96: Traceback (most recent call last): > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/common.py", > line 879, in process > return self.do_fn_invoker.invoke_process(windowed_value) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/common.py", > line 669, in invoke_process > windowed_value, additional_args, additional_kwargs, output_processor) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/common.py", > line 747, in _invoke_process_per_window > windowed_value, self.process_method(*args_for_process)) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/common.py", > line 998, in process_outputs > for result in results: > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/worker/bundle_processor.py", > line 1256, in process > yield element, self.restriction_provider.initial_restriction(element) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/iobase.py", > line 1518, in initial_restriction > range_tracker = self._source.get_range_tracker(None, None) > File > "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/io/gcp/bigquery.py", > line 652, in get_range_tracker > raise NotImplementedError('BigQuery source must be split before being > read') > NotImplementedError: BigQuery source must be split before being read > {noformat}
[jira] [Work logged] (BEAM-7274) Protobuf Beam Schema support
[ https://issues.apache.org/jira/browse/BEAM-7274?focusedWorklogId=362009&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-362009 ] ASF GitHub Bot logged work on BEAM-7274: Author: ASF GitHub Bot Created on: 21/Dec/19 01:44 Start Date: 21/Dec/19 01:44 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #10356: [BEAM-7274] Infer a Beam Schema from a protocol buffer class. URL: https://github.com/apache/beam/pull/10356#issuecomment-568141194 @alexvanboxel, let me know if you have more thoughts here or if this looks good. One more comment: once your options work is in, we should switch my use of field metadata over to the structured options approach. Issue Time Tracking --- Worklog Id: (was: 362009) Time Spent: 17h 50m (was: 17h 40m) > Protobuf Beam Schema support > > > Key: BEAM-7274 > URL: https://issues.apache.org/jira/browse/BEAM-7274 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Alex Van Boxel >Assignee: Alex Van Boxel >Priority: Minor > Time Spent: 17h 50m > Remaining Estimate: 0h > > Add support for the new Beam Schema to the Protobuf extension.
[jira] [Work logged] (BEAM-7951) Allow runner to configure customization WindowedValue coder such as ValueOnlyWindowedValueCoder
[ https://issues.apache.org/jira/browse/BEAM-7951?focusedWorklogId=362010&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-362010 ] ASF GitHub Bot logged work on BEAM-7951: Author: ASF GitHub Bot Created on: 21/Dec/19 01:45 Start Date: 21/Dec/19 01:45 Worklog Time Spent: 10m Work Description: sunjincheng121 commented on issue #9979: [BEAM-7951] Allow runner to configure customization WindowedValue coder. URL: https://github.com/apache/beam/pull/9979#issuecomment-568141245 Rebased the code and squashed the commits. :) Issue Time Tracking --- Worklog Id: (was: 362010) Time Spent: 7h 40m (was: 7.5h) > Allow runner to configure customization WindowedValue coder such as > ValueOnlyWindowedValueCoder > --- > > Key: BEAM-7951 > URL: https://issues.apache.org/jira/browse/BEAM-7951 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Time Spent: 7h 40m > Remaining Estimate: 0h > > The coder of WindowedValue cannot be configured; it is always > FullWindowedValueCoder. We don't need to serialize the timestamp, window and > pane properties in Flink, so it would be better to make the coder > configurable (i.e. to allow using ValueOnlyWindowedValueCoder).
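The saving described in BEAM-7951 comes from not serializing per-element metadata. Here is a toy Python sketch of the difference. The byte layout (three 8-byte fields for timestamp, window, and pane) is an assumption for illustration only. It is not the actual wire format of `FullWindowedValueCoder` or `ValueOnlyWindowedValueCoder`.

```python
import struct


def encode_full(value: bytes, timestamp_micros: int, window_end_micros: int,
                pane_index: int) -> bytes:
    """Toy 'full' encoding: a metadata header, then the length-prefixed value."""
    header = struct.pack(">qqq", timestamp_micros, window_end_micros, pane_index)
    return header + struct.pack(">I", len(value)) + value


def encode_value_only(value: bytes) -> bytes:
    """Toy 'value only' encoding: just the length-prefixed value."""
    return struct.pack(">I", len(value)) + value


payload = b"hello"
full = encode_full(payload, 1_000_000, 2_000_000, 0)
value_only = encode_value_only(payload)
print(len(full) - len(value_only))  # 24 bytes of metadata saved per element
```

This only works where the consumer does not need those properties, which is the situation the issue describes for Flink.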
[jira] [Work logged] (BEAM-7274) Protobuf Beam Schema support
[ https://issues.apache.org/jira/browse/BEAM-7274?focusedWorklogId=362007&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-362007 ] ASF GitHub Bot logged work on BEAM-7274: Author: ASF GitHub Bot Created on: 21/Dec/19 01:42 Start Date: 21/Dec/19 01:42 Worklog Time Spent: 10m Work Description: reuvenlax commented on pull request #10356: [BEAM-7274] Infer a Beam Schema from a protocol buffer class. URL: https://github.com/apache/beam/pull/10356#discussion_r360621201 ## File path: sdks/java/extensions/protobuf/src/test/resources/README.md ## @@ -0,0 +1,34 @@ + + +This recreates the proto descriptor set included in this resource directory. + +```bash +export PROTO_INCLUDE= +``` +Execute the following command to create the pb files, in the beam root folder: + +```bash +protoc \ + -Isdks/java/extensions/protobuf/src/test/resources/ \ + -I$PROTO_INCLUDE \ + --descriptor_set_out=sdks/java/extensions/protobuf/src/test/resources/org/apache/beam/sdk/extensions/protobuf/test_option_v1.pb \ + --include_imports \ + sdks/java/extensions/protobuf/src/test/resources/test/option/v1/simple.proto Review comment: Removed Issue Time Tracking --- Worklog Id: (was: 362007) Time Spent: 17h 40m (was: 17.5h) > Protobuf Beam Schema support > > > Key: BEAM-7274 > URL: https://issues.apache.org/jira/browse/BEAM-7274 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Alex Van Boxel >Assignee: Alex Van Boxel >Priority: Minor > Time Spent: 17h 40m > Remaining Estimate: 0h > > Add support for the new Beam Schema to the Protobuf extension.
[jira] [Work logged] (BEAM-7274) Protobuf Beam Schema support
[ https://issues.apache.org/jira/browse/BEAM-7274?focusedWorklogId=362005&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-362005 ] ASF GitHub Bot logged work on BEAM-7274: Author: ASF GitHub Bot Created on: 21/Dec/19 01:41 Start Date: 21/Dec/19 01:41 Worklog Time Spent: 10m Work Description: reuvenlax commented on issue #10356: [BEAM-7274] Infer a Beam Schema from a protocol buffer class. URL: https://github.com/apache/beam/pull/10356#issuecomment-568140963 You are correct that this requires other languages to implement this parsing as well. However, I think the visibility advantage of having a fully-represented proto (vs. just embedding a bytes field in a proto) is worth that tax, and it shouldn't be a huge tax on Beam SDKs (it only took me about 30-40 minutes to write the code here). Issue Time Tracking --- Worklog Id: (was: 362005) Time Spent: 17.5h (was: 17h 20m) > Protobuf Beam Schema support > > > Key: BEAM-7274 > URL: https://issues.apache.org/jira/browse/BEAM-7274 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Alex Van Boxel >Assignee: Alex Van Boxel >Priority: Minor > Time Spent: 17.5h > Remaining Estimate: 0h > > Add support for the new Beam Schema to the Protobuf extension.
[jira] [Work logged] (BEAM-9000) Java Test Assertions without toString for GenericJson subclasses
[ https://issues.apache.org/jira/browse/BEAM-9000?focusedWorklogId=361995&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361995 ] ASF GitHub Bot logged work on BEAM-9000: Author: ASF GitHub Bot Created on: 21/Dec/19 01:11 Start Date: 21/Dec/19 01:11 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10441: [BEAM-9000] Java Test Assertions without toString for GenericJson subclasses URL: https://github.com/apache/beam/pull/10441#issuecomment-568138304 R: @lukecwik This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361995) Time Spent: 1h 40m (was: 1.5h) > Java Test Assertions without toString for GenericJson subclasses > > > Key: BEAM-9000 > URL: https://issues.apache.org/jira/browse/BEAM-9000 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Minor > Time Spent: 1h 40m > Remaining Estimate: 0h > > As of now, there are many tests that assert on {{toString()}} of objects. > {code:java} > CounterUpdate result = testObject.transform(monitoringInfo); > assertEquals( > "{cumulative=true, integer={highBits=0, lowBits=0}, " > + "nameAndKind={kind=SUM, " > + "name=transformedValue-ElementCount}}", > result.toString()); > {code} > This style forces unnecessary maintenance of the test code when > upgrading dependencies. Dependencies may change the internal ordering of > fields or make trivial changes to {{toString()}}. In BEAM-8695, where I tried to > upgrade google-http-client, there were ~30 comparison failures due to these > {{toString}} assertions. > The objects involved are subclasses of {{com.google.api.client.json.GenericJson}}. > There are several options to improve these assertions. > h1. 
Option 1: Assertion using Map > Leveraging the fact that GenericJson is a subclass of AbstractMap<String, Object>, the assertion can be written as > {code:java} > ImmutableMap<String, Object> expected = ImmutableMap.of("cumulative", > true, > "integer", ImmutableMap.of("highBits", 0, "lowBits", 0), > "nameAndKind", ImmutableMap.of("kind", "SUM", "name", > "transformedValue-ElementCount")); > assertEquals(expected, (Map)result); > {code} > Credit: Ben Whitehead. > h1. Option 2: Create assertEqualsOnJson > Leveraging the fact that an instance of GenericJson can be instantiated from > JSON, the assertion can be written as > {code:java} > assertEqualsOnJson( > "{\"cumulative\":true, \"integer\":{\"highBits\":0, \"lowBits\":0}, " > + "\"nameAndKind\":{\"kind\":\"SUM\", " > + "\"name\":\"transformedValue-ElementCount\"}}", > result); > {code} > > {{assertEqualsOnJson}} is implemented as below. The following field and > methods should go into a shared test utility class (sdks/testing?) > {code:java} > private static final JacksonFactory jacksonFactory = > JacksonFactory.getDefaultInstance(); > public static <T extends GenericJson> void assertEqualsOnJson(String > expectedJsonText, T actual) { > @SuppressWarnings("unchecked") > T expected = parse(expectedJsonText, (Class<T>) actual.getClass()); > assertEquals(expected, actual); > } > public static <T> T parse(String text, Class<T> clazz) { > try { > JsonParser parser = jacksonFactory.createJsonParser(text); > return parser.parse(clazz); > } catch (IOException ex) { > throw new IllegalArgumentException("Could not parse the text as " + > clazz, ex); > } > } > {code} > A feature request to handle escaping double quotes via JacksonFactory: > [https://github.com/googleapis/google-http-java-client/issues/923] > > h1. Option 3: Check JSON equality via JSONassert > * https://github.com/skyscreamer/JSONassert > * https://github.com/hertzsprung/hamcrest-json (not used, as its last commit was > in 2012) > The JSONassert example does not need escaped double quote characters. 
The > implementation would convert the actual object into JSON and call > {{JSONAssert.assertEquals}}. > Credit: Luke Cwik > -- This message was sent by Atlassian Jira (v8.3.4#803005)
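The BEAM-9000 options share one idea: compare parsed structure instead of `toString()` output. Below is a minimal, language-neutral sketch of that idea in Python, not the proposed Java utility. `assert_equals_on_json` is a hypothetical name that mirrors `assertEqualsOnJson`, and plain dicts stand in for `GenericJson` instances, which are also maps.

```python
import json


def assert_equals_on_json(expected_json_text, actual):
    """Hypothetical helper mirroring the proposed assertEqualsOnJson.

    Parses the expected JSON text and compares it structurally with
    `actual`. Dict equality ignores key order, so a dependency upgrade
    that only reorders fields does not break the assertion.
    """
    expected = json.loads(expected_json_text)
    if expected != actual:
        raise AssertionError('%r != %r' % (expected, actual))


# Plain dicts stand in for GenericJson objects. This one was built with a
# different field order than the expected text below.
actual = {
    'nameAndKind': {'name': 'transformedValue-ElementCount', 'kind': 'SUM'},
    'integer': {'lowBits': 0, 'highBits': 0},
    'cumulative': True,
}
expected_text = (
    '{"cumulative": true, "integer": {"highBits": 0, "lowBits": 0}, '
    '"nameAndKind": {"kind": "SUM", "name": "transformedValue-ElementCount"}}')

# Comparing serialized strings is order-sensitive, so this is False...
string_match = json.dumps(actual) == json.dumps(json.loads(expected_text))
# ...while the structural comparison passes without raising.
assert_equals_on_json(expected_text, actual)
```

The string comparison is the analogue of asserting on `toString()`, and it breaks as soon as field order changes. The structural comparison only fails when a value actually differs.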
[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite
[ https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=361993&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361993 ] ASF GitHub Bot logged work on BEAM-7961: Author: ASF GitHub Bot Created on: 21/Dec/19 01:10 Start Date: 21/Dec/19 01:10 Worklog Time Spent: 10m Work Description: ihji commented on issue #10051: [BEAM-7961] Add tests for all runner native transforms for XLang URL: https://github.com/apache/beam/pull/10051#issuecomment-568138206 Run XVR_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361993) Time Spent: 10h 40m (was: 10.5h) > Add tests for all runner native transforms and some widely used composite > transforms to cross-language validates runner test suite > -- > > Key: BEAM-7961 > URL: https://issues.apache.org/jira/browse/BEAM-7961 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 10h 40m > Remaining Estimate: 0h > > Add tests for all runner native transforms and some widely used composite > transforms to cross-language validates runner test suite -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7961) Add tests for all runner native transforms and some widely used composite transforms to cross-language validates runner test suite
[ https://issues.apache.org/jira/browse/BEAM-7961?focusedWorklogId=361994&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361994 ] ASF GitHub Bot logged work on BEAM-7961: Author: ASF GitHub Bot Created on: 21/Dec/19 01:10 Start Date: 21/Dec/19 01:10 Worklog Time Spent: 10m Work Description: ihji commented on issue #10051: [BEAM-7961] Add tests for all runner native transforms for XLang URL: https://github.com/apache/beam/pull/10051#issuecomment-567742923 Run XVR_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361994) Time Spent: 10h 50m (was: 10h 40m) > Add tests for all runner native transforms and some widely used composite > transforms to cross-language validates runner test suite > -- > > Key: BEAM-7961 > URL: https://issues.apache.org/jira/browse/BEAM-7961 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 10h 50m > Remaining Estimate: 0h > > Add tests for all runner native transforms and some widely used composite > transforms to cross-language validates runner test suite -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8974) apache_beam.runners.worker.log_handler_test.FnApiLogRecordHandlerTest.test_exc_info is flaky
[ https://issues.apache.org/jira/browse/BEAM-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001590#comment-17001590 ] Robert Bradshaw commented on BEAM-8974: --- https://github.com/apache/beam/pull/10389 has been merged. This is mostly a testing issue--on a loaded machine the log writing thread might not start up before the test tries to close it. (It was a race with the pre-existing test as well but that generally did "enough work" to make the failure rarer.) The only way this could affect a real worker is if it was brought up and shut down very quickly (as in quicker than opening up the grpc channel to get work). I don't think it's worth the overhead of a cherry-pick. > apache_beam.runners.worker.log_handler_test.FnApiLogRecordHandlerTest.test_exc_info > is flaky > > > Key: BEAM-8974 > URL: https://issues.apache.org/jira/browse/BEAM-8974 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: Valentyn Tymofieiev >Assignee: Robert Bradshaw >Priority: Major > Fix For: 2.18.0 > > Time Spent: 50m > Remaining Estimate: 0h > > The test is failing at apache_beam/runners/worker/log_handler_test.py:110: > IndexError > Added in https://github.com/apache/beam/pull/10292 > Sample job: [https://builds.apache.org/job/beam_PreCommit_Python_Cron/2160/] > Console logs: > {noformat} > 06:37:37 === FAILURES > === > 06:37:37 ___ FnApiLogRecordHandlerTest.test_exc_info > > 06:37:37 [gw1] linux2 -- Python 2.7.12 > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/target/.tox-py27-gcp-pytest/py27-gcp-pytest/bin/python > 06:37:37 > 06:37:37 self = > testMethod=test_exc_info> > 06:37:37 > 06:37:37 def test_exc_info(self): > 06:37:37 try: > 06:37:37 raise ValueError('some message') > 06:37:37 except ValueError: > 06:37:37 _LOGGER.error('some error', exc_info=True) > 06:37:37 > 06:37:37 self.fn_log_handler.close() > 06:37:37 > 06:37:37 > log_entry = > 
self.test_logging_service.log_records_received[0].log_entries[0] > 06:37:37 E IndexError: list index out of range > 06:37:37 > 06:37:37 apache_beam/runners/worker/log_handler_test.py:110: IndexError > 06:37:37 - Captured stderr call > - > 06:37:37 ERROR:apache_beam.runners.worker.log_handler_test:some error > 06:37:37 Traceback (most recent call last): > 06:37:37 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/log_handler_test.py", > line 104, in test_exc_info > 06:37:37 raise ValueError('some message') > 06:37:37 ValueError: some message > 06:37:37 -- Captured log call > --- > 06:37:37 ERROR > apache_beam.runners.worker.log_handler_test:log_handler_test.py:106 some error > 06:37:37 Traceback (most recent call last): > 06:37:37 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/log_handler_test.py", > line 104, in test_exc_info > 06:37:37 raise ValueError('some message') > 06:37:37 ValueError: some message > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
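The race described above (the log-writing thread may not have delivered anything by the time the test inspects the results) and the fix named in PR #10389 ("Wait for log messages to be processed before checking them") can be sketched with the standard library alone. This is not Beam's `FnApiLogRecordHandler`. `QueueLogHandler` and its `received` list are hypothetical, and the list stands in for the test logging service.

```python
import logging
import queue
import threading


class QueueLogHandler(logging.Handler):
    """Hypothetical handler: emit() only enqueues; a writer thread delivers.

    `received` stands in for a remote logging service.
    """

    _SENTINEL = object()

    def __init__(self):
        super().__init__()
        self.received = []
        self._queue = queue.Queue()
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def emit(self, record):
        self._queue.put(self.format(record))

    def _drain(self):
        # Deliver records in order until the sentinel is seen.
        while True:
            item = self._queue.get()
            if item is self._SENTINEL:
                return
            self.received.append(item)

    def close(self):
        # The sentinel is queued behind every pending record, so joining the
        # writer guarantees they were all delivered, however late the thread
        # started. Inspecting `received` without this wait is the race.
        self._queue.put(self._SENTINEL)
        self._writer.join()
        super().close()


logger = logging.getLogger('queue_log_handler_sketch')
logger.propagate = False
handler = QueueLogHandler()
logger.addHandler(handler)
logger.error('some error')
handler.close()
logger.removeHandler(handler)
print(handler.received)  # ['some error']
```

The key design choice is that `close()` blocks until the queue is drained. As a result, any assertion made after `close()` sees every record, regardless of how the writer thread was scheduled on a loaded machine.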
[jira] [Work logged] (BEAM-2572) Implement an S3 filesystem for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-2572?focusedWorklogId=361991&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361991 ] ASF GitHub Bot logged work on BEAM-2572: Author: ASF GitHub Bot Created on: 21/Dec/19 01:01 Start Date: 21/Dec/19 01:01 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9955: [BEAM-2572] Python SDK S3 Filesystem URL: https://github.com/apache/beam/pull/9955#issuecomment-568137249 Thank you all very much! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361991) Time Spent: 5h 10m (was: 5h) > Implement an S3 filesystem for Python SDK > - > > Key: BEAM-2572 > URL: https://issues.apache.org/jira/browse/BEAM-2572 > Project: Beam > Issue Type: Task > Components: sdk-py-core >Reporter: Dmitry Demeshchuk >Priority: Minor > Labels: GSoC2019, gsoc, gsoc2019, mentor, outreachy19dec > Time Spent: 5h 10m > Remaining Estimate: 0h > > There are two paths worth exploring, to my understanding: > 1. Sticking to the HDFS-based approach (like it's done in Java). > 2. Using boto/boto3 for accessing S3 through its common API endpoints. > I personally prefer the second approach, for a few reasons: > 1. In real life, HDFS and S3 have different consistency guarantees, therefore > their behaviors may contradict each other in some edge cases (say, we write > something to S3, but it's not immediately accessible for reading from another > end). > 2. There are other AWS-based sources and sinks we may want to create in the > future: DynamoDB, Kinesis, SQS, etc. > 3. boto3 already provides somewhat good logic for basic things like > reattempting. 
> Whatever path we choose, there's another problem related to this: we > currently cannot pass any global settings (say, pipeline options, or just an > arbitrary kwarg) to a filesystem. Because of that, we'd have to configure the > runner nodes to have AWS keys set up in the environment, which is not trivial > to achieve and doesn't look too clean either (I'd rather see one single place > for configuring the runner options). > Also, it's worth mentioning that I already have a janky S3 filesystem > implementation that only supports DirectRunner at the moment (because of the > previous paragraph). I'm perfectly fine finishing it myself, with some > guidance from the maintainers. > Where should I go from here, and whose input should I be looking for? > Thanks! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8974) apache_beam.runners.worker.log_handler_test.FnApiLogRecordHandlerTest.test_exc_info is flaky
[ https://issues.apache.org/jira/browse/BEAM-8974?focusedWorklogId=361990&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361990 ] ASF GitHub Bot logged work on BEAM-8974: Author: ASF GitHub Bot Created on: 21/Dec/19 01:00 Start Date: 21/Dec/19 01:00 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10389: [BEAM-8974] Wait for log messages to be processed before checking them. URL: https://github.com/apache/beam/pull/10389 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361990) Time Spent: 50m (was: 40m) > apache_beam.runners.worker.log_handler_test.FnApiLogRecordHandlerTest.test_exc_info > is flaky > > > Key: BEAM-8974 > URL: https://issues.apache.org/jira/browse/BEAM-8974 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: Valentyn Tymofieiev >Assignee: Robert Bradshaw >Priority: Major > Fix For: 2.18.0 > > Time Spent: 50m > Remaining Estimate: 0h > > The test is failing at apache_beam/runners/worker/log_handler_test.py:110: > IndexError > Added in https://github.com/apache/beam/pull/10292 > Sample job: [https://builds.apache.org/job/beam_PreCommit_Python_Cron/2160/] > Console logs: > {noformat} > 06:37:37 === FAILURES > === > 06:37:37 ___ FnApiLogRecordHandlerTest.test_exc_info > > 06:37:37 [gw1] linux2 -- Python 2.7.12 > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/target/.tox-py27-gcp-pytest/py27-gcp-pytest/bin/python > 06:37:37 > 06:37:37 self = > testMethod=test_exc_info> > 06:37:37 > 06:37:37 def test_exc_info(self): > 06:37:37 try: > 06:37:37 raise ValueError('some message') > 06:37:37 except ValueError: > 06:37:37 _LOGGER.error('some error', exc_info=True) > 
06:37:37 > 06:37:37 self.fn_log_handler.close() > 06:37:37 > 06:37:37 > log_entry = > self.test_logging_service.log_records_received[0].log_entries[0] > 06:37:37 E IndexError: list index out of range > 06:37:37 > 06:37:37 apache_beam/runners/worker/log_handler_test.py:110: IndexError > 06:37:37 - Captured stderr call > - > 06:37:37 ERROR:apache_beam.runners.worker.log_handler_test:some error > 06:37:37 Traceback (most recent call last): > 06:37:37 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/log_handler_test.py", > line 104, in test_exc_info > 06:37:37 raise ValueError('some message') > 06:37:37 ValueError: some message > 06:37:37 -- Captured log call > --- > 06:37:37 ERROR > apache_beam.runners.worker.log_handler_test:log_handler_test.py:106 some error > 06:37:37 Traceback (most recent call last): > 06:37:37 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/log_handler_test.py", > line 104, in test_exc_info > 06:37:37 raise ValueError('some message') > 06:37:37 ValueError: some message > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-2572) Implement an S3 filesystem for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-2572?focusedWorklogId=361988&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361988 ] ASF GitHub Bot logged work on BEAM-2572: Author: ASF GitHub Bot Created on: 21/Dec/19 00:59 Start Date: 21/Dec/19 00:59 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9955: [BEAM-2572] Python SDK S3 Filesystem URL: https://github.com/apache/beam/pull/9955#issuecomment-568136984 Thanks so much @tamera-lanham @MattMorgis - y'all went the extra mile to write a good feature with testable code. Lots of people have wanted this feature added, so I'm very grateful to you two : ) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361988) Time Spent: 5h (was: 4h 50m) > Implement an S3 filesystem for Python SDK > - > > Key: BEAM-2572 > URL: https://issues.apache.org/jira/browse/BEAM-2572 > Project: Beam > Issue Type: Task > Components: sdk-py-core >Reporter: Dmitry Demeshchuk >Priority: Minor > Labels: GSoC2019, gsoc, gsoc2019, mentor, outreachy19dec > Time Spent: 5h > Remaining Estimate: 0h > > There are two paths worth exploring, to my understanding: > 1. Sticking to the HDFS-based approach (like it's done in Java). > 2. Using boto/boto3 for accessing S3 through its common API endpoints. > I personally prefer the second approach, for a few reasons: > 1. In real life, HDFS and S3 have different consistency guarantees, therefore > their behaviors may contradict each other in some edge cases (say, we write > something to S3, but it's not immediately accessible for reading from another > end). > 2. There are other AWS-based sources and sinks we may want to create in the > future: DynamoDB, Kinesis, SQS, etc. > 3. 
boto3 already provides somewhat good logic for basic things like > reattempting. > Whatever path we choose, there's another problem related to this: we > currently cannot pass any global settings (say, pipeline options, or just an > arbitrary kwarg) to a filesystem. Because of that, we'd have to configure the > runner nodes to have AWS keys set up in the environment, which is not trivial > to achieve and doesn't look too clean either (I'd rather see one single place > for configuring the runner options). > Also, it's worth mentioning that I already have a janky S3 filesystem > implementation that only supports DirectRunner at the moment (because of the > previous paragraph). I'm perfectly fine finishing it myself, with some > guidance from the maintainers. > Where should I go from here, and whose input should I be looking for? > Thanks! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8944) Python SDK harness performance degradation with UnboundedThreadPoolExecutor
[ https://issues.apache.org/jira/browse/BEAM-8944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001589#comment-17001589 ] Ahmet Altay commented on BEAM-8944: --- Could this be closed after the cherry-pick PR ([https://github.com/apache/beam/pull/10430])? > Python SDK harness performance degradation with UnboundedThreadPoolExecutor > --- > > Key: BEAM-8944 > URL: https://issues.apache.org/jira/browse/BEAM-8944 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Affects Versions: 2.18.0 >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Blocker > Fix For: 2.18.0 > > Attachments: profiling.png, profiling_one_thread.png, > profiling_twelve_threads.png > > Time Spent: 3h > Remaining Estimate: 0h > > We are seeing a performance degradation in the Python streaming word count load > tests. > > After some investigation, it appears to be caused by swapping the original > ThreadPoolExecutor for UnboundedThreadPoolExecutor in the SDK worker. The suspicion is > that Python performance is worse with more threads on CPU-bound tasks. > > A simple test comparing the performance of several thread pool executors: > > {code:python}
> def test_performance(self):
>   def run_perf(executor):
>     total_number = 100
>     q = queue.Queue()
>     def task(number):
>       hash(number)
>       q.put(number + 200)
>       return number
>     t = time.time()
>     count = 0
>     for i in range(200):
>       q.put(i)
>     while count < total_number:
>       executor.submit(task, q.get(block=True))
>       count += 1
>     print('%s uses %s' % (executor, time.time() - t))
>   with UnboundedThreadPoolExecutor() as executor:
>     run_perf(executor)
>   with futures.ThreadPoolExecutor(max_workers=1) as executor:
>     run_perf(executor)
>   with futures.ThreadPoolExecutor(max_workers=12) as executor:
>     run_perf(executor)
> {code} > Results: > UnboundedThreadPoolExecutor uses 268.160675049 > ThreadPoolExecutor(max_workers=1) uses 79.904583931 > ThreadPoolExecutor(max_workers=12) uses 191.179054976 > Profiling: > UnboundedThreadPoolExecutor: > !profiling.png! 
> 1 Thread ThreadPoolExecutor: > !profiling_one_thread.png! > 12 Threads ThreadPoolExecutor: > !profiling_twelve_threads.png! -- This message was sent by Atlassian Jira (v8.3.4#803005)
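The benchmark quoted in BEAM-8944 depends on Beam's `UnboundedThreadPoolExecutor`. Below is a self-contained variant that uses only `concurrent.futures.ThreadPoolExecutor`, so that the 1-thread versus 12-thread comparison can be rerun anywhere. It is a sketch with several changes from the original: `total_number` is set to 20000 (an arbitrary choice), `time.perf_counter` replaces `time.time`, and no particular timings are claimed, since they depend on the interpreter and the machine. In CPython, the global interpreter lock allows only one thread to execute Python bytecode at a time. For tiny CPU-bound tasks, extra worker threads therefore mostly add contention, which is consistent with the suspicion stated in the issue.

```python
import queue
import time
from concurrent import futures


def run_perf(executor, total_number=20000):
    """Feeds `total_number` tiny CPU-bound tasks to `executor`.

    Each task hashes its input and puts a new item back on the queue, so the
    submitting thread blocks whenever 200 submitted tasks are still pending.
    Returns the elapsed wall-clock time in seconds.
    """
    q = queue.Queue()

    def task(number):
        hash(number)
        q.put(number + 200)
        return number

    for i in range(200):
        q.put(i)
    start = time.perf_counter()
    for _ in range(total_number):
        executor.submit(task, q.get(block=True))
    return time.perf_counter() - start


timings = {}
for workers in (1, 12):
    with futures.ThreadPoolExecutor(max_workers=workers) as executor:
        timings[workers] = run_perf(executor)
print(timings)
```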
[jira] [Created] (BEAM-9015) Make py37-cloud test suite for TOX instead of separate py37-gcp, and py37-aws
Pablo Estrada created BEAM-9015: --- Summary: Make py37-cloud test suite for TOX instead of separate py37-gcp, and py37-aws Key: BEAM-9015 URL: https://issues.apache.org/jira/browse/BEAM-9015 Project: Beam Issue Type: Bug Components: sdk-py-core, testing Reporter: Pablo Estrada Assignee: Pablo Estrada -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8974) apache_beam.runners.worker.log_handler_test.FnApiLogRecordHandlerTest.test_exc_info is flaky
[ https://issues.apache.org/jira/browse/BEAM-8974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001562#comment-17001562 ] Ahmet Altay commented on BEAM-8974: --- What is the next action here with respect to the 2.18 release? * Revert the cherry-pick to the release branch? * Fix forward in the release branch? Do we know what the fix is? * Leave it as it is? Is this just test flakiness? Would this affect end users? > apache_beam.runners.worker.log_handler_test.FnApiLogRecordHandlerTest.test_exc_info > is flaky > > > Key: BEAM-8974 > URL: https://issues.apache.org/jira/browse/BEAM-8974 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: Valentyn Tymofieiev >Assignee: Robert Bradshaw >Priority: Major > Fix For: 2.18.0 > > Time Spent: 40m > Remaining Estimate: 0h > > The test is failing at apache_beam/runners/worker/log_handler_test.py:110: > IndexError > Added in https://github.com/apache/beam/pull/10292 > Sample job: [https://builds.apache.org/job/beam_PreCommit_Python_Cron/2160/] > Console logs: > {noformat} > 06:37:37 === FAILURES > === > 06:37:37 ___ FnApiLogRecordHandlerTest.test_exc_info > > 06:37:37 [gw1] linux2 -- Python 2.7.12 > /home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/target/.tox-py27-gcp-pytest/py27-gcp-pytest/bin/python > 06:37:37 > 06:37:37 self = > testMethod=test_exc_info> > 06:37:37 > 06:37:37 def test_exc_info(self): > 06:37:37 try: > 06:37:37 raise ValueError('some message') > 06:37:37 except ValueError: > 06:37:37 _LOGGER.error('some error', exc_info=True) > 06:37:37 > 06:37:37 self.fn_log_handler.close() > 06:37:37 > 06:37:37 > log_entry = > self.test_logging_service.log_records_received[0].log_entries[0] > 06:37:37 E IndexError: list index out of range > 06:37:37 > 06:37:37 apache_beam/runners/worker/log_handler_test.py:110: IndexError > 06:37:37 - Captured stderr call > - > 06:37:37 
ERROR:apache_beam.runners.worker.log_handler_test:some error > 06:37:37 Traceback (most recent call last): > 06:37:37 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/log_handler_test.py", > line 104, in test_exc_info > 06:37:37 raise ValueError('some message') > 06:37:37 ValueError: some message > 06:37:37 -- Captured log call > --- > 06:37:37 ERROR > apache_beam.runners.worker.log_handler_test:log_handler_test.py:106 some error > 06:37:37 Traceback (most recent call last): > 06:37:37 File > "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Cron/src/sdks/python/test-suites/tox/py2/build/srcs/sdks/python/apache_beam/runners/worker/log_handler_test.py", > line 104, in test_exc_info > 06:37:37 raise ValueError('some message') > 06:37:37 ValueError: some message > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-2572) Implement an S3 filesystem for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-2572?focusedWorklogId=361987&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361987 ] ASF GitHub Bot logged work on BEAM-2572: Author: ASF GitHub Bot Created on: 21/Dec/19 00:56 Start Date: 21/Dec/19 00:56 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9955: [BEAM-2572] Python SDK S3 Filesystem URL: https://github.com/apache/beam/pull/9955 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361987) Time Spent: 4h 50m (was: 4h 40m) > Implement an S3 filesystem for Python SDK > - > > Key: BEAM-2572 > URL: https://issues.apache.org/jira/browse/BEAM-2572 > Project: Beam > Issue Type: Task > Components: sdk-py-core >Reporter: Dmitry Demeshchuk >Priority: Minor > Labels: GSoC2019, gsoc, gsoc2019, mentor, outreachy19dec > Time Spent: 4h 50m > Remaining Estimate: 0h > > There are two paths worth exploring, to my understanding: > 1. Sticking to the HDFS-based approach (like it's done in Java). > 2. Using boto/boto3 for accessing S3 through its common API endpoints. > I personally prefer the second approach, for a few reasons: > 1. In real life, HDFS and S3 have different consistency guarantees, therefore > their behaviors may contradict each other in some edge cases (say, we write > something to S3, but it's not immediately accessible for reading from another > end). > 2. There are other AWS-based sources and sinks we may want to create in the > future: DynamoDB, Kinesis, SQS, etc. > 3. boto3 already provides somewhat good logic for basic things like > reattempting. 
> Whatever path we choose, there's another problem related to this: we > currently cannot pass any global settings (say, pipeline options, or just an > arbitrary kwarg) to a filesystem. Because of that, we'd have to configure the > runner nodes to have AWS keys set up in the environment, which is not trivial > to achieve and doesn't look too clean either (I'd rather see one single place > for configuring the runner options). > Also, it's worth mentioning that I already have a janky S3 filesystem > implementation that only supports DirectRunner at the moment (because of the > previous paragraph). I'm perfectly fine finishing it myself, with some > guidance from the maintainers. > Where should I go from here, and whose input should I be looking for? > Thanks! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8337) Add Flink job server container images to release process
[ https://issues.apache.org/jira/browse/BEAM-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001455#comment-17001455 ] Ahmet Altay commented on BEAM-8337: --- Do we have containers built? > Add Flink job server container images to release process > > > Key: BEAM-8337 > URL: https://issues.apache.org/jira/browse/BEAM-8337 > Project: Beam > Issue Type: Improvement > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Fix For: 2.18.0 > > Time Spent: 4.5h > Remaining Estimate: 0h > > Could be added to the release process similar to how we now publish SDK > worker images. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-2572) Implement an S3 filesystem for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-2572?focusedWorklogId=361986&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361986 ] ASF GitHub Bot logged work on BEAM-2572: Author: ASF GitHub Bot Created on: 21/Dec/19 00:55 Start Date: 21/Dec/19 00:55 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9955: [BEAM-2572] Python SDK S3 Filesystem URL: https://github.com/apache/beam/pull/9955#issuecomment-568136585 lovely! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361986) Time Spent: 4h 40m (was: 4.5h) > Implement an S3 filesystem for Python SDK > - > > Key: BEAM-2572 > URL: https://issues.apache.org/jira/browse/BEAM-2572 > Project: Beam > Issue Type: Task > Components: sdk-py-core >Reporter: Dmitry Demeshchuk >Priority: Minor > Labels: GSoC2019, gsoc, gsoc2019, mentor, outreachy19dec > Time Spent: 4h 40m > Remaining Estimate: 0h > > There are two paths worth exploring, to my understanding: > 1. Sticking to the HDFS-based approach (like it's done in Java). > 2. Using boto/boto3 for accessing S3 through its common API endpoints. > I personally prefer the second approach, for a few reasons: > 1. In real life, HDFS and S3 have different consistency guarantees, therefore > their behaviors may contradict each other in some edge cases (say, we write > something to S3, but it's not immediately accessible for reading from another > end). > 2. There are other AWS-based sources and sinks we may want to create in the > future: DynamoDB, Kinesis, SQS, etc. > 3. boto3 already provides somewhat good logic for basic things like > reattempting. 
> Whatever path we choose, there's another problem related to this: we > currently cannot pass any global settings (say, pipeline options, or just an > arbitrary kwarg) to a filesystem. Because of that, we'd have to configure the > runner nodes to have AWS keys set up in the environment, which is not trivial > to achieve and doesn't look too clean either (I'd rather see one single place > for configuring the runner options). > Also, it's worth mentioning that I already have a janky S3 filesystem > implementation that only supports DirectRunner at the moment (because of the > previous paragraph). I'm perfectly fine finishing it myself, with some > guidance from the maintainers. > Where should I go from here, and whose input should I be looking for? > Thanks! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8825) OOM when writing large numbers of 'narrow' rows
[ https://issues.apache.org/jira/browse/BEAM-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001437#comment-17001437 ] Ahmet Altay commented on BEAM-8825: --- Closing this. The cherry-pick PR is merged. > OOM when writing large numbers of 'narrow' rows > --- > > Key: BEAM-8825 > URL: https://issues.apache.org/jira/browse/BEAM-8825 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.9.0, 2.10.0, 2.11.0, 2.12.0, 2.13.0, 2.14.0, 2.15.0, > 2.16.0, 2.17.0 >Reporter: Niel Markwick >Assignee: Niel Markwick >Priority: Major > Fix For: 2.18.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > SpannerIO can OOM when writing large numbers of 'narrow' rows. > > SpannerIO puts input mutation elements into batches for efficient writing. > These batches are limited by number of cells mutated, and size of data > written (5000 cells, 1MB data). SpannerIO groups enough mutations to build > 1000 of these groups (5M cells, 1GB data), then sorts and batches them. > When the number of cells and size of data is very small (<5 cells, <100 > bytes), the memory overhead of storing millions of mutations for batching is > significant, and can lead to OOMs.
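The batching behaviour described in BEAM-8825 can be sketched as follows. This is a minimal Python illustration, not SpannerIO's actual Java implementation: `Mutation` and `batch_mutations` are hypothetical names, and only the limits (5000 cells, 1MB per batch) come from the issue. Treating 1MB as 1024 * 1024 bytes is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Per-batch limits quoted in BEAM-8825; 1MB = 1024 * 1024 bytes is assumed.
MAX_CELLS_PER_BATCH = 5000
MAX_BYTES_PER_BATCH = 1024 * 1024


@dataclass
class Mutation:
    """Hypothetical stand-in for a Spanner mutation; only its cost matters here."""
    key: str
    cells: int
    size_bytes: int


def batch_mutations(mutations: Iterable[Mutation],
                    max_cells: int = MAX_CELLS_PER_BATCH,
                    max_bytes: int = MAX_BYTES_PER_BATCH) -> List[List[Mutation]]:
    """Sorts a buffered group of mutations by key, then greedily packs them into
    batches that stay within both the cell-count and the byte-size limit."""
    batches: List[List[Mutation]] = []
    current: List[Mutation] = []
    cells = size = 0
    # Sorting requires the whole buffered group to be held in memory at once.
    for m in sorted(mutations, key=lambda mut: mut.key):
        # Close the current batch if this mutation would push it over a limit.
        if current and (cells + m.cells > max_cells
                        or size + m.size_bytes > max_bytes):
            batches.append(current)
            current, cells, size = [], 0, 0
        current.append(m)
        cells += m.cells
        size += m.size_bytes
    if current:
        batches.append(current)
    return batches
```

With narrow rows of, say, 2 cells and 50 bytes each, every batch fills up on cells at 2,500 mutations. Buffering enough input for 1,000 such batches therefore means holding roughly 2.5 million mutation objects before anything is written, which is the kind of per-object overhead the issue describes.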
[jira] [Resolved] (BEAM-8825) OOM when writing large numbers of 'narrow' rows
[ https://issues.apache.org/jira/browse/BEAM-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Altay resolved BEAM-8825. --- Resolution: Fixed > OOM when writing large numbers of 'narrow' rows > --- > > Key: BEAM-8825 > URL: https://issues.apache.org/jira/browse/BEAM-8825 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.9.0, 2.10.0, 2.11.0, 2.12.0, 2.13.0, 2.14.0, 2.15.0, > 2.16.0, 2.17.0 >Reporter: Niel Markwick >Assignee: Niel Markwick >Priority: Major > Fix For: 2.18.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > SpannerIO can OOM when writing large numbers of 'narrow' rows. > > SpannerIO puts input mutation elements into batches for efficient writing. > These batches are limited by number of cells mutated, and size of data > written (5000 cells, 1MB data). SpannerIO groups enough mutations to build > 1000 of these groups (5M cells, 1GB data), then sorts and batches them. > When the number of cells and size of data is very small (<5 cells, <100 > bytes), the memory overhead of storing millions of mutations for batching is > significant, and can lead to OOMs.
[jira] [Resolved] (BEAM-8882) Allow Dataflow to automatically choose portability or not.
[ https://issues.apache.org/jira/browse/BEAM-8882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Altay resolved BEAM-8882. --- Resolution: Fixed > Allow Dataflow to automatically choose portability or not. > -- > > Key: BEAM-8882 > URL: https://issues.apache.org/jira/browse/BEAM-8882 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Critical > Fix For: 2.18.0 > > Time Spent: 5h 50m > Remaining Estimate: 0h > > We would like the Dataflow service to be able to automatically choose whether > to run pipelines in a portable way. In order to do this, we need to provide > more information even if portability is not explicitly requested.
[jira] [Commented] (BEAM-8882) Allow Dataflow to automatically choose portability or not.
[ https://issues.apache.org/jira/browse/BEAM-8882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001365#comment-17001365 ] Ahmet Altay commented on BEAM-8882: --- Closing this. I do not see any other open PRs related to this JIRA. > Allow Dataflow to automatically choose portability or not. > -- > > Key: BEAM-8882 > URL: https://issues.apache.org/jira/browse/BEAM-8882 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Critical > Fix For: 2.18.0 > > Time Spent: 5h 50m > Remaining Estimate: 0h > > We would like the Dataflow service to be able to automatically choose whether > to run pipelines in a portable way. In order to do this, we need to provide > more information even if portability is not explicitly requested.
[jira] [Work logged] (BEAM-8624) Implement FnService for status api in Dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-8624?focusedWorklogId=361979&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361979 ] ASF GitHub Bot logged work on BEAM-8624: Author: ASF GitHub Bot Created on: 21/Dec/19 00:27 Start Date: 21/Dec/19 00:27 Worklog Time Spent: 10m Work Description: y1chi commented on issue #10115: [BEAM-8624] Implement Worker Status FnService in Dataflow runner URL: https://github.com/apache/beam/pull/10115#issuecomment-568023025 retest this please Issue Time Tracking --- Worklog Id: (was: 361979) Time Spent: 5h 50m (was: 5h 40m) > Implement FnService for status api in Dataflow runner > - > > Key: BEAM-8624 > URL: https://issues.apache.org/jira/browse/BEAM-8624 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h
[jira] [Work logged] (BEAM-8624) Implement FnService for status api in Dataflow runner
[ https://issues.apache.org/jira/browse/BEAM-8624?focusedWorklogId=361978&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361978 ] ASF GitHub Bot logged work on BEAM-8624: Author: ASF GitHub Bot Created on: 21/Dec/19 00:26 Start Date: 21/Dec/19 00:26 Worklog Time Spent: 10m Work Description: y1chi commented on issue #10115: [BEAM-8624] Implement Worker Status FnService in Dataflow runner URL: https://github.com/apache/beam/pull/10115#issuecomment-568133369 retest this please Issue Time Tracking --- Worklog Id: (was: 361978) Time Spent: 5h 40m (was: 5.5h) > Implement FnService for status api in Dataflow runner > - > > Key: BEAM-8624 > URL: https://issues.apache.org/jira/browse/BEAM-8624 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Major > Time Spent: 5h 40m > Remaining Estimate: 0h
[jira] [Work logged] (BEAM-8676) Beam Dependency Update Request: com.google.api:gax-grpc
[ https://issues.apache.org/jira/browse/BEAM-8676?focusedWorklogId=361976&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361976 ] ASF GitHub Bot logged work on BEAM-8676: Author: ASF GitHub Bot Created on: 21/Dec/19 00:22 Start Date: 21/Dec/19 00:22 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10438: [BEAM-8676] sdks/java: gax and grpc upgrades URL: https://github.com/apache/beam/pull/10438#issuecomment-568132683 Run Java PreCommit Issue Time Tracking --- Worklog Id: (was: 361976) Time Spent: 3h (was: 2h 50m) > Beam Dependency Update Request: com.google.api:gax-grpc > --- > > Key: BEAM-8676 > URL: https://issues.apache.org/jira/browse/BEAM-8676 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Tomo Suzuki >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > > - 2019-11-15 19:38:32.410774 > - > Please consider upgrading the dependency com.google.api:gax-grpc. > The current version is 1.38.0. The latest version is 1.50.1 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-11-19 21:03:23.809273 > - > Please consider upgrading the dependency com.google.api:gax-grpc. > The current version is 1.38.0. The latest version is 1.50.1 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-02 12:08:16.165687 > - > Please consider upgrading the dependency com.google.api:gax-grpc. > The current version is 1.38.0.
The latest version is 1.50.1 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-09 12:07:17.894174 > - > Please consider upgrading the dependency com.google.api:gax-grpc. > The current version is 1.38.0. The latest version is 1.51.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above.
[jira] [Closed] (BEAM-9014) Update CachingShuffleBatchReader to record weights by size in bytes
[ https://issues.apache.org/jira/browse/BEAM-9014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Cwik closed BEAM-9014. --- Fix Version/s: 2.19.0 Resolution: Fixed > Update CachingShuffleBatchReader to record weights by size in bytes > --- > > Key: BEAM-9014 > URL: https://issues.apache.org/jira/browse/BEAM-9014 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Luke Cwik >Priority: Minor > Fix For: 2.19.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Currently the CachingShuffleBatchReader caches based upon the number of > batches and not the size of those batches. This task is about updating > CachingShuffleBatchReader to cache based on the size of those batches.
[jira] [Work logged] (BEAM-9013) Multi-output TestStream breaks the DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-9013?focusedWorklogId=361973&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361973 ] ASF GitHub Bot logged work on BEAM-9013: Author: ASF GitHub Bot Created on: 21/Dec/19 00:11 Start Date: 21/Dec/19 00:11 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #10445: [BEAM-9013] TestStream fix for DataflowRunner URL: https://github.com/apache/beam/pull/10445#issuecomment-568131163 Would it be possible to make some or all of the test pipelines in https://github.com/apache/beam/blob/master/sdks/python/apache_beam/testing/test_stream_test.py run on Dataflow? I guess this is tricky since we don't want it to run on all runners, just Dataflow and DirectRunner, but maybe you can do something like this: https://github.com/kamilwu/beam/blob/82db02dc68ffac074435bd0142dda900d7bfbec5/sdks/python/apache_beam/io/gcp/bigquery_read_it_test.py#L141 https://github.com/kamilwu/beam/blob/82db02dc68ffac074435bd0142dda900d7bfbec5/sdks/python/apache_beam/io/gcp/bigquery_read_it_test.py#L53-L66 Issue Time Tracking --- Worklog Id: (was: 361973) Time Spent: 40m (was: 0.5h) > Multi-output TestStream breaks the DataflowRunner > - > > Key: BEAM-9013 > URL: https://issues.apache.org/jira/browse/BEAM-9013 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Affects Versions: 2.17.0 >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Fix For: 2.17.0 > > Time Spent: 40m > Remaining Estimate: 0h
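One way to run a test only on selected runners, as suggested in the comment above, is a skip decorator built on the standard `unittest` module. This is a minimal sketch: `run_only_on` and the `runner_name` attribute are hypothetical, not an existing Beam API, and the linked BigQuery test may select runners differently. In a real Beam test the runner name would presumably come from the test pipeline's options.

```python
import functools
import unittest


def run_only_on(*allowed_runners):
    """Hypothetical decorator: skips a test method unless self.runner_name is one
    of allowed_runners. How runner_name gets set is left to the test class."""
    def decorator(test_method):
        @functools.wraps(test_method)
        def wrapper(self, *args, **kwargs):
            runner = getattr(self, 'runner_name', None)
            if runner not in allowed_runners:
                # skipTest raises unittest.SkipTest, so the test is reported as skipped.
                self.skipTest('%s is not one of %s' % (runner, ', '.join(allowed_runners)))
            return test_method(self, *args, **kwargs)
        return wrapper
    return decorator


class MultiOutputTestStreamTest(unittest.TestCase):
    # Assumption: a real Beam test would derive this from its pipeline options.
    runner_name = 'DirectRunner'

    @run_only_on('DirectRunner', 'TestDataflowRunner')
    def test_multiple_outputs(self):
        pass  # placeholder for the multi-output TestStream pipeline
```

Skipping via `skipTest` (rather than silently returning) keeps the unsupported runners visible in test reports as skipped instead of passed.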
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=361972&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361972 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 21/Dec/19 00:11 Start Date: 21/Dec/19 00:11 Worklog Time Spent: 10m Work Description: liumomo315 commented on issue #10447: [BEAM-8575] Refactor test_do_fn_with_windowing_in_finish_bundle to work with Dataflow runner URL: https://github.com/apache/beam/pull/10447#issuecomment-568131156 R: @y1chi Hi Yichi, this is a refactoring of https://github.com/apache/beam/pull/10145 to make this test run on the Dataflow runner. PTAL, thanks! Issue Time Tracking --- Worklog Id: (was: 361972) Time Spent: 38.5h (was: 38h 20m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 38.5h > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage.
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=361969&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361969 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 21/Dec/19 00:06 Start Date: 21/Dec/19 00:06 Worklog Time Spent: 10m Work Description: liumomo315 commented on pull request #10447: [BEAM-8575] Refactor test_do_fn_with_windowing_in_finish_bundle to work with Dataflow runner URL: https://github.com/apache/beam/pull/10447 The original test assumes there is always exactly one bundle. That assumption does not hold on the Dataflow runner, so the input is limited to a single element to enforce it.
[jira] [Work logged] (BEAM-9014) Update CachingShuffleBatchReader to record weights by size in bytes
[ https://issues.apache.org/jira/browse/BEAM-9014?focusedWorklogId=361965&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361965 ] ASF GitHub Bot logged work on BEAM-9014: Author: ASF GitHub Bot Created on: 20/Dec/19 23:59 Start Date: 20/Dec/19 23:59 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10418: [BEAM-9014] CachingShuffleBatchReader use bytes to limit cache size. URL: https://github.com/apache/beam/pull/10418 Issue Time Tracking --- Worklog Id: (was: 361965) Time Spent: 20m (was: 10m) > Update CachingShuffleBatchReader to record weights by size in bytes > --- > > Key: BEAM-9014 > URL: https://issues.apache.org/jira/browse/BEAM-9014 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Luke Cwik >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > Currently the CachingShuffleBatchReader caches based upon the number of > batches and not the size of those batches. This task is about updating > CachingShuffleBatchReader to cache based on the size of those batches.
[jira] [Work logged] (BEAM-9014) Update CachingShuffleBatchReader to record weights by size in bytes
[ https://issues.apache.org/jira/browse/BEAM-9014?focusedWorklogId=361964&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361964 ] ASF GitHub Bot logged work on BEAM-9014: Author: ASF GitHub Bot Created on: 20/Dec/19 23:58 Start Date: 20/Dec/19 23:58 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10418: [BEAM-9014] CachingShuffleBatchReader use bytes to limit cache size. URL: https://github.com/apache/beam/pull/10418#issuecomment-568129014 Thanks for the contribution. Tyson, could you create a JIRA account as per the [contribution guide](https://beam.apache.org/contribute/#share-your-intent) for sharing your intent. Then I can add you as a contributor to the project which would allow you to assign JIRAs to yourself (specifically BEAM-9014 which I created for this change). Note that all PRs should have an accompanying JIRA associated with them. Issue Time Tracking --- Worklog Id: (was: 361964) Remaining Estimate: 0h Time Spent: 10m > Update CachingShuffleBatchReader to record weights by size in bytes > --- > > Key: BEAM-9014 > URL: https://issues.apache.org/jira/browse/BEAM-9014 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Luke Cwik >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > Currently the CachingShuffleBatchReader caches based upon the number of > batches and not the size of those batches. This task is about updating > CachingShuffleBatchReader to cache based on the size of those batches.
[jira] [Created] (BEAM-9014) Update CachingShuffleBatchReader to record weights by size in bytes
Luke Cwik created BEAM-9014: --- Summary: Update CachingShuffleBatchReader to record weights by size in bytes Key: BEAM-9014 URL: https://issues.apache.org/jira/browse/BEAM-9014 Project: Beam Issue Type: Improvement Components: runner-dataflow Reporter: Luke Cwik Currently the CachingShuffleBatchReader caches based upon the number of batches and not the size of those batches. This task is about updating CachingShuffleBatchReader to cache based on the size of those batches.
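The difference between limiting a cache by entry count and limiting it by size in bytes can be sketched with a small LRU cache whose capacity is a byte budget. This Python sketch is illustrative only: the real reader is Java code in the Dataflow worker, and `ByteWeightedLruCache` is a hypothetical name. (In Java, Guava's `CacheBuilder` offers `maximumWeight` together with a `weigher` for this kind of policy; whether the PR uses it is not stated here.)

```python
from collections import OrderedDict
from typing import Hashable, Optional


class ByteWeightedLruCache:
    """Hypothetical LRU cache bounded by the total size of its values in bytes,
    rather than by the number of cached entries."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.total_bytes = 0
        # key -> cached batch, ordered from least to most recently used.
        self._entries: "OrderedDict[Hashable, bytes]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[bytes]:
        value = self._entries.get(key)
        if value is not None:
            self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: Hashable, value: bytes) -> None:
        if key in self._entries:
            self.total_bytes -= len(self._entries.pop(key))
        if len(value) > self.max_bytes:
            return  # a single batch larger than the whole budget is not cached
        self._entries[key] = value
        self.total_bytes += len(value)
        # Evict least recently used batches until the byte budget is respected.
        while self.total_bytes > self.max_bytes:
            _, evicted = self._entries.popitem(last=False)
            self.total_bytes -= len(evicted)
```

A count-based cache holds the same number of entries whether each batch is a few bytes or many megabytes, so its memory use depends on the data; a byte budget bounds memory directly.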
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=361963&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361963 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 20/Dec/19 23:53 Start Date: 20/Dec/19 23:53 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #10442: [BEAM-8335] On Unbounded Source change URL: https://github.com/apache/beam/pull/10442#discussion_r360610858 ## File path: sdks/python/apache_beam/runners/interactive/background_caching_job.py ## @@ -75,14 +77,16 @@ def is_background_caching_job_needed(user_pipeline): return (has_source_to_cache(user_pipeline) and # Checks if it's the first time running a job from the pipeline. (not background_caching_job_result or - # Or checks if there is no valid previous job. + # Or checks if there is no previous job. background_caching_job_result.state not in ( # DONE means a previous job has completed successfully and the # cached events are still valid. runners.runner.PipelineState.DONE, # RUNNING means a previous job has been started and is still # running. - runners.runner.PipelineState.RUNNING))) + runners.runner.PipelineState.RUNNING) or + # Or checks if we can invalidate the previous job. + is_unbounded_source_changed(user_pipeline))) Review comment: Yes, I agree. Changing it to `is_source_to_cache_changed`. I was thinking that a change in a bounded source wouldn't affect cached unbounded sources. But it feels like that would split the background caching job into 2 categories or complicate the instrumenting process (when we add support for caching arbitrary sources). Let's unify the source caching. Issue Time Tracking --- Worklog Id: (was: 361963) Time Spent: 50h 20m (was: 50h 10m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 50h 20m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections
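The condition in the diff under review can be restated as a small standalone function. This is a sketch for illustration, not the code in `background_caching_job.py`: `PipelineState` here is a local stand-in enum rather than `apache_beam.runners.runner.PipelineState`, `needs_background_caching_job` is a hypothetical name, and its boolean inputs replace the calls to `has_source_to_cache` and the source-change check discussed in the review.

```python
import enum
from typing import Optional


class PipelineState(enum.Enum):
    """Local stand-in for a few of Beam's pipeline states."""
    RUNNING = 'RUNNING'
    DONE = 'DONE'
    FAILED = 'FAILED'
    CANCELLED = 'CANCELLED'


def needs_background_caching_job(has_source_to_cache: bool,
                                 previous_state: Optional[PipelineState],
                                 sources_changed: bool) -> bool:
    """previous_state is None when no background caching job has run yet."""
    return has_source_to_cache and (
        # First run from this pipeline: nothing has been cached yet.
        previous_state is None
        # A previous job exists but neither completed (DONE) nor is still RUNNING.
        or previous_state not in (PipelineState.DONE, PipelineState.RUNNING)
        # The sources to cache changed, so the previous job's cache is stale.
        or sources_changed)
```

Written this way, the last clause is what the review is about: a DONE or RUNNING job is only reused as long as the sources it caches are unchanged.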
[jira] [Work logged] (BEAM-8623) Add additional message field to Provision API response for passing status endpoint
[ https://issues.apache.org/jira/browse/BEAM-8623?focusedWorklogId=361962&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361962 ] ASF GitHub Bot logged work on BEAM-8623: Author: ASF GitHub Bot Created on: 20/Dec/19 23:50 Start Date: 20/Dec/19 23:50 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10075: [BEAM-8623] Add status_endpoint field to provision api ProvisionInfo URL: https://github.com/apache/beam/pull/10075#issuecomment-568127734 Run Python PreCommit Issue Time Tracking --- Worklog Id: (was: 361962) Time Spent: 3h 40m (was: 3.5h) > Add additional message field to Provision API response for passing status > endpoint > -- > > Key: BEAM-8623 > URL: https://issues.apache.org/jira/browse/BEAM-8623 > Project: Beam > Issue Type: Sub-task > Components: beam-model >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: Minor > Time Spent: 3h 40m > Remaining Estimate: 0h
[jira] [Commented] (BEAM-9012) Include `-> None` on Pipeline and PipelineOptions `__init__` methods for pytype compatibility
[ https://issues.apache.org/jira/browse/BEAM-9012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001300#comment-17001300 ] Brian Hulette commented on BEAM-9012: - My motivation: we use pytype internally at Google. Some teams are already running pytype on code that uses beam python. Before we had type hints it just happily ignored the beam code, but with the change some errors are cropping up. You have a good point that it could be a slippery slope to promise full pytype support... but so far across a lot of different code this is actually the only issue that's come up. > Include `-> None` on Pipeline and PipelineOptions `__init__` methods for > pytype compatibility > - > > Key: BEAM-9012 > URL: https://issues.apache.org/jira/browse/BEAM-9012 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Fix For: 2.19.0 > > > mypy [made a decision|https://github.com/python/mypy/issues/604] to allow > init methods to omit {{\-> None}} return type annotations, but pytype has no > such feature. I think we should include {{\-> None}} annotations for pytype > compatibility. > cc: [~chadrik]
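For illustration, the annotation style this issue asks for looks like the following. `ExampleOptions` is a hypothetical class, not Beam's `PipelineOptions`; the point is only the explicit `-> None` on `__init__`. Per the linked mypy discussion, mypy can treat an `__init__` whose other parameters are annotated as returning `None` even without the annotation, while the issue reports that pytype has no such rule.

```python
from typing import Dict, List, Optional


class ExampleOptions(object):
    """Hypothetical options holder, used only to show the annotation style."""

    # The explicit "-> None" is what BEAM-9012 asks for on Pipeline and
    # PipelineOptions; mypy would also accept this method without it.
    def __init__(self, flags: Optional[List[str]] = None, **kwargs: str) -> None:
        self.flags: List[str] = list(flags or [])
        self.extra: Dict[str, str] = dict(kwargs)
```

Because the annotation is ordinary Python syntax, it also shows up at runtime in `ExampleOptions.__init__.__annotations__['return']`.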
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=361961&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361961 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 20/Dec/19 23:39 Start Date: 20/Dec/19 23:39 Worklog Time Spent: 10m Work Description: davidyan74 commented on pull request #10442: [BEAM-8335] On Unbounded Source change URL: https://github.com/apache/beam/pull/10442#discussion_r360608502 ## File path: sdks/python/apache_beam/runners/interactive/background_caching_job.py ## @@ -75,14 +77,16 @@ def is_background_caching_job_needed(user_pipeline): return (has_source_to_cache(user_pipeline) and # Checks if it's the first time running a job from the pipeline. (not background_caching_job_result or - # Or checks if there is no valid previous job. + # Or checks if there is no previous job. background_caching_job_result.state not in ( # DONE means a previous job has completed successfully and the # cached events are still valid. runners.runner.PipelineState.DONE, # RUNNING means a previous job has been started and is still # running. - runners.runner.PipelineState.RUNNING))) + runners.runner.PipelineState.RUNNING) or + # Or checks if we can invalidate the previous job. + is_unbounded_source_changed(user_pipeline))) Review comment: Similar to the reason why we use has_source_to_cache() above instead of has_unbounded_source(), we need to see whether any of the sources to cache has changed. So perhaps we should change this to something like cache_sources_changed() Issue Time Tracking --- Worklog Id: (was: 361961) Time Spent: 50h 10m (was: 50h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 50h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections
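The reviewer's suggestion, to invalidate the previous caching job when any of the sources to cache has changed rather than only unbounded ones, can be sketched as comparing a signature recorded when the background caching job started against the current sources. Everything here is hypothetical: the signature format (a frozenset of source description strings) and both function names are assumptions for illustration, and the PR may detect changes differently.

```python
from typing import FrozenSet, Iterable


def source_signature(source_descriptions: Iterable[str]) -> FrozenSet[str]:
    """Hypothetical signature: one description string per source to cache,
    bounded and unbounded alike, so the comparison covers both kinds."""
    return frozenset(source_descriptions)


def have_sources_to_cache_changed(recorded: FrozenSet[str],
                                  current_descriptions: Iterable[str]) -> bool:
    # Adding, removing, or reconfiguring any source changes the signature;
    # listing the same sources in a different order does not.
    return source_signature(current_descriptions) != recorded
```

Using one signature for all sources to cache mirrors the unified approach agreed on in the review, instead of tracking bounded and unbounded sources separately.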
[jira] [Work logged] (BEAM-8676) Beam Dependency Update Request: com.google.api:gax-grpc
[ https://issues.apache.org/jira/browse/BEAM-8676?focusedWorklogId=361960&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361960 ] ASF GitHub Bot logged work on BEAM-8676: Author: ASF GitHub Bot Created on: 20/Dec/19 23:36 Start Date: 20/Dec/19 23:36 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10438: [BEAM-8676] sdks/java: gax and grpc upgrades URL: https://github.com/apache/beam/pull/10438#issuecomment-568125622 Run Python PreCommit Issue Time Tracking --- Worklog Id: (was: 361960) Time Spent: 2h 50m (was: 2h 40m) > Beam Dependency Update Request: com.google.api:gax-grpc > --- > > Key: BEAM-8676 > URL: https://issues.apache.org/jira/browse/BEAM-8676 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Tomo Suzuki >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > - 2019-11-15 19:38:32.410774 > - > Please consider upgrading the dependency com.google.api:gax-grpc. > The current version is 1.38.0. The latest version is 1.50.1 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-11-19 21:03:23.809273 > - > Please consider upgrading the dependency com.google.api:gax-grpc. > The current version is 1.38.0. The latest version is 1.50.1 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-02 12:08:16.165687 > - > Please consider upgrading the dependency com.google.api:gax-grpc. > The current version is 1.38.0.
The latest version is 1.50.1 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-09 12:07:17.894174 > - > Please consider upgrading the dependency com.google.api:gax-grpc. > The current version is 1.38.0. The latest version is 1.51.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above.
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=361959&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361959 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 20/Dec/19 23:32 Start Date: 20/Dec/19 23:32 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #10442: [BEAM-8335] On Unbounded Source change URL: https://github.com/apache/beam/pull/10442#issuecomment-568125058 R: @davidyan74 R: @rohdesamuel PTAL. Thanks! Merry Christmas! Issue Time Tracking --- Worklog Id: (was: 361959) Time Spent: 50h (was: 49h 50m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 50h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections
[jira] [Work logged] (BEAM-8676) Beam Dependency Update Request: com.google.api:gax-grpc
[ https://issues.apache.org/jira/browse/BEAM-8676?focusedWorklogId=361958&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361958 ] ASF GitHub Bot logged work on BEAM-8676: Author: ASF GitHub Bot Created on: 20/Dec/19 23:31 Start Date: 20/Dec/19 23:31 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10438: [BEAM-8676] sdks/java: gax and grpc upgrades URL: https://github.com/apache/beam/pull/10438#issuecomment-568124925 Run Java PostCommit Issue Time Tracking --- Worklog Id: (was: 361958) Time Spent: 2h 40m (was: 2.5h) > Beam Dependency Update Request: com.google.api:gax-grpc > --- > > Key: BEAM-8676 > URL: https://issues.apache.org/jira/browse/BEAM-8676 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Tomo Suzuki >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > - 2019-11-15 19:38:32.410774 > - > Please consider upgrading the dependency com.google.api:gax-grpc. > The current version is 1.38.0. The latest version is 1.50.1 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-11-19 21:03:23.809273 > - > Please consider upgrading the dependency com.google.api:gax-grpc. > The current version is 1.38.0. The latest version is 1.50.1 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-02 12:08:16.165687 > - > Please consider upgrading the dependency com.google.api:gax-grpc. > The current version is 1.38.0.
The latest version is 1.50.1 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-12-09 12:07:17.894174 > - > Please consider upgrading the dependency com.google.api:gax-grpc. > The current version is 1.38.0. The latest version is 1.51.0 > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above.
[jira] [Work logged] (BEAM-9013) Multi-output TestStream breaks the DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-9013?focusedWorklogId=361956&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361956 ] ASF GitHub Bot logged work on BEAM-9013: Author: ASF GitHub Bot Created on: 20/Dec/19 23:26 Start Date: 20/Dec/19 23:26 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #10445: [BEAM-9013] TestStream fix for DataflowRunner URL: https://github.com/apache/beam/pull/10445#issuecomment-568124111 > Is there a test that verifies TestStream on DataflowRunner? It seems like this should've been caught in a PreCommit or PostCommit I guess not. As you said, it should have been caught in a Pre/PostCommit. Issue Time Tracking --- Worklog Id: (was: 361956) Time Spent: 0.5h (was: 20m) > Multi-output TestStream breaks the DataflowRunner > - > > Key: BEAM-9013 > URL: https://issues.apache.org/jira/browse/BEAM-9013 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Affects Versions: 2.17.0 >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Fix For: 2.17.0 > > Time Spent: 0.5h > Remaining Estimate: 0h >
[jira] [Commented] (BEAM-9012) Include `-> None` on Pipeline and PipelineOptions `__init__` methods for pytype compatibility
[ https://issues.apache.org/jira/browse/BEAM-9012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001295#comment-17001295 ] Chad Dombrova commented on BEAM-9012: - I imagine there are going to be _lots_ of little differences between mypy and pytype. I'm curious about your motivation for using pytype. Do you think we should aim to support both? I'd be a bit wary of doing so, since getting mypy to pass can be challenging enough on its own. I can imagine scenarios where there is no solution that appeases both mypy and pytype (thinking particularly of overloads, whose semantics seem to vary between tools). > Include `-> None` on Pipeline and PipelineOptions `__init__` methods for > pytype compatibility > - > > Key: BEAM-9012 > URL: https://issues.apache.org/jira/browse/BEAM-9012 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Fix For: 2.19.0 > > > mypy [made a decision|https://github.com/python/mypy/issues/604] to allow > init methods to omit {{\-> None}} return type annotations, but pytype has no > such feature. I think we should include {{\-> None}} annotations for pytype > compatibility. > cc: [~chadrik]
[jira] [Work logged] (BEAM-9013) Multi-output TestStream breaks the DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-9013?focusedWorklogId=361953&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361953 ] ASF GitHub Bot logged work on BEAM-9013: Author: ASF GitHub Bot Created on: 20/Dec/19 23:21 Start Date: 20/Dec/19 23:21 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #10445: [BEAM-9013] TestStream fix for DataflowRunner URL: https://github.com/apache/beam/pull/10445#issuecomment-568123120 Is there a test that verifies TestStream on DataflowRunner? It seems like this should've been caught in a PreCommit or PostCommit Issue Time Tracking --- Worklog Id: (was: 361953) Time Spent: 20m (was: 10m) > Multi-output TestStream breaks the DataflowRunner > - > > Key: BEAM-9013 > URL: https://issues.apache.org/jira/browse/BEAM-9013 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Affects Versions: 2.17.0 >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Fix For: 2.17.0 > > Time Spent: 20m > Remaining Estimate: 0h >
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=361952&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361952 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 20/Dec/19 23:20 Start Date: 20/Dec/19 23:20 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #10442: [BEAM-8335] On Unbounded Source change URL: https://github.com/apache/beam/pull/10442#issuecomment-568122842 Run Python PreCommit Issue Time Tracking --- Worklog Id: (was: 361952) Time Spent: 49h 50m (was: 49h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 49h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections
[jira] [Work logged] (BEAM-9013) Multi-output TestStream breaks the DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-9013?focusedWorklogId=361951&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361951 ] ASF GitHub Bot logged work on BEAM-9013: Author: ASF GitHub Bot Created on: 20/Dec/19 23:16 Start Date: 20/Dec/19 23:16 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #10445: [BEAM-9013] TestStream fix for DataflowRunner URL: https://github.com/apache/beam/pull/10445 The DataflowRunner relies on the old implementation of the TestStream, which has only a single output and different watermark-controlling mechanisms. This adds the _DeprecatedSingleOutputTestStream, which allows further development of the TestStream to happen in the _TestStream class without breaking backwards compatibility with Dataflow.
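The PR for BEAM-9013 keeps a legacy single-output TestStream next to the newer multi-output one, so a runner that only understands the old shape keeps working while the new class evolves. A minimal sketch of that compatibility pattern in plain Python; `MultiOutputEventSource` and `LegacySingleOutputEventSource` are hypothetical classes for illustration, not Beam's actual `_TestStream` or `_DeprecatedSingleOutputTestStream`.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class MultiOutputEventSource:
    """Hypothetical stand-in for a test source that can emit to several outputs."""
    events: List[Tuple[str, Any]] = field(default_factory=list)

    def add_element(self, value: Any, tag: str = "main") -> "MultiOutputEventSource":
        # Every event records which output it belongs to.
        self.events.append((tag, value))
        return self

    def output_tags(self) -> List[str]:
        # Tags in first-seen order.
        seen: List[str] = []
        for tag, _ in self.events:
            if tag not in seen:
                seen.append(tag)
        return seen


class LegacySingleOutputEventSource(MultiOutputEventSource):
    """Hypothetical legacy variant: only the single 'main' output is allowed,
    so a runner built for the old behaviour never sees extra outputs."""

    def add_element(self, value: Any, tag: str = "main") -> "LegacySingleOutputEventSource":
        if tag != "main":
            raise ValueError("legacy source supports only the 'main' output")
        super().add_element(value, tag)
        return self
```

The design choice being illustrated: new features go into the multi-output class, while the frozen legacy class rejects anything the old runner could not handle instead of silently changing its behaviour.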
[jira] [Created] (BEAM-9013) Multi-output TestStream breaks the DataflowRunner
Sam Rohde created BEAM-9013: --- Summary: Multi-output TestStream breaks the DataflowRunner Key: BEAM-9013 URL: https://issues.apache.org/jira/browse/BEAM-9013 Project: Beam Issue Type: Bug Components: runner-dataflow Affects Versions: 2.17.0 Reporter: Sam Rohde Assignee: Sam Rohde Fix For: 2.17.0
[jira] [Work logged] (BEAM-8999) PGBKCVOperation does not respect timestamp combiners
[ https://issues.apache.org/jira/browse/BEAM-8999?focusedWorklogId=361950&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361950 ] ASF GitHub Bot logged work on BEAM-8999: Author: ASF GitHub Bot Created on: 20/Dec/19 23:11 Start Date: 20/Dec/19 23:11 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #10425: [BEAM-8999] Respect timestamp combiners in PGBKCVOperation. URL: https://github.com/apache/beam/pull/10425#issuecomment-568120931 Run Python PreCommit Issue Time Tracking --- Worklog Id: (was: 361950) Time Spent: 40m (was: 0.5h) > PGBKCVOperation does not respect timestamp combiners > > > Key: BEAM-8999 > URL: https://issues.apache.org/jira/browse/BEAM-8999 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: Robert Bradshaw >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > We prevent lifting in the FnAPI runner in this case, but other optimizers > (e.g. the Greedy Fuser and Dataflow) do not, resulting in incorrect > timestamps.
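When a combiner is lifted, values are pre-combined per key before the shuffle. If that partial step keeps an arbitrary element timestamp instead of folding timestamps the way the window's timestamp combiner requires, the output carries the wrong timestamp, which is the symptom BEAM-8999 describes. A minimal, Beam-independent sketch of a partial per-key sum that also folds timestamps; the `TimestampCombiner` enum, `combine_timestamps`, and `partial_sum_per_key` are hypothetical names that only mirror the idea of earliest/latest/end-of-window combiners, and `window_end` stands in for the window's end timestamp.

```python
import enum
from typing import Dict, Hashable, Iterable, Tuple


class TimestampCombiner(enum.Enum):
    # Hypothetical mirror of the usual timestamp-combiner choices.
    EARLIEST = "earliest"
    LATEST = "latest"
    END_OF_WINDOW = "end_of_window"


def combine_timestamps(a: float, b: float, combiner: TimestampCombiner,
                       window_end: float) -> float:
    if combiner is TimestampCombiner.EARLIEST:
        return min(a, b)
    if combiner is TimestampCombiner.LATEST:
        return max(a, b)
    return window_end  # END_OF_WINDOW ignores element timestamps.


def partial_sum_per_key(
    elements: Iterable[Tuple[Hashable, int, float]],
    combiner: TimestampCombiner,
    window_end: float,
) -> Dict[Hashable, Tuple[int, float]]:
    """Pre-combine (key, value, timestamp) triples into {key: (sum, timestamp)}."""
    acc: Dict[Hashable, Tuple[int, float]] = {}
    for key, value, ts in elements:
        if combiner is TimestampCombiner.END_OF_WINDOW:
            ts = window_end
        if key in acc:
            total, acc_ts = acc[key]
            acc[key] = (total + value,
                        combine_timestamps(acc_ts, ts, combiner, window_end))
        else:
            acc[key] = (value, ts)
    return acc
```

With elements `("a", 1, 5.0)`, `("a", 2, 3.0)`, `("b", 4, 7.0)`, key `"a"` sums to 3 in every mode, but its timestamp is 3.0, 5.0, or the window end depending on the combiner; a partial step that ignores the combiner cannot get all three right.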
[jira] [Work logged] (BEAM-9005) Go SDK post-commit failures due to https://github.com/apache/beam/pull/10183
[ https://issues.apache.org/jira/browse/BEAM-9005?focusedWorklogId=361949&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361949 ] ASF GitHub Bot logged work on BEAM-9005: Author: ASF GitHub Bot Created on: 20/Dec/19 23:05 Start Date: 20/Dec/19 23:05 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #10443: [BEAM-9005] Fixes Go formatting URL: https://github.com/apache/beam/pull/10443 Issue Time Tracking --- Worklog Id: (was: 361949) Time Spent: 1.5h (was: 1h 20m) > Go SDK post-commit failures due to https://github.com/apache/beam/pull/10183 > - > > Key: BEAM-9005 > URL: https://issues.apache.org/jira/browse/BEAM-9005 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Critical > Time Spent: 1.5h > Remaining Estimate: 0h > > Looking into this. > > cc: [~bhulette] [~lostluck] [~danoliveira]
[jira] [Work logged] (BEAM-9005) Go SDK post-commit failures due to https://github.com/apache/beam/pull/10183
[ https://issues.apache.org/jira/browse/BEAM-9005?focusedWorklogId=361948&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361948 ] ASF GitHub Bot logged work on BEAM-9005: Author: ASF GitHub Bot Created on: 20/Dec/19 23:05 Start Date: 20/Dec/19 23:05 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10443: [BEAM-9005] Fixes Go formatting URL: https://github.com/apache/beam/pull/10443#issuecomment-568119545 Thanks! Issue Time Tracking --- Worklog Id: (was: 361948) Time Spent: 1h 20m (was: 1h 10m) > Go SDK post-commit failures due to https://github.com/apache/beam/pull/10183 > - > > Key: BEAM-9005 > URL: https://issues.apache.org/jira/browse/BEAM-9005 > Project: Beam > Issue Type: Bug > Components: sdk-go >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Critical > Time Spent: 1h 20m > Remaining Estimate: 0h > > Looking into this. > > cc: [~bhulette] [~lostluck] [~danoliveira]
[jira] [Commented] (BEAM-9012) Include `-> None` on Pipeline and PipelineOptions `__init__` methods for pytype compatibility
[ https://issues.apache.org/jira/browse/BEAM-9012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001281#comment-17001281 ] Brian Hulette commented on BEAM-9012: - The gotcha is that pytype won't let you specify just a return type (see https://github.com/google/pytype/issues/480). The only workaround I've found is to include a full type annotation for the function, like {{#type: (int, str, float) -> None}}, which can be ugly, particularly for Pipeline :/ > Include `-> None` on Pipeline and PipelineOptions `__init__` methods for > pytype compatibility > - > > Key: BEAM-9012 > URL: https://issues.apache.org/jira/browse/BEAM-9012 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Fix For: 2.19.0 > > > mypy [made a decision|https://github.com/python/mypy/issues/604] to allow > init methods to omit {{\-> None}} return type annotations, but pytype has no > such feature. I think we should include {{\-> None}} annotations for pytype > compatibility. > cc: [~chadrik]
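To make the two options discussed for BEAM-9012 concrete: with Python 3 annotation syntax, `__init__` can be marked as returning `None` directly, while code that still uses type comments has to spell out the full signature, as in the comment above. A minimal sketch with hypothetical classes (`AnnotatedOptions`, `CommentedOptions`), not Beam's `Pipeline` or `PipelineOptions`. At runtime, only the annotation form is visible through `typing.get_type_hints`; the type comment exists only for static checkers.

```python
import typing
from typing import List, Optional


class AnnotatedOptions(object):
    # Annotation style: the explicit "-> None" the issue asks for.
    def __init__(self, flags: Optional[List[str]] = None) -> None:
        self.flags = list(flags or [])


class CommentedOptions(object):
    # Type-comment style: the whole signature is written out, in the
    # form "(arg types) -> None", on the line right after the def.
    def __init__(self, flags=None):
        # type: (Optional[List[str]]) -> None
        self.flags = list(flags or [])


def declared_return(cls: type) -> object:
    """Return the runtime-visible return hint of cls.__init__, if any."""
    return typing.get_type_hints(cls.__init__).get("return", "<none>")
```

`declared_return(AnnotatedOptions)` yields `NoneType`, while `declared_return(CommentedOptions)` finds no runtime hint at all, which is why the longer comment form is only useful to the checkers themselves.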
[jira] [Commented] (BEAM-9012) Include `-> None` on Pipeline and PipelineOptions `__init__` methods for pytype compatibility
[ https://issues.apache.org/jira/browse/BEAM-9012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001279#comment-17001279 ] Chad Dombrova commented on BEAM-9012: - Fine by me. Brian, if you're into the static typing thing, you may want to poke in over at my second PR, which is waiting on some feedback: [https://github.com/apache/beam/pull/10367] There will probably be a third (and hopefully final) PR after that one to get the project to a point where mypy is fully passing. We can take care of this issue in that final PR. > Include `-> None` on Pipeline and PipelineOptions `__init__` methods for > pytype compatibility > - > > Key: BEAM-9012 > URL: https://issues.apache.org/jira/browse/BEAM-9012 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Fix For: 2.19.0 > > > mypy [made a decision|https://github.com/python/mypy/issues/604] to allow > init methods to omit {{\-> None}} return type annotations, but pytype has no > such feature. I think we should include {{\-> None}} annotations for pytype > compatibility. > cc: [~chadrik]
[jira] [Updated] (BEAM-9012) Include `-> None` on Pipeline and PipelineOptions `__init__` methods for pytype compatibility
[ https://issues.apache.org/jira/browse/BEAM-9012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette updated BEAM-9012: Description: mypy [made a decision|https://github.com/python/mypy/issues/604] to allow init methods to omit {{\-> None}} return type annotations, but pytype has no such feature. I think we should include {{\-> None}} annotations for pytype compatibility. cc: [~chadrik] was: mypy [made a decision|https://github.com/python/mypy/issues/604] to allow init methods to omit `-> None` return type annotations, but pytype has no such feature. I think we should include `-> None` annotations for pytype compatibility. cc: [~chadrik] > Include `-> None` on Pipeline and PipelineOptions `__init__` methods for > pytype compatibility > - > > Key: BEAM-9012 > URL: https://issues.apache.org/jira/browse/BEAM-9012 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Fix For: 2.19.0 > > > mypy [made a decision|https://github.com/python/mypy/issues/604] to allow > init methods to omit {{\-> None}} return type annotations, but pytype has no > such feature. I think we should include {{\-> None}} annotations for pytype > compatibility. > cc: [~chadrik]
[jira] [Updated] (BEAM-9012) Include `-> None` on Pipeline and PipelineOptions `__init__` methods for pytype compatibility
[ https://issues.apache.org/jira/browse/BEAM-9012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette updated BEAM-9012: Description: mypy [made a decision|https://github.com/python/mypy/issues/604] to allow init methods to omit `-> None` return type annotations, but pytype has no such feature. I think we should include `-> None` annotations for pytype compatibility. cc: [~chadrik] was: mypy made a decision to allow `__init__` methods to omit `-> None` return type annotations, but pytype has no such feature. I think we should include `-> None` annotations for pytype compatibility. cc: [~chadrik] > Include `-> None` on Pipeline and PipelineOptions `__init__` methods for > pytype compatibility > - > > Key: BEAM-9012 > URL: https://issues.apache.org/jira/browse/BEAM-9012 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Fix For: 2.19.0 > > > mypy [made a decision|https://github.com/python/mypy/issues/604] to allow > init methods to omit `-> None` return type annotations, but pytype has no > such feature. I think we should include `-> None` annotations for pytype > compatibility. > cc: [~chadrik]
[jira] [Created] (BEAM-9012) Include `-> None` on Pipeline and PipelineOptions `__init__` methods for pytype compatibility
Brian Hulette created BEAM-9012: --- Summary: Include `-> None` on Pipeline and PipelineOptions `__init__` methods for pytype compatibility Key: BEAM-9012 URL: https://issues.apache.org/jira/browse/BEAM-9012 Project: Beam Issue Type: Improvement Components: sdk-py-core Reporter: Brian Hulette Assignee: Brian Hulette Fix For: 2.19.0 mypy made a decision to allow `__init__` methods to omit `-> None` return type annotations, but pytype has no such feature. I think we should include `-> None` annotations for pytype compatibility. cc: [~chadrik]
[jira] [Work logged] (BEAM-8977) apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display is flaky
[ https://issues.apache.org/jira/browse/BEAM-8977?focusedWorklogId=361943&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361943 ] ASF GitHub Bot logged work on BEAM-8977: Author: ASF GitHub Bot Created on: 20/Dec/19 22:39 Start Date: 20/Dec/19 22:39 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #10404: [BEAM-8977] Resolve test flakiness URL: https://github.com/apache/beam/pull/10404#issuecomment-568113752 Run Python PreCommit Issue Time Tracking --- Worklog Id: (was: 361943) Time Spent: 2.5h (was: 2h 20m) > apache_beam.runners.interactive.display.pcoll_visualization_test.PCollectionVisualizationTest.test_dynamic_plotting_update_same_display > is flaky > > > Key: BEAM-8977 > URL: https://issues.apache.org/jira/browse/BEAM-8977 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Valentyn Tymofieiev >Assignee: Ning Kang >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > Sample failure: > > [https://builds.apache.org/job/beam_PreCommit_Python_Phrase/1273/testReport/apache_beam.runners.interactive.display.pcoll_visualization_test/PCollectionVisualizationTest/test_dynamic_plotting_update_same_display/] > Error Message > IndexError: list index out of range > Stacktrace > self = > testMethod=test_dynamic_plotting_update_same_display> > mocked_display_facets = id='139889868386376'> > @patch('apache_beam.runners.interactive.display.pcoll_visualization' > '.PCollectionVisualization.display_facets') > def test_dynamic_plotting_update_same_display(self, > mocked_display_facets): > fake_pipeline_result = runner.PipelineResult(runner.PipelineState.RUNNING) > ie.current_env().set_pipeline_result(self._p, fake_pipeline_result) > # Starts async
dynamic plotting that never ends in this test. > h = pv.visualize(self._pcoll, dynamic_plotting_interval=0.001) > # Blocking so the above async task can execute some iterations. > time.sleep(1) > # The first iteration doesn't provide updating_pv to display_facets. > _, first_kwargs = mocked_display_facets.call_args_list[0] > self.assertEqual(first_kwargs, {}) > # The following iterations use the same updating_pv to display_facets and so > # on. > > _, second_kwargs = mocked_display_facets.call_args_list[1] > E IndexError: list index out of range > apache_beam/runners/interactive/display/pcoll_visualization_test.py:105: > IndexError > Standard Output > > Standard Error > WARNING:apache_beam.runners.interactive.interactive_environment:You cannot > use Interactive Beam features when you are not in an interactive environment > such as a Jupyter notebook or ipython terminal.
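The BEAM-8977 failure comes from reading `mocked_display_facets.call_args_list[1]` after a fixed `time.sleep(1)`. On a slow worker, the background plotting loop may not have run twice yet, so the list has only one entry. One common way to remove that timing dependence is to poll for the expected number of calls with a timeout. This is a standalone sketch, not the change made in PR #10404; `wait_for`, `start_periodic_caller`, `demo`, and the `"same-handle"` value are hypothetical stand-ins for the real plotting loop and its `updating_pv`.

```python
import threading
import time
from typing import Callable
from unittest import mock


def wait_for(predicate: Callable[[], bool], timeout: float = 5.0,
             interval: float = 0.01) -> bool:
    """Poll predicate until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()


def start_periodic_caller(target: Callable[..., None], stop: threading.Event,
                          interval: float) -> threading.Thread:
    # Stand-in for the async dynamic-plotting loop: the first call has no
    # kwargs, and later calls all pass the same updating handle.
    def run() -> None:
        target()
        while not stop.wait(interval):
            target(updating_pv="same-handle")

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def demo() -> mock.Mock:
    # Mirrors the flaky test, but waits for two calls instead of sleeping 1s.
    display_facets = mock.Mock()
    stop = threading.Event()
    start_periodic_caller(display_facets, stop, interval=0.001)
    try:
        if not wait_for(lambda: len(display_facets.call_args_list) >= 2):
            raise AssertionError("background loop did not run twice in time")
    finally:
        stop.set()
    return display_facets
```

Inside the real test, the `time.sleep(1)` line would become something like `self.assertTrue(wait_for(lambda: len(mocked_display_facets.call_args_list) >= 2))`. The generous timeout only matters on slow machines; fast runs return as soon as the condition holds.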
[jira] [Commented] (BEAM-8936) BigQuery related ITs are failing in PostCommit: quota exceeded
[ https://issues.apache.org/jira/browse/BEAM-8936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17001261#comment-17001261 ] Mark Liu commented on BEAM-8936: Error details: _Project apache-beam-testing has insufficient quota(s) to execute this workflow with 1 instances in region us-central1. Quota summary (required/available): 1/12148 instances, 1/0 CPUs, 250/332935 disk GB, 0/3608 SSD disk GB, 1/31 instance groups, 1/32 managed instance groups, 1/271 instance templates, 1/854 in-use IP addresses._ We have increase the CPU quota in us-central1 from 1250 to 2000. Should relief the peak usage. > BigQuery related ITs are failing in PostCommit: quota exceeded > -- > > Key: BEAM-8936 > URL: https://issues.apache.org/jira/browse/BEAM-8936 > Project: Beam > Issue Type: Bug > Components: io-py-gcp, test-failures >Reporter: Yueyang Qiu >Assignee: Mark Liu >Priority: Major > Labels: currently-failing > > beam_PostCommit_Java: > [https://builds.apache.org/job/beam_PostCommit_Java/4852/] > beam_PostCommit_Python2: > [https://builds.apache.org/job/beam_PostCommit_Python2/1178|https://builds.apache.org/job/beam_PostCommit_Python2/1178/#showFailuresLink] > beam_PostCommit_Python35: > [https://builds.apache.org/job/beam_PostCommit_Python35/1185] > ... > > This seems to be a GCP quota issue. Mark, could you help take a look or find > a owner of this bug? > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8936) BigQuery related ITs are failing in PostCommit: quota exceeded
[ https://issues.apache.org/jira/browse/BEAM-8936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu resolved BEAM-8936. Fix Version/s: Not applicable Resolution: Fixed > BigQuery related ITs are failing in PostCommit: quota exceeded > -- > > Key: BEAM-8936 > URL: https://issues.apache.org/jira/browse/BEAM-8936 > Project: Beam > Issue Type: Bug > Components: io-py-gcp, test-failures >Reporter: Yueyang Qiu >Assignee: Mark Liu >Priority: Major > Labels: currently-failing > Fix For: Not applicable > > > beam_PostCommit_Java: > [https://builds.apache.org/job/beam_PostCommit_Java/4852/] > beam_PostCommit_Python2: > [https://builds.apache.org/job/beam_PostCommit_Python2/1178|https://builds.apache.org/job/beam_PostCommit_Python2/1178/#showFailuresLink] > beam_PostCommit_Python35: > [https://builds.apache.org/job/beam_PostCommit_Python35/1185] > ... > > This seems to be a GCP quota issue. Mark, could you help take a look or find > an owner of this bug? >
[jira] [Work logged] (BEAM-9010) BigQuery TableRow's size is toString().length() ?
[ https://issues.apache.org/jira/browse/BEAM-9010?focusedWorklogId=361935&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361935 ] ASF GitHub Bot logged work on BEAM-9010: Author: ASF GitHub Bot Created on: 20/Dec/19 22:30 Start Date: 20/Dec/19 22:30 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10444: [BEAM-9010] Proper TableRow size calculation via TableRowJsonCoder URL: https://github.com/apache/beam/pull/10444#issuecomment-568111413 Run Java HadoopFormatIO Performance Test This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361935) Time Spent: 1h (was: 50m) > BigQuery TableRow's size is toString().length() ? > - > > Key: BEAM-9010 > URL: https://issues.apache.org/jira/browse/BEAM-9010 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Minor > Attachments: TableRowJsonCoder_behavior_remains_same.png > > Time Spent: 1h > Remaining Estimate: 0h > > The following tests failed when I tried to upgrade google-http-client from > 1.28.0 to 1.34.0: > {noformat} > org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithoutStreamingBuffer > org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithStreamingBuffer > org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtilTest.testInsertAll > {noformat} > [https://builds.apache.org/job/beam_PreCommit_Java_Commit/9288/#showFailuresLink]
> h3. Reason for the test failures > [org.apache.beam.sdk.io.gcp.testing.TableContainer|https://github.com/apache/beam/blob/6fa94c9/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/TableContainer.java#L43] > and > [org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl|https://github.com/apache/beam/blob/c2f0d28/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L758] > rely on {{TableRow.toString().length()}} to calculate the size. Example: > {code:java} > dataSize += row.toString().length(); > if (dataSize >= maxRowBatchSize > || rows.size() >= maxRowsPerBatch > || i == rowsToPublish.size() - 1) { > {code} > However, with [google-http-client's > PR#589|https://github.com/googleapis/google-http-java-client/pull/589/files#diff-914cd7ff18143b3d2398149e1cfb4f45R218], > the {{GenericData.toString()}} output has changed as of v1.29.0. > In the old google-http-client 1.28.0, an example row's toString returned: > {noformat} > {f=[{v=foo}, {v=1234}]} > {noformat} > In google-http-client 1.29.0 and higher, the same row's toString returns: > {noformat} > GenericData{classInfo=[f], {f=[GenericData{classInfo=[v], {v=foo}}, > GenericData{classInfo=[v], {v=1234}}]}} > {noformat} > h1. Question: > Is it right to rely on {{toString().length()}} in the BigQuery > classes? -- This message was sent by Atlassian Jira (v8.3.4#803005)
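The quoted loop sizes a batch by summing {{row.toString().length()}}, so any change in {{toString()}} formatting moves the points where batches are split. The PR title points toward computing the size via {{TableRowJsonCoder}} instead. The Java sketch below mirrors the quoted loop but takes the size function as a parameter, so the estimate can come from a stable serialized form rather than {{toString()}}. It is a hypothetical, dependency-free illustration: {{SizeBatcher}} and {{batch}} are invented names, generic rows stand in for {{TableRow}}, and the UTF-8 byte length of a string stands in for a coder-based size.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical sketch of size-bounded batching with a pluggable size function,
// following the shape of the loop quoted in BEAM-9010.
final class SizeBatcher {
  static <T> List<List<T>> batch(
      List<T> rows, ToLongFunction<T> sizeFn, long maxRowBatchSize, int maxRowsPerBatch) {
    List<List<T>> batches = new ArrayList<>();
    List<T> current = new ArrayList<>();
    long dataSize = 0;
    for (int i = 0; i < rows.size(); i++) {
      T row = rows.get(i);
      current.add(row);
      // The size comes from sizeFn rather than row.toString().length().
      dataSize += sizeFn.applyAsLong(row);
      if (dataSize >= maxRowBatchSize
          || current.size() >= maxRowsPerBatch
          || i == rows.size() - 1) {
        batches.add(current);
        current = new ArrayList<>();
        dataSize = 0;
      }
    }
    return batches;
  }

  public static void main(String[] args) {
    // Stand-in rows; a real caller would pass rows and a serialization-based size.
    List<String> rows = Arrays.asList("aaaa", "bb", "cccccc", "d");
    ToLongFunction<String> utf8Bytes = s -> s.getBytes(StandardCharsets.UTF_8).length;
    System.out.println(batch(rows, utf8Bytes, 6, 10)); // prints [[aaaa, bb], [cccccc], [d]]
  }
}
```

Keeping the size function separate from the batching loop means a formatting change in a dependency's {{toString()}} cannot silently change batch boundaries.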
[jira] [Work logged] (BEAM-9010) BigQuery TableRow's size is toString().length() ?
[ https://issues.apache.org/jira/browse/BEAM-9010?focusedWorklogId=361939&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361939 ] ASF GitHub Bot logged work on BEAM-9010: Author: ASF GitHub Bot Created on: 20/Dec/19 22:30 Start Date: 20/Dec/19 22:30 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10444: [BEAM-9010] Proper TableRow size calculation via TableRowJsonCoder URL: https://github.com/apache/beam/pull/10444#issuecomment-568111524 Run SQL Postcommit Issue Time Tracking --- Worklog Id: (was: 361939) Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (BEAM-9010) BigQuery TableRow's size is toString().length() ?
[ https://issues.apache.org/jira/browse/BEAM-9010?focusedWorklogId=361938&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361938 ] ASF GitHub Bot logged work on BEAM-9010: Author: ASF GitHub Bot Created on: 20/Dec/19 22:30 Start Date: 20/Dec/19 22:30 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10444: [BEAM-9010] Proper TableRow size calculation via TableRowJsonCoder URL: https://github.com/apache/beam/pull/10444#issuecomment-568111488 Run Spark ValidatesRunner Issue Time Tracking --- Worklog Id: (was: 361938) Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (BEAM-9010) BigQuery TableRow's size is toString().length() ?
[ https://issues.apache.org/jira/browse/BEAM-9010?focusedWorklogId=361937&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361937 ] ASF GitHub Bot logged work on BEAM-9010: Author: ASF GitHub Bot Created on: 20/Dec/19 22:30 Start Date: 20/Dec/19 22:30 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10444: [BEAM-9010] Proper TableRow size calculation via TableRowJsonCoder URL: https://github.com/apache/beam/pull/10444#issuecomment-568111460 Run Dataflow ValidatesRunner Issue Time Tracking --- Worklog Id: (was: 361937) Time Spent: 1h 20m (was: 1h 10m)
[jira] [Work logged] (BEAM-9010) BigQuery TableRow's size is toString().length() ?
[ https://issues.apache.org/jira/browse/BEAM-9010?focusedWorklogId=361936&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361936 ] ASF GitHub Bot logged work on BEAM-9010: Author: ASF GitHub Bot Created on: 20/Dec/19 22:30 Start Date: 20/Dec/19 22:30 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10444: [BEAM-9010] Proper TableRow size calculation via TableRowJsonCoder URL: https://github.com/apache/beam/pull/10444#issuecomment-568111435 Run BigQueryIO Streaming Performance Test Java Issue Time Tracking --- Worklog Id: (was: 361936) Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (BEAM-9000) Java Test Assertions without toString for GenericJson subclasses
[ https://issues.apache.org/jira/browse/BEAM-9000?focusedWorklogId=361934&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361934 ] ASF GitHub Bot logged work on BEAM-9000: Author: ASF GitHub Bot Created on: 20/Dec/19 22:29 Start Date: 20/Dec/19 22:29 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10441: [BEAM-9000] Java Test Assertions without toString for GenericJson subclasses URL: https://github.com/apache/beam/pull/10441#issuecomment-56899 Run SQL Postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361934) Time Spent: 1.5h (was: 1h 20m) > Java Test Assertions without toString for GenericJson subclasses > > > Key: BEAM-9000 > URL: https://issues.apache.org/jira/browse/BEAM-9000 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Minor > Time Spent: 1.5h > Remaining Estimate: 0h > > As of now, many tests assert on the {{toString()}} output of objects. > {code:java} > CounterUpdate result = testObject.transform(monitoringInfo); > assertEquals( > "{cumulative=true, integer={highBits=0, lowBits=0}, " > + "nameAndKind={kind=SUM, " > + "name=transformedValue-ElementCount}}", > result.toString()); > {code} > This style causes unnecessary maintenance of the test code when > upgrading dependencies: a dependency may change the internal ordering of > fields or make trivial changes to {{toString()}}. In BEAM-8695, where I tried to > upgrade google-http-client, there were ~30 comparison failures due to these > {{toString}} assertions. > The objects in these assertions are subclasses of {{com.google.api.client.json.GenericJson}}. > There are several options to improve these assertions.
> h1. Option 1: Assertion using Map > Leveraging the fact that GenericJson is a subclass of AbstractMap<String, > Object>, the assertion can be written as > {code:java} > ImmutableMap<String, Object> expected = ImmutableMap.of("cumulative", > true, > "integer", ImmutableMap.of("highBits", 0, "lowBits", 0), > "nameAndKind", ImmutableMap.of("kind", "SUM", "name", > "transformedValue-ElementCount")); > assertEquals(expected, (Map<String, Object>) result); > {code} > Credit: Ben Whitehead. > h1. Option 2: Create assertEqualsOnJson > Leveraging the fact that an instance of GenericJson can be created by parsing > JSON, the assertion can be written as > {code:java} > assertEqualsOnJson( > "{\"cumulative\":true, \"integer\":{\"highBits\":0, \"lowBits\":0}, " > + "\"nameAndKind\":{\"kind\":\"SUM\", " > + "\"name\":\"transformedValue-ElementCount\"}}", > result); > {code} > > {{assertEqualsOnJson}} is implemented as below. The following field and > methods should go into a shared test utility class (sdks/testing?). > {code:java} > import static org.junit.Assert.assertEquals; > import com.google.api.client.json.GenericJson; > import com.google.api.client.json.JsonParser; > import com.google.api.client.json.jackson2.JacksonFactory; > import java.io.IOException; > private static final JacksonFactory jacksonFactory = > JacksonFactory.getDefaultInstance(); > public static <T extends GenericJson> void assertEqualsOnJson(String > expectedJsonText, T actual) { > // Parse the expected JSON into the same class as the actual object. > @SuppressWarnings("unchecked") > Class<T> clazz = (Class<T>) actual.getClass(); > T expected = parse(expectedJsonText, clazz); > assertEquals(expected, actual); > } > public static <T> T parse(String text, Class<T> clazz) { > try { > JsonParser parser = jacksonFactory.createJsonParser(text); > return parser.parse(clazz); > } catch (IOException ex) { > throw new IllegalArgumentException("Could not parse the text as " + > clazz, ex); > } > } > {code} > A feature request to handle escaping double quotes via JacksonFactory: > [https://github.com/googleapis/google-http-java-client/issues/923] > > h1. Option 3: Check JSON equality via JSONassert > * https://github.com/skyscreamer/JSONassert > * https://github.com/hertzsprung/hamcrest-json (not used, as its last commit was > in 2012) > The JSONassert examples do not need escaped double-quote characters.
The > implementation would convert the actual object into JSON and call > {{JSONAssert.assertEquals}}. > Credit: Luke Cwik > -- This message was sent by Atlassian Jira (v8.3.4#803005)
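Option 1 works because {{Map.equals}} compares entries regardless of their order, whereas {{toString()}} reflects iteration order. A small dependency-free Java illustration, using plain {{LinkedHashMap}}s as hypothetical stand-ins for a {{GenericJson}} subclass such as {{CounterUpdate}} (the class and method names here are invented for the example): the same content inserted in two different orders compares equal but prints differently.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration for Option 1: LinkedHashMap stands in for a
// GenericJson subclass such as CounterUpdate (both are AbstractMaps).
final class MapAssertionDemo {
  // Builds the CounterUpdate-like content from the issue, inserting the
  // top-level keys in the given order.
  static Map<String, Object> counterUpdate(String... keyOrder) {
    Map<String, Object> integer = new LinkedHashMap<>();
    integer.put("highBits", 0);
    integer.put("lowBits", 0);
    Map<String, Object> nameAndKind = new LinkedHashMap<>();
    nameAndKind.put("kind", "SUM");
    nameAndKind.put("name", "transformedValue-ElementCount");

    Map<String, Object> values = new HashMap<>();
    values.put("cumulative", true);
    values.put("integer", integer);
    values.put("nameAndKind", nameAndKind);

    Map<String, Object> ordered = new LinkedHashMap<>();
    for (String key : keyOrder) {
      ordered.put(key, values.get(key));
    }
    return ordered;
  }

  public static void main(String[] args) {
    Map<String, Object> a = counterUpdate("cumulative", "integer", "nameAndKind");
    Map<String, Object> b = counterUpdate("nameAndKind", "integer", "cumulative");
    System.out.println(a.equals(b)); // true: entry order does not matter
    System.out.println(a.toString().equals(b.toString())); // false: order shows up
  }
}
```

The first map's {{toString()}} matches the string asserted in the issue; reordering the keys breaks a {{toString()}} assertion but not an {{equals()}} one. One general caveat of map equality: values must also match in type, since {{Integer.valueOf(0).equals(0L)}} is false.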
[jira] [Work logged] (BEAM-9000) Java Test Assertions without toString for GenericJson subclasses
[ https://issues.apache.org/jira/browse/BEAM-9000?focusedWorklogId=361930&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361930 ] ASF GitHub Bot logged work on BEAM-9000: Author: ASF GitHub Bot Created on: 20/Dec/19 22:29 Start Date: 20/Dec/19 22:29 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10441: [BEAM-9000] Java Test Assertions without toString for GenericJson subclasses URL: https://github.com/apache/beam/pull/10441#issuecomment-568111073 Run Java HadoopFormatIO Performance Test Issue Time Tracking --- Worklog Id: (was: 361930) Time Spent: 50m (was: 40m)
[jira] [Work logged] (BEAM-9000) Java Test Assertions without toString for GenericJson subclasses
[ https://issues.apache.org/jira/browse/BEAM-9000?focusedWorklogId=361933&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361933 ] ASF GitHub Bot logged work on BEAM-9000: Author: ASF GitHub Bot Created on: 20/Dec/19 22:29 Start Date: 20/Dec/19 22:29 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10441: [BEAM-9000] Java Test Assertions without toString for GenericJson subclasses URL: https://github.com/apache/beam/pull/10441#issuecomment-56857 Run Spark ValidatesRunner Issue Time Tracking --- Worklog Id: (was: 361933) Time Spent: 1h 20m (was: 1h 10m)
[jira] [Work logged] (BEAM-9000) Java Test Assertions without toString for GenericJson subclasses
[ https://issues.apache.org/jira/browse/BEAM-9000?focusedWorklogId=361931&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361931 ] ASF GitHub Bot logged work on BEAM-9000: Author: ASF GitHub Bot Created on: 20/Dec/19 22:29 Start Date: 20/Dec/19 22:29 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10441: [BEAM-9000] Java Test Assertions without toString for GenericJson subclasses URL: https://github.com/apache/beam/pull/10441#issuecomment-56800 Run BigQueryIO Streaming Performance Test Java Issue Time Tracking --- Worklog Id: (was: 361931) Time Spent: 1h (was: 50m) > Java Test Assertions without toString for GenericJson subclasses > > > Key: BEAM-9000 > URL: https://issues.apache.org/jira/browse/BEAM-9000 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > As of now, there are many tests that assert on {{toString()}} of objects. > {code:java} > CounterUpdate result = testObject.transform(monitoringInfo); > assertEquals( > "{cumulative=true, integer={highBits=0, lowBits=0}, " > + "nameAndKind={kind=SUM, " > + "name=transformedValue-ElementCount}}", > result.toString()); > {code} > This style makes test code prone to unnecessary maintenance when > upgrading dependencies. Dependencies may change the internal ordering of > fields or make trivial changes to {{toString()}}. In BEAM-8695, where I tried to > upgrade google-http-client, there were ~30 comparison failures due to these > {{toString}} assertions. > The objects under test are subclasses of {{com.google.api.client.json.GenericJson}}. > There are several options to improve these assertions. 
> h1. Option 1: Assertion using Map > Leveraging the fact that GenericJson is a subclass of AbstractMap<String, Object>, the assertion can be written as > {code:java} > ImmutableMap<String, Object> expected = ImmutableMap.of("cumulative", > true, > "integer", ImmutableMap.of("highBits", 0, "lowBits", 0), > "nameAndKind", ImmutableMap.of("kind", "SUM", "name", > "transformedValue-ElementCount")); > assertEquals(expected, (Map<String, Object>) result); > {code} > Credit: Ben Whitehead. > h1. Option 2: Create assertEqualsOnJson > Leveraging the fact that an instance of GenericJson can be instantiated from > JSON, the assertion can be written as > {code:java} > assertEqualsOnJson( > "{\"cumulative\":true, \"integer\":{\"highBits\":0, \"lowBits\":0}, " > + "\"nameAndKind\":{\"kind\":\"SUM\", " > + "\"name\":\"transformedValue-ElementCount\"}}", > result); > {code} > > {{assertEqualsOnJson}} is implemented as below. The following field and > methods should go to a shared test utility class (sdks/testing?). > {code:java} > private static final JacksonFactory jacksonFactory = > JacksonFactory.getDefaultInstance(); > public static <T> void assertEqualsOnJson(String > expectedJsonText, T actual) { > // Parse into the runtime class of "actual" instead of a hard-coded CounterUpdate. > @SuppressWarnings("unchecked") > Class<T> clazz = (Class<T>) actual.getClass(); > T expected = parse(expectedJsonText, clazz); > assertEquals(expected, actual); > } > public static <T> T parse(String text, Class<T> clazz) { > try { > JsonParser parser = jacksonFactory.createJsonParser(text); > return parser.parse(clazz); > } catch (IOException ex) { > throw new IllegalArgumentException("Could not parse the text as " + > clazz, ex); > } > } > {code} > A feature request to handle escaping double quotes via JacksonFactory: > [https://github.com/googleapis/google-http-java-client/issues/923] > 
> h1. Option 3: Check JSON equality via JSONassert > * https://github.com/skyscreamer/JSONassert > * https://github.com/hertzsprung/hamcrest-json (not chosen because its last commit was > in 2012) > The JSONassert examples do not need escaped double-quote characters. The > implementation would convert the actual object into a JSON object and call > {{JSONAssert.assertEquals}}. > Credit: Luke Cwik >
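Option 1 depends on the fact that map equality ignores field order. The minimal, dependency-free sketch below shows this. It assumes a plain `LinkedHashMap` can stand in for a `GenericJson` subclass, since both are `Map<String, Object>`. The class and method names are hypothetical.

Two maps that hold the same entries, inserted in different orders, print differently but compare equal. The sketch also shows a caveat: `Map.equals` compares values with `equals()`, so an expected `Integer` 0 does not match an actual `Long` 0. The boxed types in the expected `ImmutableMap` therefore have to match the field types of the actual object.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical demo: LinkedHashMap stands in for a GenericJson subclass (a Map<String, Object>). */
public class MapAssertionDemo {

  /** Builds the "actual" value with fields inserted in one order. */
  static Map<String, Object> actualInOneOrder() {
    Map<String, Object> integer = new LinkedHashMap<>();
    integer.put("highBits", 0);
    integer.put("lowBits", 0);
    Map<String, Object> nameAndKind = new LinkedHashMap<>();
    nameAndKind.put("kind", "SUM");
    nameAndKind.put("name", "transformedValue-ElementCount");
    Map<String, Object> m = new LinkedHashMap<>();
    m.put("cumulative", true);
    m.put("integer", integer);
    m.put("nameAndKind", nameAndKind);
    return m;
  }

  /** Same content, with fields inserted in a different order (as after a dependency upgrade). */
  static Map<String, Object> actualInAnotherOrder() {
    Map<String, Object> integer = new LinkedHashMap<>();
    integer.put("lowBits", 0);
    integer.put("highBits", 0);
    Map<String, Object> nameAndKind = new LinkedHashMap<>();
    nameAndKind.put("name", "transformedValue-ElementCount");
    nameAndKind.put("kind", "SUM");
    Map<String, Object> m = new LinkedHashMap<>();
    m.put("nameAndKind", nameAndKind);
    m.put("integer", integer);
    m.put("cumulative", true);
    return m;
  }

  public static void main(String[] args) {
    Map<String, Object> a = actualInOneOrder();
    Map<String, Object> b = actualInAnotherOrder();
    // toString() follows iteration order, so a string-based assertion breaks.
    System.out.println(a.toString().equals(b.toString())); // false
    // Map.equals compares entries regardless of order, recursing into nested maps via equals().
    System.out.println(a.equals(b)); // true
    // Caveat: values are compared with equals(), so boxed numeric types must match.
    System.out.println(Integer.valueOf(0).equals(Long.valueOf(0))); // false
  }
}
```

Option 2 and Option 3 rely on the same idea. They compare parsed structures, not rendered strings, so they avoid the ordering problem.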
[jira] [Work logged] (BEAM-9000) Java Test Assertions without toString for GenericJson subclasses
[ https://issues.apache.org/jira/browse/BEAM-9000?focusedWorklogId=361932&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361932 ] ASF GitHub Bot logged work on BEAM-9000: Author: ASF GitHub Bot Created on: 20/Dec/19 22:29 Start Date: 20/Dec/19 22:29 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10441: [BEAM-9000] Java Test Assertions without toString for GenericJson subclasses URL: https://github.com/apache/beam/pull/10441#issuecomment-56825 Run Dataflow ValidatesRunner Issue Time Tracking --- Worklog Id: (was: 361932) Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (BEAM-9000) Java Test Assertions without toString for GenericJson subclasses
[ https://issues.apache.org/jira/browse/BEAM-9000?focusedWorklogId=361924&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361924 ] ASF GitHub Bot logged work on BEAM-9000: Author: ASF GitHub Bot Created on: 20/Dec/19 22:28 Start Date: 20/Dec/19 22:28 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10441: [BEAM-9000] Java Test Assertions without toString for GenericJson subclasses URL: https://github.com/apache/beam/pull/10441#issuecomment-568110802 Run Python PreCommit Issue Time Tracking --- Worklog Id: (was: 361924) Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (BEAM-8676) Beam Dependency Update Request: com.google.api:gax-grpc
[ https://issues.apache.org/jira/browse/BEAM-8676?focusedWorklogId=361925&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361925 ] ASF GitHub Bot logged work on BEAM-8676: Author: ASF GitHub Bot Created on: 20/Dec/19 22:28 Start Date: 20/Dec/19 22:28 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10438: [BEAM-8676] sdks/java: gax and grpc upgrades URL: https://github.com/apache/beam/pull/10438#issuecomment-568110929 Run Java HadoopFormatIO Performance Test Issue Time Tracking --- Worklog Id: (was: 361925) Time Spent: 1h 50m (was: 1h 40m) > Beam Dependency Update Request: com.google.api:gax-grpc > --- > > Key: BEAM-8676 > URL: https://issues.apache.org/jira/browse/BEAM-8676 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Tomo Suzuki >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > - 2019-11-15 19:38:32.410774 > - > Please consider upgrading the dependency com.google.api:gax-grpc. > The current version is 1.38.0. The latest version is 1.50.1. > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/] for more information. > Do Not Modify The Description Above. > - 2019-11-19 21:03:23.809273 > - > Please consider upgrading the dependency com.google.api:gax-grpc. > The current version is 1.38.0. The latest version is 1.50.1. > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/] for more information. > Do Not Modify The Description Above. > - 2019-12-02 12:08:16.165687 > - > Please consider upgrading the dependency com.google.api:gax-grpc. > The current version is 1.38.0. The latest version is 1.50.1. > cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/] for more information. > Do Not Modify The Description Above. > - 2019-12-09 12:07:17.894174 > - > Please consider upgrading the dependency com.google.api:gax-grpc. > The current version is 1.38.0. The latest version is 1.51.0. 
> cc: > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/] for more information. > Do Not Modify The Description Above.
[jira] [Work logged] (BEAM-8676) Beam Dependency Update Request: com.google.api:gax-grpc
[ https://issues.apache.org/jira/browse/BEAM-8676?focusedWorklogId=361927&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361927 ] ASF GitHub Bot logged work on BEAM-8676: Author: ASF GitHub Bot Created on: 20/Dec/19 22:28 Start Date: 20/Dec/19 22:28 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10438: [BEAM-8676] sdks/java: gax and grpc upgrades URL: https://github.com/apache/beam/pull/10438#issuecomment-568110969 Run Dataflow ValidatesRunner Issue Time Tracking --- Worklog Id: (was: 361927) Time Spent: 2h 10m (was: 2h)
[jira] [Work logged] (BEAM-8676) Beam Dependency Update Request: com.google.api:gax-grpc
[ https://issues.apache.org/jira/browse/BEAM-8676?focusedWorklogId=361929&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361929 ] ASF GitHub Bot logged work on BEAM-8676: Author: ASF GitHub Bot Created on: 20/Dec/19 22:28 Start Date: 20/Dec/19 22:28 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10438: [BEAM-8676] sdks/java: gax and grpc upgrades URL: https://github.com/apache/beam/pull/10438#issuecomment-568111016 Run SQL Postcommit Issue Time Tracking --- Worklog Id: (was: 361929) Time Spent: 2.5h (was: 2h 20m)
[jira] [Work logged] (BEAM-8676) Beam Dependency Update Request: com.google.api:gax-grpc
[ https://issues.apache.org/jira/browse/BEAM-8676?focusedWorklogId=361926&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361926 ] ASF GitHub Bot logged work on BEAM-8676: Author: ASF GitHub Bot Created on: 20/Dec/19 22:28 Start Date: 20/Dec/19 22:28 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10438: [BEAM-8676] sdks/java: gax and grpc upgrades URL: https://github.com/apache/beam/pull/10438#issuecomment-568110952 Run BigQueryIO Streaming Performance Test Java Issue Time Tracking --- Worklog Id: (was: 361926) Time Spent: 2h (was: 1h 50m)
[jira] [Work logged] (BEAM-8676) Beam Dependency Update Request: com.google.api:gax-grpc
[ https://issues.apache.org/jira/browse/BEAM-8676?focusedWorklogId=361928&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361928 ] ASF GitHub Bot logged work on BEAM-8676: Author: ASF GitHub Bot Created on: 20/Dec/19 22:28 Start Date: 20/Dec/19 22:28 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10438: [BEAM-8676] sdks/java: gax and grpc upgrades URL: https://github.com/apache/beam/pull/10438#issuecomment-568110998 Run Spark ValidatesRunner Issue Time Tracking --- Worklog Id: (was: 361928) Time Spent: 2h 20m (was: 2h 10m)
[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
[ https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361922&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361922 ] ASF GitHub Bot logged work on BEAM-8630: Author: ASF GitHub Bot Created on: 20/Dec/19 22:26 Start Date: 20/Dec/19 22:26 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #9913: [BEAM-8630] Prototype of BeamZetaSqlCalcRel URL: https://github.com/apache/beam/pull/9913#discussion_r360592230 ## File path: sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLQueryPlanner.java ## @@ -64,6 +68,28 @@ public ZetaSQLQueryPlanner(JdbcConnection jdbcConnection, RuleSet[] ruleSets) { plannerImpl = new ZetaSQLPlannerImpl(defaultConfig(jdbcConnection, ruleSets)); } + public static RuleSet[] getZetaSqlRuleSets() { +// TODO[BEAM-8630]: uncomment the next line once we have fully migrated to BeamZetaSqlCalcRel +// return replaceBeamCalcRule(BeamRuleSets.getRuleSets()); Review comment: I assume you'd also need to add this line up in `ZetaSQLQueryPlanner`? Or does that not actually do anything? Issue Time Tracking --- Worklog Id: (was: 361922) Time Spent: 2h 20m (was: 2h 10m) > Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator > > > Key: BEAM-8630 > URL: https://issues.apache.org/jira/browse/BEAM-8630 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h >
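The diff does not show the body of `replaceBeamCalcRule`. The sketch below is a hypothetical illustration of what such a helper might do: copy every rule set and swap one rule for another, leaving the input untouched.

Plain strings stand in for Calcite rule objects, and the rule names are illustrative only. Real code would presumably operate on Calcite `RuleSet` instances; that is an assumption, not something taken from this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: swap one rule for another across several rule sets. */
public final class RuleSwapSketch {

  private RuleSwapSketch() {}

  /** Returns copies of the rule sets with every occurrence of oldRule replaced by newRule. */
  static <R> List<List<R>> replaceRule(List<List<R>> ruleSets, R oldRule, R newRule) {
    List<List<R>> result = new ArrayList<>();
    for (List<R> ruleSet : ruleSets) {
      // Build a new list so the caller's rule sets are not modified.
      result.add(
          ruleSet.stream()
              .map(rule -> oldRule.equals(rule) ? newRule : rule)
              .collect(Collectors.toList()));
    }
    return result;
  }

  public static void main(String[] args) {
    // Strings stand in for rule objects; the names are illustrative.
    List<List<String>> ruleSets =
        Arrays.asList(
            Arrays.asList("FilterMergeRule", "BeamCalcRule"),
            Arrays.asList("BeamCalcRule", "BeamSortRule"));
    System.out.println(replaceRule(ruleSets, "BeamCalcRule", "BeamZetaSqlCalcRule"));
    // [[FilterMergeRule, BeamZetaSqlCalcRule], [BeamZetaSqlCalcRule, BeamSortRule]]
  }
}
```

Returning fresh copies means the shared default rule sets can still be used as-is by any planner that has not opted into the ZetaSQL Calc.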
[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
[ https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361910&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361910 ] ASF GitHub Bot logged work on BEAM-8630: Author: ASF GitHub Bot Created on: 20/Dec/19 22:20 Start Date: 20/Dec/19 22:20 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #9913: [BEAM-8630] Prototype of BeamZetaSqlCalcRel URL: https://github.com/apache/beam/pull/9913#discussion_r360586132 ## File path: sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcRel.java ## @@ -0,0 +1,259 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.zetasql; + +import com.google.zetasql.AnalyzerOptions; +import com.google.zetasql.PreparedExpression; +import com.google.zetasql.Value; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.function.IntFunction; +import java.util.stream.Collectors; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.extensions.sql.impl.planner.BeamCostModel; +import org.apache.beam.sdk.extensions.sql.impl.planner.NodeStats; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSortRel; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils; +import org.apache.beam.sdk.extensions.sql.impl.rel.WithLimitableInput; +import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils; +import org.apache.beam.sdk.extensions.sql.meta.provider.bigquery.BeamBigQuerySqlDialect; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionList; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rel2sql.SqlImplementor; 
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLocalRef; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlDialect; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlIdentifier; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlNode; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParserPos; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; + +/** + * BeamRelNode to replace {@code Project} and {@code Filter} node based on the {@code ZetaSQL} + * expression evaluator. + */ +// TODO[BEAM-8630]: This class is currently a prototype and not used in runtime. Review comment: nit: this should probably be part of the comment block above Issue Time Tracking ---
[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
[ https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361909&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361909 ] ASF GitHub Bot logged work on BEAM-8630: Author: ASF GitHub Bot Created on: 20/Dec/19 22:20 Start Date: 20/Dec/19 22:20 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #9913: [BEAM-8630] Prototype of BeamZetaSqlCalcRel URL: https://github.com/apache/beam/pull/9913#discussion_r360587972 ## File path: sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/WithLimitableInput.java ## @@ -0,0 +1,28 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.extensions.sql.impl.rel; + +import org.apache.beam.sdk.annotations.Internal; + +/** Interface for a {@code Calc} whose input can be a {@link BeamSortRel} with a limit count. */ +@Internal +public interface WithLimitableInput { Review comment: nit: The implementation of this interface is identical between here and BeamCalcRel. What you actually have is a common class `BeamCalc` on top of calcite's `core.Calc`.
Issue Time Tracking --- Worklog Id: (was: 361909) Time Spent: 1h 10m (was: 1h)
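The dependency-free sketch below illustrates the reviewer's suggestion. The check that the input is a limit-only sort is written once in a shared abstract `BeamCalc` base class, and both Calc relations inherit it. Neither relation implements `WithLimitableInput` separately.

All types here are simplified, hypothetical stand-ins, not the real Calcite or Beam classes. The method name `isInputSortRelAndLimitOnly` is also an assumption, not taken from this diff.

```java
import java.util.Objects;

/** Hypothetical sketch of hoisting shared Calc logic into one base class. */
public class BeamCalcRefactorSketch {

  /** Stand-in for a planner node (not Calcite's RelNode). */
  interface RelNode {}

  /** Stand-in for BeamSortRel: a sort that may apply only a LIMIT. */
  static final class SortRel implements RelNode {
    private final boolean limitOnly;

    SortRel(boolean limitOnly) {
      this.limitOnly = limitOnly;
    }

    boolean isLimitOnly() {
      return limitOnly;
    }
  }

  /** Stand-in for any other input, e.g. a table scan. */
  static final class ScanRel implements RelNode {}

  /** Stand-in for calcite's core.Calc: a node with a single input. */
  abstract static class Calc implements RelNode {
    protected final RelNode input;

    Calc(RelNode input) {
      this.input = Objects.requireNonNull(input);
    }
  }

  /** Shared base class: the limit check is implemented once here. */
  abstract static class BeamCalc extends Calc {
    BeamCalc(RelNode input) {
      super(input);
    }

    /** True if the input is a sort that only applies a LIMIT. */
    final boolean isInputSortRelAndLimitOnly() {
      return input instanceof SortRel && ((SortRel) input).isLimitOnly();
    }
  }

  /** Both Calc implementations inherit the check instead of duplicating it. */
  static final class BeamCalcRel extends BeamCalc {
    BeamCalcRel(RelNode input) {
      super(input);
    }
  }

  static final class BeamZetaSqlCalcRel extends BeamCalc {
    BeamZetaSqlCalcRel(RelNode input) {
      super(input);
    }
  }

  public static void main(String[] args) {
    System.out.println(new BeamCalcRel(new SortRel(true)).isInputSortRelAndLimitOnly()); // true
    System.out.println(new BeamZetaSqlCalcRel(new SortRel(false)).isInputSortRelAndLimitOnly()); // false
    System.out.println(new BeamZetaSqlCalcRel(new ScanRel()).isInputSortRelAndLimitOnly()); // false
  }
}
```

Marking the method `final` in the base class keeps the two relations from drifting apart. An interface default method would be an alternative, but it cannot reach a protected `input` field.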
[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
[ https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361914&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361914 ] ASF GitHub Bot logged work on BEAM-8630: Author: ASF GitHub Bot Created on: 20/Dec/19 22:20 Start Date: 20/Dec/19 22:20 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #9913: [BEAM-8630] Prototype of BeamZetaSqlCalcRel URL: https://github.com/apache/beam/pull/9913#discussion_r360593644 ## File path: sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlUtils.java ## @@ -0,0 +1,199 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.zetasql; + +import com.google.protobuf.ByteString; +import com.google.zetasql.ArrayType; +import com.google.zetasql.StructType; +import com.google.zetasql.StructType.StructField; +import com.google.zetasql.Type; +import com.google.zetasql.TypeFactory; +import com.google.zetasql.Value; +import com.google.zetasql.ZetaSQLType.TypeKind; +import java.util.ArrayList; +import java.util.List; +import java.util.stream.Collectors; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.math.LongMath; +import org.joda.time.Instant; + +/** Utility methods for ZetaSQL related operations. */ +@Internal +public final class ZetaSqlUtils { + + private static final long MICROS_PER_MILLI = 1000L; + + private ZetaSqlUtils() {} + + // Unsupported ZetaSQL types: INT32, UINT32, UINT64, FLOAT, ENUM, PROTO, GEOGRAPHY + // TODO[BEAM-8630]: support ZetaSQL types: DATE, TIME, DATETIME + public static Type beamFieldTypeToZetaSqlType(FieldType fieldType) { +switch (fieldType.getTypeName()) { + case INT64: +return TypeFactory.createSimpleType(TypeKind.TYPE_INT64); + case DECIMAL: +return TypeFactory.createSimpleType(TypeKind.TYPE_NUMERIC); + case DOUBLE: +return TypeFactory.createSimpleType(TypeKind.TYPE_DOUBLE); + case STRING: +return TypeFactory.createSimpleType(TypeKind.TYPE_STRING); + case DATETIME: +// TODO[BEAM-8630]: Mapping Timestamp to DATETIME results in some timezone/precision issues. Review comment: We determined the timezone issue is non-existent. I wonder if we could make a logical type that gave us an extra field to stuff nanoseconds without breaking window functions? This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 361914) Time Spent: 1h 50m (was: 1h 40m) > Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator > > > Key: BEAM-8630 > URL: https://issues.apache.org/jira/browse/BEAM-8630 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h >
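The `ZetaSqlUtils` excerpt above declares `MICROS_PER_MILLI = 1000L` and imports Guava's `LongMath`, and the review thread is about timestamp precision. The method bodies are not quoted, so what follows is only a minimal sketch. It assumes the conversion goes from epoch milliseconds (Joda `Instant`) to epoch microseconds (ZetaSQL timestamps). The class name `MicrosConversionSketch` is hypothetical, and `Math.multiplyExact` stands in for the vendored `LongMath` so that the example has no dependencies.

```java
/**
 * Hypothetical helper (not taken from PR #9913): converts between millisecond
 * and microsecond epoch timestamps, throwing on overflow instead of wrapping.
 */
public final class MicrosConversionSketch {

  // Same value as the MICROS_PER_MILLI constant declared in ZetaSqlUtils.
  private static final long MICROS_PER_MILLI = 1000L;

  private MicrosConversionSketch() {}

  /** Widens epoch millis to epoch micros; throws ArithmeticException on overflow. */
  public static long millisToMicros(long epochMillis) {
    return Math.multiplyExact(epochMillis, MICROS_PER_MILLI);
  }

  /**
   * Narrows epoch micros to epoch millis, rounding toward negative infinity so that
   * pre-1970 values truncate consistently. Sub-millisecond digits are discarded.
   */
  public static long microsToMillis(long epochMicros) {
    return Math.floorDiv(epochMicros, MICROS_PER_MILLI);
  }

  public static void main(String[] args) {
    long micros = millisToMicros(1_576_880_400_123L);
    System.out.println(micros); // 1576880400123000
    System.out.println(microsToMillis(micros + 999)); // 1576880400123 (the 999 micros are lost)
    System.out.println(microsToMillis(-1L)); // -1
  }
}
```

A round trip through milliseconds drops the last three digits. That is one way the precision issues mentioned in the TODO can surface. Carrying nanoseconds, which the reviewer raises, would need an extra field beyond a single `long` of millis.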
[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
[ https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361913&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361913 ] ASF GitHub Bot logged work on BEAM-8630: Author: ASF GitHub Bot Created on: 20/Dec/19 22:20 Start Date: 20/Dec/19 22:20 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #9913: [BEAM-8630] Prototype of BeamZetaSqlCalcRel URL: https://github.com/apache/beam/pull/9913#discussion_r360590462 ## File path: sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcRel.java ## @@ -0,0 +1,259 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.zetasql; + +import com.google.zetasql.AnalyzerOptions; +import com.google.zetasql.PreparedExpression; +import com.google.zetasql.Value; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.function.IntFunction; +import java.util.stream.Collectors; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.extensions.sql.impl.planner.BeamCostModel; +import org.apache.beam.sdk.extensions.sql.impl.planner.NodeStats; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSortRel; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils; +import org.apache.beam.sdk.extensions.sql.impl.rel.WithLimitableInput; +import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils; +import org.apache.beam.sdk.extensions.sql.meta.provider.bigquery.BeamBigQuerySqlDialect; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionList; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rel2sql.SqlImplementor; 
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLocalRef; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlDialect; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlIdentifier; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlNode; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParserPos; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; + +/** + * BeamRelNode to replace {@code Project} and {@code Filter} node based on the {@code ZetaSQL} + * expression evaluator. + */ +// TODO[BEAM-8630]: This class is currently a prototype and not used in runtime. +@Internal +public class BeamZetaSqlCalcRel extends Calc implements BeamRelNode, WithLimitableInput { + + private static final SqlDialect DIALECT = BeamBigQuerySqlDialect.DEFAULT; + private final SqlImplementor.Context context; + + public BeamZetaSqlCalcRel( + RelOptCluster cluster, RelTraitSet traits, RelNode input, RexProgram program) { +super(cluster, traits, input, program); +final IntFunction fn = +i -> +
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=361908&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361908 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 20/Dec/19 22:20 Start Date: 20/Dec/19 22:20 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10120: [BEAM-8335] Add a TestStreamService Python Implementation URL: https://github.com/apache/beam/pull/10120 Issue Time Tracking --- Worklog Id: (was: 361908) Time Spent: 49h 40m (was: 49.5h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 49h 40m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections
[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
[ https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361917&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361917 ] ASF GitHub Bot logged work on BEAM-8630: Author: ASF GitHub Bot Created on: 20/Dec/19 22:20 Start Date: 20/Dec/19 22:20 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #9913: [BEAM-8630] Prototype of BeamZetaSqlCalcRel URL: https://github.com/apache/beam/pull/9913#discussion_r360593247 ## File path: sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/translation/SingleRowScanConverter.java ## @@ -36,6 +42,22 @@ public boolean canConvert(ResolvedSingleRowScan zetaNode) { @Override public RelNode convert(ResolvedSingleRowScan zetaNode, List inputs) { -return LogicalValues.createOneRow(getCluster()); +return createOneRow(getCluster()); + } + + // This function is a reimplementation of Calcite's LogicalValues.createOneRow() with a single + // line change: SqlTypeName.INTEGER replaced by SqlTypeName.BIGINT. + // Would like to use LogicalValues.createOneRow(), but it uses type SqlTypeName.INTEGER which + // correspond to TypeKind.TYPE_INT32 in ZetaSQL, a type not supported in PRODUCT_EXTERNAL mode. + private static LogicalValues createOneRow(RelOptCluster cluster) { Review comment: I don't understand this. More context please?
Issue Time Tracking --- Worklog Id: (was: 361917) Time Spent: 2h (was: 1h 50m) > Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator > > > Key: BEAM-8630 > URL: https://issues.apache.org/jira/browse/BEAM-8630 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h >
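The `createOneRow` comment in the hunk above states the reasoning. Calcite's `LogicalValues.createOneRow()` types its column as `SqlTypeName.INTEGER`, which corresponds to ZetaSQL's `TYPE_INT32`, and `TYPE_INT32` is not supported in PRODUCT_EXTERNAL mode. The `ZetaSqlUtils` comment also lists it as unsupported. The hypothetical sketch below reduces that reasoning to two stand-in enums. It does not use the real Calcite or ZetaSQL classes, and it models only the two types involved.

```java
import java.util.EnumSet;
import java.util.Set;

/**
 * Hypothetical illustration (not Beam, Calcite, or ZetaSQL code): simplified stand-ins
 * show why a one-row VALUES typed INTEGER is rejected while one typed BIGINT is not.
 */
public final class OneRowTypeSketch {

  /** Stand-in for the two Calcite SqlTypeName values relevant here. */
  enum CalciteTypeStandIn { INTEGER, BIGINT }

  /** Stand-in for the two ZetaSQL TypeKind values relevant here. */
  enum ZetaKindStandIn { TYPE_INT32, TYPE_INT64 }

  // ZetaSqlUtils lists INT32, UINT32, UINT64, FLOAT, ENUM, PROTO and GEOGRAPHY as
  // unsupported; only INT32 matters for this example.
  private static final Set<ZetaKindStandIn> UNSUPPORTED_IN_PRODUCT_EXTERNAL =
      EnumSet.of(ZetaKindStandIn.TYPE_INT32);

  private OneRowTypeSketch() {}

  static ZetaKindStandIn toZetaKind(CalciteTypeStandIn type) {
    switch (type) {
      case INTEGER:
        return ZetaKindStandIn.TYPE_INT32;
      case BIGINT:
        return ZetaKindStandIn.TYPE_INT64;
      default:
        throw new IllegalArgumentException("Unmapped type: " + type);
    }
  }

  static boolean isSupported(CalciteTypeStandIn type) {
    return !UNSUPPORTED_IN_PRODUCT_EXTERNAL.contains(toZetaKind(type));
  }

  public static void main(String[] args) {
    System.out.println(isSupported(CalciteTypeStandIn.INTEGER)); // false
    System.out.println(isSupported(CalciteTypeStandIn.BIGINT)); // true
  }
}
```

Swapping INTEGER for BIGINT in the one-row literal moves it from a rejected kind to a supported one. That is the single-line change the code comment describes.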
[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
[ https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361916&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361916 ] ASF GitHub Bot logged work on BEAM-8630: Author: ASF GitHub Bot Created on: 20/Dec/19 22:20 Start Date: 20/Dec/19 22:20 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #9913: [BEAM-8630] Prototype of BeamZetaSqlCalcRel URL: https://github.com/apache/beam/pull/9913#discussion_r360592000 ## File path: sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLQueryPlanner.java ## @@ -64,6 +68,28 @@ public ZetaSQLQueryPlanner(JdbcConnection jdbcConnection, RuleSet[] ruleSets) { plannerImpl = new ZetaSQLPlannerImpl(defaultConfig(jdbcConnection, ruleSets)); } + public static RuleSet[] getZetaSqlRuleSets() { +// TODO[BEAM-8630]: uncomment the next line once we have fully migrated to BeamZetaSqlCalcRel +// return replaceBeamCalcRule(BeamRuleSets.getRuleSets()); +return BeamRuleSets.getRuleSets(); + } + + private static RuleSet[] replaceBeamCalcRule(RuleSet[] ruleSets) { Review comment: This is perfect! Thanks! Issue Time Tracking --- Worklog Id: (was: 361916) > Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator > > > Key: BEAM-8630 > URL: https://issues.apache.org/jira/browse/BEAM-8630 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h >
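The quoted hunk shows only the signature of `replaceBeamCalcRule(RuleSet[] ruleSets)`, not its body. Assume it returns a copy of the rule sets with the Calc rule swapped for a ZetaSQL-based one. Under that assumption, a dependency-free sketch of the idea could look like the following. `RuleSwapSketch`, `replaceRule`, and the rule names are hypothetical, and plain `List`s stand in for Calcite's `RuleSet`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (the real replaceBeamCalcRule body is not shown in the review):
 * copies each rule set and substitutes one rule for another, leaving the input untouched.
 */
public final class RuleSwapSketch {

  private RuleSwapSketch() {}

  /** Returns new rule sets in which every rule equal to oldRule is replaced by newRule. */
  static <R> List<List<R>> replaceRule(List<List<R>> ruleSets, R oldRule, R newRule) {
    List<List<R>> result = new ArrayList<>(ruleSets.size());
    for (List<R> ruleSet : ruleSets) {
      List<R> copy = new ArrayList<>(ruleSet.size());
      for (R rule : ruleSet) {
        copy.add(rule.equals(oldRule) ? newRule : rule);
      }
      result.add(copy);
    }
    return result;
  }

  public static void main(String[] args) {
    List<List<String>> sets =
        Arrays.asList(
            Arrays.asList("BeamCalcRule", "BeamSortRule"), Arrays.asList("FilterMergeRule"));
    // Prints [[BeamZetaSqlCalcRule, BeamSortRule], [FilterMergeRule]]
    System.out.println(replaceRule(sets, "BeamCalcRule", "BeamZetaSqlCalcRule"));
  }
}
```

Returning copies keeps the shared default rule sets unchanged for the Calcite planner path. A real implementation might match the rule by type, for example with `instanceof`, rather than by `equals`.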
[jira] [Work logged] (BEAM-2572) Implement an S3 filesystem for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-2572?focusedWorklogId=361907&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361907 ] ASF GitHub Bot logged work on BEAM-2572: Author: ASF GitHub Bot Created on: 20/Dec/19 22:20 Start Date: 20/Dec/19 22:20 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9955: [BEAM-2572] Python SDK S3 Filesystem URL: https://github.com/apache/beam/pull/9955#issuecomment-568108785 Run Python PreCommit Issue Time Tracking --- Worklog Id: (was: 361907) Time Spent: 4.5h (was: 4h 20m) > Implement an S3 filesystem for Python SDK > - > > Key: BEAM-2572 > URL: https://issues.apache.org/jira/browse/BEAM-2572 > Project: Beam > Issue Type: Task > Components: sdk-py-core >Reporter: Dmitry Demeshchuk >Priority: Minor > Labels: GSoC2019, gsoc, gsoc2019, mentor, outreachy19dec > Time Spent: 4.5h > Remaining Estimate: 0h > > There are two paths worth exploring, to my understanding: > 1. Sticking to the HDFS-based approach (like it's done in Java). > 2. Using boto/boto3 for accessing S3 through its common API endpoints. > I personally prefer the second approach, for a few reasons: > 1. In real life, HDFS and S3 have different consistency guarantees, therefore > their behaviors may contradict each other in some edge cases (say, we write > something to S3, but it's not immediately accessible for reading from another > end). > 2. There are other AWS-based sources and sinks we may want to create in the > future: DynamoDB, Kinesis, SQS, etc. > 3. boto3 already provides somewhat good logic for basic things like > reattempting. 
> Whatever path we choose, there's another problem related to this: we > currently cannot pass any global settings (say, pipeline options, or just an > arbitrary kwarg) to a filesystem. Because of that, we'd have to set up the > runner nodes to have AWS keys set up in the environment, which is not trivial > to achieve and doesn't look too clean either (I'd rather see one single place > for configuring the runner options). > Also, it's worth mentioning that I already have a janky S3 filesystem > implementation that only supports DirectRunner at the moment (because of the > previous paragraph). I'm perfectly fine finishing it myself, with some > guidance from the maintainers. > Where should I move on from here, and whose input should I be looking for? > Thanks!
[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
[ https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361911&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361911 ] ASF GitHub Bot logged work on BEAM-8630: Author: ASF GitHub Bot Created on: 20/Dec/19 22:20 Start Date: 20/Dec/19 22:20 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #9913: [BEAM-8630] Prototype of BeamZetaSqlCalcRel URL: https://github.com/apache/beam/pull/9913#discussion_r36051 ## File path: sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcRel.java ## @@ -0,0 +1,259 @@
[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
[ https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361918&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361918 ] ASF GitHub Bot logged work on BEAM-8630: Author: ASF GitHub Bot Created on: 20/Dec/19 22:20 Start Date: 20/Dec/19 22:20 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #9913: [BEAM-8630] Prototype of BeamZetaSqlCalcRel URL: https://github.com/apache/beam/pull/9913#discussion_r360591572 ## File path: sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcRel.java ## @@ -0,0 +1,259 @@
[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
[ https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361919&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361919 ] ASF GitHub Bot logged work on BEAM-8630: Author: ASF GitHub Bot Created on: 20/Dec/19 22:20 Start Date: 20/Dec/19 22:20 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #9913: [BEAM-8630] Prototype of BeamZetaSqlCalcRel URL: https://github.com/apache/beam/pull/9913#discussion_r360592858 ## File path: sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlUtilsTest.java ## @@ -0,0 +1,128 @@
+package org.apache.beam.sdk.extensions.sql.zetasql; + +import static org.junit.Assert.assertEquals; + +import com.google.protobuf.ByteString; +import com.google.zetasql.ArrayType; +import com.google.zetasql.StructType; +import com.google.zetasql.StructType.StructField; +import com.google.zetasql.TypeFactory; +import com.google.zetasql.Value; +import com.google.zetasql.ZetaSQLType.TypeKind; +import java.util.Arrays; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.FieldType; +import org.apache.beam.sdk.values.Row; +import org.joda.time.Instant; +import org.junit.Test; +import org.junit.runner.RunWith; +import org.junit.runners.JUnit4; + +/** Tests for utility methods for ZetaSQL related operations. */ +@RunWith(JUnit4.class) +public class ZetaSqlUtilsTest { + + private static final Schema TEST_INNER_SCHEMA = + Schema.builder().addField("i1", FieldType.INT64).addField("i2", FieldType.STRING).build(); + + private static final Schema TEST_SCHEMA = + Schema.builder() + .addField("f1", FieldType.INT64) Review comment: Yes, nullable fields have been a huge pain point in the past. Issue Time Tracking --- Worklog Id: (was: 361919) Time Spent: 2h 10m (was: 2h) > Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator > > > Key: BEAM-8630 > URL: https://issues.apache.org/jira/browse/BEAM-8630 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h >
[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
[ https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361915&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361915 ] ASF GitHub Bot logged work on BEAM-8630: Author: ASF GitHub Bot Created on: 20/Dec/19 22:20 Start Date: 20/Dec/19 22:20 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #9913: [BEAM-8630] Prototype of BeamZetaSqlCalcRel URL: https://github.com/apache/beam/pull/9913#discussion_r360591277 ## File path: sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcRel.java ## @@ -0,0 +1,259 @@
[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
[ https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361920&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361920 ] ASF GitHub Bot logged work on BEAM-8630: Author: ASF GitHub Bot Created on: 20/Dec/19 22:20 Start Date: 20/Dec/19 22:20 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #9913: [BEAM-8630] Prototype of BeamZetaSqlCalcRel URL: https://github.com/apache/beam/pull/9913#discussion_r360592230 ## File path: sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSQLQueryPlanner.java ## @@ -64,6 +68,28 @@ public ZetaSQLQueryPlanner(JdbcConnection jdbcConnection, RuleSet[] ruleSets) { plannerImpl = new ZetaSQLPlannerImpl(defaultConfig(jdbcConnection, ruleSets)); } + public static RuleSet[] getZetaSqlRuleSets() { +// TODO[BEAM-8630]: uncomment the next line once we have fully migrated to BeamZetaSqlCalcRel +// return replaceBeamCalcRule(BeamRuleSets.getRuleSets()); Review comment: I assume you'd also need to add this line up in `ZetaSQLQueryPlanner`? Issue Time Tracking --- Worklog Id: (was: 361920) Time Spent: 2h 10m (was: 2h) > Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator > > > Key: BEAM-8630 > URL: https://issues.apache.org/jira/browse/BEAM-8630 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h
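The commented-out line suggests that `replaceBeamCalcRule` takes the regular Beam rule sets and swaps the Calcite-based calc rule for a ZetaSQL-based one. A minimal sketch of that kind of swap, assuming plain `List`s and toy rule classes in place of Calcite's `RuleSet`/`RelOptRule` (every class and method name here is hypothetical, not the PR's code):

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: swaps one rule type for another across a group of rule sets. */
public class RuleSetRewriter {

  // Toy stand-ins for planner rules; real code would work with Calcite RelOptRule instances.
  public interface Rule {}

  public static final class CalcRule implements Rule {}

  public static final class ZetaSqlCalcRule implements Rule {}

  public static final class FilterRule implements Rule {}

  /**
   * Returns copies of ruleSets in which every instance of target is replaced by replacement.
   * The input lists are left unmodified.
   */
  public static <R> List<List<R>> replaceRule(
      List<List<R>> ruleSets, Class<?> target, R replacement) {
    List<List<R>> result = new ArrayList<>();
    for (List<R> ruleSet : ruleSets) {
      List<R> rewritten = new ArrayList<>();
      for (R rule : ruleSet) {
        // Keep every rule as-is except instances of the target class.
        rewritten.add(target.isInstance(rule) ? replacement : rule);
      }
      result.add(rewritten);
    }
    return result;
  }

  public static void main(String[] args) {
    Rule filter = new FilterRule();
    List<Rule> first = List.of(filter, new CalcRule());
    List<Rule> second = List.of(filter);
    List<List<Rule>> zetaRuleSets =
        replaceRule(List.of(first, second), CalcRule.class, new ZetaSqlCalcRule());
    // Prints "ZetaSqlCalcRule": the calc rule in the first set has been swapped out.
    System.out.println(zetaRuleSets.get(0).get(1).getClass().getSimpleName());
  }
}
```

Returning new lists rather than mutating the inputs keeps the default Beam rule sets intact, so both planners could share the same source list.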
[jira] [Work logged] (BEAM-8630) Prototype of BeamSQL Calc using ZetaSQL Expression Evaluator
[ https://issues.apache.org/jira/browse/BEAM-8630?focusedWorklogId=361912&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361912 ] ASF GitHub Bot logged work on BEAM-8630: Author: ASF GitHub Bot Created on: 20/Dec/19 22:20 Start Date: 20/Dec/19 22:20 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #9913: [BEAM-8630] Prototype of BeamZetaSqlCalcRel URL: https://github.com/apache/beam/pull/9913#discussion_r360590920 ## File path: sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/BeamZetaSqlCalcRel.java ## @@ -0,0 +1,259 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.extensions.sql.zetasql; + +import com.google.zetasql.AnalyzerOptions; +import com.google.zetasql.PreparedExpression; +import com.google.zetasql.Value; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.function.IntFunction; +import java.util.stream.Collectors; +import javax.annotation.Nullable; +import org.apache.beam.sdk.annotations.Internal; +import org.apache.beam.sdk.extensions.sql.impl.planner.BeamCostModel; +import org.apache.beam.sdk.extensions.sql.impl.planner.NodeStats; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamRelNode; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSortRel; +import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils; +import org.apache.beam.sdk.extensions.sql.impl.rel.WithLimitableInput; +import org.apache.beam.sdk.extensions.sql.impl.utils.CalciteUtils; +import org.apache.beam.sdk.extensions.sql.meta.provider.bigquery.BeamBigQuerySqlDialect; +import org.apache.beam.sdk.schemas.Schema; +import org.apache.beam.sdk.schemas.Schema.Field; +import org.apache.beam.sdk.transforms.DoFn; +import org.apache.beam.sdk.transforms.PTransform; +import org.apache.beam.sdk.transforms.ParDo; +import org.apache.beam.sdk.values.PCollection; +import org.apache.beam.sdk.values.PCollectionList; +import org.apache.beam.sdk.values.Row; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptCluster; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelOptPlanner; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.plan.RelTraitSet; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.RelNode; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.core.Calc; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.metadata.RelMetadataQuery; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rel.rel2sql.SqlImplementor; 
+import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexLocalRef; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexNode; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.rex.RexProgram; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlDialect; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlIdentifier; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.SqlNode; +import org.apache.beam.vendor.calcite.v1_20_0.org.apache.calcite.sql.parser.SqlParserPos; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList; + +/** + * BeamRelNode to replace {@code Project} and {@code Filter} node based on the {@code ZetaSQL} + * expression evaluator. + */ +// TODO[BEAM-8630]: This class is currently a prototype and not used in runtime. +@Internal +public class BeamZetaSqlCalcRel extends Calc implements BeamRelNode, WithLimitableInput { + + private static final SqlDialect DIALECT = BeamBigQuerySqlDialect.DEFAULT; + private final SqlImplementor.Context context; + + public BeamZetaSqlCalcRel( + RelOptCluster cluster, RelTraitSet traits, RelNode input, RexProgram program) { +super(cluster, traits, input, program); +final IntFunction fn = +i -> +
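In Calcite, a `Calc` node carries a `RexProgram` that combines an optional filter condition with a list of projection expressions, which is why a single `BeamZetaSqlCalcRel` can stand in for both `Project` and `Filter`. A toy model of that evaluation order, assuming plain Java lambdas in place of the ZetaSQL expressions the prototype evaluates (`ToyCalc` and its methods are hypothetical, not Beam or ZetaSQL API):

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Toy model of a Calc node: one filter condition plus projection expressions over a row. */
public class ToyCalc {
  private final Predicate<Map<String, Object>> condition;
  private final List<Function<Map<String, Object>, Object>> projects;

  public ToyCalc(
      Predicate<Map<String, Object>> condition,
      List<Function<Map<String, Object>, Object>> projects) {
    this.condition = condition;
    this.projects = projects;
  }

  /** Returns the projected output row, or empty when the condition filters the row out. */
  public Optional<List<Object>> apply(Map<String, Object> row) {
    if (!condition.test(row)) {
      return Optional.empty();
    }
    // Evaluate every projection against the same input row, in order.
    return Optional.of(projects.stream().map(p -> p.apply(row)).collect(Collectors.toList()));
  }

  public static void main(String[] args) {
    // Roughly: SELECT name, x * 2 FROM input WHERE x > 10
    List<Function<Map<String, Object>, Object>> projects =
        List.of(row -> row.get("name"), row -> (Long) row.get("x") * 2);
    ToyCalc calc = new ToyCalc(row -> (Long) row.get("x") > 10, projects);
    System.out.println(calc.apply(Map.of("name", "a", "x", 12L))); // Optional[[a, 24]]
    System.out.println(calc.apply(Map.of("name", "b", "x", 3L))); // Optional.empty
  }
}
```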
[jira] [Work logged] (BEAM-2572) Implement an S3 filesystem for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-2572?focusedWorklogId=361906&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361906 ] ASF GitHub Bot logged work on BEAM-2572: Author: ASF GitHub Bot Created on: 20/Dec/19 22:19 Start Date: 20/Dec/19 22:19 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9955: [BEAM-2572] Python SDK S3 Filesystem URL: https://github.com/apache/beam/pull/9955#issuecomment-568108754 Run Python2_PVR_Flink PreCommit Issue Time Tracking --- Worklog Id: (was: 361906) Time Spent: 4h 20m (was: 4h 10m) > Implement an S3 filesystem for Python SDK > - > > Key: BEAM-2572 > URL: https://issues.apache.org/jira/browse/BEAM-2572 > Project: Beam > Issue Type: Task > Components: sdk-py-core >Reporter: Dmitry Demeshchuk >Priority: Minor > Labels: GSoC2019, gsoc, gsoc2019, mentor, outreachy19dec > Time Spent: 4h 20m > Remaining Estimate: 0h > > There are two paths worth exploring, to my understanding: > 1. Sticking to the HDFS-based approach (as is done in Java). > 2. Using boto/boto3 to access S3 through its common API endpoints. > I personally prefer the second approach, for a few reasons: > 1. In real life, HDFS and S3 have different consistency guarantees; therefore, > their behaviors may contradict each other in some edge cases (say, we write > something to S3, but it's not immediately readable from the other > end). > 2. There are other AWS-based sources and sinks we may want to create in the > future: DynamoDB, Kinesis, SQS, etc. > 3. boto3 already provides somewhat good logic for basic things like > retries.
> Whatever path we choose, there's another related problem: we > currently cannot pass any global settings (say, pipeline options, or just an > arbitrary kwarg) to a filesystem. Because of that, we'd have to set up the > runner nodes with AWS keys in their environment, which is not trivial > to achieve and doesn't look very clean either (I'd rather see a single place > for configuring the runner options). > Also, it's worth mentioning that I already have a janky S3 filesystem > implementation that only supports DirectRunner at the moment (because of the > previous paragraph). I'm perfectly fine finishing it myself, with some > guidance from the maintainers. > Where should I go from here, and whose input should I be looking for? > Thanks!
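The configuration problem described above (credentials can only reach the filesystem through the runner nodes' environment) can be made concrete by letting the filesystem receive its settings explicitly, with the environment only as a fallback. A minimal sketch in Java for consistency with the rest of this thread; the option keys and the `S3Settings` class are hypothetical illustrations, while `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_DEFAULT_REGION` are the standard AWS environment variable names:

```java
import java.util.Map;

/** Hypothetical settings holder for a filesystem configured from pipeline options. */
public final class S3Settings {
  public final String accessKeyId;
  public final String secretAccessKey;
  public final String region;

  private S3Settings(String accessKeyId, String secretAccessKey, String region) {
    this.accessKeyId = accessKeyId;
    this.secretAccessKey = secretAccessKey;
    this.region = region;
  }

  /**
   * Resolves each setting from the (hypothetical) pipeline option keys first and falls back to
   * the standard AWS environment variables only when an option is absent.
   */
  public static S3Settings resolve(Map<String, String> options, Map<String, String> env) {
    return new S3Settings(
        pick(options, "s3_access_key_id", env, "AWS_ACCESS_KEY_ID"),
        pick(options, "s3_secret_access_key", env, "AWS_SECRET_ACCESS_KEY"),
        pick(options, "s3_region_name", env, "AWS_DEFAULT_REGION"));
  }

  private static String pick(
      Map<String, String> options, String optionKey, Map<String, String> env, String envKey) {
    String value = options.get(optionKey);
    if (value == null) {
      value = env.get(envKey);
    }
    if (value == null) {
      throw new IllegalArgumentException("Missing setting: " + optionKey + " or " + envKey);
    }
    return value;
  }

  public static void main(String[] args) {
    Map<String, String> options = Map.of("s3_region_name", "eu-west-1");
    Map<String, String> env =
        Map.of("AWS_ACCESS_KEY_ID", "AKIA-EXAMPLE", "AWS_SECRET_ACCESS_KEY", "example-secret");
    S3Settings settings = resolve(options, env);
    System.out.println(settings.region); // eu-west-1, taken from the options
  }
}
```

In a real pipeline the `env` argument would be `System.getenv()`; passing it in explicitly keeps the lookup order testable and gives the single configuration entry point the reporter asks for.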
[jira] [Work logged] (BEAM-9010) BigQuery TableRow's size is toString().length() ?
[ https://issues.apache.org/jira/browse/BEAM-9010?focusedWorklogId=361904&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-361904 ] ASF GitHub Bot logged work on BEAM-9010: Author: ASF GitHub Bot Created on: 20/Dec/19 22:14 Start Date: 20/Dec/19 22:14 Worklog Time Spent: 10m Work Description: suztomo commented on issue #10444: [BEAM-9010] Proper TableRow size calculation via TableRowJsonCoder URL: https://github.com/apache/beam/pull/10444#issuecomment-568107404 Run Java PostCommit Issue Time Tracking --- Worklog Id: (was: 361904) Time Spent: 50m (was: 40m) > BigQuery TableRow's size is toString().length() ? > - > > Key: BEAM-9010 > URL: https://issues.apache.org/jira/browse/BEAM-9010 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Minor > Attachments: TableRowJsonCoder_behavior_remains_same.png > > Time Spent: 50m > Remaining Estimate: 0h > > The following tests failed when I tried to upgrade google-http-client from 1.28.0 > to 1.34.0: > {noformat} > org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithoutStreamingBuffer > org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOReadTest.testEstimatedSizeWithStreamingBuffer > org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtilTest.testInsertAll > {noformat} > [https://builds.apache.org/job/beam_PreCommit_Java_Commit/9288/#showFailuresLink] > h3.
Reason for the test failures > [org.apache.beam.sdk.io.gcp.testing.TableContainer|https://github.com/apache/beam/blob/6fa94c9/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/testing/TableContainer.java#L43] > and > [org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl|https://github.com/apache/beam/blob/c2f0d28/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java#L758] > rely on {{TableRow.toString().length()}} to calculate the size. Example: > {code:java} > dataSize += row.toString().length(); > if (dataSize >= maxRowBatchSize > || rows.size() >= maxRowsPerBatch > || i == rowsToPublish.size() - 1) { > {code} > However, with [google-http-client's > PR#589|https://github.com/googleapis/google-http-java-client/pull/589/files#diff-914cd7ff18143b3d2398149e1cfb4f45R218], > the GenericData.toString output has changed since v1.29.0. > In the old google-http-client 1.28.0, an example row's toString returned: > {noformat} > {f=[{v=foo}, {v=1234}]} > {noformat} > In google-http-client 1.29.0 and higher, the same row's toString returns: > {noformat} > GenericData{classInfo=[f], {f=[GenericData{classInfo=[v], {v=foo}}, > GenericData{classInfo=[v], {v=1234}}]}} > {noformat} > h1. Question: > Is it the right thing to rely on {{toString().length()}} in the BigQuery > classes?
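The batching condition quoted in the issue makes batch boundaries depend directly on how a row's size is measured. A minimal sketch, assuming plain strings stand in for `TableRow` objects (the `RowBatcher` class and its methods are hypothetical), that reproduces the loop with a pluggable size function: feeding it the old and new `toString()` renderings of the same row yields different batch counts. The PR title points at `TableRowJsonCoder` for the actual fix; `utf8Size` here only illustrates measuring encoded bytes instead of characters.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

public class RowBatcher {

  /**
   * Splits rows into batches, closing a batch once its estimated size reaches maxBatchBytes,
   * once it holds maxRowsPerBatch rows, or at the last row (mirroring the quoted condition).
   */
  public static <T> List<List<T>> batch(
      List<T> rows, ToLongFunction<T> sizeOf, long maxBatchBytes, int maxRowsPerBatch) {
    List<List<T>> batches = new ArrayList<>();
    List<T> current = new ArrayList<>();
    long dataSize = 0;
    for (int i = 0; i < rows.size(); i++) {
      T row = rows.get(i);
      current.add(row);
      dataSize += sizeOf.applyAsLong(row);
      if (dataSize >= maxBatchBytes
          || current.size() >= maxRowsPerBatch
          || i == rows.size() - 1) {
        batches.add(current);
        current = new ArrayList<>();
        dataSize = 0;
      }
    }
    return batches;
  }

  /** Size of an already-encoded row in UTF-8 bytes, rather than in UTF-16 chars. */
  public static long utf8Size(String encodedRow) {
    return encodedRow.getBytes(StandardCharsets.UTF_8).length;
  }

  public static void main(String[] args) {
    // The two toString() renderings of the same row quoted in the issue.
    String oldForm = "{f=[{v=foo}, {v=1234}]}";
    String newForm =
        "GenericData{classInfo=[f], {f=[GenericData{classInfo=[v], {v=foo}}, "
            + "GenericData{classInfo=[v], {v=1234}}]}}";
    List<String> oldRows = List.of(oldForm, oldForm, oldForm, oldForm);
    List<String> newRows = List.of(newForm, newForm, newForm, newForm);
    // Same logical rows and the same 50-unit limit, but different batch counts.
    System.out.println(batch(oldRows, String::length, 50, 500).size()); // 2
    System.out.println(batch(newRows, String::length, 50, 500).size()); // 4
  }
}
```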