[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable
[ https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389339&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389339 ] ASF GitHub Bot logged work on BEAM-9286: Author: ASF GitHub Bot Created on: 19/Feb/20 06:41 Start Date: 19/Feb/20 06:41 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #10823: [BEAM-9286] Create validation runner test for metrics (user counter). URL: https://github.com/apache/beam/pull/10823#issuecomment-588058804 > I'm surprised we don't already have a test for such a basic feature. > > FYI, you can use yapf to format your code: https://cwiki.apache.org/confluence/display/BEAM/Python+Tips Thanks Robert. I was able to only find much a larger test that covers user counters, where counter verification is just small part of the whole thing. With a dedicated self-contained test for metrics, we may also keep adding other metrics related things here. PR updated. Also applied yapf. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389339) Time Spent: 2h 10m (was: 2h) > Create validation tests for metrics based on MonitoringInfo if applicable > - > > Key: BEAM-9286 > URL: https://issues.apache.org/jira/browse/BEAM-9286 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Ruoyun Huang >Assignee: Ruoyun Huang >Priority: Minor > Time Spent: 2h 10m > Remaining Estimate: 0h > > Create dedicated validation runner tests for metrics (those based Monitoring > Info). > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable
[ https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389338&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389338 ] ASF GitHub Bot logged work on BEAM-9286: Author: ASF GitHub Bot Created on: 19/Feb/20 06:41 Start Date: 19/Feb/20 06:41 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #10823: [BEAM-9286] Create validation runner test for metrics (user counter). URL: https://github.com/apache/beam/pull/10823#issuecomment-588058804 > I'm surprised we don't already have a test for such a basic feature. > > FYI, you can use yapf to format your code: https://cwiki.apache.org/confluence/display/BEAM/Python+Tips Thanks Robert. I was able to only find much larger test that cover user counters, where counter verification is just small part of the whole thing. With a dedicated self-contained test for metrics, we may also keep adding other metrics related things here. PR updated. Also applied yapf. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389338) Time Spent: 2h (was: 1h 50m) > Create validation tests for metrics based on MonitoringInfo if applicable > - > > Key: BEAM-9286 > URL: https://issues.apache.org/jira/browse/BEAM-9286 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Ruoyun Huang >Assignee: Ruoyun Huang >Priority: Minor > Time Spent: 2h > Remaining Estimate: 0h > > Create dedicated validation runner tests for metrics (those based Monitoring > Info). > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable
[ https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389337&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389337 ] ASF GitHub Bot logged work on BEAM-9286: Author: ASF GitHub Bot Created on: 19/Feb/20 06:32 Start Date: 19/Feb/20 06:32 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #10823: [BEAM-9286] Create validation runner test for metrics (user counter). URL: https://github.com/apache/beam/pull/10823#issuecomment-588058804 > I'm surprised we don't already have a test for such a basic feature. > > FYI, you can use yapf to format your code: https://cwiki.apache.org/confluence/display/BEAM/Python+Tips Thanks Robert. I was able to find larger test that cover user counters, but no dedicated validation runner test for metrics. PR updated. Also applied yapf. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389337) Time Spent: 1h 50m (was: 1h 40m) > Create validation tests for metrics based on MonitoringInfo if applicable > - > > Key: BEAM-9286 > URL: https://issues.apache.org/jira/browse/BEAM-9286 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Ruoyun Huang >Assignee: Ruoyun Huang >Priority: Minor > Time Spent: 1h 50m > Remaining Estimate: 0h > > Create dedicated validation runner tests for metrics (those based Monitoring > Info). > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable
[ https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389335&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389335 ] ASF GitHub Bot logged work on BEAM-9286: Author: ASF GitHub Bot Created on: 19/Feb/20 06:32 Start Date: 19/Feb/20 06:32 Worklog Time Spent: 10m Work Description: HuangLED commented on issue #10823: [BEAM-9286] Create validation runner test for metrics (user counter). URL: https://github.com/apache/beam/pull/10823#issuecomment-588058804 > I'm surprised we don't already have a test for such a basic feature. > > FYI, you can use yapf to format your code: https://cwiki.apache.org/confluence/display/BEAM/Python+Tips Thanks Robert. I was able to find larger test that cover user counters, but no dedicated validation runner test for metrics. PR updated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389335) Time Spent: 1h 40m (was: 1.5h) > Create validation tests for metrics based on MonitoringInfo if applicable > - > > Key: BEAM-9286 > URL: https://issues.apache.org/jira/browse/BEAM-9286 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Ruoyun Huang >Assignee: Ruoyun Huang >Priority: Minor > Time Spent: 1h 40m > Remaining Estimate: 0h > > Create dedicated validation runner tests for metrics (those based Monitoring > Info). > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9276) python: create a class to encapsulate the work required to submit a pipeline to a job service
[ https://issues.apache.org/jira/browse/BEAM-9276?focusedWorklogId=389334&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389334 ] ASF GitHub Bot logged work on BEAM-9276: Author: ASF GitHub Bot Created on: 19/Feb/20 06:16 Start Date: 19/Feb/20 06:16 Worklog Time Spent: 10m Work Description: chadrik commented on issue #10811: [BEAM-9276] Create a class to encapsulate the steps required to submit a pipeline URL: https://github.com/apache/beam/pull/10811#issuecomment-588054888 Any thoughts on this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389334) Time Spent: 40m (was: 0.5h) > python: create a class to encapsulate the work required to submit a pipeline > to a job service > - > > Key: BEAM-9276 > URL: https://issues.apache.org/jira/browse/BEAM-9276 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > {{PortableRunner.run_pipeline}} is somewhat of a monolithic method for > submitting a pipeline. It would be useful to factor out the code responsible > for interacting with the job and artifact services (prepare, stage, run) to > make this easier to modify this behavior in portable runner subclasses, as > well as in tests. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=389333&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389333 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 19/Feb/20 06:15 Start Date: 19/Feb/20 06:15 Worklog Time Spent: 10m Work Description: chadrik commented on issue #10822: [BEAM-7746] Minor typing updates / fixes URL: https://github.com/apache/beam/pull/10822#issuecomment-584807444 R: @udim This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389333) Time Spent: 65h 10m (was: 65h) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 65h 10m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable
[ https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389330&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389330 ] ASF GitHub Bot logged work on BEAM-9286: Author: ASF GitHub Bot Created on: 19/Feb/20 06:14 Start Date: 19/Feb/20 06:14 Worklog Time Spent: 10m Work Description: HuangLED commented on pull request #10823: [BEAM-9286] Create validation runner test for metrics (user counter). URL: https://github.com/apache/beam/pull/10823#discussion_r381094747 ## File path: sdks/python/apache_beam/metrics/metric_test.py ## @@ -127,6 +134,35 @@ def test_distribution_empty_namespace(self): with self.assertRaises(ValueError): Metrics.distribution("", "names") + @attr('ValidatesRunner') + def test_user_counter_using_pardo(self): +class SomeDoFn(beam.DoFn): + """A custom dummy DoFn using yield.""" + def __init__(self): +self.user_counter_elements = metrics.Metrics.counter( + self.__class__, 'metrics_user_counter_element') + + def process(self, element): +self.user_counter_elements.inc() +yield element + +pipeline = TestPipeline() +nums = pipeline | 'Input' >> beam.Create([1, 2, 3, 4]) +results = nums | 'ApplyPardo' >> beam.ParDo(SomeDoFn()) + +res = pipeline.run() +res.wait_until_finish() +metric_results = ( + res.metrics().query(MetricsFilter() +.with_name('metrics_user_counter_element'))) +outputs_counter = metric_results['counters'][0] +assert_that(results, equal_to([1, 2, 3, 4])) Review comment: doen. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389330) Time Spent: 1h 20m (was: 1h 10m) > Create validation tests for metrics based on MonitoringInfo if applicable > - > > Key: BEAM-9286 > URL: https://issues.apache.org/jira/browse/BEAM-9286 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Ruoyun Huang >Assignee: Ruoyun Huang >Priority: Minor > Time Spent: 1h 20m > Remaining Estimate: 0h > > Create dedicated validation runner tests for metrics (those based Monitoring > Info). > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable
[ https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389331&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389331 ] ASF GitHub Bot logged work on BEAM-9286: Author: ASF GitHub Bot Created on: 19/Feb/20 06:14 Start Date: 19/Feb/20 06:14 Worklog Time Spent: 10m Work Description: HuangLED commented on pull request #10823: [BEAM-9286] Create validation runner test for metrics (user counter). URL: https://github.com/apache/beam/pull/10823#discussion_r381094747 ## File path: sdks/python/apache_beam/metrics/metric_test.py ## @@ -127,6 +134,35 @@ def test_distribution_empty_namespace(self): with self.assertRaises(ValueError): Metrics.distribution("", "names") + @attr('ValidatesRunner') + def test_user_counter_using_pardo(self): +class SomeDoFn(beam.DoFn): + """A custom dummy DoFn using yield.""" + def __init__(self): +self.user_counter_elements = metrics.Metrics.counter( + self.__class__, 'metrics_user_counter_element') + + def process(self, element): +self.user_counter_elements.inc() +yield element + +pipeline = TestPipeline() +nums = pipeline | 'Input' >> beam.Create([1, 2, 3, 4]) +results = nums | 'ApplyPardo' >> beam.ParDo(SomeDoFn()) + +res = pipeline.run() +res.wait_until_finish() +metric_results = ( + res.metrics().query(MetricsFilter() +.with_name('metrics_user_counter_element'))) +outputs_counter = metric_results['counters'][0] +assert_that(results, equal_to([1, 2, 3, 4])) Review comment: done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389331) Time Spent: 1.5h (was: 1h 20m) > Create validation tests for metrics based on MonitoringInfo if applicable > - > > Key: BEAM-9286 > URL: https://issues.apache.org/jira/browse/BEAM-9286 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Ruoyun Huang >Assignee: Ruoyun Huang >Priority: Minor > Time Spent: 1.5h > Remaining Estimate: 0h > > Create dedicated validation runner tests for metrics (those based Monitoring > Info). > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable
[ https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389329&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389329 ] ASF GitHub Bot logged work on BEAM-9286: Author: ASF GitHub Bot Created on: 19/Feb/20 06:14 Start Date: 19/Feb/20 06:14 Worklog Time Spent: 10m Work Description: HuangLED commented on pull request #10823: [BEAM-9286] Create validation runner test for metrics (user counter). URL: https://github.com/apache/beam/pull/10823#discussion_r381094727 ## File path: sdks/python/apache_beam/metrics/metric_test.py ## @@ -127,6 +134,35 @@ def test_distribution_empty_namespace(self): with self.assertRaises(ValueError): Metrics.distribution("", "names") + @attr('ValidatesRunner') + def test_user_counter_using_pardo(self): +class SomeDoFn(beam.DoFn): + """A custom dummy DoFn using yield.""" + def __init__(self): +self.user_counter_elements = metrics.Metrics.counter( + self.__class__, 'metrics_user_counter_element') + + def process(self, element): +self.user_counter_elements.inc() +yield element + +pipeline = TestPipeline() +nums = pipeline | 'Input' >> beam.Create([1, 2, 3, 4]) +results = nums | 'ApplyPardo' >> beam.ParDo(SomeDoFn()) + +res = pipeline.run() +res.wait_until_finish() +metric_results = ( + res.metrics().query(MetricsFilter() +.with_name('metrics_user_counter_element'))) +outputs_counter = metric_results['counters'][0] +assert_that(results, equal_to([1, 2, 3, 4])) + +self.assertEqual(outputs_counter.key.metric.name, +'metrics_user_counter_element') +self.assertEqual(outputs_counter.committed, 4) +self.assertEqual(outputs_counter.attempted, 4) Review comment: removed the assert on attempted. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389329) Time Spent: 1h 10m (was: 1h) > Create validation tests for metrics based on MonitoringInfo if applicable > - > > Key: BEAM-9286 > URL: https://issues.apache.org/jira/browse/BEAM-9286 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Ruoyun Huang >Assignee: Ruoyun Huang >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > Create dedicated validation runner tests for metrics (those based Monitoring > Info). > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=389332&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389332 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 19/Feb/20 06:14 Start Date: 19/Feb/20 06:14 Worklog Time Spent: 10m Work Description: chadrik commented on issue #10822: [BEAM-7746] Minor typing updates / fixes URL: https://github.com/apache/beam/pull/10822#issuecomment-588054472 R: @robertwb This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389332) Time Spent: 65h (was: 64h 50m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 65h > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9274) Support running yapf in a git pre-commit hook
[ https://issues.apache.org/jira/browse/BEAM-9274?focusedWorklogId=389328&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389328 ] ASF GitHub Bot logged work on BEAM-9274: Author: ASF GitHub Bot Created on: 19/Feb/20 06:12 Start Date: 19/Feb/20 06:12 Worklog Time Spent: 10m Work Description: chadrik commented on issue #10810: [BEAM-9274] Support running yapf in a git pre-commit hook URL: https://github.com/apache/beam/pull/10810#issuecomment-588053806 How are people feeling about this MR? There's one outstanding review note for a comment in tox.ini. After that are we ok with merging this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389328) Time Spent: 1h 10m (was: 1h) > Support running yapf in a git pre-commit hook > - > > Key: BEAM-9274 > URL: https://issues.apache.org/jira/browse/BEAM-9274 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > As a developer I want to be able to automatically run yapf before I make a > commit so that I don't waste time with failures on jenkins. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8979) protoc-gen-mypy: program not found or is not executable
[ https://issues.apache.org/jira/browse/BEAM-8979?focusedWorklogId=389327&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389327 ] ASF GitHub Bot logged work on BEAM-8979: Author: ASF GitHub Bot Created on: 19/Feb/20 06:11 Start Date: 19/Feb/20 06:11 Worklog Time Spent: 10m Work Description: chadrik commented on issue #10734: [BEAM-8979] reintroduce mypy-protobuf stub generation URL: https://github.com/apache/beam/pull/10734#issuecomment-588053537 The Python test has been stuck for days. Btw, I've submitted my CLAs to the secretary, so hopefully I'll be able to solve these problems on my own soon. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389327) Time Spent: 10h 20m (was: 10h 10m) > protoc-gen-mypy: program not found or is not executable > --- > > Key: BEAM-8979 > URL: https://issues.apache.org/jira/browse/BEAM-8979 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, test-failures >Reporter: Kamil Wasilewski >Assignee: Chad Dombrova >Priority: Major > Time Spent: 10h 20m > Remaining Estimate: 0h > > In some tests, `:sdks:python:sdist:` task fails due to problems in finding > protoc-gen-mypy. The following tests are affected (there might be more): > * > [https://builds.apache.org/job/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/] > * > [https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/ > > |https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/] > Relevant logs: > {code:java} > 10:46:32 > Task :sdks:python:sdist FAILED > 10:46:32 Requirement already satisfied: mypy-protobuf==1.12 in > /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages > (1.12) > 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto > but not used. > 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto > but not used. > 10:46:32 protoc-gen-mypy: program not found or is not executable > 10:46:32 --mypy_out: protoc-gen-mypy: Plugin failed with status code 1. > 10:46:32 > /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/dist.py:476: > UserWarning: Normalizing '2.19.0.dev' to '2.19.0.dev0' > 10:46:32 normalized_version, > 10:46:32 Traceback (most recent call last): > 10:46:32 File "setup.py", line 295, in > 10:46:32 'mypy': generate_protos_first(mypy), > 10:46:32 File > "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/__init__.py", > line 145, in setup > 10:46:32 return distutils.core.setup(**attrs) > 10:46:32 File "/usr/lib/python3.7/distutils/core.py", line 148, in setup > 10:46:32 dist.run_commands() > 10:46:32 File "/usr/lib/python3.7/distutils/dist.py", line 966, in > run_commands > 10:46:32 self.run_command(cmd) > 10:46:32 File "/usr/lib/python3.7/distutils/dist.py", line 985, in > run_command > 10:46:32 cmd_obj.run() > 10:46:32 File > "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/command/sdist.py", > line 44, in run > 10:46:32 self.run_command('egg_info') > 10:46:32 File "/usr/lib/python3.7/distutils/cmd.py", line 313, in > run_command > 10:46:32 self.distribution.run_command(command) > 10:46:32 File "/usr/lib/python3.7/distutils/dist.py", line 985, in > run_command > 10:46:32 cmd_obj.run() > 10:46:32 File "setup.py", line 220, in run > 10:46:32 gen_protos.generate_proto_files(log=log) > 10:46:32 File > "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/sdks/python/gen_protos.py", > line 144, in generate_proto_files > 10:46:32 '%s' % ret_code) > 10:46:32 RuntimeError: Protoc returned non-zero status (see logs for > details): 1 > {code} > > This is what I have tried so far to resolve this (without being successful): > * Including _--plugin=protoc-gen-mypy=\{abs_path_to_executable}_ parameter > to the _protoc_ call ingen_protos.py:131 > * Appending protoc-gen-mypy's directory to the PATH variable > I wasn't able to reproduce th
[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage
[ https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=389318&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389318 ] ASF GitHub Bot logged work on BEAM-8889: Author: ASF GitHub Bot Created on: 19/Feb/20 05:42 Start Date: 19/Feb/20 05:42 Worklog Time Spent: 10m Work Description: veblush commented on issue #10769: [BEAM-8889] Upgrades gcsio to 2.0.0 URL: https://github.com/apache/beam/pull/10769#issuecomment-588046268 It seems that `Java11`, `JavaPortabilityApiJava11`, and `Java_Examples_Dataflow_Java11` are stuck? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389318) Remaining Estimate: 148h 50m (was: 149h) Time Spent: 19h 10m (was: 19h) > Make GcsUtil use GoogleCloudStorage > --- > > Key: BEAM-8889 > URL: https://issues.apache.org/jira/browse/BEAM-8889 > Project: Beam > Issue Type: Improvement > Components: io-java-gcp >Affects Versions: 2.16.0 >Reporter: Esun Kim >Assignee: VASU NORI >Priority: Major > Labels: gcs > Original Estimate: 168h > Time Spent: 19h 10m > Remaining Estimate: 148h 50m > > [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java] > is a primary class to access Google Cloud Storage on Apache Beam. Current > implementation directly creates GoogleCloudStorageReadChannel and > GoogleCloudStorageWriteChannel by itself to read and write GCS data rather > than using > [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java] > which is an abstract class providing basic IO capability which eventually > creates channel objects. This request is about updating GcsUtil to use > GoogleCloudStorage to create read and write channel, which is expected > flexible because it can easily pick up the new change; e.g. new channel > implementation using new protocol without code change. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9240) Check for Nullability in typesEqual() method of FieldType class
[ https://issues.apache.org/jira/browse/BEAM-9240?focusedWorklogId=389310&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389310 ] ASF GitHub Bot logged work on BEAM-9240: Author: ASF GitHub Bot Created on: 19/Feb/20 05:08 Start Date: 19/Feb/20 05:08 Worklog Time Spent: 10m Work Description: rahul8383 commented on issue #10744: [BEAM-9240]: Check for Nullability in typesEqual() method of FieldTyp… URL: https://github.com/apache/beam/pull/10744#issuecomment-588038492 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389310) Time Spent: 1.5h (was: 1h 20m) > Check for Nullability in typesEqual() method of FieldType class > --- > > Key: BEAM-9240 > URL: https://issues.apache.org/jira/browse/BEAM-9240 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Affects Versions: 2.18.0 >Reporter: Rahul Patwari >Assignee: Rahul Patwari >Priority: Major > Fix For: 2.20.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > {{If two schemas are created like this:}} > {{Schema schema1 = Schema.builder().addStringField("col1").build();}} > {{Schema schema2 = Schema.builder().addNullableField("col1", > FieldType.STRING).build();}} > > {{schema1.typeEquals(schema2) returns "true" even though the schemas differ > by Nullability}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8965) WriteToBigQuery failed in BundleBasedDirectRunner
[ https://issues.apache.org/jira/browse/BEAM-8965?focusedWorklogId=389308&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389308 ] ASF GitHub Bot logged work on BEAM-8965: Author: ASF GitHub Bot Created on: 19/Feb/20 04:56 Start Date: 19/Feb/20 04:56 Worklog Time Spent: 10m Work Description: bobingm commented on issue #10901: [BEAM-8965] Remove duplicate sideinputs in ConsumerTrackingPipelineVisitor URL: https://github.com/apache/beam/pull/10901#issuecomment-588035568 R: @pabloem @charlesccychen This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389308) Time Spent: 20m (was: 10m) > WriteToBigQuery failed in BundleBasedDirectRunner > - > > Key: BEAM-8965 > URL: https://issues.apache.org/jira/browse/BEAM-8965 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Affects Versions: 2.16.0, 2.17.0, 2.18.0, 2.19.0 >Reporter: Wenbing Bai >Assignee: Wenbing Bai >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > *{{WriteToBigQuery}}* fails in *{{BundleBasedDirectRunner}}* with error > {{PCollection of size 2 with more than one element accessed as a singleton > view.}} > Here is the code > > {code:python} > with Pipeline() as p: > query_results = ( > p > | beam.io.Read(beam.io.BigQuerySource( > query='SELECT ... FROM ...') > ) > query_results | beam.io.gcp.WriteToBigQuery( > table=, > method=WriteToBigQuery.Method.FILE_LOADS, > schema={"fields": []} > ) > {code} > > Here is the error > > {code:none} > File "apache_beam/runners/common.py", line 778, in > apache_beam.runners.common.DoFnRunner.process > def process(self, windowed_value): > File "apache_beam/runners/common.py", line 782, in > apache_beam.runners.common.DoFnRunner.process > self._reraise_augmented(exn) > File "apache_beam/runners/common.py", line 849, in > apache_beam.runners.common.DoFnRunner._reraise_augmented > raise_with_traceback(new_exn) > File "apache_beam/runners/common.py", line 780, in > apache_beam.runners.common.DoFnRunner.process > return self.do_fn_invoker.invoke_process(windowed_value) > File "apache_beam/runners/common.py", line 587, in > apache_beam.runners.common.PerWindowInvoker.invoke_process > self._invoke_process_per_window( > File "apache_beam/runners/common.py", line 610, in > apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window > [si[global_window] for si in self.side_inputs])) > File > "/home/wbai/terra/terra_py2/local/lib/python2.7/site-packages/apache_beam/transforms/sideinputs.py", > line 65, in __getitem__ > _FilteringIterable(self._iterable, target_window), self._view_options) > File > "/home/wbai/terra/terra_py2/local/lib/python2.7/site-packages/apache_beam/pvalue.py", > line 443, in _from_runtime_iterable > len(head), str(head[0]), str(head[1]))) > ValueError: PCollection of size 2 with more than one element accessed as a > singleton view. First two elements encountered are > "gs://temp-dev/temp/bq_load/3edbf2172dd540edb5c8e9597206b10f", > "gs://temp-dev/temp/bq_load/3edbf2172dd540edb5c8e9597206b10f". [while running > 'WriteToBigQuery/BigQueryBatchFileLoads/ParDo(WriteRecordsToFile)/ParDo(WriteRecordsToFile)'] > {code} > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8965) WriteToBigQuery failed in BundleBasedDirectRunner
[ https://issues.apache.org/jira/browse/BEAM-8965?focusedWorklogId=389307&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389307 ] ASF GitHub Bot logged work on BEAM-8965: Author: ASF GitHub Bot Created on: 19/Feb/20 04:54 Start Date: 19/Feb/20 04:54 Worklog Time Spent: 10m Work Description: bobingm commented on pull request #10901: [BEAM-8965] Remove duplicate sideinputs in ConsumerTrackingPipelineVisitor URL: https://github.com/apache/beam/pull/10901 ## Summary This PR is mainly preventing `BundleBasedDirectRunner` evaluates single side_input more than once. To achieve this, this PR changes the logic to get side_input views in `ConsumerTrackingPipelineVisitor`, and modify the unit tests to make sure that when one single side_input is used in two different PTransforms, it will be evaluated once. ## Check List Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badg
[jira] [Work logged] (BEAM-7304) Twister2 Beam runner
[ https://issues.apache.org/jira/browse/BEAM-7304?focusedWorklogId=389301&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389301 ] ASF GitHub Bot logged work on BEAM-7304: Author: ASF GitHub Bot Created on: 19/Feb/20 04:29 Start Date: 19/Feb/20 04:29 Worklog Time Spent: 10m Work Description: pulasthi commented on issue #10888: [BEAM-7304] Twister2 Beam runner URL: https://github.com/apache/beam/pull/10888#issuecomment-588029642 @iemejia the failure details are showing errors from a .gcp package which are not related to this pull, I did not notice any other failures. Is this due to some other commit or am I missing something? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389301) Time Spent: 3h 20m (was: 3h 10m) > Twister2 Beam runner > > > Key: BEAM-7304 > URL: https://issues.apache.org/jira/browse/BEAM-7304 > Project: Beam > Issue Type: New Feature > Components: runner-ideas >Reporter: Pulasthi Wickramasinghe >Assignee: Pulasthi Wickramasinghe >Priority: Minor > Time Spent: 3h 20m > Remaining Estimate: 0h > > Twister2 is a big data framework which supports both batch and stream > processing [1] [2]. The goal is to develop an beam runner for Twister2. > [1] [https://github.com/DSC-SPIDAL/twister2] > [2] [https://twister2.org/] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8965) WriteToBigQuery failed in BundleBasedDirectRunner
[ https://issues.apache.org/jira/browse/BEAM-8965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbing Bai updated BEAM-8965: -- Affects Version/s: 2.17.0 2.18.0 2.19.0 > WriteToBigQuery failed in BundleBasedDirectRunner > - > > Key: BEAM-8965 > URL: https://issues.apache.org/jira/browse/BEAM-8965 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Affects Versions: 2.16.0, 2.17.0, 2.18.0, 2.19.0 >Reporter: Wenbing Bai >Assignee: Wenbing Bai >Priority: Major > > *{{WriteToBigQuery}}* fails in *{{BundleBasedDirectRunner}}* with error > {{PCollection of size 2 with more than one element accessed as a singleton > view.}} > Here is the code > > {code:python} > with Pipeline() as p: > query_results = ( > p > | beam.io.Read(beam.io.BigQuerySource( > query='SELECT ... FROM ...') > ) > query_results | beam.io.gcp.WriteToBigQuery( > table=, > method=WriteToBigQuery.Method.FILE_LOADS, > schema={"fields": []} > ) > {code} > > Here is the error > > {code:none} > File "apache_beam/runners/common.py", line 778, in > apache_beam.runners.common.DoFnRunner.process > def process(self, windowed_value): > File "apache_beam/runners/common.py", line 782, in > apache_beam.runners.common.DoFnRunner.process > self._reraise_augmented(exn) > File "apache_beam/runners/common.py", line 849, in > apache_beam.runners.common.DoFnRunner._reraise_augmented > raise_with_traceback(new_exn) > File "apache_beam/runners/common.py", line 780, in > apache_beam.runners.common.DoFnRunner.process > return self.do_fn_invoker.invoke_process(windowed_value) > File "apache_beam/runners/common.py", line 587, in > apache_beam.runners.common.PerWindowInvoker.invoke_process > self._invoke_process_per_window( > File "apache_beam/runners/common.py", line 610, in > apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window > [si[global_window] for si in self.side_inputs])) > File > "/home/wbai/terra/terra_py2/local/lib/python2.7/site-packages/apache_beam/transforms/sideinputs.py", > line 65, in __getitem__ > _FilteringIterable(self._iterable, target_window), self._view_options) > File > "/home/wbai/terra/terra_py2/local/lib/python2.7/site-packages/apache_beam/pvalue.py", > line 443, in _from_runtime_iterable > len(head), str(head[0]), str(head[1]))) > ValueError: PCollection of size 2 with more than one element accessed as a > singleton view. First two elements encountered are > "gs://temp-dev/temp/bq_load/3edbf2172dd540edb5c8e9597206b10f", > "gs://temp-dev/temp/bq_load/3edbf2172dd540edb5c8e9597206b10f". [while running > 'WriteToBigQuery/BigQueryBatchFileLoads/ParDo(WriteRecordsToFile)/ParDo(WriteRecordsToFile)'] > {code} > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9335) update hard-coded coder id when translating Java external transforms
[ https://issues.apache.org/jira/browse/BEAM-9335?focusedWorklogId=389271&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389271 ] ASF GitHub Bot logged work on BEAM-9335: Author: ASF GitHub Bot Created on: 19/Feb/20 02:08 Start Date: 19/Feb/20 02:08 Worklog Time Spent: 10m Work Description: ihji commented on pull request #10900: [BEAM-9335] update hard-coded coder id when translating Java external transforms URL: https://github.com/apache/beam/pull/10900 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apac
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389270&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389270 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 19/Feb/20 02:08 Start Date: 19/Feb/20 02:08 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#issuecomment-587998300 Could someone working on the interactive Beam review first? @rohdesamuel ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389270) Time Spent: 60h (was: 59h 50m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 60h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9335) update hard-coded coder id when translating Java external transforms
Heejong Lee created BEAM-9335: - Summary: update hard-coded coder id when translating Java external transforms Key: BEAM-9335 URL: https://issues.apache.org/jira/browse/BEAM-9335 Project: Beam Issue Type: Bug Components: java-fn-execution Reporter: Heejong Lee Assignee: Heejong Lee hard-coded coder id needs to be updated when translating Java external transforms. Otherwise pipeline will fail if coder id is reused. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9335) update hard-coded coder id when translating Java external transforms
[ https://issues.apache.org/jira/browse/BEAM-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heejong Lee updated BEAM-9335: -- Status: Open (was: Triage Needed) > update hard-coded coder id when translating Java external transforms > > > Key: BEAM-9335 > URL: https://issues.apache.org/jira/browse/BEAM-9335 > Project: Beam > Issue Type: Bug > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > > hard-coded coder id needs to be updated when translating Java external > transforms. Otherwise pipeline will fail if coder id is reused. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-5605) Support Portable SplittableDoFn for batch
[ https://issues.apache.org/jira/browse/BEAM-5605?focusedWorklogId=389242&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389242 ] ASF GitHub Bot logged work on BEAM-5605: Author: ASF GitHub Bot Created on: 19/Feb/20 01:02 Start Date: 19/Feb/20 01:02 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10893: [BEAM-5605] Honor the bounded source timestamps timestamp in the SDF wrapper. URL: https://github.com/apache/beam/pull/10893#issuecomment-587981986 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389242) Time Spent: 16h 10m (was: 16h) > Support Portable SplittableDoFn for batch > - > > Key: BEAM-5605 > URL: https://issues.apache.org/jira/browse/BEAM-5605 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Scott Wegner >Assignee: Luke Cwik >Priority: Major > Labels: portability > Time Spent: 16h 10m > Remaining Estimate: 0h > > Roll-up item tracking work towards supporting portable SplittableDoFn for > batch -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-9334) beam_PreCommit_Java_Cron failed on org.apache.beam.runners.samza.runtime.SamzaTimerInternalsFactoryTest.testProcessingTimeTimers
[ https://issues.apache.org/jira/browse/BEAM-9334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yichi Zhang closed BEAM-9334. - Fix Version/s: Not applicable Resolution: Duplicate > beam_PreCommit_Java_Cron failed on > org.apache.beam.runners.samza.runtime.SamzaTimerInternalsFactoryTest.testProcessingTimeTimers > > > Key: BEAM-9334 > URL: https://issues.apache.org/jira/browse/BEAM-9334 > Project: Beam > Issue Type: Task > Components: runner-samza >Reporter: Yichi Zhang >Priority: Major > Fix For: Not applicable > > > h3. Error Message > org.apache.samza.SamzaException: Error opening RocksDB store beamStore at > location /tmp/store3 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9334) beam_PreCommit_Java_Cron failed on org.apache.beam.runners.samza.runtime.SamzaTimerInternalsFactoryTest.testProcessingTimeTimers
Yichi Zhang created BEAM-9334: - Summary: beam_PreCommit_Java_Cron failed on org.apache.beam.runners.samza.runtime.SamzaTimerInternalsFactoryTest.testProcessingTimeTimers Key: BEAM-9334 URL: https://issues.apache.org/jira/browse/BEAM-9334 Project: Beam Issue Type: Task Components: runner-samza Reporter: Yichi Zhang h3. Error Message org.apache.samza.SamzaException: Error opening RocksDB store beamStore at location /tmp/store3 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389239&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389239 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 19/Feb/20 00:56 Start Date: 19/Feb/20 00:56 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899#issuecomment-587980476 Major changes: 1. Self-contained `BackgroundCachingJob` abstraction: [background_caching_job](https://github.com/apache/beam/pull/10899/commits/9396da777ae7670f2ed339a9f0dc4917a39629f5#diff-20e39cd33002649cd6f5eea77c73b188) 2. Control APIs around `BackgroundCachingJob` exposed: [interactive_beam](https://github.com/apache/beam/pull/10899/commits/9396da777ae7670f2ed339a9f0dc4917a39629f5#diff-34a47fb3ecb0865f176f1c893b4cee89) R: @aaltay PTAL, thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389239) Time Spent: 59h 50m (was: 59h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 59h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable
[ https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389237&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389237 ] ASF GitHub Bot logged work on BEAM-9286: Author: ASF GitHub Bot Created on: 19/Feb/20 00:53 Start Date: 19/Feb/20 00:53 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10823: [BEAM-9286] Create validation runner test for metrics (user counter). URL: https://github.com/apache/beam/pull/10823#discussion_r381020497 ## File path: sdks/python/apache_beam/metrics/metric_test.py ## @@ -127,6 +134,35 @@ def test_distribution_empty_namespace(self): with self.assertRaises(ValueError): Metrics.distribution("", "names") + @attr('ValidatesRunner') + def test_user_counter_using_pardo(self): +class SomeDoFn(beam.DoFn): + """A custom dummy DoFn using yield.""" + def __init__(self): +self.user_counter_elements = metrics.Metrics.counter( + self.__class__, 'metrics_user_counter_element') + + def process(self, element): +self.user_counter_elements.inc() +yield element + +pipeline = TestPipeline() +nums = pipeline | 'Input' >> beam.Create([1, 2, 3, 4]) +results = nums | 'ApplyPardo' >> beam.ParDo(SomeDoFn()) + +res = pipeline.run() +res.wait_until_finish() +metric_results = ( + res.metrics().query(MetricsFilter() +.with_name('metrics_user_counter_element'))) +outputs_counter = metric_results['counters'][0] +assert_that(results, equal_to([1, 2, 3, 4])) + +self.assertEqual(outputs_counter.key.metric.name, +'metrics_user_counter_element') +self.assertEqual(outputs_counter.committed, 4) +self.assertEqual(outputs_counter.attempted, 4) Review comment: >= 4, no guarantee it's exactly 4. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389237) Time Spent: 50m (was: 40m) > Create validation tests for metrics based on MonitoringInfo if applicable > - > > Key: BEAM-9286 > URL: https://issues.apache.org/jira/browse/BEAM-9286 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Ruoyun Huang >Assignee: Ruoyun Huang >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > Create dedicated validation runner tests for metrics (those based Monitoring > Info). > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable
[ https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389238&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389238 ] ASF GitHub Bot logged work on BEAM-9286: Author: ASF GitHub Bot Created on: 19/Feb/20 00:53 Start Date: 19/Feb/20 00:53 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10823: [BEAM-9286] Create validation runner test for metrics (user counter). URL: https://github.com/apache/beam/pull/10823#discussion_r381020881 ## File path: sdks/python/apache_beam/metrics/metric_test.py ## @@ -127,6 +134,35 @@ def test_distribution_empty_namespace(self): with self.assertRaises(ValueError): Metrics.distribution("", "names") + @attr('ValidatesRunner') + def test_user_counter_using_pardo(self): +class SomeDoFn(beam.DoFn): + """A custom dummy DoFn using yield.""" + def __init__(self): +self.user_counter_elements = metrics.Metrics.counter( + self.__class__, 'metrics_user_counter_element') + + def process(self, element): +self.user_counter_elements.inc() +yield element + +pipeline = TestPipeline() +nums = pipeline | 'Input' >> beam.Create([1, 2, 3, 4]) +results = nums | 'ApplyPardo' >> beam.ParDo(SomeDoFn()) + +res = pipeline.run() +res.wait_until_finish() +metric_results = ( + res.metrics().query(MetricsFilter() +.with_name('metrics_user_counter_element'))) +outputs_counter = metric_results['counters'][0] +assert_that(results, equal_to([1, 2, 3, 4])) Review comment: This must be applied before the pipeline is run. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389238) Time Spent: 1h (was: 50m) > Create validation tests for metrics based on MonitoringInfo if applicable > - > > Key: BEAM-9286 > URL: https://issues.apache.org/jira/browse/BEAM-9286 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Ruoyun Huang >Assignee: Ruoyun Huang >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > Create dedicated validation runner tests for metrics (those based Monitoring > Info). > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
[ https://issues.apache.org/jira/browse/BEAM-9228?focusedWorklogId=389236&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389236 ] ASF GitHub Bot logged work on BEAM-9228: Author: ASF GitHub Bot Created on: 19/Feb/20 00:49 Start Date: 19/Feb/20 00:49 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10847: [BEAM-9228] Support further partition for FnApi ListBuffer URL: https://github.com/apache/beam/pull/10847#discussion_r381019006 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py ## @@ -932,10 +998,14 @@ def get_buffer(buffer_id): return pcoll_buffers[buffer_id] def get_input_coder_impl(transform_id): - return context.coders[ - safe_coders[beam_fn_api_pb2.RemoteGrpcPort.FromString( - process_bundle_descriptor.transforms[transform_id].spec.payload). - coder_id]].get_impl() + coder_id = beam_fn_api_pb2.RemoteGrpcPort.FromString( + process_bundle_descriptor.transforms[transform_id].spec.payload).\ +coder_id + assert coder_id is not None and coder_id != '' Review comment: OK, in the SDF case we can get the coder of the preceding transform (the one that produces its input `ref_PCollection_PCollection_3_split`) which should always be a RemoteGrpcPort. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389236) Time Spent: 2h 10m (was: 2h) > _SDFBoundedSourceWrapper doesn't distribute data to multiple workers > > > Key: BEAM-9228 > URL: https://issues.apache.org/jira/browse/BEAM-9228 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.16.0, 2.18.0, 2.19.0 >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.20.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > A user reported following issue. > - > I have a set of tfrecord files, obtained by converting parquet files with > Spark. Each file is roughly 1GB and I have 11 of those. > I would expect simple statistics gathering (ie counting number of items of > all files) to scale linearly with respect to the number of cores on my system. > I am able to reproduce the issue with the minimal snippet below > {code:java} > import apache_beam as beam > from apache_beam.options.pipeline_options import PipelineOptions > from apache_beam.runners.portability import fn_api_runner > from apache_beam.portability.api import beam_runner_api_pb2 > from apache_beam.portability import python_urns > import sys > pipeline_options = PipelineOptions(['--direct_num_workers', '4']) > file_pattern = 'part-r-00* > runner=fn_api_runner.FnApiRunner( > default_environment=beam_runner_api_pb2.Environment( > urn=python_urns.SUBPROCESS_SDK, > payload=b'%s -m apache_beam.runners.worker.sdk_worker_main' > % sys.executable.encode('ascii'))) > p = beam.Pipeline(runner=runner, options=pipeline_options) > lines = (p | 'read' >> beam.io.tfrecordio.ReadFromTFRecord(file_pattern) > | beam.combiners.Count.Globally() > | beam.io.WriteToText('/tmp/output')) > p.run() > {code} > Only one combination of apache_beam revision / worker type seems to work (I > refer to https://beam.apache.org/documentation/runners/direct/ for the worker > types) > * beam 2.16; neither multithread nor multiprocess achieve high cpu usage on > multiple cores > * beam 2.17: able to achieve high cpu usage on all 4 cores > * beam 2.18: not tested the mulithreaded mode but the multiprocess mode fails > when trying to serialize the Environment instance most likely because of a > change from 2.17 to 2.18. > I also tried briefly SparkRunner with version 2.16 but was no able to achieve > any throughput. > What is the recommnended way to achieve what I am trying to ? How can I > troubleshoot ? > -- > This is caused by [this > PR|https://github.com/apache/beam/commit/02f8ad4eee3ec0ea8cbdc0f99c1dad29f00a9f60]. > A [workaround|https://github.com/apache/beam/pull/10729] is tried, which is > rolling back iobase.py not to use _SDFBoundedSourceWrapper. This confirmed > that d
[jira] [Work logged] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
[ https://issues.apache.org/jira/browse/BEAM-9228?focusedWorklogId=389233&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389233 ] ASF GitHub Bot logged work on BEAM-9228: Author: ASF GitHub Bot Created on: 19/Feb/20 00:49 Start Date: 19/Feb/20 00:49 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10847: [BEAM-9228] Support further partition for FnApi ListBuffer URL: https://github.com/apache/beam/pull/10847#discussion_r381016780 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py ## @@ -287,43 +287,48 @@ def partition(self, n): class _ListBuffer(): """Used to support parititioning of a list.""" - def __init__(self, input_coder): -self._input_coder = input_coder + def __init__(self, coder_impl): +self._coder_impl = coder_impl self._inputs = [] self._grouped_output = None self.cleared = False def append(self, element): +if self.cleared: + raise RuntimeError('Trying to append to a cleared ListBuffer.') if self._grouped_output: raise RuntimeError('ListBuffer append after read.') self._inputs.append(element) def partition(self, n): # type: (int) -> List[List[bytes]] +if self.cleared: + raise RuntimeError('Trying to partition a cleared ListBuffer.') if len(self._inputs) >= n or len(self._inputs) == 0: return [self._inputs[k::n] for k in range(n)] else: if not self._grouped_output: -self._grouped_output = [[] for _ in range(n)] -coder_impl = self._input_coder.get_impl() -decoded_input = [] -output_stream_list = [] -for _ in range(n): - output_stream_list.append(create_OutputStream()) +output_stream_list = [create_OutputStream() for _ in range(n)] +self._grouped_output = [output_stream.get() for output_stream Review comment: I meant you could put this instead of the loop at line 324. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389233) Time Spent: 2h (was: 1h 50m) > _SDFBoundedSourceWrapper doesn't distribute data to multiple workers > > > Key: BEAM-9228 > URL: https://issues.apache.org/jira/browse/BEAM-9228 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.16.0, 2.18.0, 2.19.0 >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.20.0 > > Time Spent: 2h > Remaining Estimate: 0h > > A user reported following issue. > - > I have a set of tfrecord files, obtained by converting parquet files with > Spark. Each file is roughly 1GB and I have 11 of those. > I would expect simple statistics gathering (ie counting number of items of > all files) to scale linearly with respect to the number of cores on my system. > I am able to reproduce the issue with the minimal snippet below > {code:java} > import apache_beam as beam > from apache_beam.options.pipeline_options import PipelineOptions > from apache_beam.runners.portability import fn_api_runner > from apache_beam.portability.api import beam_runner_api_pb2 > from apache_beam.portability import python_urns > import sys > pipeline_options = PipelineOptions(['--direct_num_workers', '4']) > file_pattern = 'part-r-00* > runner=fn_api_runner.FnApiRunner( > default_environment=beam_runner_api_pb2.Environment( > urn=python_urns.SUBPROCESS_SDK, > payload=b'%s -m apache_beam.runners.worker.sdk_worker_main' > % sys.executable.encode('ascii'))) > p = beam.Pipeline(runner=runner, options=pipeline_options) > lines = (p | 'read' >> beam.io.tfrecordio.ReadFromTFRecord(file_pattern) > | beam.combiners.Count.Globally() > | beam.io.WriteToText('/tmp/output')) > p.run() > {code} > Only one combination of apache_beam revision / worker type seems to work (I > refer to https://beam.apache.org/documentation/runners/direct/ for the worker > types) > * beam 2.16; neither multithread nor multiprocess achieve high cpu usage on > multiple cores > * beam 2.17: able to achieve high cpu usage on all 4 cores > * beam 2.18: not tested the mulithreaded mode but the multiprocess mode fails > when trying to serialize the Environment instance most likely because of a > change from 2.17 to 2.18.
[jira] [Work logged] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
[ https://issues.apache.org/jira/browse/BEAM-9228?focusedWorklogId=389235&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389235 ] ASF GitHub Bot logged work on BEAM-9228: Author: ASF GitHub Bot Created on: 19/Feb/20 00:49 Start Date: 19/Feb/20 00:49 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10847: [BEAM-9228] Support further partition for FnApi ListBuffer URL: https://github.com/apache/beam/pull/10847#discussion_r381018471 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py ## @@ -932,10 +998,14 @@ def get_buffer(buffer_id): return pcoll_buffers[buffer_id] def get_input_coder_impl(transform_id): - return context.coders[ - safe_coders[beam_fn_api_pb2.RemoteGrpcPort.FromString( - process_bundle_descriptor.transforms[transform_id].spec.payload). - coder_id]].get_impl() + coder_id = beam_fn_api_pb2.RemoteGrpcPort.FromString( + process_bundle_descriptor.transforms[transform_id].spec.payload).\ Review comment: Lint: don't use backslashes for continuation. (These days you can just run yapf to format your code, see https://cwiki.apache.org/confluence/display/BEAM/Python+Tips.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389235) > _SDFBoundedSourceWrapper doesn't distribute data to multiple workers > > > Key: BEAM-9228 > URL: https://issues.apache.org/jira/browse/BEAM-9228 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.16.0, 2.18.0, 2.19.0 >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.20.0 > > Time Spent: 2h > Remaining Estimate: 0h > > A user reported following issue. > - > I have a set of tfrecord files, obtained by converting parquet files with > Spark. Each file is roughly 1GB and I have 11 of those. > I would expect simple statistics gathering (ie counting number of items of > all files) to scale linearly with respect to the number of cores on my system. > I am able to reproduce the issue with the minimal snippet below > {code:java} > import apache_beam as beam > from apache_beam.options.pipeline_options import PipelineOptions > from apache_beam.runners.portability import fn_api_runner > from apache_beam.portability.api import beam_runner_api_pb2 > from apache_beam.portability import python_urns > import sys > pipeline_options = PipelineOptions(['--direct_num_workers', '4']) > file_pattern = 'part-r-00* > runner=fn_api_runner.FnApiRunner( > default_environment=beam_runner_api_pb2.Environment( > urn=python_urns.SUBPROCESS_SDK, > payload=b'%s -m apache_beam.runners.worker.sdk_worker_main' > % sys.executable.encode('ascii'))) > p = beam.Pipeline(runner=runner, options=pipeline_options) > lines = (p | 'read' >> beam.io.tfrecordio.ReadFromTFRecord(file_pattern) > | beam.combiners.Count.Globally() > | beam.io.WriteToText('/tmp/output')) > p.run() > {code} > Only one combination of apache_beam revision / worker type seems to work (I > refer to https://beam.apache.org/documentation/runners/direct/ for the worker > types) > * beam 2.16; neither multithread nor multiprocess achieve high cpu usage on > multiple cores > * beam 2.17: able to achieve high cpu usage on all 4 cores > * beam 2.18: not tested the mulithreaded mode but the multiprocess mode fails > when trying to serialize the Environment instance most likely because of a > change from 2.17 to 2.18. > I also tried briefly SparkRunner with version 2.16 but was no able to achieve > any throughput. > What is the recommnended way to achieve what I am trying to ? How can I > troubleshoot ? > -- > This is caused by [this > PR|https://github.com/apache/beam/commit/02f8ad4eee3ec0ea8cbdc0f99c1dad29f00a9f60]. > A [workaround|https://github.com/apache/beam/pull/10729] is tried, which is > rolling back iobase.py not to use _SDFBoundedSourceWrapper. This confirmed > that data is distributed to multiple workers, however, there are some > regressions with SDF wrapper tests. -- This message wa
[jira] [Work logged] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
[ https://issues.apache.org/jira/browse/BEAM-9228?focusedWorklogId=389234&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389234 ] ASF GitHub Bot logged work on BEAM-9228: Author: ASF GitHub Bot Created on: 19/Feb/20 00:49 Start Date: 19/Feb/20 00:49 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10847: [BEAM-9228] Support further partition for FnApi ListBuffer URL: https://github.com/apache/beam/pull/10847#discussion_r381017895 ## File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py ## @@ -769,7 +774,7 @@ def _collect_written_timers_and_add_to_deferred_inputs( for windowed_key_timer in timers_by_key_and_window.values(): windowed_timer_coder_impl.encode_to_stream( windowed_key_timer, out, True) -deferred_inputs[transform_id] = _ListBuffer(input_coder=coder) +deferred_inputs[transform_id] = _ListBuffer(coder_impl=coder.get_impl()) Review comment: You can now revert the changes up at the top of the loop. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389234) > _SDFBoundedSourceWrapper doesn't distribute data to multiple workers > > > Key: BEAM-9228 > URL: https://issues.apache.org/jira/browse/BEAM-9228 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.16.0, 2.18.0, 2.19.0 >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.20.0 > > Time Spent: 2h > Remaining Estimate: 0h > > A user reported following issue. > - > I have a set of tfrecord files, obtained by converting parquet files with > Spark. Each file is roughly 1GB and I have 11 of those. > I would expect simple statistics gathering (ie counting number of items of > all files) to scale linearly with respect to the number of cores on my system. > I am able to reproduce the issue with the minimal snippet below > {code:java} > import apache_beam as beam > from apache_beam.options.pipeline_options import PipelineOptions > from apache_beam.runners.portability import fn_api_runner > from apache_beam.portability.api import beam_runner_api_pb2 > from apache_beam.portability import python_urns > import sys > pipeline_options = PipelineOptions(['--direct_num_workers', '4']) > file_pattern = 'part-r-00* > runner=fn_api_runner.FnApiRunner( > default_environment=beam_runner_api_pb2.Environment( > urn=python_urns.SUBPROCESS_SDK, > payload=b'%s -m apache_beam.runners.worker.sdk_worker_main' > % sys.executable.encode('ascii'))) > p = beam.Pipeline(runner=runner, options=pipeline_options) > lines = (p | 'read' >> beam.io.tfrecordio.ReadFromTFRecord(file_pattern) > | beam.combiners.Count.Globally() > | beam.io.WriteToText('/tmp/output')) > p.run() > {code} > Only one combination of apache_beam revision / worker type seems to work (I > refer to https://beam.apache.org/documentation/runners/direct/ for the worker > types) > * beam 2.16; neither multithread nor multiprocess achieve high cpu usage on > multiple cores > * beam 2.17: able to achieve high cpu usage on all 4 cores > * beam 2.18: not tested the mulithreaded mode but the multiprocess mode fails > when trying to serialize the Environment instance most likely because of a > change from 2.17 to 2.18. > I also tried briefly SparkRunner with version 2.16 but was no able to achieve > any throughput. > What is the recommnended way to achieve what I am trying to ? How can I > troubleshoot ? > -- > This is caused by [this > PR|https://github.com/apache/beam/commit/02f8ad4eee3ec0ea8cbdc0f99c1dad29f00a9f60]. > A [workaround|https://github.com/apache/beam/pull/10729] is tried, which is > rolling back iobase.py not to use _SDFBoundedSourceWrapper. This confirmed > that data is distributed to multiple workers, however, there are some > regressions with SDF wrapper tests. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389229&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389229 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 19/Feb/20 00:43 Start Date: 19/Feb/20 00:43 Worklog Time Spent: 10m Work Description: KevinGG commented on pull request #10899: [BEAM-8335] Background Caching job URL: https://github.com/apache/beam/pull/10899 1. Exposed source data capture (implemented by background caching job) control APIs in interactive_beam module. 2. Abstract background caching job into a standalone class where pipeline jobs and limit checkers are self-contained. 3. Added test_stream_service related control and tracking inside background_caching_job and interactive_environment. 4. TODO items: 1) integrate streaming_cache once it implements cache_manager; 2) add implementation of `capture_size` control, now the capture size limit checker is simply a thread ends whenever the capture elapse limit checker terminates the infinitely running background caching job; 3) wire test_stream_service's creation when the dependencies are ready. **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/i
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389223&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389223 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 19/Feb/20 00:34 Start Date: 19/Feb/20 00:34 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10886: [BEAM-8019] Updates DataflowRunner to support multiple SDK environments. URL: https://github.com/apache/beam/pull/10886#discussion_r381006993 ## File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py ## @@ -301,6 +351,22 @@ def __init__(self, packages, options, environment_version, pipeline_url): dataflow.Environment.SdkPipelineOptionsValue.AdditionalProperty( key='display_data', value=to_json_value(items))) + def _get_environments_from_tranforms(self): Review comment: Any reason to not just `return self._pipeline_proto.environments.values()`? Do we expect unused environments to be there for any reason? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389223) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389225&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389225 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 19/Feb/20 00:34 Start Date: 19/Feb/20 00:34 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10886: [BEAM-8019] Updates DataflowRunner to support multiple SDK environments. URL: https://github.com/apache/beam/pull/10886#discussion_r381015733 ## File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient_test.py ## @@ -329,7 +402,7 @@ def test_harness_override_default_in_released_sdks(self): env = apiclient.Environment([], #packages pipeline_options, '2.0.0', #any environment version -FAKE_PIPELINE_URL) +FAKE_PIPELINE_URL, None, None) Review comment: Can None be an optional argument rather than passing it everywhere here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389225) Time Spent: 3h 50m (was: 3h 40m) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 3h 50m > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389222&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389222 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 19/Feb/20 00:34 Start Date: 19/Feb/20 00:34 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10886: [BEAM-8019] Updates DataflowRunner to support multiple SDK environments. URL: https://github.com/apache/beam/pull/10886#discussion_r381005234 ## File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py ## @@ -576,10 +655,25 @@ def create_job(self, job): 'A template was just created at location %s', template_location) return None + def _apply_sdk_environment_overrides(self, proto_pipeline): +# Update environments based on user provided overrides +sdk_overrides = self._sdk_image_overrides +if sdk_overrides: + for environment in proto_pipeline.components.environments.values(): +docker_payload = proto_utils.parse_Bytes( +environment.payload, beam_runner_api_pb2.DockerPayload) +for pattern in sdk_overrides: + override = sdk_overrides[pattern] + if re.match(pattern, docker_payload.container_image): +new_payload = beam_runner_api_pb2.DockerPayload( Review comment: Would it be safer to copy the proto and update the field, in case other fields get added in the future? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389222) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389227&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389227 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 19/Feb/20 00:35 Start Date: 19/Feb/20 00:35 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10886: [BEAM-8019] Updates DataflowRunner to support multiple SDK environments. URL: https://github.com/apache/beam/pull/10886#discussion_r381006145 ## File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py ## @@ -133,12 +136,19 @@ def get_output(self, tag=None): class Environment(object): """Wrapper for a dataflow Environment protobuf.""" Review comment: This comment seems out of date... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389227) Time Spent: 4h (was: 3h 50m) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389218&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389218 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 19/Feb/20 00:34 Start Date: 19/Feb/20 00:34 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10886: [BEAM-8019] Updates DataflowRunner to support multiple SDK environments. URL: https://github.com/apache/beam/pull/10886#discussion_r381003618 ## File path: sdks/python/apache_beam/options/pipeline_options.py ## @@ -748,6 +748,17 @@ def _add_argparse_args(cls, parser): 'worker harness. Default is the container for the version of the ' 'SDK. Note: currently, only approved Google Cloud Dataflow ' 'container images may be used here.')) +parser.add_argument( +'--sdk_harness_container_image_overrides', Review comment: When, if ever, do we expect the user to provide a reasonable value for this flag? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389218) Time Spent: 3h 20m (was: 3h 10m) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389224&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389224 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 19/Feb/20 00:34 Start Date: 19/Feb/20 00:34 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10886: [BEAM-8019] Updates DataflowRunner to support multiple SDK environments. URL: https://github.com/apache/beam/pull/10886#discussion_r381004678 ## File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py ## @@ -576,10 +655,25 @@ def create_job(self, job): 'A template was just created at location %s', template_location) return None + def _apply_sdk_environment_overrides(self, proto_pipeline): +# Update environments based on user provided overrides +sdk_overrides = self._sdk_image_overrides +if sdk_overrides: + for environment in proto_pipeline.components.environments.values(): +docker_payload = proto_utils.parse_Bytes( +environment.payload, beam_runner_api_pb2.DockerPayload) +for pattern in sdk_overrides: Review comment: `for pattern, override in sdk_overrides.items()` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389224) Time Spent: 3h 50m (was: 3h 40m) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 3h 50m > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389221&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389221 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 19/Feb/20 00:34 Start Date: 19/Feb/20 00:34 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10886: [BEAM-8019] Updates DataflowRunner to support multiple SDK environments. URL: https://github.com/apache/beam/pull/10886#discussion_r381008105 ## File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py ## @@ -301,6 +351,22 @@ def __init__(self, packages, options, environment_version, pipeline_url): dataflow.Environment.SdkPipelineOptionsValue.AdditionalProperty( key='display_data', value=to_json_value(items))) + def _get_environments_from_tranforms(self): +if not self._proto_pipeline: + return [] +environment_ids = [] +for transform in self._proto_pipeline.components.transforms.values(): + if transform.environment_id not in environment_ids: +environment_ids.append(transform.environment_id) +environments = [] +for environment_id in environment_ids: + if not environment_id: Review comment: Nit: I'd move this up into the previous loop. Or I'd write this as a list comprehension, e.g. ``` environment_ids = set( transform.environment_id for transform in self._proto_pipeline.components.transforms.values() if transform.environment_id) return [self._proto_pipeline.components.environments[id] for id in environment_ids] ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389221) Time Spent: 3h 40m (was: 3.5h) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389226&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389226 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 19/Feb/20 00:35 Start Date: 19/Feb/20 00:35 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10886: [BEAM-8019] Updates DataflowRunner to support multiple SDK environments. URL: https://github.com/apache/beam/pull/10886#discussion_r381008681 ## File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py ## @@ -478,6 +544,19 @@ def __init__(self, options): get_credentials=(not self.google_cloud_options.no_auth), http=http_client, response_encoding=get_response_encoding()) +self._sdk_image_overrides = self._get_sdk_image_overrides(options) + + def _get_sdk_image_overrides(self, pipeline_options): +worker_options = pipeline_options.view_as(WorkerOptions) +sdk_overrides = worker_options.sdk_harness_container_image_overrides +overrides_dict = dict() Review comment: Again, one can write `overrides_dict = dict(override_str.split(',', 1) for override_str in sdk_overrides)` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389226) Time Spent: 4h (was: 3h 50m) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389220&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389220 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 19/Feb/20 00:34 Start Date: 19/Feb/20 00:34 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10886: [BEAM-8019] Updates DataflowRunner to support multiple SDK environments. URL: https://github.com/apache/beam/pull/10886#discussion_r381003938 ## File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py ## @@ -153,6 +158,9 @@ def __init__(self, packages, options, environment_version, pipeline_url): # User agent information. self.proto.userAgent = dataflow.Environment.UserAgentValue() self.local = 'localhost' in self.google_cloud_options.dataflow_endpoint +self._proto_pipeline = proto_pipeline +self._sdk_image_overrides = ( Review comment: One can also write `_sdk_image_overrides or dict()`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389220) Time Spent: 3.5h (was: 3h 20m) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389219&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389219 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 19/Feb/20 00:34 Start Date: 19/Feb/20 00:34 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #10886: [BEAM-8019] Updates DataflowRunner to support multiple SDK environments. URL: https://github.com/apache/beam/pull/10886#discussion_r381004939 ## File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py ## @@ -576,10 +655,25 @@ def create_job(self, job): 'A template was just created at location %s', template_location) return None + def _apply_sdk_environment_overrides(self, proto_pipeline): +# Update environments based on user provided overrides +sdk_overrides = self._sdk_image_overrides +if sdk_overrides: + for environment in proto_pipeline.components.environments.values(): +docker_payload = proto_utils.parse_Bytes( +environment.payload, beam_runner_api_pb2.DockerPayload) +for pattern in sdk_overrides: + override = sdk_overrides[pattern] + if re.match(pattern, docker_payload.container_image): +new_payload = beam_runner_api_pb2.DockerPayload( +container_image=override) Review comment: I would have expected to use re.sub(pattern, override, docker_payload.container_image). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389219) Time Spent: 3.5h (was: 3h 20m) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9299) Upgrade Flink Runner to 1.8.3 and 1.9.2
[ https://issues.apache.org/jira/browse/BEAM-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039584#comment-17039584 ] sunjincheng commented on BEAM-9299: --- Currently we always test the last version for flink runners,and I think it's by design for now, and I agree with you [~iemejia], would be better to avoid test all versions for every PR. > Upgrade Flink Runner to 1.8.3 and 1.9.2 > --- > > Key: BEAM-9299 > URL: https://issues.apache.org/jira/browse/BEAM-9299 > Project: Beam > Issue Type: Task > Components: runner-flink >Reporter: sunjincheng >Assignee: sunjincheng >Priority: Major > Fix For: 2.20.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > I would like to Upgrade Flink Runner to 18.3 and 1.9.2 due to both the Apache > Flink 1.8.3 and Apache Flink 1.9.2 have been released [1]. > What do you think? > [1] https://dist.apache.org/repos/dist/release/flink/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8201) clean up the current container API
[ https://issues.apache.org/jira/browse/BEAM-8201?focusedWorklogId=389215&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389215 ] ASF GitHub Bot logged work on BEAM-8201: Author: ASF GitHub Bot Created on: 19/Feb/20 00:17 Start Date: 19/Feb/20 00:17 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #10843: [BEAM-8201] Pass all other endpoints through provisioning service. URL: https://github.com/apache/beam/pull/10843#discussion_r381011326 ## File path: sdks/go/container/boot.go ## @@ -46,29 +46,42 @@ func main() { if *id == "" { log.Fatal("No id provided.") } + if *provisionEndpoint == "" { + log.Fatal("No provision endpoint provided.") + } + + ctx := grpcx.WriteWorkerID(context.Background(), *id) + + info, err := provision.Info(ctx, *provisionEndpoint) + if err != nil { + log.Fatalf("Failed to obtain provisioning information: %v", err) + } + log.Printf("Provision info:\n%v", info) + + // TODO(BEAM-8201): Simplify once flags are no longer used. + if info.GetLoggingEndpoint().GetUrl() != "" { + *loggingEndpoint = info.GetLoggingEndpoint().GetUrl() Review comment: Never mind, I overlooked the empty default. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389215) Time Spent: 1.5h (was: 1h 20m) > clean up the current container API > -- > > Key: BEAM-8201 > URL: https://issues.apache.org/jira/browse/BEAM-8201 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Hannah Jiang >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > From [~robertwb] > As part of this project, I propose we look at and clean up the current > container API before we "release" it as public and stable. IIRC, we currently > provide the worker arguments through a combination of (1) environment > variables (2) command line parameters to docker and (3) via the provisioning > API. It would be good to have a more principled approach to specifying > arguments (either all the same way, or if they vary, good reason for doing so > rather than by historical accident). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9326) JsonToRow transform should not use bounded Wildcards for its input
[ https://issues.apache.org/jira/browse/BEAM-9326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039577#comment-17039577 ] Kenneth Knowles commented on BEAM-9326: --- So basically {{String}} is a special case and this ticket is, to me, about optimizing for that special case, if it matters. If it doesn't matter, I don't have an opinion. But I assume you hit a problem? > JsonToRow transform should not use bounded Wildcards for its input > -- > > Key: BEAM-9326 > URL: https://issues.apache.org/jira/browse/BEAM-9326 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Fix For: 2.20.0 > > Time Spent: 10m > Remaining Estimate: 0h > > The JsonToRow PTransform input is a String (a final class in Java) so no > reason > to define a bounded wildcard as its argument. > We should use in Beam's codebase only when required by Java > Generics constraints. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9326) JsonToRow transform should not use bounded Wildcards for its input
[ https://issues.apache.org/jira/browse/BEAM-9326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039576#comment-17039576 ] Kenneth Knowles commented on BEAM-9326: --- It is important for {{MapElements}}. The data type {{PCollection}} is covariant in {{T}} so when {{S extends T}} the type {{PCollection}} should be considered a subtype of {{PCollection}}. Java does not have the ability to express this, so every use of the type has to use {{PCollection}} for proper type checking, except in a contravariant position (aka as an output) in which case it should be {{PCollection}}. So if you have {{MapElements.via}} of a function {{X -> Y}} you have to have this type in order to apply it to a {{PCollection}} where {{A extends X}}. Really, many of these things are not usable without the {{? extends T}} pattern. Does it cause a problem? > JsonToRow transform should not use bounded Wildcards for its input > -- > > Key: BEAM-9326 > URL: https://issues.apache.org/jira/browse/BEAM-9326 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Ismaël Mejía >Assignee: Ismaël Mejía >Priority: Minor > Fix For: 2.20.0 > > Time Spent: 10m > Remaining Estimate: 0h > > The JsonToRow PTransform input is a String (a final class in Java) so no > reason > to define a bounded wildcard as its argument. > We should use in Beam's codebase only when required by Java > Generics constraints. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9333) DataCatalogPipelineOptions is not registered
[ https://issues.apache.org/jira/browse/BEAM-9333?focusedWorklogId=389199&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389199 ] ASF GitHub Bot logged work on BEAM-9333: Author: ASF GitHub Bot Created on: 19/Feb/20 00:02 Start Date: 19/Feb/20 00:02 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #10896: [BEAM-9333] Add DataCatalogPipelineOptionsRegistrar URL: https://github.com/apache/beam/pull/10896 Register DataCatalogPipelineOptions so its options can be set from the command line. Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_
[jira] [Work logged] (BEAM-9333) DataCatalogPipelineOptions is not registered
[ https://issues.apache.org/jira/browse/BEAM-9333?focusedWorklogId=389201&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389201 ] ASF GitHub Bot logged work on BEAM-9333: Author: ASF GitHub Bot Created on: 19/Feb/20 00:02 Start Date: 19/Feb/20 00:02 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on issue #10896: [BEAM-9333] Add DataCatalogPipelineOptionsRegistrar URL: https://github.com/apache/beam/pull/10896#issuecomment-587965771 R: @kennknowles This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389201) Time Spent: 20m (was: 10m) > DataCatalogPipelineOptions is not registered > > > Key: BEAM-9333 > URL: https://issues.apache.org/jira/browse/BEAM-9333 > Project: Beam > Issue Type: Bug > Components: dsl-sql >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > ... so its not possible to set Data Catalog options from the command line. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1080) python sdk apiclient needs proper unit tests
[ https://issues.apache.org/jira/browse/BEAM-1080?focusedWorklogId=389200&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389200 ] ASF GitHub Bot logged work on BEAM-1080: Author: ASF GitHub Bot Created on: 19/Feb/20 00:02 Start Date: 19/Feb/20 00:02 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10890: [BEAM-1080] Skip tests that required GCP credentials URL: https://github.com/apache/beam/pull/10890 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389200) Time Spent: 2h 20m (was: 2h 10m) > python sdk apiclient needs proper unit tests > > > Key: BEAM-1080 > URL: https://issues.apache.org/jira/browse/BEAM-1080 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Vikas Kedigehalli >Priority: Major > Labels: ccoss2019, newbie, starter > Time Spent: 2h 20m > Remaining Estimate: 0h > > There is only one unit test right now that tries to fetch actual gcp > credentials instead of mocking. This test fails when the credentials are not > available on the machine in which it is running. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8201) clean up the current container API
[ https://issues.apache.org/jira/browse/BEAM-8201?focusedWorklogId=389198&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389198 ] ASF GitHub Bot logged work on BEAM-8201: Author: ASF GitHub Bot Created on: 18/Feb/20 23:59 Start Date: 18/Feb/20 23:59 Worklog Time Spent: 10m Work Description: angoenka commented on pull request #10843: [BEAM-8201] Pass all other endpoints through provisioning service. URL: https://github.com/apache/beam/pull/10843#discussion_r381006295 ## File path: sdks/go/container/boot.go ## @@ -46,29 +46,42 @@ func main() { if *id == "" { log.Fatal("No id provided.") } + if *provisionEndpoint == "" { + log.Fatal("No provision endpoint provided.") + } + + ctx := grpcx.WriteWorkerID(context.Background(), *id) + + info, err := provision.Info(ctx, *provisionEndpoint) + if err != nil { + log.Fatalf("Failed to obtain provisioning information: %v", err) + } + log.Printf("Provision info:\n%v", info) + + // TODO(BEAM-8201): Simplify once flags are no longer used. + if info.GetLoggingEndpoint().GetUrl() != "" { + *loggingEndpoint = info.GetLoggingEndpoint().GetUrl() Review comment: Shall we add default values for all the ports so that we avoid passing them whole together in future. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389198) Time Spent: 1h 20m (was: 1h 10m) > clean up the current container API > -- > > Key: BEAM-8201 > URL: https://issues.apache.org/jira/browse/BEAM-8201 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Hannah Jiang >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > From [~robertwb] > As part of this project, I propose we look at and clean up the current > container API before we "release" it as public and stable. IIRC, we currently > provide the worker arguments through a combination of (1) environment > variables (2) command line parameters to docker and (3) via the provisioning > API. It would be good to have a more principled approach to specifying > arguments (either all the same way, or if they vary, good reason for doing so > rather than by historical accident). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9333) DataCatalogPipelineOptions is not registered
Brian Hulette created BEAM-9333: --- Summary: DataCatalogPipelineOptions is not registered Key: BEAM-9333 URL: https://issues.apache.org/jira/browse/BEAM-9333 Project: Beam Issue Type: Bug Components: dsl-sql Reporter: Brian Hulette Assignee: Brian Hulette ... so its not possible to set Data Catalog options from the command line. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-1833) Restructure Python pipeline construction to better follow the Runner API
[ https://issues.apache.org/jira/browse/BEAM-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Cwik resolved BEAM-1833. - Fix Version/s: 2.20.0 Resolution: Fixed First pass on cleaning this up to match was done. BEAM-9322 describes some necessary work to define local "names" when dealing with nested structures where multiple tags are defined. > Restructure Python pipeline construction to better follow the Runner API > > > Key: BEAM-1833 > URL: https://issues.apache.org/jira/browse/BEAM-1833 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Sam Rohde >Priority: Major > Fix For: 2.20.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > The most important part is removing the runner.apply overrides, but there are > also various other improvements (e.g. all inputs and outputs should be named). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1833) Restructure Python pipeline construction to better follow the Runner API
[ https://issues.apache.org/jira/browse/BEAM-1833?focusedWorklogId=389190&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389190 ] ASF GitHub Bot logged work on BEAM-1833: Author: ASF GitHub Bot Created on: 18/Feb/20 23:48 Start Date: 18/Feb/20 23:48 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10860: [BEAM-1833] Fixes BEAM-1833 URL: https://github.com/apache/beam/pull/10860 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389190) Time Spent: 2h 20m (was: 2h 10m) > Restructure Python pipeline construction to better follow the Runner API > > > Key: BEAM-1833 > URL: https://issues.apache.org/jira/browse/BEAM-1833 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Sam Rohde >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > The most important part is removing the runner.apply overrides, but there are > also various other improvements (e.g. all inputs and outputs should be named). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8201) clean up the current container API
[ https://issues.apache.org/jira/browse/BEAM-8201?focusedWorklogId=389189&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389189 ] ASF GitHub Bot logged work on BEAM-8201: Author: ASF GitHub Bot Created on: 18/Feb/20 23:48 Start Date: 18/Feb/20 23:48 Worklog Time Spent: 10m Work Description: robertwb commented on issue #10843: [BEAM-8201] Pass all other endpoints through provisioning service. URL: https://github.com/apache/beam/pull/10843#issuecomment-587962132 restest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389189) Time Spent: 1h 10m (was: 1h) > clean up the current container API > -- > > Key: BEAM-8201 > URL: https://issues.apache.org/jira/browse/BEAM-8201 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Hannah Jiang >Assignee: Robert Bradshaw >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > From [~robertwb] > As part of this project, I propose we look at and clean up the current > container API before we "release" it as public and stable. IIRC, we currently > provide the worker arguments through a combination of (1) environment > variables (2) command line parameters to docker and (3) via the provisioning > API. It would be good to have a more principled approach to specifying > arguments (either all the same way, or if they vary, good reason for doing so > rather than by historical accident). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=389169&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389169 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 18/Feb/20 23:20 Start Date: 18/Feb/20 23:20 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389169) Time Spent: 22.5h (was: 22h 20m) > Create a Spanner IO for Python > -- > > Key: BEAM-7246 > URL: https://issues.apache.org/jira/browse/BEAM-7246 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 22.5h > Remaining Estimate: 0h > > Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only). > Testing in this work item will be in the form of DirectRunner tests and > manual testing. > Integration and performance tests are a separate work item (not included > here). > See https://beam.apache.org/documentation/io/built-in/. The goal is to add > Google Clound Spanner to the Database column for the Python/Batch row. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389168&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389168 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 18/Feb/20 23:17 Start Date: 18/Feb/20 23:17 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10886: [BEAM-8019] Updates DataflowRunner to support multiple SDK environments. URL: https://github.com/apache/beam/pull/10886#discussion_r380992944 ## File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py ## @@ -153,6 +158,9 @@ def __init__(self, packages, options, environment_version, pipeline_url): # User agent information. self.proto.userAgent = dataflow.Environment.UserAgentValue() self.local = 'localhost' in self.google_cloud_options.dataflow_endpoint +self._proto_pipeline = proto_pipeline +self._sdk_image_overrides = ( Review comment: OK. We can leave it as is. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389168) Time Spent: 3h 10m (was: 3h) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1080) python sdk apiclient needs proper unit tests
[ https://issues.apache.org/jira/browse/BEAM-1080?focusedWorklogId=389165&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389165 ] ASF GitHub Bot logged work on BEAM-1080: Author: ASF GitHub Bot Created on: 18/Feb/20 23:11 Start Date: 18/Feb/20 23:11 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10890: [BEAM-1080] Skip tests that required GCP credentials URL: https://github.com/apache/beam/pull/10890#issuecomment-587951040 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389165) Time Spent: 2h 10m (was: 2h) > python sdk apiclient needs proper unit tests > > > Key: BEAM-1080 > URL: https://issues.apache.org/jira/browse/BEAM-1080 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Vikas Kedigehalli >Priority: Major > Labels: ccoss2019, newbie, starter > Time Spent: 2h 10m > Remaining Estimate: 0h > > There is only one unit test right now that tries to fetch actual gcp > credentials instead of mocking. This test fails when the credentials are not > available on the machine in which it is running. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8280) re-enable IOTypeHints.from_callable
[ https://issues.apache.org/jira/browse/BEAM-8280?focusedWorklogId=389164&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389164 ] ASF GitHub Bot logged work on BEAM-8280: Author: ASF GitHub Bot Created on: 18/Feb/20 23:11 Start Date: 18/Feb/20 23:11 Worklog Time Spent: 10m Work Description: udim commented on pull request #10894: [BEAM-8280] IOTypeHints debug_str traceback URL: https://github.com/apache/beam/pull/10894 - Improve format - Add multiple bases (in `.with_defaults()`) - Include debug_str output in ptransform.py TypeCheckErrors. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.ap
[jira] [Comment Edited] (BEAM-9198) BeamSQL aggregation analytics functions
[ https://issues.apache.org/jira/browse/BEAM-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039304#comment-17039304 ] Rui Wang edited comment on BEAM-9198 at 2/18/20 11:03 PM: -- There are many analytic functions, I will suggest to start from one single function, e.g. rank(), but aim at supporting the full functional syntax first: {code:sql} analytic_function_name ( [ argument_list ] ) OVER ( [ PARTITION BY partition_expression_list ] [ ORDER BY expression [{ ASC | DESC }] [, ...] ] [ window_frame_clause ] ) {code} By doing so you will be able to touch many concepts in distributed computing. E.g. per key computing, sorting, etc. and SQL, e.g. parser planner, etc. was (Author: amaliujia): There are many analytic functions, I will suggest to start from one single function, e.g. rank(), but aim at supporting the full functional syntax first: {code:sql} analytic_function_name ( [ argument_list ] ) OVER ( [ PARTITION BY partition_expression_list ] [ ORDER BY expression [{ ASC | DESC }] [, ...] ] [ window_frame_clause ] ) {code} By doing so you will be able to touch many concepts in distributed computing. E.g. per key computing, sorting, etc. > BeamSQL aggregation analytics functions > > > Key: BEAM-9198 > URL: https://issues.apache.org/jira/browse/BEAM-9198 > Project: Beam > Issue Type: Task > Components: dsl-sql >Reporter: Rui Wang >Priority: Major > Labels: gsoc, gsoc2020, mentor > > BeamSQL has a long list of of aggregation/aggregation analytics > functionalities to support. > To begin with, you will need to support this syntax: > {code:sql} > analytic_function_name ( [ argument_list ] ) > OVER ( > [ PARTITION BY partition_expression_list ] > [ ORDER BY expression [{ ASC | DESC }] [, ...] ] > [ window_frame_clause ] > ) > {code} > This will requires touch core components of BeamSQL: > 1. SQL parser to support the syntax above. > 2. SQL core to implement physical relational operator. > 3. Distributed algorithms to implement a list of functions in a distributed > manner. > 4. Build benchmarks to measure performance of your implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1833) Restructure Python pipeline construction to better follow the Runner API
[ https://issues.apache.org/jira/browse/BEAM-1833?focusedWorklogId=389158&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389158 ] ASF GitHub Bot logged work on BEAM-1833: Author: ASF GitHub Bot Created on: 18/Feb/20 22:55 Start Date: 18/Feb/20 22:55 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10860: [BEAM-1833] Fixes BEAM-1833 URL: https://github.com/apache/beam/pull/10860#issuecomment-587945517 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389158) Time Spent: 2h 10m (was: 2h) > Restructure Python pipeline construction to better follow the Runner API > > > Key: BEAM-1833 > URL: https://issues.apache.org/jira/browse/BEAM-1833 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Sam Rohde >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > The most important part is removing the runner.apply overrides, but there are > also various other improvements (e.g. all inputs and outputs should be named). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389151&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389151 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 18/Feb/20 22:44 Start Date: 18/Feb/20 22:44 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #10886: [BEAM-8019] Updates DataflowRunner to support multiple SDK environments. URL: https://github.com/apache/beam/pull/10886#discussion_r380980064 ## File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py ## @@ -153,6 +158,9 @@ def __init__(self, packages, options, environment_version, pipeline_url): # User agent information. self.proto.userAgent = dataflow.Environment.UserAgentValue() self.local = 'localhost' in self.google_cloud_options.dataflow_endpoint +self._proto_pipeline = proto_pipeline +self._sdk_image_overrides = ( Review comment: Seems like that results in a lint error (Dangerous default value dict() (builtins.dict) as argument). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389151) Time Spent: 3h (was: 2h 50m) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1080) python sdk apiclient needs proper unit tests
[ https://issues.apache.org/jira/browse/BEAM-1080?focusedWorklogId=389145&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389145 ] ASF GitHub Bot logged work on BEAM-1080: Author: ASF GitHub Bot Created on: 18/Feb/20 22:34 Start Date: 18/Feb/20 22:34 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10890: [BEAM-1080] Skip tests that required GCP credentials URL: https://github.com/apache/beam/pull/10890#issuecomment-587938058 > Thanks, should we reopen BEAM-1080? Re-opened. Not sure why it was resolved. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389145) Time Spent: 2h (was: 1h 50m) > python sdk apiclient needs proper unit tests > > > Key: BEAM-1080 > URL: https://issues.apache.org/jira/browse/BEAM-1080 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Vikas Kedigehalli >Priority: Major > Labels: ccoss2019, newbie, starter > Time Spent: 2h > Remaining Estimate: 0h > > There is only one unit test right now that tries to fetch actual gcp > credentials instead of mocking. This test fails when the credentials are not > available on the machine in which it is running. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1833) Restructure Python pipeline construction to better follow the Runner API
[ https://issues.apache.org/jira/browse/BEAM-1833?focusedWorklogId=389144&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389144 ] ASF GitHub Bot logged work on BEAM-1833: Author: ASF GitHub Bot Created on: 18/Feb/20 22:33 Start Date: 18/Feb/20 22:33 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10860: [BEAM-1833] Fixes BEAM-1833 URL: https://github.com/apache/beam/pull/10860#issuecomment-587938021 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389144) Time Spent: 2h (was: 1h 50m) > Restructure Python pipeline construction to better follow the Runner API > > > Key: BEAM-1833 > URL: https://issues.apache.org/jira/browse/BEAM-1833 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Sam Rohde >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > The most important part is removing the runner.apply overrides, but there are > also various other improvements (e.g. all inputs and outputs should be named). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (BEAM-1080) python sdk apiclient needs proper unit tests
[ https://issues.apache.org/jira/browse/BEAM-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Altay reopened BEAM-1080: --- Assignee: (was: Pablo Estrada) > python sdk apiclient needs proper unit tests > > > Key: BEAM-1080 > URL: https://issues.apache.org/jira/browse/BEAM-1080 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Vikas Kedigehalli >Priority: Major > Labels: ccoss2019, newbie, starter > Fix For: 2.17.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > There is only one unit test right now that tries to fetch actual gcp > credentials instead of mocking. This test fails when the credentials are not > available on the machine in which it is running. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-1080) python sdk apiclient needs proper unit tests
[ https://issues.apache.org/jira/browse/BEAM-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Altay updated BEAM-1080: -- Fix Version/s: (was: 2.17.0) > python sdk apiclient needs proper unit tests > > > Key: BEAM-1080 > URL: https://issues.apache.org/jira/browse/BEAM-1080 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Vikas Kedigehalli >Priority: Major > Labels: ccoss2019, newbie, starter > Time Spent: 1h 50m > Remaining Estimate: 0h > > There is only one unit test right now that tries to fetch actual gcp > credentials instead of mocking. This test fails when the credentials are not > available on the machine in which it is running. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8629) WithTypeHints._get_or_create_type_hints may return a mutable copy of the class type hints.
[ https://issues.apache.org/jira/browse/BEAM-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri resolved BEAM-8629. - Fix Version/s: 2.20.0 Resolution: Fixed Above PR has been merged > WithTypeHints._get_or_create_type_hints may return a mutable copy of the > class type hints. > -- > > Key: BEAM-8629 > URL: https://issues.apache.org/jira/browse/BEAM-8629 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: Major > Fix For: 2.20.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9287) Python Validates runner tests for Unified Worker
[ https://issues.apache.org/jira/browse/BEAM-9287?focusedWorklogId=389140&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389140 ] ASF GitHub Bot logged work on BEAM-9287: Author: ASF GitHub Bot Created on: 18/Feb/20 22:27 Start Date: 18/Feb/20 22:27 Worklog Time Spent: 10m Work Description: angoenka commented on issue #10863: [BEAM-9287] Add Python streaming Validates runner tests for Unified Worker URL: https://github.com/apache/beam/pull/10863#issuecomment-587935706 Run Python Dataflow ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389140) Time Spent: 50m (was: 40m) > Python Validates runner tests for Unified Worker > > > Key: BEAM-9287 > URL: https://issues.apache.org/jira/browse/BEAM-9287 > Project: Beam > Issue Type: Test > Components: runner-dataflow, testing >Reporter: Ankur Goenka >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-5605) Support Portable SplittableDoFn for batch
[ https://issues.apache.org/jira/browse/BEAM-5605?focusedWorklogId=389138&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389138 ] ASF GitHub Bot logged work on BEAM-5605: Author: ASF GitHub Bot Created on: 18/Feb/20 22:24 Start Date: 18/Feb/20 22:24 Worklog Time Spent: 10m Work Description: lukecwik commented on issue #10893: [BEAM-5605] Honor the bounded source timestamps timestamp in the SDF wrapper. URL: https://github.com/apache/beam/pull/10893#issuecomment-587934667 R: @boyuanzz This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389138) Time Spent: 16h (was: 15h 50m) > Support Portable SplittableDoFn for batch > - > > Key: BEAM-5605 > URL: https://issues.apache.org/jira/browse/BEAM-5605 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Scott Wegner >Assignee: Luke Cwik >Priority: Major > Labels: portability > Time Spent: 16h > Remaining Estimate: 0h > > Roll-up item tracking work towards supporting portable SplittableDoFn for > batch -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-5605) Support Portable SplittableDoFn for batch
[ https://issues.apache.org/jira/browse/BEAM-5605?focusedWorklogId=389137&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389137 ] ASF GitHub Bot logged work on BEAM-5605: Author: ASF GitHub Bot Created on: 18/Feb/20 22:24 Start Date: 18/Feb/20 22:24 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #10893: [BEAM-5605] Honor the bounded source timestamps timestamp in the SDF wrapper. URL: https://github.com/apache/beam/pull/10893 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds
[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.
[ https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=389136&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389136 ] ASF GitHub Bot logged work on BEAM-6522: Author: ASF GitHub Bot Created on: 18/Feb/20 22:21 Start Date: 18/Feb/20 22:21 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #10876: [BEAM-6522] Exclude the test that fails on earlier versions of Avro dependency that are still within allowed version range. URL: https://github.com/apache/beam/pull/10876 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389136) Time Spent: 11.5h (was: 11h 20m) > Dill fails to pickle avro.RecordSchema classes on Python 3. > > > Key: BEAM-6522 > URL: https://issues.apache.org/jira/browse/BEAM-6522 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: 2.16.0 > > Time Spent: 11.5h > Remaining Estimate: 0h > > The avroio module still has 4 failing tests. This is actually 2 times the > same 2 tests, both for Avro and Fastavro. > *apache_beam.io.avroio_test.TestAvro.test_sink_transform* > *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform* > fail with: > {code:java} > Traceback (most recent call last): > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", > line 432, in test_sink_transform > | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line > 528, in expand > return pcoll | beam.io.iobase.Write(self._sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line > 960, in expand > return pcoll | WriteImpl(self.sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line > 979, in expand > lambda _, sink: sink.initialize_write(), self.sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 1103, in Map > pardo = FlatMap(wrapper, *args, **kwargs) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 1054, in FlatMap > pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 864, in __init__ > super(ParDo, self).__init__(fn, *args, **kwargs) > File > "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py", > line 646, in __init__ > self.args = pickler.loads(pickler.dumps(self.args)) > File > "/home/robbe/workspace/beam/sd
[jira] [Work logged] (BEAM-9321) BigQuery avro write logical type support
[ https://issues.apache.org/jira/browse/BEAM-9321?focusedWorklogId=389132&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389132 ] ASF GitHub Bot logged work on BEAM-9321: Author: ASF GitHub Bot Created on: 18/Feb/20 22:13 Start Date: 18/Feb/20 22:13 Worklog Time Spent: 10m Work Description: regadas commented on issue #10869: [BEAM-9321] Add BigQuery Avro logical type support on write URL: https://github.com/apache/beam/pull/10869#issuecomment-587929726 R: @lukecwik This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389132) Time Spent: 0.5h (was: 20m) > BigQuery avro write logical type support > > > Key: BEAM-9321 > URL: https://issues.apache.org/jira/browse/BEAM-9321 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.19.0 >Reporter: Filipe Regadas >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > With 2.18.0 we are able to write GenericRecords to BigQuery. However, writing > does not respect Avro <-> BigQuery data type conversion > ([docs|https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#logical_types]) > we need to set the useAvroLogicalTypes option. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389128&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389128 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 18/Feb/20 22:04 Start Date: 18/Feb/20 22:04 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #10892: [BEAM-8335] Make TestStream to/from runner_api include the output_tags property. URL: https://github.com/apache/beam/pull/10892 The TestStream has the "output_tags" property which keeps track of which events go to which PCollection. A TestStream going through a round-trip to/from proto won't have these fields set. To do this, a modification to the from_runner_api_parameter is needed to include the parent PTransform proto to retrieve the information from the outputs. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBui
[jira] [Work logged] (BEAM-1080) python sdk apiclient needs proper unit tests
[ https://issues.apache.org/jira/browse/BEAM-1080?focusedWorklogId=389127&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389127 ] ASF GitHub Bot logged work on BEAM-1080: Author: ASF GitHub Bot Created on: 18/Feb/20 22:01 Start Date: 18/Feb/20 22:01 Worklog Time Spent: 10m Work Description: y1chi commented on issue #10890: [BEAM-1080] Skip tests that required GCP credentials URL: https://github.com/apache/beam/pull/10890#issuecomment-587924966 Thanks, should we reopen BEAM-1080? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389127) Time Spent: 1h 50m (was: 1h 40m) > python sdk apiclient needs proper unit tests > > > Key: BEAM-1080 > URL: https://issues.apache.org/jira/browse/BEAM-1080 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Vikas Kedigehalli >Assignee: Pablo Estrada >Priority: Major > Labels: ccoss2019, newbie, starter > Fix For: 2.17.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > There is only one unit test right now that tries to fetch actual gcp > credentials instead of mocking. This test fails when the credentials are not > available on the machine in which it is running. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1080) python sdk apiclient needs proper unit tests
[ https://issues.apache.org/jira/browse/BEAM-1080?focusedWorklogId=389126&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389126 ] ASF GitHub Bot logged work on BEAM-1080: Author: ASF GitHub Bot Created on: 18/Feb/20 22:00 Start Date: 18/Feb/20 22:00 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10890: [BEAM-1080] Skip tests that required GCP credentials URL: https://github.com/apache/beam/pull/10890#issuecomment-587924568 R: @y1chi This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389126) Time Spent: 1h 40m (was: 1.5h) > python sdk apiclient needs proper unit tests > > > Key: BEAM-1080 > URL: https://issues.apache.org/jira/browse/BEAM-1080 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Vikas Kedigehalli >Assignee: Pablo Estrada >Priority: Major > Labels: ccoss2019, newbie, starter > Fix For: 2.17.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > There is only one unit test right now that tries to fetch actual gcp > credentials instead of mocking. This test fails when the credentials are not > available on the machine in which it is running. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1080) python sdk apiclient needs proper unit tests
[ https://issues.apache.org/jira/browse/BEAM-1080?focusedWorklogId=389125&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389125 ] ASF GitHub Bot logged work on BEAM-1080: Author: ASF GitHub Bot Created on: 18/Feb/20 22:00 Start Date: 18/Feb/20 22:00 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #10890: [BEAM-1080] Skip tests that required GCP credentials URL: https://github.com/apache/beam/pull/10890 Follow up to https://github.com/apache/beam/pull/10829 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Bui
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=389124&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389124 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 18/Feb/20 21:57 Start Date: 18/Feb/20 21:57 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#issuecomment-587923073 LGTM. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389124) Time Spent: 22h 20m (was: 22h 10m) > Create a Spanner IO for Python > -- > > Key: BEAM-7246 > URL: https://issues.apache.org/jira/browse/BEAM-7246 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 22h 20m > Remaining Estimate: 0h > > Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only). > Testing in this work item will be in the form of DirectRunner tests and > manual testing. > Integration and performance tests are a separate work item (not included > here). > See https://beam.apache.org/documentation/io/built-in/. The goal is to add > Google Clound Spanner to the Database column for the Python/Batch row. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=389122&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389122 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 18/Feb/20 21:53 Start Date: 18/Feb/20 21:53 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#issuecomment-587916297 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389122) Time Spent: 22h 10m (was: 22h) > Create a Spanner IO for Python > -- > > Key: BEAM-7246 > URL: https://issues.apache.org/jira/browse/BEAM-7246 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 22h 10m > Remaining Estimate: 0h > > Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only). > Testing in this work item will be in the form of DirectRunner tests and > manual testing. > Integration and performance tests are a separate work item (not included > here). > See https://beam.apache.org/documentation/io/built-in/. The goal is to add > Google Clound Spanner to the Database column for the Python/Batch row. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python
[ https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=389121&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389121 ] ASF GitHub Bot logged work on BEAM-7246: Author: ASF GitHub Bot Created on: 18/Feb/20 21:52 Start Date: 18/Feb/20 21:52 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10712: [BEAM-7246] Added Google Spanner Write Transform URL: https://github.com/apache/beam/pull/10712#issuecomment-587914900 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389121) Time Spent: 22h (was: 21h 50m) > Create a Spanner IO for Python > -- > > Key: BEAM-7246 > URL: https://issues.apache.org/jira/browse/BEAM-7246 > Project: Beam > Issue Type: Bug > Components: io-py-gcp >Reporter: Reuven Lax >Assignee: Shehzaad Nakhoda >Priority: Major > Time Spent: 22h > Remaining Estimate: 0h > > Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only). > Testing in this work item will be in the form of DirectRunner tests and > manual testing. > Integration and performance tests are a separate work item (not included > here). > See https://beam.apache.org/documentation/io/built-in/. The goal is to add > Google Clound Spanner to the Database column for the Python/Batch row. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=389119&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389119 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 18/Feb/20 21:47 Start Date: 18/Feb/20 21:47 Worklog Time Spent: 10m Work Description: angoenka commented on issue #10835: [BEAM-8575] Removed MAX_TIMESTAMP from testing data URL: https://github.com/apache/beam/pull/10835#issuecomment-587907253 Will wait for the tests to pass and merge This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389119) Time Spent: 54.5h (was: 54h 20m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 54.5h > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=389118&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389118 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 18/Feb/20 21:46 Start Date: 18/Feb/20 21:46 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10835: [BEAM-8575] Removed MAX_TIMESTAMP from testing data URL: https://github.com/apache/beam/pull/10835#issuecomment-587906045 Run PythonLint PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389118) Time Spent: 54h 20m (was: 54h 10m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 54h 20m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=389116&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389116 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 18/Feb/20 21:45 Start Date: 18/Feb/20 21:45 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10835: [BEAM-8575] Removed MAX_TIMESTAMP from testing data URL: https://github.com/apache/beam/pull/10835#issuecomment-587904955 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389116) Time Spent: 54h (was: 53h 50m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 54h > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=389117&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389117 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 18/Feb/20 21:45 Start Date: 18/Feb/20 21:45 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10835: [BEAM-8575] Removed MAX_TIMESTAMP from testing data URL: https://github.com/apache/beam/pull/10835#issuecomment-587905264 Run PythonLint PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389117) Time Spent: 54h 10m (was: 54h) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 54h 10m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389114&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389114 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 18/Feb/20 21:37 Start Date: 18/Feb/20 21:37 Worklog Time Spent: 10m Work Description: amoght commented on issue #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#issuecomment-587893601 @lukecwik I've incorporated mostly all the suggested changes in the PR. Please let me know your thoughts on this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389114) Time Spent: 10.5h (was: 10h 20m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 10.5h > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389111&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389111 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 18/Feb/20 21:33 Start Date: 18/Feb/20 21:33 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r380947952 ## File path: sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java ## @@ -738,7 +1069,133 @@ public void testGzipProgress() throws IOException { assertThat(readerOrig, instanceOf(CompressedReader.class)); CompressedReader reader = (CompressedReader) readerOrig; // before starting - assertEquals(0.0, reader.getFractionConsumed(), 1e-6); + assertEquals(0.0, reader.getFractionConsumed(), DELTA); + assertEquals(0, reader.getSplitPointsConsumed()); + assertEquals(1, reader.getSplitPointsRemaining()); + + // confirm has three records + for (int i = 0; i < numRecords; ++i) { +if (i == 0) { + assertTrue(reader.start()); +} else { + assertTrue(reader.advance()); +} +assertEquals(0, reader.getSplitPointsConsumed()); +assertEquals(1, reader.getSplitPointsRemaining()); + } + assertFalse(reader.advance()); + + // after reading empty source Review comment: For now we have added a comment warning users that a concatenated lzo file doesn't gets decompressed correctly. Its added above testFalseReadConcatenatedLzop and testFalseReadMultiStreamLzop methods. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389111) Time Spent: 10h 20m (was: 10h 10m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 10h 20m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=389107&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389107 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 18/Feb/20 21:25 Start Date: 18/Feb/20 21:25 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10835: [BEAM-8575] Removed MAX_TIMESTAMP from testing data URL: https://github.com/apache/beam/pull/10835#issuecomment-587875180 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389107) Time Spent: 53h 50m (was: 53h 40m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 53h 50m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=389106&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389106 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 18/Feb/20 21:25 Start Date: 18/Feb/20 21:25 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10835: [BEAM-8575] Removed MAX_TIMESTAMP from testing data URL: https://github.com/apache/beam/pull/10835#issuecomment-587875076 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389106) Time Spent: 53h 40m (was: 53.5h) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 53h 40m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=389105&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389105 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 18/Feb/20 21:24 Start Date: 18/Feb/20 21:24 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10835: [BEAM-8575] Removed MAX_TIMESTAMP from testing data URL: https://github.com/apache/beam/pull/10835#issuecomment-587874929 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389105) Time Spent: 53.5h (was: 53h 20m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 53.5h > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-1833) Restructure Python pipeline construction to better follow the Runner API
[ https://issues.apache.org/jira/browse/BEAM-1833?focusedWorklogId=389104&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389104 ] ASF GitHub Bot logged work on BEAM-1833: Author: ASF GitHub Bot Created on: 18/Feb/20 21:24 Start Date: 18/Feb/20 21:24 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #10860: [BEAM-1833] Fixes BEAM-1833 URL: https://github.com/apache/beam/pull/10860#discussion_r380943475 ## File path: sdks/python/apache_beam/pipeline.py ## @@ -314,7 +314,15 @@ def _replace_if_needed(self, original_transform_node): new_output.element_type = None Review comment: Added the assert This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389104) Time Spent: 1h 50m (was: 1h 40m) > Restructure Python pipeline construction to better follow the Runner API > > > Key: BEAM-1833 > URL: https://issues.apache.org/jira/browse/BEAM-1833 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Sam Rohde >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > The most important part is removing the runner.apply overrides, but there are > also various other improvements (e.g. all inputs and outputs should be named). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8916) external_test_it.py is not collected by pytest
[ https://issues.apache.org/jira/browse/BEAM-8916?focusedWorklogId=389103&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389103 ] ASF GitHub Bot logged work on BEAM-8916: Author: ASF GitHub Bot Created on: 18/Feb/20 21:23 Start Date: 18/Feb/20 21:23 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10749: [BEAM-8916] Rename external_test_it so that it is picked up by pytest URL: https://github.com/apache/beam/pull/10749#issuecomment-587872991 Run Python2_PVR_Flink PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389103) Time Spent: 1h 20m (was: 1h 10m) > external_test_it.py is not collected by pytest > -- > > Key: BEAM-8916 > URL: https://issues.apache.org/jira/browse/BEAM-8916 > Project: Beam > Issue Type: Bug > Components: sdk-py-core, testing >Reporter: Udi Meiri >Assignee: Chamikara Madhusanka Jayalath >Priority: Critical > Time Spent: 1h 20m > Remaining Estimate: 0h > > pytest only collects tests matching these patterns: > https://github.com/apache/beam/blob/8066d78f0fd2237b718859d4a776511203880df0/sdks/python/pytest.ini#L27 > Please rename the file. (ex: external_integration_test.py) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=389102&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389102 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 18/Feb/20 21:23 Start Date: 18/Feb/20 21:23 Worklog Time Spent: 10m Work Description: bumblebee-coming commented on issue #10835: [BEAM-8575] Removed MAX_TIMESTAMP from testing data URL: https://github.com/apache/beam/pull/10835#issuecomment-587872503 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389102) Time Spent: 53h 20m (was: 53h 10m) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 53h 20m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests
[ https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=389101&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389101 ] ASF GitHub Bot logged work on BEAM-8575: Author: ASF GitHub Bot Created on: 18/Feb/20 21:19 Start Date: 18/Feb/20 21:19 Worklog Time Spent: 10m Work Description: bumblebee-coming commented on issue #10835: [BEAM-8575] Removed MAX_TIMESTAMP from testing data URL: https://github.com/apache/beam/pull/10835#issuecomment-587866982 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389101) Time Spent: 53h 10m (was: 53h) > Add more Python validates runner tests > -- > > Key: BEAM-8575 > URL: https://issues.apache.org/jira/browse/BEAM-8575 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: wendy liu >Assignee: wendy liu >Priority: Major > Time Spent: 53h 10m > Remaining Estimate: 0h > > This is the umbrella issue to track the work of adding more Python tests to > improve test coverage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality
[ https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=389100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389100 ] ASF GitHub Bot logged work on BEAM-9146: Author: ASF GitHub Bot Created on: 18/Feb/20 21:17 Start Date: 18/Feb/20 21:17 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10764: [BEAM-9146] Integrate GCP Video Intelligence functionality for Python SDK URL: https://github.com/apache/beam/pull/10764#issuecomment-587865079 :sdks:python:test-suites:tox:pycommon:docs task is failing -- There might be pydocs issues in the change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389100) Time Spent: 12h 20m (was: 12h 10m) > [Python] PTransform that integrates Video Intelligence functionality > > > Key: BEAM-9146 > URL: https://issues.apache.org/jira/browse/BEAM-9146 > Project: Beam > Issue Type: Sub-task > Components: io-py-gcp >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 12h 20m > Remaining Estimate: 0h > > The goal is to create a PTransform that integrates Google Cloud Video > Intelligence functionality [1]. > The transform should be able to take both video GCS location or video data > bytes as an input. > [1] https://cloud.google.com/video-intelligence/ -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.
[ https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=389099&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389099 ] ASF GitHub Bot logged work on BEAM-6522: Author: ASF GitHub Bot Created on: 18/Feb/20 21:15 Start Date: 18/Feb/20 21:15 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #10876: [BEAM-6522] Exclude the test that fails on earlier versions of Avro dependency that are still within allowed version range. URL: https://github.com/apache/beam/pull/10876#issuecomment-587861035 R: @TheNeuralBit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389099) Time Spent: 11h 20m (was: 11h 10m) > Dill fails to pickle avro.RecordSchema classes on Python 3. > > > Key: BEAM-6522 > URL: https://issues.apache.org/jira/browse/BEAM-6522 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Robbe >Assignee: Valentyn Tymofieiev >Priority: Major > Fix For: 2.16.0 > > Time Spent: 11h 20m > Remaining Estimate: 0h > > The avroio module still has 4 failing tests. This is actually 2 times the > same 2 tests, both for Avro and Fastavro. > *apache_beam.io.avroio_test.TestAvro.test_sink_transform* > *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform* > fail with: > {code:java} > Traceback (most recent call last): > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", > line 432, in test_sink_transform > | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line > 528, in expand > return pcoll | beam.io.iobase.Write(self._sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line > 960, in expand > return pcoll | WriteImpl(self.sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line > 112, in __or__ > return self.pipeline.apply(ptransform, self) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line > 515, in apply > pvalueish_result = self.runner.apply(transform, pvalueish, self._options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 193, in apply > return m(transform, input, options) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", > line 199, in apply_PTransform > return transform.expand(input) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line > 979, in expand > lambda _, sink: sink.initialize_write(), self.sink) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 1103, in Map > pardo = FlatMap(wrapper, *args, **kwargs) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 1054, in FlatMap > pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs) > File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", > line 864, in __init__ > super(ParDo, self).__init__(fn, *args, **kwargs) > File > "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py", > line 646, in __init__ > self.args = pickler.loads(pickler.dumps(self.args)) > File
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389097&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389097 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 18/Feb/20 21:14 Start Date: 18/Feb/20 21:14 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r380938634 ## File path: sdks/java/core/build.gradle ## @@ -91,4 +95,6 @@ dependencies { shadowTest library.java.avro_tests shadowTest library.java.zstd_jni testRuntimeOnly library.java.slf4j_jdk14 + compileOnly 'io.airlift:aircompressor:0.16' Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389097) Time Spent: 10h (was: 9h 50m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 10h > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7304) Twister2 Beam runner
[ https://issues.apache.org/jira/browse/BEAM-7304?focusedWorklogId=389095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389095 ] ASF GitHub Bot logged work on BEAM-7304: Author: ASF GitHub Bot Created on: 18/Feb/20 21:14 Start Date: 18/Feb/20 21:14 Worklog Time Spent: 10m Work Description: pulasthi commented on issue #10888: [BEAM-7304] Twister2 Beam runner URL: https://github.com/apache/beam/pull/10888#issuecomment-587859343 It looks like its retesting ignoring the last commit which fixes some of the earlier check failures This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389095) Time Spent: 3h 10m (was: 3h) > Twister2 Beam runner > > > Key: BEAM-7304 > URL: https://issues.apache.org/jira/browse/BEAM-7304 > Project: Beam > Issue Type: New Feature > Components: runner-ideas >Reporter: Pulasthi Wickramasinghe >Assignee: Pulasthi Wickramasinghe >Priority: Minor > Time Spent: 3h 10m > Remaining Estimate: 0h > > Twister2 is a big data framework which supports both batch and stream > processing [1] [2]. The goal is to develop an beam runner for Twister2. > [1] [https://github.com/DSC-SPIDAL/twister2] > [2] [https://twister2.org/] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389096&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389096 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 18/Feb/20 21:14 Start Date: 18/Feb/20 21:14 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r380938566 ## File path: sdks/java/core/build.gradle ## @@ -91,4 +95,6 @@ dependencies { shadowTest library.java.avro_tests shadowTest library.java.zstd_jni testRuntimeOnly library.java.slf4j_jdk14 + compileOnly 'io.airlift:aircompressor:0.16' + compileOnly 'com.facebook.presto.hadoop:hadoop-apache2:3.2.0-1' Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389096) Time Spent: 9h 50m (was: 9h 40m) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 9h 50m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389098&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389098 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 18/Feb/20 21:14 Start Date: 18/Feb/20 21:14 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r380938720 ## File path: sdks/java/core/build.gradle ## @@ -58,6 +58,10 @@ test { } } +configurations { +testCompile.extendsFrom compileOnly +} + Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389098) Time Spent: 10h 10m (was: 10h) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 10h 10m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389094&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389094 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 18/Feb/20 21:14 Start Date: 18/Feb/20 21:14 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r380938458 ## File path: sdks/java/core/build.gradle ## @@ -91,4 +95,6 @@ dependencies { shadowTest library.java.avro_tests shadowTest library.java.zstd_jni testRuntimeOnly library.java.slf4j_jdk14 + compileOnly 'io.airlift:aircompressor:0.16' Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389094) Time Spent: 9h 40m (was: 9.5h) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 9h 40m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support
[ https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389090&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389090 ] ASF GitHub Bot logged work on BEAM-8564: Author: ASF GitHub Bot Created on: 18/Feb/20 21:13 Start Date: 18/Feb/20 21:13 Worklog Time Spent: 10m Work Description: amoght commented on pull request #10254: [BEAM-8564] Add LZO compression and decompression support URL: https://github.com/apache/beam/pull/10254#discussion_r380938051 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/util/LzoCompressorInputStream.java ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.sdk.util; + +import io.airlift.compress.lzo.LzoCodec; +import java.io.IOException; +import java.io.InputStream; +import org.apache.commons.compress.compressors.CompressorInputStream; +import org.apache.commons.compress.utils.CountingInputStream; +import org.apache.commons.compress.utils.IOUtils; +import org.apache.commons.compress.utils.InputStreamStatistics; + +/** + * {@link CompressorInputStream} implementation to create LZO encoded stream. Library relies on https://github.com/airlift/aircompressor/";>LZO + * + * @since 1.18 + */ +public class LzoCompressorInputStream extends CompressorInputStream Review comment: replaced wrapper classes with static methods This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 389090) Time Spent: 9h 10m (was: 9h) > Add LZO compression and decompression support > - > > Key: BEAM-8564 > URL: https://issues.apache.org/jira/browse/BEAM-8564 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Amogh Tiwari >Assignee: Amogh Tiwari >Priority: Minor > Time Spent: 9h 10m > Remaining Estimate: 0h > > LZO is a lossless data compression algorithm which is focused on compression > and decompression speeds. > This will enable Apache Beam sdk to compress/decompress files using LZO > compression algorithm. > This will include the following functionalities: > # compress() : for compressing files into an LZO archive > # decompress() : for decompressing files archived using LZO compression > Appropriate Input and Output stream will also be added to enable working with > LZO files. -- This message was sent by Atlassian Jira (v8.3.4#803005)