[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389339&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389339
 ]

ASF GitHub Bot logged work on BEAM-9286:


Author: ASF GitHub Bot
Created on: 19/Feb/20 06:41
Start Date: 19/Feb/20 06:41
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #10823: [BEAM-9286] Create 
validation runner test for metrics (user counter). 
URL: https://github.com/apache/beam/pull/10823#issuecomment-588058804
 
 
   > I'm surprised we don't already have a test for such a basic feature.
   > 
   > FYI, you can use yapf to format your code: 
https://cwiki.apache.org/confluence/display/BEAM/Python+Tips
   
   Thanks Robert.  I was only able to find a much larger test that covers user 
counters, where counter verification is just a small part of the whole thing. 
With a dedicated, self-contained test for metrics, we can also keep adding other 
metrics-related checks here. 
   
   PR updated.  Also applied yapf.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389339)
Time Spent: 2h 10m  (was: 2h)

> Create validation tests for metrics based on MonitoringInfo if applicable
> -
>
> Key: BEAM-9286
> URL: https://issues.apache.org/jira/browse/BEAM-9286
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Create dedicated validation runner tests for metrics (those based on 
> MonitoringInfo). 
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389338&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389338
 ]

ASF GitHub Bot logged work on BEAM-9286:


Author: ASF GitHub Bot
Created on: 19/Feb/20 06:41
Start Date: 19/Feb/20 06:41
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #10823: [BEAM-9286] Create 
validation runner test for metrics (user counter). 
URL: https://github.com/apache/beam/pull/10823#issuecomment-588058804
 
 
   > I'm surprised we don't already have a test for such a basic feature.
   > 
   > FYI, you can use yapf to format your code: 
https://cwiki.apache.org/confluence/display/BEAM/Python+Tips
   
   Thanks Robert.  I was only able to find a much larger test that covers user 
counters, where counter verification is just a small part of the whole thing. 
With a dedicated, self-contained test for metrics, we can also keep adding other 
metrics-related checks here. 
   
   PR updated.  Also applied yapf.
 



Issue Time Tracking
---

Worklog Id: (was: 389338)
Time Spent: 2h  (was: 1h 50m)

> Create validation tests for metrics based on MonitoringInfo if applicable
> -
>
> Key: BEAM-9286
> URL: https://issues.apache.org/jira/browse/BEAM-9286
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Create dedicated validation runner tests for metrics (those based on 
> MonitoringInfo). 
>  
>  





[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389337&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389337
 ]

ASF GitHub Bot logged work on BEAM-9286:


Author: ASF GitHub Bot
Created on: 19/Feb/20 06:32
Start Date: 19/Feb/20 06:32
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #10823: [BEAM-9286] Create 
validation runner test for metrics (user counter). 
URL: https://github.com/apache/beam/pull/10823#issuecomment-588058804
 
 
   > I'm surprised we don't already have a test for such a basic feature.
   > 
   > FYI, you can use yapf to format your code: 
https://cwiki.apache.org/confluence/display/BEAM/Python+Tips
   
   Thanks Robert.  I was able to find larger tests that cover user counters, but 
no dedicated validation runner test for metrics. 
   
   PR updated.  Also applied yapf.
 



Issue Time Tracking
---

Worklog Id: (was: 389337)
Time Spent: 1h 50m  (was: 1h 40m)

> Create validation tests for metrics based on MonitoringInfo if applicable
> -
>
> Key: BEAM-9286
> URL: https://issues.apache.org/jira/browse/BEAM-9286
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Create dedicated validation runner tests for metrics (those based on 
> MonitoringInfo). 
>  
>  





[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389335&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389335
 ]

ASF GitHub Bot logged work on BEAM-9286:


Author: ASF GitHub Bot
Created on: 19/Feb/20 06:32
Start Date: 19/Feb/20 06:32
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on issue #10823: [BEAM-9286] Create 
validation runner test for metrics (user counter). 
URL: https://github.com/apache/beam/pull/10823#issuecomment-588058804
 
 
   > I'm surprised we don't already have a test for such a basic feature.
   > 
   > FYI, you can use yapf to format your code: 
https://cwiki.apache.org/confluence/display/BEAM/Python+Tips
   
   Thanks Robert.  I was able to find larger tests that cover user counters, but 
no dedicated validation runner test for metrics. 
   
   PR updated. 
 



Issue Time Tracking
---

Worklog Id: (was: 389335)
Time Spent: 1h 40m  (was: 1.5h)

> Create validation tests for metrics based on MonitoringInfo if applicable
> -
>
> Key: BEAM-9286
> URL: https://issues.apache.org/jira/browse/BEAM-9286
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Create dedicated validation runner tests for metrics (those based on 
> MonitoringInfo). 
>  
>  





[jira] [Work logged] (BEAM-9276) python: create a class to encapsulate the work required to submit a pipeline to a job service

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9276?focusedWorklogId=389334&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389334
 ]

ASF GitHub Bot logged work on BEAM-9276:


Author: ASF GitHub Bot
Created on: 19/Feb/20 06:16
Start Date: 19/Feb/20 06:16
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #10811: [BEAM-9276] Create a 
class to encapsulate the steps required to submit a pipeline
URL: https://github.com/apache/beam/pull/10811#issuecomment-588054888
 
 
   Any thoughts on this?
   
 



Issue Time Tracking
---

Worklog Id: (was: 389334)
Time Spent: 40m  (was: 0.5h)

> python: create a class to encapsulate the work required to submit a pipeline 
> to a job service
> -
>
> Key: BEAM-9276
> URL: https://issues.apache.org/jira/browse/BEAM-9276
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> {{PortableRunner.run_pipeline}} is a somewhat monolithic method for 
> submitting a pipeline.  It would be useful to factor out the code responsible 
> for interacting with the job and artifact services (prepare, stage, run) to 
> make it easier to modify this behavior in portable runner subclasses, as 
> well as in tests. 
>  
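
The refactoring described above can be pictured with a minimal pure-Python
sketch. Every name in it (FakeJobService, JobSubmission, NoStagingSubmission,
the "prep-1" and "token-" values) is hypothetical and chosen only for
illustration; none of it is Beam's actual API or the code in the pull request.
The idea is one overridable method per step (prepare, stage, run), so a
subclass or a test can replace a single step.

```python
from typing import List


class FakeJobService:
    """Hypothetical in-memory stand-in for a job/artifact service (not a Beam API)."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def prepare(self, pipeline: str) -> str:
        self.calls.append("prepare")
        return "prep-1"  # made-up preparation id

    def stage(self, preparation_id: str, artifacts: List[str]) -> str:
        self.calls.append("stage")
        return "token-" + preparation_id  # made-up retrieval token

    def run(self, preparation_id: str, token: str) -> str:
        self.calls.append("run")
        return "job-" + preparation_id  # made-up job id


class JobSubmission:
    """Hypothetical helper that exposes each submission step as its own method."""

    def __init__(self, service: FakeJobService, artifacts: List[str]) -> None:
        self.service = service
        self.artifacts = artifacts

    def prepare(self, pipeline: str) -> str:
        return self.service.prepare(pipeline)

    def stage(self, preparation_id: str) -> str:
        return self.service.stage(preparation_id, self.artifacts)

    def run(self, preparation_id: str, token: str) -> str:
        return self.service.run(preparation_id, token)

    def submit(self, pipeline: str) -> str:
        # The former monolithic method becomes a short sequence of steps.
        preparation_id = self.prepare(pipeline)
        token = self.stage(preparation_id)
        return self.run(preparation_id, token)


class NoStagingSubmission(JobSubmission):
    """A subclass (or a test double) overrides just the step it cares about."""

    def stage(self, preparation_id: str) -> str:
        return ""  # skip artifact staging entirely
```

With this shape, a test can assert on the order of service calls, and a runner
subclass can change staging without copying the rest of the submission logic.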





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=389333&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389333
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 19/Feb/20 06:15
Start Date: 19/Feb/20 06:15
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #10822: [BEAM-7746] Minor 
typing updates / fixes
URL: https://github.com/apache/beam/pull/10822#issuecomment-584807444
 
 
   
   R: @udim 
   
 



Issue Time Tracking
---

Worklog Id: (was: 389333)
Time Spent: 65h 10m  (was: 65h)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 65h 10m
>  Remaining Estimate: 0h
>
> As a developer of the Beam source code, I would like the code to use PEP 484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  
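
As a small illustration of what the request above asks for, here is a sketch
of PEP 484 annotations on two made-up helper functions. The functions
(count_words, most_common) are hypothetical and not taken from the Beam code
base; they only show how annotated signatures document the expected types.

```python
from typing import Dict, Iterable, Optional


def count_words(lines: Iterable[str], min_length: int = 1) -> Dict[str, int]:
    """Count words of at least min_length characters across all lines."""
    counts: Dict[str, int] = {}
    for line in lines:
        for word in line.split():
            if len(word) >= min_length:
                counts[word] = counts.get(word, 0) + 1
    return counts


def most_common(counts: Dict[str, int]) -> Optional[str]:
    """Return the word with the highest count, or None for an empty mapping."""
    if not counts:
        return None
    return max(counts, key=lambda word: counts[word])


# A static analyzer such as mypy would reject a call like count_words(42),
# because an int is not an Iterable[str]; at runtime the hints are not enforced.
```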





[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389330&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389330
 ]

ASF GitHub Bot logged work on BEAM-9286:


Author: ASF GitHub Bot
Created on: 19/Feb/20 06:14
Start Date: 19/Feb/20 06:14
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on pull request #10823: [BEAM-9286] 
Create validation runner test for metrics (user counter). 
URL: https://github.com/apache/beam/pull/10823#discussion_r381094747
 
 

 ##
 File path: sdks/python/apache_beam/metrics/metric_test.py
 ##
 @@ -127,6 +134,35 @@ def test_distribution_empty_namespace(self):
      with self.assertRaises(ValueError):
        Metrics.distribution("", "names")
  
 +  @attr('ValidatesRunner')
 +  def test_user_counter_using_pardo(self):
 +    class SomeDoFn(beam.DoFn):
 +      """A custom dummy DoFn using yield."""
 +      def __init__(self):
 +        self.user_counter_elements = metrics.Metrics.counter(
 +            self.__class__, 'metrics_user_counter_element')
 +
 +      def process(self, element):
 +        self.user_counter_elements.inc()
 +        yield element
 +
 +    pipeline = TestPipeline()
 +    nums = pipeline | 'Input' >> beam.Create([1, 2, 3, 4])
 +    results = nums | 'ApplyPardo' >> beam.ParDo(SomeDoFn())
 +
 +    res = pipeline.run()
 +    res.wait_until_finish()
 +    metric_results = (
 +        res.metrics().query(MetricsFilter()
 +                            .with_name('metrics_user_counter_element')))
 +    outputs_counter = metric_results['counters'][0]
 +    assert_that(results, equal_to([1, 2, 3, 4]))
 
 Review comment:
   done.
 



Issue Time Tracking
---

Worklog Id: (was: 389330)
Time Spent: 1h 20m  (was: 1h 10m)

> Create validation tests for metrics based on MonitoringInfo if applicable
> -
>
> Key: BEAM-9286
> URL: https://issues.apache.org/jira/browse/BEAM-9286
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Create dedicated validation runner tests for metrics (those based on 
> MonitoringInfo). 
>  
>  





[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389331&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389331
 ]

ASF GitHub Bot logged work on BEAM-9286:


Author: ASF GitHub Bot
Created on: 19/Feb/20 06:14
Start Date: 19/Feb/20 06:14
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on pull request #10823: [BEAM-9286] 
Create validation runner test for metrics (user counter). 
URL: https://github.com/apache/beam/pull/10823#discussion_r381094747
 
 

 ##
 File path: sdks/python/apache_beam/metrics/metric_test.py
 ##
 @@ -127,6 +134,35 @@ def test_distribution_empty_namespace(self):
      with self.assertRaises(ValueError):
        Metrics.distribution("", "names")
  
 +  @attr('ValidatesRunner')
 +  def test_user_counter_using_pardo(self):
 +    class SomeDoFn(beam.DoFn):
 +      """A custom dummy DoFn using yield."""
 +      def __init__(self):
 +        self.user_counter_elements = metrics.Metrics.counter(
 +            self.__class__, 'metrics_user_counter_element')
 +
 +      def process(self, element):
 +        self.user_counter_elements.inc()
 +        yield element
 +
 +    pipeline = TestPipeline()
 +    nums = pipeline | 'Input' >> beam.Create([1, 2, 3, 4])
 +    results = nums | 'ApplyPardo' >> beam.ParDo(SomeDoFn())
 +
 +    res = pipeline.run()
 +    res.wait_until_finish()
 +    metric_results = (
 +        res.metrics().query(MetricsFilter()
 +                            .with_name('metrics_user_counter_element')))
 +    outputs_counter = metric_results['counters'][0]
 +    assert_that(results, equal_to([1, 2, 3, 4]))
 
 Review comment:
   done.
 



Issue Time Tracking
---

Worklog Id: (was: 389331)
Time Spent: 1.5h  (was: 1h 20m)

> Create validation tests for metrics based on MonitoringInfo if applicable
> -
>
> Key: BEAM-9286
> URL: https://issues.apache.org/jira/browse/BEAM-9286
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Create dedicated validation runner tests for metrics (those based on 
> MonitoringInfo). 
>  
>  





[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389329&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389329
 ]

ASF GitHub Bot logged work on BEAM-9286:


Author: ASF GitHub Bot
Created on: 19/Feb/20 06:14
Start Date: 19/Feb/20 06:14
Worklog Time Spent: 10m 
  Work Description: HuangLED commented on pull request #10823: [BEAM-9286] 
Create validation runner test for metrics (user counter). 
URL: https://github.com/apache/beam/pull/10823#discussion_r381094727
 
 

 ##
 File path: sdks/python/apache_beam/metrics/metric_test.py
 ##
 @@ -127,6 +134,35 @@ def test_distribution_empty_namespace(self):
      with self.assertRaises(ValueError):
        Metrics.distribution("", "names")
  
 +  @attr('ValidatesRunner')
 +  def test_user_counter_using_pardo(self):
 +    class SomeDoFn(beam.DoFn):
 +      """A custom dummy DoFn using yield."""
 +      def __init__(self):
 +        self.user_counter_elements = metrics.Metrics.counter(
 +            self.__class__, 'metrics_user_counter_element')
 +
 +      def process(self, element):
 +        self.user_counter_elements.inc()
 +        yield element
 +
 +    pipeline = TestPipeline()
 +    nums = pipeline | 'Input' >> beam.Create([1, 2, 3, 4])
 +    results = nums | 'ApplyPardo' >> beam.ParDo(SomeDoFn())
 +
 +    res = pipeline.run()
 +    res.wait_until_finish()
 +    metric_results = (
 +        res.metrics().query(MetricsFilter()
 +                            .with_name('metrics_user_counter_element')))
 +    outputs_counter = metric_results['counters'][0]
 +    assert_that(results, equal_to([1, 2, 3, 4]))
 +
 +    self.assertEqual(outputs_counter.key.metric.name,
 +                     'metrics_user_counter_element')
 +    self.assertEqual(outputs_counter.committed, 4)
 +    self.assertEqual(outputs_counter.attempted, 4)
 
 Review comment:
   removed the assert on attempted. 
 



Issue Time Tracking
---

Worklog Id: (was: 389329)
Time Spent: 1h 10m  (was: 1h)

> Create validation tests for metrics based on MonitoringInfo if applicable
> -
>
> Key: BEAM-9286
> URL: https://issues.apache.org/jira/browse/BEAM-9286
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Create dedicated validation runner tests for metrics (those based on 
> MonitoringInfo). 
>  
>  





[jira] [Work logged] (BEAM-7746) Add type hints to python code

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=389332&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389332
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 19/Feb/20 06:14
Start Date: 19/Feb/20 06:14
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #10822: [BEAM-7746] Minor 
typing updates / fixes
URL: https://github.com/apache/beam/pull/10822#issuecomment-588054472
 
 
   R: @robertwb 
 



Issue Time Tracking
---

Worklog Id: (was: 389332)
Time Spent: 65h  (was: 64h 50m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 65h
>  Remaining Estimate: 0h
>
> As a developer of the Beam source code, I would like the code to use PEP 484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9274) Support running yapf in a git pre-commit hook

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9274?focusedWorklogId=389328&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389328
 ]

ASF GitHub Bot logged work on BEAM-9274:


Author: ASF GitHub Bot
Created on: 19/Feb/20 06:12
Start Date: 19/Feb/20 06:12
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #10810: [BEAM-9274] Support 
running yapf in a git pre-commit hook
URL: https://github.com/apache/beam/pull/10810#issuecomment-588053806
 
 
   How are people feeling about this MR?   There's one outstanding review note 
for a comment in tox.ini.  After that, are we OK with merging this?
   
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 389328)
Time Spent: 1h 10m  (was: 1h)

> Support running yapf in a git pre-commit hook
> -
>
> Key: BEAM-9274
> URL: https://issues.apache.org/jira/browse/BEAM-9274
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> As a developer, I want to be able to automatically run yapf before I make a 
> commit so that I don't waste time with failures on Jenkins. 
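
One way to picture such a hook is the sketch below: a small Python script that
collects the staged .py files and asks yapf for a diff. This is an assumption
about how a hook could look, not necessarily the approach taken in the pull
request; the function names (python_files, yapf_command, staged_paths, main)
are made up for illustration.

```python
import subprocess
from typing import List, Sequence


def python_files(paths: Sequence[str]) -> List[str]:
    """Keep only the paths that look like Python sources."""
    return [path for path in paths if path.endswith(".py")]


def yapf_command(files: Sequence[str]) -> List[str]:
    """Build a yapf call; --diff prints proposed changes without editing files."""
    return ["yapf", "--diff", *files]


def staged_paths() -> List[str]:
    """List files that are added, copied, or modified in the git index."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        check=True, capture_output=True, text=True)
    return result.stdout.splitlines()


def main() -> int:
    """Return 0 when staged Python files are already formatted, 1 otherwise."""
    files = python_files(staged_paths())
    if not files:
        return 0
    result = subprocess.run(
        yapf_command(files), capture_output=True, text=True)
    if result.stdout.strip():
        # A non-empty diff means yapf would reformat at least one file.
        print(result.stdout)
        print("yapf would reformat the files above; run yapf and re-stage.")
        return 1
    return 0

# An actual hook script would end by calling sys.exit(main()).
```

Saved as an executable .git/hooks/pre-commit file (with the final
sys.exit(main()) call added), a script of this shape would block a commit
until the staged files match yapf's output.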





[jira] [Work logged] (BEAM-8979) protoc-gen-mypy: program not found or is not executable

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8979?focusedWorklogId=389327&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389327
 ]

ASF GitHub Bot logged work on BEAM-8979:


Author: ASF GitHub Bot
Created on: 19/Feb/20 06:11
Start Date: 19/Feb/20 06:11
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #10734: [BEAM-8979] 
reintroduce mypy-protobuf stub generation
URL: https://github.com/apache/beam/pull/10734#issuecomment-588053537
 
 
   The Python test has been stuck for days.
   
   Btw, I've submitted my CLAs to the secretary, so hopefully I'll be able to 
solve these problems on my own soon. 
 



Issue Time Tracking
---

Worklog Id: (was: 389327)
Time Spent: 10h 20m  (was: 10h 10m)

> protoc-gen-mypy: program not found or is not executable
> ---
>
> Key: BEAM-8979
> URL: https://issues.apache.org/jira/browse/BEAM-8979
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Kamil Wasilewski
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> In some tests, `:sdks:python:sdist:` task fails due to problems in finding 
> protoc-gen-mypy. The following tests are affected (there might be more):
>  * [https://builds.apache.org/job/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/]
>  * [https://builds.apache.org/job/beam_BiqQueryIO_Write_Performance_Test_Python_Batch/]
> Relevant logs:
> {code:java}
> 10:46:32 > Task :sdks:python:sdist FAILED
> 10:46:32 Requirement already satisfied: mypy-protobuf==1.12 in 
> /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages
>  (1.12)
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/descriptor.proto 
> but not used.
> 10:46:32 beam_fn_api.proto: warning: Import google/protobuf/wrappers.proto 
> but not used.
> 10:46:32 protoc-gen-mypy: program not found or is not executable
> 10:46:32 --mypy_out: protoc-gen-mypy: Plugin failed with status code 1.
> 10:46:32 
> /home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/dist.py:476:
>  UserWarning: Normalizing '2.19.0.dev' to '2.19.0.dev0'
> 10:46:32   normalized_version,
> 10:46:32 Traceback (most recent call last):
> 10:46:32   File "setup.py", line 295, in 
> 10:46:32 'mypy': generate_protos_first(mypy),
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/__init__.py",
>  line 145, in setup
> 10:46:32 return distutils.core.setup(**attrs)
> 10:46:32   File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
> 10:46:32 dist.run_commands()
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 966, in 
> run_commands
> 10:46:32 self.run_command(cmd)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in 
> run_command
> 10:46:32 cmd_obj.run()
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/build/gradleenv/192237/lib/python3.7/site-packages/setuptools/command/sdist.py",
>  line 44, in run
> 10:46:32 self.run_command('egg_info')
> 10:46:32   File "/usr/lib/python3.7/distutils/cmd.py", line 313, in 
> run_command
> 10:46:32 self.distribution.run_command(command)
> 10:46:32   File "/usr/lib/python3.7/distutils/dist.py", line 985, in 
> run_command
> 10:46:32 cmd_obj.run()
> 10:46:32   File "setup.py", line 220, in run
> 10:46:32 gen_protos.generate_proto_files(log=log)
> 10:46:32   File 
> "/home/jenkins/jenkins-slave/workspace/beam_LoadTests_Python_37_ParDo_Dataflow_Batch_PR/src/sdks/python/gen_protos.py",
>  line 144, in generate_proto_files
> 10:46:32 '%s' % ret_code)
> 10:46:32 RuntimeError: Protoc returned non-zero status (see logs for 
> details): 1
> {code}
>  
> This is what I have tried so far to resolve this (without success):
>  * Including the _--plugin=protoc-gen-mypy=\{abs_path_to_executable}_ parameter 
> in the _protoc_ call in gen_protos.py:131
>  * Appending protoc-gen-mypy's directory to the PATH variable
> I wasn't able to reproduce th
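
The two attempts listed in the report can be sketched as follows. The report
says neither one resolved the failure, so this only illustrates what was
tried; it is not a fix. The helper names (find_plugin, protoc_args) and the
--python_out argument are assumptions made for this example and are not taken
from gen_protos.py.

```python
import shutil
from typing import List, Optional, Sequence


def find_plugin(name: str = "protoc-gen-mypy") -> Optional[str]:
    """Look the plugin up on PATH, the way protoc itself would need to."""
    return shutil.which(name)


def protoc_args(proto_files: Sequence[str],
                out_dir: str,
                plugin_path: Optional[str] = None) -> List[str]:
    """Build a protoc command line that also requests mypy stubs."""
    args = ["protoc", f"--python_out={out_dir}", f"--mypy_out={out_dir}"]
    if plugin_path:
        # Point protoc at the plugin explicitly instead of relying on PATH.
        args.append(f"--plugin=protoc-gen-mypy={plugin_path}")
    return args + list(proto_files)
```

Checking find_plugin() before invoking protoc at least turns the
"program not found or is not executable" message into an earlier, clearer
error.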

[jira] [Work logged] (BEAM-8889) Make GcsUtil use GoogleCloudStorage

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8889?focusedWorklogId=389318&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389318
 ]

ASF GitHub Bot logged work on BEAM-8889:


Author: ASF GitHub Bot
Created on: 19/Feb/20 05:42
Start Date: 19/Feb/20 05:42
Worklog Time Spent: 10m 
  Work Description: veblush commented on issue #10769: [BEAM-8889] Upgrades 
gcsio to 2.0.0
URL: https://github.com/apache/beam/pull/10769#issuecomment-588046268
 
 
   It seems that `Java11`, `JavaPortabilityApiJava11`, and  
`Java_Examples_Dataflow_Java11` are stuck?
 



Issue Time Tracking
---

Worklog Id: (was: 389318)
Remaining Estimate: 148h 50m  (was: 149h)
Time Spent: 19h 10m  (was: 19h)

> Make GcsUtil use GoogleCloudStorage
> ---
>
> Key: BEAM-8889
> URL: https://issues.apache.org/jira/browse/BEAM-8889
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.16.0
>Reporter: Esun Kim
>Assignee: VASU NORI
>Priority: Major
>  Labels: gcs
>   Original Estimate: 168h
>  Time Spent: 19h 10m
>  Remaining Estimate: 148h 50m
>
> [GcsUtil|https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/GcsUtil.java]
>  is a primary class for accessing Google Cloud Storage in Apache Beam. The current 
> implementation creates GoogleCloudStorageReadChannel and 
> GoogleCloudStorageWriteChannel directly to read and write GCS data, rather 
> than using 
> [GoogleCloudStorage|https://github.com/GoogleCloudPlatform/bigdata-interop/blob/master/gcsio/src/main/java/com/google/cloud/hadoop/gcsio/GoogleCloudStorage.java],
>  an abstract class that provides basic IO capability and eventually 
> creates the channel objects. This request is about updating GcsUtil to use 
> GoogleCloudStorage to create read and write channels, which is expected to be 
> more flexible because it can pick up new changes easily, e.g. a new channel 
> implementation using a new protocol, without code changes.





[jira] [Work logged] (BEAM-9240) Check for Nullability in typesEqual() method of FieldType class

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9240?focusedWorklogId=389310&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389310
 ]

ASF GitHub Bot logged work on BEAM-9240:


Author: ASF GitHub Bot
Created on: 19/Feb/20 05:08
Start Date: 19/Feb/20 05:08
Worklog Time Spent: 10m 
  Work Description: rahul8383 commented on issue #10744: [BEAM-9240]: Check 
for Nullability in typesEqual() method of FieldTyp…
URL: https://github.com/apache/beam/pull/10744#issuecomment-588038492
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 389310)
Time Spent: 1.5h  (was: 1h 20m)

> Check for Nullability in typesEqual() method of FieldType class
> ---
>
> Key: BEAM-9240
> URL: https://issues.apache.org/jira/browse/BEAM-9240
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.18.0
>Reporter: Rahul Patwari
>Assignee: Rahul Patwari
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> {{If two schemas are created like this:}}
> {{Schema schema1 = Schema.builder().addStringField("col1").build();}}
>  {{Schema schema2 = Schema.builder().addNullableField("col1", 
> FieldType.STRING).build();}}
>  
> {{schema1.typesEqual(schema2) returns "true" even though the schemas differ 
> by nullability}}
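
The reported behavior can be modeled in a few lines of plain Python. This is
a language-neutral illustration only: the FieldType and Schema classes below
are hypothetical stand-ins, not Beam's Java classes, and the two comparison
functions are assumptions about what "ignoring" versus "checking" nullability
means here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FieldType:
    """Hypothetical field type: a type name plus a nullability flag."""
    type_name: str
    nullable: bool = False


@dataclass(frozen=True)
class Schema:
    """Hypothetical schema: an ordered tuple of (field name, field type)."""
    fields: Tuple[Tuple[str, FieldType], ...]


def types_equal_ignoring_nullability(a: Schema, b: Schema) -> bool:
    """Models the reported bug: only field names and type names are compared."""
    return ([(name, ftype.type_name) for name, ftype in a.fields] ==
            [(name, ftype.type_name) for name, ftype in b.fields])


def types_equal(a: Schema, b: Schema) -> bool:
    """Models the requested behavior: nullability takes part in the comparison."""
    return a.fields == b.fields


schema1 = Schema((("col1", FieldType("STRING")),))
schema2 = Schema((("col1", FieldType("STRING", nullable=True)),))
```

In this model, the first comparison treats schema1 and schema2 as equal while
the second does not, which mirrors the difference the issue asks for.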





[jira] [Work logged] (BEAM-8965) WriteToBigQuery failed in BundleBasedDirectRunner

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8965?focusedWorklogId=389308&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389308
 ]

ASF GitHub Bot logged work on BEAM-8965:


Author: ASF GitHub Bot
Created on: 19/Feb/20 04:56
Start Date: 19/Feb/20 04:56
Worklog Time Spent: 10m 
  Work Description: bobingm commented on issue #10901: [BEAM-8965] Remove 
duplicate sideinputs in ConsumerTrackingPipelineVisitor
URL: https://github.com/apache/beam/pull/10901#issuecomment-588035568
 
 
   R: @pabloem @charlesccychen
 



Issue Time Tracking
---

Worklog Id: (was: 389308)
Time Spent: 20m  (was: 10m)

> WriteToBigQuery failed in BundleBasedDirectRunner
> -
>
> Key: BEAM-8965
> URL: https://issues.apache.org/jira/browse/BEAM-8965
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Affects Versions: 2.16.0, 2.17.0, 2.18.0, 2.19.0
>Reporter: Wenbing Bai
>Assignee: Wenbing Bai
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> *{{WriteToBigQuery}}* fails in *{{BundleBasedDirectRunner}}* with error 
> {{PCollection of size 2 with more than one element accessed as a singleton 
> view.}}
> Here is the code
>  
> {code:python}
> with Pipeline() as p:
>     query_results = (
>         p
>         | beam.io.Read(beam.io.BigQuerySource(
>             query='SELECT ... FROM ...'))
>     )
>     query_results | beam.io.gcp.WriteToBigQuery(
>         table=,
>         method=WriteToBigQuery.Method.FILE_LOADS,
>         schema={"fields": []}
>     )
> {code}
>  
> Here is the error
>  
> {code:none}
>   File "apache_beam/runners/common.py", line 778, in 
> apache_beam.runners.common.DoFnRunner.process
>     def process(self, windowed_value):
>   File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
>     self._reraise_augmented(exn)
>   File "apache_beam/runners/common.py", line 849, in 
> apache_beam.runners.common.DoFnRunner._reraise_augmented
>     raise_with_traceback(new_exn)
>   File "apache_beam/runners/common.py", line 780, in 
> apache_beam.runners.common.DoFnRunner.process
>     return self.do_fn_invoker.invoke_process(windowed_value)
>   File "apache_beam/runners/common.py", line 587, in 
> apache_beam.runners.common.PerWindowInvoker.invoke_process
>     self._invoke_process_per_window(
>   File "apache_beam/runners/common.py", line 610, in 
> apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
>     [si[global_window] for si in self.side_inputs]))
>   File 
> "/home/wbai/terra/terra_py2/local/lib/python2.7/site-packages/apache_beam/transforms/sideinputs.py",
>  line 65, in __getitem__
>     _FilteringIterable(self._iterable, target_window), self._view_options)
>   File 
> "/home/wbai/terra/terra_py2/local/lib/python2.7/site-packages/apache_beam/pvalue.py",
>  line 443, in _from_runtime_iterable
>     len(head), str(head[0]), str(head[1])))
> ValueError: PCollection of size 2 with more than one element accessed as a 
> singleton view. First two elements encountered are 
> "gs://temp-dev/temp/bq_load/3edbf2172dd540edb5c8e9597206b10f", 
> "gs://temp-dev/temp/bq_load/3edbf2172dd540edb5c8e9597206b10f". [while running 
> 'WriteToBigQuery/BigQueryBatchFileLoads/ParDo(WriteRecordsToFile)/ParDo(WriteRecordsToFile)']
> {code}
>  
>  
>  
>  





[jira] [Work logged] (BEAM-8965) WriteToBigQuery failed in BundleBasedDirectRunner

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8965?focusedWorklogId=389307&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389307
 ]

ASF GitHub Bot logged work on BEAM-8965:


Author: ASF GitHub Bot
Created on: 19/Feb/20 04:54
Start Date: 19/Feb/20 04:54
Worklog Time Spent: 10m 
  Work Description: bobingm commented on pull request #10901: [BEAM-8965] 
Remove duplicate sideinputs in ConsumerTrackingPipelineVisitor
URL: https://github.com/apache/beam/pull/10901
 
 
   ## Summary
   This PR prevents `BundleBasedDirectRunner` from evaluating a single 
side_input more than once.
   
   To achieve this, the PR changes how `ConsumerTrackingPipelineVisitor` 
collects side_input views, and modifies the unit tests to verify that when a 
single side_input is used in two different PTransforms, it is evaluated only 
once.
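
   The idea of collecting each side_input view only once can be sketched as follows. This is a hedged illustration, not the code from the PR: `collect_side_input_views` and the shape of its `transforms` argument are assumptions made for the example.

```python
def collect_side_input_views(transforms):
    """Return each side-input view once, in first-seen order.

    `transforms` is a hypothetical mapping from a transform name to the
    list of side-input views that transform consumes.
    """
    seen = set()
    views = []
    for side_inputs in transforms.values():
        for view in side_inputs:
            # Compare by identity so unhashable view objects also work.
            if id(view) not in seen:
                seen.add(id(view))
                views.append(view)
    return views


shared_view = object()
other_view = object()
views = collect_side_input_views({
    "TransformA": [shared_view],
    "TransformB": [shared_view, other_view],
})
print(len(views))
```

   Here `shared_view` is consumed by two transforms but appears only once in the result, which is the property the modified unit tests are described as checking.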
   

[jira] [Work logged] (BEAM-7304) Twister2 Beam runner

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7304?focusedWorklogId=389301&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389301
 ]

ASF GitHub Bot logged work on BEAM-7304:


Author: ASF GitHub Bot
Created on: 19/Feb/20 04:29
Start Date: 19/Feb/20 04:29
Worklog Time Spent: 10m 
  Work Description: pulasthi commented on issue #10888: [BEAM-7304] 
Twister2 Beam runner
URL: https://github.com/apache/beam/pull/10888#issuecomment-588029642
 
 
   @iemejia the failure details show errors from a .gcp package, which are 
not related to this pull request. I did not notice any other failures. Is this 
due to some other commit, or am I missing something? 
 



Issue Time Tracking
---

Worklog Id: (was: 389301)
Time Spent: 3h 20m  (was: 3h 10m)

> Twister2 Beam runner
> 
>
> Key: BEAM-7304
> URL: https://issues.apache.org/jira/browse/BEAM-7304
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-ideas
>Reporter: Pulasthi Wickramasinghe
>Assignee: Pulasthi Wickramasinghe
>Priority: Minor
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Twister2 is a big data framework which supports both batch and stream 
> processing [1] [2]. The goal is to develop a Beam runner for Twister2. 
> [1] [https://github.com/DSC-SPIDAL/twister2]
> [2] [https://twister2.org/]





[jira] [Updated] (BEAM-8965) WriteToBigQuery failed in BundleBasedDirectRunner

2020-02-18 Thread Wenbing Bai (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenbing Bai updated BEAM-8965:
--
Affects Version/s: 2.17.0
   2.18.0
   2.19.0

> WriteToBigQuery failed in BundleBasedDirectRunner
> -
>
> Key: BEAM-8965
> URL: https://issues.apache.org/jira/browse/BEAM-8965
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Affects Versions: 2.16.0, 2.17.0, 2.18.0, 2.19.0
>Reporter: Wenbing Bai
>Assignee: Wenbing Bai
>Priority: Major
>
> *{{WriteToBigQuery}}* fails in *{{BundleBasedDirectRunner}}* with error 
> {{PCollection of size 2 with more than one element accessed as a singleton 
> view.}}
> Here is the code
>  
> {code:python}
> with Pipeline() as p:
>     query_results = (
>         p
>         | beam.io.Read(beam.io.BigQuerySource(
>             query='SELECT ... FROM ...'))
>     )
>     query_results | beam.io.gcp.WriteToBigQuery(
>         table=,
>         method=WriteToBigQuery.Method.FILE_LOADS,
>         schema={"fields": []}
>     )
> {code}
>  
> Here is the error
>  
> {code:none}
>   File "apache_beam/runners/common.py", line 778, in 
> apache_beam.runners.common.DoFnRunner.process
>     def process(self, windowed_value):
>   File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
>     self._reraise_augmented(exn)
>   File "apache_beam/runners/common.py", line 849, in 
> apache_beam.runners.common.DoFnRunner._reraise_augmented
>     raise_with_traceback(new_exn)
>   File "apache_beam/runners/common.py", line 780, in 
> apache_beam.runners.common.DoFnRunner.process
>     return self.do_fn_invoker.invoke_process(windowed_value)
>   File "apache_beam/runners/common.py", line 587, in 
> apache_beam.runners.common.PerWindowInvoker.invoke_process
>     self._invoke_process_per_window(
>   File "apache_beam/runners/common.py", line 610, in 
> apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
>     [si[global_window] for si in self.side_inputs]))
>   File 
> "/home/wbai/terra/terra_py2/local/lib/python2.7/site-packages/apache_beam/transforms/sideinputs.py",
>  line 65, in __getitem__
>     _FilteringIterable(self._iterable, target_window), self._view_options)
>   File 
> "/home/wbai/terra/terra_py2/local/lib/python2.7/site-packages/apache_beam/pvalue.py",
>  line 443, in _from_runtime_iterable
>     len(head), str(head[0]), str(head[1])))
> ValueError: PCollection of size 2 with more than one element accessed as a 
> singleton view. First two elements encountered are 
> "gs://temp-dev/temp/bq_load/3edbf2172dd540edb5c8e9597206b10f", 
> "gs://temp-dev/temp/bq_load/3edbf2172dd540edb5c8e9597206b10f". [while running 
> 'WriteToBigQuery/BigQueryBatchFileLoads/ParDo(WriteRecordsToFile)/ParDo(WriteRecordsToFile)']
> {code}
>  
>  
>  
>  





[jira] [Work logged] (BEAM-9335) update hard-coded coder id when translating Java external transforms

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9335?focusedWorklogId=389271&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389271
 ]

ASF GitHub Bot logged work on BEAM-9335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 02:08
Start Date: 19/Feb/20 02:08
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #10900: [BEAM-9335] 
update hard-coded coder id when translating Java external transforms
URL: https://github.com/apache/beam/pull/10900
 
 
   
   
   

[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389270&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389270
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 02:08
Start Date: 19/Feb/20 02:08
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899#issuecomment-587998300
 
 
   Could someone working on Interactive Beam review this first? @rohdesamuel ?
 



Issue Time Tracking
---

Worklog Id: (was: 389270)
Time Spent: 60h  (was: 59h 50m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 60h
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Created] (BEAM-9335) update hard-coded coder id when translating Java external transforms

2020-02-18 Thread Heejong Lee (Jira)
Heejong Lee created BEAM-9335:
-

 Summary: update hard-coded coder id when translating Java external 
transforms
 Key: BEAM-9335
 URL: https://issues.apache.org/jira/browse/BEAM-9335
 Project: Beam
  Issue Type: Bug
  Components: java-fn-execution
Reporter: Heejong Lee
Assignee: Heejong Lee


The hard-coded coder id needs to be updated when translating Java external 
transforms; otherwise the pipeline will fail if the coder id is reused.
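
One common way to avoid this kind of id collision is to derive a fresh id whenever the preferred one is already taken. The sketch below is only an illustration of that idea, not the change made for this issue: `unique_coder_id` and its `base` parameter are hypothetical names.

```python
def unique_coder_id(existing_ids, base="coder"):
    """Return an id derived from `base` that is not in `existing_ids`."""
    if base not in existing_ids:
        return base
    suffix = 1
    # Try base_1, base_2, ... until an unused id is found.
    while f"{base}_{suffix}" in existing_ids:
        suffix += 1
    return f"{base}_{suffix}"


# The preferred id and its first variant are taken, so a new one is derived.
print(unique_coder_id({"coder", "coder_1"}))
```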





[jira] [Updated] (BEAM-9335) update hard-coded coder id when translating Java external transforms

2020-02-18 Thread Heejong Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heejong Lee updated BEAM-9335:
--
Status: Open  (was: Triage Needed)

> update hard-coded coder id when translating Java external transforms
> 
>
> Key: BEAM-9335
> URL: https://issues.apache.org/jira/browse/BEAM-9335
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>
> The hard-coded coder id needs to be updated when translating Java external 
> transforms; otherwise the pipeline will fail if the coder id is reused.





[jira] [Work logged] (BEAM-5605) Support Portable SplittableDoFn for batch

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5605?focusedWorklogId=389242&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389242
 ]

ASF GitHub Bot logged work on BEAM-5605:


Author: ASF GitHub Bot
Created on: 19/Feb/20 01:02
Start Date: 19/Feb/20 01:02
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10893: [BEAM-5605] Honor 
the bounded source timestamps timestamp in the SDF wrapper.
URL: https://github.com/apache/beam/pull/10893#issuecomment-587981986
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 389242)
Time Spent: 16h 10m  (was: 16h)

> Support Portable SplittableDoFn for batch
> -
>
> Key: BEAM-5605
> URL: https://issues.apache.org/jira/browse/BEAM-5605
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 16h 10m
>  Remaining Estimate: 0h
>
> Roll-up item tracking work towards supporting portable SplittableDoFn for 
> batch





[jira] [Closed] (BEAM-9334) beam_PreCommit_Java_Cron failed on org.apache.beam.runners.samza.runtime.SamzaTimerInternalsFactoryTest.testProcessingTimeTimers

2020-02-18 Thread Yichi Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yichi Zhang closed BEAM-9334.
-
Fix Version/s: Not applicable
   Resolution: Duplicate

> beam_PreCommit_Java_Cron failed on 
> org.apache.beam.runners.samza.runtime.SamzaTimerInternalsFactoryTest.testProcessingTimeTimers
> 
>
> Key: BEAM-9334
> URL: https://issues.apache.org/jira/browse/BEAM-9334
> Project: Beam
>  Issue Type: Task
>  Components: runner-samza
>Reporter: Yichi Zhang
>Priority: Major
> Fix For: Not applicable
>
>
> h3. Error Message
> org.apache.samza.SamzaException: Error opening RocksDB store beamStore at 
> location /tmp/store3
>  





[jira] [Created] (BEAM-9334) beam_PreCommit_Java_Cron failed on org.apache.beam.runners.samza.runtime.SamzaTimerInternalsFactoryTest.testProcessingTimeTimers

2020-02-18 Thread Yichi Zhang (Jira)
Yichi Zhang created BEAM-9334:
-

 Summary: beam_PreCommit_Java_Cron failed on 
org.apache.beam.runners.samza.runtime.SamzaTimerInternalsFactoryTest.testProcessingTimeTimers
 Key: BEAM-9334
 URL: https://issues.apache.org/jira/browse/BEAM-9334
 Project: Beam
  Issue Type: Task
  Components: runner-samza
Reporter: Yichi Zhang


h3. Error Message

org.apache.samza.SamzaException: Error opening RocksDB store beamStore at 
location /tmp/store3

 





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389239&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389239
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 00:56
Start Date: 19/Feb/20 00:56
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899#issuecomment-587980476
 
 
   Major changes:
   1.  Self-contained `BackgroundCachingJob` abstraction: 
[background_caching_job](https://github.com/apache/beam/pull/10899/commits/9396da777ae7670f2ed339a9f0dc4917a39629f5#diff-20e39cd33002649cd6f5eea77c73b188)
   2. Control APIs around `BackgroundCachingJob` exposed: 
[interactive_beam](https://github.com/apache/beam/pull/10899/commits/9396da777ae7670f2ed339a9f0dc4917a39629f5#diff-34a47fb3ecb0865f176f1c893b4cee89)
   
   R: @aaltay 
   PTAL, thanks!
 



Issue Time Tracking
---

Worklog Id: (was: 389239)
Time Spent: 59h 50m  (was: 59h 40m)

> Add streaming support to Interactive Beam
> -
>
> Key: BEAM-8335
> URL: https://issues.apache.org/jira/browse/BEAM-8335
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 59h 50m
>  Remaining Estimate: 0h
>
> This issue tracks the work items to introduce streaming support to the 
> Interactive Beam experience. This will allow users to:
>  * Write and run a streaming job in IPython
>  * Automatically cache records from unbounded sources
>  * Add a replay experience that replays all cached records to simulate the 
> original pipeline execution
>  * Add controls to play/pause/stop/step individual elements from the cached 
> records
>  * Add ability to inspect/visualize unbounded PCollections





[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389237&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389237
 ]

ASF GitHub Bot logged work on BEAM-9286:


Author: ASF GitHub Bot
Created on: 19/Feb/20 00:53
Start Date: 19/Feb/20 00:53
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10823: [BEAM-9286] 
Create validation runner test for metrics (user counter). 
URL: https://github.com/apache/beam/pull/10823#discussion_r381020497
 
 

 ##
 File path: sdks/python/apache_beam/metrics/metric_test.py
 ##
 @@ -127,6 +134,35 @@ def test_distribution_empty_namespace(self):
      with self.assertRaises(ValueError):
        Metrics.distribution("", "names")
  
 +  @attr('ValidatesRunner')
 +  def test_user_counter_using_pardo(self):
 +    class SomeDoFn(beam.DoFn):
 +      """A custom dummy DoFn using yield."""
 +      def __init__(self):
 +        self.user_counter_elements = metrics.Metrics.counter(
 +            self.__class__, 'metrics_user_counter_element')
 +
 +      def process(self, element):
 +        self.user_counter_elements.inc()
 +        yield element
 +
 +    pipeline = TestPipeline()
 +    nums = pipeline | 'Input' >> beam.Create([1, 2, 3, 4])
 +    results = nums | 'ApplyPardo' >> beam.ParDo(SomeDoFn())
 +
 +    res = pipeline.run()
 +    res.wait_until_finish()
 +    metric_results = (
 +        res.metrics().query(MetricsFilter()
 +                            .with_name('metrics_user_counter_element')))
 +    outputs_counter = metric_results['counters'][0]
 +    assert_that(results, equal_to([1, 2, 3, 4]))
 +
 +    self.assertEqual(outputs_counter.key.metric.name,
 +                     'metrics_user_counter_element')
 +    self.assertEqual(outputs_counter.committed, 4)
 +    self.assertEqual(outputs_counter.attempted, 4)
 
 Review comment:
   >= 4, no guarantee it's exactly 4. 
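
   The reason an attempted value can exceed the number of elements is that a bundle may be retried, and every attempt increments it. A minimal sketch of that effect, using a hypothetical `ToyCounter` and `process_bundle` helper rather than Beam's metrics classes:

```python
class ToyCounter:
    """Hypothetical counter tracking attempted and committed values."""

    def __init__(self):
        self.attempted = 0
        self.committed = 0


def process_bundle(counter, elements, failed_attempts=0):
    """Process one bundle, optionally failing and retrying it first."""
    for _ in range(failed_attempts):
        # A failed attempt still increments the attempted value.
        counter.attempted += len(elements)
    # The final, successful attempt counts towards both values.
    counter.attempted += len(elements)
    counter.committed += len(elements)


counter = ToyCounter()
process_bundle(counter, [1, 2, 3, 4], failed_attempts=1)
print(counter.committed, counter.attempted)
```

   With one retry the attempted value is twice the element count, so a test that needs to be robust to retries would compare it with a lower bound rather than an exact value.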
 



Issue Time Tracking
---

Worklog Id: (was: 389237)
Time Spent: 50m  (was: 40m)

> Create validation tests for metrics based on MonitoringInfo if applicable
> -
>
> Key: BEAM-9286
> URL: https://issues.apache.org/jira/browse/BEAM-9286
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Create dedicated validation runner tests for metrics (those based on 
> MonitoringInfo). 
>  
>  





[jira] [Work logged] (BEAM-9286) Create validation tests for metrics based on MonitoringInfo if applicable

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9286?focusedWorklogId=389238&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389238
 ]

ASF GitHub Bot logged work on BEAM-9286:


Author: ASF GitHub Bot
Created on: 19/Feb/20 00:53
Start Date: 19/Feb/20 00:53
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10823: [BEAM-9286] 
Create validation runner test for metrics (user counter). 
URL: https://github.com/apache/beam/pull/10823#discussion_r381020881
 
 

 ##
 File path: sdks/python/apache_beam/metrics/metric_test.py
 ##
 @@ -127,6 +134,35 @@ def test_distribution_empty_namespace(self):
      with self.assertRaises(ValueError):
        Metrics.distribution("", "names")
  
 +  @attr('ValidatesRunner')
 +  def test_user_counter_using_pardo(self):
 +    class SomeDoFn(beam.DoFn):
 +      """A custom dummy DoFn using yield."""
 +      def __init__(self):
 +        self.user_counter_elements = metrics.Metrics.counter(
 +            self.__class__, 'metrics_user_counter_element')
 +
 +      def process(self, element):
 +        self.user_counter_elements.inc()
 +        yield element
 +
 +    pipeline = TestPipeline()
 +    nums = pipeline | 'Input' >> beam.Create([1, 2, 3, 4])
 +    results = nums | 'ApplyPardo' >> beam.ParDo(SomeDoFn())
 +
 +    res = pipeline.run()
 +    res.wait_until_finish()
 +    metric_results = (
 +        res.metrics().query(MetricsFilter()
 +                            .with_name('metrics_user_counter_element')))
 +    outputs_counter = metric_results['counters'][0]
 +    assert_that(results, equal_to([1, 2, 3, 4]))
 
 Review comment:
   This must be applied before the pipeline is run. 
 



Issue Time Tracking
---

Worklog Id: (was: 389238)
Time Spent: 1h  (was: 50m)

> Create validation tests for metrics based on MonitoringInfo if applicable
> -
>
> Key: BEAM-9286
> URL: https://issues.apache.org/jira/browse/BEAM-9286
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Ruoyun Huang
>Assignee: Ruoyun Huang
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Create dedicated validation runner tests for metrics (those based on 
> MonitoringInfo). 
>  
>  





[jira] [Work logged] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9228?focusedWorklogId=389236&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389236
 ]

ASF GitHub Bot logged work on BEAM-9228:


Author: ASF GitHub Bot
Created on: 19/Feb/20 00:49
Start Date: 19/Feb/20 00:49
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10847: [BEAM-9228] 
Support further partition for FnApi ListBuffer
URL: https://github.com/apache/beam/pull/10847#discussion_r381019006
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py
 ##
 @@ -932,10 +998,14 @@ def get_buffer(buffer_id):
   return pcoll_buffers[buffer_id]
 
 def get_input_coder_impl(transform_id):
-  return context.coders[
-  safe_coders[beam_fn_api_pb2.RemoteGrpcPort.FromString(
-  process_bundle_descriptor.transforms[transform_id].spec.payload).
-  coder_id]].get_impl()
+  coder_id = beam_fn_api_pb2.RemoteGrpcPort.FromString(
+  process_bundle_descriptor.transforms[transform_id].spec.payload).\
+coder_id
+  assert coder_id is not None and coder_id != ''
 
 Review comment:
   OK, in the SDF case we can get the coder of the preceding transform (the one 
that produces its input `ref_PCollection_PCollection_3_split`) which should 
always be a RemoteGrpcPort. 
 



Issue Time Tracking
---

Worklog Id: (was: 389236)
Time Spent: 2h 10m  (was: 2h)

> _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
> 
>
> Key: BEAM-9228
> URL: https://issues.apache.org/jira/browse/BEAM-9228
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.16.0, 2.18.0, 2.19.0
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> A user reported the following issue.
> -
> I have a set of tfrecord files, obtained by converting parquet files with 
> Spark. Each file is roughly 1GB and I have 11 of those.
> I would expect simple statistics gathering (i.e. counting the number of items 
> in all files) to scale linearly with the number of cores on my system.
> I am able to reproduce the issue with the minimal snippet below:
> {code:java}
> import apache_beam as beam
> from apache_beam.options.pipeline_options import PipelineOptions
> from apache_beam.runners.portability import fn_api_runner
> from apache_beam.portability.api import beam_runner_api_pb2
> from apache_beam.portability import python_urns
> import sys
> pipeline_options = PipelineOptions(['--direct_num_workers', '4'])
> file_pattern = 'part-r-00*'
> runner=fn_api_runner.FnApiRunner(
>   default_environment=beam_runner_api_pb2.Environment(
>   urn=python_urns.SUBPROCESS_SDK,
>   payload=b'%s -m apache_beam.runners.worker.sdk_worker_main'
> % sys.executable.encode('ascii')))
> p = beam.Pipeline(runner=runner, options=pipeline_options)
> lines = (p | 'read' >> beam.io.tfrecordio.ReadFromTFRecord(file_pattern)
>  | beam.combiners.Count.Globally()
>  | beam.io.WriteToText('/tmp/output'))
> p.run()
> {code}
> Only one combination of apache_beam revision / worker type seems to work (I 
> refer to https://beam.apache.org/documentation/runners/direct/ for the worker 
> types)
> * beam 2.16: neither multithread nor multiprocess achieves high CPU usage on 
> multiple cores
> * beam 2.17: able to achieve high CPU usage on all 4 cores
> * beam 2.18: I have not tested the multithreaded mode, but the multiprocess mode 
> fails when trying to serialize the Environment instance, most likely because 
> of a change from 2.17 to 2.18.
> I also briefly tried SparkRunner with version 2.16 but was not able to achieve 
> any throughput.
> What is the recommended way to achieve what I am trying to do? How can I 
> troubleshoot?
> --
> This is caused by [this 
> PR|https://github.com/apache/beam/commit/02f8ad4eee3ec0ea8cbdc0f99c1dad29f00a9f60].
> A [workaround|https://github.com/apache/beam/pull/10729] was tried, which 
> rolls back iobase.py so that it does not use _SDFBoundedSourceWrapper. This 
> confirmed that d

[jira] [Work logged] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9228?focusedWorklogId=389233&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389233
 ]

ASF GitHub Bot logged work on BEAM-9228:


Author: ASF GitHub Bot
Created on: 19/Feb/20 00:49
Start Date: 19/Feb/20 00:49
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10847: [BEAM-9228] 
Support further partition for FnApi ListBuffer
URL: https://github.com/apache/beam/pull/10847#discussion_r381016780
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py
 ##
 @@ -287,43 +287,48 @@ def partition(self, n):
 
 class _ListBuffer():
   """Used to support parititioning of a list."""
-  def __init__(self, input_coder):
-self._input_coder = input_coder
+  def __init__(self, coder_impl):
+self._coder_impl = coder_impl
 self._inputs = []
 self._grouped_output = None
 self.cleared = False
 
   def append(self, element):
+if self.cleared:
+  raise RuntimeError('Trying to append to a cleared ListBuffer.')
 if self._grouped_output:
   raise RuntimeError('ListBuffer append after read.')
 self._inputs.append(element)
 
   def partition(self, n):
 # type: (int) -> List[List[bytes]]
+if self.cleared:
+  raise RuntimeError('Trying to partition a cleared ListBuffer.')
 if len(self._inputs) >= n or len(self._inputs) == 0:
   return [self._inputs[k::n] for k in range(n)]
 else:
   if not self._grouped_output:
-self._grouped_output = [[] for _ in range(n)]
-coder_impl = self._input_coder.get_impl()
-decoded_input = []
-output_stream_list = []
-for _ in range(n):
-  output_stream_list.append(create_OutputStream())
+output_stream_list = [create_OutputStream() for _ in range(n)]
+self._grouped_output = [output_stream.get() for output_stream
 
 Review comment:
   I meant you could put this instead of the loop at line 324.
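
A minimal sketch of the suggestion, showing that a list comprehension builds the same list as the explicit append loop. `create_output_stream` is a hypothetical stand-in (backed by `io.BytesIO`) for the `create_OutputStream()` helper in the diff, and `n = 3` is an arbitrary example value.

```python
import io


def create_output_stream():
    # Hypothetical stand-in for create_OutputStream() from the diff.
    return io.BytesIO()


n = 3

# Loop form, as in the removed lines of the diff.
streams_loop = []
for _ in range(n):
    streams_loop.append(create_output_stream())

# Comprehension form, as in the added line of the diff.
streams_comp = [create_output_stream() for _ in range(n)]
```

Both forms call the factory once per partition, so each entry is a distinct stream object.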
 



Issue Time Tracking
---

Worklog Id: (was: 389233)
Time Spent: 2h  (was: 1h 50m)

> _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
> 
>
> Key: BEAM-9228
> URL: https://issues.apache.org/jira/browse/BEAM-9228
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.16.0, 2.18.0, 2.19.0
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> A user reported the following issue.
> -
> I have a set of tfrecord files, obtained by converting parquet files with 
> Spark. Each file is roughly 1GB and I have 11 of those.
> I would expect simple statistics gathering (i.e. counting the number of items 
> in all files) to scale linearly with the number of cores on my system.
> I am able to reproduce the issue with the minimal snippet below:
> {code:java}
> import apache_beam as beam
> from apache_beam.options.pipeline_options import PipelineOptions
> from apache_beam.runners.portability import fn_api_runner
> from apache_beam.portability.api import beam_runner_api_pb2
> from apache_beam.portability import python_urns
> import sys
> pipeline_options = PipelineOptions(['--direct_num_workers', '4'])
> file_pattern = 'part-r-00*'
> runner=fn_api_runner.FnApiRunner(
>   default_environment=beam_runner_api_pb2.Environment(
>   urn=python_urns.SUBPROCESS_SDK,
>   payload=b'%s -m apache_beam.runners.worker.sdk_worker_main'
> % sys.executable.encode('ascii')))
> p = beam.Pipeline(runner=runner, options=pipeline_options)
> lines = (p | 'read' >> beam.io.tfrecordio.ReadFromTFRecord(file_pattern)
>  | beam.combiners.Count.Globally()
>  | beam.io.WriteToText('/tmp/output'))
> p.run()
> {code}
> Only one combination of apache_beam revision / worker type seems to work (I 
> refer to https://beam.apache.org/documentation/runners/direct/ for the worker 
> types)
> * beam 2.16: neither multithread nor multiprocess achieves high CPU usage on 
> multiple cores
> * beam 2.17: able to achieve high CPU usage on all 4 cores
> * beam 2.18: I have not tested the multithreaded mode, but the multiprocess mode 
> fails when trying to serialize the Environment instance, most likely because 
> of a change from 2.17 to 2.18.

[jira] [Work logged] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9228?focusedWorklogId=389235&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389235
 ]

ASF GitHub Bot logged work on BEAM-9228:


Author: ASF GitHub Bot
Created on: 19/Feb/20 00:49
Start Date: 19/Feb/20 00:49
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10847: [BEAM-9228] 
Support further partition for FnApi ListBuffer
URL: https://github.com/apache/beam/pull/10847#discussion_r381018471
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py
 ##
 @@ -932,10 +998,14 @@ def get_buffer(buffer_id):
   return pcoll_buffers[buffer_id]
 
 def get_input_coder_impl(transform_id):
-  return context.coders[
-  safe_coders[beam_fn_api_pb2.RemoteGrpcPort.FromString(
-  process_bundle_descriptor.transforms[transform_id].spec.payload).
-  coder_id]].get_impl()
+  coder_id = beam_fn_api_pb2.RemoteGrpcPort.FromString(
+  process_bundle_descriptor.transforms[transform_id].spec.payload).\
 
 Review comment:
   Lint: don't use backslashes for continuation. (These days you can just run 
yapf to format your code, see 
https://cwiki.apache.org/confluence/display/BEAM/Python+Tips.)
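
As a small illustration of the lint rule, the same attribute access can be wrapped in parentheses instead of using a trailing backslash. `Payload` and `parse` below are hypothetical stand-ins for the parsed `RemoteGrpcPort` message and `FromString` call in the diff.

```python
class Payload:
    # Hypothetical stand-in for a parsed proto message.
    coder_id = 'coder_1'


def parse(raw):
    # Hypothetical stand-in for RemoteGrpcPort.FromString(raw).
    return Payload()


# Backslash continuation: legal Python, but flagged by the linter.
coder_id_backslash = parse(b'raw').\
    coder_id

# Parenthesized continuation: the style the linter accepts.
coder_id_parens = (
    parse(b'raw').coder_id)
```

Both assignments produce the same value; only the line-continuation style differs.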
 



Issue Time Tracking
---

Worklog Id: (was: 389235)

> _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
> 
>
> Key: BEAM-9228
> URL: https://issues.apache.org/jira/browse/BEAM-9228
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.16.0, 2.18.0, 2.19.0
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> A user reported the following issue.
> -
> I have a set of tfrecord files, obtained by converting parquet files with 
> Spark. Each file is roughly 1GB and I have 11 of those.
> I would expect simple statistics gathering (i.e. counting the number of items 
> in all files) to scale linearly with the number of cores on my system.
> I am able to reproduce the issue with the minimal snippet below:
> {code:java}
> import apache_beam as beam
> from apache_beam.options.pipeline_options import PipelineOptions
> from apache_beam.runners.portability import fn_api_runner
> from apache_beam.portability.api import beam_runner_api_pb2
> from apache_beam.portability import python_urns
> import sys
> pipeline_options = PipelineOptions(['--direct_num_workers', '4'])
> file_pattern = 'part-r-00*'
> runner=fn_api_runner.FnApiRunner(
>   default_environment=beam_runner_api_pb2.Environment(
>   urn=python_urns.SUBPROCESS_SDK,
>   payload=b'%s -m apache_beam.runners.worker.sdk_worker_main'
> % sys.executable.encode('ascii')))
> p = beam.Pipeline(runner=runner, options=pipeline_options)
> lines = (p | 'read' >> beam.io.tfrecordio.ReadFromTFRecord(file_pattern)
>  | beam.combiners.Count.Globally()
>  | beam.io.WriteToText('/tmp/output'))
> p.run()
> {code}
> Only one combination of apache_beam revision / worker type seems to work (I 
> refer to https://beam.apache.org/documentation/runners/direct/ for the worker 
> types)
> * beam 2.16: neither multithread nor multiprocess achieves high CPU usage on 
> multiple cores
> * beam 2.17: able to achieve high CPU usage on all 4 cores
> * beam 2.18: I have not tested the multithreaded mode, but the multiprocess mode 
> fails when trying to serialize the Environment instance, most likely because 
> of a change from 2.17 to 2.18.
> I also briefly tried SparkRunner with version 2.16 but was not able to achieve 
> any throughput.
> What is the recommended way to achieve what I am trying to do? How can I 
> troubleshoot?
> --
> This is caused by [this 
> PR|https://github.com/apache/beam/commit/02f8ad4eee3ec0ea8cbdc0f99c1dad29f00a9f60].
> A [workaround|https://github.com/apache/beam/pull/10729] was tried, which 
> rolls back iobase.py so that it does not use _SDFBoundedSourceWrapper. This 
> confirmed that data is distributed to multiple workers; however, there are 
> some regressions with SDF wrapper tests.




[jira] [Work logged] (BEAM-9228) _SDFBoundedSourceWrapper doesn't distribute data to multiple workers

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9228?focusedWorklogId=389234&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389234
 ]

ASF GitHub Bot logged work on BEAM-9228:


Author: ASF GitHub Bot
Created on: 19/Feb/20 00:49
Start Date: 19/Feb/20 00:49
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10847: [BEAM-9228] 
Support further partition for FnApi ListBuffer
URL: https://github.com/apache/beam/pull/10847#discussion_r381017895
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py
 ##
 @@ -769,7 +774,7 @@ def _collect_written_timers_and_add_to_deferred_inputs(
 for windowed_key_timer in timers_by_key_and_window.values():
   windowed_timer_coder_impl.encode_to_stream(
   windowed_key_timer, out, True)
-deferred_inputs[transform_id] = _ListBuffer(input_coder=coder)
+deferred_inputs[transform_id] = _ListBuffer(coder_impl=coder.get_impl())
 
 Review comment:
   You can now revert the changes up at the top of the loop. 
 



Issue Time Tracking
---

Worklog Id: (was: 389234)

> _SDFBoundedSourceWrapper doesn't distribute data to multiple workers
> 
>
> Key: BEAM-9228
> URL: https://issues.apache.org/jira/browse/BEAM-9228
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.16.0, 2.18.0, 2.19.0
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> A user reported the following issue.
> -
> I have a set of tfrecord files, obtained by converting parquet files with 
> Spark. Each file is roughly 1GB and I have 11 of those.
> I would expect simple statistics gathering (i.e. counting the number of items 
> in all files) to scale linearly with the number of cores on my system.
> I am able to reproduce the issue with the minimal snippet below:
> {code:java}
> import apache_beam as beam
> from apache_beam.options.pipeline_options import PipelineOptions
> from apache_beam.runners.portability import fn_api_runner
> from apache_beam.portability.api import beam_runner_api_pb2
> from apache_beam.portability import python_urns
> import sys
> pipeline_options = PipelineOptions(['--direct_num_workers', '4'])
> file_pattern = 'part-r-00*'
> runner=fn_api_runner.FnApiRunner(
>   default_environment=beam_runner_api_pb2.Environment(
>   urn=python_urns.SUBPROCESS_SDK,
>   payload=b'%s -m apache_beam.runners.worker.sdk_worker_main'
> % sys.executable.encode('ascii')))
> p = beam.Pipeline(runner=runner, options=pipeline_options)
> lines = (p | 'read' >> beam.io.tfrecordio.ReadFromTFRecord(file_pattern)
>  | beam.combiners.Count.Globally()
>  | beam.io.WriteToText('/tmp/output'))
> p.run()
> {code}
> Only one combination of apache_beam revision / worker type seems to work (I 
> refer to https://beam.apache.org/documentation/runners/direct/ for the worker 
> types)
> * beam 2.16: neither multithread nor multiprocess achieves high CPU usage on 
> multiple cores
> * beam 2.17: able to achieve high CPU usage on all 4 cores
> * beam 2.18: I have not tested the multithreaded mode, but the multiprocess mode 
> fails when trying to serialize the Environment instance, most likely because 
> of a change from 2.17 to 2.18.
> I also briefly tried SparkRunner with version 2.16 but was not able to achieve 
> any throughput.
> What is the recommended way to achieve what I am trying to do? How can I 
> troubleshoot?
> --
> This is caused by [this 
> PR|https://github.com/apache/beam/commit/02f8ad4eee3ec0ea8cbdc0f99c1dad29f00a9f60].
> A [workaround|https://github.com/apache/beam/pull/10729] was tried, which 
> rolls back iobase.py so that it does not use _SDFBoundedSourceWrapper. This 
> confirmed that data is distributed to multiple workers; however, there are 
> some regressions with SDF wrapper tests.





[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389229&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389229
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 19/Feb/20 00:43
Start Date: 19/Feb/20 00:43
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #10899: [BEAM-8335] 
Background Caching job
URL: https://github.com/apache/beam/pull/10899
 
 
   1. Exposed source data capture (implemented by the background caching job) 
control APIs in the interactive_beam module.
   2. Abstracted the background caching job into a standalone class where
  pipeline jobs and limit checkers are self-contained.
   3. Added test_stream_service related control and tracking inside
  background_caching_job and interactive_environment.
   4. TODO items: 1) integrate streaming_cache once it implements
  cache_manager; 2) add an implementation of the `capture_size` control (for
  now, the capture size limit checker is simply a thread that ends whenever the
  capture elapse limit checker terminates the infinitely running background
  caching job); 3) wire up test_stream_service's creation when the dependencies 
are ready.
   

[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389223&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389223
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 19/Feb/20 00:34
Start Date: 19/Feb/20 00:34
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10886: [BEAM-8019] 
Updates DataflowRunner to support multiple SDK environments.
URL: https://github.com/apache/beam/pull/10886#discussion_r381006993
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
 ##
 @@ -301,6 +351,22 @@ def __init__(self, packages, options, environment_version, pipeline_url):
   dataflow.Environment.SdkPipelineOptionsValue.AdditionalProperty(
   key='display_data', value=to_json_value(items)))
 
+  def _get_environments_from_tranforms(self):
 
 Review comment:
   Any reason to not just `return self._pipeline_proto.environments.values()`? 
Do we expect unused environments to be there for any reason? 
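
A rough sketch of the difference the question is probing: collecting environments via the transforms only yields the ones actually referenced, whereas returning all values also includes unused entries. The `pipeline_proto` object below is a simplified hypothetical stand-in built with `SimpleNamespace`, not the real pipeline proto layout, and the image names are made up.

```python
from types import SimpleNamespace

# Hypothetical stand-in: 'env_java' is declared but no transform uses it.
pipeline_proto = SimpleNamespace(
    environments={'env_py': 'python-image', 'env_java': 'java-image'},
    transforms={
        't1': SimpleNamespace(environment_id='env_py'),
        't2': SimpleNamespace(environment_id='env_py'),
    })


def environments_from_transforms(proto):
    # Only environments referenced by at least one transform.
    used_ids = {t.environment_id for t in proto.transforms.values()}
    return [proto.environments[env_id] for env_id in sorted(used_ids)]


def all_environments(proto):
    # Every declared environment, used or not.
    return list(proto.environments.values())
```

The two helpers agree whenever every declared environment is used, which is the case the reviewer asks about.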
 



Issue Time Tracking
---

Worklog Id: (was: 389223)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.





[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389225&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389225
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 19/Feb/20 00:34
Start Date: 19/Feb/20 00:34
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10886: [BEAM-8019] 
Updates DataflowRunner to support multiple SDK environments.
URL: https://github.com/apache/beam/pull/10886#discussion_r381015733
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient_test.py
 ##
 @@ -329,7 +402,7 @@ def test_harness_override_default_in_released_sdks(self):
 env = apiclient.Environment([], #packages
 pipeline_options,
 '2.0.0', #any environment version
-FAKE_PIPELINE_URL)
+FAKE_PIPELINE_URL, None, None)
 
 Review comment:
   Can None be an optional argument rather than passing it everywhere here?
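
One way to read this suggestion, sketched with a toy class: give the two new trailing parameters a default of `None` so existing call sites need no change. The parameter names `first_new_arg` and `second_new_arg` are hypothetical placeholders; the diff excerpt does not show the real names.

```python
class Environment(object):
    """Toy stand-in for the Environment wrapper in the diff."""

    def __init__(self, packages, options, environment_version, pipeline_url,
                 first_new_arg=None, second_new_arg=None):
        # The two trailing parameters default to None, so callers that do not
        # need them can keep the original four-argument call.
        self.packages = packages
        self.options = options
        self.environment_version = environment_version
        self.pipeline_url = pipeline_url
        self.first_new_arg = first_new_arg
        self.second_new_arg = second_new_arg


# Existing tests keep their original call shape.
env = Environment([], {}, '2.0.0', 'FAKE_PIPELINE_URL')
```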
 



Issue Time Tracking
---

Worklog Id: (was: 389225)
Time Spent: 3h 50m  (was: 3h 40m)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.





[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389222&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389222
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 19/Feb/20 00:34
Start Date: 19/Feb/20 00:34
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10886: [BEAM-8019] 
Updates DataflowRunner to support multiple SDK environments.
URL: https://github.com/apache/beam/pull/10886#discussion_r381005234
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
 ##
 @@ -576,10 +655,25 @@ def create_job(self, job):
 'A template was just created at location %s', template_location)
 return None
 
+  def _apply_sdk_environment_overrides(self, proto_pipeline):
+# Update environments based on user provided overrides
+sdk_overrides = self._sdk_image_overrides
+if sdk_overrides:
+  for environment in proto_pipeline.components.environments.values():
+docker_payload = proto_utils.parse_Bytes(
+environment.payload, beam_runner_api_pb2.DockerPayload)
+for pattern in sdk_overrides:
+  override = sdk_overrides[pattern]
+  if re.match(pattern, docker_payload.container_image):
+new_payload = beam_runner_api_pb2.DockerPayload(
 
 Review comment:
   Would it be safer to copy the proto and update the field, in case other 
fields get added in the future?
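
The concern can be sketched with a plain dataclass as an analogy (this is not protobuf code): rebuilding the object from scratch drops any field the constructor call does not mention, while copying and then updating one field preserves the rest. `DockerPayloadLike` and `future_field` are hypothetical. With a real protobuf message, the analogous approach would use the message's `CopyFrom` method before setting the field.

```python
import copy
from dataclasses import dataclass


@dataclass
class DockerPayloadLike:
    container_image: str
    future_field: str = ''  # hypothetical field added at some later point


def rebuild(payload, new_image):
    # Fresh object: fields not listed here fall back to their defaults.
    return DockerPayloadLike(container_image=new_image)


def copy_and_update(payload, new_image):
    # Copy first, then change only the field being overridden.
    updated = copy.copy(payload)
    updated.container_image = new_image
    return updated


original = DockerPayloadLike('old-image', future_field='keep-me')
```

Here `rebuild(original, 'new-image')` loses `future_field`, while `copy_and_update(original, 'new-image')` keeps it and leaves `original` untouched.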
 



Issue Time Tracking
---

Worklog Id: (was: 389222)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389227&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389227
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 19/Feb/20 00:35
Start Date: 19/Feb/20 00:35
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10886: [BEAM-8019] 
Updates DataflowRunner to support multiple SDK environments.
URL: https://github.com/apache/beam/pull/10886#discussion_r381006145
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
 ##
 @@ -133,12 +136,19 @@ def get_output(self, tag=None):
 
 class Environment(object):
   """Wrapper for a dataflow Environment protobuf."""
 
 Review comment:
   This comment seems out of date...
 



Issue Time Tracking
---

Worklog Id: (was: 389227)
Time Spent: 4h  (was: 3h 50m)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.





[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389218&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389218
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 19/Feb/20 00:34
Start Date: 19/Feb/20 00:34
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10886: [BEAM-8019] 
Updates DataflowRunner to support multiple SDK environments.
URL: https://github.com/apache/beam/pull/10886#discussion_r381003618
 
 

 ##
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##
 @@ -748,6 +748,17 @@ def _add_argparse_args(cls, parser):
 'worker harness. Default is the container for the version of the '
 'SDK. Note: currently, only approved Google Cloud Dataflow '
 'container images may be used here.'))
+parser.add_argument(
+'--sdk_harness_container_image_overrides',
 
 Review comment:
   When, if ever, do we expect the user to provide a reasonable value for this 
flag? 
 



Issue Time Tracking
---

Worklog Id: (was: 389218)
Time Spent: 3h 20m  (was: 3h 10m)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.





[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389224&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389224
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 19/Feb/20 00:34
Start Date: 19/Feb/20 00:34
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10886: [BEAM-8019] 
Updates DataflowRunner to support multiple SDK environments.
URL: https://github.com/apache/beam/pull/10886#discussion_r381004678
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
 ##
 @@ -576,10 +655,25 @@ def create_job(self, job):
 'A template was just created at location %s', template_location)
 return None
 
+  def _apply_sdk_environment_overrides(self, proto_pipeline):
+    # Update environments based on user provided overrides
+    sdk_overrides = self._sdk_image_overrides
+    if sdk_overrides:
+      for environment in proto_pipeline.components.environments.values():
+        docker_payload = proto_utils.parse_Bytes(
+            environment.payload, beam_runner_api_pb2.DockerPayload)
+        for pattern in sdk_overrides:
 
 Review comment:
   `for pattern, override in sdk_overrides.items()`
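
   A minimal sketch of the suggestion, assuming `sdk_overrides` is a plain dict
   that maps a regex pattern to a replacement image. The sample values and the
   helper names are hypothetical, not part of the PR.

```python
# Hypothetical overrides: regex pattern -> replacement container image.
sdk_overrides = {'.*java.*': 'gcr.io/example/java:latest'}


def pairs_by_key(overrides):
    # Form used in the diff: iterate over keys, then look each value up.
    return [(pattern, overrides[pattern]) for pattern in overrides]


def pairs_by_items(overrides):
    # Suggested form: unpack key and value together in the loop header.
    return [(pattern, override) for pattern, override in overrides.items()]
```

   Both forms visit the same pairs; `items()` simply avoids the second lookup.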
 



Issue Time Tracking
---

Worklog Id: (was: 389224)
Time Spent: 3h 50m  (was: 3h 40m)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.





[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389221&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389221
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 19/Feb/20 00:34
Start Date: 19/Feb/20 00:34
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10886: [BEAM-8019] 
Updates DataflowRunner to support multiple SDK environments.
URL: https://github.com/apache/beam/pull/10886#discussion_r381008105
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
 ##
 @@ -301,6 +351,22 @@ def __init__(self, packages, options, 
environment_version, pipeline_url):
   dataflow.Environment.SdkPipelineOptionsValue.AdditionalProperty(
   key='display_data', value=to_json_value(items)))
 
+  def _get_environments_from_tranforms(self):
+    if not self._proto_pipeline:
+      return []
+    environment_ids = []
+    for transform in self._proto_pipeline.components.transforms.values():
+      if transform.environment_id not in environment_ids:
+        environment_ids.append(transform.environment_id)
+    environments = []
+    for environment_id in environment_ids:
+      if not environment_id:
 
 Review comment:
   Nit: I'd move this up into the previous loop. Or I'd write this as a list 
comprehension, e.g. 
   
   ```
   environment_ids = set(
       transform.environment_id
       for transform in self._proto_pipeline.components.transforms.values()
       if transform.environment_id)
   return [self._proto_pipeline.components.environments[id] for id in environment_ids]
   ```
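
   For illustration only, the same idea as a self-contained sketch. The
   `SimpleNamespace` objects are stand-ins for the pipeline proto, and the
   function name and sample ids are assumptions. Note that a set has no defined
   order, so this sketch sorts the ids to get a stable result; the loop in the
   diff keeps first-seen order instead.

```python
from types import SimpleNamespace

# Stand-ins for the pipeline proto; the real objects are protobuf messages.
components = SimpleNamespace(
    transforms={
        't1': SimpleNamespace(environment_id='py'),
        't2': SimpleNamespace(environment_id=''),
        't3': SimpleNamespace(environment_id='java'),
        't4': SimpleNamespace(environment_id='py'),
    },
    environments={'py': 'python-env', 'java': 'java-env'})


def get_environments_from_transforms(components):
    # Collect the distinct, non-empty environment ids used by any transform.
    environment_ids = set(
        transform.environment_id
        for transform in components.transforms.values()
        if transform.environment_id)
    # Sort the ids so the returned list is deterministic in this sketch.
    return [components.environments[id_] for id_ in sorted(environment_ids)]
```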
 



Issue Time Tracking
---

Worklog Id: (was: 389221)
Time Spent: 3h 40m  (was: 3.5h)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.





[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389226&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389226
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 19/Feb/20 00:35
Start Date: 19/Feb/20 00:35
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10886: [BEAM-8019] 
Updates DataflowRunner to support multiple SDK environments.
URL: https://github.com/apache/beam/pull/10886#discussion_r381008681
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
 ##
 @@ -478,6 +544,19 @@ def __init__(self, options):
 get_credentials=(not self.google_cloud_options.no_auth),
 http=http_client,
 response_encoding=get_response_encoding())
+    self._sdk_image_overrides = self._get_sdk_image_overrides(options)
+
+  def _get_sdk_image_overrides(self, pipeline_options):
+    worker_options = pipeline_options.view_as(WorkerOptions)
+    sdk_overrides = worker_options.sdk_harness_container_image_overrides
+    overrides_dict = dict()
 
 Review comment:
   Again, one can write
   
   `overrides_dict = dict(override_str.split(',', 1) for override_str in sdk_overrides)`
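
   A small sketch of that one-liner, assuming each override string has the form
   `pattern,image` (the diff above is truncated, so the exact format is an
   assumption) and that the option may be unset. The function name is
   hypothetical.

```python
def parse_overrides(sdk_overrides):
    # Split each entry once, on the first comma: 'pattern,image' -> [pattern, image].
    # dict() accepts an iterable of two-element sequences.
    return dict(
        override_str.split(',', 1) for override_str in (sdk_overrides or []))
```

   An entry without a comma yields a one-element list, so `dict()` raises
   `ValueError`; real code would probably want to validate the input first.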
 



Issue Time Tracking
---

Worklog Id: (was: 389226)
Time Spent: 4h  (was: 3h 50m)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.





[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389220&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389220
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 19/Feb/20 00:34
Start Date: 19/Feb/20 00:34
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10886: [BEAM-8019] 
Updates DataflowRunner to support multiple SDK environments.
URL: https://github.com/apache/beam/pull/10886#discussion_r381003938
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
 ##
 @@ -153,6 +158,9 @@ def __init__(self, packages, options, environment_version, 
pipeline_url):
 # User agent information.
 self.proto.userAgent = dataflow.Environment.UserAgentValue()
 self.local = 'localhost' in self.google_cloud_options.dataflow_endpoint
+    self._proto_pipeline = proto_pipeline
+    self._sdk_image_overrides = (
 
 Review comment:
   One can also write `_sdk_image_overrides or dict()`.
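
   A sketch of the idiom, since the assignment in the diff is cut off. The
   function name and default argument are hypothetical.

```python
def normalize_overrides(sdk_image_overrides=None):
    # Any falsy value (None, or an empty dict) becomes a fresh empty dict,
    # so callers can iterate over the result without a None check.
    return sdk_image_overrides or dict()
```

   One caveat of `or`: an empty dict passed in is also replaced by a new
   object, which only matters if the caller relies on identity.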
 



Issue Time Tracking
---

Worklog Id: (was: 389220)
Time Spent: 3.5h  (was: 3h 20m)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.





[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389219&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389219
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 19/Feb/20 00:34
Start Date: 19/Feb/20 00:34
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #10886: [BEAM-8019] 
Updates DataflowRunner to support multiple SDK environments.
URL: https://github.com/apache/beam/pull/10886#discussion_r381004939
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
 ##
 @@ -576,10 +655,25 @@ def create_job(self, job):
 'A template was just created at location %s', template_location)
 return None
 
+  def _apply_sdk_environment_overrides(self, proto_pipeline):
+    # Update environments based on user provided overrides
+    sdk_overrides = self._sdk_image_overrides
+    if sdk_overrides:
+      for environment in proto_pipeline.components.environments.values():
+        docker_payload = proto_utils.parse_Bytes(
+            environment.payload, beam_runner_api_pb2.DockerPayload)
+        for pattern in sdk_overrides:
+          override = sdk_overrides[pattern]
+          if re.match(pattern, docker_payload.container_image):
+            new_payload = beam_runner_api_pb2.DockerPayload(
+                container_image=override)
 
 Review comment:
   I would have expected to use re.sub(pattern, override, 
docker_payload.container_image).
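
   To make the difference concrete, a sketch comparing the two behaviours on
   plain strings. The helper names and image names are hypothetical; only
   `re.match` and `re.sub` come from the discussion.

```python
import re


def override_with_match(pattern, override, image):
    # Behaviour in the diff: if the pattern matches at the start of the image
    # name, the whole name is replaced by the override.
    return override if re.match(pattern, image) else image


def override_with_sub(pattern, override, image):
    # Reviewer's alternative: only the matched portion is substituted.
    return re.sub(pattern, override, image)
```

   With pattern `gcr.io/old`, override `gcr.io/new` and image
   `gcr.io/old/beam_java_sdk:2.19.0`, the first returns `gcr.io/new` while the
   second returns `gcr.io/new/beam_java_sdk:2.19.0`.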
 



Issue Time Tracking
---

Worklog Id: (was: 389219)
Time Spent: 3.5h  (was: 3h 20m)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.





[jira] [Commented] (BEAM-9299) Upgrade Flink Runner to 1.8.3 and 1.9.2

2020-02-18 Thread sunjincheng (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039584#comment-17039584
 ] 

sunjincheng commented on BEAM-9299:
---

Currently we always test only the latest version of the Flink runner, and I 
think that is by design for now. I agree with you, [~iemejia]: it would be 
better to avoid testing all versions for every PR.

> Upgrade Flink Runner to 1.8.3 and 1.9.2
> ---
>
> Key: BEAM-9299
> URL: https://issues.apache.org/jira/browse/BEAM-9299
> Project: Beam
>  Issue Type: Task
>  Components: runner-flink
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> I would like to upgrade the Flink Runner to 1.8.3 and 1.9.2, because both 
> Apache Flink 1.8.3 and Apache Flink 1.9.2 have been released [1]. 
> What do you think?
> [1] https://dist.apache.org/repos/dist/release/flink/





[jira] [Work logged] (BEAM-8201) clean up the current container API

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8201?focusedWorklogId=389215&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389215
 ]

ASF GitHub Bot logged work on BEAM-8201:


Author: ASF GitHub Bot
Created on: 19/Feb/20 00:17
Start Date: 19/Feb/20 00:17
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #10843: [BEAM-8201] 
Pass all other endpoints through provisioning service.
URL: https://github.com/apache/beam/pull/10843#discussion_r381011326
 
 

 ##
 File path: sdks/go/container/boot.go
 ##
 @@ -46,29 +46,42 @@ func main() {
    if *id == "" {
        log.Fatal("No id provided.")
    }
+   if *provisionEndpoint == "" {
+       log.Fatal("No provision endpoint provided.")
+   }
+
+   ctx := grpcx.WriteWorkerID(context.Background(), *id)
+
+   info, err := provision.Info(ctx, *provisionEndpoint)
+   if err != nil {
+       log.Fatalf("Failed to obtain provisioning information: %v", err)
+   }
+   log.Printf("Provision info:\n%v", info)
+
+   // TODO(BEAM-8201): Simplify once flags are no longer used.
+   if info.GetLoggingEndpoint().GetUrl() != "" {
+       *loggingEndpoint = info.GetLoggingEndpoint().GetUrl()
 
 Review comment:
   Never mind, I overlooked the empty default.
 



Issue Time Tracking
---

Worklog Id: (was: 389215)
Time Spent: 1.5h  (was: 1h 20m)

> clean up the current container API
> --
>
> Key: BEAM-8201
> URL: https://issues.apache.org/jira/browse/BEAM-8201
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> From [~robertwb]
> As part of this project, I propose we look at and clean up the current 
> container API before we "release" it as public and stable. IIRC, we currently 
> provide the worker arguments through a combination of (1) environment 
> variables (2) command line parameters to docker and (3) via the provisioning 
> API. It would be good to have a more principled approach to specifying 
> arguments (either all the same way, or if they vary, good reason for doing so 
> rather than by historical accident).





[jira] [Commented] (BEAM-9326) JsonToRow transform should not use bounded Wildcards for its input

2020-02-18 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039577#comment-17039577
 ] 

Kenneth Knowles commented on BEAM-9326:
---

So basically {{String}} is a special case and this ticket is, to me, about 
optimizing for that special case, if it matters. If it doesn't matter, I don't 
have an opinion. But I assume you hit a problem?

> JsonToRow transform should not use bounded Wildcards for its input
> --
>
> Key: BEAM-9326
> URL: https://issues.apache.org/jira/browse/BEAM-9326
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.20.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The JsonToRow PTransform input is a String (a final class in Java), so there 
> is no reason to define a bounded wildcard as its argument.
> We should use <? extends T> in Beam's codebase only when required by Java
> Generics constraints.





[jira] [Commented] (BEAM-9326) JsonToRow transform should not use bounded Wildcards for its input

2020-02-18 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039576#comment-17039576
 ] 

Kenneth Knowles commented on BEAM-9326:
---

It is important for {{MapElements}}. The data type {{PCollection<T>}} is 
covariant in {{T}}, so when {{S extends T}} the type {{PCollection<S>}} should 
be considered a subtype of {{PCollection<T>}}. Java does not have the ability 
to express this, so every use of the type has to use 
{{PCollection<? extends T>}} for proper type checking, except in a 
contravariant position (i.e. as an output), in which case it should be 
{{PCollection<? super T>}}.

So if you have {{MapElements.via}} of a function {{X -> Y}}, you have to have 
this type in order to apply it to a {{PCollection<A>}} where {{A extends X}}. 
Really, many of these things are not usable without the {{? extends T}} pattern.

Does it cause a problem?

> JsonToRow transform should not use bounded Wildcards for its input
> --
>
> Key: BEAM-9326
> URL: https://issues.apache.org/jira/browse/BEAM-9326
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
> Fix For: 2.20.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The JsonToRow PTransform input is a String (a final class in Java), so there 
> is no reason to define a bounded wildcard as its argument.
> We should use <? extends T> in Beam's codebase only when required by Java
> Generics constraints.





[jira] [Work logged] (BEAM-9333) DataCatalogPipelineOptions is not registered

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9333?focusedWorklogId=389199&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389199
 ]

ASF GitHub Bot logged work on BEAM-9333:


Author: ASF GitHub Bot
Created on: 19/Feb/20 00:02
Start Date: 19/Feb/20 00:02
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #10896: 
[BEAM-9333] Add DataCatalogPipelineOptionsRegistrar
URL: https://github.com/apache/beam/pull/10896
 
 
   Register DataCatalogPipelineOptions so its options can be set from the 
command line.
   

[jira] [Work logged] (BEAM-9333) DataCatalogPipelineOptions is not registered

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9333?focusedWorklogId=389201&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389201
 ]

ASF GitHub Bot logged work on BEAM-9333:


Author: ASF GitHub Bot
Created on: 19/Feb/20 00:02
Start Date: 19/Feb/20 00:02
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #10896: [BEAM-9333] Add 
DataCatalogPipelineOptionsRegistrar
URL: https://github.com/apache/beam/pull/10896#issuecomment-587965771
 
 
   R: @kennknowles 
 



Issue Time Tracking
---

Worklog Id: (was: 389201)
Time Spent: 20m  (was: 10m)

> DataCatalogPipelineOptions is not registered
> 
>
> Key: BEAM-9333
> URL: https://issues.apache.org/jira/browse/BEAM-9333
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> ... so it's not possible to set Data Catalog options from the command line.





[jira] [Work logged] (BEAM-1080) python sdk apiclient needs proper unit tests

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1080?focusedWorklogId=389200&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389200
 ]

ASF GitHub Bot logged work on BEAM-1080:


Author: ASF GitHub Bot
Created on: 19/Feb/20 00:02
Start Date: 19/Feb/20 00:02
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10890: [BEAM-1080] 
Skip tests that required GCP credentials
URL: https://github.com/apache/beam/pull/10890
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 389200)
Time Spent: 2h 20m  (was: 2h 10m)

> python sdk apiclient needs proper unit tests
> 
>
> Key: BEAM-1080
> URL: https://issues.apache.org/jira/browse/BEAM-1080
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Vikas Kedigehalli
>Priority: Major
>  Labels: ccoss2019, newbie, starter
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> There is only one unit test right now that tries to fetch actual gcp 
> credentials instead of mocking. This test fails when the credentials are not 
> available on the machine in which it is running. 
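
A minimal sketch of the mocking approach the description asks for, not the 
actual apiclient test. The names `fetch_gcp_credentials` and `make_client` are 
hypothetical stand-ins; only `unittest.mock.Mock` is a real API here.

```python
from unittest import mock


def fetch_gcp_credentials():
    # Hypothetical stand-in for a helper that needs real GCP credentials.
    raise RuntimeError('no GCP credentials available on this machine')


def make_client(get_credentials=fetch_gcp_credentials):
    # Hypothetical code under test: the credential lookup is injectable,
    # so a test can pass a fake instead of touching the environment.
    return {'credentials': get_credentials()}


def check_make_client_with_fake_credentials():
    # Replace the lookup with a Mock and verify it was used exactly once.
    fake_lookup = mock.Mock(return_value='fake-credentials')
    client = make_client(get_credentials=fake_lookup)
    fake_lookup.assert_called_once_with()
    return client
```

Written this way, the check passes on a machine with no credentials at all.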





[jira] [Work logged] (BEAM-8201) clean up the current container API

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8201?focusedWorklogId=389198&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389198
 ]

ASF GitHub Bot logged work on BEAM-8201:


Author: ASF GitHub Bot
Created on: 18/Feb/20 23:59
Start Date: 18/Feb/20 23:59
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #10843: [BEAM-8201] 
Pass all other endpoints through provisioning service.
URL: https://github.com/apache/beam/pull/10843#discussion_r381006295
 
 

 ##
 File path: sdks/go/container/boot.go
 ##
 @@ -46,29 +46,42 @@ func main() {
    if *id == "" {
        log.Fatal("No id provided.")
    }
+   if *provisionEndpoint == "" {
+       log.Fatal("No provision endpoint provided.")
+   }
+
+   ctx := grpcx.WriteWorkerID(context.Background(), *id)
+
+   info, err := provision.Info(ctx, *provisionEndpoint)
+   if err != nil {
+       log.Fatalf("Failed to obtain provisioning information: %v", err)
+   }
+   log.Printf("Provision info:\n%v", info)
+
+   // TODO(BEAM-8201): Simplify once flags are no longer used.
+   if info.GetLoggingEndpoint().GetUrl() != "" {
+       *loggingEndpoint = info.GetLoggingEndpoint().GetUrl()
 
 Review comment:
   Shall we add default values for all the ports so that we can avoid passing 
them altogether in the future?
 



Issue Time Tracking
---

Worklog Id: (was: 389198)
Time Spent: 1h 20m  (was: 1h 10m)

> clean up the current container API
> --
>
> Key: BEAM-8201
> URL: https://issues.apache.org/jira/browse/BEAM-8201
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> From [~robertwb]
> As part of this project, I propose we look at and clean up the current 
> container API before we "release" it as public and stable. IIRC, we currently 
> provide the worker arguments through a combination of (1) environment 
> variables (2) command line parameters to docker and (3) via the provisioning 
> API. It would be good to have a more principled approach to specifying 
> arguments (either all the same way, or if they vary, good reason for doing so 
> rather than by historical accident).





[jira] [Created] (BEAM-9333) DataCatalogPipelineOptions is not registered

2020-02-18 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-9333:
---

 Summary: DataCatalogPipelineOptions is not registered
 Key: BEAM-9333
 URL: https://issues.apache.org/jira/browse/BEAM-9333
 Project: Beam
  Issue Type: Bug
  Components: dsl-sql
Reporter: Brian Hulette
Assignee: Brian Hulette


... so it's not possible to set Data Catalog options from the command line.





[jira] [Resolved] (BEAM-1833) Restructure Python pipeline construction to better follow the Runner API

2020-02-18 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-1833.
-
Fix Version/s: 2.20.0
   Resolution: Fixed

A first pass at cleaning this up to match has been done. BEAM-9322 describes 
some necessary work to define local "names" when dealing with nested 
structures where multiple tags are defined.

> Restructure Python pipeline construction to better follow the Runner API
> 
>
> Key: BEAM-1833
> URL: https://issues.apache.org/jira/browse/BEAM-1833
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Sam Rohde
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The most important part is removing the runner.apply overrides, but there are 
> also various other improvements (e.g. all inputs and outputs should be named).





[jira] [Work logged] (BEAM-1833) Restructure Python pipeline construction to better follow the Runner API

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1833?focusedWorklogId=389190&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389190
 ]

ASF GitHub Bot logged work on BEAM-1833:


Author: ASF GitHub Bot
Created on: 18/Feb/20 23:48
Start Date: 18/Feb/20 23:48
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10860: [BEAM-1833] 
Fixes BEAM-1833
URL: https://github.com/apache/beam/pull/10860
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 389190)
Time Spent: 2h 20m  (was: 2h 10m)

> Restructure Python pipeline construction to better follow the Runner API
> 
>
> Key: BEAM-1833
> URL: https://issues.apache.org/jira/browse/BEAM-1833
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The most important part is removing the runner.apply overrides, but there are 
> also various other improvements (e.g. all inputs and outputs should be named).





[jira] [Work logged] (BEAM-8201) clean up the current container API

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8201?focusedWorklogId=389189&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389189
 ]

ASF GitHub Bot logged work on BEAM-8201:


Author: ASF GitHub Bot
Created on: 18/Feb/20 23:48
Start Date: 18/Feb/20 23:48
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #10843: [BEAM-8201] Pass 
all other endpoints through provisioning service.
URL: https://github.com/apache/beam/pull/10843#issuecomment-587962132
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 389189)
Time Spent: 1h 10m  (was: 1h)

> clean up the current container API
> --
>
> Key: BEAM-8201
> URL: https://issues.apache.org/jira/browse/BEAM-8201
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> From [~robertwb]
> As part of this project, I propose we look at and clean up the current 
> container API before we "release" it as public and stable. IIRC, we currently 
> provide the worker arguments through a combination of (1) environment 
> variables (2) command line parameters to docker and (3) via the provisioning 
> API. It would be good to have a more principled approach to specifying 
> arguments (either all the same way, or if they vary, good reason for doing so 
> rather than by historical accident).





[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=389169&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389169
 ]

ASF GitHub Bot logged work on BEAM-7246:


Author: ASF GitHub Bot
Created on: 18/Feb/20 23:20
Start Date: 18/Feb/20 23:20
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #10712: 
[BEAM-7246] Added Google Spanner Write Transform
URL: https://github.com/apache/beam/pull/10712
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 389169)
Time Spent: 22.5h  (was: 22h 20m)

> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 22.5h
>  Remaining Estimate: 0h
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.





[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389168&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389168
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 18/Feb/20 23:17
Start Date: 18/Feb/20 23:17
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10886: [BEAM-8019] 
Updates DataflowRunner to support multiple SDK environments.
URL: https://github.com/apache/beam/pull/10886#discussion_r380992944
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
 ##
 @@ -153,6 +158,9 @@ def __init__(self, packages, options, environment_version, pipeline_url):
 # User agent information.
 self.proto.userAgent = dataflow.Environment.UserAgentValue()
 self.local = 'localhost' in self.google_cloud_options.dataflow_endpoint
+self._proto_pipeline = proto_pipeline
+self._sdk_image_overrides = (
 
 Review comment:
   OK. We can leave it as is.
 



Issue Time Tracking
---

Worklog Id: (was: 389168)
Time Spent: 3h 10m  (was: 3h)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.





[jira] [Work logged] (BEAM-1080) python sdk apiclient needs proper unit tests

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1080?focusedWorklogId=389165&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389165
 ]

ASF GitHub Bot logged work on BEAM-1080:


Author: ASF GitHub Bot
Created on: 18/Feb/20 23:11
Start Date: 18/Feb/20 23:11
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10890: [BEAM-1080] Skip 
tests that required GCP credentials
URL: https://github.com/apache/beam/pull/10890#issuecomment-587951040
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 389165)
Time Spent: 2h 10m  (was: 2h)

> python sdk apiclient needs proper unit tests
> 
>
> Key: BEAM-1080
> URL: https://issues.apache.org/jira/browse/BEAM-1080
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Vikas Kedigehalli
>Priority: Major
>  Labels: ccoss2019, newbie, starter
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> There is only one unit test right now that tries to fetch actual gcp 
> credentials instead of mocking. This test fails when the credentials are not 
> available on the machine in which it is running. 
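
The mocking approach the issue asks for could look roughly like the sketch below. It is only an illustration: `CredentialProvider`, `ApiClient` and `build_client_with_fake_credentials` are hypothetical names, not the actual apiclient code, and the real credential lookup in Beam may be structured differently. The only library call relied on is `unittest.mock.patch.object`.

```python
from unittest import mock


class CredentialsUnavailable(Exception):
    """Raised when no real credentials can be found."""


class CredentialProvider:
    """Hypothetical stand-in for a real credential lookup."""

    @staticmethod
    def get():
        # Mimics a machine that has no credentials configured.
        raise CredentialsUnavailable("no GCP credentials on this machine")


class ApiClient:
    """Hypothetical client that needs credentials at construction time."""

    def __init__(self):
        self.credentials = CredentialProvider.get()


def build_client_with_fake_credentials():
    # Patch the lookup so the test never touches the real environment.
    with mock.patch.object(
            CredentialProvider, "get", return_value="fake-token") as fake_get:
        client = ApiClient()
    return client, fake_get
```

With the lookup patched, the client can be constructed on any machine; outside the `with` block the original lookup is restored and would fail again where no credentials exist.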





[jira] [Work logged] (BEAM-8280) re-enable IOTypeHints.from_callable

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8280?focusedWorklogId=389164&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389164
 ]

ASF GitHub Bot logged work on BEAM-8280:


Author: ASF GitHub Bot
Created on: 18/Feb/20 23:11
Start Date: 18/Feb/20 23:11
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #10894: [BEAM-8280] 
IOTypeHints debug_str traceback
URL: https://github.com/apache/beam/pull/10894
 
 
   - Improve format
   - Add multiple bases (in `.with_defaults()`)
   - Include debug_str output in ptransform.py TypeCheckErrors.
   
   
   

[jira] [Comment Edited] (BEAM-9198) BeamSQL aggregation analytics functions

2020-02-18 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039304#comment-17039304
 ] 

Rui Wang edited comment on BEAM-9198 at 2/18/20 11:03 PM:
--

There are many analytic functions. I suggest starting with a single function, 
e.g. rank(), while aiming to support the full function syntax first:


{code:sql}
analytic_function_name ( [ argument_list ] )
  OVER (
    [ PARTITION BY partition_expression_list ]
    [ ORDER BY expression [{ ASC | DESC }] [, ...] ]
    [ window_frame_clause ]
  )
{code}

Doing so will expose you to many concepts in distributed computing (e.g. 
per-key computation, sorting) and in SQL (e.g. the parser and the planner).
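
To make the per-key computation and sorting concrete, here is a minimal single-machine sketch of what rank() with PARTITION BY and ORDER BY computes. It is an illustration only, not BeamSQL code: `rank_over` and its parameters are hypothetical names, rows are assumed to be plain dicts, and the window frame clause is ignored.

```python
from collections import defaultdict


def rank_over(rows, partition_key, order_key, descending=False):
    """Sketch of RANK() OVER (PARTITION BY partition_key ORDER BY order_key)."""
    # Per-key step: group rows by their partition value.
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)

    ranked = []
    for key in sorted(partitions):
        # Sorting step: order rows inside each partition.
        ordered = sorted(
            partitions[key], key=lambda r: r[order_key], reverse=descending)
        prev_value, prev_rank = object(), 0
        for position, row in enumerate(ordered, start=1):
            # Ties share a rank; the next distinct value jumps to its position.
            rank = prev_rank if row[order_key] == prev_value else position
            ranked.append({**row, "rank": rank})
            prev_value, prev_rank = row[order_key], rank
    return ranked
```

In a distributed setting the grouping step would typically become a shuffle by key, and the per-partition sort is where most of the cost and design work tends to sit.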





> BeamSQL aggregation analytics functions 
> 
>
> Key: BEAM-9198
> URL: https://issues.apache.org/jira/browse/BEAM-9198
> Project: Beam
>  Issue Type: Task
>  Components: dsl-sql
>Reporter: Rui Wang
>Priority: Major
>  Labels: gsoc, gsoc2020, mentor
>
> BeamSQL has a long list of aggregation and analytic functions to support. 
> To begin with, you will need to support this syntax:
> {code:sql}
> analytic_function_name ( [ argument_list ] )
>   OVER (
>     [ PARTITION BY partition_expression_list ]
>     [ ORDER BY expression [{ ASC | DESC }] [, ...] ]
>     [ window_frame_clause ]
>   )
> {code}
> This will require touching core components of BeamSQL:
> 1. SQL parser: support the syntax above.
> 2. SQL core: implement the physical relational operator.
> 3. Distributed algorithms: implement a list of functions in a distributed 
> manner. 
> 4. Benchmarks: measure the performance of your implementation.





[jira] [Work logged] (BEAM-1833) Restructure Python pipeline construction to better follow the Runner API

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1833?focusedWorklogId=389158&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389158
 ]

ASF GitHub Bot logged work on BEAM-1833:


Author: ASF GitHub Bot
Created on: 18/Feb/20 22:55
Start Date: 18/Feb/20 22:55
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10860: [BEAM-1833] Fixes 
BEAM-1833
URL: https://github.com/apache/beam/pull/10860#issuecomment-587945517
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 389158)
Time Spent: 2h 10m  (was: 2h)

> Restructure Python pipeline construction to better follow the Runner API
> 
>
> Key: BEAM-1833
> URL: https://issues.apache.org/jira/browse/BEAM-1833
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The most important part is removing the runner.apply overrides, but there are 
> also various other improvements (e.g. all inputs and outputs should be named).





[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=389151&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389151
 ]

ASF GitHub Bot logged work on BEAM-8019:


Author: ASF GitHub Bot
Created on: 18/Feb/20 22:44
Start Date: 18/Feb/20 22:44
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #10886: 
[BEAM-8019] Updates DataflowRunner to support multiple SDK environments.
URL: https://github.com/apache/beam/pull/10886#discussion_r380980064
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/internal/apiclient.py
 ##
 @@ -153,6 +158,9 @@ def __init__(self, packages, options, environment_version, pipeline_url):
 # User agent information.
 self.proto.userAgent = dataflow.Environment.UserAgentValue()
 self.local = 'localhost' in self.google_cloud_options.dataflow_endpoint
+self._proto_pipeline = proto_pipeline
+self._sdk_image_overrides = (
 
 Review comment:
   Seems like that results in a lint error (Dangerous default value dict() 
(builtins.dict) as argument).
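
For context on that lint warning, the sketch below shows why a `dict()` default is flagged and the usual `None`-sentinel alternative. The function names and the `overrides` parameter are hypothetical and are not taken from apiclient.py.

```python
def add_override_shared(name, image, overrides=dict()):
    # Flagged by the linter: the default dict is created once, at function
    # definition time, and is then shared by every call that omits the argument.
    overrides[name] = image
    return overrides


def add_override_safe(name, image, overrides=None):
    # Common alternative: use None as the default and build a fresh dict
    # inside the function on each call.
    if overrides is None:
        overrides = {}
    overrides[name] = image
    return overrides
```

Two calls to `add_override_shared` that omit `overrides` end up writing into the same dict, which is the surprise the warning is meant to prevent.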
 



Issue Time Tracking
---

Worklog Id: (was: 389151)
Time Spent: 3h  (was: 2h 50m)

> Support cross-language transforms for DataflowRunner
> 
>
> Key: BEAM-8019
> URL: https://issues.apache.org/jira/browse/BEAM-8019
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> This is to capture the Beam changes needed for this task.





[jira] [Work logged] (BEAM-1080) python sdk apiclient needs proper unit tests

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1080?focusedWorklogId=389145&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389145
 ]

ASF GitHub Bot logged work on BEAM-1080:


Author: ASF GitHub Bot
Created on: 18/Feb/20 22:34
Start Date: 18/Feb/20 22:34
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10890: [BEAM-1080] Skip 
tests that required GCP credentials
URL: https://github.com/apache/beam/pull/10890#issuecomment-587938058
 
 
   > Thanks, should we reopen BEAM-1080?
   
   Re-opened. Not sure why it was resolved.
 



Issue Time Tracking
---

Worklog Id: (was: 389145)
Time Spent: 2h  (was: 1h 50m)

> python sdk apiclient needs proper unit tests
> 
>
> Key: BEAM-1080
> URL: https://issues.apache.org/jira/browse/BEAM-1080
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Vikas Kedigehalli
>Priority: Major
>  Labels: ccoss2019, newbie, starter
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> There is only one unit test right now that tries to fetch actual gcp 
> credentials instead of mocking. This test fails when the credentials are not 
> available on the machine in which it is running. 





[jira] [Work logged] (BEAM-1833) Restructure Python pipeline construction to better follow the Runner API

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1833?focusedWorklogId=389144&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389144
 ]

ASF GitHub Bot logged work on BEAM-1833:


Author: ASF GitHub Bot
Created on: 18/Feb/20 22:33
Start Date: 18/Feb/20 22:33
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10860: [BEAM-1833] Fixes 
BEAM-1833
URL: https://github.com/apache/beam/pull/10860#issuecomment-587938021
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 389144)
Time Spent: 2h  (was: 1h 50m)

> Restructure Python pipeline construction to better follow the Runner API
> 
>
> Key: BEAM-1833
> URL: https://issues.apache.org/jira/browse/BEAM-1833
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The most important part is removing the runner.apply overrides, but there are 
> also various other improvements (e.g. all inputs and outputs should be named).





[jira] [Reopened] (BEAM-1080) python sdk apiclient needs proper unit tests

2020-02-18 Thread Ahmet Altay (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay reopened BEAM-1080:
---
  Assignee: (was: Pablo Estrada)

> python sdk apiclient needs proper unit tests
> 
>
> Key: BEAM-1080
> URL: https://issues.apache.org/jira/browse/BEAM-1080
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Vikas Kedigehalli
>Priority: Major
>  Labels: ccoss2019, newbie, starter
> Fix For: 2.17.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> There is only one unit test right now that tries to fetch actual gcp 
> credentials instead of mocking. This test fails when the credentials are not 
> available on the machine in which it is running. 





[jira] [Updated] (BEAM-1080) python sdk apiclient needs proper unit tests

2020-02-18 Thread Ahmet Altay (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Altay updated BEAM-1080:
--
Fix Version/s: (was: 2.17.0)

> python sdk apiclient needs proper unit tests
> 
>
> Key: BEAM-1080
> URL: https://issues.apache.org/jira/browse/BEAM-1080
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Vikas Kedigehalli
>Priority: Major
>  Labels: ccoss2019, newbie, starter
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> There is only one unit test right now that tries to fetch actual gcp 
> credentials instead of mocking. This test fails when the credentials are not 
> available on the machine in which it is running. 





[jira] [Resolved] (BEAM-8629) WithTypeHints._get_or_create_type_hints may return a mutable copy of the class type hints.

2020-02-18 Thread Udi Meiri (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri resolved BEAM-8629.
-
Fix Version/s: 2.20.0
   Resolution: Fixed

Above PR has been merged

> WithTypeHints._get_or_create_type_hints may return a mutable copy of the 
> class type hints.
> --
>
> Key: BEAM-8629
> URL: https://issues.apache.org/jira/browse/BEAM-8629
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-9287) Python Validates runner tests for Unified Worker

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9287?focusedWorklogId=389140&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389140
 ]

ASF GitHub Bot logged work on BEAM-9287:


Author: ASF GitHub Bot
Created on: 18/Feb/20 22:27
Start Date: 18/Feb/20 22:27
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #10863: [BEAM-9287] Add 
Python streaming Validates runner tests for Unified Worker
URL: https://github.com/apache/beam/pull/10863#issuecomment-587935706
 
 
   Run Python Dataflow ValidatesRunner
 



Issue Time Tracking
---

Worklog Id: (was: 389140)
Time Spent: 50m  (was: 40m)

> Python Validates runner tests for Unified Worker
> 
>
> Key: BEAM-9287
> URL: https://issues.apache.org/jira/browse/BEAM-9287
> Project: Beam
>  Issue Type: Test
>  Components: runner-dataflow, testing
>Reporter: Ankur Goenka
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-5605) Support Portable SplittableDoFn for batch

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5605?focusedWorklogId=389138&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389138
 ]

ASF GitHub Bot logged work on BEAM-5605:


Author: ASF GitHub Bot
Created on: 18/Feb/20 22:24
Start Date: 18/Feb/20 22:24
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10893: [BEAM-5605] Honor 
the bounded source timestamps timestamp in the SDF wrapper.
URL: https://github.com/apache/beam/pull/10893#issuecomment-587934667
 
 
   R: @boyuanzz 
 



Issue Time Tracking
---

Worklog Id: (was: 389138)
Time Spent: 16h  (was: 15h 50m)

> Support Portable SplittableDoFn for batch
> -
>
> Key: BEAM-5605
> URL: https://issues.apache.org/jira/browse/BEAM-5605
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Scott Wegner
>Assignee: Luke Cwik
>Priority: Major
>  Labels: portability
>  Time Spent: 16h
>  Remaining Estimate: 0h
>
> Roll-up item tracking work towards supporting portable SplittableDoFn for 
> batch





[jira] [Work logged] (BEAM-5605) Support Portable SplittableDoFn for batch

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5605?focusedWorklogId=389137&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389137
 ]

ASF GitHub Bot logged work on BEAM-5605:


Author: ASF GitHub Bot
Created on: 18/Feb/20 22:24
Start Date: 18/Feb/20 22:24
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #10893: [BEAM-5605] 
Honor the bounded source timestamps timestamp in the SDF wrapper.
URL: https://github.com/apache/beam/pull/10893
 
 
   
   
   
   

[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=389136&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389136
 ]

ASF GitHub Bot logged work on BEAM-6522:


Author: ASF GitHub Bot
Created on: 18/Feb/20 22:21
Start Date: 18/Feb/20 22:21
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #10876: 
[BEAM-6522] Exclude the test that fails on earlier versions of Avro dependency 
that are still within allowed version range.
URL: https://github.com/apache/beam/pull/10876
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 389136)
Time Spent: 11.5h  (was: 11h 20m)

> Dill fails to pickle  avro.RecordSchema classes on Python 3.
> 
>
> Key: BEAM-6522
> URL: https://issues.apache.org/jira/browse/BEAM-6522
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> The avroio module still has 4 failing tests. These are the same 2 tests, 
> each run once for Avro and once for Fastavro.
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform*
>  *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform*
> fail with:
> {code:java}
> Traceback (most recent call last):
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", 
> line 432, in test_sink_transform
> | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line 
> 528, in expand
> return pcoll | beam.io.iobase.Write(self._sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 960, in expand
> return pcoll | WriteImpl(self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 979, in expand
> lambda _, sink: sink.initialize_write(), self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1103, in Map
> pardo = FlatMap(wrapper, *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1054, in FlatMap
> pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 864, in __init__
> super(ParDo, self).__init__(fn, *args, **kwargs)
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 646, in __init__
> self.args = pickler.loads(pickler.dumps(self.args))
> File 
> "/home/robbe/workspace/beam/sd

[jira] [Work logged] (BEAM-9321) BigQuery avro write logical type support

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9321?focusedWorklogId=389132&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389132
 ]

ASF GitHub Bot logged work on BEAM-9321:


Author: ASF GitHub Bot
Created on: 18/Feb/20 22:13
Start Date: 18/Feb/20 22:13
Worklog Time Spent: 10m 
  Work Description: regadas commented on issue #10869: [BEAM-9321] Add 
BigQuery Avro logical type support on write
URL: https://github.com/apache/beam/pull/10869#issuecomment-587929726
 
 
   R: @lukecwik 
 



Issue Time Tracking
---

Worklog Id: (was: 389132)
Time Spent: 0.5h  (was: 20m)

> BigQuery avro write logical type support
> 
>
> Key: BEAM-9321
> URL: https://issues.apache.org/jira/browse/BEAM-9321
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.19.0
>Reporter: Filipe Regadas
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> With 2.18.0 we are able to write GenericRecords to BigQuery. However, writing 
> does not respect the Avro <-> BigQuery data type conversion 
> ([docs|https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#logical_types]);
> to fix this, we need to set the useAvroLogicalTypes option.
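
As a rough illustration of why the option matters (a sketch using only the Python standard library, not Beam or BigQuery code): Avro stores its `date` logical type as an int counting days since the Unix epoch. A loader that ignores the logical type annotation therefore sees only the raw integer. The helper names `to_avro_date` and `from_avro_date` are hypothetical.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def to_avro_date(d: date) -> int:
    """Encode a date as Avro's `date` logical type does: days since the epoch."""
    return (d - EPOCH).days


def from_avro_date(days: int) -> date:
    """Decode the integer back into a calendar date (what a logical-type-aware reader does)."""
    return EPOCH + timedelta(days=days)


raw = to_avro_date(date(2020, 2, 18))
print(raw)                  # the plain integer a reader sees if it ignores the logical type
print(from_avro_date(raw))  # the date recovered when the logical type is honoured
```

Without the logical type, the column would most likely be loaded as a plain integer rather than a DATE, which is the mismatch the issue describes.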



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=389128&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389128
 ]

ASF GitHub Bot logged work on BEAM-8335:


Author: ASF GitHub Bot
Created on: 18/Feb/20 22:04
Start Date: 18/Feb/20 22:04
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #10892: 
[BEAM-8335] Make TestStream to/from runner_api include the output_tags property.
URL: https://github.com/apache/beam/pull/10892
 
 
   The TestStream has the "output_tags" property, which keeps track of which 
events go to which PCollection. A TestStream going through a round-trip to/from 
proto won't have these fields set. To fix this, from_runner_api_parameter needs 
to be modified to also take the parent PTransform proto, so that the tags can be 
retrieved from its outputs.
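
The round-trip problem described above can be sketched in plain Python. This is a simplified model under stated assumptions, not the Beam implementation: `TransformProto` and `FakeTestStream` are hypothetical stand-ins, and the only point is that the tags live on the parent transform's outputs rather than in the payload.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class TransformProto:
    """Hypothetical stand-in for the parent PTransform proto."""
    payload: List[str]                                      # serialized events only
    outputs: Dict[str, str] = field(default_factory=dict)   # tag -> PCollection id


@dataclass
class FakeTestStream:
    """Hypothetical stand-in for TestStream."""
    events: List[str]
    output_tags: Set[str] = field(default_factory=set)

    def to_proto(self, pcoll_ids: Dict[str, str]) -> TransformProto:
        # The tags are not written into the payload; they survive only as
        # the keys of the parent transform's outputs.
        return TransformProto(payload=list(self.events), outputs=dict(pcoll_ids))

    @staticmethod
    def from_payload_only(proto: TransformProto) -> "FakeTestStream":
        # Reads the payload alone, so output_tags comes back empty.
        return FakeTestStream(events=list(proto.payload))

    @staticmethod
    def from_proto_with_parent(proto: TransformProto) -> "FakeTestStream":
        # Also consults the parent's outputs to restore the tags.
        return FakeTestStream(events=list(proto.payload), output_tags=set(proto.outputs))


original = FakeTestStream(events=["a", "b"], output_tags={"main", "side"})
proto = original.to_proto({"main": "pc_1", "side": "pc_2"})
print(FakeTestStream.from_payload_only(proto).output_tags)
print(FakeTestStream.from_proto_with_parent(proto) == original)
```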
   
   
   

[jira] [Work logged] (BEAM-1080) python sdk apiclient needs proper unit tests

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1080?focusedWorklogId=389127&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389127
 ]

ASF GitHub Bot logged work on BEAM-1080:


Author: ASF GitHub Bot
Created on: 18/Feb/20 22:01
Start Date: 18/Feb/20 22:01
Worklog Time Spent: 10m 
  Work Description: y1chi commented on issue #10890: [BEAM-1080] Skip tests 
that required GCP credentials
URL: https://github.com/apache/beam/pull/10890#issuecomment-587924966
 
 
   Thanks, should we reopen BEAM-1080?
 



Issue Time Tracking
---

Worklog Id: (was: 389127)
Time Spent: 1h 50m  (was: 1h 40m)

> python sdk apiclient needs proper unit tests
> 
>
> Key: BEAM-1080
> URL: https://issues.apache.org/jira/browse/BEAM-1080
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Vikas Kedigehalli
>Assignee: Pablo Estrada
>Priority: Major
>  Labels: ccoss2019, newbie, starter
> Fix For: 2.17.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> There is only one unit test right now that tries to fetch actual GCP 
> credentials instead of mocking them. This test fails when the credentials are 
> not available on the machine on which it is running. 
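
A minimal sketch of the mocking approach the description asks for, using only `unittest.mock` from the standard library. `Auth.fetch_credentials` and `build_client` are hypothetical stand-ins, not the actual apiclient code; the assumption is simply that the client obtains credentials through one replaceable call.

```python
import unittest
from unittest import mock


class Auth:
    """Hypothetical stand-in for the code path that contacts GCP."""

    @staticmethod
    def fetch_credentials():
        raise RuntimeError("no GCP credentials available on this machine")


def build_client():
    """Hypothetical client factory that needs credentials."""
    return {"credentials": Auth.fetch_credentials()}


class BuildClientTest(unittest.TestCase):
    def test_build_client_with_mocked_credentials(self):
        # Replace the real lookup so the test never touches the network
        # or the machine's credential store.
        with mock.patch.object(
            Auth, "fetch_credentials", return_value="fake-token"
        ) as fake:
            client = build_client()
        fake.assert_called_once_with()
        self.assertEqual(client["credentials"], "fake-token")
```

With the lookup patched, the test passes on any machine, whether or not real credentials are present.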





[jira] [Work logged] (BEAM-1080) python sdk apiclient needs proper unit tests

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1080?focusedWorklogId=389126&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389126
 ]

ASF GitHub Bot logged work on BEAM-1080:


Author: ASF GitHub Bot
Created on: 18/Feb/20 22:00
Start Date: 18/Feb/20 22:00
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10890: [BEAM-1080] Skip 
tests that required GCP credentials
URL: https://github.com/apache/beam/pull/10890#issuecomment-587924568
 
 
   R: @y1chi 
 



Issue Time Tracking
---

Worklog Id: (was: 389126)
Time Spent: 1h 40m  (was: 1.5h)

> python sdk apiclient needs proper unit tests
> 
>
> Key: BEAM-1080
> URL: https://issues.apache.org/jira/browse/BEAM-1080
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Vikas Kedigehalli
>Assignee: Pablo Estrada
>Priority: Major
>  Labels: ccoss2019, newbie, starter
> Fix For: 2.17.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> There is only one unit test right now that tries to fetch actual GCP 
> credentials instead of mocking them. This test fails when the credentials are 
> not available on the machine on which it is running. 





[jira] [Work logged] (BEAM-1080) python sdk apiclient needs proper unit tests

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1080?focusedWorklogId=389125&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389125
 ]

ASF GitHub Bot logged work on BEAM-1080:


Author: ASF GitHub Bot
Created on: 18/Feb/20 22:00
Start Date: 18/Feb/20 22:00
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #10890: [BEAM-1080] 
Skip tests that required GCP credentials
URL: https://github.com/apache/beam/pull/10890
 
 
   Follow up to https://github.com/apache/beam/pull/10829
   
   
   

[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=389124&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389124
 ]

ASF GitHub Bot logged work on BEAM-7246:


Author: ASF GitHub Bot
Created on: 18/Feb/20 21:57
Start Date: 18/Feb/20 21:57
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #10712: [BEAM-7246] 
Added Google Spanner Write Transform
URL: https://github.com/apache/beam/pull/10712#issuecomment-587923073
 
 
   LGTM. Thanks.
 



Issue Time Tracking
---

Worklog Id: (was: 389124)
Time Spent: 22h 20m  (was: 22h 10m)

> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 22h 20m
>  Remaining Estimate: 0h
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.





[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=389122&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389122
 ]

ASF GitHub Bot logged work on BEAM-7246:


Author: ASF GitHub Bot
Created on: 18/Feb/20 21:53
Start Date: 18/Feb/20 21:53
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10712: [BEAM-7246] Added 
Google Spanner Write Transform
URL: https://github.com/apache/beam/pull/10712#issuecomment-587916297
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 389122)
Time Spent: 22h 10m  (was: 22h)

> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 22h 10m
>  Remaining Estimate: 0h
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.





[jira] [Work logged] (BEAM-7246) Create a Spanner IO for Python

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7246?focusedWorklogId=389121&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389121
 ]

ASF GitHub Bot logged work on BEAM-7246:


Author: ASF GitHub Bot
Created on: 18/Feb/20 21:52
Start Date: 18/Feb/20 21:52
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10712: [BEAM-7246] Added 
Google Spanner Write Transform
URL: https://github.com/apache/beam/pull/10712#issuecomment-587914900
 
 
   retest this please
 



Issue Time Tracking
---

Worklog Id: (was: 389121)
Time Spent: 22h  (was: 21h 50m)

> Create a Spanner IO for Python
> --
>
> Key: BEAM-7246
> URL: https://issues.apache.org/jira/browse/BEAM-7246
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 22h
>  Remaining Estimate: 0h
>
> Add I/O support for Google Cloud Spanner for the Python SDK (Batch Only).
> Testing in this work item will be in the form of DirectRunner tests and 
> manual testing.
> Integration and performance tests are a separate work item (not included 
> here).
> See https://beam.apache.org/documentation/io/built-in/. The goal is to add 
> Google Cloud Spanner to the Database column for the Python/Batch row.





[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=389119&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389119
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Feb/20 21:47
Start Date: 18/Feb/20 21:47
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #10835: [BEAM-8575] Removed 
MAX_TIMESTAMP from testing data
URL: https://github.com/apache/beam/pull/10835#issuecomment-587907253
 
 
   Will wait for the tests to pass and merge
 



Issue Time Tracking
---

Worklog Id: (was: 389119)
Time Spent: 54.5h  (was: 54h 20m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 54.5h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.





[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=389118&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389118
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Feb/20 21:46
Start Date: 18/Feb/20 21:46
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10835: [BEAM-8575] 
Removed MAX_TIMESTAMP from testing data
URL: https://github.com/apache/beam/pull/10835#issuecomment-587906045
 
 
   Run PythonLint PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 389118)
Time Spent: 54h 20m  (was: 54h 10m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 54h 20m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.





[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=389116&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389116
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Feb/20 21:45
Start Date: 18/Feb/20 21:45
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10835: [BEAM-8575] 
Removed MAX_TIMESTAMP from testing data
URL: https://github.com/apache/beam/pull/10835#issuecomment-587904955
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 389116)
Time Spent: 54h  (was: 53h 50m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 54h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.





[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=389117&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389117
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Feb/20 21:45
Start Date: 18/Feb/20 21:45
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10835: [BEAM-8575] 
Removed MAX_TIMESTAMP from testing data
URL: https://github.com/apache/beam/pull/10835#issuecomment-587905264
 
 
   Run PythonLint PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 389117)
Time Spent: 54h 10m  (was: 54h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 54h 10m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.





[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389114&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389114
 ]

ASF GitHub Bot logged work on BEAM-8564:


Author: ASF GitHub Bot
Created on: 18/Feb/20 21:37
Start Date: 18/Feb/20 21:37
Worklog Time Spent: 10m 
  Work Description: amoght commented on issue #10254: [BEAM-8564] Add LZO 
compression and decompression support
URL: https://github.com/apache/beam/pull/10254#issuecomment-587893601
 
 
   @lukecwik I've incorporated almost all of the suggested changes in the PR. 
Please let me know your thoughts on this.
   
 



Issue Time Tracking
---

Worklog Id: (was: 389114)
Time Spent: 10.5h  (was: 10h 20m)

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Amogh Tiwari
>Assignee: Amogh Tiwari
>Priority: Minor
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm which is focused on compression 
> and decompression speeds.
> This will enable Apache Beam sdk to compress/decompress files using LZO 
> compression algorithm. 
> This will include the following functionalities:
>  # compress() : for compressing files into an LZO archive
>  # decompress() : for decompressing files archived using LZO compression
> Appropriate Input and Output stream will also be added to enable working with 
> LZO files.





[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389111&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389111
 ]

ASF GitHub Bot logged work on BEAM-8564:


Author: ASF GitHub Bot
Created on: 18/Feb/20 21:33
Start Date: 18/Feb/20 21:33
Worklog Time Spent: 10m 
  Work Description: amoght commented on pull request #10254: [BEAM-8564] 
Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r380947952
 
 

 ##
 File path: 
sdks/java/core/src/test/java/org/apache/beam/sdk/io/CompressedSourceTest.java
 ##
 @@ -738,7 +1069,133 @@ public void testGzipProgress() throws IOException {
   assertThat(readerOrig, instanceOf(CompressedReader.class));
   CompressedReader reader = (CompressedReader) readerOrig;
   // before starting
-  assertEquals(0.0, reader.getFractionConsumed(), 1e-6);
+  assertEquals(0.0, reader.getFractionConsumed(), DELTA);
+  assertEquals(0, reader.getSplitPointsConsumed());
+  assertEquals(1, reader.getSplitPointsRemaining());
+
+  // confirm has three records
+  for (int i = 0; i < numRecords; ++i) {
+    if (i == 0) {
+      assertTrue(reader.start());
+    } else {
+      assertTrue(reader.advance());
+    }
+    assertEquals(0, reader.getSplitPointsConsumed());
+    assertEquals(1, reader.getSplitPointsRemaining());
+  }
+  assertFalse(reader.advance());
+
+  // after reading empty source
 
 Review comment:
   For now we have added a comment warning users that a concatenated LZO file 
does not get decompressed correctly. It is added above the 
testFalseReadConcatenatedLzop and testFalseReadMultiStreamLzop methods. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389111)
Time Spent: 10h 20m  (was: 10h 10m)

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Amogh Tiwari
>Assignee: Amogh Tiwari
>Priority: Minor
>  Time Spent: 10h 20m
>  Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm focused on compression 
> and decompression speed.
> This will enable the Apache Beam SDK to compress/decompress files using the LZO 
> compression algorithm. 
> This will include the following functionality:
>  # compress() : for compressing files into an LZO archive
>  # decompress() : for decompressing files archived using LZO compression
> Appropriate input and output streams will also be added to enable working with 
> LZO files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=389107&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389107
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Feb/20 21:25
Start Date: 18/Feb/20 21:25
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10835: [BEAM-8575] 
Removed MAX_TIMESTAMP from testing data
URL: https://github.com/apache/beam/pull/10835#issuecomment-587875180
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389107)
Time Spent: 53h 50m  (was: 53h 40m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 53h 50m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=389106&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389106
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Feb/20 21:25
Start Date: 18/Feb/20 21:25
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10835: [BEAM-8575] 
Removed MAX_TIMESTAMP from testing data
URL: https://github.com/apache/beam/pull/10835#issuecomment-587875076
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389106)
Time Spent: 53h 40m  (was: 53.5h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 53h 40m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=389105&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389105
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Feb/20 21:24
Start Date: 18/Feb/20 21:24
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10835: [BEAM-8575] 
Removed MAX_TIMESTAMP from testing data
URL: https://github.com/apache/beam/pull/10835#issuecomment-587874929
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389105)
Time Spent: 53.5h  (was: 53h 20m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 53.5h
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-1833) Restructure Python pipeline construction to better follow the Runner API

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1833?focusedWorklogId=389104&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389104
 ]

ASF GitHub Bot logged work on BEAM-1833:


Author: ASF GitHub Bot
Created on: 18/Feb/20 21:24
Start Date: 18/Feb/20 21:24
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #10860: 
[BEAM-1833] Fixes BEAM-1833
URL: https://github.com/apache/beam/pull/10860#discussion_r380943475
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -314,7 +314,15 @@ def _replace_if_needed(self, original_transform_node):
 new_output.element_type = None
 
 Review comment:
   Added the assert
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389104)
Time Spent: 1h 50m  (was: 1h 40m)

> Restructure Python pipeline construction to better follow the Runner API
> 
>
> Key: BEAM-1833
> URL: https://issues.apache.org/jira/browse/BEAM-1833
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The most important part is removing the runner.apply overrides, but there are 
> also various other improvements (e.g. all inputs and outputs should be named).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8916) external_test_it.py is not collected by pytest

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8916?focusedWorklogId=389103&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389103
 ]

ASF GitHub Bot logged work on BEAM-8916:


Author: ASF GitHub Bot
Created on: 18/Feb/20 21:23
Start Date: 18/Feb/20 21:23
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10749: [BEAM-8916] Rename 
external_test_it so that it is picked up by pytest
URL: https://github.com/apache/beam/pull/10749#issuecomment-587872991
 
 
   Run Python2_PVR_Flink PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389103)
Time Spent: 1h 20m  (was: 1h 10m)

> external_test_it.py is not collected by pytest
> --
>
> Key: BEAM-8916
> URL: https://issues.apache.org/jira/browse/BEAM-8916
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, testing
>Reporter: Udi Meiri
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Critical
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> pytest only collects tests matching these patterns:
> https://github.com/apache/beam/blob/8066d78f0fd2237b718859d4a776511203880df0/sdks/python/pytest.ini#L27
> Please rename the file. (ex: external_integration_test.py)
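
The collection rule described above can be sketched with plain glob matching. This is only an illustration, not the project's actual configuration: the two patterns below are pytest's documented defaults for python_files and are an assumption here (the patterns in the linked pytest.ini may differ), and is_collected is a hypothetical helper that only approximates how pytest matches file names.

```python
from fnmatch import fnmatch

# Assumed patterns: pytest's default `python_files` globs.
# The linked pytest.ini may list different ones.
PATTERNS = ["test_*.py", "*_test.py"]


def is_collected(filename, patterns=PATTERNS):
    """Return True if the file name matches at least one glob pattern."""
    return any(fnmatch(filename, pattern) for pattern in patterns)


if __name__ == "__main__":
    for name in ("external_test_it.py", "external_integration_test.py"):
        print(name, is_collected(name))
```

Under these assumed patterns the current file name matches neither glob, while the suggested name matches the second one.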



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=389102&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389102
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Feb/20 21:23
Start Date: 18/Feb/20 21:23
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on issue #10835: [BEAM-8575] 
Removed MAX_TIMESTAMP from testing data
URL: https://github.com/apache/beam/pull/10835#issuecomment-587872503
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389102)
Time Spent: 53h 20m  (was: 53h 10m)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 53h 20m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8575) Add more Python validates runner tests

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8575?focusedWorklogId=389101&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389101
 ]

ASF GitHub Bot logged work on BEAM-8575:


Author: ASF GitHub Bot
Created on: 18/Feb/20 21:19
Start Date: 18/Feb/20 21:19
Worklog Time Spent: 10m 
  Work Description: bumblebee-coming commented on issue #10835: [BEAM-8575] 
Removed MAX_TIMESTAMP from testing data
URL: https://github.com/apache/beam/pull/10835#issuecomment-587866982
 
 
   Retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389101)
Time Spent: 53h 10m  (was: 53h)

> Add more Python validates runner tests
> --
>
> Key: BEAM-8575
> URL: https://issues.apache.org/jira/browse/BEAM-8575
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: wendy liu
>Assignee: wendy liu
>Priority: Major
>  Time Spent: 53h 10m
>  Remaining Estimate: 0h
>
> This is the umbrella issue to track the work of adding more Python tests to 
> improve test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-9146) [Python] PTransform that integrates Video Intelligence functionality

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9146?focusedWorklogId=389100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389100
 ]

ASF GitHub Bot logged work on BEAM-9146:


Author: ASF GitHub Bot
Created on: 18/Feb/20 21:17
Start Date: 18/Feb/20 21:17
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #10764: [BEAM-9146] Integrate 
GCP Video Intelligence functionality for Python SDK
URL: https://github.com/apache/beam/pull/10764#issuecomment-587865079
 
 
   :sdks:python:test-suites:tox:pycommon:docs  task is failing -- There might 
be pydocs issues in the change.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389100)
Time Spent: 12h 20m  (was: 12h 10m)

> [Python] PTransform that integrates Video Intelligence functionality
> 
>
> Key: BEAM-9146
> URL: https://issues.apache.org/jira/browse/BEAM-9146
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-py-gcp
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 12h 20m
>  Remaining Estimate: 0h
>
> The goal is to create a PTransform that integrates Google Cloud Video 
> Intelligence functionality [1].
> The transform should be able to take either a video GCS location or video data 
> bytes as input.
> [1] https://cloud.google.com/video-intelligence/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-6522) Dill fails to pickle avro.RecordSchema classes on Python 3.

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6522?focusedWorklogId=389099&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389099
 ]

ASF GitHub Bot logged work on BEAM-6522:


Author: ASF GitHub Bot
Created on: 18/Feb/20 21:15
Start Date: 18/Feb/20 21:15
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #10876: [BEAM-6522] 
Exclude the test that fails on earlier versions of Avro dependency that are 
still within allowed version range.
URL: https://github.com/apache/beam/pull/10876#issuecomment-587861035
 
 
   R: @TheNeuralBit 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389099)
Time Spent: 11h 20m  (was: 11h 10m)

> Dill fails to pickle  avro.RecordSchema classes on Python 3.
> 
>
> Key: BEAM-6522
> URL: https://issues.apache.org/jira/browse/BEAM-6522
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> The avroio module still has 4 failing tests. This is actually 2 times the 
> same 2 tests, both for Avro and Fastavro.
> *apache_beam.io.avroio_test.TestAvro.test_sink_transform*
>  *apache_beam.io.avroio_test.TestFastAvro.test_sink_transform*
> fail with:
> {code:java}
> Traceback (most recent call last):
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio_test.py", 
> line 432, in test_sink_transform
> | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/avroio.py", line 
> 528, in expand
> return pcoll | beam.io.iobase.Write(self._sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 960, in expand
> return pcoll | WriteImpl(self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pvalue.py", line 
> 112, in __or__
> return self.pipeline.apply(ptransform, self)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/pipeline.py", line 
> 515, in apply
> pvalueish_result = self.runner.apply(transform, pvalueish, self._options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 193, in apply
> return m(transform, input, options)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/runners/runner.py", 
> line 199, in apply_PTransform
> return transform.expand(input)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/io/iobase.py", line 
> 979, in expand
> lambda _, sink: sink.initialize_write(), self.sink)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1103, in Map
> pardo = FlatMap(wrapper, *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 1054, in FlatMap
> pardo = ParDo(CallableWrapperDoFn(fn), *args, **kwargs)
> File "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/core.py", 
> line 864, in __init__
> super(ParDo, self).__init__(fn, *args, **kwargs)
> File 
> "/home/robbe/workspace/beam/sdks/python/apache_beam/transforms/ptransform.py",
>  line 646, in __init__
> self.args = pickler.loads(pickler.dumps(self.args))
> File

[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389097&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389097
 ]

ASF GitHub Bot logged work on BEAM-8564:


Author: ASF GitHub Bot
Created on: 18/Feb/20 21:14
Start Date: 18/Feb/20 21:14
Worklog Time Spent: 10m 
  Work Description: amoght commented on pull request #10254: [BEAM-8564] 
Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r380938634
 
 

 ##
 File path: sdks/java/core/build.gradle
 ##
 @@ -91,4 +95,6 @@ dependencies {
   shadowTest library.java.avro_tests
   shadowTest library.java.zstd_jni
   testRuntimeOnly library.java.slf4j_jdk14
+  compileOnly 'io.airlift:aircompressor:0.16' 
 
 Review comment:
   done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389097)
Time Spent: 10h  (was: 9h 50m)

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Amogh Tiwari
>Assignee: Amogh Tiwari
>Priority: Minor
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm focused on compression 
> and decompression speed.
> This will enable the Apache Beam SDK to compress/decompress files using the LZO 
> compression algorithm. 
> This will include the following functionality:
>  # compress() : for compressing files into an LZO archive
>  # decompress() : for decompressing files archived using LZO compression
> Appropriate input and output streams will also be added to enable working with 
> LZO files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7304) Twister2 Beam runner

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7304?focusedWorklogId=389095&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389095
 ]

ASF GitHub Bot logged work on BEAM-7304:


Author: ASF GitHub Bot
Created on: 18/Feb/20 21:14
Start Date: 18/Feb/20 21:14
Worklog Time Spent: 10m 
  Work Description: pulasthi commented on issue #10888: [BEAM-7304] 
Twister2 Beam runner
URL: https://github.com/apache/beam/pull/10888#issuecomment-587859343
 
 
   It looks like it is retesting while ignoring the last commit, which fixes some of the 
earlier check failures. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389095)
Time Spent: 3h 10m  (was: 3h)

> Twister2 Beam runner
> 
>
> Key: BEAM-7304
> URL: https://issues.apache.org/jira/browse/BEAM-7304
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-ideas
>Reporter: Pulasthi Wickramasinghe
>Assignee: Pulasthi Wickramasinghe
>Priority: Minor
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Twister2 is a big data framework which supports both batch and stream 
> processing [1] [2]. The goal is to develop a Beam runner for Twister2. 
> [1] [https://github.com/DSC-SPIDAL/twister2]
> [2] [https://twister2.org/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389096&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389096
 ]

ASF GitHub Bot logged work on BEAM-8564:


Author: ASF GitHub Bot
Created on: 18/Feb/20 21:14
Start Date: 18/Feb/20 21:14
Worklog Time Spent: 10m 
  Work Description: amoght commented on pull request #10254: [BEAM-8564] 
Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r380938566
 
 

 ##
 File path: sdks/java/core/build.gradle
 ##
 @@ -91,4 +95,6 @@ dependencies {
   shadowTest library.java.avro_tests
   shadowTest library.java.zstd_jni
   testRuntimeOnly library.java.slf4j_jdk14
+  compileOnly 'io.airlift:aircompressor:0.16' 
+  compileOnly 'com.facebook.presto.hadoop:hadoop-apache2:3.2.0-1'
 
 Review comment:
   done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389096)
Time Spent: 9h 50m  (was: 9h 40m)

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Amogh Tiwari
>Assignee: Amogh Tiwari
>Priority: Minor
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm focused on compression 
> and decompression speed.
> This will enable the Apache Beam SDK to compress/decompress files using the LZO 
> compression algorithm. 
> This will include the following functionality:
>  # compress() : for compressing files into an LZO archive
>  # decompress() : for decompressing files archived using LZO compression
> Appropriate input and output streams will also be added to enable working with 
> LZO files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389098&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389098
 ]

ASF GitHub Bot logged work on BEAM-8564:


Author: ASF GitHub Bot
Created on: 18/Feb/20 21:14
Start Date: 18/Feb/20 21:14
Worklog Time Spent: 10m 
  Work Description: amoght commented on pull request #10254: [BEAM-8564] 
Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r380938720
 
 

 ##
 File path: sdks/java/core/build.gradle
 ##
 @@ -58,6 +58,10 @@ test {
   }
 }
 
+configurations {
+  testCompile.extendsFrom compileOnly
+}
+
 
 Review comment:
   done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389098)
Time Spent: 10h 10m  (was: 10h)

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Amogh Tiwari
>Assignee: Amogh Tiwari
>Priority: Minor
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm focused on compression 
> and decompression speed.
> This will enable the Apache Beam SDK to compress/decompress files using the LZO 
> compression algorithm. 
> This will include the following functionality:
>  # compress() : for compressing files into an LZO archive
>  # decompress() : for decompressing files archived using LZO compression
> Appropriate input and output streams will also be added to enable working with 
> LZO files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389094&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389094
 ]

ASF GitHub Bot logged work on BEAM-8564:


Author: ASF GitHub Bot
Created on: 18/Feb/20 21:14
Start Date: 18/Feb/20 21:14
Worklog Time Spent: 10m 
  Work Description: amoght commented on pull request #10254: [BEAM-8564] 
Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r380938458
 
 

 ##
 File path: sdks/java/core/build.gradle
 ##
 @@ -91,4 +95,6 @@ dependencies {
   shadowTest library.java.avro_tests
   shadowTest library.java.zstd_jni
   testRuntimeOnly library.java.slf4j_jdk14
+  compileOnly 'io.airlift:aircompressor:0.16' 
 
 Review comment:
   done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389094)
Time Spent: 9h 40m  (was: 9.5h)

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Amogh Tiwari
>Assignee: Amogh Tiwari
>Priority: Minor
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm focused on compression 
> and decompression speed.
> This will enable the Apache Beam SDK to compress/decompress files using the LZO 
> compression algorithm. 
> This will include the following functionality:
>  # compress() : for compressing files into an LZO archive
>  # decompress() : for decompressing files archived using LZO compression
> Appropriate input and output streams will also be added to enable working with 
> LZO files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8564) Add LZO compression and decompression support

2020-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8564?focusedWorklogId=389090&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-389090
 ]

ASF GitHub Bot logged work on BEAM-8564:


Author: ASF GitHub Bot
Created on: 18/Feb/20 21:13
Start Date: 18/Feb/20 21:13
Worklog Time Spent: 10m 
  Work Description: amoght commented on pull request #10254: [BEAM-8564] 
Add LZO compression and decompression support
URL: https://github.com/apache/beam/pull/10254#discussion_r380938051
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/util/LzoCompressorInputStream.java
 ##
 @@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.util;
+
+import io.airlift.compress.lzo.LzoCodec;
+import java.io.IOException;
+import java.io.InputStream;
+import org.apache.commons.compress.compressors.CompressorInputStream;
+import org.apache.commons.compress.utils.CountingInputStream;
+import org.apache.commons.compress.utils.IOUtils;
+import org.apache.commons.compress.utils.InputStreamStatistics;
+
+/**
+ * {@link CompressorInputStream} implementation to create LZO encoded stream. 
Library relies on <a href="https://github.com/airlift/aircompressor/">LZO</a>
+ *
+ * @since 1.18
+ */
+public class LzoCompressorInputStream extends CompressorInputStream
 
 Review comment:
   replaced wrapper classes with static methods
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 389090)
Time Spent: 9h 10m  (was: 9h)

> Add LZO compression and decompression support
> -
>
> Key: BEAM-8564
> URL: https://issues.apache.org/jira/browse/BEAM-8564
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-core
>Reporter: Amogh Tiwari
>Assignee: Amogh Tiwari
>Priority: Minor
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> LZO is a lossless data compression algorithm focused on compression 
> and decompression speed.
> This will enable the Apache Beam SDK to compress/decompress files using the LZO 
> compression algorithm. 
> This will include the following functionality:
>  # compress() : for compressing files into an LZO archive
>  # decompress() : for decompressing files archived using LZO compression
> Appropriate input and output streams will also be added to enable working with 
> LZO files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

