[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=328333&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328333
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 15/Oct/19 05:12
Start Date: 15/Oct/19 05:12
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9725: [BEAM-8350] 
Upgrade to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#discussion_r334741855
 
 

 ##
 File path: sdks/python/.pylintrc
 ##
 @@ -85,28 +85,41 @@ disable =
   bad-builtin,
   bad-super-call,
   broad-except,
+  chained-comparison,
 
 Review comment:
   > How did you choose which rules to permanently exclude and which to fix 
in this PR?
   
   That's a good question. 
   
   Obvious errors to exclude:
   
   - `useless-object-inheritance`: obviously we need to keep this until we drop 
python2 support
   -  `import-outside-toplevel`: scoped imports are pretty fundamental.  I 
don't even understand why they added this as an error
   
   Probably worth solving now:
   - `expression-not-assigned`: used in a few places in one test.  Could be 
worth making explicit exclusions for the test cases so that the problem is not 
spread elsewhere
   - `singleton-comparison`:  e.g. `x == None`.  this only occurs in a few 
places, so would be easy to squash now
   
   Could be added to a ticket for later:
   - `no-else-raise/break/continue`:  this pattern is pretty prevalent 
throughout the code.  could be worth fixing, but pretty time consuming.  good 
newbie task.
   
   Most of the remaining errors seemed like insignificant or debatable style 
suggestions.
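   
   For reference, a minimal standalone sketch (not Beam code; the names are 
   made up for illustration) of the before/after shape of a few of the checks 
   mentioned above:
   
   ```python
   # Hypothetical snippets: the flagged forms appear in comments; the active
   # forms are what pylint accepts.

   # singleton-comparison: compare to None with "is", not "==".
   def is_unset(value):
     # flagged: value == None
     return value is None

   # consider-using-set-comprehension: build the set directly.
   def distinct_lengths(words):
     # flagged: set([len(w) for w in words])
     return {len(w) for w in words}

   # no-else-raise: the "else" after "raise" is redundant.
   def check_positive(n):
     if n <= 0:
       raise ValueError('n must be positive')
     return n  # previously sat in an "else:" branch

   # consider-using-in: merge chained equality comparisons.
   def is_dataflow(runner):
     # flagged: runner == 'DataflowRunner' or runner == 'TestDataflowRunner'
     return runner in ('DataflowRunner', 'TestDataflowRunner')
   ```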
   
   Here's the full output (excluding the first two).  Let me know if there are 
other errors you'd like to fix in this PR or add to a follow-up ticket.
   
   ```
   * Module apache_beam.metrics.metricbase
   apache_beam/metrics/metricbase.py:92:2: W0107: Unnecessary pass statement 
(unnecessary-pass)
   * Module apache_beam.pvalue
   apache_beam/pvalue.py:195:2: W0107: Unnecessary pass statement 
(unnecessary-pass)
   apache_beam/pvalue.py:201:2: W0107: Unnecessary pass statement 
(unnecessary-pass)
   apache_beam/pvalue.py:574:2: W0107: Unnecessary pass statement 
(unnecessary-pass)
   * Module apache_beam.metrics.execution_test
   apache_beam/metrics/execution_test.py:129:21: R1718: Consider using a set 
comprehension (consider-using-set-comprehension)
   apache_beam/metrics/execution_test.py:131:21: R1718: Consider using a set 
comprehension (consider-using-set-comprehension)
   apache_beam/metrics/execution_test.py:140:21: R1718: Consider using a set 
comprehension (consider-using-set-comprehension)
   apache_beam/metrics/execution_test.py:142:21: R1718: Consider using a set 
comprehension (consider-using-set-comprehension)
   * Module apache_beam.options.pipeline_options
   apache_beam/options/pipeline_options.py:181:6: R1723: Unnecessary "elif" 
after "break" (no-else-break)
   apache_beam/options/pipeline_options.py:520:9: R1714: Consider merging these 
comparisons with "in" to "runner in ('DataflowRunner', 'TestDataflowRunner')" 
(consider-using-in)
   * Module apache_beam.pipeline
   apache_beam/pipeline.py:243:10: R1720: Unnecessary "elif" after "raise" 
(no-else-raise)
   apache_beam/pipeline.py:682:23: R1718: Consider using a set comprehension 
(consider-using-set-comprehension)
   apache_beam/pipeline.py:720:4: W0107: Unnecessary pass statement 
(unnecessary-pass)
   apache_beam/pipeline.py:724:4: W0107: Unnecessary pass statement 
(unnecessary-pass)
   apache_beam/pipeline.py:728:4: W0107: Unnecessary pass statement 
(unnecessary-pass)
   apache_beam/pipeline.py:732:4: W0107: Unnecessary pass statement 
(unnecessary-pass)
   * Module apache_beam.coders.coder_impl
   apache_beam/coders/coder_impl.py:405:9: R1714: Consider merging these 
comparisons with "in" to 't in (LIST_TYPE, TUPLE_TYPE, SET_TYPE)' 
(consider-using-in)
   apache_beam/coders/coder_impl.py:459:7: R0123: Comparison to literal 
(literal-comparison)
   apache_beam/coders/coder_impl.py:461:9: R0123: Comparison to literal 
(literal-comparison)
   apache_beam/coders/coder_impl.py:470:7: R0123: Comparison to literal 
(literal-comparison)
   apache_beam/coders/coder_impl.py:472:9: R0123: Comparison to literal 
(literal-comparison)
   * Module apache_beam.coders.coders
   apache_beam/coders/coders.py:192:4: R1720: Unnecessary "else" after "raise" 
(no-else-raise)
   apache_beam/coders/coders.py:198:4: R1720: Unnecessary "else" after "raise" 
(no-else-raise)
   * Module apache_beam.pipeline_test
   apache_beam/pipeline_test.py:690:8: R1718: Consider using a set 
comprehension (consider-using-set-comprehension)
   apache_beam/pipeline_test.py:695:8: R1718: Consider using a set 
comprehension (consider-using-set-comprehension)

[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=328332&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328332
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 15/Oct/19 05:11
Start Date: 15/Oct/19 05:11
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9725: [BEAM-8350] 
Upgrade to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#discussion_r334741855
 
 

 ##
 File path: sdks/python/.pylintrc
 ##
 @@ -85,28 +85,41 @@ disable =
   bad-builtin,
   bad-super-call,
   broad-except,
+  chained-comparison,
 
 Review comment:
   > How did you choose which rules to permanently exclude and which to fix 
in this PR?
   
   That's a good question. 
   
   Obvious errors to exclude:
   
   - `useless-object-inheritance`: obviously we need to keep this until we drop 
python2 support
   -  `import-outside-toplevel`: scoped imports are pretty fundamental.  I 
don't even understand why they added this as an error
   
   Probably worth solving now:
   - `expression-not-assigned`: used in a few places in one test.  Could be 
worth making explicit exclusions for the test cases so that the problem is not 
spread elsewhere
   - `singleton-comparison`:  e.g. `x == None`.  this only occurs in a few 
places, so would be easy to squash now
   
   Could be added to a ticket for later:
   - `no-else-raise/break/continue`:  this pattern is pretty prevalent 
throughout the code.  could be worth fixing, but pretty time consuming.  good 
newbie task.
   
   Here's the full output (excluding the first two).  Let me know if there are 
other errors you'd like to fix in this PR or add to a follow-up ticket.
   
   ```
   * Module apache_beam.metrics.metricbase
   apache_beam/metrics/metricbase.py:92:2: W0107: Unnecessary pass statement 
(unnecessary-pass)
   * Module apache_beam.pvalue
   apache_beam/pvalue.py:195:2: W0107: Unnecessary pass statement 
(unnecessary-pass)
   apache_beam/pvalue.py:201:2: W0107: Unnecessary pass statement 
(unnecessary-pass)
   apache_beam/pvalue.py:574:2: W0107: Unnecessary pass statement 
(unnecessary-pass)
   * Module apache_beam.metrics.execution_test
   apache_beam/metrics/execution_test.py:129:21: R1718: Consider using a set 
comprehension (consider-using-set-comprehension)
   apache_beam/metrics/execution_test.py:131:21: R1718: Consider using a set 
comprehension (consider-using-set-comprehension)
   apache_beam/metrics/execution_test.py:140:21: R1718: Consider using a set 
comprehension (consider-using-set-comprehension)
   apache_beam/metrics/execution_test.py:142:21: R1718: Consider using a set 
comprehension (consider-using-set-comprehension)
   * Module apache_beam.options.pipeline_options
   apache_beam/options/pipeline_options.py:181:6: R1723: Unnecessary "elif" 
after "break" (no-else-break)
   apache_beam/options/pipeline_options.py:520:9: R1714: Consider merging these 
comparisons with "in" to "runner in ('DataflowRunner', 'TestDataflowRunner')" 
(consider-using-in)
   * Module apache_beam.pipeline
   apache_beam/pipeline.py:243:10: R1720: Unnecessary "elif" after "raise" 
(no-else-raise)
   apache_beam/pipeline.py:682:23: R1718: Consider using a set comprehension 
(consider-using-set-comprehension)
   apache_beam/pipeline.py:720:4: W0107: Unnecessary pass statement 
(unnecessary-pass)
   apache_beam/pipeline.py:724:4: W0107: Unnecessary pass statement 
(unnecessary-pass)
   apache_beam/pipeline.py:728:4: W0107: Unnecessary pass statement 
(unnecessary-pass)
   apache_beam/pipeline.py:732:4: W0107: Unnecessary pass statement 
(unnecessary-pass)
   * Module apache_beam.coders.coder_impl
   apache_beam/coders/coder_impl.py:405:9: R1714: Consider merging these 
comparisons with "in" to 't in (LIST_TYPE, TUPLE_TYPE, SET_TYPE)' 
(consider-using-in)
   apache_beam/coders/coder_impl.py:459:7: R0123: Comparison to literal 
(literal-comparison)
   apache_beam/coders/coder_impl.py:461:9: R0123: Comparison to literal 
(literal-comparison)
   apache_beam/coders/coder_impl.py:470:7: R0123: Comparison to literal 
(literal-comparison)
   apache_beam/coders/coder_impl.py:472:9: R0123: Comparison to literal 
(literal-comparison)
   * Module apache_beam.coders.coders
   apache_beam/coders/coders.py:192:4: R1720: Unnecessary "else" after "raise" 
(no-else-raise)
   apache_beam/coders/coders.py:198:4: R1720: Unnecessary "else" after "raise" 
(no-else-raise)
   * Module apache_beam.pipeline_test
   apache_beam/pipeline_test.py:690:8: R1718: Consider using a set 
comprehension (consider-using-set-comprehension)
   apache_beam/pipeline_test.py:695:8: R1718: Consider using a set 
comprehension (consider-using-set-comprehension)
   * Module apache_beam.io.hadoopfilesystem
   

[jira] [Created] (BEAM-8402) Create a class hierarchy to represent environments

2019-10-14 Thread Chad Dombrova (Jira)
Chad Dombrova created BEAM-8402:
---

 Summary: Create a class hierarchy to represent environments
 Key: BEAM-8402
 URL: https://issues.apache.org/jira/browse/BEAM-8402
 Project: Beam
  Issue Type: New Feature
  Components: sdk-py-core
Reporter: Chad Dombrova
Assignee: Chad Dombrova


As a first step towards making it possible to assign different environments to 
sections of a pipeline, we need to expose environment classes to the pipeline 
API.  Unlike PTransforms, PCollections, Coders, and Windowings, environments 
exist solely in the portability framework as protobuf objects.  By creating a 
hierarchy of "native" classes that represent the various environment types -- 
external, docker, process, etc. -- users will be able to instantiate these and 
assign them to parts of the pipeline.  The assignment portion will be covered 
in a follow-up issue/PR.
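
As a rough illustration of the idea only (the class and attribute names below 
are hypothetical, not the actual API), such a hierarchy could look like:

```python
class Environment(object):
  """Base class wrapping a portability Environment proto."""

  def to_runner_api(self, context):
    raise NotImplementedError


class DockerEnvironment(Environment):
  def __init__(self, container_image):
    self.container_image = container_image


class ProcessEnvironment(Environment):
  def __init__(self, command, env=None):
    self.command = command
    self.env = env or {}


class ExternalEnvironment(Environment):
  def __init__(self, url):
    self.url = url
```

Each subclass would know how to serialize itself to (and from) the 
corresponding portability proto, so a pipeline could carry these objects and 
convert them at submission time.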





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8367) Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8367?focusedWorklogId=328329&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328329
 ]

ASF GitHub Bot logged work on BEAM-8367:


Author: ASF GitHub Bot
Created on: 15/Oct/19 04:41
Start Date: 15/Oct/19 04:41
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9797: [BEAM-8367] 
Using insertId for BQ streaming inserts
URL: https://github.com/apache/beam/pull/9797#discussion_r334750033
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -1609,7 +1609,7 @@ def test_lull_logging(self):
  | beam.Create([1])
  | beam.Map(time.sleep))
 
-self.assertRegexpMatches(
+self.assertRegexp(
 
 Review comment:
   nvm it broke precommits
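   
   (Presumably because `unittest` has no `assertRegexp`; the non-deprecated 
   replacement for `assertRegexpMatches` is `assertRegex`. A minimal sketch, 
   with a made-up test name and message string:)
   
   ```python
   import unittest

   class LullLoggingMessageTest(unittest.TestCase):  # hypothetical test name
     def test_message_format(self):
       # assertRegex is the non-deprecated spelling of assertRegexpMatches.
       self.assertRegex('operation ongoing for 321.09 seconds',
                        r'ongoing for \d+\.\d+ seconds')
   ```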
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328329)
Time Spent: 50m  (was: 40m)

> Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS
> -
>
> Key: BEAM-8367
> URL: https://issues.apache.org/jira/browse/BEAM-8367
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Unique IDs ensure (best effort) that writes to BigQuery are idempotent; for 
> example, we don't write the same record twice after a VM failure.
>  
> Currently the Python BQ sink inserts BQ IDs here, but they'll be re-generated 
> after a VM failure, resulting in data duplication.
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L766]
>  
> The correct fix is to do a Reshuffle to checkpoint unique IDs once they are 
> generated, similar to how the Java BQ sink operates.
> [https://github.com/apache/beam/blob/dcf6ad301069e4d2cfaec5db6b178acb7bb67f49/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java#L225]
>  
> Pablo, can you do an initial assessment here?
> I think this is a relatively small fix but I might be wrong.
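
A minimal sketch of the pattern described above (illustrative only; the helper 
name and the surrounding plumbing are assumptions, not the actual change in the 
linked PR):

```python
import uuid
import apache_beam as beam

def add_insert_id(row):
  # Generate the unique id exactly once, before the checkpoint.
  return (uuid.uuid4().hex, row)

with beam.Pipeline() as p:
  _ = (
      p
      | beam.Create([{'name': 'a'}, {'name': 'b'}])
      | 'AddInsertIds' >> beam.Map(add_insert_id)
      # Reshuffle acts as a checkpoint: after a failure the (id, row) pairs
      # are replayed rather than the ids being regenerated.
      | 'Checkpoint' >> beam.Reshuffle()
      # Stand-in for the BigQuery streaming insert call, which would pass
      # the ids along as insertId values.
      | 'Write' >> beam.Map(lambda id_and_row: id_and_row))
```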



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8367) Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8367?focusedWorklogId=328328&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328328
 ]

ASF GitHub Bot logged work on BEAM-8367:


Author: ASF GitHub Bot
Created on: 15/Oct/19 04:40
Start Date: 15/Oct/19 04:40
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9797: [BEAM-8367] Using 
insertId for BQ streaming inserts
URL: https://github.com/apache/beam/pull/9797#issuecomment-542033127
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328328)
Time Spent: 40m  (was: 0.5h)

> Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS
> -
>
> Key: BEAM-8367
> URL: https://issues.apache.org/jira/browse/BEAM-8367
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Unique IDs ensure (best effort) that writes to BigQuery are idempotent; for 
> example, we don't write the same record twice after a VM failure.
>  
> Currently the Python BQ sink inserts BQ IDs here, but they'll be re-generated 
> after a VM failure, resulting in data duplication.
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L766]
>  
> The correct fix is to do a Reshuffle to checkpoint unique IDs once they are 
> generated, similar to how the Java BQ sink operates.
> [https://github.com/apache/beam/blob/dcf6ad301069e4d2cfaec5db6b178acb7bb67f49/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java#L225]
>  
> Pablo, can you do an initial assessment here?
> I think this is a relatively small fix but I might be wrong.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=328299&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328299
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 15/Oct/19 03:47
Start Date: 15/Oct/19 03:47
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9725: [BEAM-8350] 
Upgrade to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#discussion_r334741855
 
 

 ##
 File path: sdks/python/.pylintrc
 ##
 @@ -85,28 +85,41 @@ disable =
   bad-builtin,
   bad-super-call,
   broad-except,
+  chained-comparison,
 
 Review comment:
   > How did you choose which rules to permanently exclude and which to fix 
in this PR?
   
   That's a good question. 
   
   Obvious errors to exclude:
   
   - `useless-object-inheritance`: obviously we need to keep this until we get 
to python3-only
   -  `import-outside-toplevel`: scoped imports are pretty fundamental.  I 
don't even understand why they added this as an error
   
   Probably worth solving now:
   - `expression-not-assigned`: used in a few places in tests.  Could be worth 
making explicit exclusions for the test cases so that the problem is not spread 
elsewhere
   - `singleton-comparison`:  e.g. `x == None`.  this only occurs in a few 
places, so would be easy to squash now
   
   Could be added to a ticket for later:
   - `no-else-raise/break/continue`:  this pattern is pretty prevalent 
throughout the code.  could be worth fixing, but pretty time consuming.  good 
newbie task.
   
   Here's the full output (excluding the first two).  Let me know if there are 
other errors you'd like to fix in this PR or add to a follow-up ticket.
   
   ```
   * Module apache_beam.metrics.metricbase
   apache_beam/metrics/metricbase.py:92:2: W0107: Unnecessary pass statement 
(unnecessary-pass)
   * Module apache_beam.pvalue
   apache_beam/pvalue.py:195:2: W0107: Unnecessary pass statement 
(unnecessary-pass)
   apache_beam/pvalue.py:201:2: W0107: Unnecessary pass statement 
(unnecessary-pass)
   apache_beam/pvalue.py:574:2: W0107: Unnecessary pass statement 
(unnecessary-pass)
   * Module apache_beam.metrics.execution_test
   apache_beam/metrics/execution_test.py:129:21: R1718: Consider using a set 
comprehension (consider-using-set-comprehension)
   apache_beam/metrics/execution_test.py:131:21: R1718: Consider using a set 
comprehension (consider-using-set-comprehension)
   apache_beam/metrics/execution_test.py:140:21: R1718: Consider using a set 
comprehension (consider-using-set-comprehension)
   apache_beam/metrics/execution_test.py:142:21: R1718: Consider using a set 
comprehension (consider-using-set-comprehension)
   * Module apache_beam.options.pipeline_options
   apache_beam/options/pipeline_options.py:181:6: R1723: Unnecessary "elif" 
after "break" (no-else-break)
   apache_beam/options/pipeline_options.py:520:9: R1714: Consider merging these 
comparisons with "in" to "runner in ('DataflowRunner', 'TestDataflowRunner')" 
(consider-using-in)
   * Module apache_beam.pipeline
   apache_beam/pipeline.py:243:10: R1720: Unnecessary "elif" after "raise" 
(no-else-raise)
   apache_beam/pipeline.py:682:23: R1718: Consider using a set comprehension 
(consider-using-set-comprehension)
   apache_beam/pipeline.py:720:4: W0107: Unnecessary pass statement 
(unnecessary-pass)
   apache_beam/pipeline.py:724:4: W0107: Unnecessary pass statement 
(unnecessary-pass)
   apache_beam/pipeline.py:728:4: W0107: Unnecessary pass statement 
(unnecessary-pass)
   apache_beam/pipeline.py:732:4: W0107: Unnecessary pass statement 
(unnecessary-pass)
   * Module apache_beam.coders.coder_impl
   apache_beam/coders/coder_impl.py:405:9: R1714: Consider merging these 
comparisons with "in" to 't in (LIST_TYPE, TUPLE_TYPE, SET_TYPE)' 
(consider-using-in)
   apache_beam/coders/coder_impl.py:459:7: R0123: Comparison to literal 
(literal-comparison)
   apache_beam/coders/coder_impl.py:461:9: R0123: Comparison to literal 
(literal-comparison)
   apache_beam/coders/coder_impl.py:470:7: R0123: Comparison to literal 
(literal-comparison)
   apache_beam/coders/coder_impl.py:472:9: R0123: Comparison to literal 
(literal-comparison)
   * Module apache_beam.coders.coders
   apache_beam/coders/coders.py:192:4: R1720: Unnecessary "else" after "raise" 
(no-else-raise)
   apache_beam/coders/coders.py:198:4: R1720: Unnecessary "else" after "raise" 
(no-else-raise)
   * Module apache_beam.pipeline_test
   apache_beam/pipeline_test.py:690:8: R1718: Consider using a set 
comprehension (consider-using-set-comprehension)
   apache_beam/pipeline_test.py:695:8: R1718: Consider using a set 
comprehension (consider-using-set-comprehension)
   * Module apache_beam.io.hadoopfilesystem
   

[jira] [Work logged] (BEAM-8348) Portable Python job name hard-coded to "job"

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8348?focusedWorklogId=328263&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328263
 ]

ASF GitHub Bot logged work on BEAM-8348:


Author: ASF GitHub Bot
Created on: 15/Oct/19 02:25
Start Date: 15/Oct/19 02:25
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #9789: [BEAM-8348] set 
job_name in portable_runner.py
URL: https://github.com/apache/beam/pull/9789#discussion_r334728726
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/portable_runner.py
 ##
 @@ -296,7 +297,8 @@ def add_runner_options(parser):
 
 prepare_response = job_service.Prepare(
 beam_job_api_pb2.PrepareJobRequest(
-job_name='job', pipeline=proto_pipeline,
+job_name=options.view_as(GoogleCloudOptions).job_name or 'job',
 
 Review comment:
   Would it be possible to replicate the option and deprecate it in 
`GoogleCloudOptions`?
   
   FWIW this option also exists in the Flink runner and the job_name is already 
reflected in the pipeline (see 
https://github.com/apache/beam/pull/9752#issuecomment-541250003), so what is 
the effect of this change going to be?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328263)
Time Spent: 1h 40m  (was: 1.5h)

> Portable Python job name hard-coded to "job"
> 
>
> Key: BEAM-8348
> URL: https://issues.apache.org/jira/browse/BEAM-8348
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> See [1]. `job_name` is already taken by Google Cloud options [2], so I guess 
> we should create a new option (maybe `portable_job_name` to avoid disruption).
> [[1] 
> https://github.com/apache/beam/blob/55588e91ed8e3e25bb661a6202c31e99297e0e79/sdks/python/apache_beam/runners/portability/portable_runner.py#L294|https://github.com/apache/beam/blob/55588e91ed8e3e25bb661a6202c31e99297e0e79/sdks/python/apache_beam/runners/portability/portable_runner.py#L294]
> [2] 
> [https://github.com/apache/beam/blob/c5bbb51014f7506a2651d6070f27fb3c3dc0da8f/sdks/python/apache_beam/options/pipeline_options.py#L438]
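
A minimal sketch of what such a separate option could look like (the class and 
flag names below follow the suggestion above but are hypothetical, not the 
change made in the PR):

```python
from apache_beam.options.pipeline_options import PipelineOptions

class PortableJobNameOptions(PipelineOptions):  # hypothetical options class
  @classmethod
  def _add_argparse_args(cls, parser):
    parser.add_argument(
        '--portable_job_name',
        default=None,
        help='Job name to send in the PrepareJobRequest for portable runners.')

# Usage sketch inside the portable runner, falling back as before:
#   job_name = (options.view_as(PortableJobNameOptions).portable_job_name
#               or options.view_as(GoogleCloudOptions).job_name
#               or 'job')
```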



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8391) `AwsModule` of sdks/java/io/amazon-web-services uses deprecated type id write methods

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8391?focusedWorklogId=328260&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328260
 ]

ASF GitHub Bot logged work on BEAM-8391:


Author: ASF GitHub Bot
Created on: 15/Oct/19 02:04
Start Date: 15/Oct/19 02:04
Worklog Time Spent: 10m 
  Work Description: cowtowncoder commented on issue #9783: [BEAM-8391] 
Update type id write methods in `AwsModule`
URL: https://github.com/apache/beam/pull/9783#issuecomment-542002686
 
 
   @iemejia You are welcome! I could try 2.10 upgrade, time permitting. There 
shouldn't be much to do but would definitely want full regression test to be 
run just in case.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328260)
Time Spent: 1h 10m  (was: 1h)

> `AwsModule` of sdks/java/io/amazon-web-services uses deprecated type id write 
> methods
> -
>
> Key: BEAM-8391
> URL: https://issues.apache.org/jira/browse/BEAM-8391
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-aws
>Reporter: Tatu Saloranta
>Assignee: Tatu Saloranta
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The `AwsModule` class uses the old `typeSerializer.writeTypePrefixForObject()` 
> method in `serializeWithType()`, which was deprecated in Jackson 2.9. While 
> this still works (and should keep working for 2.x), it makes sense to use the 
> replacement that is fully supported.
> I will provide a patch for this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8398) Upgrade Dataflow Java Client API

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8398?focusedWorklogId=328253&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328253
 ]

ASF GitHub Bot logged work on BEAM-8398:


Author: ASF GitHub Bot
Created on: 15/Oct/19 01:22
Start Date: 15/Oct/19 01:22
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9792: [BEAM-8398] Upgrade 
Dataflow Java Client API
URL: https://github.com/apache/beam/pull/9792#issuecomment-541994450
 
 
   I made some more conservative version choices. The diff, besides the 
obvious/expected 1.27->1.28 changes, is pretty minimal, as follows.
   
   `google-http-client` picked up some new dependencies. Before:
   ```
   ||+--- com.google.http-client:google-http-client:1.27.0
   |||+--- com.google.code.findbugs:jsr305:3.0.2
   |||+--- com.google.guava:guava:20.0
   |||+--- org.apache.httpcomponents:httpclient:4.5.5
   ||||+--- org.apache.httpcomponents:httpcore:4.4.9
   ||||+--- commons-logging:commons-logging:1.2
   ||||\--- commons-codec:commons-codec:1.10
   |||\--- com.google.j2objc:j2objc-annotations:1.1
   ```
   After:
   ```
   ||+--- com.google.http-client:google-http-client:1.28.0
   |||+--- com.google.code.findbugs:jsr305:3.0.2
   |||+--- com.google.guava:guava:26.0-android -> 20.0
   |||+--- com.google.j2objc:j2objc-annotations:1.1
   |||+--- io.opencensus:opencensus-api:0.18.0
   ||||\--- io.grpc:grpc-context:1.14.0
   |||\--- io.opencensus:opencensus-contrib-http-util:0.18.0
   ||| +--- io.opencensus:opencensus-api:0.18.0 (*)
   ||| \--- com.google.guava:guava:20.0
   ```
   The existing `org.apache.httpcomponents:httpclient` was added under 
`google-api-client`:
   ```
   ||+--- com.google.http-client:google-http-client-apache:2.0.0
   |||+--- com.google.http-client:google-http-client:1.28.0 (*)
   |||\--- org.apache.httpcomponents:httpclient:4.5.5
   ||| +--- org.apache.httpcomponents:httpcore:4.4.9
   ||| +--- commons-logging:commons-logging:1.2
   ||| \--- commons-codec:commons-codec:1.10
   ```
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328253)
Time Spent: 40m  (was: 0.5h)

> Upgrade Dataflow Java Client API
> 
>
> Key: BEAM-8398
> URL: https://issues.apache.org/jira/browse/BEAM-8398
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=328252&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328252
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 15/Oct/19 01:19
Start Date: 15/Oct/19 01:19
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9725: [BEAM-8350] 
Upgrade to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#discussion_r334717631
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -33,7 +33,8 @@
 from pkg_resources import DistributionNotFound
 from pkg_resources import get_distribution
 from setuptools.command.build_py import build_py
-from setuptools.command.develop import develop
+# workaround pylint bug: https://github.com/PyCQA/pylint/issues/3152
 
 Review comment:
   Same TODO here.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328252)
Time Spent: 6h 10m  (was: 6h)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running tests in python2, so to make 
> the upgrade we have to move our lint jobs to python3.  Doing so will put 
> pylint into "python3-mode" and there is not an option to run in 
> python2-compatible mode.  That said, the beam code is intended to be python3 
> compatible, so in practice, performing a python3 lint on the Beam code-base 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a python-3 only change that breaks python2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.
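
An illustrative (made-up) example of the kind of false positive this refers to: 
an import that is only referenced from a type comment, which older pylint 
reported as unused-import:

```python
from typing import Optional  # only used in the type comment below

def lookup(key):
  # type: (str) -> Optional[str]
  # Per the description above, pylint 2.4 understands the annotation and no
  # longer flags the Optional import as unused.
  return None
```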



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=328250&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328250
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 15/Oct/19 01:19
Start Date: 15/Oct/19 01:19
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9725: [BEAM-8350] 
Upgrade to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#discussion_r334718345
 
 

 ##
 File path: sdks/python/.pylintrc
 ##
 @@ -85,28 +85,41 @@ disable =
   bad-builtin,
   bad-super-call,
   broad-except,
+  chained-comparison,
 
 Review comment:
   What is your perspective on these exceptions: do we need cleanup TODOs to 
fix some of them in the future? How did you choose which rules to 
permanently exclude and which to fix in this PR?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328250)
Time Spent: 6h  (was: 5h 50m)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running tests in python2, so to make 
> the upgrade we have to move our lint jobs to python3.  Doing so will put 
> pylint into "python3-mode" and there is not an option to run in 
> python2-compatible mode.  That said, the beam code is intended to be python3 
> compatible, so in practice, performing a python3 lint on the Beam code-base 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a python-3 only change that breaks python2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=328251=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328251
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 15/Oct/19 01:19
Start Date: 15/Oct/19 01:19
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9725: [BEAM-8350] 
Upgrade to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#discussion_r334716983
 
 

 ##
 File path: sdks/python/apache_beam/examples/complete/juliaset/setup.py
 ##
 @@ -31,7 +31,8 @@
 import subprocess
 from distutils.command.build import build as _build
 
-import setuptools
+# workaround pylint bug: https://github.com/PyCQA/pylint/issues/3152
 
 Review comment:
   How about we file an issue and add a comment:
   # TODO(BEAM-...): re-enable lint check. 
   We can tag the issue as newbie/starter/trivial and it can be someone's first 
contribution to beam.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328251)
Time Spent: 6h 10m  (was: 6h)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running tests in python2, so to make 
> the upgrade we have to move our lint jobs to python3.  Doing so will put 
> pylint into "python3-mode" and there is not an option to run in 
> python2-compatible mode.  That said, the beam code is intended to be python3 
> compatible, so in practice, performing a python3 lint on the Beam code-base 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a python-3 only change that breaks python2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=328245&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328245
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 15/Oct/19 01:06
Start Date: 15/Oct/19 01:06
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9741: [BEAM-7926] Visualize 
PCollection
URL: https://github.com/apache/beam/pull/9741#issuecomment-541991427
 
 
   Any updates @rohdesamuel @KevinGG ?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328245)
Time Spent: 1h 40m  (was: 1.5h)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline is defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data 
> in the materialized pcoll.
> e.g., visualize(pcoll)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8367) Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8367?focusedWorklogId=328246&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328246
 ]

ASF GitHub Bot logged work on BEAM-8367:


Author: ASF GitHub Bot
Created on: 15/Oct/19 01:06
Start Date: 15/Oct/19 01:06
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9797: [BEAM-8367] Using 
insertId for BQ streaming inserts
URL: https://github.com/apache/beam/pull/9797#issuecomment-541991476
 
 
   r: @chamikaramj 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328246)
Time Spent: 0.5h  (was: 20m)

> Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS
> -
>
> Key: BEAM-8367
> URL: https://issues.apache.org/jira/browse/BEAM-8367
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Unique IDs ensure (best effort) that writes to BigQuery are idempotent; for 
> example, we don't write the same record twice after a VM failure.
>  
> Currently the Python BQ sink inserts BQ IDs here, but they'll be re-generated 
> after a VM failure, resulting in data duplication.
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L766]
>  
> The correct fix is to do a Reshuffle to checkpoint unique IDs once they are 
> generated, similar to how the Java BQ sink operates.
> [https://github.com/apache/beam/blob/dcf6ad301069e4d2cfaec5db6b178acb7bb67f49/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java#L225]
>  
> Pablo, can you do an initial assessment here?
> I think this is a relatively small fix but I might be wrong.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=328241&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328241
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 15/Oct/19 01:05
Start Date: 15/Oct/19 01:05
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on issue #9790: [BEAM-7389] Show 
code snippet outputs as stdout
URL: https://github.com/apache/beam/pull/9790#issuecomment-541991284
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328241)
Time Spent: 67h 40m  (was: 67.5h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 67h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=328244&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328244
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 15/Oct/19 01:05
Start Date: 15/Oct/19 01:05
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on issue #9790: [BEAM-7389] Show 
code snippet outputs as stdout
URL: https://github.com/apache/beam/pull/9790#issuecomment-541991284
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328244)
Time Spent: 68h 10m  (was: 68h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 68h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=328243&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328243
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 15/Oct/19 01:05
Start Date: 15/Oct/19 01:05
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9790: [BEAM-7389] 
Show code snippet outputs as stdout
URL: https://github.com/apache/beam/pull/9790#discussion_r334716595
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/snippets/transforms/elementwise/filter_test.py
 ##
 @@ -31,31 +31,26 @@
 
 
 def check_perennials(actual):
-  # [START perennials]
-  perennials = [
-  {'icon': '', 'name': 'Strawberry', 'duration': 'perennial'},
-  {'icon': '', 'name': 'Eggplant', 'duration': 'perennial'},
-  {'icon': '凜', 'name': 'Potato', 'duration': 'perennial'},
-  ]
-  # [END perennials]
-  assert_that(actual, equal_to(perennials))
+  expected = '''[START perennials]
+{'icon': '', 'name': 'Strawberry', 'duration': 'perennial'}
+{'icon': '', 'name': 'Eggplant', 'duration': 'perennial'}
+{'icon': '凜', 'name': 'Potato', 'duration': 'perennial'}
+[END perennials]'''.splitlines()[1:-1]
+  assert_that(actual, equal_to(expected))
 
 
 def check_valid_plants(actual):
-  # [START valid_plants]
-  valid_plants = [
-  {'icon': '', 'name': 'Strawberry', 'duration': 'perennial'},
-  {'icon': '凌', 'name': 'Carrot', 'duration': 'biennial'},
-  {'icon': '', 'name': 'Eggplant', 'duration': 'perennial'},
-  {'icon': '', 'name': 'Tomato', 'duration': 'annual'},
-  ]
-  # [END valid_plants]
-  assert_that(actual, equal_to(valid_plants))
+  expected = '''[START valid_plants]
+{'icon': '', 'name': 'Strawberry', 'duration': 'perennial'}
+{'icon': '凌', 'name': 'Carrot', 'duration': 'biennial'}
+{'icon': '', 'name': 'Eggplant', 'duration': 'perennial'}
+{'icon': '', 'name': 'Tomato', 'duration': 'annual'}
+[END valid_plants]'''.splitlines()[1:-1]
+  assert_that(actual, equal_to(expected))
 
 
 @mock.patch('apache_beam.Pipeline', TestPipeline)
-# pylint: disable=line-too-long
-@mock.patch('apache_beam.examples.snippets.transforms.elementwise.filter.print',
 lambda elem: elem)
+@mock.patch('apache_beam.examples.snippets.transforms.elementwise.filter.print',
 str)
 
 Review comment:
   I do not believe we should make an exception on disabling lint on these.
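   
   (One way to stay under the line-length limit without a pylint disable, as a 
   sketch; the dotted path is copied from the diff above and the helper name is 
   made up:)
   
   ```python
   import mock  # assuming the same `mock` import the test module already uses

   # Binding the long target path to a name keeps the decorator line short
   # without needing a "# pylint: disable=line-too-long" comment.
   _FILTER_PRINT = (
       'apache_beam.examples.snippets.transforms.elementwise.filter.print')

   @mock.patch(_FILTER_PRINT, str)
   def run_filter_snippet():
     pass
   ```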
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328243)
Time Spent: 68h  (was: 67h 50m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 68h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=328242&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328242
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 15/Oct/19 01:05
Start Date: 15/Oct/19 01:05
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9790: [BEAM-7389] 
Show code snippet outputs as stdout
URL: https://github.com/apache/beam/pull/9790#discussion_r334716592
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/snippets/transforms/elementwise/filter_test.py
 ##
 @@ -31,31 +31,26 @@
 
 
 def check_perennials(actual):
-  # [START perennials]
-  perennials = [
-  {'icon': '', 'name': 'Strawberry', 'duration': 'perennial'},
-  {'icon': '', 'name': 'Eggplant', 'duration': 'perennial'},
-  {'icon': '凜', 'name': 'Potato', 'duration': 'perennial'},
-  ]
-  # [END perennials]
-  assert_that(actual, equal_to(perennials))
+  expected = '''[START perennials]
+{'icon': '', 'name': 'Strawberry', 'duration': 'perennial'}
+{'icon': '', 'name': 'Eggplant', 'duration': 'perennial'}
+{'icon': '凜', 'name': 'Potato', 'duration': 'perennial'}
+[END perennials]'''.splitlines()[1:-1]
 
 Review comment:
   What exactly will users see when they run this? And what exactly will they 
see in the docs?
   
   I think it will be confusing to the reader of this code to have our doc tags 
mixed with the code.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328242)
Time Spent: 67h 50m  (was: 67h 40m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 67h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8367) Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8367?focusedWorklogId=328240&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328240
 ]

ASF GitHub Bot logged work on BEAM-8367:


Author: ASF GitHub Bot
Created on: 15/Oct/19 01:04
Start Date: 15/Oct/19 01:04
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9797: [BEAM-8367] 
Using insertId for BQ streaming inserts
URL: https://github.com/apache/beam/pull/9797#discussion_r334716457
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner_test.py
 ##
 @@ -1609,7 +1609,7 @@ def test_lull_logging(self):
  | beam.Create([1])
  | beam.Map(time.sleep))
 
-self.assertRegexpMatches(
+self.assertRegexp(
 
 Review comment:
   This change just switches from a deprecated method : )
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328240)
Time Spent: 20m  (was: 10m)

> Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS
> -
>
> Key: BEAM-8367
> URL: https://issues.apache.org/jira/browse/BEAM-8367
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Unique IDs ensure (best effort) that writes to BigQuery are idempotent; for 
> example, we don't write the same record twice after a VM failure.
>  
> Currently the Python BQ sink inserts BQ IDs here, but they'll be re-generated 
> after a VM failure, resulting in data duplication.
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L766]
>  
> The correct fix is to do a Reshuffle to checkpoint unique IDs once they are 
> generated, similar to how the Java BQ sink operates.
> [https://github.com/apache/beam/blob/dcf6ad301069e4d2cfaec5db6b178acb7bb67f49/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java#L225]
>  
> Pablo, can you do an initial assessment here?
> I think this is a relatively small fix but I might be wrong.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8368?focusedWorklogId=328239&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328239
 ]

ASF GitHub Bot logged work on BEAM-8368:


Author: ASF GitHub Bot
Created on: 15/Oct/19 01:02
Start Date: 15/Oct/19 01:02
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9768: [BEAM-8368] Avoid 
libprotobuf-generated exception when importing apache_beam
URL: https://github.com/apache/beam/pull/9768#issuecomment-541990847
 
 
   Great, thank you!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328239)
Time Spent: 1h 40m  (was: 1.5h)

> [Python] libprotobuf-generated exception when importing apache_beam
> ---
>
> Key: BEAM-8368
> URL: https://issues.apache.org/jira/browse/BEAM-8368
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.15.0, 2.17.0
>Reporter: Ubaier Bhat
>Assignee: Ahmet Altay
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: error_log.txt
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Unable to import apache_beam after upgrading to macOS 10.15 (Catalina). 
> Cleared all the pipenvs but still can't get it working again.
> {code}
> import apache_beam as beam
> /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84:
>  UserWarning: Some syntactic constructs of Python 3 are not yet fully 
> supported by Apache Beam.
>   'Some syntactic constructs of Python 3 are not yet fully supported by '
> [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already 
> exists in database: 
> [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> libc++abi.dylib: terminating with uncaught exception of type 
> google::protobuf::FatalException: CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8367) Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS

2019-10-14 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16951448#comment-16951448
 ] 

Pablo Estrada commented on BEAM-8367:
-

https://github.com/apache/beam/pull/9797 is out to fix this

> Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS
> -
>
> Key: BEAM-8367
> URL: https://issues.apache.org/jira/browse/BEAM-8367
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Unique IDs ensure (best effort) that writes to BigQuery are idempotent; for 
> example, we don't write the same record twice after a VM failure.
>  
> Currently the Python BQ sink inserts BQ IDs here, but they'll be re-generated 
> after a VM failure, resulting in data duplication.
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L766]
>  
> The correct fix is to do a Reshuffle to checkpoint unique IDs once they are 
> generated, similar to how the Java BQ sink operates.
> [https://github.com/apache/beam/blob/dcf6ad301069e4d2cfaec5db6b178acb7bb67f49/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java#L225]
>  
> Pablo, can you do an initial assessment here?
> I think this is a relatively small fix but I might be wrong.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8367) Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8367?focusedWorklogId=328238&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328238
 ]

ASF GitHub Bot logged work on BEAM-8367:


Author: ASF GitHub Bot
Created on: 15/Oct/19 00:57
Start Date: 15/Oct/19 00:57
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9797: [BEAM-8367] 
Using insertId for BQ streaming inserts
URL: https://github.com/apache/beam/pull/9797
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
[jira] [Commented] (BEAM-8367) Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS

2019-10-14 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951446#comment-16951446
 ] 

Pablo Estrada commented on BEAM-8367:
-

Working on fixing this

> Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS
> -
>
> Key: BEAM-8367
> URL: https://issues.apache.org/jira/browse/BEAM-8367
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Pablo Estrada
>Priority: Major
>
> Unique IDs ensure (best effort) that writes to BigQuery are idempotent; for 
> example, we don't write the same record twice on a VM failure.
>  
> Currently, the Python BQ sink inserts BQ IDs here, but they'll be regenerated 
> on a VM failure, resulting in data duplication.
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L766]
>  
> The correct fix is to do a Reshuffle to checkpoint unique IDs once they are 
> generated, similar to how the Java BQ sink operates.
> [https://github.com/apache/beam/blob/dcf6ad301069e4d2cfaec5db6b178acb7bb67f49/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java#L225]
>  
> Pablo, can you do an initial assessment here?
> I think this is a relatively small fix, but I might be wrong.
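
A minimal sketch of the Reshuffle-based checkpointing described above (illustrative 
only, not the actual change; the helper names and the placeholder write step are 
made up):

{code:python}
import uuid

import apache_beam as beam
from apache_beam.transforms.util import Reshuffle


def _attach_insert_id(row):
  # Generate the random insert id exactly once, before the checkpoint.
  return (str(uuid.uuid4()), row)


def _write_with_id(id_and_row):
  """Placeholder for the streaming-insert call that passes insertId to BigQuery."""
  insert_id, row = id_and_row
  return insert_id, row


with beam.Pipeline() as p:
  _ = (p
       | beam.Create([{'col': 1}, {'col': 2}])
       | 'AttachIds' >> beam.Map(_attach_insert_id)
       # Reshuffle checkpoints the generated ids, so a retry after a VM failure
       # reuses the same insertId instead of generating a new one.
       | 'Checkpoint' >> Reshuffle()
       | 'Write' >> beam.Map(_write_with_id))
{code}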



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8372) Allow submission of Flink UberJar directly to flink cluster.

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8372?focusedWorklogId=328235=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328235
 ]

ASF GitHub Bot logged work on BEAM-8372:


Author: ASF GitHub Bot
Created on: 15/Oct/19 00:27
Start Date: 15/Oct/19 00:27
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9775: [BEAM-8372] Job 
server submitting UberJars directly to Flink Runner.
URL: https://github.com/apache/beam/pull/9775#issuecomment-541984452
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328235)
Time Spent: 2h 10m  (was: 2h)

> Allow submission of Flink UberJar directly to flink cluster.
> 
>
> Key: BEAM-8372
> URL: https://issues.apache.org/jira/browse/BEAM-8372
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8401) Upgrade to ZetaSQL 2019.10.1

2019-10-14 Thread Andrew Pilloud (Jira)
Andrew Pilloud created BEAM-8401:


 Summary: Upgrade to ZetaSQL 2019.10.1
 Key: BEAM-8401
 URL: https://issues.apache.org/jira/browse/BEAM-8401
 Project: Beam
  Issue Type: New Feature
  Components: dsl-sql-zetasql
Reporter: Andrew Pilloud
Assignee: Andrew Pilloud


ZetaSQL 2019.10.1 will be available in maven central momentarily. We should 
upgrade to it. The new version fixes a circular dependency issue that breaks 
maven and a threading issue that may result in a shutdown hang.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8401) Upgrade to ZetaSQL 2019.10.1

2019-10-14 Thread Andrew Pilloud (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Pilloud updated BEAM-8401:
-
Status: Open  (was: Triage Needed)

> Upgrade to ZetaSQL 2019.10.1
> 
>
> Key: BEAM-8401
> URL: https://issues.apache.org/jira/browse/BEAM-8401
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql-zetasql
>Reporter: Andrew Pilloud
>Assignee: Andrew Pilloud
>Priority: Major
>
> ZetaSQL 2019.10.1 will be available in maven central momentarily. We should 
> upgrade to it. The new version fixes a circular dependency issue that breaks 
> maven and a threading issue that may result in a shutdown hang.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8368?focusedWorklogId=328229=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328229
 ]

ASF GitHub Bot logged work on BEAM-8368:


Author: ASF GitHub Bot
Created on: 14/Oct/19 23:42
Start Date: 14/Oct/19 23:42
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9768: [BEAM-8368] 
Avoid libprotobuf-generated exception when importing apache_beam
URL: https://github.com/apache/beam/pull/9768#issuecomment-541975521
 
 
   Yeah the fix for 
[ARROW-6860](https://issues.apache.org/jira/browse/ARROW-6860) should be in 
0.15.1, and it sounds like that will [resolve our 
issue](https://issues.apache.org/jira/browse/BEAM-8368?focusedCommentId=16950836=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16950836).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328229)
Time Spent: 1.5h  (was: 1h 20m)

> [Python] libprotobuf-generated exception when importing apache_beam
> ---
>
> Key: BEAM-8368
> URL: https://issues.apache.org/jira/browse/BEAM-8368
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.15.0, 2.17.0
>Reporter: Ubaier Bhat
>Assignee: Ahmet Altay
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: error_log.txt
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Unable to import apache_beam after upgrading to macOS 10.15 (Catalina). 
> Cleared all the pipenvs but can't get it working again.
> {code}
> import apache_beam as beam
> /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84:
>  UserWarning: Some syntactic constructs of Python 3 are not yet fully 
> supported by Apache Beam.
>   'Some syntactic constructs of Python 3 are not yet fully supported by '
> [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already 
> exists in database: 
> [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> libc++abi.dylib: terminating with uncaught exception of type 
> google::protobuf::FatalException: CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=328226=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328226
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 14/Oct/19 23:36
Start Date: 14/Oct/19 23:36
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on pull request #9790: 
[BEAM-7389] Show code snippet outputs as stdout
URL: https://github.com/apache/beam/pull/9790#discussion_r334700424
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/snippets/transforms/elementwise/filter_test.py
 ##
 @@ -31,31 +31,26 @@
 
 
 def check_perennials(actual):
-  # [START perennials]
-  perennials = [
-  {'icon': '', 'name': 'Strawberry', 'duration': 'perennial'},
-  {'icon': '', 'name': 'Eggplant', 'duration': 'perennial'},
-  {'icon': '凜', 'name': 'Potato', 'duration': 'perennial'},
-  ]
-  # [END perennials]
-  assert_that(actual, equal_to(perennials))
+  expected = '''[START perennials]
+{'icon': '', 'name': 'Strawberry', 'duration': 'perennial'}
+{'icon': '', 'name': 'Eggplant', 'duration': 'perennial'}
+{'icon': '凜', 'name': 'Potato', 'duration': 'perennial'}
+[END perennials]'''.splitlines()[1:-1]
 
 Review comment:
   It is a little weird in the code, but it reads a lot nicer in the docs this 
way. That's the actual output people will get when running the code, rather 
than the lists.
   
   The [START] and [END] tags need to be part of the string so we can extract 
only the desired output; otherwise we would be extracting other characters 
like the starting/ending string quotes. The [START]/[END] tags are filtered out 
using `.splitlines()[1:-1]`.
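
   As a standalone illustration of the slicing (not Beam code, just the string 
handling):

{code:python}
output = '''[START perennials]
line one
line two
[END perennials]'''.splitlines()[1:-1]

# splitlines() gives ['[START perennials]', 'line one', 'line two',
# '[END perennials]']; the [1:-1] slice drops the START/END marker lines.
assert output == ['line one', 'line two']
{code}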
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328226)
Time Spent: 67.5h  (was: 67h 20m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 67.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=328225=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328225
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 14/Oct/19 23:34
Start Date: 14/Oct/19 23:34
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on pull request #9790: 
[BEAM-7389] Show code snippet outputs as stdout
URL: https://github.com/apache/beam/pull/9790#discussion_r334700908
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/snippets/transforms/elementwise/filter_test.py
 ##
 @@ -31,31 +31,26 @@
 
 
 def check_perennials(actual):
-  # [START perennials]
-  perennials = [
-  {'icon': '', 'name': 'Strawberry', 'duration': 'perennial'},
-  {'icon': '', 'name': 'Eggplant', 'duration': 'perennial'},
-  {'icon': '凜', 'name': 'Potato', 'duration': 'perennial'},
-  ]
-  # [END perennials]
-  assert_that(actual, equal_to(perennials))
+  expected = '''[START perennials]
+{'icon': '', 'name': 'Strawberry', 'duration': 'perennial'}
+{'icon': '', 'name': 'Eggplant', 'duration': 'perennial'}
+{'icon': '凜', 'name': 'Potato', 'duration': 'perennial'}
+[END perennials]'''.splitlines()[1:-1]
+  assert_that(actual, equal_to(expected))
 
 
 def check_valid_plants(actual):
-  # [START valid_plants]
-  valid_plants = [
-  {'icon': '', 'name': 'Strawberry', 'duration': 'perennial'},
-  {'icon': '凌', 'name': 'Carrot', 'duration': 'biennial'},
-  {'icon': '', 'name': 'Eggplant', 'duration': 'perennial'},
-  {'icon': '', 'name': 'Tomato', 'duration': 'annual'},
-  ]
-  # [END valid_plants]
-  assert_that(actual, equal_to(valid_plants))
+  expected = '''[START valid_plants]
+{'icon': '', 'name': 'Strawberry', 'duration': 'perennial'}
+{'icon': '凌', 'name': 'Carrot', 'duration': 'biennial'}
+{'icon': '', 'name': 'Eggplant', 'duration': 'perennial'}
+{'icon': '', 'name': 'Tomato', 'duration': 'annual'}
+[END valid_plants]'''.splitlines()[1:-1]
+  assert_that(actual, equal_to(expected))
 
 
 @mock.patch('apache_beam.Pipeline', TestPipeline)
-# pylint: disable=line-too-long
-@mock.patch('apache_beam.examples.snippets.transforms.elementwise.filter.print',
 lambda elem: elem)
+@mock.patch('apache_beam.examples.snippets.transforms.elementwise.filter.print',
 str)
 
 Review comment:
   The mocked function signature can get pretty long, and it's easier to keep 
it on a single line than to split it across multiple lines. Since it's 
a fairly repetitive line that doesn't add much to the logic, I would 
prefer to keep these on a single line. I added an exception for `@mock.patch` in 
the `.pylintrc` so we don't have to explicitly state the `pylint: disable`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328225)
Time Spent: 67h 20m  (was: 67h 10m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 67h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=328223=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328223
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 14/Oct/19 23:32
Start Date: 14/Oct/19 23:32
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on pull request #9790: 
[BEAM-7389] Show code snippet outputs as stdout
URL: https://github.com/apache/beam/pull/9790#discussion_r334700424
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/snippets/transforms/elementwise/filter_test.py
 ##
 @@ -31,31 +31,26 @@
 
 
 def check_perennials(actual):
-  # [START perennials]
-  perennials = [
-  {'icon': '', 'name': 'Strawberry', 'duration': 'perennial'},
-  {'icon': '', 'name': 'Eggplant', 'duration': 'perennial'},
-  {'icon': '凜', 'name': 'Potato', 'duration': 'perennial'},
-  ]
-  # [END perennials]
-  assert_that(actual, equal_to(perennials))
+  expected = '''[START perennials]
+{'icon': '', 'name': 'Strawberry', 'duration': 'perennial'}
+{'icon': '', 'name': 'Eggplant', 'duration': 'perennial'}
+{'icon': '凜', 'name': 'Potato', 'duration': 'perennial'}
+[END perennials]'''.splitlines()[1:-1]
 
 Review comment:
   It is a little weird in the code, but it reads a lot nicer in the docs this 
way. That's the actual output people will get when running the code, rather 
than the lists.
   
   The [START] and [END] tags need to be part of the string so we can extract 
only the desired output, but they are filtered out by 
`.splitlines()[1:-1]`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328223)
Time Spent: 67h 10m  (was: 67h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 67h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=328216=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328216
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 14/Oct/19 23:23
Start Date: 14/Oct/19 23:23
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9790: [BEAM-7389] 
Show code snippet outputs as stdout
URL: https://github.com/apache/beam/pull/9790#discussion_r334698571
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/snippets/transforms/elementwise/filter_test.py
 ##
 @@ -31,31 +31,26 @@
 
 
 def check_perennials(actual):
-  # [START perennials]
-  perennials = [
-  {'icon': '', 'name': 'Strawberry', 'duration': 'perennial'},
-  {'icon': '', 'name': 'Eggplant', 'duration': 'perennial'},
-  {'icon': '凜', 'name': 'Potato', 'duration': 'perennial'},
-  ]
-  # [END perennials]
-  assert_that(actual, equal_to(perennials))
+  expected = '''[START perennials]
+{'icon': '', 'name': 'Strawberry', 'duration': 'perennial'}
+{'icon': '', 'name': 'Eggplant', 'duration': 'perennial'}
+{'icon': '凜', 'name': 'Potato', 'duration': 'perennial'}
+[END perennials]'''.splitlines()[1:-1]
 
 Review comment:
   I am confused about why the [START] and [END] tags need to be part of this 
expected value. The list before the change looks simpler when reading the code.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328216)
Time Spent: 67h  (was: 66h 50m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 67h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=328215=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328215
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 14/Oct/19 23:23
Start Date: 14/Oct/19 23:23
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9790: [BEAM-7389] 
Show code snippet outputs as stdout
URL: https://github.com/apache/beam/pull/9790#discussion_r334698623
 
 

 ##
 File path: 
sdks/python/apache_beam/examples/snippets/transforms/elementwise/filter_test.py
 ##
 @@ -31,31 +31,26 @@
 
 
 def check_perennials(actual):
-  # [START perennials]
-  perennials = [
-  {'icon': '', 'name': 'Strawberry', 'duration': 'perennial'},
-  {'icon': '', 'name': 'Eggplant', 'duration': 'perennial'},
-  {'icon': '凜', 'name': 'Potato', 'duration': 'perennial'},
-  ]
-  # [END perennials]
-  assert_that(actual, equal_to(perennials))
+  expected = '''[START perennials]
+{'icon': '', 'name': 'Strawberry', 'duration': 'perennial'}
+{'icon': '', 'name': 'Eggplant', 'duration': 'perennial'}
+{'icon': '凜', 'name': 'Potato', 'duration': 'perennial'}
+[END perennials]'''.splitlines()[1:-1]
+  assert_that(actual, equal_to(expected))
 
 
 def check_valid_plants(actual):
-  # [START valid_plants]
-  valid_plants = [
-  {'icon': '', 'name': 'Strawberry', 'duration': 'perennial'},
-  {'icon': '凌', 'name': 'Carrot', 'duration': 'biennial'},
-  {'icon': '', 'name': 'Eggplant', 'duration': 'perennial'},
-  {'icon': '', 'name': 'Tomato', 'duration': 'annual'},
-  ]
-  # [END valid_plants]
-  assert_that(actual, equal_to(valid_plants))
+  expected = '''[START valid_plants]
+{'icon': '', 'name': 'Strawberry', 'duration': 'perennial'}
+{'icon': '凌', 'name': 'Carrot', 'duration': 'biennial'}
+{'icon': '', 'name': 'Eggplant', 'duration': 'perennial'}
+{'icon': '', 'name': 'Tomato', 'duration': 'annual'}
+[END valid_plants]'''.splitlines()[1:-1]
+  assert_that(actual, equal_to(expected))
 
 
 @mock.patch('apache_beam.Pipeline', TestPipeline)
-# pylint: disable=line-too-long
-@mock.patch('apache_beam.examples.snippets.transforms.elementwise.filter.print',
 lambda elem: elem)
+@mock.patch('apache_beam.examples.snippets.transforms.elementwise.filter.print',
 str)
 
 Review comment:
   Do you need mock.patch() to be in a single line?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328215)
Time Spent: 66h 50m  (was: 66h 40m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 66h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8400) Side inputs not working in CombineGlobally

2019-10-14 Thread David Cavazos (Jira)
David Cavazos created BEAM-8400:
---

 Summary: Side inputs not working in CombineGlobally
 Key: BEAM-8400
 URL: https://issues.apache.org/jira/browse/BEAM-8400
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Affects Versions: 2.16.0
Reporter: David Cavazos


Side inputs are not working in CombineGlobally. They do work as expected in 
CombinePerKey and CombineValues.

 

The function argument still has the value of `AsSingleton` rather than being 
resolved into the real value.

 

Here is a [Notebook where you can reproduce the 
issue|https://colab.research.google.com/drive/149By0ZKJjb_JdDOsFywdT_OLj1hMEPBa].

https://colab.research.google.com/drive/149By0ZKJjb_JdDOsFywdT_OLj1hMEPBa
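
A minimal sketch of the kind of pipeline affected (the exact shape is assumed; the 
linked notebook has the authoritative repro):

{code:python}
import apache_beam as beam

with beam.Pipeline() as p:
  factor = p | 'Factor' >> beam.Create([10])
  _ = (p
       | beam.Create([1, 2, 3])
       # The side input is passed as an extra argument to the combining callable.
       # Per this report, CombineGlobally hands the callable the unresolved
       # AsSingleton object instead of the value 10, while CombinePerKey and
       # CombineValues resolve it as expected.
       | beam.CombineGlobally(
           lambda values, f: sum(values) * f,
           beam.pvalue.AsSingleton(factor))
       | beam.Map(print))
{code}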



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=328202=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328202
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 14/Oct/19 22:48
Start Date: 14/Oct/19 22:48
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on issue #9790: [BEAM-7389] Show 
code snippet outputs as stdout
URL: https://github.com/apache/beam/pull/9790#issuecomment-541963852
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328202)
Time Spent: 66.5h  (was: 66h 20m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 66.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=328203=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328203
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 14/Oct/19 22:48
Start Date: 14/Oct/19 22:48
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on issue #9790: [BEAM-7389] Show 
code snippet outputs as stdout
URL: https://github.com/apache/beam/pull/9790#issuecomment-541963852
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328203)
Time Spent: 66h 40m  (was: 66.5h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 66h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8398) Upgrade Dataflow Java Client API

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8398?focusedWorklogId=328183=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328183
 ]

ASF GitHub Bot logged work on BEAM-8398:


Author: ASF GitHub Bot
Created on: 14/Oct/19 22:14
Start Date: 14/Oct/19 22:14
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #9792: [BEAM-8398] Upgrade 
Dataflow Java Client API
URL: https://github.com/apache/beam/pull/9792#issuecomment-541953652
 
 
   Pointing google-cloud-bigquery at 1.97.0 picks up the google-cloud-clients 
[0.115.0-alpha 
bom](https://repo.maven.apache.org/maven2/com/google/cloud/google-cloud-clients/0.115.0-alpha/google-cloud-clients-0.115.0-alpha.pom).
 The bom defines a bunch of dependency versions for compatible libraries. Also, 
the versions of the libraries that you selected may not be compatible with the 
related google support libraries like gax, auth, etc.
   
   Please provide the transitive dependency diff including version overrides 
where we squash a resolved version for an older/newer version.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328183)
Time Spent: 0.5h  (was: 20m)

> Upgrade Dataflow Java Client API
> 
>
> Key: BEAM-8398
> URL: https://issues.apache.org/jira/browse/BEAM-8398
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8350) Upgrade to pylint 2.4

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8350?focusedWorklogId=328179=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328179
 ]

ASF GitHub Bot logged work on BEAM-8350:


Author: ASF GitHub Bot
Created on: 14/Oct/19 22:10
Start Date: 14/Oct/19 22:10
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9725: [BEAM-8350] Upgrade 
to Pylint 2.4
URL: https://github.com/apache/beam/pull/9725#issuecomment-541951747
 
 
   This is ready for another round of review.  All tests are passing.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328179)
Time Spent: 5h 50m  (was: 5h 40m)

> Upgrade to pylint 2.4
> -
>
> Key: BEAM-8350
> URL: https://issues.apache.org/jira/browse/BEAM-8350
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> pylint 2.4 provides a number of new features and fixes, but the most 
> important/pressing one for me is that 2.4 adds support for understanding 
> python type annotations, which fixes a bunch of spurious unused import errors 
> in the PR I'm working on for BEAM-7746.
> As of 2.0, pylint dropped support for running under python2, so to make 
> the upgrade we have to move our lint jobs to python3.  Doing so will put 
> pylint into "python3-mode", and there is no option to run in 
> python2-compatible mode.  That said, the beam code is intended to be python3 
> compatible, so in practice, performing a python3 lint on the Beam code-base 
> is perfectly safe.  The primary risk of doing this is that someone introduces 
> a python3-only change that breaks python2, but these would largely be syntax 
> errors that would be immediately caught by the unit and integration tests.
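
A hedged sketch of the kind of spurious warning described above (the exact 
triggering pattern in the Beam PR may differ): an import referenced only in a type 
comment could be reported as unused-import by pylint versions that did not 
understand annotations.

{code:python}
from typing import List  # older pylint could flag this as unused-import


def head(values):
  # type: (List[int]) -> int
  """Return the first element; the type comment is the only use of List."""
  return values[0]
{code}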



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328176=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328176
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 14/Oct/19 22:04
Start Date: 14/Oct/19 22:04
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9764: [BEAM-8365] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r334680521
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamIOSourceRel.java
 ##
 @@ -109,10 +137,11 @@ public RelOptCost computeSelfCost(RelOptPlanner planner, 
RelMetadataQuery mq) {
   @Override
   public BeamCostModel beamComputeSelfCost(RelOptPlanner planner, 
RelMetadataQuery mq) {
 NodeStats estimates = BeamSqlRelUtils.getNodeStats(this, mq);
-return BeamCostModel.FACTORY.makeCost(estimates.getRowCount(), 
estimates.getRate());
+return BeamCostModel.FACTORY.makeCost(
+estimates.getRowCount() * getRowType().getFieldCount(), 
estimates.getRate());
   }
 
 Review comment:
   Is this the best way to ensure that the table with projects 
pushed down is favored?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328176)
Time Spent: 4h 10m  (was: 4h)

> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following method:
> {code:java}
> public PCollection<Row> buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
> which should return a `PCollection<Row>` with the fields specified in the 
> `fieldNames` list.
>  * Create a rule to push fields used by a Calc (in projects and in a 
> condition) down into the TestTable IO.
>  * Update that same Calc (from the previous step) to have proper input and 
> output schemas, removing unused fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2019-10-14 Thread Chamikara Madhusanka Jayalath (Jira)
Chamikara Madhusanka Jayalath created BEAM-8399:
---

 Summary: Python HDFS implementation should support filenames of 
the format "hdfs://namenodehost/parent/child"
 Key: BEAM-8399
 URL: https://issues.apache.org/jira/browse/BEAM-8399
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Chamikara Madhusanka Jayalath


"hdfs://namenodehost/parent/child" and "/parent/child" seems to be the correct 
filename formats for HDFS based on [1] but we currently support format 
"hdfs://parent/child".

To not break existing users, we have to either (1) somehow support both 
versions by default (based on [2] seems like HDFS does not allow colons in file 
path so this might be possible) (2) make  "hdfs://namenodehost/parent/child" 
optional for now and change it to default after few versions.

We should also make sure that Beam Java and Python HDFS file-system 
implementations are consistent in this regard.

 

[1][https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html]

[2] https://issues.apache.org/jira/browse/HDFS-13

 

cc: [~udim]
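
For illustration, the two path styles in question (paths are made up; a configured 
HDFS filesystem would be needed to actually run the read):

{code:python}
import apache_beam as beam

# Format currently accepted by the Python HDFS filesystem: no namenode host.
current_style = 'hdfs://parent/child/part-*'

# Full-URI format from the Hadoop FileSystem shell docs [1]: includes the namenode host.
proposed_style = 'hdfs://namenodehost/parent/child/part-*'

with beam.Pipeline() as p:
  lines = p | beam.io.ReadFromText(current_style)
{code}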



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8398) Upgrade Dataflow Java Client API

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8398?focusedWorklogId=328171=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328171
 ]

ASF GitHub Bot logged work on BEAM-8398:


Author: ASF GitHub Bot
Created on: 14/Oct/19 21:54
Start Date: 14/Oct/19 21:54
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9792: [BEAM-8398] Upgrade 
Dataflow Java Client API
URL: https://github.com/apache/beam/pull/9792#issuecomment-541944256
 
 
   Judging by 
[here](https://repo.maven.apache.org/maven2/com/google/cloud/google-cloud-bigquery/),
 it looks like these bigquery versions are way off from $google_clients_version, 
so I'm guessing they're semantically different?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328171)
Time Spent: 20m  (was: 10m)

> Upgrade Dataflow Java Client API
> 
>
> Key: BEAM-8398
> URL: https://issues.apache.org/jira/browse/BEAM-8398
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-4046) Decouple gradle project names and maven artifact ids

2019-10-14 Thread Luke Cwik (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Cwik resolved BEAM-4046.
-
Fix Version/s: Not applicable
   Resolution: Fixed

> Decouple gradle project names and maven artifact ids
> 
>
> Key: BEAM-4046
> URL: https://issues.apache.org/jira/browse/BEAM-4046
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Kenneth Knowles
>Assignee: Michael Luckey
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 42h 40m
>  Remaining Estimate: 0h
>
> In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. 
> It is clumsy and requires a hacky settings.gradle that is not idiomatic.
> In our second draft, we changed them to names that work well with Gradle, 
> like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky.
> In our third draft, we regressed to the first draft to get the Maven artifact 
> ids right.
> These should be able to be decoupled. It seems there are many StackOverflow 
> questions on the subject.
> Since it is unidiomatic and a poor user experience, if it does turn out to be 
> mandatory then it needs to be documented inline everywhere - the 
> settings.gradle should say why it is so bizarre, and each build.gradle should 
> indicate what its project id is.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-4046) Decouple gradle project names and maven artifact ids

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4046?focusedWorklogId=328168=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328168
 ]

ASF GitHub Bot logged work on BEAM-4046:


Author: ASF GitHub Bot
Created on: 14/Oct/19 21:44
Start Date: 14/Oct/19 21:44
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #8915: [BEAM-4046] 
Remove old project name mappings.
URL: https://github.com/apache/beam/pull/8915
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328168)
Time Spent: 42h 40m  (was: 42.5h)

> Decouple gradle project names and maven artifact ids
> 
>
> Key: BEAM-4046
> URL: https://issues.apache.org/jira/browse/BEAM-4046
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Kenneth Knowles
>Assignee: Michael Luckey
>Priority: Major
>  Time Spent: 42h 40m
>  Remaining Estimate: 0h
>
> In our first draft, we had gradle projects like {{":beam-sdks-java-core"}}. 
> It is clumsy and requires a hacky settings.gradle that is not idiomatic.
> In our second draft, we changed them to names that work well with Gradle, 
> like {{":sdks:java:core"}}. This caused Maven artifact IDs to be wonky.
> In our third draft, we regressed to the first draft to get the Maven artifact 
> ids right.
> These should be able to be decoupled. It seems there are many StackOverflow 
> questions on the subject.
> Since it is unidiomatic and a poor user experience, if it does turn out to be 
> mandatory then it needs to be documented inline everywhere - the 
> settings.gradle should say why it is so bizarre, and each build.gradle should 
> indicate what its project id is.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-1438) The default behavior for the Write transform doesn't work well with the Dataflow streaming runner

2019-10-14 Thread Robert Bradshaw (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951374#comment-16951374
 ] 

Robert Bradshaw commented on BEAM-1438:
---

Does this mean that the error at 
https://github.com/apache/beam/blob/release-2.16.0/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java#L315
 can be removed? 

> The default behavior for the Write transform doesn't work well with the 
> Dataflow streaming runner
> -
>
> Key: BEAM-1438
> URL: https://issues.apache.org/jira/browse/BEAM-1438
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Reuven Lax
>Assignee: Reuven Lax
>Priority: Major
> Fix For: 2.5.0
>
>
> If a Write specifies 0 output shards, that implies the runner should pick an 
> appropriate sharding. The default behavior is to write one shard per input 
> bundle. This works well with the Dataflow batch runner, but not with the 
> streaming runner, which produces large numbers of small bundles.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328162=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328162
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 14/Oct/19 21:24
Start Date: 14/Oct/19 21:24
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #9764: [BEAM-8365] [WIP] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#issuecomment-541930227
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328162)
Time Spent: 4h  (was: 3h 50m)

> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following method:
> {code:java}
> public PCollection<Row> buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
> which should return a `PCollection<Row>` with the fields specified in the 
> `fieldNames` list.
>  * Create a rule to push fields used by a Calc (in projects and in a 
> condition) down into the TestTable IO.
>  * Update that same Calc (from the previous step) to have proper input and 
> output schemas, removing unused fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328161=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328161
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 14/Oct/19 21:23
Start Date: 14/Oct/19 21:23
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #9764: [BEAM-8365] [WIP] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#issuecomment-541930227
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328161)
Time Spent: 3h 50m  (was: 3h 40m)

> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following method:
> {code:java}
> public PCollection<Row> buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
> which should return a `PCollection<Row>` with the fields specified in the 
> `fieldNames` list.
>  * Create a rule to push fields used by a Calc (in projects and in a 
> condition) down into the TestTable IO.
>  * Update that same Calc (from the previous step) to have proper input and 
> output schemas, removing unused fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8398) Upgrade Dataflow Java Client API

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8398?focusedWorklogId=328150=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328150
 ]

ASF GitHub Bot logged work on BEAM-8398:


Author: ASF GitHub Bot
Created on: 14/Oct/19 21:12
Start Date: 14/Oct/19 21:12
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9792: [BEAM-8398] 
Upgrade Dataflow Java Client API
URL: https://github.com/apache/beam/pull/9792
 
 
   We need to pick up a newer version of the Dataflow API to add support for 
the new `worker_region` and `worker_zone` flags 
([BEAM-8251](https://issues.apache.org/jira/browse/BEAM-8251)). 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   

[jira] [Commented] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam

2019-10-14 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951366#comment-16951366
 ] 

Wes McKinney commented on BEAM-8368:


Thanks. 

More generally, it would be helpful to know if you have guidance about best 
practices for Python wheels that need to depend on the C++ protocol buffers 
library. Does bundling {{libprotobuf.so}} help (something we could do, though it 
does create risk if there are ABI changes in libprotobuf and two wheels are built 
against different ABIs)? We could also work together to make a "cpp_protobuf" 
wheel that bundles a common shared library that different wheels can import, to 
make sure that everyone is pinning to the same protobuf version.

Really, wheels are a bad platform for shipping complex C++ projects, as has 
been discussed in many other places. 

> [Python] libprotobuf-generated exception when importing apache_beam
> ---
>
> Key: BEAM-8368
> URL: https://issues.apache.org/jira/browse/BEAM-8368
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.15.0, 2.17.0
>Reporter: Ubaier Bhat
>Assignee: Ahmet Altay
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: error_log.txt
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Unable to import apache_beam after upgrading to macOS 10.15 (Catalina). 
> Cleared all the pipenvs but can't get it working again.
> {code}
> import apache_beam as beam
> /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84:
>  UserWarning: Some syntactic constructs of Python 3 are not yet fully 
> supported by Apache Beam.
>   'Some syntactic constructs of Python 3 are not yet fully supported by '
> [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already 
> exists in database: 
> [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> libc++abi.dylib: terminating with uncaught exception of type 
> google::protobuf::FatalException: CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8367) Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS

2019-10-14 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada updated BEAM-8367:

Status: Open  (was: Triage Needed)

> Python BigQuery sink should use unique IDs for mode STREAMING_INSERTS
> -
>
> Key: BEAM-8367
> URL: https://issues.apache.org/jira/browse/BEAM-8367
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Pablo Estrada
>Priority: Major
>
> Unique IDs ensure (best effort) that writes to BigQuery are idempotent; for 
> example, we don't write the same record twice on a VM failure.
>  
> Currently, the Python BQ sink inserts BQ IDs here, but they'll be regenerated 
> on a VM failure, resulting in data duplication.
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L766]
>  
> The correct fix is to do a Reshuffle to checkpoint unique IDs once they are 
> generated, similar to how the Java BQ sink operates.
> [https://github.com/apache/beam/blob/dcf6ad301069e4d2cfaec5db6b178acb7bb67f49/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteTables.java#L225]
>  
> Pablo, can you do an initial assessment here?
> I think this is a relatively small fix, but I might be wrong.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8398) Upgrade Dataflow Java Client API

2019-10-14 Thread Kyle Weaver (Jira)
Kyle Weaver created BEAM-8398:
-

 Summary: Upgrade Dataflow Java Client API
 Key: BEAM-8398
 URL: https://issues.apache.org/jira/browse/BEAM-8398
 Project: Beam
  Issue Type: Task
  Components: sdk-java-core
Reporter: Kyle Weaver
Assignee: Kyle Weaver






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=328142=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328142
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 14/Oct/19 20:57
Start Date: 14/Oct/19 20:57
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on pull request #9791: 
[BEAM-7389] Add code snippet for CoGroupByKey
URL: https://github.com/apache/beam/pull/9791
 
 
   Adding the code snippet for `CoGroupByKey`
   
   R: @aaltay 
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   

[jira] [Work logged] (BEAM-8348) Portable Python job name hard-coded to "job"

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8348?focusedWorklogId=328138=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328138
 ]

ASF GitHub Bot logged work on BEAM-8348:


Author: ASF GitHub Bot
Created on: 14/Oct/19 20:54
Start Date: 14/Oct/19 20:54
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9789: [BEAM-8348] set 
job_name in portable_runner.py
URL: https://github.com/apache/beam/pull/9789#discussion_r334658452
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/portable_runner.py
 ##
 @@ -296,7 +297,8 @@ def add_runner_options(parser):
 
     prepare_response = job_service.Prepare(
         beam_job_api_pb2.PrepareJobRequest(
-            job_name='job', pipeline=proto_pipeline,
+            job_name=options.view_as(GoogleCloudOptions).job_name or 'job',
 
 Review comment:
   That was initially my preferred approach, but I don't see a way to do it 
without breaking all the existing Dataflow pipelines that set a job name. See 
#9724
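   
   For comparison, a sketch of the alternative mentioned in the issue (a dedicated portable option); `PortableJobOptions` and `portable_job_name` are hypothetical names used only for illustration, not existing Beam API:
   
   ```python
   from apache_beam.options.pipeline_options import GoogleCloudOptions
   from apache_beam.options.pipeline_options import PipelineOptions
   
   
   class PortableJobOptions(PipelineOptions):  # hypothetical option class
     @classmethod
     def _add_argparse_args(cls, parser):
       parser.add_argument(
           '--portable_job_name',
           default=None,
           help='Hypothetical job name option for portable runners.')
   
   
   def resolve_job_name(options):
     # Prefer the hypothetical portable option, then the existing
     # GoogleCloudOptions.job_name, then the old hard-coded default.
     return (options.view_as(PortableJobOptions).portable_job_name
             or options.view_as(GoogleCloudOptions).job_name
             or 'job')
   ```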
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328138)
Time Spent: 1.5h  (was: 1h 20m)

> Portable Python job name hard-coded to "job"
> 
>
> Key: BEAM-8348
> URL: https://issues.apache.org/jira/browse/BEAM-8348
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> See [1]. `job_name` is already taken by Google Cloud options [2], so I guess 
> we should create a new option (maybe `portable_job_name` to avoid disruption).
> [[1] 
> https://github.com/apache/beam/blob/55588e91ed8e3e25bb661a6202c31e99297e0e79/sdks/python/apache_beam/runners/portability/portable_runner.py#L294|https://github.com/apache/beam/blob/55588e91ed8e3e25bb661a6202c31e99297e0e79/sdks/python/apache_beam/runners/portability/portable_runner.py#L294]
> [2] 
> [https://github.com/apache/beam/blob/c5bbb51014f7506a2651d6070f27fb3c3dc0da8f/sdks/python/apache_beam/options/pipeline_options.py#L438]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8348) Portable Python job name hard-coded to "job"

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8348?focusedWorklogId=328135=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328135
 ]

ASF GitHub Bot logged work on BEAM-8348:


Author: ASF GitHub Bot
Created on: 14/Oct/19 20:52
Start Date: 14/Oct/19 20:52
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #9789: [BEAM-8348] set 
job_name in portable_runner.py
URL: https://github.com/apache/beam/pull/9789#discussion_r334657707
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/portable_runner.py
 ##
 @@ -296,7 +297,8 @@ def add_runner_options(parser):
 
     prepare_response = job_service.Prepare(
         beam_job_api_pb2.PrepareJobRequest(
-            job_name='job', pipeline=proto_pipeline,
+            job_name=options.view_as(GoogleCloudOptions).job_name or 'job',
 
 Review comment:
   GoogleCloudOptions doesn't seem appropriate for this. Should we relocate the 
option?
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328135)
Time Spent: 1h 20m  (was: 1h 10m)

> Portable Python job name hard-coded to "job"
> 
>
> Key: BEAM-8348
> URL: https://issues.apache.org/jira/browse/BEAM-8348
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> See [1]. `job_name` is already taken by Google Cloud options [2], so I guess 
> we should create a new option (maybe `portable_job_name` to avoid disruption).
> [[1] 
> https://github.com/apache/beam/blob/55588e91ed8e3e25bb661a6202c31e99297e0e79/sdks/python/apache_beam/runners/portability/portable_runner.py#L294|https://github.com/apache/beam/blob/55588e91ed8e3e25bb661a6202c31e99297e0e79/sdks/python/apache_beam/runners/portability/portable_runner.py#L294]
> [2] 
> [https://github.com/apache/beam/blob/c5bbb51014f7506a2651d6070f27fb3c3dc0da8f/sdks/python/apache_beam/options/pipeline_options.py#L438]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8372) Allow submission of Flink UberJar directly to flink cluster.

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8372?focusedWorklogId=328131=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328131
 ]

ASF GitHub Bot logged work on BEAM-8372:


Author: ASF GitHub Bot
Created on: 14/Oct/19 20:47
Start Date: 14/Oct/19 20:47
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9775: [BEAM-8372] Job 
server submitting UberJars directly to Flink Runner.
URL: https://github.com/apache/beam/pull/9775#discussion_r334652926
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/artifact_service.py
 ##
 @@ -23,51 +23,61 @@
 from __future__ import print_function
 
 import hashlib
-import random
-import re
+import threading
+import zipfile
+
+from google.protobuf import json_format
 
 from apache_beam.io import filesystems
 from apache_beam.portability.api import beam_artifact_api_pb2
 from apache_beam.portability.api import beam_artifact_api_pb2_grpc
 
 
-class BeamFilesystemArtifactService(
+class AbstractArtifactService(
 beam_artifact_api_pb2_grpc.ArtifactStagingServiceServicer,
 beam_artifact_api_pb2_grpc.ArtifactRetrievalServiceServicer):
 
   _DEFAULT_CHUNK_SIZE = 2 << 20  # 2mb
 
-  def __init__(self, root, chunk_size=_DEFAULT_CHUNK_SIZE):
+  def __init__(self, root, chunk_size=None):
 
 Review comment:
   Missed that, makes sense.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328131)
Time Spent: 2h  (was: 1h 50m)

> Allow submission of Flink UberJar directly to flink cluster.
> 
>
> Key: BEAM-8372
> URL: https://issues.apache.org/jira/browse/BEAM-8372
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=328120=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328120
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 14/Oct/19 20:28
Start Date: 14/Oct/19 20:28
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on pull request #9790: 
[BEAM-7389] Show code snippet outputs as stdout
URL: https://github.com/apache/beam/pull/9790
 
 
   It was a little confusing to see the outputs of the code shown as lists 
rather than what is actually printed when running the code.
   
   Changing the outputs of the code snippets to what we actually see when 
running the code also makes them consistent with the outputs of the notebooks.
   
   It's mostly adding some logic to "parse" the stdout output; the logic of 
the code samples stays the same.
   
   I also made some style changes to the `Partition` samples to make them 
easier to read.
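   
   For context, the kind of stdout-based check this enables looks roughly like the following (a sketch, not the actual helper used by the snippets):
   
   ```python
   import io
   from contextlib import redirect_stdout
   
   
   def check_printed_output(snippet_fn, expected_lines):
     # Run the snippet, capture everything it prints, and compare line by line.
     buf = io.StringIO()
     with redirect_stdout(buf):
       snippet_fn()
     assert buf.getvalue().splitlines() == expected_lines
   ```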
   
   R: @aaltay 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   (Build status badge matrix for Go, Java, and Python across the Apex, Dataflow, Flink, Gearpump, Samza, and Spark post-commit jobs omitted.)

[jira] [Work logged] (BEAM-8348) Portable Python job name hard-coded to "job"

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8348?focusedWorklogId=328117=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328117
 ]

ASF GitHub Bot logged work on BEAM-8348:


Author: ASF GitHub Bot
Created on: 14/Oct/19 20:26
Start Date: 14/Oct/19 20:26
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9789: [BEAM-8348] set 
job_name in portable_runner.py
URL: https://github.com/apache/beam/pull/9789
 
 
   It's awkward that `job_name` belongs to `GoogleCloudOptions`, but moving it 
to `StandardOptions` would be a breaking change (see #9724)
   
   Post-Commit Tests Status (on master branch)
   

   
   (Build status badge matrix for Go, Java, and Python across the Apex, Dataflow, Flink, Gearpump, Samza, and Spark post-commit jobs omitted.)

[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328106=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328106
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 14/Oct/19 20:15
Start Date: 14/Oct/19 20:15
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9764: [BEAM-8365] 
[WIP] Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r334645199
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProvider.java
 ##
 @@ -150,12 +165,40 @@ public BeamTableStatistics getTableStatistics(PipelineOptions options) {
       return begin.apply(Create.of(tableWithRows.rows).withCoder(rowCoder()));
     }
 
+    @Override
+    public PCollection<Row> buildIOReader(
+        PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames) {
+      PCollection<Row> withAllFields = buildIOReader(begin);
+      if (options == PushDownOptions.NONE
+          || (fieldNames.isEmpty() && filters instanceof DefaultTableFilter)) {
+        return withAllFields;
+      }
 
 Review comment:
   Moved the default case to `Transform#expand`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328106)
Time Spent: 3h 40m  (was: 3.5h)

> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following method:
> {code:java}
> public PCollection<Row> buildIOReader(
>     PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
> which should return a `PCollection<Row>` with the fields specified in the 
> `fieldNames` list.
>  * Create a rule to push fields used by a Calc (in projects and in a 
> condition) down into the TestTable IO.
>  * Update that same Calc (from the previous step) to have proper input and 
> output schemas, removing unused fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8372) Allow submission of Flink UberJar directly to flink cluster.

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8372?focusedWorklogId=328103=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328103
 ]

ASF GitHub Bot logged work on BEAM-8372:


Author: ASF GitHub Bot
Created on: 14/Oct/19 20:14
Start Date: 14/Oct/19 20:14
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9775: [BEAM-8372] 
Job server submitting UberJars directly to Flink Runner.
URL: https://github.com/apache/beam/pull/9775#discussion_r334623991
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server.py
 ##
 @@ -0,0 +1,233 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""A job server submitting portable pipelines as uber jars to Flink."""
+
+from __future__ import absolute_import
+from __future__ import print_function
+
+import logging
+import os
+import shutil
+import tempfile
+import time
+import zipfile
+
+from google.protobuf import json_format
+import grpc
+import requests
+from concurrent import futures
+
+from apache_beam.portability.api import beam_artifact_api_pb2_grpc
+from apache_beam.portability.api import beam_job_api_pb2
+from apache_beam.portability.api import endpoints_pb2
+from apache_beam.runners.portability import abstract_job_service
+from apache_beam.runners.portability import artifact_service
+from apache_beam.runners.portability import job_server
+
+
+class FlinkUberJarJobServer(abstract_job_service.AbstractJobServiceServicer):
+
+  def __init__(self, master_url, executable_jar=None):
+    super(FlinkUberJarJobServer, self).__init__()
+    self._master_url = master_url
+    self._executable_jar = executable_jar
+    self._temp_dir = tempfile.mkdtemp(prefix='apache-beam-flink')
+
+  def start(self):
+    return self
+
+  def stop(self):
+    pass
+
+  def executable_jar(self):
+    return self._executable_jar or job_server.JavaJarJobServer.local_jar(
+        job_server.JavaJarJobServer.path_to_beam_jar(
+            'runners:flink:%s:job-server:shadowJar' % self.flink_version()))
+
+  def flink_version(self):
+    full_version = requests.get(
+        '%s/v1/config' % self._master_url).json()['flink-version']
+    # Only return up to minor version.
+    return '.'.join(full_version.split('.')[:2])
+
+  def create_beam_job(self, job_id, job_name, pipeline, options):
+    return FlinkBeamJob(
+        self._master_url,
+        self.executable_jar(),
+        job_id,
+        job_name,
+        pipeline,
+        options)
+
+
+class FlinkBeamJob(abstract_job_service.AbstractBeamJob):
+
+  # These must agree with those defined in PortablePipelineJarUtils.java.
+  PIPELINE_FOLDER_PATH = "BEAM-PIPELINE"
+  PIPELINE_PATH = PIPELINE_FOLDER_PATH + "/pipeline.json"
+  PIPELINE_OPTIONS_PATH = PIPELINE_FOLDER_PATH + "/pipeline-options.json"
+  ARTIFACT_STAGING_FOLDER_PATH = "BEAM-ARTIFACT-STAGING"
+  ARTIFACT_MANIFEST_PATH = (
+      ARTIFACT_STAGING_FOLDER_PATH + "/artifact-manifest.json")
+
+  def __init__(
+      self, master_url, executable_jar, job_id, job_name, pipeline, options):
+    super(FlinkBeamJob, self).__init__(job_id, job_name, pipeline, options)
+    self._master_url = master_url
+    self._executable_jar = executable_jar
+    self._jar_uploaded = False
+
+  def prepare(self):
+    # Copy the executable jar, injecting the pipeline and options as resources.
+    with tempfile.NamedTemporaryFile(suffix='.jar') as tout:
+      self._jar = tout.name
+    shutil.copy(self._executable_jar, self._jar)
+    with zipfile.ZipFile(self._jar, 'a', compression=zipfile.ZIP_DEFLATED) as z:
+      with z.open(self.PIPELINE_PATH, 'w') as fout:
+        fout.write(json_format.MessageToJson(
+            self._pipeline_proto).encode('utf-8'))
+      with z.open(self.PIPELINE_OPTIONS_PATH, 'w') as fout:
+        fout.write(json_format.MessageToJson(
+            self._pipeline_options).encode('utf-8'))
+    self._start_artifact_service(self._jar)
+
+  def _start_artifact_service(self, jar):
+    self._artifact_staging_service = artifact_service.ZipFileArtifactService(
+        jar)
+    self._artifact_staging_server =

[jira] [Work logged] (BEAM-8372) Allow submission of Flink UberJar directly to flink cluster.

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8372?focusedWorklogId=328101=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328101
 ]

ASF GitHub Bot logged work on BEAM-8372:


Author: ASF GitHub Bot
Created on: 14/Oct/19 20:14
Start Date: 14/Oct/19 20:14
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9775: [BEAM-8372] 
Job server submitting UberJars directly to Flink Runner.
URL: https://github.com/apache/beam/pull/9775#discussion_r334632076
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/flink_runner.py
 ##
 @@ -29,7 +32,13 @@
 
 class FlinkRunner(portable_runner.PortableRunner):
   def default_job_server(self, options):
-    return job_server.StopOnExitJobServer(FlinkJarJobServer(options))
+    flink_master_url = options.view_as(FlinkRunnerOptions).flink_master_url
+    flink_master_url = 'http://localhost:8081'
 
 Review comment:
   Oops. Fixed. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328101)
Time Spent: 1h 50m  (was: 1h 40m)

> Allow submission of Flink UberJar directly to flink cluster.
> 
>
> Key: BEAM-8372
> URL: https://issues.apache.org/jira/browse/BEAM-8372
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8372) Allow submission of Flink UberJar directly to flink cluster.

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8372?focusedWorklogId=328098=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328098
 ]

ASF GitHub Bot logged work on BEAM-8372:


Author: ASF GitHub Bot
Created on: 14/Oct/19 20:14
Start Date: 14/Oct/19 20:14
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9775: [BEAM-8372] 
Job server submitting UberJars directly to Flink Runner.
URL: https://github.com/apache/beam/pull/9775#discussion_r334624917
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/flink_runner.py
 ##
 @@ -29,7 +32,13 @@
 
 class FlinkRunner(portable_runner.PortableRunner):
   def default_job_server(self, options):
-    return job_server.StopOnExitJobServer(FlinkJarJobServer(options))
+    flink_master_url = options.view_as(FlinkRunnerOptions).flink_master_url
+    flink_master_url = 'http://localhost:8081'
+    if flink_master_url == '[local]' or sys.version_info < (3, 6):
+      # TODO: Also default to LOOPBACK?
 
 Review comment:
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328098)
Time Spent: 1.5h  (was: 1h 20m)

> Allow submission of Flink UberJar directly to flink cluster.
> 
>
> Key: BEAM-8372
> URL: https://issues.apache.org/jira/browse/BEAM-8372
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8372) Allow submission of Flink UberJar directly to flink cluster.

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8372?focusedWorklogId=328102=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328102
 ]

ASF GitHub Bot logged work on BEAM-8372:


Author: ASF GitHub Bot
Created on: 14/Oct/19 20:14
Start Date: 14/Oct/19 20:14
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9775: [BEAM-8372] 
Job server submitting UberJars directly to Flink Runner.
URL: https://github.com/apache/beam/pull/9775#discussion_r334644554
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/portability/flink_uber_jar_job_server.py
 ##
 @@ -0,0 +1,233 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""A job server submitting portable pipelines as uber jars to Flink."""
+
+from __future__ import absolute_import
+from __future__ import print_function
+
+import logging
+import os
+import shutil
+import tempfile
+import time
+import zipfile
+
+from google.protobuf import json_format
+import grpc
+import requests
+from concurrent import futures
+
+from apache_beam.portability.api import beam_artifact_api_pb2_grpc
+from apache_beam.portability.api import beam_job_api_pb2
+from apache_beam.portability.api import endpoints_pb2
+from apache_beam.runners.portability import abstract_job_service
+from apache_beam.runners.portability import artifact_service
+from apache_beam.runners.portability import job_server
+
+
+class FlinkUberJarJobServer(abstract_job_service.AbstractJobServiceServicer):
+
+  def __init__(self, master_url, executable_jar=None):
+    super(FlinkUberJarJobServer, self).__init__()
+    self._master_url = master_url
+    self._executable_jar = executable_jar
+    self._temp_dir = tempfile.mkdtemp(prefix='apache-beam-flink')
+
+  def start(self):
+    return self
+
+  def stop(self):
+    pass
+
+  def executable_jar(self):
+    return self._executable_jar or job_server.JavaJarJobServer.local_jar(
+        job_server.JavaJarJobServer.path_to_beam_jar(
+            'runners:flink:%s:job-server:shadowJar' % self.flink_version()))
+
+  def flink_version(self):
+    full_version = requests.get(
+        '%s/v1/config' % self._master_url).json()['flink-version']
+    # Only return up to minor version.
+    return '.'.join(full_version.split('.')[:2])
+
+  def create_beam_job(self, job_id, job_name, pipeline, options):
+    return FlinkBeamJob(
+        self._master_url,
+        self.executable_jar(),
+        job_id,
+        job_name,
+        pipeline,
+        options)
+
+
+class FlinkBeamJob(abstract_job_service.AbstractBeamJob):
+
+  # These must agree with those defined in PortablePipelineJarUtils.java.
 
 Review comment:
   Thanks for the heads up.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328102)
Time Spent: 1h 50m  (was: 1h 40m)

> Allow submission of Flink UberJar directly to flink cluster.
> 
>
> Key: BEAM-8372
> URL: https://issues.apache.org/jira/browse/BEAM-8372
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8372) Allow submission of Flink UberJar directly to flink cluster.

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8372?focusedWorklogId=328100=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328100
 ]

ASF GitHub Bot logged work on BEAM-8372:


Author: ASF GitHub Bot
Created on: 14/Oct/19 20:14
Start Date: 14/Oct/19 20:14
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9775: [BEAM-8372] 
Job server submitting UberJars directly to Flink Runner.
URL: https://github.com/apache/beam/pull/9775#discussion_r334625678
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/abstract_job_service.py
 ##
 @@ -0,0 +1,139 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+from __future__ import absolute_import
+
+import logging
+import uuid
+from builtins import object
+
+from apache_beam.portability.api import beam_job_api_pb2
+from apache_beam.portability.api import beam_job_api_pb2_grpc
+
+TERMINAL_STATES = [
+    beam_job_api_pb2.JobState.DONE,
+    beam_job_api_pb2.JobState.STOPPED,
+    beam_job_api_pb2.JobState.FAILED,
+    beam_job_api_pb2.JobState.CANCELLED,
+]
+
+
+class AbstractJobServiceServicer(beam_job_api_pb2_grpc.JobServiceServicer):
+  """Manages one or more pipelines, possibly concurrently.
+  Experimental: No backward compatibility guaranteed.
+  Servicer for the Beam Job API.
+  """
+  def __init__(self):
+    self._jobs = {}
+
+  def create_beam_job(self, preparation_id, job_name, pipeline, options):
+    """Returns an instance of AbstractBeamJob specific to this servicer."""
+    raise NotImplementedError(type(self))
+
+  def Prepare(self, request, context=None, timeout=None):
+    # For now, just use the job name as the job id.
 
 Review comment:
   Yeah. This comment is out of date. Removed.
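   
   For illustration, a preparation id that is unique per request (rather than reusing the job name) could be derived along these lines; this is a sketch, not necessarily the exact code that replaced the stale comment:
   
   ```python
   import uuid
   
   
   def make_preparation_id(job_name):
     # Combine the human-readable job name with a random suffix so that
     # repeated submissions of a same-named job do not collide.
     return '%s-%s' % (job_name, uuid.uuid4())
   ```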
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328100)
Time Spent: 1h 40m  (was: 1.5h)

> Allow submission of Flink UberJar directly to flink cluster.
> 
>
> Key: BEAM-8372
> URL: https://issues.apache.org/jira/browse/BEAM-8372
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8372) Allow submission of Flink UberJar directly to flink cluster.

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8372?focusedWorklogId=328099=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328099
 ]

ASF GitHub Bot logged work on BEAM-8372:


Author: ASF GitHub Bot
Created on: 14/Oct/19 20:14
Start Date: 14/Oct/19 20:14
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9775: [BEAM-8372] 
Job server submitting UberJars directly to Flink Runner.
URL: https://github.com/apache/beam/pull/9775#discussion_r334625937
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/artifact_service.py
 ##
 @@ -23,51 +23,61 @@
 from __future__ import print_function
 
 import hashlib
-import random
-import re
+import threading
+import zipfile
+
+from google.protobuf import json_format
 
 from apache_beam.io import filesystems
 from apache_beam.portability.api import beam_artifact_api_pb2
 from apache_beam.portability.api import beam_artifact_api_pb2_grpc
 
 
-class BeamFilesystemArtifactService(
+class AbstractArtifactService(
 beam_artifact_api_pb2_grpc.ArtifactStagingServiceServicer,
 beam_artifact_api_pb2_grpc.ArtifactRetrievalServiceServicer):
 
   _DEFAULT_CHUNK_SIZE = 2 << 20  # 2mb
 
-  def __init__(self, root, chunk_size=_DEFAULT_CHUNK_SIZE):
+  def __init__(self, root, chunk_size=None):
 
 Review comment:
   One can now explicitly pass None to get the default (e.g. in a superclass 
constructor call). 
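   
   The pattern being described, in isolation (a generic sketch of the None-as-default idiom, with illustrative class names rather than the actual Beam ones):
   
   ```python
   class ChunkedService(object):  # illustrative stand-in for the artifact service
     _DEFAULT_CHUNK_SIZE = 2 << 20  # 2 MB
   
     def __init__(self, root, chunk_size=None):
       self._root = root
       # None means "use the class default", so subclasses can simply forward
       # chunk_size=None from their own constructors.
       self._chunk_size = chunk_size or self._DEFAULT_CHUNK_SIZE
   
   
   class ZipChunkedService(ChunkedService):
     def __init__(self, root, chunk_size=None):
       super(ZipChunkedService, self).__init__(root, chunk_size)
   ```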
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328099)
Time Spent: 1h 40m  (was: 1.5h)

> Allow submission of Flink UberJar directly to flink cluster.
> 
>
> Key: BEAM-8372
> URL: https://issues.apache.org/jira/browse/BEAM-8372
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7073) AvroUtils converting generic record to Beam Row causes class cast exception

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7073?focusedWorklogId=328093=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328093
 ]

ASF GitHub Bot logged work on BEAM-7073:


Author: ASF GitHub Bot
Created on: 14/Oct/19 20:03
Start Date: 14/Oct/19 20:03
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #9773: [BEAM-7073]: 
Add unit test for Avro logical type datum.
URL: https://github.com/apache/beam/pull/9773
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328093)
Time Spent: 50m  (was: 40m)

> AvroUtils converting generic record to Beam Row causes class cast exception
> ---
>
> Key: BEAM-7073
> URL: https://issues.apache.org/jira/browse/BEAM-7073
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.11.0
> Environment: Direct Runner
>Reporter: Vishwas
>Assignee: Ryan Skraba
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Below is my pipeline:
> KafkaSource (KafkaIO.read) ---> ParDo ---> BeamSql ---> KafkaSink (KafkaIO.write)
> The Kafka source reads Avro records from the Kafka topic and deserializes 
> them to GenericRecord using the following:
> KafkaIO.Read<String, GenericRecord> kafkaIoRead =
>     KafkaIO.<String, GenericRecord>read()
>         .withBootstrapServers(bootstrapServerUrl)
>         .withTopic(topicName)
>         .withKeyDeserializer(StringDeserializer.class)
>         .withValueDeserializerAndCoder(GenericAvroDeserializer.class,
>             AvroCoder.of(GenericRecord.class, avroSchema))
>         .updateConsumerProperties(
>             ImmutableMap.of("schema.registry.url", schemaRegistryUrl));
> The Avro schema of the topic has a logical type (timestamp-millis), which is 
> deserialized to joda-time:
> {
>   "name": "timeOfRelease",
>   "type": [
>     "null",
>     {
>       "type": "long",
>       "logicalType": "timestamp-millis",
>       "connect.version": 1,
>       "connect.name": "org.apache.kafka.connect.data.Timestamp"
>     }
>   ],
>   "default": null
> }
> Now in my ParDo transform, I am trying to use the AvroUtils class methods to 
> convert the GenericRecord to a Beam Row and get the class cast exception below:
>  AvroUtils.toBeamRowStrict(genericRecord, this.beamSchema)
> Caused by: java.lang.ClassCastException: org.joda.time.DateTime cannot be 
> cast to java.lang.Long
>     at 
> org.apache.beam.sdk.schemas.utils.AvroUtils.convertAvroFieldStrict(AvroUtils.java:664)
>     at 
> org.apache.beam.sdk.schemas.utils.AvroUtils.toBeamRowStrict(AvroUtils.java:217)
>  
> This looks like a bug, as the joda-time value created during deserialization 
> is being cast to Long in the code below:
>   else if (logicalType instanceof LogicalTypes.TimestampMillis) {
>     return convertDateTimeStrict((Long) value, fieldType);
>   }
> PS: I also used the avro-tools 1.8.2 jar to get the classes for the mentioned 
> avro schema and I see that the attribute with timestamp-millis logical type 
> is being converted to joda-time.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8387) Remove sdk-worker-parallelism option from JobServerDriver

2019-10-14 Thread Thomas Weise (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Weise resolved BEAM-8387.

Fix Version/s: 2.17.0
   Resolution: Fixed

> Remove sdk-worker-parallelism option from JobServerDriver
> -
>
> Key: BEAM-8387
> URL: https://issues.apache.org/jira/browse/BEAM-8387
> Project: Beam
>  Issue Type: Task
>  Components: runner-core
>Affects Versions: 2.16.0
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
> Fix For: 2.17.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The option was added when it wasn't possible to specify it as a pipeline 
> option, which is no longer the case. The pipeline option has a value of 0, 
> which means that the runner should pick a suitable value. But this is then 
> overridden in FlinkJobInvoker with 1 (because that's the default in the 
> JobServerDriver config).
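
With the job server flag removed, the knob is expressed purely as a pipeline option. A minimal sketch, assuming `sdk_worker_parallelism` is defined on `PortableOptions` with 0 meaning "let the runner decide":

{code:python}
from apache_beam.options.pipeline_options import PipelineOptions, PortableOptions

# 0 asks the runner to pick a suitable degree of SDK worker parallelism.
options = PipelineOptions(['--sdk_worker_parallelism=0'])
print(options.view_as(PortableOptions).sdk_worker_parallelism)
{code}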



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8394) Support for Multiple DataSources in JdbcIO

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8394?focusedWorklogId=328085=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328085
 ]

ASF GitHub Bot logged work on BEAM-8394:


Author: ASF GitHub Bot
Created on: 14/Oct/19 19:52
Start Date: 14/Oct/19 19:52
Worklog Time Spent: 10m 
  Work Description: rahul8383 commented on issue #9787: [BEAM-8394]: 
Support for Multiple DataSources in JdbcIO
URL: https://github.com/apache/beam/pull/9787#issuecomment-541885282
 
 
   Sorry, a small misunderstanding on my end.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328085)
Time Spent: 50m  (was: 40m)

> Support for Multiple DataSources in JdbcIO
> --
>
> Key: BEAM-8394
> URL: https://issues.apache.org/jira/browse/BEAM-8394
> Project: Beam
>  Issue Type: Task
>  Components: io-java-jdbc
>Reporter: Rahul Patwari
>Assignee: Rahul Patwari
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> When a Pipeline consists of multiple database systems, user should be able to 
> provide DataSourceConfiguration for each database system and 
> DataSourceProviderFromDataSourceConfiguration should return the respective 
> DataSources for each database when apply() method is called.
>  
> Also add withDataSourceConfiguration() method in JdbcIO.ReadRows class



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8387) Remove sdk-worker-parallelism option from JobServerDriver

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8387?focusedWorklogId=328083=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328083
 ]

ASF GitHub Bot logged work on BEAM-8387:


Author: ASF GitHub Bot
Created on: 14/Oct/19 19:52
Start Date: 14/Oct/19 19:52
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #9785: [BEAM-8387] 
Remove sdk-worker-parallelism option from JobServerDriver
URL: https://github.com/apache/beam/pull/9785
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328083)
Time Spent: 0.5h  (was: 20m)

> Remove sdk-worker-parallelism option from JobServerDriver
> -
>
> Key: BEAM-8387
> URL: https://issues.apache.org/jira/browse/BEAM-8387
> Project: Beam
>  Issue Type: Task
>  Components: runner-core
>Affects Versions: 2.16.0
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The option was added when it wasn't possible to specify it as a pipeline 
> option, which is no longer the case. The pipeline option has a value of 0, 
> which means that the runner should pick a suitable value. But this is then 
> overridden in FlinkJobInvoker with 1 (because that's the default in the 
> JobServerDriver config).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8394) Support for Multiple DataSources in JdbcIO

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8394?focusedWorklogId=328082=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328082
 ]

ASF GitHub Bot logged work on BEAM-8394:


Author: ASF GitHub Bot
Created on: 14/Oct/19 19:51
Start Date: 14/Oct/19 19:51
Worklog Time Spent: 10m 
  Work Description: rahul8383 commented on pull request #9787: [BEAM-8394]: 
Support for Multiple DataSources in JdbcIO
URL: https://github.com/apache/beam/pull/9787
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328082)
Time Spent: 40m  (was: 0.5h)

> Support for Multiple DataSources in JdbcIO
> --
>
> Key: BEAM-8394
> URL: https://issues.apache.org/jira/browse/BEAM-8394
> Project: Beam
>  Issue Type: Task
>  Components: io-java-jdbc
>Reporter: Rahul Patwari
>Assignee: Rahul Patwari
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When a Pipeline consists of multiple database systems, user should be able to 
> provide DataSourceConfiguration for each database system and 
> DataSourceProviderFromDataSourceConfiguration should return the respective 
> DataSources for each database when apply() method is called.
>  
> Also add withDataSourceConfiguration() method in JdbcIO.ReadRows class



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8387) Remove sdk-worker-parallelism option from JobServerDriver

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8387?focusedWorklogId=328080=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328080
 ]

ASF GitHub Bot logged work on BEAM-8387:


Author: ASF GitHub Bot
Created on: 14/Oct/19 19:45
Start Date: 14/Oct/19 19:45
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9785: [BEAM-8387] Remove 
sdk-worker-parallelism option from JobServerDriver
URL: https://github.com/apache/beam/pull/9785#issuecomment-541881979
 
 
   LGTM, I always thought having the same argument set in both places added 
unnecessary complexity (like 
[BEAM-7657](https://issues.apache.org/jira/browse/BEAM-7657)).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328080)
Time Spent: 20m  (was: 10m)

> Remove sdk-worker-parallelism option from JobServerDriver
> -
>
> Key: BEAM-8387
> URL: https://issues.apache.org/jira/browse/BEAM-8387
> Project: Beam
>  Issue Type: Task
>  Components: runner-core
>Affects Versions: 2.16.0
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The option was added when it wasn't possible to specify it as a pipeline 
> option, which is no longer the case. The pipeline option has a value of 0, 
> which means that the runner should pick a suitable value. But this is then 
> overridden in FlinkJobInvoker with 1 (because that's the default in the 
> JobServerDriver config).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8394) Support for Multiple DataSources in JdbcIO

2019-10-14 Thread Rahul Patwari (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rahul Patwari updated BEAM-8394:

Description: 
When a Pipeline consists of multiple database systems, the user should be able 
to provide a DataSourceConfiguration for each database system, and 
DataSourceProviderFromDataSourceConfiguration should return the respective 
DataSource for each database when the apply() method is called.

 

Also add a withDataSourceConfiguration() method to the JdbcIO.ReadRows class

  was:
When a Pipeline consists of multiple databases, user should be able to provide 
DataSourceConfiguration for each database and  
DataSourceProviderFromDataSourceConfiguration should return the respective 
DataSources for each database when apply() method is called.

 

Also add withDataSourceConfiguration() method in JdbcIO.ReadRows class


> Support for Multiple DataSources in JdbcIO
> --
>
> Key: BEAM-8394
> URL: https://issues.apache.org/jira/browse/BEAM-8394
> Project: Beam
>  Issue Type: Task
>  Components: io-java-jdbc
>Reporter: Rahul Patwari
>Assignee: Rahul Patwari
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When a Pipeline consists of multiple database systems, user should be able to 
> provide DataSourceConfiguration for each database system and 
> DataSourceProviderFromDataSourceConfiguration should return the respective 
> DataSources for each database when apply() method is called.
>  
> Also add withDataSourceConfiguration() method in JdbcIO.ReadRows class



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8324) Pre/Postcommit Dataflow IT tests fail: _create_function() takes from 2 to 6 positional arguments but 7 were given

2019-10-14 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951276#comment-16951276
 ] 

Valentyn Tymofieiev commented on BEAM-8324:
---

The infinite recursion error that appears with the dill upgrade also appears 
without the dill upgrade if we launch the tests through nose: 
https://issues.apache.org/jira/browse/BEAM-8397.

> Pre/Postcommit Dataflow IT tests fail: _create_function() takes from 2 to 6 
> positional arguments but 7 were given
> -
>
> Key: BEAM-8324
> URL: https://issues.apache.org/jira/browse/BEAM-8324
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Blocker
> Fix For: 2.16.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> {noformat}
> Dataflow pipeline failed. State: FAILED, Error:
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 
> 773, in run
> self._load_main_session(self.local_staging_directory)
>   File 
> "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 
> 489, in _load_main_session
> pickler.load_session(session_file)
>   File 
> "/usr/local/lib/python3.7/site-packages/apache_beam/internal/pickler.py", 
> line 287, in load_session
> return dill.load_session(file_path)
>   File "/usr/local/lib/python3.7/site-packages/dill/_dill.py", line 410, in 
> load_session
> module = unpickler.load()
> TypeError: _create_function() takes from 2 to 6 positional arguments but 7 
> were given
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-5878) Support DoFns with Keyword-only arguments in Python 3.

2019-10-14 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951275#comment-16951275
 ] 

Valentyn Tymofieiev commented on BEAM-5878:
---

The infinite recursion error that appears with the dill upgrade also appears 
without the dill upgrade if we launch the tests through nose: 
https://issues.apache.org/jira/browse/BEAM-8397.

> Support DoFns with Keyword-only arguments in Python 3.
> --
>
> Key: BEAM-5878
> URL: https://issues.apache.org/jira/browse/BEAM-5878
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Minor
> Fix For: 2.17.0
>
>  Time Spent: 15h
>  Remaining Estimate: 0h
>
> Python 3.0 [adds a possibility|https://www.python.org/dev/peps/pep-3102/] to 
> define functions with keyword-only arguments. 
> Currently Beam does not handle them correctly. [~ruoyu] pointed out [one 
> place|https://github.com/apache/beam/blob/a56ce43109c97c739fa08adca45528c41e3c925c/sdks/python/apache_beam/typehints/decorators.py#L118]
>  in our codebase that we should fix: in Python 3, inspect.getargspec() 
> will fail on functions with keyword-only arguments, but a new method 
> [inspect.getfullargspec()|https://docs.python.org/3/library/inspect.html#inspect.getfullargspec]
>  supports them.
> There may be implications for our (best-effort) type-hints machinery.
> We should also add a Py3-only unit test that covers DoFns with keyword-only 
> arguments once Beam Python 3 tests are in a good shape.
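
A minimal illustration of the underlying Python behavior (plain Python, no Beam required): inspect.getargspec() rejects keyword-only arguments, while inspect.getfullargspec() reports them.

{code:python}
import inspect

def process(element, *, window=None):  # keyword-only argument, Python 3 only
  return element, window

print(inspect.getfullargspec(process).kwonlyargs)  # ['window']

try:
  inspect.getargspec(process)
except ValueError as e:
  # getargspec() cannot describe keyword-only parameters.
  print('getargspec failed:', e)
{code}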



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8397) DataflowRunnerTest.test_remote_runner_display_data fails due to infinite recursion during pickling.

2019-10-14 Thread Valentyn Tymofieiev (Jira)
Valentyn Tymofieiev created BEAM-8397:
-

 Summary: DataflowRunnerTest.test_remote_runner_display_data fails 
due to infinite recursion during pickling.
 Key: BEAM-8397
 URL: https://issues.apache.org/jira/browse/BEAM-8397
 Project: Beam
  Issue Type: Sub-task
  Components: sdk-py-core
Reporter: Valentyn Tymofieiev


`python ./setup.py test -s 
apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest.test_remote_runner_display_data`
 passes.

`tox -e py37-gcp` passes if Beam depends on dill==0.3.0, but fails if Beam 
depends on dill==0.3.1.1.

`python ./setup.py nosetests --tests 
'apache_beam/runners/dataflow/dataflow_runner_test.py:DataflowRunnerTest.test_remote_runner_display_data'`
 currently fails when run on master.

The failure indicates infinite recursion during pickling:
{noformat}
test_remote_runner_display_data 
(apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest) ... 
Fatal Python error: Cannot recover from stack overflow.

Current thread 0x7f9d700ed740 (most recent call first):
  File "/usr/lib/python3.7/pickle.py", line 479 in get
  File "/usr/lib/python3.7/pickle.py", line 497 in save
  File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
  File "/usr/lib/python3.7/pickle.py", line 504 in save
  File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
  File 
"/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
 line 1394 in save_function
  File "/usr/lib/python3.7/pickle.py", line 504 in save
  File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
  File "/usr/lib/python3.7/pickle.py", line 856 in save_dict
  File 
"/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
 line 910 in save_module_dict
  File 
"/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
 line 198 in new_save_module_dict
  File "/usr/lib/python3.7/pickle.py", line 504 in save
  File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
  File "/usr/lib/python3.7/pickle.py", line 504 in save
  File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
  File 
"/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
 line 114 in wrapper
  File "/usr/lib/python3.7/pickle.py", line 504 in save
  File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple
  File "/usr/lib/python3.7/pickle.py", line 504 in save
  File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
  File 
"/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
 line 1137 in save_cell
  File "/usr/lib/python3.7/pickle.py", line 504 in save
  File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple
  File "/usr/lib/python3.7/pickle.py", line 504 in save
  File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
  File "/usr/lib/python3.7/pickle.py", line 504 in save
  File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
  File 
"/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
 line 1394 in save_function
  File "/usr/lib/python3.7/pickle.py", line 504 in save
  File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
  File "/usr/lib/python3.7/pickle.py", line 856 in save_dict
  File 
"/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
 line 910 in save_module_dict
  File 
"/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
 line 198 in new_save_module_dict
...
{noformat}

cc: [~lazylynx]




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8397) DataflowRunnerTest.test_remote_runner_display_data fails due to infinite recursion during pickling.

2019-10-14 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-8397:
--
Status: Open  (was: Triage Needed)

> DataflowRunnerTest.test_remote_runner_display_data fails due to infinite 
> recursion during pickling.
> ---
>
> Key: BEAM-8397
> URL: https://issues.apache.org/jira/browse/BEAM-8397
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
>
> `python ./setup.py test -s 
> apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest.test_remote_runner_display_data`
>  passes.
> `tox -e py37-gcp` passes if Beam depends on dill==0.3.0, but fails if Beam 
> depends on dill==0.3.1.1.
> `python ./setup.py nosetests --tests 
> 'apache_beam/runners/dataflow/dataflow_runner_test.py:DataflowRunnerTest.test_remote_runner_display_data'`
>  currently fails if run on master.
> The failure indicates infinite recursion during pickling:
> {noformat}
> test_remote_runner_display_data 
> (apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest) ... 
> Fatal Python error: Cannot recover from stack overflow.
> Current thread 0x7f9d700ed740 (most recent call first):
>   File "/usr/lib/python3.7/pickle.py", line 479 in get
>   File "/usr/lib/python3.7/pickle.py", line 497 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1394 in save_function
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
>   File "/usr/lib/python3.7/pickle.py", line 856 in save_dict
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 910 in save_module_dict
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
>  line 198 in new_save_module_dict
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
>  line 114 in wrapper
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1137 in save_cell
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1394 in save_function
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
>   File "/usr/lib/python3.7/pickle.py", line 856 in save_dict
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 910 in save_module_dict
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
>  line 198 in new_save_module_dict
> ...
> {noformat}
> cc: [~lazylynx]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=328070=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328070
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 14/Oct/19 19:17
Start Date: 14/Oct/19 19:17
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on issue #9788: [BEAM-7389] Add 
links to docs for easier navigation
URL: https://github.com/apache/beam/pull/9788#issuecomment-541868004
 
 
   Alright, that makes sense, I'll close this then :)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328070)
Time Spent: 65h 50m  (was: 65h 40m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 65h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=328071=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328071
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 14/Oct/19 19:17
Start Date: 14/Oct/19 19:17
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on pull request #9788: 
[BEAM-7389] Add links to docs for easier navigation
URL: https://github.com/apache/beam/pull/9788
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328071)
Time Spent: 66h  (was: 65h 50m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 66h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-6984) Python 3.7 Support

2019-10-14 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev reassigned BEAM-6984:
-

Assignee: Valentyn Tymofieiev  (was: Juta Staes)

> Python 3.7 Support
> --
>
> Key: BEAM-6984
> URL: https://issues.apache.org/jira/browse/BEAM-6984
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Robbe
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The first step of adding Python 3 support focused only on Python 3.5. Support 
> for Python 3.7 should be added as a next step.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8396) Default to LOOPBACK mode for local flink (spark, ...) runner.

2019-10-14 Thread Robert Bradshaw (Jira)
Robert Bradshaw created BEAM-8396:
-

 Summary: Default to LOOPBACK mode for local flink (spark, ...) 
runner.
 Key: BEAM-8396
 URL: https://issues.apache.org/jira/browse/BEAM-8396
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Robert Bradshaw


As well as having lower overhead, this will avoid surprises about workers 
operating within the Docker filesystem, etc. 
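
For context, a minimal sketch of how LOOPBACK mode can be requested explicitly today via pipeline options (a hedged example: the option names are those of the portable Flink runner, and the tiny pipeline is only for illustration):

{code:python}
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# LOOPBACK asks the runner to execute the SDK worker inside the submitting
# process instead of launching a Docker-based SDK harness container.
options = PipelineOptions([
    '--runner=FlinkRunner',
    '--environment_type=LOOPBACK',
])

with beam.Pipeline(options=options) as p:
  _ = (p
       | beam.Create([1, 2, 3])
       | beam.Map(lambda x: x * x)
       | beam.Map(print))
{code}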



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=328061=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328061
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 14/Oct/19 19:07
Start Date: 14/Oct/19 19:07
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9788: [BEAM-7389] Add links 
to docs for easier navigation
URL: https://github.com/apache/beam/pull/9788#issuecomment-541863054
 
 
   This could become a maintenance issue over time. The docs link to the code, but I am 
not sure the code needs to link back to the docs. The entry point for the docs would 
ideally be discoverable enough that a user can find what they need, without requiring a 
link back from the snippets code to the corresponding docs.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328061)
Time Spent: 65h 40m  (was: 65.5h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 65h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8394) Support for Multiple DataSources in JdbcIO

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8394?focusedWorklogId=328060=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328060
 ]

ASF GitHub Bot logged work on BEAM-8394:


Author: ASF GitHub Bot
Created on: 14/Oct/19 19:05
Start Date: 14/Oct/19 19:05
Worklog Time Spent: 10m 
  Work Description: rahul8383 commented on issue #9787: [BEAM-8394]: 
Support for Multiple DataSources in JdbcIO
URL: https://github.com/apache/beam/pull/9787#issuecomment-541862241
 
 
   R: @iemejia 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328060)
Time Spent: 0.5h  (was: 20m)

> Support for Multiple DataSources in JdbcIO
> --
>
> Key: BEAM-8394
> URL: https://issues.apache.org/jira/browse/BEAM-8394
> Project: Beam
>  Issue Type: Task
>  Components: io-java-jdbc
>Reporter: Rahul Patwari
>Assignee: Rahul Patwari
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When a Pipeline consists of multiple databases, the user should be able to 
> provide a DataSourceConfiguration for each database, and 
> DataSourceProviderFromDataSourceConfiguration should return the respective 
> DataSource for each database when the apply() method is called.
>  
> Also add a withDataSourceConfiguration() method to the JdbcIO.ReadRows class.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=328058=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328058
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 14/Oct/19 19:01
Start Date: 14/Oct/19 19:01
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on pull request #9788: 
[BEAM-7389] Add links to docs for easier navigation
URL: https://github.com/apache/beam/pull/9788
 
 
   Add links to docs for easier navigation. If a user lands on the GitHub code 
pages, have a way to redirect them to the docs for more information.
   
   R: @aaltay 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   

[jira] [Updated] (BEAM-8395) JavaBeamZetaSQL PreCommit is flaky

2019-10-14 Thread Kirill Kozlov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kirill Kozlov updated BEAM-8395:

Description: 
Occasionally fails on task: *Task :sdks:java:extensions:sql:zetasql:test*
{code:java}
10:28:56 org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLDialectSpecTest > 
testExceptAll FAILED
10:28:56 java.lang.NoSuchMethodError at ZetaSQLDialectSpecTest.java:2553

10:28:56 org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLDialectSpecTest > 
testZetaSQLStructFieldAccessInTumble FAILED
10:28:56 java.lang.NoClassDefFoundError at ZetaSQLDialectSpecTest.java:1272

... More of java.lang.NoClassDefFoundError{code}
Jenkins that failed: 
[https://builds.apache.org/job/beam_PreCommit_JavaBeamZetaSQL_Phrase/8/]

Jenkins that passed: 
[https://builds.apache.org/job/beam_PreCommit_JavaBeamZetaSQL_Phrase/9/]

 

Stack trace:
{code:java}
java.lang.NoSuchMethodError: 
com.google.protobuf.Descriptors$FileDescriptor.internalBuildGeneratedFileFrom([Ljava/lang/String;[Lcom/google/protobuf/Descriptors$FileDescriptor;)Lcom/google/protobuf/Descriptors$FileDescriptor;

at 
com.google.zetasql.functions.ZetaSQLDateTime.<clinit>(ZetaSQLDateTime.java:508)

at 
com.google.zetasql.functions.ZetaSQLDateTime$DateTimestampPart.getDescriptor(ZetaSQLDateTime.java:460)

at 
com.google.zetasql.SimpleCatalog.processGetBuiltinFunctionsResponse(SimpleCatalog.java:377)

at com.google.zetasql.SimpleCatalog.addZetaSQLFunctions(SimpleCatalog.java:365)

at 
org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addBuiltinFunctionsToCatalog(SqlAnalyzer.java:152)

at 
org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.createPopulatedCatalog(SqlAnalyzer.java:136)

at 
org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.analyze(SqlAnalyzer.java:90)

at 
org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer$Builder.analyze(SqlAnalyzer.java:281)

at 
org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:136)

at 
org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:92)

at 
org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)

at 
org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)

at 
org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLDialectSpecTest.testExceptAll(ZetaSQLDialectSpecTest.java:2553){code}

  was:
Occasionally fails on task: *Task :sdks:java:extensions:sql:zetasql:test*
{code:java}
10:28:56 org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLDialectSpecTest > 
testExceptAll FAILED
10:28:56 java.lang.NoSuchMethodError at ZetaSQLDialectSpecTest.java:2553

10:28:56 org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLDialectSpecTest > 
testZetaSQLStructFieldAccessInTumble FAILED
10:28:56 java.lang.NoClassDefFoundError at ZetaSQLDialectSpecTest.java:1272

... More of java.lang.NoClassDefFoundError{code}
Jenkins that failed: 
[https://builds.apache.org/job/beam_PreCommit_JavaBeamZetaSQL_Phrase/8/]

Jenkins that passed: 
[https://builds.apache.org/job/beam_PreCommit_JavaBeamZetaSQL_Phrase/9/]


> JavaBeamZetaSQL PreCommit is flaky
> --
>
> Key: BEAM-8395
> URL: https://issues.apache.org/jira/browse/BEAM-8395
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql-zetasql
>Affects Versions: 2.15.0
>Reporter: Kirill Kozlov
>Priority: Major
>
> Occasionally fails on task: *Task :sdks:java:extensions:sql:zetasql:test*
> {code:java}
> 10:28:56 org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLDialectSpecTest > 
> testExceptAll FAILED
> 10:28:56 java.lang.NoSuchMethodError at ZetaSQLDialectSpecTest.java:2553
> 10:28:56 org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLDialectSpecTest > 
> testZetaSQLStructFieldAccessInTumble FAILED
> 10:28:56 java.lang.NoClassDefFoundError at ZetaSQLDialectSpecTest.java:1272
> ... More of java.lang.NoClassDefFoundError{code}
> Jenkins that failed: 
> [https://builds.apache.org/job/beam_PreCommit_JavaBeamZetaSQL_Phrase/8/]
> Jenkins that passed: 
> [https://builds.apache.org/job/beam_PreCommit_JavaBeamZetaSQL_Phrase/9/]
>  
> Stack trace:
> {code:java}
> java.lang.NoSuchMethodError: 
> com.google.protobuf.Descriptors$FileDescriptor.internalBuildGeneratedFileFrom([Ljava/lang/String;[Lcom/google/protobuf/Descriptors$FileDescriptor;)Lcom/google/protobuf/Descriptors$FileDescriptor;
> at 
> com.google.zetasql.functions.ZetaSQLDateTime.<clinit>(ZetaSQLDateTime.java:508)
> at 
> com.google.zetasql.functions.ZetaSQLDateTime$DateTimestampPart.getDescriptor(ZetaSQLDateTime.java:460)
> at 
> com.google.zetasql.SimpleCatalog.processGetBuiltinFunctionsResponse(SimpleCatalog.java:377)
> at 
> com.google.zetasql.SimpleCatalog.addZetaSQLFunctions(SimpleCatalog.java:365)
> at 
> 

[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=328043=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328043
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 14/Oct/19 18:33
Start Date: 14/Oct/19 18:33
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9786: [BEAM-7389] 
Remove old element-wise snippets directory
URL: https://github.com/apache/beam/pull/9786
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328043)
Time Spent: 65h 20m  (was: 65h 10m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 65h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8368) [Python] libprotobuf-generated exception when importing apache_beam

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8368?focusedWorklogId=328040=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328040
 ]

ASF GitHub Bot logged work on BEAM-8368:


Author: ASF GitHub Bot
Created on: 14/Oct/19 18:27
Start Date: 14/Oct/19 18:27
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9768: [BEAM-8368] 
Avoid libprotobuf-generated exception when importing apache_beam
URL: https://github.com/apache/beam/pull/9768
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328040)
Time Spent: 1h 20m  (was: 1h 10m)

> [Python] libprotobuf-generated exception when importing apache_beam
> ---
>
> Key: BEAM-8368
> URL: https://issues.apache.org/jira/browse/BEAM-8368
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.15.0, 2.17.0
>Reporter: Ubaier Bhat
>Assignee: Ahmet Altay
>Priority: Blocker
> Fix For: 2.17.0
>
> Attachments: error_log.txt
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Unable to import apache_beam after upgrading to macOS 10.15 (Catalina). 
> Cleared all the pipenvs but can't get it working again.
> {code}
> import apache_beam as beam
> /Users/***/.local/share/virtualenvs/beam-etl-ims6DitU/lib/python3.7/site-packages/apache_beam/__init__.py:84:
>  UserWarning: Some syntactic constructs of Python 3 are not yet fully 
> supported by Apache Beam.
>   'Some syntactic constructs of Python 3 are not yet fully supported by '
> [libprotobuf ERROR google/protobuf/descriptor_database.cc:58] File already 
> exists in database: 
> [libprotobuf FATAL google/protobuf/descriptor.cc:1370] CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> libc++abi.dylib: terminating with uncaught exception of type 
> google::protobuf::FatalException: CHECK failed: 
> GeneratedDatabase()->Add(encoded_file_descriptor, size): 
> {code}
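
One way to narrow down reports like this is to check which protobuf backend and version are active in the failing environment. A hedged diagnostic sketch, not part of Beam; it only inspects the installed protobuf package:

{code:python}
# Run inside the virtualenv where "import apache_beam" crashes.
import google.protobuf
from google.protobuf.internal import api_implementation

# 'cpp' means the libprotobuf (C++) descriptor pool is in use; that is the
# component performing the "File already exists in database" CHECK above.
print('protobuf version:', google.protobuf.__version__)
print('protobuf implementation:', api_implementation.Type())
{code}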



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-1440) Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1440?focusedWorklogId=328025=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328025
 ]

ASF GitHub Bot logged work on BEAM-1440:


Author: ASF GitHub Bot
Created on: 14/Oct/19 17:57
Start Date: 14/Oct/19 17:57
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #9772: [BEAM-1440] Create a 
BigQuery source that implements iobase.BoundedSource for Python
URL: https://github.com/apache/beam/pull/9772#issuecomment-541826086
 
 
   Run Python 3 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328025)
Time Spent: 1h 40m  (was: 1.5h)

> Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK
> --
>
> Key: BEAM-1440
> URL: https://issues.apache.org/jira/browse/BEAM-1440
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently we have a BigQuery native source for the Python SDK [1].
> This can only be used by the Dataflow runner.
> We should implement a Beam BigQuery source that implements the 
> iobase.BoundedSource [2] interface so that other runners that use the 
> Python SDK can read from BigQuery as well. The Java SDK already has a Beam 
> BigQuery source [3].
> [1] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py
> [2] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py#L70
> [3] 
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1189
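
For reference, the iobase.BoundedSource interface mentioned in [2] boils down to four methods. Here is a minimal sketch of the required shape (illustrative only; ToyBoundedSource is made up for this example and is not the actual BigQuery implementation):

{code:python}
import apache_beam as beam
from apache_beam.io import iobase
from apache_beam.io.range_trackers import OffsetRangeTracker


class ToyBoundedSource(iobase.BoundedSource):
  """Reads the integers [0, n), standing in for rows of a BigQuery table."""

  def __init__(self, n):
    self._n = n

  def estimate_size(self):
    # Rough byte estimate that runners use for sizing decisions.
    return self._n * 8

  def get_range_tracker(self, start_position, stop_position):
    start = 0 if start_position is None else start_position
    stop = self._n if stop_position is None else stop_position
    return OffsetRangeTracker(start, stop)

  def split(self, desired_bundle_size, start_position=None, stop_position=None):
    start = 0 if start_position is None else start_position
    stop = self._n if stop_position is None else stop_position
    bundle_start = start
    while bundle_start < stop:
      bundle_stop = min(bundle_start + desired_bundle_size, stop)
      yield iobase.SourceBundle(
          weight=bundle_stop - bundle_start,
          source=self,
          start_position=bundle_start,
          stop_position=bundle_stop)
      bundle_start = bundle_stop

  def read(self, range_tracker):
    for i in range(range_tracker.start_position(),
                   range_tracker.stop_position()):
      if not range_tracker.try_claim(i):
        return
      yield i


# Usage: any runner can execute such a source, unlike the Dataflow-only
# native source.
# with beam.Pipeline() as p:
#   _ = p | beam.io.Read(ToyBoundedSource(100)) | beam.Map(print)
{code}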



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-1440) Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1440?focusedWorklogId=328026=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328026
 ]

ASF GitHub Bot logged work on BEAM-1440:


Author: ASF GitHub Bot
Created on: 14/Oct/19 17:57
Start Date: 14/Oct/19 17:57
Worklog Time Spent: 10m 
  Work Description: kamilwu commented on issue #9772: [BEAM-1440] Create a 
BigQuery source that implements iobase.BoundedSource for Python
URL: https://github.com/apache/beam/pull/9772#issuecomment-541826086
 
 
   Run Python 3 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328026)
Time Spent: 1h 50m  (was: 1h 40m)

> Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK
> --
>
> Key: BEAM-1440
> URL: https://issues.apache.org/jira/browse/BEAM-1440
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently we have a BigQuery native source for the Python SDK [1].
> This can only be used by the Dataflow runner.
> We should implement a Beam BigQuery source that implements the 
> iobase.BoundedSource [2] interface so that other runners that use the 
> Python SDK can read from BigQuery as well. The Java SDK already has a Beam 
> BigQuery source [3].
> [1] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py
> [2] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py#L70
> [3] 
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1189



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8395) JavaBeamZetaSQL PreCommit is flaky

2019-10-14 Thread Kirill Kozlov (Jira)
Kirill Kozlov created BEAM-8395:
---

 Summary: JavaBeamZetaSQL PreCommit is flaky
 Key: BEAM-8395
 URL: https://issues.apache.org/jira/browse/BEAM-8395
 Project: Beam
  Issue Type: Bug
  Components: dsl-sql-zetasql
Affects Versions: 2.15.0
Reporter: Kirill Kozlov


Occasionally fails on task: *Task :sdks:java:extensions:sql:zetasql:test*
10:28:56 org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLDialectSpecTest > 
testExceptAll FAILED
10:28:56 java.lang.NoSuchMethodError at ZetaSQLDialectSpecTest.java:2553

10:28:56 org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLDialectSpecTest > 
testZetaSQLStructFieldAccessInTumble FAILED
10:28:56 java.lang.NoClassDefFoundError at ZetaSQLDialectSpecTest.java:1272

... More of java.lang.NoClassDefFoundError
Jenkins that failed: 
[https://builds.apache.org/job/beam_PreCommit_JavaBeamZetaSQL_Phrase/8/]

Jenkins that passed: 
[https://builds.apache.org/job/beam_PreCommit_JavaBeamZetaSQL_Phrase/9/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable IO connector for Python

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=328016=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328016
 ]

ASF GitHub Bot logged work on BEAM-3342:


Author: ASF GitHub Bot
Created on: 14/Oct/19 17:49
Start Date: 14/Oct/19 17:49
Worklog Time Spent: 10m 
  Work Description: mf2199 commented on issue #8457: [BEAM-3342] Create a 
Cloud Bigtable IO connector for Python
URL: https://github.com/apache/beam/pull/8457#issuecomment-541821885
 
 
   Hi Chamikara,
   
   Just forwarded you some details. It's close to 9pm now here in
   Saint Petersburg; I’ll get back to it tomorrow.
   
   Max.
   
   
   
   
   Mon, 14 Oct 2019, 20:30 Chamikara Jayalath :
   
   > Any updates here ?
   >
   > Thanks.
   >
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328016)
Time Spent: 41h  (was: 40h 50m)

> Create a Cloud Bigtable IO connector for Python
> ---
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>Priority: Major
>  Time Spent: 41h
>  Remaining Estimate: 0h
>
> I would like to create a Cloud Bigtable python connector.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8394) Support for Multiple DataSources in JdbcIO

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8394?focusedWorklogId=328014=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328014
 ]

ASF GitHub Bot logged work on BEAM-8394:


Author: ASF GitHub Bot
Created on: 14/Oct/19 17:48
Start Date: 14/Oct/19 17:48
Worklog Time Spent: 10m 
  Work Description: rahul8383 commented on pull request #9787: [BEAM-8394]: 
Support for Multiple DataSources in JdbcIO
URL: https://github.com/apache/beam/pull/9787
 
 
   Made changes in JdbcIO.java to support the following:
   When a Pipeline consists of multiple databases, the user should be able to 
provide a DataSourceConfiguration for each database, and 
DataSourceProviderFromDataSourceConfiguration should return the respective 
DataSource for each database when the apply() method is called.
   
   Also added a withDataSourceConfiguration() method to the JdbcIO.ReadRows class.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   

[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328008=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328008
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 14/Oct/19 17:46
Start Date: 14/Oct/19 17:46
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #9764: [BEAM-8365] [WIP] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#issuecomment-541820040
 
 
   Run JavaBeamZetaSQL PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328008)
Time Spent: 3h 20m  (was: 3h 10m)

> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following method:
> {code:java}
> public PCollection<Row> buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
> which should return a `PCollection` containing only the fields specified in the 
> `fieldNames` list.
>  * Create a rule to push fields used by a Calc (in projects and in a 
> condition) down into the TestTable IO.
>  * Update that same Calc (from the previous step) to have proper input and 
> output schemas, removing unused fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=328010=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-328010
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 14/Oct/19 17:46
Start Date: 14/Oct/19 17:46
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #9764: [BEAM-8365] [WIP] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#issuecomment-541820040
 
 
   Run JavaBeamZetaSQL PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 328010)
Time Spent: 3.5h  (was: 3h 20m)

> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following method:
> {code:java}
> public PCollection<Row> buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
> which should return a `PCollection` containing only the fields specified in the 
> `fieldNames` list.
>  * Create a rule to push fields used by a Calc (in projects and in a 
> condition) down into the TestTable IO.
>  * Update that same Calc (from the previous step) to have proper input and 
> output schemas, removing unused fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8394) Support for Multiple DataSources in JdbcIO

2019-10-14 Thread Rahul Patwari (Jira)
Rahul Patwari created BEAM-8394:
---

 Summary: Support for Multiple DataSources in JdbcIO
 Key: BEAM-8394
 URL: https://issues.apache.org/jira/browse/BEAM-8394
 Project: Beam
  Issue Type: Task
  Components: io-java-jdbc
Reporter: Rahul Patwari
Assignee: Rahul Patwari


When a Pipeline consists of multiple databases, the user should be able to provide 
a DataSourceConfiguration for each database, and 
DataSourceProviderFromDataSourceConfiguration should return the respective 
DataSource for each database when the apply() method is called.

 

Also add a withDataSourceConfiguration() method to the JdbcIO.ReadRows class.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-2535) Allow explicit output time independent of firing specification for all timers

2019-10-14 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951180#comment-16951180
 ] 

Kenneth Knowles commented on BEAM-2535:
---

[~rohdesam]

> Allow explicit output time independent of firing specification for all timers
> -
>
> Key: BEAM-2535
> URL: https://issues.apache.org/jira/browse/BEAM-2535
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Today, we have insufficient control over the event time timestamp of elements 
> output from a timer callback.
> 1. For an event time timer, it is the timestamp of the timer itself.
>  2. For a processing time timer, it is the current input watermark at the 
> time of processing.
> But for both of these, we may want to reserve the right to output at a 
> particular time, aka set a "watermark hold".
> A naive implementation of a {{TimerWithWatermarkHold}} would work for making 
> sure output is not droppable, but does not fully explain window expiration 
> and late data/timer dropping.
> In the natural interpretation of a timer as a feedback loop on a transform, 
> timers should be viewed as another channel of input, with a watermark, and 
> items on that channel _all need event time timestamps even if they are 
> delivered according to a different time domain_.
> I propose that the specification for when a timer should fire should be 
> separated (with nice defaults) from the specification of the event time of 
> resulting outputs. These timestamps will determine a side channel with a new 
> "timer watermark" that constrains the output watermark.
>  - We still need to fire event time timers according to the input watermark, 
> so that event time timers fire.
>  - Late data dropping and window expiration will be in terms of the minimum 
> of the input watermark and the timer watermark. In this way, whenever a timer 
> is set, the window is not going to be garbage collected.
>  - We will need to make sure we have a way to "wake up" a window once it is 
> expired; this may be as simple as exhausting the timer channel as soon as the 
> input watermark indicates expiration of a window
> This is mostly aimed at end-user timers in a stateful+timely {{DoFn}}. It 
> seems reasonable to use timers as an implementation detail (e.g. in 
> runners-core utilities) without wanting any of this additional machinery. For 
> example, if there is no possibility of output from the timer callback.
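
To make the output-timestamp question concrete, here is a minimal Python sketch of a stateful DoFn with an event-time timer (illustrative only; BufferDoFn and its 10-second flush delay are made up for the example):

{code:python}
import apache_beam as beam
from apache_beam.coders import VarIntCoder
from apache_beam.transforms.timeutil import TimeDomain
from apache_beam.transforms.userstate import BagStateSpec, TimerSpec, on_timer


class BufferDoFn(beam.DoFn):
  """Buffers values per key and flushes them from an event-time timer."""

  BUFFER = BagStateSpec('buffer', VarIntCoder())
  FLUSH = TimerSpec('flush', TimeDomain.WATERMARK)

  def process(self,
              element,
              timestamp=beam.DoFn.TimestampParam,
              buffer=beam.DoFn.StateParam(BUFFER),
              flush=beam.DoFn.TimerParam(FLUSH)):
    _, value = element
    buffer.add(value)
    # Fire once the watermark passes the element's timestamp plus 10 seconds.
    flush.set(timestamp + 10)

  @on_timer(FLUSH)
  def on_flush(self, buffer=beam.DoFn.StateParam(BUFFER)):
    # The event-time timestamp attached to this output is exactly what the
    # proposal above wants to let the user control (and hold the watermark on)
    # independently of when the timer fires.
    yield list(buffer.read())
    buffer.clear()
{code}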



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable IO connector for Python

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=327994=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-327994
 ]

ASF GitHub Bot logged work on BEAM-3342:


Author: ASF GitHub Bot
Created on: 14/Oct/19 17:30
Start Date: 14/Oct/19 17:30
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #8457: [BEAM-3342] 
Create a Cloud Bigtable IO connector for Python
URL: https://github.com/apache/beam/pull/8457#issuecomment-541811674
 
 
   Any updates here ?
   
   Thanks.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 327994)
Time Spent: 40h 50m  (was: 40h 40m)

> Create a Cloud Bigtable IO connector for Python
> ---
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>Priority: Major
>  Time Spent: 40h 50m
>  Remaining Estimate: 0h
>
> I would like to create a Cloud Bigtable python connector.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=327988=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-327988
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 14/Oct/19 17:28
Start Date: 14/Oct/19 17:28
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #9764: [BEAM-8365] [WIP] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#issuecomment-541810496
 
 
   Run JavaBeamZetaSQL PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 327988)
Time Spent: 3h 10m  (was: 3h)

> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following method:
> {code:java}
> public PCollection<Row> buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
> which should return a `PCollection` containing only the fields specified in the 
> `fieldNames` list.
>  * Create a rule to push fields used by a Calc (in projects and in a 
> condition) down into the TestTable IO.
>  * Update that same Calc (from the previous step) to have proper input and 
> output schemas, removing unused fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=327987=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-327987
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 14/Oct/19 17:27
Start Date: 14/Oct/19 17:27
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #9764: [BEAM-8365] [WIP] 
Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#issuecomment-541810496
 
 
   Run JavaBeamZetaSQL PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 327987)
Time Spent: 3h  (was: 2h 50m)

> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following method:
> {code:java}
> public PCollection<Row> buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
> which should return a `PCollection` containing only the fields specified in the 
> `fieldNames` list.
>  * Create a rule to push fields used by a Calc (in projects and in a 
> condition) down into the TestTable IO.
>  * Update that same Calc (from the previous step) to have proper input and 
> output schemas, removing unused fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-3713) Consider moving away from nose to nose2 or pytest.

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3713?focusedWorklogId=327986=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-327986
 ]

ASF GitHub Bot logged work on BEAM-3713:


Author: ASF GitHub Bot
Created on: 14/Oct/19 17:22
Start Date: 14/Oct/19 17:22
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #9756: [BEAM-3713] Add pytest 
for unit tests
URL: https://github.com/apache/beam/pull/9756#issuecomment-541807541
 
 
   @chadrik PTAL
   CC/R: @markflyhigh, @chamikaramj 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 327986)
Time Spent: 8.5h  (was: 8h 20m)

> Consider moving away from nose to nose2 or pytest.
> --
>
> Key: BEAM-3713
> URL: https://issues.apache.org/jira/browse/BEAM-3713
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: Robert Bradshaw
>Assignee: Udi Meiri
>Priority: Minor
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Per 
> [https://nose.readthedocs.io/en/latest/], nose is in maintenance mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=327981=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-327981
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 14/Oct/19 17:08
Start Date: 14/Oct/19 17:08
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #9743: [BEAM-8365] 
Project push-down for test table provider
URL: https://github.com/apache/beam/pull/9743
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 327981)
Time Spent: 2h 50m  (was: 2h 40m)

> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following method:
> {code:java}
> public PCollection<Row> buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
> which should return a `PCollection` containing only the fields specified in the 
> `fieldNames` list.
>  * Create a rule to push fields used by a Calc (in projects and in a 
> condition) down into the TestTable IO.
>  * Update that same Calc (from the previous step) to have proper input and 
> output schemas, removing unused fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=327980=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-327980
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 14/Oct/19 17:07
Start Date: 14/Oct/19 17:07
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9764: [BEAM-8365] 
[WIP] Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r334568107
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProvider.java
 ##
 @@ -150,12 +165,40 @@ public BeamTableStatistics 
getTableStatistics(PipelineOptions options) {
   return begin.apply(Create.of(tableWithRows.rows).withCoder(rowCoder()));
 }
 
+@Override
+public PCollection<Row> buildIOReader(
+PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames) {
+  PCollection<Row> withAllFields = buildIOReader(begin);
+  if (options == PushDownOptions.NONE
+  || (fieldNames.isEmpty() && filters instanceof DefaultTableFilter)) {
+return withAllFields;
+  }
 
 Review comment:
   Some portion of this `if` statement can be moved to `BeamIOSourceRel`, more 
specifically into the `Transform#expand` method.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 327980)
Time Spent: 2h 40m  (was: 2.5h)

> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following method:
> {code:java}
> public PCollection<Row> buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
> It should return a `PCollection` with only the fields specified in the 
> `fieldNames` list.
>  * Create a rule to push fields used by a Calc (in projects and in a 
> condition) down into TestTable IO.
>  * Update that same Calc (from the previous step) to have proper input and 
> output schemas, removing unused fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8365) Add project push-down capability to IO APIs

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8365?focusedWorklogId=327963=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-327963
 ]

ASF GitHub Bot logged work on BEAM-8365:


Author: ASF GitHub Bot
Created on: 14/Oct/19 16:40
Start Date: 14/Oct/19 16:40
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9764: [BEAM-8365] 
[WIP] Project push-down for TestTableProvider
URL: https://github.com/apache/beam/pull/9764#discussion_r334568107
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTableProvider.java
 ##
 @@ -150,12 +165,40 @@ public BeamTableStatistics 
getTableStatistics(PipelineOptions options) {
   return begin.apply(Create.of(tableWithRows.rows).withCoder(rowCoder()));
 }
 
+@Override
+public PCollection<Row> buildIOReader(
+PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames) {
+  PCollection<Row> withAllFields = buildIOReader(begin);
+  if (options == PushDownOptions.NONE
+  || (fieldNames.isEmpty() && filters instanceof DefaultTableFilter)) {
+return withAllFields;
+  }
 
 Review comment:
   Some portion of this `if` statement can be moved to `BeamIOSourceRel`, more 
specifically into the `Transform#expand` method.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 327963)
Time Spent: 2.5h  (was: 2h 20m)

> Add project push-down capability to IO APIs
> ---
>
> Key: BEAM-8365
> URL: https://issues.apache.org/jira/browse/BEAM-8365
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> * InMemoryTable should implement the following method:
> {code:java}
> public PCollection<Row> buildIOReader(
> PBegin begin, BeamSqlTableFilter filters, List<String> fieldNames);{code}
> It should return a `PCollection` with only the fields specified in the 
> `fieldNames` list.
>  * Create a rule to push fields used by a Calc (in projects and in a 
> condition) down into TestTable IO.
>  * Update that same Calc (from the previous step) to have proper input and 
> output schemas, removing unused fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8387) Remove sdk-worker-parallelism option from JobServerDriver

2019-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8387?focusedWorklogId=327946=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-327946
 ]

ASF GitHub Bot logged work on BEAM-8387:


Author: ASF GitHub Bot
Created on: 14/Oct/19 16:22
Start Date: 14/Oct/19 16:22
Worklog Time Spent: 10m 
  Work Description: tweise commented on pull request #9785: [BEAM-8387] 
Remove sdk-worker-parallelism option from JobServerDriver
URL: https://github.com/apache/beam/pull/9785
 
 
   This option was added when it wasn't possible to specify the worker count as a 
pipeline option, which is no longer the case. The pipeline option has a special 
value of 0, which means that the runner should pick a suitable value. 
Unfortunately, this was overridden in FlinkJobInvoker with 1 (because that is the 
default in the JobServerDriver config). There is no need to keep this option 
in the job server.
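
   A hedged sketch of requesting that behavior through pipeline options instead 
of the job server flag, assuming the option is exposed on PortablePipelineOptions 
as in the Java SDK:

{code:java}
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.PortablePipelineOptions;

// Sketch: ask for "let the runner decide" SDK worker parallelism via pipeline options.
PortablePipelineOptions options =
    PipelineOptionsFactory.create().as(PortablePipelineOptions.class);
options.setSdkWorkerParallelism(0); // 0 = runner picks a suitable value
{code}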
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   