[jira] [Work logged] (BEAM-8406) TextTable support JSON format

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8406?focusedWorklogId=349639&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349639
 ]

ASF GitHub Bot logged work on BEAM-8406:


Author: ASF GitHub Bot
Created on: 26/Nov/19 09:00
Start Date: 26/Nov/19 09:00
Worklog Time Spent: 10m 
  Work Description: milantracy commented on pull request #10217: 
[BEAM-8406] Add support for JSON format text tables
URL: https://github.com/apache/beam/pull/10217
 
 
   
   Reading and writing text tables in JSON.
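   For reviewers, a rough illustration of what reading a JSON text table boils down to: newline-delimited JSON records converted into Beam `Row`s with the table schema. This is only a hand-written sketch using the generic `JsonToRow` transform, not the table provider code in this PR; the schema and path are made up.

```java
// Sketch only: newline-delimited JSON read as Rows, roughly what a JSON TextTable
// exposes to Beam SQL. The schema and file path are illustrative assumptions.
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.JsonToRow;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

public class JsonTextTableSketch {
  public static void main(String[] args) {
    // Hypothetical table schema: one {"name": ..., "age": ...} object per line.
    Schema schema = Schema.builder().addStringField("name").addInt32Field("age").build();

    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    PCollection<String> lines = p.apply(TextIO.read().from("/tmp/users/*.json"));
    PCollection<Row> rows = lines.apply(JsonToRow.withSchema(schema));
    // "rows" now carries the table schema and can be queried like any schema'd PCollection.
    p.run().waitUntilFinish();
  }
}
```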
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   

[jira] [Work logged] (BEAM-8406) TextTable support JSON format

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8406?focusedWorklogId=349641&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349641
 ]

ASF GitHub Bot logged work on BEAM-8406:


Author: ASF GitHub Bot
Created on: 26/Nov/19 09:03
Start Date: 26/Nov/19 09:03
Worklog Time Spent: 10m 
  Work Description: milantracy commented on issue #10217: [BEAM-8406] Add 
support for JSON format text tables
URL: https://github.com/apache/beam/pull/10217#issuecomment-558531923
 
 
   R: @amaliujia @kennknowles 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349641)
Time Spent: 20m  (was: 10m)

> TextTable support JSON format
> -
>
> Key: BEAM-8406
> URL: https://issues.apache.org/jira/browse/BEAM-8406
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Jing Chen
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Have a JSON table implementation similar to [1].
> [1]: 
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/text/TextTable.java



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8568) Local file system does not match relative path with wildcards

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8568?focusedWorklogId=349660&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349660
 ]

ASF GitHub Bot logged work on BEAM-8568:


Author: ASF GitHub Bot
Created on: 26/Nov/19 09:34
Start Date: 26/Nov/19 09:34
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #10028: [BEAM-8568] Fixed 
problem that LocalFileSystem no longer supports wil…
URL: https://github.com/apache/beam/pull/10028#issuecomment-558544666
 
 
   This is passing now. Thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349660)
Time Spent: 4h 50m  (was: 4h 40m)

> Local file system does not match relative path with wildcards
> -
>
> Key: BEAM-8568
> URL: https://issues.apache.org/jira/browse/BEAM-8568
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.16.0
>Reporter: Ondrej Cerny
>Assignee: David Moravek
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> CWD structure:
> {code}
> src/test/resources/input/sometestfile.txt
> {code}
>  
> Code:
> {code:java}
> input 
> .apply(Create.of("src/test/resources/input/*")) 
> .apply(FileIO.matchAll()) 
> .apply(FileIO.readMatches())
> {code}
> The code above doesn't match any file starting with Beam 2.16.0. The regression 
> was introduced in BEAM-7854.
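
For anyone verifying the fix locally, the glob handling can also be checked outside a full pipeline by calling the filesystem matcher directly. A minimal sketch; the path mirrors the report above and the snippet is not part of the fix:

{code:java}
import java.io.IOException;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.fs.MatchResult;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class LocalGlobCheck {
  public static void main(String[] args) throws IOException {
    // Register the standard file systems (normally done when a pipeline is created).
    FileSystems.setDefaultPipelineOptions(PipelineOptionsFactory.create());

    // Relative pattern with a wildcard, as in the report above.
    MatchResult result = FileSystems.match("src/test/resources/input/*");
    System.out.println("Matched " + result.metadata().size() + " file(s)");
  }
}
{code}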



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8568) Local file system does not match relative path with wildcards

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8568?focusedWorklogId=349661&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349661
 ]

ASF GitHub Bot logged work on BEAM-8568:


Author: ASF GitHub Bot
Created on: 26/Nov/19 09:35
Start Date: 26/Nov/19 09:35
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #10028: [BEAM-8568] Fixed 
problem that LocalFileSystem no longer supports wil…
URL: https://github.com/apache/beam/pull/10028
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349661)
Time Spent: 5h  (was: 4h 50m)

> Local file system does not match relative path with wildcards
> -
>
> Key: BEAM-8568
> URL: https://issues.apache.org/jira/browse/BEAM-8568
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.16.0
>Reporter: Ondrej Cerny
>Assignee: David Moravek
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> CWD structure:
> {code}
> src/test/resources/input/sometestfile.txt
> {code}
>  
> Code:
> {code:java}
> input 
> .apply(Create.of("src/test/resources/input/*")) 
> .apply(FileIO.matchAll()) 
> .apply(FileIO.readMatches())
> {code}
> The code above doesn't match any file starting with Beam 2.16.0. The regression 
> was introduced in BEAM-7854.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8568) Local file system does not match relative path with wildcards

2019-11-26 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels resolved BEAM-8568.
--
Resolution: Fixed

> Local file system does not match relative path with wildcards
> -
>
> Key: BEAM-8568
> URL: https://issues.apache.org/jira/browse/BEAM-8568
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.16.0
>Reporter: Ondrej Cerny
>Assignee: David Moravek
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> CWD structure:
> {code}
> src/test/resources/input/sometestfile.txt
> {code}
>  
> Code:
> {code:java}
> input 
> .apply(Create.of("src/test/resources/input/*")) 
> .apply(FileIO.matchAll()) 
> .apply(FileIO.readMatches())
> {code}
> The code above doesn't match any file starting with Beam 2.16.0. The regression 
> was introduced in BEAM-7854.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8512) Add integration tests for Python "flink_runner.py"

2019-11-26 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16982315#comment-16982315
 ] 

Maximilian Michels commented on BEAM-8512:
--

You need to add {{flink-runtime-web}} to unlock all features of the REST API. 
In "flink_runner.py":

{noformat}
// Note: We would want a custom Gradle configuration here to not include 
runtime-web in the final jar
runtimeOnly "org.apache.flink:flink-runtime-web_2.11:$flink_version"
{noformat}

> Add integration tests for Python "flink_runner.py"
> --
>
> Key: BEAM-8512
> URL: https://issues.apache.org/jira/browse/BEAM-8512
> Project: Beam
>  Issue Type: Test
>  Components: runner-flink, sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> There are currently no integration tests for the Python FlinkRunner. We need 
> a set of tests similar to {{flink_runner_test.py}} which currently use the 
> PortableRunner and not the FlinkRunner.
> CC [~robertwb] [~ibzib] [~thw]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8512) Add integration tests for Python "flink_runner.py"

2019-11-26 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16982315#comment-16982315
 ] 

Maximilian Michels edited comment on BEAM-8512 at 11/26/19 10:10 AM:
-

You need to add {{flink-runtime-web}} to unlock all features of the REST API. 
In "flink_runner.gradle":

{noformat}
// Note: We would want a custom Gradle configuration here to not include 
runtime-web in the final jar
runtimeOnly "org.apache.flink:flink-runtime-web_2.11:$flink_version"
{noformat}


was (Author: mxm):
You need to add {{flink-runtime-web}} to unlock all features of the Rest API. 
In "flink_runner.py":

{noformat}
// Note: We would want a custom Gradle configuration here to not include 
runtime-web in the final jar
runtimeOnly "org.apache.flink:flink-runtime-web_2.11:$flink_version"
{noformat}

> Add integration tests for Python "flink_runner.py"
> --
>
> Key: BEAM-8512
> URL: https://issues.apache.org/jira/browse/BEAM-8512
> Project: Beam
>  Issue Type: Test
>  Components: runner-flink, sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Kyle Weaver
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> There are currently no integration tests for the Python FlinkRunner. We need 
> a set of tests similar to {{flink_runner_test.py}} which currently use the 
> PortableRunner and not the FlinkRunner.
> CC [~robertwb] [~ibzib] [~thw]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8819) AvroCoder for SpecificRecords is not serialized correctly since 2.13.0

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8819?focusedWorklogId=349678&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349678
 ]

ASF GitHub Bot logged work on BEAM-8819:


Author: ASF GitHub Bot
Created on: 26/Nov/19 10:14
Start Date: 26/Nov/19 10:14
Worklog Time Spent: 10m 
  Work Description: piter75 commented on pull request #10218: [BEAM-8819] 
Fix AvroCoder serialisation by introduction of AvroGenericCoder specialisation
URL: https://github.com/apache/beam/pull/10218
 
 
   Introduction of AvroCoderRegistrar and AvroCoderTranslator in #8342 broke 
AvroCoder behaviour for Avro specific records - they are now always decoded as 
GenericRecord.
   
   With this change, a new entity is created - AvroGenericCoder, with the 
corresponding *Registrar and *Translator, to be used for GenericRecord 
coding/decoding.
   
   Post-Commit Tests Status (on master branch)
   

   

[jira] [Work logged] (BEAM-8511) Support for enhanced fan-out in KinesisIO.Read

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8511?focusedWorklogId=349680&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349680
 ]

ASF GitHub Bot logged work on BEAM-8511:


Author: ASF GitHub Bot
Created on: 26/Nov/19 10:14
Start Date: 26/Nov/19 10:14
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #9899: [BEAM-8511] 
[WIP] KinesisIO.Read enhanced fanout
URL: https://github.com/apache/beam/pull/9899#issuecomment-558560527
 
 
   @cmachgodaddy Which test file?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349680)
Time Spent: 3h  (was: 2h 50m)

> Support for enhanced fan-out in KinesisIO.Read
> --
>
> Key: BEAM-8511
> URL: https://issues.apache.org/jira/browse/BEAM-8511
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kinesis
>Reporter: Jonothan Farr
>Assignee: Jonothan Farr
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Add support for reading from an enhanced fan-out consumer using KinesisIO.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8819) AvroCoder for SpecificRecords is not serialized correctly since 2.13.0

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8819?focusedWorklogId=349679&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349679
 ]

ASF GitHub Bot logged work on BEAM-8819:


Author: ASF GitHub Bot
Created on: 26/Nov/19 10:14
Start Date: 26/Nov/19 10:14
Worklog Time Spent: 10m 
  Work Description: piter75 commented on issue #10218: [BEAM-8819] Fix 
AvroCoder serialisation by introduction of AvroGenericCoder specialisation
URL: https://github.com/apache/beam/pull/10218#issuecomment-558560462
 
 
   R: mxm
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349679)
Time Spent: 20m  (was: 10m)

> AvroCoder for SpecificRecords is not serialized correctly since 2.13.0
> --
>
> Key: BEAM-8819
> URL: https://issues.apache.org/jira/browse/BEAM-8819
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Affects Versions: 2.13.0, 2.14.0, 2.15.0, 2.16.0
>Reporter: Piotr Szczepanik
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> While trying to upgrade our pipelines from Beam 2.11.0 to Beam 2.16.0 we 
> found that our SpecificRecords used in PCollection were being decoded as 
> GenericRecords.
> After some investigation we found the specific commit/issue that we think 
> broke it:
> [https://github.com/apache/beam/pull/8342/files]
> https://issues.apache.org/jira/browse/BEAM-7103
> After the mentioned change, all AvroCoders are serialized as the simple URN 
> "beam:coder:avro:v1", which means they are deserialized/rehydrated as 
> AvroCoder.
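
The distinction at play is between the specific-record coder and the generic-record coder for the same schema. A small hand-written check is sketched below; MyRecord and its "name" field stand in for any Avro-generated SpecificRecord, so the snippet is illustrative only:

{code:java}
import org.apache.avro.generic.GenericRecord;
import org.apache.beam.sdk.coders.AvroCoder;
import org.apache.beam.sdk.util.CoderUtils;

public class AvroCoderCheck {
  public static void main(String[] args) throws Exception {
    // Coder for the generated SpecificRecord class...
    AvroCoder<MyRecord> specificCoder = AvroCoder.of(MyRecord.class);
    // ...versus the coder for GenericRecord with the same schema.
    AvroCoder<GenericRecord> genericCoder = AvroCoder.of(MyRecord.getClassSchema());

    MyRecord in = MyRecord.newBuilder().setName("test").build();
    MyRecord out = CoderUtils.decodeFromByteArray(
        specificCoder, CoderUtils.encodeToByteArray(specificCoder, in));
    System.out.println(out.getClass()); // MyRecord while the specific coder is in effect

    // The regression described above: after portable translation the coder behaves
    // like genericCoder, so downstream code sees GenericRecord instead of MyRecord.
  }
}
{code}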



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8819) AvroCoder for SpecificRecords is not serialized correctly since 2.13.0

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8819?focusedWorklogId=349681&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349681
 ]

ASF GitHub Bot logged work on BEAM-8819:


Author: ASF GitHub Bot
Created on: 26/Nov/19 10:14
Start Date: 26/Nov/19 10:14
Worklog Time Spent: 10m 
  Work Description: piter75 commented on issue #10218: [BEAM-8819] Fix 
AvroCoder serialisation by introduction of AvroGenericCoder specialisation
URL: https://github.com/apache/beam/pull/10218#issuecomment-558560462
 
 
   R: mxm
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349681)
Time Spent: 0.5h  (was: 20m)

> AvroCoder for SpecificRecords is not serialized correctly since 2.13.0
> --
>
> Key: BEAM-8819
> URL: https://issues.apache.org/jira/browse/BEAM-8819
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Affects Versions: 2.13.0, 2.14.0, 2.15.0, 2.16.0
>Reporter: Piotr Szczepanik
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> While trying to upgrade our pipelines from Beam 2.11.0 to Beam 2.16.0 we 
> found that our SpecificRecords used in PCollection were being decoded as 
> GenericRecords.
> After some investigation we found the specific commit/issue that we think 
> broke it:
> [https://github.com/apache/beam/pull/8342/files]
> https://issues.apache.org/jira/browse/BEAM-7103
> After the mentioned change, all AvroCoders are serialized as the simple URN 
> "beam:coder:avro:v1", which means they are deserialized/rehydrated as 
> AvroCoder.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8819) AvroCoder for SpecificRecords is not serialized correctly since 2.13.0

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8819?focusedWorklogId=349684&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349684
 ]

ASF GitHub Bot logged work on BEAM-8819:


Author: ASF GitHub Bot
Created on: 26/Nov/19 10:15
Start Date: 26/Nov/19 10:15
Worklog Time Spent: 10m 
  Work Description: piter75 commented on issue #10218: [BEAM-8819] Fix 
AvroCoder serialisation by introduction of AvroGenericCoder specialisation
URL: https://github.com/apache/beam/pull/10218#issuecomment-558560822
 
 
   R: @chamikaramj 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349684)
Time Spent: 50m  (was: 40m)

> AvroCoder for SpecificRecords is not serialized correctly since 2.13.0
> --
>
> Key: BEAM-8819
> URL: https://issues.apache.org/jira/browse/BEAM-8819
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Affects Versions: 2.13.0, 2.14.0, 2.15.0, 2.16.0
>Reporter: Piotr Szczepanik
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> While trying to upgrade our pipelines from Beam 2.11.0 to Beam 2.16.0 we 
> found that our SpecificRecords used in PCollection were being decoded as 
> GenericRecords.
> After some investigation we found the specific commit/issue that we think 
> broke it:
> [https://github.com/apache/beam/pull/8342/files]
> https://issues.apache.org/jira/browse/BEAM-7103
> After the mentioned change, all AvroCoders are serialized as the simple URN 
> "beam:coder:avro:v1", which means they are deserialized/rehydrated as 
> AvroCoder.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8819) AvroCoder for SpecificRecords is not serialized correctly since 2.13.0

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8819?focusedWorklogId=349682&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349682
 ]

ASF GitHub Bot logged work on BEAM-8819:


Author: ASF GitHub Bot
Created on: 26/Nov/19 10:15
Start Date: 26/Nov/19 10:15
Worklog Time Spent: 10m 
  Work Description: piter75 commented on issue #10218: [BEAM-8819] Fix 
AvroCoder serialisation by introduction of AvroGenericCoder specialisation
URL: https://github.com/apache/beam/pull/10218#issuecomment-558560753
 
 
   R: @mxm
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349682)
Time Spent: 40m  (was: 0.5h)

> AvroCoder for SpecificRecords is not serialized correctly since 2.13.0
> --
>
> Key: BEAM-8819
> URL: https://issues.apache.org/jira/browse/BEAM-8819
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Affects Versions: 2.13.0, 2.14.0, 2.15.0, 2.16.0
>Reporter: Piotr Szczepanik
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> While trying to upgrade our pipelines from Beam 2.11.0 to Beam 2.16.0 we 
> found that our SpecificRecords used in PCollection were being decoded as 
> GenericRecords.
> After some investigation we found the specific commit/issue that we think 
> broke it:
> [https://github.com/apache/beam/pull/8342/files]
> https://issues.apache.org/jira/browse/BEAM-7103
> After the mentioned change, all AvroCoders are serialized as the simple URN 
> "beam:coder:avro:v1", which means they are deserialized/rehydrated as 
> AvroCoder.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8825) OOM when writing large numbers of 'narrow' rows

2019-11-26 Thread Niel Markwick (Jira)
Niel Markwick created BEAM-8825:
---

 Summary: OOM when writing large numbers of 'narrow' rows
 Key: BEAM-8825
 URL: https://issues.apache.org/jira/browse/BEAM-8825
 Project: Beam
  Issue Type: Bug
  Components: io-java-gcp
Affects Versions: 2.16.0, 2.15.0, 2.14.0, 2.13.0, 2.12.0, 2.11.0, 2.10.0, 
2.9.0
Reporter: Niel Markwick
 Fix For: 2.17.0


SpannerIO can OOM when writing large numbers of 'narrow' rows. 

 

SpannerIO puts input mutation elements into batches for efficient writing.
These batches are limited by the number of cells mutated and the size of data written 
(5000 cells, 1 MB of data). SpannerIO groups enough mutations to build 1000 of 
these batches (5M cells, 1 GB of data), then sorts and batches them.

When the number of cells and the size of data per mutation are very small (<5 cells, 
<100 bytes), the memory overhead of storing millions of mutations for batching is 
significant and can lead to OOMs.
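
Until the batching memory use is addressed, one workaround is to lower the batching limits on the write transform so fewer mutations are buffered at a time. A rough sketch follows; the setter names are quoted from memory of SpannerIO.Write and the IDs and values are only examples, so verify against the SDK version in use:

{code:java}
import com.google.cloud.spanner.Mutation;
import org.apache.beam.sdk.io.gcp.spanner.SpannerIO;
import org.apache.beam.sdk.values.PCollection;

public class NarrowRowWrite {
  static void write(PCollection<Mutation> mutations) {
    mutations.apply(
        SpannerIO.write()
            .withProjectId("my-project")      // illustrative IDs
            .withInstanceId("my-instance")
            .withDatabaseId("my-database")
            // Smaller batches mean fewer mutations buffered per bundle.
            .withBatchSizeBytes(64 * 1024)    // default is about 1 MB
            .withMaxNumMutations(1000));      // default is about 5000 cells
  }
}
{code}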



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8470?focusedWorklogId=349712&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349712
 ]

ASF GitHub Bot logged work on BEAM-8470:


Author: ASF GitHub Bot
Created on: 26/Nov/19 11:03
Start Date: 26/Nov/19 11:03
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10211: [BEAM-8470] 
move enableSparkMetricSinks option to common spark pipeline options
URL: https://github.com/apache/beam/pull/10211#issuecomment-558578952
 
 
   Run Java Spark PortableValidatesRunner Batch
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349712)
Time Spent: 12h 40m  (was: 12.5h)

> Create a new Spark runner based on Spark Structured streaming framework
> ---
>
> Key: BEAM-8470
> URL: https://issues.apache.org/jira/browse/BEAM-8470
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 12h 40m
>  Remaining Estimate: 0h
>
> h1. Why is it worth creating a new runner based on structured streaming:
> Because this new framework brings:
>  * Unified batch and streaming semantics:
>  * no more RDD/DStream distinction, as in Beam (only PCollection)
>  * Better state management:
>  * incremental state instead of saving all each time
>  * No more synchronous saving delaying computation: per batch and partition 
> delta file saved asynchronously + in-memory hashmap synchronous put/get
>  * Schemas in datasets:
>  * The dataset knows the structure of the data (fields) and can optimize 
> later on
>  * Schemas in PCollection in Beam
>  * New Source API
>  * Very close to Beam bounded source and unbounded sources
> h1. Why make a new runner from scratch?
>  * Structured streaming framework is very different from the RDD/DStream 
> framework
> h1. We hope to gain
>  * More up-to-date runner in terms of libraries: leverage new features
>  * Leverage practices learnt from the previous runners
>  * Better performance thanks to the DAG optimizer (Catalyst) and by 
> simplifying the code.
>  * Simplify the code and ease the maintenance
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8470?focusedWorklogId=349711&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349711
 ]

ASF GitHub Bot logged work on BEAM-8470:


Author: ASF GitHub Bot
Created on: 26/Nov/19 11:03
Start Date: 26/Nov/19 11:03
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10211: [BEAM-8470] 
move enableSparkMetricSinks option to common spark pipeline options
URL: https://github.com/apache/beam/pull/10211#issuecomment-558578842
 
 
   Run Spark ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349711)
Time Spent: 12.5h  (was: 12h 20m)

> Create a new Spark runner based on Spark Structured streaming framework
> ---
>
> Key: BEAM-8470
> URL: https://issues.apache.org/jira/browse/BEAM-8470
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 12.5h
>  Remaining Estimate: 0h
>
> h1. Why is it worth creating a new runner based on structured streaming:
> Because this new framework brings:
>  * Unified batch and streaming semantics:
>  * no more RDD/DStream distinction, as in Beam (only PCollection)
>  * Better state management:
>  * incremental state instead of saving all each time
>  * No more synchronous saving delaying computation: per batch and partition 
> delta file saved asynchronously + in-memory hashmap synchronous put/get
>  * Schemas in datasets:
>  * The dataset knows the structure of the data (fields) and can optimize 
> later on
>  * Schemas in PCollection in Beam
>  * New Source API
>  * Very close to Beam bounded source and unbounded sources
> h1. Why make a new runner from scratch?
>  * Structured streaming framework is very different from the RDD/DStream 
> framework
> h1. We hope to gain
>  * More up-to-date runner in terms of libraries: leverage new features
>  * Leverage practices learnt from the previous runners
>  * Better performance thanks to the DAG optimizer (Catalyst) and by 
> simplifying the code.
>  * Simplify the code and ease the maintenance
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8470?focusedWorklogId=349713&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349713
 ]

ASF GitHub Bot logged work on BEAM-8470:


Author: ASF GitHub Bot
Created on: 26/Nov/19 11:03
Start Date: 26/Nov/19 11:03
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10211: [BEAM-8470] 
move enableSparkMetricSinks option to common spark pipeline options
URL: https://github.com/apache/beam/pull/10211#issuecomment-558579078
 
 
   Run Spark StructuredStreaming ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349713)
Time Spent: 12h 50m  (was: 12h 40m)

> Create a new Spark runner based on Spark Structured streaming framework
> ---
>
> Key: BEAM-8470
> URL: https://issues.apache.org/jira/browse/BEAM-8470
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 12h 50m
>  Remaining Estimate: 0h
>
> h1. Why is it worth creating a new runner based on structured streaming:
> Because this new framework brings:
>  * Unified batch and streaming semantics:
>  * no more RDD/DStream distinction, as in Beam (only PCollection)
>  * Better state management:
>  * incremental state instead of saving all each time
>  * No more synchronous saving delaying computation: per batch and partition 
> delta file saved asynchronously + in-memory hashmap synchronous put/get
>  * Schemas in datasets:
>  * The dataset knows the structure of the data (fields) and can optimize 
> later on
>  * Schemas in PCollection in Beam
>  * New Source API
>  * Very close to Beam bounded source and unbounded sources
> h1. Why make a new runner from scratch?
>  * Structured streaming framework is very different from the RDD/DStream 
> framework
> h1. We hope to gain
>  * More up-to-date runner in terms of libraries: leverage new features
>  * Leverage practices learnt from the previous runners
>  * Better performance thanks to the DAG optimizer (Catalyst) and by 
> simplifying the code.
>  * Simplify the code and ease the maintenance
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8815) Portable pipeline execution without artifact staging

2019-11-26 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-8815:
-
Status: Open  (was: Triage Needed)

> Portable pipeline execution without artifact staging
> 
>
> Key: BEAM-8815
> URL: https://issues.apache.org/jira/browse/BEAM-8815
> Project: Beam
>  Issue Type: Task
>  Components: runner-core, runner-flink
>Affects Versions: 2.17.0
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Major
>  Labels: portability
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The default artifact staging implementation relies on a distributed 
> filesystem. A directory and manifest will be created even when artifact 
> staging isn't used, and the container boot code will fail retrieving 
> artifacts, even though there are none. In a containerized environment it is 
> common to package artifacts into containers. It should be possible to run the 
> pipeline without a distributed filesystem. 
> [https://lists.apache.org/thread.html/1b0d545955a80688ea19f227ad943683747b17beb45368ad0908fd21@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8656) flink_master_url usage in flink_runner.py

2019-11-26 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels reassigned BEAM-8656:


Assignee: Maximilian Michels  (was: Kyle Weaver)

> flink_master_url usage in flink_runner.py
> -
>
> Key: BEAM-8656
> URL: https://issues.apache.org/jira/browse/BEAM-8656
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Maximilian Michels
>Priority: Major
>
> flink_master_url was replaced with flink_master in flink_runner.py without 
> preserving backward compatibility, but it remains documented on the website. 
> We will either have to update the website (making it clear that the 
> instructions are for 2.17+) or else make sure that the flink_master_url gets 
> aliased to flink_master before the 2.17 release is finalized. If anyone's 
> already using flink_runner.py, this will also break their existing pipeline.
> [https://github.com/apache/beam/pull/9946]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8656) flink_master_url usage in flink_runner.py

2019-11-26 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16982381#comment-16982381
 ] 

Maximilian Michels commented on BEAM-8656:
--

Updating the flink runner page. We should be good then.

> flink_master_url usage in flink_runner.py
> -
>
> Key: BEAM-8656
> URL: https://issues.apache.org/jira/browse/BEAM-8656
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> flink_master_url was replaced with flink_master in flink_runner.py without 
> preserving backward compatibility, but it remains documented on the website. 
> We will either have to update the website (making it clear that the 
> instructions are for 2.17+) or else make sure that the flink_master_url gets 
> aliased to flink_master before the 2.17 release is finalized. If anyone's 
> already using flink_runner.py, this will also break their existing pipeline.
> [https://github.com/apache/beam/pull/9946]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8656) flink_master_url usage in flink_runner.py

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8656?focusedWorklogId=349720&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349720
 ]

ASF GitHub Bot logged work on BEAM-8656:


Author: ASF GitHub Bot
Created on: 26/Nov/19 11:28
Start Date: 26/Nov/19 11:28
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #10220: [BEAM-8656] 
Update documentation for flink_master parameter
URL: https://github.com/apache/beam/pull/10220
 
 
   These were the only occurrences.
   
   Post-Commit Tests Status (on master branch)
   

   

[jira] [Updated] (BEAM-8819) AvroCoder for SpecificRecords is not serialized correctly since 2.13.0

2019-11-26 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-8819:
-
Status: Open  (was: Triage Needed)

> AvroCoder for SpecificRecords is not serialized correctly since 2.13.0
> --
>
> Key: BEAM-8819
> URL: https://issues.apache.org/jira/browse/BEAM-8819
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Affects Versions: 2.13.0, 2.14.0, 2.15.0, 2.16.0
>Reporter: Piotr Szczepanik
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> While trying to upgrade our pipelines from Beam 2.11.0 to Beam 2.16.0 we 
> found that our SpecificRecords used in PCollection were being decoded as 
> GenericRecords.
> After some investigation we found the specific commit/issue that we think 
> broke it:
> [https://github.com/apache/beam/pull/8342/files]
> https://issues.apache.org/jira/browse/BEAM-7103
> After the mentioned change, all AvroCoders are serialized as the simple URN 
> "beam:coder:avro:v1", which means they are deserialized/rehydrated as 
> AvroCoder.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8819) AvroCoder for SpecificRecords is not serialized correctly since 2.13.0

2019-11-26 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels reassigned BEAM-8819:


Assignee: Piotr Szczepanik

> AvroCoder for SpecificRecords is not serialized correctly since 2.13.0
> --
>
> Key: BEAM-8819
> URL: https://issues.apache.org/jira/browse/BEAM-8819
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Affects Versions: 2.13.0, 2.14.0, 2.15.0, 2.16.0
>Reporter: Piotr Szczepanik
>Assignee: Piotr Szczepanik
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> While trying to upgrade our pipelines from Beam 2.11.0 to Beam 2.16.0 we 
> found that our SpecificRecords used in PCollection were being decoded as 
> GenericRecords.
> After some investigation we found the specific commit/issue that we think 
> broke it:
> [https://github.com/apache/beam/pull/8342/files]
> https://issues.apache.org/jira/browse/BEAM-7103
> After the mentioned change, all AvroCoders are serialized as the simple URN 
> "beam:coder:avro:v1", which means they are deserialized/rehydrated as 
> AvroCoder.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7732) Allow access to SpannerOptions in Beam

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7732?focusedWorklogId=349731&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349731
 ]

ASF GitHub Bot logged work on BEAM-7732:


Author: ASF GitHub Bot
Created on: 26/Nov/19 12:08
Start Date: 26/Nov/19 12:08
Worklog Time Spent: 10m 
  Work Description: nielm commented on issue #9048: [BEAM-7732] Enable 
setting custom SpannerOptions.
URL: https://github.com/apache/beam/pull/9048#issuecomment-558600383
 
 
   I'm going to rework this so that only the Commit API deadline is exposed.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349731)
Time Spent: 3h 40m  (was: 3.5h)

> Allow access to SpannerOptions in Beam
> --
>
> Key: BEAM-7732
> URL: https://issues.apache.org/jira/browse/BEAM-7732
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.12.0, 2.13.0
>Reporter: Niel Markwick
>Priority: Minor
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Beam hides the 
> [SpannerOptions|https://github.com/googleapis/google-cloud-java/blob/master/google-cloud-clients/google-cloud-spanner/src/main/java/com/google/cloud/spanner/SpannerOptions.java]
>  object behind a 
> [SpannerConfig|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerConfig.java]
>  object because the SpannerOptions object is not serializable. 
> This means that the only options that can be set are those that can be 
> specified in SpannerConfig - limited to host, project, instance, database.
> Suggestion: add the possibility to set a SpannerOptionsFactory in 
> SpannerConfig:
> {code:java}
> public interface SpannerOptionsFactory extends Serializable {
>    public SpannerOptions create();
> }
> {code}
> This would allow the user to use this factory class to specify custom 
> SpannerOptions before they are passed on to the connectToSpanner() method; 
> connectToSpanner() would then become: 
> {code:java}
> public SpannerAccessor connectToSpanner() {
>   SpannerOptions.Builder builder = spannerOptionsFactory.create().toBuilder();
>   // ... rest of connectToSpanner follows, setting project, host, etc.
> }
> {code}
>  
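
A sketch of how a user might implement the factory proposed above. Note that SpannerOptionsFactory is only the interface suggested in this issue, not an existing Beam API, and setNumChannels is merely one example of an option that SpannerConfig cannot set today:

{code:java}
import com.google.cloud.spanner.SpannerOptions;

// SpannerOptionsFactory is the Serializable interface proposed in the description above.
public class CustomSpannerOptionsFactory implements SpannerOptionsFactory {
  @Override
  public SpannerOptions create() {
    // Set options that SpannerConfig cannot express today, e.g. the channel count.
    return SpannerOptions.newBuilder()
        .setNumChannels(8)
        .build();
  }
}
{code}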



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8819) AvroCoder for SpecificRecords is not serialized correctly since 2.13.0

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8819?focusedWorklogId=349733&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349733
 ]

ASF GitHub Bot logged work on BEAM-8819:


Author: ASF GitHub Bot
Created on: 26/Nov/19 12:12
Start Date: 26/Nov/19 12:12
Worklog Time Spent: 10m 
  Work Description: piter75 commented on issue #10218: [BEAM-8819] Fix 
AvroCoder serialisation by introduction of AvroGenericCoder specialisation
URL: https://github.com/apache/beam/pull/10218#issuecomment-558601771
 
 
   Thanks @mxm!
   It's probably too late to get that into 2.17.0?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349733)
Time Spent: 1h  (was: 50m)

> AvroCoder for SpecificRecords is not serialized correctly since 2.13.0
> --
>
> Key: BEAM-8819
> URL: https://issues.apache.org/jira/browse/BEAM-8819
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Affects Versions: 2.13.0, 2.14.0, 2.15.0, 2.16.0
>Reporter: Piotr Szczepanik
>Assignee: Piotr Szczepanik
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> While trying to upgrade our pipelines from Beam 2.11.0 to Beam 2.16.0 we 
> found that our SpecificRecords used in PCollection were being decoded as 
> GenericRecords.
> After some investigation we found the specific commit/issue that we think 
> broke it:
> [https://github.com/apache/beam/pull/8342/files]
> https://issues.apache.org/jira/browse/BEAM-7103
> After the mentioned change, all AvroCoders are serialized as the simple URN 
> "beam:coder:avro:v1", which means they are deserialized/rehydrated as 
> AvroCoder.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8535) TextIO.read doesn't work with single wildcard with relative path

2019-11-26 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-8535:
-
Status: Open  (was: Triage Needed)

> TextIO.read doesn't work with single wildcard with relative path
> 
>
> Key: BEAM-8535
> URL: https://issues.apache.org/jira/browse/BEAM-8535
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model, io-java-files
>Affects Versions: 2.16.0
> Environment: Mac High Sierra 10.13.6.   DirectRunner local. 
>Reporter: Tim
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Critical
> Fix For: 2.17.0
>
>
> It looks like the TextIO.read transform is not matching files when using a 
> glob wildcard when the glob starts with a * and the path is relative, i.e. 
> /full/path/* and ./path/f* work but ./path/* does not.
> Reproduction steps using the word count example from the Beam Quick start for 
> current version 2.16 ([https://beam.apache.org/get-started/quickstart-java/]) 
> - 
> {code:java}
> $ mkdir test-folder && cp pom.xml ./test-folder
> $ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
> >      -Dexec.args="--inputFile=./test-folder/* --output=counts" 
> >-Pdirect-runner
> {code}
> The above fails to find the pom.xml file even though it is expected to. I tested the 
> same way with 2.15 and it works as expected.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8535) TextIO.read doesn't work with single wildcard with relative path

2019-11-26 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16982420#comment-16982420
 ] 

Maximilian Michels commented on BEAM-8535:
--

Is this a duplicate of BEAM-8568?

> TextIO.read doesn't work with single wildcard with relative path
> 
>
> Key: BEAM-8535
> URL: https://issues.apache.org/jira/browse/BEAM-8535
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model, io-java-files
>Affects Versions: 2.16.0
> Environment: Mac High Sierra 10.13.6.   DirectRunner local. 
>Reporter: Tim
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Critical
> Fix For: 2.17.0
>
>
> It looks like the TextIO.read transform is not matching files when using a 
> glob wildcard when the glob starts with a * and the path is relative, i.e. 
> /full/path/* and ./path/f* work but ./path/* does not.
> Reproduction steps using the word count example from the Beam Quick start for 
> current version 2.16 ([https://beam.apache.org/get-started/quickstart-java/]) 
> - 
> {code:java}
> $ mkdir test-folder && cp pom.xml ./test-folder
> $ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
> >      -Dexec.args="--inputFile=./test-folder/* --output=counts" 
> >-Pdirect-runner
> {code}
> The above fails to find the pom.xml file even though it is expected to. I tested the 
> same way with 2.15 and it works as expected.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-8535) TextIO.read doesn't work with single wildcard with relative path

2019-11-26 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels closed BEAM-8535.

Resolution: Duplicate

Closing but feel free to re-open if it isn't.

> TextIO.read doesn't work with single wildcard with relative path
> 
>
> Key: BEAM-8535
> URL: https://issues.apache.org/jira/browse/BEAM-8535
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model, io-java-files
>Affects Versions: 2.16.0
> Environment: Mac High Sierra 10.13.6.   DirectRunner local. 
>Reporter: Tim
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Critical
> Fix For: 2.17.0
>
>
> It looks like the TextIO.read transform is not matching files when a glob 
> wildcard starts with a * and the path is relative, i.e. 
> /full/path/* and ./path/f* work but ./path/* does not.
> Reproduction steps using the word count example from the Beam Quick start for 
> current version 2.16 ([https://beam.apache.org/get-started/quickstart-java/]) 
> - 
> {code:java}
> $ mkdir test-folder && cp pom.xml ./test-folder
> $ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
> >      -Dexec.args="--inputFile=./test-folder/* --output=counts" 
> >-Pdirect-runner
> {code}
>  The above fails when it is expected to find the pom.xml file. I tested the 
> same way with 2.15 and it works as expected.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8819) AvroCoder for SpecificRecords is not serialized correctly since 2.13.0

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8819?focusedWorklogId=349738&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349738
 ]

ASF GitHub Bot logged work on BEAM-8819:


Author: ASF GitHub Bot
Created on: 26/Nov/19 12:18
Start Date: 26/Nov/19 12:18
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #10218: [BEAM-8819] Fix 
AvroCoder serialisation by introduction of AvroGenericCoder specialisation
URL: https://github.com/apache/beam/pull/10218#issuecomment-558603913
 
 
   Maybe. We can try to get this into 2.17.0.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349738)
Time Spent: 1h 20m  (was: 1h 10m)

> AvroCoder for SpecificRecords is not serialized correctly since 2.13.0
> --
>
> Key: BEAM-8819
> URL: https://issues.apache.org/jira/browse/BEAM-8819
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Affects Versions: 2.13.0, 2.14.0, 2.15.0, 2.16.0
>Reporter: Piotr Szczepanik
>Assignee: Piotr Szczepanik
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> While trying to upgrade our pipelines from Beam 2.11.0 to Beam 2.16.0 we 
> found that our SpecificRecords used in PCollection were being decoded as 
> GenericRecords.
> After some investigation we found the specific commit/issue that we think did 
> break it:
> [https://github.com/apache/beam/pull/8342/files]
> https://issues.apache.org/jira/browse/BEAM-7103
> After the mentioned change all AvroCoders are serialized as simple urn: 
> "beam:coder:avro:v1" which means they are deserialized / rehydrated as 
> AvroCoder.
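
A hedged sketch of the symptom (the helper and its parameter are illustrative 
and not from the report; any Avro-generated SpecificRecord class would do):

{code:java}
import org.apache.avro.specific.SpecificRecordBase;
import org.apache.beam.sdk.coders.AvroCoder;

final class SpecificCoderExample {
  // "recordClass" would be an Avro-generated SpecificRecord class. With the
  // regression, this coder round-trips through the plain "beam:coder:avro:v1"
  // urn and is rehydrated generically, so elements decode as GenericRecord
  // instead of the SpecificRecord type.
  static <T extends SpecificRecordBase> AvroCoder<T> coderFor(Class<T> recordClass) {
    return AvroCoder.of(recordClass);
  }

  private SpecificCoderExample() {}
}
{code}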



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8819) AvroCoder for SpecificRecords is not serialized correctly since 2.13.0

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8819?focusedWorklogId=349737&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349737
 ]

ASF GitHub Bot logged work on BEAM-8819:


Author: ASF GitHub Bot
Created on: 26/Nov/19 12:18
Start Date: 26/Nov/19 12:18
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #10218: [BEAM-8819] Fix 
AvroCoder serialisation by introduction of AvroGenericCoder specialisation
URL: https://github.com/apache/beam/pull/10218#issuecomment-558603715
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349737)
Time Spent: 1h 10m  (was: 1h)

> AvroCoder for SpecificRecords is not serialized correctly since 2.13.0
> --
>
> Key: BEAM-8819
> URL: https://issues.apache.org/jira/browse/BEAM-8819
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Affects Versions: 2.13.0, 2.14.0, 2.15.0, 2.16.0
>Reporter: Piotr Szczepanik
>Assignee: Piotr Szczepanik
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> While trying to upgrade our pipelines from Beam 2.11.0 to Beam 2.16.0 we 
> found that our SpecificRecords used in PCollection were being decoded as 
> GenericRecords.
> After some investigation we found the specific commit/issue that we think did 
> break it:
> [https://github.com/apache/beam/pull/8342/files]
> https://issues.apache.org/jira/browse/BEAM-7103
> After the mentioned change all AvroCoders are serialized as simple urn: 
> "beam:coder:avro:v1" which means they are deserialized / rehydrated as 
> AvroCoder.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8819) AvroCoder for SpecificRecords is not serialized correctly since 2.13.0

2019-11-26 Thread Maximilian Michels (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels updated BEAM-8819:
-
Fix Version/s: 2.17.0

> AvroCoder for SpecificRecords is not serialized correctly since 2.13.0
> --
>
> Key: BEAM-8819
> URL: https://issues.apache.org/jira/browse/BEAM-8819
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Affects Versions: 2.13.0, 2.14.0, 2.15.0, 2.16.0
>Reporter: Piotr Szczepanik
>Assignee: Piotr Szczepanik
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> While trying to upgrade our pipelines from Beam 2.11.0 to Beam 2.16.0 we 
> found that our SpecificRecords used in PCollection were being decoded as 
> GenericRecords.
> After some investigation we found the specific commit/issue that we think did 
> break it:
> [https://github.com/apache/beam/pull/8342/files]
> https://issues.apache.org/jira/browse/BEAM-7103
> After the mentioned change all AvroCoders are serialized as simple urn: 
> "beam:coder:avro:v1" which means they are deserialized / rehydrated as 
> AvroCoder.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8819) AvroCoder for SpecificRecords is not serialized correctly since 2.13.0

2019-11-26 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16982424#comment-16982424
 ] 

Maximilian Michels commented on BEAM-8819:
--

Marking Fix Version 2.17.0 because this looks important enough to fix for the 
release. Pull request is ready.

> AvroCoder for SpecificRecords is not serialized correctly since 2.13.0
> --
>
> Key: BEAM-8819
> URL: https://issues.apache.org/jira/browse/BEAM-8819
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Affects Versions: 2.13.0, 2.14.0, 2.15.0, 2.16.0
>Reporter: Piotr Szczepanik
>Assignee: Piotr Szczepanik
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> While trying to upgrade our pipelines from Beam 2.11.0 to Beam 2.16.0 we 
> found that our SpecificRecords used in PCollection were being decoded as 
> GenericRecords.
> After some investigation we found the specific commit/issue that we think did 
> break it:
> [https://github.com/apache/beam/pull/8342/files]
> https://issues.apache.org/jira/browse/BEAM-7103
> After the mentioned change all AvroCoders are serialized as simple urn: 
> "beam:coder:avro:v1" which means they are deserialized / rehydrated as 
> AvroCoder.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8819) AvroCoder for SpecificRecords is not serialized correctly since 2.13.0

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8819?focusedWorklogId=349740&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349740
 ]

ASF GitHub Bot logged work on BEAM-8819:


Author: ASF GitHub Bot
Created on: 26/Nov/19 12:25
Start Date: 26/Nov/19 12:25
Worklog Time Spent: 10m 
  Work Description: piter75 commented on issue #10218: [BEAM-8819] Fix 
AvroCoder serialisation by introduction of AvroGenericCoder specialisation
URL: https://github.com/apache/beam/pull/10218#issuecomment-558606179
 
 
   That would be great.
   I have also seen Portable_Python PreCommit failing but did not pursue it as 
it also fails in master if I understand correctly.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349740)
Time Spent: 1.5h  (was: 1h 20m)

> AvroCoder for SpecificRecords is not serialized correctly since 2.13.0
> --
>
> Key: BEAM-8819
> URL: https://issues.apache.org/jira/browse/BEAM-8819
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Affects Versions: 2.13.0, 2.14.0, 2.15.0, 2.16.0
>Reporter: Piotr Szczepanik
>Assignee: Piotr Szczepanik
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> While trying to upgrade our pipelines from Beam 2.11.0 to Beam 2.16.0 we 
> found that our SpecificRecords used in PCollection were being decoded as 
> GenericRecords.
> After some investigation we found the specific commit/issue that we think did 
> break it:
> [https://github.com/apache/beam/pull/8342/files]
> https://issues.apache.org/jira/browse/BEAM-7103
> After the mentioned change all AvroCoders are serialized as simple urn: 
> "beam:coder:avro:v1" which means they are deserialized / rehydrated as 
> AvroCoder.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8218) Implement Apache PulsarIO

2019-11-26 Thread Sven Hornberg (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16982437#comment-16982437
 ] 

Sven Hornberg commented on BEAM-8218:
-

I created an issue on [Apache Pulsar 
GitHub|https://github.com/apache/pulsar/issues/5447] looking forward to this 
IO idea. We are considering a switch from Kafka to Pulsar, but would like to 
use Beam. 

> Implement Apache PulsarIO
> -
>
> Key: BEAM-8218
> URL: https://issues.apache.org/jira/browse/BEAM-8218
> Project: Beam
>  Issue Type: Task
>  Components: io-ideas
>Reporter: Alex Van Boxel
>Assignee: Taher Koitawala
>Priority: Minor
>
> Apache Pulsar is starting to gain popularity. Having a native Beam PulsarIO 
> could be beneficial.
> [https://pulsar.apache.org/|https://pulsar.apache.org/en/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8826) Investigate possibility of system metrics usage in portable performance tests

2019-11-26 Thread Lukasz Gajowy (Jira)
Lukasz Gajowy created BEAM-8826:
---

 Summary: Investigate possibility of system metrics usage in 
portable performance tests
 Key: BEAM-8826
 URL: https://issues.apache.org/jira/browse/BEAM-8826
 Project: Beam
  Issue Type: Task
  Components: testing
Reporter: Lukasz Gajowy


We currently use 
[TimeMonitor.java|https://github.com/apache/beam/blob/master/sdks/java/testing/test-utils/src/main/java/org/apache/beam/sdk/testutils/metrics/TimeMonitor.java]
 and 
[MeasureTime.py|https://github.com/apache/beam/blob/1988284a89b10b60eea48325f8a3b370b551c77c/sdks/python/apache_beam/testing/load_tests/load_test_metrics_utils.py#L406]
 DoFns to collect runtime in both portable and non-portable performance tests. 
However, in portable tests it seems to be possible to use 
[TOTAL_TIME_MSECS|https://github.com/apache/beam/blob/1988284a89b10b60eea48325f8a3b370b551c77c/model/pipeline/src/main/proto/metrics.proto#L130] 
for collecting execution time. Other system metrics are available as well 
(size, bundle size, etc.).

This seems like a good way to simplify things and get more useful metrics from 
portable jobs, so it is worth investigating ways of using them in performance 
tests.
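
For context, a minimal sketch of the DoFn-based approach in use today (the 
metric namespace and name are illustrative, not the actual TimeMonitor code):

{code:java}
import org.apache.beam.sdk.metrics.Distribution;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.transforms.DoFn;

// Records a wall-clock timestamp per element into a distribution metric; the
// min/max of that distribution approximate the run time of the step.
public class RuntimeMonitor<T> extends DoFn<T, T> {
  private final Distribution times = Metrics.distribution("perf", "runtime_ms");

  @ProcessElement
  public void processElement(ProcessContext c) {
    times.update(System.currentTimeMillis());
    c.output(c.element());
  }
}
{code}

Using a runner-reported system metric such as TOTAL_TIME_MSECS would remove the 
need to inject such a DoFn into the tested pipeline.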



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8470?focusedWorklogId=349750&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349750
 ]

ASF GitHub Bot logged work on BEAM-8470:


Author: ASF GitHub Bot
Created on: 26/Nov/19 12:57
Start Date: 26/Nov/19 12:57
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #10211: 
[BEAM-8470] move enableSparkMetricSinks option to common spark pipeline options
URL: https://github.com/apache/beam/pull/10211
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349750)
Time Spent: 13h  (was: 12h 50m)

> Create a new Spark runner based on Spark Structured streaming framework
> ---
>
> Key: BEAM-8470
> URL: https://issues.apache.org/jira/browse/BEAM-8470
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 13h
>  Remaining Estimate: 0h
>
> h1. Why is it worth creating a new runner based on structured streaming:
> Because this new framework brings:
>  * Unified batch and streaming semantics:
>  * no more RDD/DStream distinction, as in Beam (only PCollection)
>  * Better state management:
>  * incremental state instead of saving all each time
>  * No more synchronous saving delaying computation: per batch and partition 
> delta file saved asynchronously + in-memory hashmap synchronous put/get
>  * Schemas in datasets:
>  * The dataset knows the structure of the data (fields) and can optimize 
> later on
>  * Schemas in PCollection in Beam
>  * New Source API
>  * Very close to Beam bounded source and unbounded sources
> h1. Why make a new runner from scratch?
>  * Structured streaming framework is very different from the RDD/Dstream 
> framework
> h1. We hope to gain
>  * More up to date runner in terms of libraries: leverage new features
>  * Leverage learnt practices from the previous runners
>  * Better performance thanks to the DAG optimizer (catalyst) and by 
> simplifying the code.
>  * Simplify the code and ease the maintenance
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-5192) Support Elasticsearch 7.x

2019-11-26 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot closed BEAM-5192.
--
Fix Version/s: Not applicable
   Resolution: Duplicate

> Support Elasticsearch 7.x
> -
>
> Key: BEAM-5192
> URL: https://issues.apache.org/jira/browse/BEAM-5192
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Etienne Chauchot
>Assignee: Chet Aldrich
>Priority: Major
> Fix For: Not applicable
>
>
> ES v7 is not out yet, but the Elastic team has scheduled a breaking change for 
> ES 7.0: the removal of the type feature. See 
> [https://www.elastic.co/blog/index-type-parent-child-join-now-future-in-elasticsearch]
> This will require a good amount of changes in the IO. 
> This ticket is there to track the future work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8338) Support ES 7.x for ElasticsearchIO

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8338?focusedWorklogId=349785&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349785
 ]

ASF GitHub Bot logged work on BEAM-8338:


Author: ASF GitHub Bot
Created on: 26/Nov/19 13:38
Start Date: 26/Nov/19 13:38
Worklog Time Spent: 10m 
  Work Description: echauchot commented on pull request #10025: [BEAM-8338] 
Support ES 7.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/10025#discussion_r350745236
 
 

 ##
 File path: 
sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java
 ##
 @@ -202,7 +202,7 @@ static void checkForErrors(HttpEntity responseEntity, int 
backendVersion, boolea
 } else {
   if (backendVersion == 2) {
 errorRootName = "create";
-  } else if (backendVersion == 5 || backendVersion == 6) {
+  } else if (backendVersion == 5 || backendVersion == 6 || 
backendVersion == 7) {
 
 Review comment:
   better to avoid running on non-tested versions (if we use >)
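
   A small sketch of that point (the helper name is illustrative, not the actual 
ElasticsearchIO code): an explicit whitelist of tested majors rejects anything 
newer, whereas an open-ended ">=" check would silently accept untested releases.

{code:java}
final class EsVersionCheck {
  // Illustrative only: list the majors the IO has been tested against rather
  // than using an open-ended comparison like backendVersion >= 5.
  static boolean isTestedBackendVersion(int backendVersion) {
    return backendVersion == 2
        || backendVersion == 5
        || backendVersion == 6
        || backendVersion == 7;
  }

  private EsVersionCheck() {}
}
{code}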
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349785)
Time Spent: 3h 20m  (was: 3h 10m)

> Support ES 7.x for ElasticsearchIO
> --
>
> Key: BEAM-8338
> URL: https://issues.apache.org/jira/browse/BEAM-8338
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Michal Brunát
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Elasticsearch has released 7.4, but ElasticsearchIO only supports 2.x, 5.x, and 6.x.
>  We should support ES 7.x for ElasticsearchIO.
>  [https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html]
>  
> [https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8338) Support ES 7.x for ElasticsearchIO

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8338?focusedWorklogId=349786&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349786
 ]

ASF GitHub Bot logged work on BEAM-8338:


Author: ASF GitHub Bot
Created on: 26/Nov/19 13:39
Start Date: 26/Nov/19 13:39
Worklog Time Spent: 10m 
  Work Description: echauchot commented on pull request #10025: [BEAM-8338] 
Support ES 7.x for ElasticsearchIO
URL: https://github.com/apache/beam/pull/10025#discussion_r350745236
 
 

 ##
 File path: 
sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java
 ##
 @@ -202,7 +202,7 @@ static void checkForErrors(HttpEntity responseEntity, int 
backendVersion, boolea
 } else {
   if (backendVersion == 2) {
 errorRootName = "create";
-  } else if (backendVersion == 5 || backendVersion == 6) {
+  } else if (backendVersion == 5 || backendVersion == 6 || 
backendVersion == 7) {
 
 Review comment:
   IMHO better to avoid running on non-tested versions (if we use >)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349786)
Time Spent: 3.5h  (was: 3h 20m)

> Support ES 7.x for ElasticsearchIO
> --
>
> Key: BEAM-8338
> URL: https://issues.apache.org/jira/browse/BEAM-8338
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Michal Brunát
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Elasticsearch has released 7.4, but ElasticsearchIO only supports 2.x, 5.x, and 6.x.
>  We should support ES 7.x for ElasticsearchIO.
>  [https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html]
>  
> [https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-4776) Java PortableRunner should support metrics

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4776?focusedWorklogId=349797&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349797
 ]

ASF GitHub Bot logged work on BEAM-4776:


Author: ASF GitHub Bot
Created on: 26/Nov/19 14:02
Start Date: 26/Nov/19 14:02
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #10105: [BEAM-4776] Add 
metrics support to Java PortableRunner
URL: https://github.com/apache/beam/pull/10105#issuecomment-558642246
 
 
   @mwalenia to answer your question, I'm indeed the correct person for 
MetricsPusher related questions.
   Regarding MetricsPusher, the problem goes beyond the test itself. The whole 
MetricsPusher feature reports for now only user metrics (that is why the test 
sink that is tailored for it only reads user metrics). But the aim since the 
beginning of the architectural design (pull vs push essentially) was to allow 
in the future to support system metrics. Here is the design I did at the time:  
https://s.apache.org/runner_independent_metrics_extraction.
   Long story short, the good thing to do IMHO is to enhance MetricsPusher to 
support system metrics as well and, of course, update the test/sink.
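
   For context, a hedged sketch of how a user metric is queried from a pipeline 
result today (the namespace and counter name are illustrative):

{code:java}
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.metrics.MetricNameFilter;
import org.apache.beam.sdk.metrics.MetricQueryResults;
import org.apache.beam.sdk.metrics.MetricsFilter;

final class UserMetricQuery {
  // Pull-style query for a single user counter; system metrics are not yet
  // surfaced through MetricsPusher, which is the gap discussed above.
  static MetricQueryResults queryCounter(PipelineResult result) {
    return result
        .metrics()
        .queryMetrics(
            MetricsFilter.builder()
                .addNameFilter(MetricNameFilter.named("my_namespace", "my_counter"))
                .build());
  }

  private UserMetricQuery() {}
}
{code}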
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349797)
Time Spent: 7.5h  (was: 7h 20m)

> Java PortableRunner should support metrics
> --
>
> Key: BEAM-4776
> URL: https://issues.apache.org/jira/browse/BEAM-4776
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Eugene Kirpichov
>Assignee: Michal Walenia
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> BEAM-4775 concerns adding metrics to the JobService API; the current issue is 
> about making PortableRunner understand them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-4776) Java PortableRunner should support metrics

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4776?focusedWorklogId=349799&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349799
 ]

ASF GitHub Bot logged work on BEAM-4776:


Author: ASF GitHub Bot
Created on: 26/Nov/19 14:04
Start Date: 26/Nov/19 14:04
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #10105: [BEAM-4776] Add 
metrics support to Java PortableRunner
URL: https://github.com/apache/beam/pull/10105#issuecomment-558642246
 
 
   @mwalenia to answer your question, I'm indeed the correct person for 
MetricsPusher related questions.
   Regarding MetricsPusher, the problem goes beyond the test itself. The whole 
MetricsPusher feature reports for now only user metrics (that is why the test 
sink that is tailored for it only reads user metrics). But the aim since the 
beginning of the architectural design (pull vs push essentially) was to allow 
in the future to support system metrics. Here is the design I did at the time:  
https://s.apache.org/runner_independent_metrics_extraction.
   Long story short, the good thing to do IMHO is to enhance MetricsPusher to 
support system metrics as well and, of course, update the test/sink.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349799)
Time Spent: 7h 40m  (was: 7.5h)

> Java PortableRunner should support metrics
> --
>
> Key: BEAM-4776
> URL: https://issues.apache.org/jira/browse/BEAM-4776
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Eugene Kirpichov
>Assignee: Michal Walenia
>Priority: Major
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> BEAM-4775 concerns adding metrics to the JobService API; the current issue is 
> about making PortableRunner understand them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-4776) Java PortableRunner should support metrics

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4776?focusedWorklogId=349802&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349802
 ]

ASF GitHub Bot logged work on BEAM-4776:


Author: ASF GitHub Bot
Created on: 26/Nov/19 14:07
Start Date: 26/Nov/19 14:07
Worklog Time Spent: 10m 
  Work Description: echauchot commented on pull request #10105: [BEAM-4776] 
Add metrics support to Java PortableRunner
URL: https://github.com/apache/beam/pull/10105#discussion_r350760878
 
 

 ##
 File path: 
runners/portability/java/src/main/java/org/apache/beam/runners/portability/PortableMetrics.java
 ##
 @@ -0,0 +1,170 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.portability;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+import org.apache.beam.model.jobmanagement.v1.JobApi;
+import org.apache.beam.model.pipeline.v1.MetricsApi;
+import org.apache.beam.sdk.metrics.DistributionResult;
+import org.apache.beam.sdk.metrics.GaugeResult;
+import org.apache.beam.sdk.metrics.MetricFiltering;
+import org.apache.beam.sdk.metrics.MetricKey;
+import org.apache.beam.sdk.metrics.MetricName;
+import org.apache.beam.sdk.metrics.MetricQueryResults;
+import org.apache.beam.sdk.metrics.MetricResult;
+import org.apache.beam.sdk.metrics.MetricResults;
+import org.apache.beam.sdk.metrics.MetricsFilter;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.ImmutableList;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.collect.Iterables;
+import org.joda.time.Instant;
+
+public class PortableMetrics extends MetricResults {
+
+  private static final List<String> COUNTER_METRIC_TYPES =
+  ImmutableList.of("beam:metrics:sum_int_64");
+  private static final List<String> DISTRIBUTION_METRIC_TYPES =
+  ImmutableList.of("beam:metrics:distribution_int_64");
+  private static final List<String> GAUGE_METRIC_TYPES =
+  ImmutableList.of("beam:metrics:latest_int_64");
+
+  private static final String NAMESPACE_LABEL = "NAMESPACE";
+  private static final String METRIC_NAME_LABEL = "NAME";
+  private static final String STEP_NAME_LABEL = "PTRANSFORM";
+  private Iterable<MetricResult<Long>> counters;
+  private Iterable<MetricResult<DistributionResult>> distributions;
+  private Iterable<MetricResult<GaugeResult>> gauges;
+
+  private PortableMetrics(
 
 Review comment:
   @mwalenia to answer your question, I'm indeed the correct person for 
MetricsPusher related questions.
   Regarding MetricsPusher, the problem goes beyond the test itself. The whole 
MetricsPusher feature reports for now only user metrics (that is why the test 
sink that is tailored for it only reads user metrics). But the aim since the 
beginning of the architectural design (pull vs push essentially) was to allow 
in the future to support system metrics. Here is the design I did at the time: 
https://s.apache.org/runner_independent_metrics_extraction.
   Long story short, the good thing to do IMHO is to enhance MetricsPusher to 
support system metrics as well and, of course, update the test/sink.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349802)
Time Spent: 7h 50m  (was: 7h 40m)

> Java PortableRunner should support metrics
> --
>
> Key: BEAM-4776
> URL: https://issues.apache.org/jira/browse/BEAM-4776
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Eugene Kirpichov
>Assignee: Michal Walenia
>Priority: Major
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> BEAM-4775 concerns adding metrics to the JobService API; the current issue is 
> about making PortableRunner understand them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Work logged] (BEAM-8511) Support for enhanced fan-out in KinesisIO.Read

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8511?focusedWorklogId=349804&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349804
 ]

ASF GitHub Bot logged work on BEAM-8511:


Author: ASF GitHub Bot
Created on: 26/Nov/19 14:11
Start Date: 26/Nov/19 14:11
Worklog Time Spent: 10m 
  Work Description: cmachgodaddy commented on issue #9899: [BEAM-8511] 
[WIP] KinesisIO.Read enhanced fanout
URL: https://github.com/apache/beam/pull/9899#issuecomment-558646170
 
 
   diff-test.txt attached in this PR
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349804)
Time Spent: 3h 10m  (was: 3h)

> Support for enhanced fan-out in KinesisIO.Read
> --
>
> Key: BEAM-8511
> URL: https://issues.apache.org/jira/browse/BEAM-8511
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-kinesis
>Reporter: Jonothan Farr
>Assignee: Jonothan Farr
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Add support for reading from an enhanced fan-out consumer using KinesisIO.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8470?focusedWorklogId=349806&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349806
 ]

ASF GitHub Bot logged work on BEAM-8470:


Author: ASF GitHub Bot
Created on: 26/Nov/19 14:22
Start Date: 26/Nov/19 14:22
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #10221: 
[BEAM-8470] Exclude failed ValidatesRunner tests
URL: https://github.com/apache/beam/pull/10221
 
 
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   

[jira] [Work logged] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8470?focusedWorklogId=349807&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349807
 ]

ASF GitHub Bot logged work on BEAM-8470:


Author: ASF GitHub Bot
Created on: 26/Nov/19 14:23
Start Date: 26/Nov/19 14:23
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10221: [BEAM-8470] 
Exclude failed ValidatesRunner tests
URL: https://github.com/apache/beam/pull/10221#issuecomment-558651553
 
 
   Run Spark StructuredStreaming ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349807)
Time Spent: 13h 20m  (was: 13h 10m)

> Create a new Spark runner based on Spark Structured streaming framework
> ---
>
> Key: BEAM-8470
> URL: https://issues.apache.org/jira/browse/BEAM-8470
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> h1. Why is it worth creating a new runner based on structured streaming:
> Because this new framework brings:
>  * Unified batch and streaming semantics:
>  * no more RDD/DStream distinction, as in Beam (only PCollection)
>  * Better state management:
>  * incremental state instead of saving all each time
>  * No more synchronous saving delaying computation: per batch and partition 
> delta file saved asynchronously + in-memory hashmap synchronous put/get
>  * Schemas in datasets:
>  * The dataset knows the structure of the data (fields) and can optimize 
> later on
>  * Schemas in PCollection in Beam
>  * New Source API
>  * Very close to Beam bounded source and unbounded sources
> h1. Why make a new runner from scratch?
>  * Structured streaming framework is very different from the RDD/Dstream 
> framework
> h1. We hope to gain
>  * More up to date runner in terms of libraries: leverage new features
>  * Leverage learnt practices from the previous runners
>  * Better performance thanks to the DAG optimizer (catalyst) and by 
> simplifying the code.
>  * Simplify the code and ease the maintenance
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8827) Support Spark 3 on Spark Structured Streaming Runner

2019-11-26 Thread Jira


 [ 
https://issues.apache.org/jira/browse/BEAM-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-8827:
---
Status: Open  (was: Triage Needed)

> Support Spark 3 on Spark Structured Streaming Runner 
> -
>
> Key: BEAM-8827
> URL: https://issues.apache.org/jira/browse/BEAM-8827
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-spark
>Reporter: Ismaël Mejía
>Priority: Minor
>
> The Spark Structured Streaming Runner currently uses non-stable Spark APIs for 
> its Source translation, and the DataSourceV2 API was changed in Spark 3. We 
> probably need a new class or a compatibility layer to support this for Spark 3.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8827) Support Spark 3 on Spark Structured Streaming Runner

2019-11-26 Thread Jira
Ismaël Mejía created BEAM-8827:
--

 Summary: Support Spark 3 on Spark Structured Streaming Runner 
 Key: BEAM-8827
 URL: https://issues.apache.org/jira/browse/BEAM-8827
 Project: Beam
  Issue Type: Sub-task
  Components: runner-spark
Reporter: Ismaël Mejía


The Spark Structured Streaming Runner currently uses non-stable Spark APIs for 
its Source translation, and the DataSourceV2 API was changed in Spark 3. We 
probably need a new class or a compatibility layer to support this for Spark 3.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8733) The "KeyError: u'-47'" error from line 305 of sdk_worker.py

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8733?focusedWorklogId=349816&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349816
 ]

ASF GitHub Bot logged work on BEAM-8733:


Author: ASF GitHub Bot
Created on: 26/Nov/19 14:43
Start Date: 26/Nov/19 14:43
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10208: [BEAM-8733] Handle 
the registration request synchronously in the Python&Java SDK harness
URL: https://github.com/apache/beam/pull/10208#issuecomment-558660878
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349816)
Time Spent: 3h  (was: 2h 50m)

> The "KeyError: u'-47'" error from line 305 of sdk_worker.py
> ---
>
> Key: BEAM-8733
> URL: https://issues.apache.org/jira/browse/BEAM-8733
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> The issue was reported by [~chamikara]; the error message is as follows:
> apache_beam/runners/worker/sdk_worker.py", line 305, in get
> self.fns[bundle_descriptor_id],
> KeyError: u'-47'
> {code}
> at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:330)
> at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:195)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:123)
> Suppressed: java.lang.IllegalStateException: Already closed.
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:93)
>   at 
> org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.abort(RemoteGrpcPortWriteOperation.java:220)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:91)
> {code}
> More discussion info can be found here: 
> https://github.com/apache/beam/pull/10004



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8733) The "KeyError: u'-47'" error from line 305 of sdk_worker.py

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8733?focusedWorklogId=349818&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349818
 ]

ASF GitHub Bot logged work on BEAM-8733:


Author: ASF GitHub Bot
Created on: 26/Nov/19 14:44
Start Date: 26/Nov/19 14:44
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10208: [BEAM-8733] Handle 
the registration request synchronously in the Python&Java SDK harness
URL: https://github.com/apache/beam/pull/10208#issuecomment-558661313
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349818)
Time Spent: 3h 20m  (was: 3h 10m)

> The "KeyError: u'-47'" error from line 305 of sdk_worker.py
> ---
>
> Key: BEAM-8733
> URL: https://issues.apache.org/jira/browse/BEAM-8733
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The issue was reported by [~chamikara]; the error message is as follows:
> apache_beam/runners/worker/sdk_worker.py", line 305, in get
> self.fns[bundle_descriptor_id],
> KeyError: u'-47'
> {code}
> at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:330)
> at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:195)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:123)
> Suppressed: java.lang.IllegalStateException: Already closed.
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:93)
>   at 
> org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.abort(RemoteGrpcPortWriteOperation.java:220)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:91)
> {code}
> More discussion info can be found here: 
> https://github.com/apache/beam/pull/10004



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8733) The "KeyError: u'-47'" error from line 305 of sdk_worker.py

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8733?focusedWorklogId=349817&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349817
 ]

ASF GitHub Bot logged work on BEAM-8733:


Author: ASF GitHub Bot
Created on: 26/Nov/19 14:44
Start Date: 26/Nov/19 14:44
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10208: [BEAM-8733] Handle 
the registration request synchronously in the Python&Java SDK harness
URL: https://github.com/apache/beam/pull/10208#issuecomment-558661015
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349817)
Time Spent: 3h 10m  (was: 3h)

> The "KeyError: u'-47'" error from line 305 of sdk_worker.py
> ---
>
> Key: BEAM-8733
> URL: https://issues.apache.org/jira/browse/BEAM-8733
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> The issue was reported by [~chamikara]; the error message is as follows:
> apache_beam/runners/worker/sdk_worker.py", line 305, in get
> self.fns[bundle_descriptor_id],
> KeyError: u'-47'
> {code}
> at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:330)
> at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:195)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:123)
> Suppressed: java.lang.IllegalStateException: Already closed.
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:93)
>   at 
> org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.abort(RemoteGrpcPortWriteOperation.java:220)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:91)
> {code}
> More discussion info can be found here: 
> https://github.com/apache/beam/pull/10004



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8825) OOM when writing large numbers of 'narrow' rows

2019-11-26 Thread Niel Markwick (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niel Markwick updated BEAM-8825:

Fix Version/s: (was: 2.17.0)
   2.18.0
Affects Version/s: 2.17.0

> OOM when writing large numbers of 'narrow' rows
> ---
>
> Key: BEAM-8825
> URL: https://issues.apache.org/jira/browse/BEAM-8825
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.9.0, 2.10.0, 2.11.0, 2.12.0, 2.13.0, 2.14.0, 2.15.0, 
> 2.16.0, 2.17.0
>Reporter: Niel Markwick
>Priority: Major
> Fix For: 2.18.0
>
>
> SpannerIO can OOM when writing large numbers of 'narrow' rows. 
>  
> SpannerIO puts input mutation elements into batches for efficient writing.
> These batches are limited by the number of cells mutated and the size of data 
> written (5000 cells, 1 MB of data). SpannerIO groups enough mutations to build 
> 1000 of these groups (5M cells, 1 GB of data), then sorts and batches them.
> When the number of cells and the size of data per mutation are very small 
> (<5 cells, <100 bytes), the memory overhead of storing millions of mutations 
> for batching is significant and can lead to OOMs.
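
A hedged sketch of the batch limits involved, assuming SpannerIO.Write exposes 
them via withBatchSizeBytes/withMaxNumMutations (instance and database IDs are 
placeholders):

{code:java}
import org.apache.beam.sdk.io.gcp.spanner.SpannerIO;

final class SpannerWriteConfig {
  // The per-batch limits described above: ~1 MB of data and ~5000 mutated cells.
  static SpannerIO.Write narrowRowWrite() {
    return SpannerIO.write()
        .withInstanceId("my-instance")
        .withDatabaseId("my-database")
        .withBatchSizeBytes(1024L * 1024L)
        .withMaxNumMutations(5000L);
  }

  private SpannerWriteConfig() {}
}
{code}

With very narrow rows, reaching the grouping threshold means holding millions of 
small mutation objects in memory before sorting, and their per-object overhead 
is what drives the OOM.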



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8470?focusedWorklogId=349829&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349829
 ]

ASF GitHub Bot logged work on BEAM-8470:


Author: ASF GitHub Bot
Created on: 26/Nov/19 15:18
Start Date: 26/Nov/19 15:18
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10221: [BEAM-8470] 
Exclude failed ValidatesRunner tests
URL: https://github.com/apache/beam/pull/10221#issuecomment-558677672
 
 
   Run Spark StructuredStreaming ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349829)
Time Spent: 13.5h  (was: 13h 20m)

> Create a new Spark runner based on Spark Structured streaming framework
> ---
>
> Key: BEAM-8470
> URL: https://issues.apache.org/jira/browse/BEAM-8470
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>
> h1. Why is it worth creating a new runner based on structured streaming:
> Because this new framework brings:
>  * Unified batch and streaming semantics:
>  * no more RDD/DStream distinction, as in Beam (only PCollection)
>  * Better state management:
>  * incremental state instead of saving all each time
>  * No more synchronous saving delaying computation: per batch and partition 
> delta file saved asynchronously + in-memory hashmap synchronous put/get
>  * Schemas in datasets:
>  * The dataset knows the structure of the data (fields) and can optimize 
> later on
>  * Schemas in PCollection in Beam
>  * New Source API
>  * Very close to Beam bounded source and unbounded sources
> h1. Why make a new runner from scratch?
>  * Structured streaming framework is very different from the RDD/Dstream 
> framework
> h1. We hope to gain
>  * More up to date runner in terms of libraries: leverage new features
>  * Leverage learnt practices from the previous runners
>  * Better performance thanks to the DAG optimizer (catalyst) and by 
> simplifying the code.
>  * Simplify the code and ease the maintenance
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8470?focusedWorklogId=349830&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349830
 ]

ASF GitHub Bot logged work on BEAM-8470:


Author: ASF GitHub Bot
Created on: 26/Nov/19 15:19
Start Date: 26/Nov/19 15:19
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10221: [BEAM-8470] 
Exclude failed ValidatesRunner tests
URL: https://github.com/apache/beam/pull/10221#issuecomment-558651553
 
 
   Run Spark StructuredStreaming ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349830)
Time Spent: 13h 40m  (was: 13.5h)

> Create a new Spark runner based on Spark Structured streaming framework
> ---
>
> Key: BEAM-8470
> URL: https://issues.apache.org/jira/browse/BEAM-8470
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 13h 40m
>  Remaining Estimate: 0h
>
> h1. Why is it worth creating a new runner based on structured streaming:
> Because this new framework brings:
>  * Unified batch and streaming semantics:
>  * no more RDD/DStream distinction, as in Beam (only PCollection)
>  * Better state management:
>  * incremental state instead of saving all each time
>  * No more synchronous saving delaying computation: per batch and partition 
> delta file saved asynchronously + in-memory hashmap synchronous put/get
>  * Schemas in datasets:
>  * The dataset knows the structure of the data (fields) and can optimize 
> later on
>  * Schemas in PCollection in Beam
>  * New Source API
>  * Very close to Beam bounded source and unbounded sources
> h1. Why make a new runner from scratch?
>  * Structured streaming framework is very different from the RDD/Dstream 
> framework
> h1. We hope to gain
>  * More up to date runner in terms of libraries: leverage new features
>  * Leverage learnt practices from the previous runners
>  * Better performance thanks to the DAG optimizer (catalyst) and by 
> simplifying the code.
>  * Simplify the code and ease the maintenance
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8470?focusedWorklogId=349846&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349846
 ]

ASF GitHub Bot logged work on BEAM-8470:


Author: ASF GitHub Bot
Created on: 26/Nov/19 15:37
Start Date: 26/Nov/19 15:37
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10221: [BEAM-8470] 
Exclude failed ValidatesRunner tests
URL: https://github.com/apache/beam/pull/10221#issuecomment-558686281
 
 
   @echauchot please, take a look
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349846)
Time Spent: 13h 50m  (was: 13h 40m)

> Create a new Spark runner based on Spark Structured streaming framework
> ---
>
> Key: BEAM-8470
> URL: https://issues.apache.org/jira/browse/BEAM-8470
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 13h 50m
>  Remaining Estimate: 0h
>
> h1. Why is it worth creating a new runner based on structured streaming?
> Because this new framework brings:
>  * Unified batch and streaming semantics: no more RDD/DStream distinction, 
> just as Beam has only PCollection
>  * Better state management: incremental state instead of saving everything 
> each time, and no more synchronous saving that delays computation (per-batch 
> and per-partition delta files are saved asynchronously, with synchronous 
> put/get on an in-memory hashmap)
>  * Schemas in datasets: the dataset knows the structure of the data (fields) 
> and can optimize later on, mirroring schemas in PCollections in Beam
>  * A new source API, very close to Beam bounded and unbounded sources
> h1. Why make a new runner from scratch?
>  * The structured streaming framework is very different from the RDD/DStream 
> framework
> h1. We hope to gain
>  * A more up-to-date runner in terms of libraries, able to leverage new features
>  * Lessons learned from the previous runners
>  * Better performance, thanks to the Catalyst DAG optimizer and to simpler code
>  * Simpler code that is easier to maintain
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8822) Hadoop Client version 2.8 from 2.7

2019-11-26 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981950#comment-16981950
 ] 

Tomo Suzuki edited comment on BEAM-8822 at 11/26/19 3:41 PM:
-

org.apache.beam.sdk.io.hadoop.format.HadoopFormatIOElasticTest

{noformat}
org/apache/commons/httpclient/URIException
java.lang.NoClassDefFoundError: org/apache/commons/httpclient/URIException
at org.elasticsearch.hadoop.util.Version.(Version.java:58)
at 
org.elasticsearch.hadoop.rest.RestService.findPartitions(RestService.java:214)
at 
org.elasticsearch.hadoop.mr.EsInputFormat.getSplits(EsInputFormat.java:405)
at 
org.elasticsearch.hadoop.mr.EsInputFormat.getSplits(EsInputFormat.java:386)
at 
org.apache.beam.sdk.io.hadoop.format.HadoopFormatIO$HadoopInputFormatBoundedSource.computeSplitsIfNecessary(HadoopFormatIO.java:678)
at 
org.apache.beam.sdk.io.hadoop.format.HadoopFormatIO$HadoopInputFormatBoundedSource.getEstimatedSizeBytes(HadoopFormatIO.java:661)
at 
org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$InputProvider.getInitialInputs(BoundedReadEvaluatorFactory.java:212)
at 
org.apache.beam.runners.direct.ReadEvaluatorFactory$InputProvider.getInitialInputs(ReadEvaluatorFactory.java:89)
at 
org.apache.beam.runners.direct.RootProviderRegistry.getInitialInputs(RootProviderRegistry.java:76)
at 
org.apache.beam.runners.direct.ExecutorServiceParallelExecutor.start(ExecutorServiceParallelExecutor.java:155)
at 
org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:208)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:67)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:315)
at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
at 
org.apache.beam.sdk.io.hadoop.format.HadoopFormatIOElasticTest.testHifIOWithElastic(HadoopFormatIOElasticTest.java:117)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 

[jira] [Work logged] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8470?focusedWorklogId=349849&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349849
 ]

ASF GitHub Bot logged work on BEAM-8470:


Author: ASF GitHub Bot
Created on: 26/Nov/19 15:42
Start Date: 26/Nov/19 15:42
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #10221: [BEAM-8470] 
Exclude failed ValidatesRunner tests
URL: https://github.com/apache/beam/pull/10221#issuecomment-558688401
 
 
   IMHO, I would prefer to keep it red until they all pass.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349849)
Time Spent: 14h  (was: 13h 50m)

> Create a new Spark runner based on Spark Structured streaming framework
> ---
>
> Key: BEAM-8470
> URL: https://issues.apache.org/jira/browse/BEAM-8470
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 14h
>  Remaining Estimate: 0h
>
> h1. Why is it worth creating a new runner based on structured streaming?
> Because this new framework brings:
>  * Unified batch and streaming semantics: no more RDD/DStream distinction, 
> just as Beam has only PCollection
>  * Better state management: incremental state instead of saving everything 
> each time, and no more synchronous saving that delays computation (per-batch 
> and per-partition delta files are saved asynchronously, with synchronous 
> put/get on an in-memory hashmap)
>  * Schemas in datasets: the dataset knows the structure of the data (fields) 
> and can optimize later on, mirroring schemas in PCollections in Beam
>  * A new source API, very close to Beam bounded and unbounded sources
> h1. Why make a new runner from scratch?
>  * The structured streaming framework is very different from the RDD/DStream 
> framework
> h1. We hope to gain
>  * A more up-to-date runner in terms of libraries, able to leverage new features
>  * Lessons learned from the previous runners
>  * Better performance, thanks to the Catalyst DAG optimizer and to simpler code
>  * Simpler code that is easier to maintain
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8470?focusedWorklogId=349850&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349850
 ]

ASF GitHub Bot logged work on BEAM-8470:


Author: ASF GitHub Bot
Created on: 26/Nov/19 15:42
Start Date: 26/Nov/19 15:42
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #10221: [BEAM-8470] 
Exclude failed ValidatesRunner tests
URL: https://github.com/apache/beam/pull/10221#issuecomment-558688401
 
 
   IMHO, I would prefer to keep it red until they all pass because it is the 
essence of the tests
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349850)
Time Spent: 14h 10m  (was: 14h)

> Create a new Spark runner based on Spark Structured streaming framework
> ---
>
> Key: BEAM-8470
> URL: https://issues.apache.org/jira/browse/BEAM-8470
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 14h 10m
>  Remaining Estimate: 0h
>
> h1. Why is it worth creating a new runner based on structured streaming?
> Because this new framework brings:
>  * Unified batch and streaming semantics: no more RDD/DStream distinction, 
> just as Beam has only PCollection
>  * Better state management: incremental state instead of saving everything 
> each time, and no more synchronous saving that delays computation (per-batch 
> and per-partition delta files are saved asynchronously, with synchronous 
> put/get on an in-memory hashmap)
>  * Schemas in datasets: the dataset knows the structure of the data (fields) 
> and can optimize later on, mirroring schemas in PCollections in Beam
>  * A new source API, very close to Beam bounded and unbounded sources
> h1. Why make a new runner from scratch?
>  * The structured streaming framework is very different from the RDD/DStream 
> framework
> h1. We hope to gain
>  * A more up-to-date runner in terms of libraries, able to leverage new features
>  * Lessons learned from the previous runners
>  * Better performance, thanks to the Catalyst DAG optimizer and to simpler code
>  * Simpler code that is easier to maintain
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8470?focusedWorklogId=349852&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349852
 ]

ASF GitHub Bot logged work on BEAM-8470:


Author: ASF GitHub Bot
Created on: 26/Nov/19 15:43
Start Date: 26/Nov/19 15:43
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #10221: [BEAM-8470] 
Exclude failed ValidatesRunner tests
URL: https://github.com/apache/beam/pull/10221#issuecomment-558688401
 
 
   IMHO, I would prefer to keep it red until they all pass because it is the 
essence of the tests
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349852)
Time Spent: 14h 20m  (was: 14h 10m)

> Create a new Spark runner based on Spark Structured streaming framework
> ---
>
> Key: BEAM-8470
> URL: https://issues.apache.org/jira/browse/BEAM-8470
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 14h 20m
>  Remaining Estimate: 0h
>
> h1. Why is it worth creating a new runner based on structured streaming?
> Because this new framework brings:
>  * Unified batch and streaming semantics: no more RDD/DStream distinction, 
> just as Beam has only PCollection
>  * Better state management: incremental state instead of saving everything 
> each time, and no more synchronous saving that delays computation (per-batch 
> and per-partition delta files are saved asynchronously, with synchronous 
> put/get on an in-memory hashmap)
>  * Schemas in datasets: the dataset knows the structure of the data (fields) 
> and can optimize later on, mirroring schemas in PCollections in Beam
>  * A new source API, very close to Beam bounded and unbounded sources
> h1. Why make a new runner from scratch?
>  * The structured streaming framework is very different from the RDD/DStream 
> framework
> h1. We hope to gain
>  * A more up-to-date runner in terms of libraries, able to leverage new features
>  * Lessons learned from the previous runners
>  * Better performance, thanks to the Catalyst DAG optimizer and to simpler code
>  * Simpler code that is easier to maintain
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8470?focusedWorklogId=349854&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349854
 ]

ASF GitHub Bot logged work on BEAM-8470:


Author: ASF GitHub Bot
Created on: 26/Nov/19 15:45
Start Date: 26/Nov/19 15:45
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10221: [BEAM-8470] 
Exclude failed ValidatesRunner tests
URL: https://github.com/apache/beam/pull/10221#issuecomment-558689479
 
 
   If they are always red, it's difficult to tell whether a new PR introduces a 
regression. You have to compare test results manually every time, which is not 
convenient (I have already done this several times).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349854)
Time Spent: 14h 40m  (was: 14.5h)

> Create a new Spark runner based on Spark Structured streaming framework
> ---
>
> Key: BEAM-8470
> URL: https://issues.apache.org/jira/browse/BEAM-8470
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 14h 40m
>  Remaining Estimate: 0h
>
> h1. Why is it worth creating a new runner based on structured streaming?
> Because this new framework brings:
>  * Unified batch and streaming semantics: no more RDD/DStream distinction, 
> just as Beam has only PCollection
>  * Better state management: incremental state instead of saving everything 
> each time, and no more synchronous saving that delays computation (per-batch 
> and per-partition delta files are saved asynchronously, with synchronous 
> put/get on an in-memory hashmap)
>  * Schemas in datasets: the dataset knows the structure of the data (fields) 
> and can optimize later on, mirroring schemas in PCollections in Beam
>  * A new source API, very close to Beam bounded and unbounded sources
> h1. Why make a new runner from scratch?
>  * The structured streaming framework is very different from the RDD/DStream 
> framework
> h1. We hope to gain
>  * A more up-to-date runner in terms of libraries, able to leverage new features
>  * Lessons learned from the previous runners
>  * Better performance, thanks to the Catalyst DAG optimizer and to simpler code
>  * Simpler code that is easier to maintain
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8470?focusedWorklogId=349853&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349853
 ]

ASF GitHub Bot logged work on BEAM-8470:


Author: ASF GitHub Bot
Created on: 26/Nov/19 15:45
Start Date: 26/Nov/19 15:45
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10221: [BEAM-8470] 
Exclude failed ValidatesRunner tests
URL: https://github.com/apache/beam/pull/10221#issuecomment-558689479
 
 
   If they are always red, it's difficult to tell whether a new PR introduces a 
regression. You have to compare test results manually every time, which is not 
convenient.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349853)
Time Spent: 14.5h  (was: 14h 20m)

> Create a new Spark runner based on Spark Structured streaming framework
> ---
>
> Key: BEAM-8470
> URL: https://issues.apache.org/jira/browse/BEAM-8470
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>
> h1. Why is it worth creating a new runner based on structured streaming?
> Because this new framework brings:
>  * Unified batch and streaming semantics: no more RDD/DStream distinction, 
> just as Beam has only PCollection
>  * Better state management: incremental state instead of saving everything 
> each time, and no more synchronous saving that delays computation (per-batch 
> and per-partition delta files are saved asynchronously, with synchronous 
> put/get on an in-memory hashmap)
>  * Schemas in datasets: the dataset knows the structure of the data (fields) 
> and can optimize later on, mirroring schemas in PCollections in Beam
>  * A new source API, very close to Beam bounded and unbounded sources
> h1. Why make a new runner from scratch?
>  * The structured streaming framework is very different from the RDD/DStream 
> framework
> h1. We hope to gain
>  * A more up-to-date runner in terms of libraries, able to leverage new features
>  * Lessons learned from the previous runners
>  * Better performance, thanks to the Catalyst DAG optimizer and to simpler code
>  * Simpler code that is easier to maintain
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8470?focusedWorklogId=349855&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349855
 ]

ASF GitHub Bot logged work on BEAM-8470:


Author: ASF GitHub Bot
Created on: 26/Nov/19 15:47
Start Date: 26/Nov/19 15:47
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10221: [BEAM-8470] 
Exclude failed ValidatesRunner tests
URL: https://github.com/apache/beam/pull/10221#issuecomment-558690828
 
 
   > IMHO, I would prefer to keep it red until they all pass because it is the 
essence of the tests. 
   We have disabled a bunch of other tests (that are failing) as well. Do we 
need to enable them in this case?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349855)
Time Spent: 14h 50m  (was: 14h 40m)

> Create a new Spark runner based on Spark Structured streaming framework
> ---
>
> Key: BEAM-8470
> URL: https://issues.apache.org/jira/browse/BEAM-8470
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 14h 50m
>  Remaining Estimate: 0h
>
> h1. Why is it worth creating a new runner based on structured streaming?
> Because this new framework brings:
>  * Unified batch and streaming semantics: no more RDD/DStream distinction, 
> just as Beam has only PCollection
>  * Better state management: incremental state instead of saving everything 
> each time, and no more synchronous saving that delays computation (per-batch 
> and per-partition delta files are saved asynchronously, with synchronous 
> put/get on an in-memory hashmap)
>  * Schemas in datasets: the dataset knows the structure of the data (fields) 
> and can optimize later on, mirroring schemas in PCollections in Beam
>  * A new source API, very close to Beam bounded and unbounded sources
> h1. Why make a new runner from scratch?
>  * The structured streaming framework is very different from the RDD/DStream 
> framework
> h1. We hope to gain
>  * A more up-to-date runner in terms of libraries, able to leverage new features
>  * Lessons learned from the previous runners
>  * Better performance, thanks to the Catalyst DAG optimizer and to simpler code
>  * Simpler code that is easier to maintain
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8470?focusedWorklogId=349856&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349856
 ]

ASF GitHub Bot logged work on BEAM-8470:


Author: ASF GitHub Bot
Created on: 26/Nov/19 15:48
Start Date: 26/Nov/19 15:48
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10221: [BEAM-8470] 
Exclude failed ValidatesRunner tests
URL: https://github.com/apache/beam/pull/10221#issuecomment-558690828
 
 
   > IMHO, I would prefer to keep it red until they all pass because it is the 
essence of the tests. 
   
   We have disabled a bunch of other tests (that are failing) as well. Do we 
need to enable them in this case?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349856)
Time Spent: 15h  (was: 14h 50m)

> Create a new Spark runner based on Spark Structured streaming framework
> ---
>
> Key: BEAM-8470
> URL: https://issues.apache.org/jira/browse/BEAM-8470
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 15h
>  Remaining Estimate: 0h
>
> h1. Why is it worth creating a new runner based on structured streaming?
> Because this new framework brings:
>  * Unified batch and streaming semantics: no more RDD/DStream distinction, 
> just as Beam has only PCollection
>  * Better state management: incremental state instead of saving everything 
> each time, and no more synchronous saving that delays computation (per-batch 
> and per-partition delta files are saved asynchronously, with synchronous 
> put/get on an in-memory hashmap)
>  * Schemas in datasets: the dataset knows the structure of the data (fields) 
> and can optimize later on, mirroring schemas in PCollections in Beam
>  * A new source API, very close to Beam bounded and unbounded sources
> h1. Why make a new runner from scratch?
>  * The structured streaming framework is very different from the RDD/DStream 
> framework
> h1. We hope to gain
>  * A more up-to-date runner in terms of libraries, able to leverage new features
>  * Lessons learned from the previous runners
>  * Better performance, thanks to the Catalyst DAG optimizer and to simpler code
>  * Simpler code that is easier to maintain
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8218) Implement Apache PulsarIO

2019-11-26 Thread Maximilian Michels (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16982622#comment-16982622
 ] 

Maximilian Michels commented on BEAM-8218:
--

[~taherk77] Could you give an update on the progress here?

> Implement Apache PulsarIO
> -
>
> Key: BEAM-8218
> URL: https://issues.apache.org/jira/browse/BEAM-8218
> Project: Beam
>  Issue Type: Task
>  Components: io-ideas
>Reporter: Alex Van Boxel
>Assignee: Taher Koitawala
>Priority: Minor
>
> Apache Pulsar is starting to gain popularity. Having a native Beam PulsarIO 
> could be beneficial.
> [https://pulsar.apache.org/|https://pulsar.apache.org/en/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8733) The "KeyError: u'-47'" error from line 305 of sdk_worker.py

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8733?focusedWorklogId=349860&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349860
 ]

ASF GitHub Bot logged work on BEAM-8733:


Author: ASF GitHub Bot
Created on: 26/Nov/19 15:57
Start Date: 26/Nov/19 15:57
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10208: [BEAM-8733] Handle 
the registration request synchronously in the Python&Java SDK harness
URL: https://github.com/apache/beam/pull/10208#issuecomment-558695331
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349860)
Time Spent: 3h 40m  (was: 3.5h)

> The "KeyError: u'-47'" error from line 305 of sdk_worker.py
> ---
>
> Key: BEAM-8733
> URL: https://issues.apache.org/jira/browse/BEAM-8733
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> The issue reported by [~chamikara], error message as follows:
> apache_beam/runners/worker/sdk_worker.py", line 305, in get
> self.fns[bundle_descriptor_id],
> KeyError: u'-47'
> {code}
> at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:330)
> at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:195)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:123)
> Suppressed: java.lang.IllegalStateException: Already closed.
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:93)
>   at 
> org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.abort(RemoteGrpcPortWriteOperation.java:220)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:91)
> {code}
> More discussion info can be found here: 
> https://github.com/apache/beam/pull/10004
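
The KeyError above is, at bottom, a process-bundle request looking up a bundle 
descriptor id that the harness has not finished registering. Below is a 
minimal, hypothetical Java sketch of that ordering problem and of the 
synchronous handling named in the PR title; the class and method names are 
illustrative only and are not Beam APIs:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only; not the Beam SDK harness implementation. */
public class RegistrationOrderingSketch {

  // Bundle descriptors known to the worker, keyed by id (e.g. "-47").
  private final Map<String, String> descriptors = new ConcurrentHashMap<>();

  /**
   * If this runs before register() has completed for the id, the lookup
   * fails -- the Java analogue of the Python KeyError quoted above.
   */
  public String processBundle(String descriptorId) {
    String descriptor = descriptors.get(descriptorId);
    if (descriptor == null) {
      throw new IllegalStateException("Unknown bundle descriptor: " + descriptorId);
    }
    return descriptor;
  }

  /**
   * Synchronous handling: the registration is fully applied before this
   * method returns, so a caller that waits for the response can never issue
   * processBundle(id) against a missing entry.
   */
  public void register(String descriptorId, String descriptor) {
    descriptors.put(descriptorId, descriptor);
  }
}
{code}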



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8733) The "KeyError: u'-47'" error from line 305 of sdk_worker.py

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8733?focusedWorklogId=349859&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349859
 ]

ASF GitHub Bot logged work on BEAM-8733:


Author: ASF GitHub Bot
Created on: 26/Nov/19 15:57
Start Date: 26/Nov/19 15:57
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #10208: [BEAM-8733] Handle 
the registration request synchronously in the Python&Java SDK harness
URL: https://github.com/apache/beam/pull/10208#issuecomment-558695237
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349859)
Time Spent: 3.5h  (was: 3h 20m)

> The "KeyError: u'-47'" error from line 305 of sdk_worker.py
> ---
>
> Key: BEAM-8733
> URL: https://issues.apache.org/jira/browse/BEAM-8733
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Reporter: sunjincheng
>Assignee: sunjincheng
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> The issue reported by [~chamikara], error message as follows:
> apache_beam/runners/worker/sdk_worker.py", line 305, in get
> self.fns[bundle_descriptor_id],
> KeyError: u'-47'
> {code}
> at 
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
> at org.apache.beam.sdk.util.MoreFutures.get(MoreFutures.java:57)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.RegisterAndProcessBundleOperation.finish(RegisterAndProcessBundleOperation.java:330)
> at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:85)
> at 
> org.apache.beam.runners.dataflow.worker.fn.control.BeamFnMapTaskExecutor.execute(BeamFnMapTaskExecutor.java:125)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:411)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:380)
> at 
> org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:305)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.start(DataflowRunnerHarness.java:195)
> at 
> org.apache.beam.runners.dataflow.worker.DataflowRunnerHarness.main(DataflowRunnerHarness.java:123)
> Suppressed: java.lang.IllegalStateException: Already closed.
>   at 
> org.apache.beam.sdk.fn.data.BeamFnDataBufferingOutboundObserver.close(BeamFnDataBufferingOutboundObserver.java:93)
>   at 
> org.apache.beam.runners.dataflow.worker.fn.data.RemoteGrpcPortWriteOperation.abort(RemoteGrpcPortWriteOperation.java:220)
>   at 
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:91)
> {code}
> More discussion info can be found here: 
> https://github.com/apache/beam/pull/10004



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8470?focusedWorklogId=349858&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349858
 ]

ASF GitHub Bot logged work on BEAM-8470:


Author: ASF GitHub Bot
Created on: 26/Nov/19 15:57
Start Date: 26/Nov/19 15:57
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #10221: [BEAM-8470] 
Exclude failed ValidatesRunner tests
URL: https://github.com/apache/beam/pull/10221#issuecomment-558695063
 
 
   > We have disabled a bunch of other tests (that are failing) as well. Do we 
need to enable them in this case?
   Well, you have a point.
   > If they are always red, it's difficult to tell whether a new PR introduces 
a regression. You have to compare test results manually every time, which is 
not convenient (I have already done this several times).
   You're right here as well.
   
   What is the correct way to avoid forgetting to fix the non-passing tests? 
Any suggestion better than going through the exclusion list in build.gradle?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349858)
Time Spent: 15h 10m  (was: 15h)

> Create a new Spark runner based on Spark Structured streaming framework
> ---
>
> Key: BEAM-8470
> URL: https://issues.apache.org/jira/browse/BEAM-8470
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 15h 10m
>  Remaining Estimate: 0h
>
> h1. Why is it worth creating a new runner based on structured streaming?
> Because this new framework brings:
>  * Unified batch and streaming semantics: no more RDD/DStream distinction, 
> just as Beam has only PCollection
>  * Better state management: incremental state instead of saving everything 
> each time, and no more synchronous saving that delays computation (per-batch 
> and per-partition delta files are saved asynchronously, with synchronous 
> put/get on an in-memory hashmap)
>  * Schemas in datasets: the dataset knows the structure of the data (fields) 
> and can optimize later on, mirroring schemas in PCollections in Beam
>  * A new source API, very close to Beam bounded and unbounded sources
> h1. Why make a new runner from scratch?
>  * The structured streaming framework is very different from the RDD/DStream 
> framework
> h1. We hope to gain
>  * A more up-to-date runner in terms of libraries, able to leverage new features
>  * Lessons learned from the previous runners
>  * Better performance, thanks to the Catalyst DAG optimizer and to simpler code
>  * Simpler code that is easier to maintain
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7746) Add type hints to python code

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=349861&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349861
 ]

ASF GitHub Bot logged work on BEAM-7746:


Author: ASF GitHub Bot
Created on: 26/Nov/19 16:02
Start Date: 26/Nov/19 16:02
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9915: [BEAM-7746] Add 
python type hints (part 1)
URL: https://github.com/apache/beam/pull/9915#issuecomment-558697643
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349861)
Time Spent: 31.5h  (was: 31h 20m)

> Add type hints to python code
> -
>
> Key: BEAM-7746
> URL: https://issues.apache.org/jira/browse/BEAM-7746
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 31.5h
>  Remaining Estimate: 0h
>
> As a developer of the beam source code, I would like the code to use pep484 
> type hints so that I can clearly see what types are required, get completion 
> in my IDE, and enforce code correctness via a static analyzer like mypy.
> This may be considered a precursor to BEAM-7060
> Work has been started here:  [https://github.com/apache/beam/pull/9056]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8828) BigQueryTableProvider should allow configuration of write disposition

2019-11-26 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-8828:
---

 Summary: BigQueryTableProvider should allow configuration of write 
disposition
 Key: BEAM-8828
 URL: https://issues.apache.org/jira/browse/BEAM-8828
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Brian Hulette


It should be possible to set BigQueryIO's 
[writeDisposition|https://github.com/apache/beam/blob/b446304f75078ca9c97437e685409c31ceab7503/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L2122-L2125]
 on a BigQuery table defined through BigQueryTableProvider.
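
For context, here is a minimal Java sketch of the underlying BigQueryIO call 
with an explicit write disposition; the table spec and the row content are 
placeholders, and the snippet only illustrates the knob the table provider 
could expose:

{code:java}
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;

public class WriteDispositionSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply(Create.of(new TableRow().set("name", "example"))
            .withCoder(TableRowJsonCoder.of()))
        .apply(
            "WriteToBigQuery",
            BigQueryIO.writeTableRows()
                .to("my-project:my_dataset.my_table") // placeholder table spec
                .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
                // The disposition the table provider should let users choose.
                .withWriteDisposition(WriteDisposition.WRITE_APPEND));

    p.run().waitUntilFinish();
  }
}
{code}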



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8470?focusedWorklogId=349870&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349870
 ]

ASF GitHub Bot logged work on BEAM-8470:


Author: ASF GitHub Bot
Created on: 26/Nov/19 16:22
Start Date: 26/Nov/19 16:22
Worklog Time Spent: 10m 
  Work Description: echauchot commented on issue #10221: [BEAM-8470] 
Exclude failed ValidatesRunner tests
URL: https://github.com/apache/beam/pull/10221#issuecomment-558695063
 
 
   > We have disabled a bunch of other tests (that are failing) as well. Do we 
need to enable them in this case?
   
   Well, you have a point.
   
   > If they are always red, it's difficult to tell whether a new PR introduces 
a regression. You have to compare test results manually every time, which is 
not convenient (I have already done this several times).
   
   You're right here as well.
   
   What is the correct way to avoid forgetting to fix the non-passing tests? 
Any suggestion better than going through the exclusion list in build.gradle?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349870)
Time Spent: 15h 20m  (was: 15h 10m)

> Create a new Spark runner based on Spark Structured streaming framework
> ---
>
> Key: BEAM-8470
> URL: https://issues.apache.org/jira/browse/BEAM-8470
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 15h 20m
>  Remaining Estimate: 0h
>
> h1. Why is it worth creating a new runner based on structured streaming?
> Because this new framework brings:
>  * Unified batch and streaming semantics: no more RDD/DStream distinction, 
> just as Beam has only PCollection
>  * Better state management: incremental state instead of saving everything 
> each time, and no more synchronous saving that delays computation (per-batch 
> and per-partition delta files are saved asynchronously, with synchronous 
> put/get on an in-memory hashmap)
>  * Schemas in datasets: the dataset knows the structure of the data (fields) 
> and can optimize later on, mirroring schemas in PCollections in Beam
>  * A new source API, very close to Beam bounded and unbounded sources
> h1. Why make a new runner from scratch?
>  * The structured streaming framework is very different from the RDD/DStream 
> framework
> h1. We hope to gain
>  * A more up-to-date runner in terms of libraries, able to leverage new features
>  * Lessons learned from the previous runners
>  * Better performance, thanks to the Catalyst DAG optimizer and to simpler code
>  * Simpler code that is easier to maintain
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8829) PubsubJsonTableProvider throws error if input schema does not have event_timestamp

2019-11-26 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-8829:
---

 Summary: PubsubJsonTableProvider throws error if input schema does 
not have event_timestamp
 Key: BEAM-8829
 URL: https://issues.apache.org/jira/browse/BEAM-8829
 Project: Beam
  Issue Type: Bug
  Components: dsl-sql
Reporter: Brian Hulette


RowToPubsubMessage uses 
[DropFields.fields("event_timestamp")|https://github.com/apache/beam/blob/b446304f75078ca9c97437e685409c31ceab7503/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/RowToPubsubMessage.java#L69]
 which throws if the input schema doesn't contain event_timestamp.

We should only drop the field if it exists.
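
A minimal Java sketch of the guarded drop; the wrapper transform and its name 
are illustrative, not the actual RowToPubsubMessage code, and only the 
hasField check is the point:

{code:java}
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.schemas.transforms.DropFields;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;

/** Drops event_timestamp only when the input schema actually contains it. */
class DropEventTimestampIfPresent
    extends PTransform<PCollection<Row>, PCollection<Row>> {

  @Override
  public PCollection<Row> expand(PCollection<Row> input) {
    Schema schema = input.getSchema();
    if (!schema.hasField("event_timestamp")) {
      // Nothing to drop, so avoid the exception described above.
      return input;
    }
    return input.apply(DropFields.fields("event_timestamp"));
  }
}
{code}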



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-11-26 Thread Etienne Chauchot (Jira)
Etienne Chauchot created BEAM-8830:
--

 Summary: fix Flatten tests in Spark Structured Streaming runner
 Key: BEAM-8830
 URL: https://issues.apache.org/jira/browse/BEAM-8830
 Project: Beam
  Issue Type: Bug
  Components: runner-spark
Reporter: Etienne Chauchot
Assignee: Etienne Chauchot


'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders'
'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8470?focusedWorklogId=349873&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349873
 ]

ASF GitHub Bot logged work on BEAM-8470:


Author: ASF GitHub Bot
Created on: 26/Nov/19 16:27
Start Date: 26/Nov/19 16:27
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #10221: [BEAM-8470] 
Exclude failed ValidatesRunner tests
URL: https://github.com/apache/beam/pull/10221#issuecomment-558708826
 
 
   @echauchot I'd prefer to keep `ValidatesRunner Tests` green and create 
Jira(s) for the failing tests since, in the end, these are issues with the 
transform implementations (even if only corner cases). Wdyt?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349873)
Time Spent: 15.5h  (was: 15h 20m)

> Create a new Spark runner based on Spark Structured streaming framework
> ---
>
> Key: BEAM-8470
> URL: https://issues.apache.org/jira/browse/BEAM-8470
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Time Spent: 15.5h
>  Remaining Estimate: 0h
>
> h1. Why is it worth creating a new runner based on structured streaming?
> Because this new framework brings:
>  * Unified batch and streaming semantics: no more RDD/DStream distinction, 
> just as Beam has only PCollection
>  * Better state management: incremental state instead of saving everything 
> each time, and no more synchronous saving that delays computation (per-batch 
> and per-partition delta files are saved asynchronously, with synchronous 
> put/get on an in-memory hashmap)
>  * Schemas in datasets: the dataset knows the structure of the data (fields) 
> and can optimize later on, mirroring schemas in PCollections in Beam
>  * A new source API, very close to Beam bounded and unbounded sources
> h1. Why make a new runner from scratch?
>  * The structured streaming framework is very different from the RDD/DStream 
> framework
> h1. We hope to gain
>  * A more up-to-date runner in terms of libraries, able to leverage new features
>  * Lessons learned from the previous runners
>  * Better performance, thanks to the Catalyst DAG optimizer and to simpler code
>  * Simpler code that is easier to maintain
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7516) Add a watermark manager for the fn_api_runner

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7516?focusedWorklogId=349893&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349893
 ]

ASF GitHub Bot logged work on BEAM-7516:


Author: ASF GitHub Bot
Created on: 26/Nov/19 17:12
Start Date: 26/Nov/19 17:12
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10067: 
[BEAM-7516][BEAM-8823] FnApiRunner works with work queues, and a primitive 
watermark manager
URL: https://github.com/apache/beam/pull/10067#issuecomment-558729046
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349893)
Remaining Estimate: 0h
Time Spent: 10m

> Add a watermark manager for the fn_api_runner
> -
>
> Key: BEAM-7516
> URL: https://issues.apache.org/jira/browse/BEAM-7516
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> To track watermarks for each stage



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8831) Python PreCommit Failures: Could not copy file '/some/path/file.egg'

2019-11-26 Thread Luke Cwik (Jira)
Luke Cwik created BEAM-8831:
---

 Summary: Python PreCommit Failures: Could not copy file 
'/some/path/file.egg'
 Key: BEAM-8831
 URL: https://issues.apache.org/jira/browse/BEAM-8831
 Project: Beam
  Issue Type: Bug
  Components: test-failures
Reporter: Luke Cwik


Several precommits fail due to "Could not copy file '/some/path/file.egg'"

Examples

[https://scans.gradle.com/s/ihfmrxr7evslw/failure?openFailures=WzFd&openStackTraces=WzZd#top=0]

[https://scans.gradle.com/s/ihfmrxr7evslw]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8811) Upgrade Beam pipeline diagrams in docs

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8811?focusedWorklogId=349896&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349896
 ]

ASF GitHub Bot logged work on BEAM-8811:


Author: ASF GitHub Bot
Created on: 26/Nov/19 17:20
Start Date: 26/Nov/19 17:20
Worklog Time Spent: 10m 
  Work Description: soyrice commented on issue #10200: [BEAM-8811] Upgrade 
Beam pipeline diagrams in docs
URL: https://github.com/apache/beam/pull/10200#issuecomment-558732713
 
 
   > > My bad if we cleared this during review, but an Aggregation is also a 
PTransform. The diagrams seem to imply that they're different. Maybe an 
'Aggregating PTransform', or a 'Grouping PTransform', vs a 'per-element 
PTransform'.
   > 
   > Good catch - thanks. I like "Aggregating PTransform" since it's also 
consistent with the language in the transform catalog. Will update the legends 
accordingly.
   
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349896)
Time Spent: 40m  (was: 0.5h)

> Upgrade Beam pipeline diagrams in docs
> --
>
> Key: BEAM-8811
> URL: https://issues.apache.org/jira/browse/BEAM-8811
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8822) Hadoop Client version 2.8 from 2.7

2019-11-26 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16981950#comment-16981950
 ] 

Tomo Suzuki edited comment on BEAM-8822 at 11/26/19 5:20 PM:
-

org.apache.beam.sdk.io.hadoop.format.HadoopFormatIOElasticTest
{noformat}
org/apache/commons/httpclient/URIException
java.lang.NoClassDefFoundError: org/apache/commons/httpclient/URIException
at org.elasticsearch.hadoop.util.Version.(Version.java:58)
at 
org.elasticsearch.hadoop.rest.RestService.findPartitions(RestService.java:214)
at 
org.elasticsearch.hadoop.mr.EsInputFormat.getSplits(EsInputFormat.java:405)
at 
org.elasticsearch.hadoop.mr.EsInputFormat.getSplits(EsInputFormat.java:386)
at 
org.apache.beam.sdk.io.hadoop.format.HadoopFormatIO$HadoopInputFormatBoundedSource.computeSplitsIfNecessary(HadoopFormatIO.java:678)
at 
org.apache.beam.sdk.io.hadoop.format.HadoopFormatIO$HadoopInputFormatBoundedSource.getEstimatedSizeBytes(HadoopFormatIO.java:661)
at 
org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$InputProvider.getInitialInputs(BoundedReadEvaluatorFactory.java:212)
at 
org.apache.beam.runners.direct.ReadEvaluatorFactory$InputProvider.getInitialInputs(ReadEvaluatorFactory.java:89)
at 
org.apache.beam.runners.direct.RootProviderRegistry.getInitialInputs(RootProviderRegistry.java:76)
at 
org.apache.beam.runners.direct.ExecutorServiceParallelExecutor.start(ExecutorServiceParallelExecutor.java:155)
at 
org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:208)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:67)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:315)
at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:350)
at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:331)
at 
org.apache.beam.sdk.io.hadoop.format.HadoopFormatIOElasticTest.testHifIOWithElastic(HadoopFormatIOElasticTest.java:117)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.apache.beam.sdk.testing.TestPipeline$1.evaluate(TestPipeline.java:319)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
s

[jira] [Closed] (BEAM-8747) Remove Unused non-vendored Guava compile dependencies

2019-11-26 Thread Tomo Suzuki (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomo Suzuki closed BEAM-8747.
-
Fix Version/s: 2.18.0
   Resolution: Fixed

> Remove Unused non-vendored Guava compile dependencies
> -
>
> Key: BEAM-8747
> URL: https://issues.apache.org/jira/browse/BEAM-8747
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Fix For: 2.18.0
>
> Attachments: Guava used as fully-qualified class name.png
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> [~kenn] says:
> BeamModulePlugin just contains lists of versions to ease coordination across 
> Beam modules, but mostly does not create dependencies. Most of Beam's modules 
> only depend on a few things there. For example Guava is not a core 
> dependency, but here is where it is actually depended upon:
> $ find . -name build.gradle | xargs grep library.java.guava
> ./sdks/java/core/build.gradle:  shadowTest library.java.guava_testlib
> ./sdks/java/extensions/sql/jdbc/build.gradle:  compile library.java.guava
> ./sdks/java/io/google-cloud-platform/build.gradle:  compile library.java.guava
> ./sdks/java/io/kinesis/build.gradle:  testCompile library.java.guava_testlib
> These results appear to be misleading. Grepping for 'import 
> com.google.common', I see this as the actual state of things:
>  - GCP connector does not appear to actually depend on Guava in compile scope
>  - The Beam SQL JDBC driver does not appear to actually depend on Guava in 
> compile scope
>  - The Dataflow Java worker does depend on Guava at compile scope but has 
> incorrect dependencies (and it probably shouldn't)
>  - KinesisIO does depend on Guava at compile scope but has incorrect 
> dependencies (Kinesis libs have Guava on API surface so it is OK here, but 
> should be correctly declared)
>  - ZetaSQL translator does depend on Guava at compile scope but has incorrect 
> dependencies (ZetaSQL has it on API surface so it is OK here, but should be 
> correctly declared)
> We used to have an analysis that prevented this class of error.
> Once the errors are fixed, the guava_version is simply a version that we have 
> discovered that seems to work for both Kinesis and ZetaSQL, libraries we do 
> not control. Kinesis producer is built against 18.0. Kinesis client against 
> 26.0-jre. ZetaSQL against 26.0-android.
> (or maybe I messed up in my analysis)
> Kenn



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-8654) [Java] beam_Dependency_Check's not getting correct report from Gradle dependencyUpdates

2019-11-26 Thread Tomo Suzuki (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomo Suzuki closed BEAM-8654.
-
Fix Version/s: Not applicable
   Resolution: Fixed

> [Java] beam_Dependency_Check's not getting correct report from Gradle 
> dependencyUpdates
> ---
>
> Key: BEAM-8654
> URL: https://issues.apache.org/jira/browse/BEAM-8654
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Cont. of https://issues.apache.org/jira/browse/BEAM-8621
> https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_Dependency_Check/234/consoleFull
>  says
> {noformat}
> 18:20:07 > Task :dependencyUpdates
> ...
> 18:23:12 The following dependencies are using the latest release version:
> ...
> 18:23:12  - com.google.cloud.bigdataoss:util:1.9.16
> 18:23:12  - com.google.cloud.bigtable:bigtable-client-core:1.8.0
> {noformat}
> But they are not the latest releases.
> * 
> https://search.maven.org/artifact/com.google.cloud.bigdataoss/util/2.0.0/jar 
> * 
> https://search.maven.org/artifact/com.google.cloud.bigtable/bigtable-client-core/1.12.1/jar
> Why does Gradle think they're the latest release?
> It seems that the "-Drevision=release" flag plays some role here. Without the 
> flag, Gradle reports these artifacts are not the latest.
> https://gist.github.com/suztomo/1460f2be48025c8ea764e86a2c6e39a8
> Even with the flag, it should report the following
> {noformat}
> The following dependencies have later release versions:
>  - com.google.cloud.bigtable:bigtable-client-core [1.8.0 -> 1.12.1]
>  https://cloud.google.com/bigtable/
> {noformat}
> https://gist.github.com/suztomo/13473e6b9765c0e96c22aeffab18ef66



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-1440) Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1440?focusedWorklogId=349897&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349897
 ]

ASF GitHub Bot logged work on BEAM-1440:


Author: ASF GitHub Bot
Created on: 26/Nov/19 17:27
Start Date: 26/Nov/19 17:27
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9772: [BEAM-1440] Create a 
BigQuery source that implements iobase.BoundedSource for Python
URL: https://github.com/apache/beam/pull/9772#issuecomment-558735446
 
 
   Looking once more.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349897)
Time Spent: 10.5h  (was: 10h 20m)

> Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK
> --
>
> Key: BEAM-1440
> URL: https://issues.apache.org/jira/browse/BEAM-1440
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 10.5h
>  Remaining Estimate: 0h
>
> Currently we have a BigQuery native source for Python SDK [1].
> This can only be used by Dataflow runner.
> We should  implement a Beam BigQuery source that implements 
> iobase.BoundedSource [2] interface so that other runners that try to use 
> Python SDK can read from BigQuery as well. Java SDK already has a Beam 
> BigQuery source [3].
> [1] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py
> [2] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py#L70
> [3] 
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1189
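For readers unfamiliar with the interface being discussed, the following is a minimal, self-contained sketch of what an iobase.BoundedSource implementation involves in the Python SDK. It is illustrative only: it reads a toy in-memory range rather than BigQuery, and everything apart from the BoundedSource / range-tracker APIs themselves (class name, size estimate, usage) is made up for the example.

{noformat}
import apache_beam as beam
from apache_beam.io import iobase
from apache_beam.io.range_trackers import OffsetRangeTracker


class ToyRangeSource(iobase.BoundedSource):
  """Emits the integers [0, count); stands in for a real BigQuery source."""

  def __init__(self, count):
    self._count = count

  def estimate_size(self):
    # Rough size estimate in bytes; runners use it to pick bundle sizes.
    return self._count * 8

  def get_range_tracker(self, start_position, stop_position):
    start = 0 if start_position is None else start_position
    stop = self._count if stop_position is None else stop_position
    return OffsetRangeTracker(start, stop)

  def split(self, desired_bundle_size, start_position=None, stop_position=None):
    # Splitting into bundles is what lets any runner parallelize the read.
    start = 0 if start_position is None else start_position
    stop = self._count if stop_position is None else stop_position
    step = max(1, desired_bundle_size)
    for bundle_start in range(start, stop, step):
      bundle_stop = min(bundle_start + step, stop)
      yield iobase.SourceBundle(
          weight=bundle_stop - bundle_start,
          source=self,
          start_position=bundle_start,
          stop_position=bundle_stop)

  def read(self, range_tracker):
    position = range_tracker.start_position()
    while range_tracker.try_claim(position):
      yield position
      position += 1


# Usage sketch: any runner, not just Dataflow, can execute this read.
with beam.Pipeline() as p:
  _ = p | beam.io.Read(ToyRangeSource(10)) | beam.Map(print)
{noformat}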



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=349905&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349905
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 26/Nov/19 17:46
Start Date: 26/Nov/19 17:46
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #10035: [BEAM-8581] and 
[BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10035#issuecomment-558743148
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349905)
Time Spent: 6h  (was: 5h 50m)

> Python SDK labels ontime empty panes as late
> 
>
> Key: BEAM-8581
> URL: https://issues.apache.org/jira/browse/BEAM-8581
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> The GeneralTriggerDriver does not put watermark holds on timers, leading to 
> the ontime empty pane being considered late data.
> Fix: Add a new notion of whether a trigger has an ontime pane. If it does, 
> then set a watermark hold to end of window - 1.
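To make the proposed fix concrete, here is a tiny illustrative sketch. Every name in it is hypothetical; the real change lives in the Python SDK's GeneralTriggerDriver and is more involved. The point is only the arithmetic: when a trigger will produce an on-time pane, hold the output watermark at end-of-window minus the smallest timestamp granularity so that pane is not classified as late.

{noformat}
MIN_TIMESTAMP_GRANULARITY = 1  # stands in for one microsecond


def hold_for_end_of_window_timer(window_end, trigger_has_ontime_pane):
  """Returns the watermark hold to set alongside the end-of-window timer."""
  if trigger_has_ontime_pane:
    # Keep the output watermark just behind the end of the window so the
    # (possibly empty) on-time pane fires as ON_TIME rather than LATE.
    return window_end - MIN_TIMESTAMP_GRANULARITY
  return None  # no on-time pane expected, so no hold is needed


print(hold_for_end_of_window_timer(60, True))   # 59
print(hold_for_end_of_window_timer(60, False))  # None
{noformat}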



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=349906&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349906
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 26/Nov/19 17:46
Start Date: 26/Nov/19 17:46
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #10035: [BEAM-8581] and 
[BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10035#issuecomment-558743176
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349906)
Time Spent: 6h 10m  (was: 6h)

> Python SDK labels ontime empty panes as late
> 
>
> Key: BEAM-8581
> URL: https://issues.apache.org/jira/browse/BEAM-8581
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> The GeneralTriggerDriver does not put watermark holds on timers, leading to 
> the ontime empty pane being considered late data.
> Fix: Add a new notion of whether a trigger has an ontime pane. If it does, 
> then set a watermark hold to end of window - 1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8581) Python SDK labels ontime empty panes as late

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8581?focusedWorklogId=349908&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349908
 ]

ASF GitHub Bot logged work on BEAM-8581:


Author: ASF GitHub Bot
Created on: 26/Nov/19 17:46
Start Date: 26/Nov/19 17:46
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on issue #10035: [BEAM-8581] and 
[BEAM-8582] watermark and trigger fixes
URL: https://github.com/apache/beam/pull/10035#issuecomment-558743234
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349908)
Time Spent: 6h 20m  (was: 6h 10m)

> Python SDK labels ontime empty panes as late
> 
>
> Key: BEAM-8581
> URL: https://issues.apache.org/jira/browse/BEAM-8581
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Major
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> The GeneralTriggerDriver does not put watermark holds on timers, leading to 
> the ontime empty pane being considered late data.
> Fix: Add a new notion of whether a trigger has an ontime pane. If it does, 
> then set a watermark hold to end of window - 1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8822) Hadoop Client version 2.8 from 2.7

2019-11-26 Thread Tomo Suzuki (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomo Suzuki updated BEAM-8822:
--
Attachment: OGuVu0A18jJ.png

> Hadoop Client version 2.8 from 2.7
> --
>
> Key: BEAM-8822
> URL: https://issues.apache.org/jira/browse/BEAM-8822
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Attachments: OGuVu0A18jJ.png
>
>
> [~iemejia] says:
> bq. probably a quicker way forward to unblock the bigtable issue is to 
> move our Hadoop dependency to Hadoop 2.8; given that Hadoop 2.7 is now EOL, we 
> have a good reason to do so 
> https://cwiki.apache.org/confluence/display/HADOOP/EOL+%28End-of-life%29+Release+Branches
> The URL says
> {quote}Following branches are EOL: 
> [2.0.x - 2.7.x]{quote}
> https://issues.apache.org/jira/browse/BEAM-8569?focusedCommentId=16980532&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16980532
> About compatibility with other libraries:
> Hadoop client 2.7 is not compatible with Guava > 21 because of 
> Objects.toStringHelper. Fortunately Hadoop client 2.8 removed the use of the 
> method 
> ([detail|https://github.com/GoogleCloudPlatform/cloud-opensource-java/issues/1028#issuecomment-557709027]).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8822) Hadoop Client version 2.8 from 2.7

2019-11-26 Thread Tomo Suzuki (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomo Suzuki updated BEAM-8822:
--
Description: 
[~iemejia] says:

bq. probably a quicker way forward to unblock the bigtable issue is to move 
our Hadoop dependency to Hadoop 2.8; given that Hadoop 2.7 is now EOL, we have a 
good reason to do so 
https://cwiki.apache.org/confluence/display/HADOOP/EOL+%28End-of-life%29+Release+Branches

The URL says

{quote}Following branches are EOL: 

[2.0.x - 2.7.x]{quote}


https://issues.apache.org/jira/browse/BEAM-8569?focusedCommentId=16980532&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16980532

About compatibility with other libraries:

Hadoop client 2.7 is not compatible with Guava > 21 because of 
Objects.toStringHelper. Fortunately Hadoop client 2.8 removed the use of the 
method 
([detail|https://github.com/GoogleCloudPlatform/cloud-opensource-java/issues/1028#issuecomment-557709027]).

2.8.5 is the latest in 2.8.X.

 !OGuVu0A18jJ.png! 

  was:
[~iemejia] says:

bq. probably a quicker way forward is to unblock the bigtable issue is to move 
our Hadoop dependency to Hadoop 2.8 given that Hadoop 2.7 is now EOL we have a 
good reason to do so 
https://cwiki.apache.org/confluence/display/HADOOP/EOL+%28End-of-life%29+Release+Branches

The URL says

{quote}Following branches are EOL: 

[2.0.x - 2.7.x]{quote}


https://issues.apache.org/jira/browse/BEAM-8569?focusedCommentId=16980532&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16980532

About compatibility with other library:

Hadoop client 2.7 is not compatible with Guava > 21 because of 
Objects.toStringHelper. Fortunately Hadoop client 2.8 removed the use of the 
method 
([detail|https://github.com/GoogleCloudPlatform/cloud-opensource-java/issues/1028#issuecomment-557709027]).


> Hadoop Client version 2.8 from 2.7
> --
>
> Key: BEAM-8822
> URL: https://issues.apache.org/jira/browse/BEAM-8822
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Attachments: OGuVu0A18jJ.png
>
>
> [~iemejia] says:
> bq. probably a quicker way forward to unblock the bigtable issue is to 
> move our Hadoop dependency to Hadoop 2.8; given that Hadoop 2.7 is now EOL, we 
> have a good reason to do so 
> https://cwiki.apache.org/confluence/display/HADOOP/EOL+%28End-of-life%29+Release+Branches
> The URL says
> {quote}Following branches are EOL: 
> [2.0.x - 2.7.x]{quote}
> https://issues.apache.org/jira/browse/BEAM-8569?focusedCommentId=16980532&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16980532
> About compatibility with other libraries:
> Hadoop client 2.7 is not compatible with Guava > 21 because of 
> Objects.toStringHelper. Fortunately Hadoop client 2.8 removed the use of the 
> method 
> ([detail|https://github.com/GoogleCloudPlatform/cloud-opensource-java/issues/1028#issuecomment-557709027]).
> 2.8.5 is the latest in 2.8.X.
>  !OGuVu0A18jJ.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8656) flink_master_url usage in flink_runner.py

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8656?focusedWorklogId=349912&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349912
 ]

ASF GitHub Bot logged work on BEAM-8656:


Author: ASF GitHub Bot
Created on: 26/Nov/19 17:57
Start Date: 26/Nov/19 17:57
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #10220: [BEAM-8656] 
Update documentation for flink_master parameter
URL: https://github.com/apache/beam/pull/10220#discussion_r350896528
 
 

 ##
 File path: website/src/documentation/runners/flink.md
 ##
 @@ -323,7 +323,7 @@ See [here]({{ site.baseurl 
}}/roadmap/portability/#sdk-harness-config) for detai
 
 
 As of Beam 2.15.0, steps 2 and 3 can be automated in 
Python by using the `FlinkRunner`,
 
 Review comment:
   IIUC `flink_master` won't work with Beam 2.15-16, so maybe we should change 
this to say "As of Beam 2.17.0".
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349912)
Time Spent: 20m  (was: 10m)

> flink_master_url usage in flink_runner.py
> -
>
> Key: BEAM-8656
> URL: https://issues.apache.org/jira/browse/BEAM-8656
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> flink_master_url was replaced with flink_master in flink_runner.py without 
> preserving backward compatibility, but it remains documented on the website. 
> We will either have to update the website (making it clear that the 
> instructions are for 2.17+) or else make sure that the flink_master_url gets 
> aliased to flink_master before the 2.17 release is finalized. If anyone's 
> already using flink_runner.py, this will also break their existing pipeline.
> [https://github.com/apache/beam/pull/9946]
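For context, a minimal sketch of what the renamed option looks like from user code, assuming Beam 2.17+ where flink_runner.py accepts --flink_master (website snippets written against older versions pass --flink_master_url instead). The host and port below are placeholders, and a Flink master must actually be listening there.

{noformat}
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    '--runner=FlinkRunner',
    # Renamed option: older flink_runner.py called this --flink_master_url.
    '--flink_master=localhost:8081',
    '--environment_type=LOOPBACK',
])

with beam.Pipeline(options=options) as p:
  _ = p | beam.Create(['smoke', 'test']) | beam.Map(print)
{noformat}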



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8822) Hadoop Client version 2.8 from 2.7

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8822?focusedWorklogId=349913&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349913
 ]

ASF GitHub Bot logged work on BEAM-8822:


Author: ASF GitHub Bot
Created on: 26/Nov/19 17:59
Start Date: 26/Nov/19 17:59
Worklog Time Spent: 10m 
  Work Description: suztomo commented on pull request #10222: [BEAM-8822] 
Hadoop client version 2.8.5 from 2.7
URL: https://github.com/apache/beam/pull/10222
 
 
   Upgrading Hadoop client version 2.7 (EOL) to 2.8.5 (the latest in 2.8.X 
release).
   
   CC: @iemejia 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Stat

[jira] [Work logged] (BEAM-8822) Hadoop Client version 2.8 from 2.7

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8822?focusedWorklogId=349914&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349914
 ]

ASF GitHub Bot logged work on BEAM-8822:


Author: ASF GitHub Bot
Created on: 26/Nov/19 17:59
Start Date: 26/Nov/19 17:59
Worklog Time Spent: 10m 
  Work Description: suztomo commented on pull request #10222: [BEAM-8822] 
Hadoop client version 2.8.5 from 2.7
URL: https://github.com/apache/beam/pull/10222#discussion_r350897749
 
 

 ##
 File path: sdks/java/io/hadoop-format/build.gradle
 ##
 @@ -73,6 +73,8 @@ dependencies {
     exclude group: "org.apache.spark", module: "spark-sql_2.10"
     exclude group: "org.apache.storm", module: "storm-core"
   }
+  // elasticsearch-hadoop 5.0.0 uses commons-httpclient's URIException
+  testCompile "commons-httpclient:commons-httpclient:3.1"
 
 Review comment:
   `Elasticsearch-hadoop` appears 12 lines above this line.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349914)
Time Spent: 20m  (was: 10m)

> Hadoop Client version 2.8 from 2.7
> --
>
> Key: BEAM-8822
> URL: https://issues.apache.org/jira/browse/BEAM-8822
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Attachments: OGuVu0A18jJ.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [~iemejia] says:
> bq. probably a quicker way forward to unblock the bigtable issue is to 
> move our Hadoop dependency to Hadoop 2.8; given that Hadoop 2.7 is now EOL, we 
> have a good reason to do so 
> https://cwiki.apache.org/confluence/display/HADOOP/EOL+%28End-of-life%29+Release+Branches
> The URL says
> {quote}Following branches are EOL: 
> [2.0.x - 2.7.x]{quote}
> https://issues.apache.org/jira/browse/BEAM-8569?focusedCommentId=16980532&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16980532
> About compatibility with other libraries:
> Hadoop client 2.7 is not compatible with Guava > 21 because of 
> Objects.toStringHelper. Fortunately Hadoop client 2.8 removed the use of the 
> method 
> ([detail|https://github.com/GoogleCloudPlatform/cloud-opensource-java/issues/1028#issuecomment-557709027]).
> 2.8.5 is the latest in 2.8.X.
>  !OGuVu0A18jJ.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8822) Hadoop Client version 2.8 from 2.7

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8822?focusedWorklogId=349915&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349915
 ]

ASF GitHub Bot logged work on BEAM-8822:


Author: ASF GitHub Bot
Created on: 26/Nov/19 18:00
Start Date: 26/Nov/19 18:00
Worklog Time Spent: 10m 
  Work Description: suztomo commented on issue #10222: [BEAM-8822] Hadoop 
client version 2.8.5 from 2.7
URL: https://github.com/apache/beam/pull/10222#issuecomment-558748894
 
 
   Run Java PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349915)
Time Spent: 0.5h  (was: 20m)

> Hadoop Client version 2.8 from 2.7
> --
>
> Key: BEAM-8822
> URL: https://issues.apache.org/jira/browse/BEAM-8822
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Tomo Suzuki
>Assignee: Tomo Suzuki
>Priority: Major
> Attachments: OGuVu0A18jJ.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> [~iemejia] says:
> bq. probably a quicker way forward to unblock the bigtable issue is to 
> move our Hadoop dependency to Hadoop 2.8; given that Hadoop 2.7 is now EOL, we 
> have a good reason to do so 
> https://cwiki.apache.org/confluence/display/HADOOP/EOL+%28End-of-life%29+Release+Branches
> The URL says
> {quote}Following branches are EOL: 
> [2.0.x - 2.7.x]{quote}
> https://issues.apache.org/jira/browse/BEAM-8569?focusedCommentId=16980532&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16980532
> About compatibility with other libraries:
> Hadoop client 2.7 is not compatible with Guava > 21 because of 
> Objects.toStringHelper. Fortunately Hadoop client 2.8 removed the use of the 
> method 
> ([detail|https://github.com/GoogleCloudPlatform/cloud-opensource-java/issues/1028#issuecomment-557709027]).
> 2.8.5 is the latest in 2.8.X.
>  !OGuVu0A18jJ.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8803) Default behaviour for Python BQ Streaming inserts sink should be to retry always

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8803?focusedWorklogId=349920&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349920
 ]

ASF GitHub Bot logged work on BEAM-8803:


Author: ASF GitHub Bot
Created on: 26/Nov/19 18:12
Start Date: 26/Nov/19 18:12
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #10207: [2.17.0][cherry 
pick][BEAM-8803] BigQuery Streaming Inserts are always retried by default.
URL: https://github.com/apache/beam/pull/10207#issuecomment-558753593
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349920)
Time Spent: 1.5h  (was: 1h 20m)

> Default behaviour for Python BQ Streaming inserts sink should be to retry 
> always
> 
>
> Key: BEAM-8803
> URL: https://issues.apache.org/jira/browse/BEAM-8803
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Blocker
> Fix For: 2.18.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
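For readers following the thread, the knob in question is the insert_retry_strategy argument of the Python SDK's WriteToBigQuery transform; the change being cherry-picked makes always-retry the default. A hedged sketch of setting it explicitly (project, dataset, table, and schema below are placeholders, and the pipeline of course needs GCP credentials to actually run):

{noformat}
import apache_beam as beam
from apache_beam.io.gcp.bigquery_tools import RetryStrategy

with beam.Pipeline() as p:
  _ = (
      p
      | beam.Create([{'id': 1}])
      | beam.io.WriteToBigQuery(
          'my-project:my_dataset.my_table',
          schema='id:INTEGER',
          method=beam.io.WriteToBigQuery.Method.STREAMING_INSERTS,
          insert_retry_strategy=RetryStrategy.RETRY_ALWAYS))
{noformat}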




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8406) TextTable support JSON format

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8406?focusedWorklogId=349921&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349921
 ]

ASF GitHub Bot logged work on BEAM-8406:


Author: ASF GitHub Bot
Created on: 26/Nov/19 18:13
Start Date: 26/Nov/19 18:13
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #10217: [BEAM-8406] Add 
support for JSON format text tables
URL: https://github.com/apache/beam/pull/10217#issuecomment-558753918
 
 
   Thanks for your contribution! I will take a look soon.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349921)
Time Spent: 0.5h  (was: 20m)

> TextTable support JSON format
> -
>
> Key: BEAM-8406
> URL: https://issues.apache.org/jira/browse/BEAM-8406
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Jing Chen
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Have a JSON table implementation similar to [1].
> [1]: 
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/text/TextTable.java



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8251) Add worker_region and worker_zone options

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8251?focusedWorklogId=349923&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349923
 ]

ASF GitHub Bot logged work on BEAM-8251:


Author: ASF GitHub Bot
Created on: 26/Nov/19 18:22
Start Date: 26/Nov/19 18:22
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #10150: [BEAM-8251] plumb 
worker_(region|zone) to Environment proto
URL: https://github.com/apache/beam/pull/10150#issuecomment-558757382
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349923)
Time Spent: 1h 20m  (was: 1h 10m)

> Add worker_region and worker_zone options
> -
>
> Key: BEAM-8251
> URL: https://issues.apache.org/jira/browse/BEAM-8251
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> We are refining the way the user specifies worker regions and zones to the 
> Dataflow service. We need to add worker_region and worker_zone pipeline 
> options that will be preferred over the old experiments=worker_region and 
> --zone flags. I will create subtasks for adding these options to each SDK.
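Illustratively, once these options land they would be passed like any other pipeline option in the Python SDK. The names come from the issue text above and the values are placeholders; this is a sketch, not released documentation.

{noformat}
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    '--runner=DataflowRunner',
    '--project=my-project',
    '--region=us-central1',
    # Preferred over the old --experiments=worker_region=... / --zone flags:
    '--worker_region=us-west1',
    # '--worker_zone=us-west1-b',  # alternatively pin a zone instead of a region
])
{noformat}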



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8251) Add worker_region and worker_zone options

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8251?focusedWorklogId=349924&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349924
 ]

ASF GitHub Bot logged work on BEAM-8251:


Author: ASF GitHub Bot
Created on: 26/Nov/19 18:23
Start Date: 26/Nov/19 18:23
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #10150: [BEAM-8251] plumb 
worker_(region|zone) to Environment proto
URL: https://github.com/apache/beam/pull/10150#issuecomment-558757431
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349924)
Time Spent: 1.5h  (was: 1h 20m)

> Add worker_region and worker_zone options
> -
>
> Key: BEAM-8251
> URL: https://issues.apache.org/jira/browse/BEAM-8251
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We are refining the way the user specifies worker regions and zones to the 
> Dataflow service. We need to add worker_region and worker_zone pipeline 
> options that will be preferred over the old experiments=worker_region and 
> --zone flags. I will create subtasks for adding these options to each SDK.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8811) Upgrade Beam pipeline diagrams in docs

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8811?focusedWorklogId=349927&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349927
 ]

ASF GitHub Bot logged work on BEAM-8811:


Author: ASF GitHub Bot
Created on: 26/Nov/19 18:27
Start Date: 26/Nov/19 18:27
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10200: [BEAM-8811] Upgrade 
Beam pipeline diagrams in docs
URL: https://github.com/apache/beam/pull/10200#issuecomment-558759066
 
 
   Thanks Cyrus. It looks like 
http://apache-beam-website-pull-requests.storage.googleapis.com/10200/get-started/wordcount-example/index.html
 has a figure that needs updating.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349927)
Time Spent: 50m  (was: 40m)

> Upgrade Beam pipeline diagrams in docs
> --
>
> Key: BEAM-8811
> URL: https://issues.apache.org/jira/browse/BEAM-8811
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8811) Upgrade Beam pipeline diagrams in docs

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8811?focusedWorklogId=349928&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349928
 ]

ASF GitHub Bot logged work on BEAM-8811:


Author: ASF GitHub Bot
Created on: 26/Nov/19 18:29
Start Date: 26/Nov/19 18:29
Worklog Time Spent: 10m 
  Work Description: soyrice commented on issue #10200: [BEAM-8811] Upgrade 
Beam pipeline diagrams in docs
URL: https://github.com/apache/beam/pull/10200#issuecomment-558760044
 
 
   > Thanks Cyrus. It looks like 
http://apache-beam-website-pull-requests.storage.googleapis.com/10200/get-started/wordcount-example/index.html
 has a figure that needs updating.
   
   Done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349928)
Time Spent: 1h  (was: 50m)

> Upgrade Beam pipeline diagrams in docs
> --
>
> Key: BEAM-8811
> URL: https://issues.apache.org/jira/browse/BEAM-8811
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8811) Upgrade Beam pipeline diagrams in docs

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8811?focusedWorklogId=349929&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349929
 ]

ASF GitHub Bot logged work on BEAM-8811:


Author: ASF GitHub Bot
Created on: 26/Nov/19 18:30
Start Date: 26/Nov/19 18:30
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10200: [BEAM-8811] Upgrade 
Beam pipeline diagrams in docs
URL: https://github.com/apache/beam/pull/10200#issuecomment-558760137
 
 
   Very cool having the SVG. RAT must be failing because the SVG files don't 
have the Apache license header. Other than that, LGTM.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349929)
Time Spent: 1h 10m  (was: 1h)

> Upgrade Beam pipeline diagrams in docs
> --
>
> Key: BEAM-8811
> URL: https://issues.apache.org/jira/browse/BEAM-8811
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8811) Upgrade Beam pipeline diagrams in docs

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8811?focusedWorklogId=349933&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349933
 ]

ASF GitHub Bot logged work on BEAM-8811:


Author: ASF GitHub Bot
Created on: 26/Nov/19 18:34
Start Date: 26/Nov/19 18:34
Worklog Time Spent: 10m 
  Work Description: soyrice commented on issue #10200: [BEAM-8811] Upgrade 
Beam pipeline diagrams in docs
URL: https://github.com/apache/beam/pull/10200#issuecomment-558762131
 
 
   > Very cool having the SVG. RAT must be failing because the SVG files don't 
have the Apache license header. Other than that, LGTM.
   
   Looks like it. Should be fixed now but I'll keep tabs on it in case this 
next build fails.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349933)
Time Spent: 1h 20m  (was: 1h 10m)

> Upgrade Beam pipeline diagrams in docs
> --
>
> Key: BEAM-8811
> URL: https://issues.apache.org/jira/browse/BEAM-8811
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8811) Upgrade Beam pipeline diagrams in docs

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8811?focusedWorklogId=349934&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349934
 ]

ASF GitHub Bot logged work on BEAM-8811:


Author: ASF GitHub Bot
Created on: 26/Nov/19 18:35
Start Date: 26/Nov/19 18:35
Worklog Time Spent: 10m 
  Work Description: soyrice commented on issue #10200: [BEAM-8811] Upgrade 
Beam pipeline diagrams in docs
URL: https://github.com/apache/beam/pull/10200#issuecomment-558762131
 
 
   > Very cool having the SVG. RAT must be failing because the SVG files don't 
have the Apache license header. Other than that, LGTM.
   
   Looks like it. Added the header so it should be fixed now, but I'll keep 
tabs on it in case this next build fails.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349934)
Time Spent: 1.5h  (was: 1h 20m)

> Upgrade Beam pipeline diagrams in docs
> --
>
> Key: BEAM-8811
> URL: https://issues.apache.org/jira/browse/BEAM-8811
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (BEAM-8406) TextTable support JSON format

2019-11-26 Thread Jing Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-8406 started by Jing Chen.
---
> TextTable support JSON format
> -
>
> Key: BEAM-8406
> URL: https://issues.apache.org/jira/browse/BEAM-8406
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Jing Chen
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Have a JSON table implementation similar to [1].
> [1]: 
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/text/TextTable.java



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-3788) Implement a Kafka IO for Python SDK

2019-11-26 Thread Jing Chen (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16982762#comment-16982762
 ] 

Jing Chen commented on BEAM-3788:
-

I am curious whether there is a plan to integrate Confluent's Schema Registry, etc., 
or simply vanilla Kafka.

> Implement a Kafka IO for Python SDK
> ---
>
> Key: BEAM-3788
> URL: https://issues.apache.org/jira/browse/BEAM-3788
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Priority: Major
>
> This will be implemented using the Splittable DoFn framework.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-1440) Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK

2019-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-1440?focusedWorklogId=349936&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-349936
 ]

ASF GitHub Bot logged work on BEAM-1440:


Author: ASF GitHub Bot
Created on: 26/Nov/19 18:45
Start Date: 26/Nov/19 18:45
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9772: [BEAM-1440] 
Create a BigQuery source that implements iobase.BoundedSource for Python
URL: https://github.com/apache/beam/pull/9772#discussion_r350918239
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery.py
 ##
 @@ -499,6 +509,189 @@ def reader(self, test_bigquery_client=None):
         kms_key=self.kms_key)
 
 
+FieldSchema = collections.namedtuple('FieldSchema', 'fields mode name type')
+
+
+def _to_bool(value):
+  return value == 'true'
+
+
+def _to_decimal(value):
+  return decimal.Decimal(value)
+
+
+def _to_bytes(value):
+  """Converts value from str to bytes on Python 3.x. Does nothing on
+  Python 2.7."""
+  return value.encode('utf-8')
+
+
+class _JsonToDictCoder(coders.Coder):
+  """A coder for a JSON string to a Python dict."""
+
+  def __init__(self, table_schema):
+    self.fields = self._convert_to_tuple(table_schema.fields)
+    self._converters = {
+        'INTEGER': int,
+        'INT64': int,
+        'FLOAT': float,
+        'BOOLEAN': _to_bool,
+        'NUMERIC': _to_decimal,
+        'BYTES': _to_bytes,
+    }
+
+  @classmethod
+  def _convert_to_tuple(cls, table_field_schemas):
+    """Recursively converts the list of TableFieldSchema instances to the
+    list of tuples to prevent errors when pickling and unpickling
+    TableFieldSchema instances.
+    """
+    if not table_field_schemas:
+      return []
+
+    return [FieldSchema(cls._convert_to_tuple(x.fields), x.mode, x.name,
+                        x.type)
+            for x in table_field_schemas]
+
+  def decode(self, value):
+    value = json.loads(value)
+    return self._decode_with_schema(value, self.fields)
+
+  def _decode_with_schema(self, value, schema_fields):
+    for field in schema_fields:
+      if field.name not in value:
+        # The field exists in the schema, but it doesn't exist in this row.
+        # It probably means its value was null, as the extract to JSON job
+        # doesn't preserve null fields
+        value[field.name] = None
+        continue
+
+      if field.type == 'RECORD':
+        value[field.name] = self._decode_with_schema(value[field.name],
+                                                     field.fields)
+      else:
+        try:
+          converter = self._converters[field.type]
+          value[field.name] = converter(value[field.name])
+        except KeyError:
 
 Review comment:
   Does this mean that for other data types, we pass them through as they are? E.g. 
for datetime data, or other types like that?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 349936)
Time Spent: 10h 40m  (was: 10.5h)

> Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK
> --
>
> Key: BEAM-1440
> URL: https://issues.apache.org/jira/browse/BEAM-1440
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h
>
> Currently we have a BigQuery native source for Python SDK [1].
> This can only be used by Dataflow runner.
> We should  implement a Beam BigQuery source that implements 
> iobase.BoundedSource [2] interface so that other runners that try to use 
> Python SDK can read from BigQuery as well. Java SDK already has a Beam 
> BigQuery source [3].
> [1] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py
> [2] 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py#L70
> [3] 
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1189



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

