[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=440981&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-440981
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 03/Jun/20 20:57
Start Date: 03/Jun/20 20:57
Worklog Time Spent: 10m 
  Work Description: pabloem merged pull request #11896:
URL: https://github.com/apache/beam/pull/11896


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 440981)
Time Spent: 9h 40m  (was: 9.5h)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: P3
> Fix For: 2.21.0
>
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=440943&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-440943
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 03/Jun/20 19:35
Start Date: 03/Jun/20 19:35
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11896:
URL: https://github.com/apache/beam/pull/11896#issuecomment-638416678


   Run PythonLint PreCommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 440943)
Time Spent: 9.5h  (was: 9h 20m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: P3
> Fix For: 2.21.0
>
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=440931&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-440931
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 03/Jun/20 19:11
Start Date: 03/Jun/20 19:11
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11896:
URL: https://github.com/apache/beam/pull/11896#issuecomment-638403708


   thanks @chunyang ! this LGTM. I'll merge after lint passes



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 440931)
Time Spent: 9h 20m  (was: 9h 10m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: P3
> Fix For: 2.21.0
>
>  Time Spent: 9h 20m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=440930&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-440930
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 03/Jun/20 19:10
Start Date: 03/Jun/20 19:10
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11896:
URL: https://github.com/apache/beam/pull/11896#issuecomment-638403555


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 440930)
Time Spent: 9h 10m  (was: 9h)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: P3
> Fix For: 2.21.0
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=440874&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-440874
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 03/Jun/20 16:38
Start Date: 03/Jun/20 16:38
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11896:
URL: https://github.com/apache/beam/pull/11896#issuecomment-638314463


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 440874)
Time Spent: 9h  (was: 8h 50m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: P3
> Fix For: 2.21.0
>
>  Time Spent: 9h
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=440863&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-440863
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 03/Jun/20 16:32
Start Date: 03/Jun/20 16:32
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11896:
URL: https://github.com/apache/beam/pull/11896#issuecomment-638310807


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 440863)
Time Spent: 8h 50m  (was: 8h 40m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: P3
> Fix For: 2.21.0
>
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-06-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=440856&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-440856
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 03/Jun/20 16:16
Start Date: 03/Jun/20 16:16
Worklog Time Spent: 10m 
  Work Description: chunyang commented on pull request #11896:
URL: https://github.com/apache/beam/pull/11896#issuecomment-638301892


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 440856)
Time Spent: 8h 40m  (was: 8.5h)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: P3
> Fix For: 2.21.0
>
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-06-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=440444&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-440444
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 02/Jun/20 21:06
Start Date: 02/Jun/20 21:06
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #11896:
URL: https://github.com/apache/beam/pull/11896#issuecomment-637806282


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 440444)
Time Spent: 8.5h  (was: 8h 20m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: P3
> Fix For: 2.21.0
>
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-06-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=440407&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-440407
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 02/Jun/20 19:36
Start Date: 02/Jun/20 19:36
Worklog Time Spent: 10m 
  Work Description: chunyang commented on pull request #11896:
URL: https://github.com/apache/beam/pull/11896#issuecomment-637762788


   R: @pabloem 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 440407)
Time Spent: 8h 20m  (was: 8h 10m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: P3
> Fix For: 2.21.0
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-06-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=440406&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-440406
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 02/Jun/20 19:36
Start Date: 02/Jun/20 19:36
Worklog Time Spent: 10m 
  Work Description: chunyang opened a new pull request #11896:
URL: https://github.com/apache/beam/pull/11896


   Make the bigquery_avro_tools_test added in #10979 Python 3 compatible. 
Somehow this didn't get tested against Python 3 the last time around.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Java11/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_Pos

[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398819&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398819
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 06/Mar/20 01:00
Start Date: 06/Mar/20 01:00
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-595522865
 
 
   exciting. thanks @chunyang 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 398819)
Time Spent: 8h  (was: 7h 50m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398818&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398818
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 06/Mar/20 01:00
Start Date: 06/Mar/20 01:00
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #10979: [BEAM-8841] 
Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 398818)
Time Spent: 7h 50m  (was: 7h 40m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398757&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398757
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 05/Mar/20 23:02
Start Date: 05/Mar/20 23:02
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-595489779
 
 
   yup it seems like flaky/unrelated
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 398757)
Time Spent: 7h 40m  (was: 7.5h)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398756&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398756
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 05/Mar/20 23:02
Start Date: 05/Mar/20 23:02
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-595489733
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 398756)
Time Spent: 7.5h  (was: 7h 20m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398750&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398750
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 05/Mar/20 23:00
Start Date: 05/Mar/20 23:00
Worklog Time Spent: 10m 
  Work Description: chunyang commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-595489197
 
 
   Flaky/unrelated tests? I can't seem to reproduce locally.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 398750)
Time Spent: 7h 20m  (was: 7h 10m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398747&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398747
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 05/Mar/20 22:55
Start Date: 05/Mar/20 22:55
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-595487536
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 398747)
Time Spent: 7h 10m  (was: 7h)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398712&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398712
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 05/Mar/20 21:35
Start Date: 05/Mar/20 21:35
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-595459219
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 398712)
Time Spent: 7h  (was: 6h 50m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398687&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398687
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 05/Mar/20 21:04
Start Date: 05/Mar/20 21:04
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-595446366
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 398687)
Time Spent: 6h 50m  (was: 6h 40m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398597&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398597
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 05/Mar/20 18:41
Start Date: 05/Mar/20 18:41
Worklog Time Spent: 10m 
  Work Description: chunyang commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-595383596
 
 
   Thanks for the review @pabloem ! I've rebased off of master and squashed my 
changes.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 398597)
Time Spent: 6h 40m  (was: 6.5h)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398542&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398542
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 05/Mar/20 17:32
Start Date: 05/Mar/20 17:32
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-595352786
 
 
   Can you squash your commits? 
   Github has a squashing option, but it would mark me as the author. If you 
squash them, I can merge and preserve your authorship.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 398542)
Time Spent: 6.5h  (was: 6h 20m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398541&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398541
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 05/Mar/20 17:31
Start Date: 05/Mar/20 17:31
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-595352310
 
 
   LGTM. Thanks so much @chunyang 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 398541)
Time Spent: 6h 20m  (was: 6h 10m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398140&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398140
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 05/Mar/20 05:11
Start Date: 05/Mar/20 05:11
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-595032438
 
 
   Run Python 3.6 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 398140)
Time Spent: 6h 10m  (was: 6h)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398055&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398055
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 05/Mar/20 01:24
Start Date: 05/Mar/20 01:24
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-594978741
 
 
   Run Python 3.6 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 398055)
Time Spent: 6h  (was: 5h 50m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398036&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398036
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 05/Mar/20 00:55
Start Date: 05/Mar/20 00:55
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-594970995
 
 
   Run Python 3.6 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 398036)
Time Spent: 5h 50m  (was: 5h 40m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=397142&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397142
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 04/Mar/20 00:06
Start Date: 04/Mar/20 00:06
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-594235682
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397142)
Time Spent: 5h 40m  (was: 5.5h)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=397056&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397056
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 03/Mar/20 21:46
Start Date: 03/Mar/20 21:46
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-594186585
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 397056)
Time Spent: 5.5h  (was: 5h 20m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=396870&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396870
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 03/Mar/20 17:25
Start Date: 03/Mar/20 17:25
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-594069700
 
 
   Run Python 3.7 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 396870)
Time Spent: 5h 20m  (was: 5h 10m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=396562&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396562
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 03/Mar/20 02:12
Start Date: 03/Mar/20 02:12
Worklog Time Spent: 10m 
  Work Description: chunyang commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593729608
 
 
   I am able to run the integration test 
`apache_beam.io.gcp.bigquery_file_loads_test:BigQueryFileLoadsIT` but for some 
reason if I use the same procedure to run tests from 
`apache_beam.io.gcp.bigquery_write_it_test.py:BigQueryWriteIntegrationTests`, I 
get the following error:
   
   ```
   [chuck.yang ~/src/beam/sdks/python cyang/avro-bigqueryio+]
   % ./scripts/run_integration_test.sh --test_opts 
"--tests=apache_beam.io.gcp.bigquery_write_it_test.py:BigQueryWriteIntegrationTests.test_big_query_write_without_schema
 --nocapture" --project ... --gcs_location gs://... --kms_key_name "" 
--streaming false
   >>> RUNNING integration tests with pipeline options: 
--runner=TestDataflowRunner --project=... --staging_location=gs://... 
--temp_location=gs://... --output=gs://... 
--sdk_location=build/apache-beam.tar.gz 
--requirements_file=postcommit_requirements.txt --num_workers=1 --sleep_secs=20
   >>>   test options: 
--tests=apache_beam.io.gcp.bigquery_write_it_test.py:BigQueryWriteIntegrationTests.test_big_query_write_without_schema
 --nocapture
   
/home/chuck.yang/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-gcp-pytest/py37-gcp-pytest/lib/python3.7/site-packages/setuptools/dist.py:476:
 UserWarning: Normalizing '2.21.0.dev' to '2.21.0.dev0'
 normalized_version,
   running nosetests
   running egg_info
   INFO:gen_protos:Skipping proto regeneration: all files up to date
   writing apache_beam.egg-info/PKG-INFO
   writing dependency_links to apache_beam.egg-info/dependency_links.txt
   writing entry points to apache_beam.egg-info/entry_points.txt
   writing requirements to apache_beam.egg-info/requires.txt
   writing top-level names to apache_beam.egg-info/top_level.txt
   reading manifest file 'apache_beam.egg-info/SOURCES.txt'
   reading manifest template 'MANIFEST.in'
   warning: no files found matching 'README.md'
   warning: no files found matching 'NOTICE'
   warning: no files found matching 'LICENSE'
   writing manifest file 'apache_beam.egg-info/SOURCES.txt'
   Failure: ImportError (No module named 'apache_beam') ... ERROR
   
   ==
   ERROR: Failure: ImportError (No module named 'apache_beam')
   --
   Traceback (most recent call last):
 File 
"/home/chuck.yang/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-gcp-pytest/py37-gcp-pytest/lib/python3.7/site-packages/nose/failure.py",
 line 39, in runTest
   raise self.exc_val.with_traceback(self.tb)
 File 
"/home/chuck.yang/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-gcp-pytest/py37-gcp-pytest/lib/python3.7/site-packages/nose/loader.py",
 line 418, in loadTestsFromName
   addr.filename, addr.module)
 File 
"/home/chuck.yang/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-gcp-pytest/py37-gcp-pytest/lib/python3.7/site-packages/nose/importer.py",
 line 47, in importFromPath
   return self.importFromDir(dir_path, fqname)
 File 
"/home/chuck.yang/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-gcp-pytest/py37-gcp-pytest/lib/python3.7/site-packages/nose/importer.py",
 line 79, in importFromDir
   fh, filename, desc = find_module(part, path)
 File "/usr/lib/python3.7/imp.py", line 296, in find_module
   raise ImportError(_ERR_MSG.format(name), name=name)
   ImportError: No module named 'apache_beam'
   
   --
   XML: nosetests-.xml
   --
   XML: /home/chuck.yang/src/beam/sdks/python/nosetests.xml
   --
   Ran 1 test in 0.002s
   
   FAILED (errors=1)
   
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 396562)
Time Spent: 5h 10m  (was: 5h)

> Add ability to perform Bi

[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=396551&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396551
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 03/Mar/20 01:42
Start Date: 03/Mar/20 01:42
Worklog Time Spent: 10m 
  Work Description: chunyang commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593722179
 
 
   I think I need to fix a few of the integration tests that don't provide a 
schema or use `SCHEMA_AUTODETECT`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 396551)
Time Spent: 5h  (was: 4h 50m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=396515&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396515
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 03/Mar/20 00:30
Start Date: 03/Mar/20 00:30
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593703663
 
 
   Run Python 3.7 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 396515)
Time Spent: 4h 50m  (was: 4h 40m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=396512&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396512
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 03/Mar/20 00:20
Start Date: 03/Mar/20 00:20
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593700905
 
 
   Run Python 3.7 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 396512)
Time Spent: 4h 40m  (was: 4.5h)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=396485&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396485
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 02/Mar/20 23:40
Start Date: 02/Mar/20 23:40
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593679995
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 396485)
Time Spent: 4.5h  (was: 4h 20m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=396417&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396417
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 02/Mar/20 21:56
Start Date: 02/Mar/20 21:56
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593643579
 
 
   restest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 396417)
Time Spent: 4h 20m  (was: 4h 10m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=396367&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396367
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 02/Mar/20 20:39
Start Date: 02/Mar/20 20:39
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593609918
 
 
   restest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 396367)
Time Spent: 4h  (was: 3h 50m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=396368&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396368
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 02/Mar/20 20:39
Start Date: 02/Mar/20 20:39
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593609967
 
 
   jenkins is the worst : )
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 396368)
Time Spent: 4h 10m  (was: 4h)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=396364&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396364
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 02/Mar/20 20:32
Start Date: 02/Mar/20 20:32
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593606703
 
 
   Retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 396364)
Time Spent: 3h 50m  (was: 3h 40m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=396363&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396363
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 02/Mar/20 20:28
Start Date: 02/Mar/20 20:28
Worklog Time Spent: 10m 
  Work Description: chunyang commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593605041
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 396363)
Time Spent: 3h 40m  (was: 3.5h)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-02-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=395202&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395202
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 28/Feb/20 18:36
Start Date: 28/Feb/20 18:36
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #10979: 
[BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#discussion_r385856505
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery.py
 ##
 @@ -1361,87 +1369,18 @@ def __init__(
 self.triggering_frequency = triggering_frequency
 self.insert_retry_strategy = insert_retry_strategy
 self._validate = validate
+self._temp_file_format = temp_file_format or bigquery_tools.FileFormat.JSON
 
 Review comment:
   +1 for making Avro the default for the new sink.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 395202)
Time Spent: 3.5h  (was: 3h 20m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-02-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=395200&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395200
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 28/Feb/20 18:32
Start Date: 28/Feb/20 18:32
Worklog Time Spent: 10m 
  Work Description: chunyang commented on pull request #10979: [BEAM-8841] 
Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#discussion_r385854644
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery.py
 ##
 @@ -1361,87 +1369,18 @@ def __init__(
 self.triggering_frequency = triggering_frequency
 self.insert_retry_strategy = insert_retry_strategy
 self._validate = validate
+self._temp_file_format = temp_file_format or bigquery_tools.FileFormat.JSON
 
 Review comment:
   Oh I didn't realize it was experimental. I'll make the change then!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 395200)
Time Spent: 3h 20m  (was: 3h 10m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-02-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=395189&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395189
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 28/Feb/20 18:06
Start Date: 28/Feb/20 18:06
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #10979: [BEAM-8841] 
Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#discussion_r385842690
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery.py
 ##
 @@ -1361,87 +1369,18 @@ def __init__(
 self.triggering_frequency = triggering_frequency
 self.insert_retry_strategy = insert_retry_strategy
 self._validate = validate
+self._temp_file_format = temp_file_format or bigquery_tools.FileFormat.JSON
 
 Review comment:
   Since this is technically still experimental, and masked behind a flag, I 
think it makes sense to make avro the main way of doing it (and simply add a 
check that the schema was passed). cc: @chamikaramj 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 395189)
Time Spent: 3h 10m  (was: 3h)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-02-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=394549&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-394549
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 28/Feb/20 00:08
Start Date: 28/Feb/20 00:08
Worklog Time Spent: 10m 
  Work Description: chunyang commented on pull request #10979: [BEAM-8841] 
Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#discussion_r385442046
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery.py
 ##
 @@ -1361,87 +1369,18 @@ def __init__(
 self.triggering_frequency = triggering_frequency
 self.insert_retry_strategy = insert_retry_strategy
 self._validate = validate
+self._temp_file_format = temp_file_format or bigquery_tools.FileFormat.JSON
 
 Review comment:
   AFAICT using Avro has no disadvantages compared to JSON for loading data 
into BigQuery, but would requiring a schema constitute a breaking API change 
for semantic versioning purposes?
   
   Personally I'm for using Avro as default. I guess when users update Beam, 
they'll specify a `temp_file_format` explicitly to get the old behavior.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 394549)
Time Spent: 3h  (was: 2h 50m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-02-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=394527&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-394527
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 27/Feb/20 23:24
Start Date: 27/Feb/20 23:24
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #10979: [BEAM-8841] 
Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#discussion_r384799264
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery_test.py
 ##
 @@ -1025,6 +1027,91 @@ def test_file_loads(self):
 WriteToBigQuery.Method.FILE_LOADS, triggering_frequency=20)
 
 
+class BigQueryFileLoadsIntegrationTests(unittest.TestCase):
+  BIG_QUERY_DATASET_ID = 'python_bq_file_loads_'
+
+  def setUp(self):
+self.test_pipeline = TestPipeline(is_integration_test=True)
+self.runner_name = type(self.test_pipeline.runner).__name__
+self.project = self.test_pipeline.get_option('project')
+
+self.dataset_id = '%s%s%s' % (
+self.BIG_QUERY_DATASET_ID,
+str(int(time.time())),
+random.randint(0, 1))
+self.bigquery_client = bigquery_tools.BigQueryWrapper()
+self.bigquery_client.get_or_create_dataset(self.project, self.dataset_id)
+self.output_table = '%s.output_table' % (self.dataset_id)
+self.table_ref = bigquery_tools.parse_table_reference(self.output_table)
+_LOGGER.info(
+'Created dataset %s in project %s', self.dataset_id, self.project)
 
 Review comment:
   Can you add code to delete the dataset after the test runs?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 394527)
Time Spent: 2h 50m  (was: 2h 40m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-02-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=394526&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-394526
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 27/Feb/20 23:24
Start Date: 27/Feb/20 23:24
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #10979: [BEAM-8841] 
Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#discussion_r384769744
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery.py
 ##
 @@ -1361,87 +1369,18 @@ def __init__(
 self.triggering_frequency = triggering_frequency
 self.insert_retry_strategy = insert_retry_strategy
 self._validate = validate
+self._temp_file_format = temp_file_format or bigquery_tools.FileFormat.JSON
 
 Review comment:
   I'm happy to make AVRO the default format if possible. I guess the issue is 
that users need to provide the schema, right? Otherwise we cannot write the 
avro files.
   
   We could make AVRO the default, and add a check that the schema was provided 
(i.e. is neither None nor autodetect) - and error out if that's the case? What 
do you think?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 394526)
Time Spent: 2h 40m  (was: 2.5h)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-02-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=394406&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-394406
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 27/Feb/20 18:55
Start Date: 27/Feb/20 18:55
Worklog Time Spent: 10m 
  Work Description: chunyang commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-592120245
 
 
   The PreCommit failures look like they're being fixed by #10982 .
   The failing Python 3.7 PostCommit integration tests might be flaky? I was 
not able to reproduce it--got the following instead:
   ```
   INFO:apache_beam.io.gcp.tests.bigquery_matcher:Result of query is: [(None, 
None, None, datetime.date(3000, 12, 31), None, None, None, None), (None, None, 
None, None, datetime.time(23, 59, 59), None, None, None), (0.33, None, None, 
None, None, None, None, None), (None, Decimal('10'), None, None, None, None, 
None, None), (None, None, None, None, None, datetime.datetime(2018, 12, 31, 12, 
44, 31), None, None), (None, None, None, None, None, None, 
datetime.datetime(2018, 12, 31, 12, 44, 31, 744957, tzinfo=), None), 
(None, None, None, None, None, None, None, 'POINT(30 10)'), (None, None, 
b'\xab\xac', None, None, None, None, None), (0.33, Decimal('10'), b'\xab\xac', 
datetime.date(3000, 12, 31), datetime.time(23, 59, 59), datetime.datetime(2018, 
12, 31, 12, 44, 31), datetime.datetime(2018, 12, 31, 12, 44, 31, 744957, 
tzinfo=), 'POINT(30 10)')]
   ```
   
![table](https://user-images.githubusercontent.com/454684/75476534-9ed0e480-594f-11ea-808b-c6dc096a9c00.png)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 394406)
Time Spent: 2.5h  (was: 2h 20m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-02-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=394321&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-394321
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 27/Feb/20 17:09
Start Date: 27/Feb/20 17:09
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-592071449
 
 
   Run Python 3.7 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 394321)
Time Spent: 2h 20m  (was: 2h 10m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393904&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393904
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 27/Feb/20 01:48
Start Date: 27/Feb/20 01:48
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-591735822
 
 
   Run Python 3.5 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 393904)
Time Spent: 2h 10m  (was: 2h)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393893&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393893
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 27/Feb/20 01:34
Start Date: 27/Feb/20 01:34
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-591731930
 
 
   Run Python 3.7 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 393893)
Time Spent: 2h  (was: 1h 50m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393890&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393890
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 27/Feb/20 01:32
Start Date: 27/Feb/20 01:32
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-591731506
 
 
   Retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 393890)
Time Spent: 1h 50m  (was: 1h 40m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393864&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393864
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 27/Feb/20 00:14
Start Date: 27/Feb/20 00:14
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-591710321
 
 
   Run Python 3.7 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 393864)
Time Spent: 1h 40m  (was: 1.5h)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393860&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393860
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 27/Feb/20 00:04
Start Date: 27/Feb/20 00:04
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-591707480
 
 
   retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 393860)
Time Spent: 1.5h  (was: 1h 20m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393743&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393743
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 26/Feb/20 20:54
Start Date: 26/Feb/20 20:54
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-591638243
 
 
   Retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 393743)
Time Spent: 1h 20m  (was: 1h 10m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393742&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393742
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 26/Feb/20 20:52
Start Date: 26/Feb/20 20:52
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-591637525
 
 
   Yes, I will be thrilled to review this : )
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 393742)
Time Spent: 1h 10m  (was: 1h)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Assignee: Chun Yang
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393647&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393647
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 26/Feb/20 17:44
Start Date: 26/Feb/20 17:44
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #10979: [BEAM-8841] 
Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-591553091
 
 
   Thanks for the contribution. Pablo will you be able to review ?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 393647)
Time Spent: 1h  (was: 50m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393644&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393644
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 26/Feb/20 17:38
Start Date: 26/Feb/20 17:38
Worklog Time Spent: 10m 
  Work Description: chunyang commented on pull request #10979: [BEAM-8841] 
Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#discussion_r384651096
 
 

 ##
 File path: sdks/python/scripts/run_integration_test.sh
 ##
 @@ -198,6 +198,7 @@ if [[ -z $PIPELINE_OPTS ]]; then
   # See: https://github.com/hamcrest/PyHamcrest/issues/131.
   echo "pyhamcrest!=1.10.0,<2.0.0" > postcommit_requirements.txt
   echo "mock<3.0.0" >> postcommit_requirements.txt
+  echo "parameterized>=0.7.1,<0.8.0" >> postcommit_requirements.txt
 
 Review comment:
   This was added because the `bigquery_test` module imports `_ELEMENTS` from 
the `bigquery_file_loads_test` module, which uses `parameterized`. Should we 
refactor `_ELEMENTS` into its own module?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 393644)
Time Spent: 0.5h  (was: 20m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393643&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393643
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 26/Feb/20 17:38
Start Date: 26/Feb/20 17:38
Worklog Time Spent: 10m 
  Work Description: chunyang commented on pull request #10979: [BEAM-8841] 
Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#discussion_r384652808
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery_avro_tools.py
 ##
 @@ -0,0 +1,135 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Tools used tool work with Avro files in the context of BigQuery.
+
+Classes, constants and functions in this file are experimental and have no
+backwards compatibility guarantees.
+
+NOTHING IN THIS FILE HAS BACKWARDS COMPATIBILITY GUARANTEES.
+"""
+
+from __future__ import absolute_import
+from __future__ import division
+
+BIG_QUERY_TO_AVRO_TYPES = {
+  "RECORD": "record",
+  "STRING": "string",
+  "BOOLEAN": "boolean",
+  "BYTES": "bytes",
+  "FLOAT": "double",
+  "INTEGER": "long",
+  "TIME": {
+"type": "long",
+"logicalType": "time-micros",
 
 Review comment:
   The same code in the [Java 
SDK](https://github.com/apache/beam/blob/911472cdcca7a31d1a8c690d75097b9ded9eb054/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryAvroUtils.java#L68)
 doesn't seem to support logical types, so behavior is a bit different here.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 393643)
Time Spent: 0.5h  (was: 20m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393645&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393645
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 26/Feb/20 17:38
Start Date: 26/Feb/20 17:38
Worklog Time Spent: 10m 
  Work Description: chunyang commented on pull request #10979: [BEAM-8841] 
Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#discussion_r384653061
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery.py
 ##
 @@ -1361,87 +1369,18 @@ def __init__(
 self.triggering_frequency = triggering_frequency
 self.insert_retry_strategy = insert_retry_strategy
 self._validate = validate
+self._temp_file_format = temp_file_format or bigquery_tools.FileFormat.JSON
 
 self.additional_bq_parameters = additional_bq_parameters or {}
 self.table_side_inputs = table_side_inputs or ()
 self.schema_side_inputs = schema_side_inputs or ()
 
-  @staticmethod
-  def get_table_schema_from_string(schema):
-"""Transform the string table schema into a
-:class:`~apache_beam.io.gcp.internal.clients.bigquery.\
-bigquery_v2_messages.TableSchema` instance.
-
-Args:
-  schema (str): The sting schema to be used if the BigQuery table to write
-has to be created.
-
-Returns:
-  ~apache_beam.io.gcp.internal.clients.bigquery.\
-bigquery_v2_messages.TableSchema:
-  The schema to be used if the BigQuery table to write has to be created
-  but in the :class:`~apache_beam.io.gcp.internal.clients.bigquery.\
-bigquery_v2_messages.TableSchema` format.
-"""
-table_schema = bigquery.TableSchema()
-schema_list = [s.strip() for s in schema.split(',')]
-for field_and_type in schema_list:
-  field_name, field_type = field_and_type.split(':')
-  field_schema = bigquery.TableFieldSchema()
-  field_schema.name = field_name
-  field_schema.type = field_type
-  field_schema.mode = 'NULLABLE'
-  table_schema.fields.append(field_schema)
-return table_schema
-
-  @staticmethod
-  def table_schema_to_dict(table_schema):
-"""Create a dictionary representation of table schema for serialization
-"""
-def get_table_field(field):
-  """Create a dictionary representation of a table field
-  """
-  result = {}
-  result['name'] = field.name
-  result['type'] = field.type
-  result['mode'] = getattr(field, 'mode', 'NULLABLE')
-  if hasattr(field, 'description') and field.description is not None:
-result['description'] = field.description
-  if hasattr(field, 'fields') and field.fields:
-result['fields'] = [get_table_field(f) for f in field.fields]
-  return result
-
-if not isinstance(table_schema, bigquery.TableSchema):
-  raise ValueError("Table schema must be of the type bigquery.TableSchema")
-schema = {'fields': []}
-for field in table_schema.fields:
-  schema['fields'].append(get_table_field(field))
-return schema
-
-  @staticmethod
-  def get_dict_table_schema(schema):
-"""Transform the table schema into a dictionary instance.
-
-Args:
-  schema (~apache_beam.io.gcp.internal.clients.bigquery.\
-bigquery_v2_messages.TableSchema):
-The schema to be used if the BigQuery table to write has to be created.
-This can either be a dict or string or in the TableSchema format.
-
-Returns:
-  Dict[str, Any]: The schema to be used if the BigQuery table to write has
-  to be created but in the dictionary format.
-"""
-if (isinstance(schema, (dict, vp.ValueProvider)) or callable(schema) or
-schema is None):
-  return schema
-elif isinstance(schema, (str, unicode)):
-  table_schema = WriteToBigQuery.get_table_schema_from_string(schema)
-  return WriteToBigQuery.table_schema_to_dict(table_schema)
-elif isinstance(schema, bigquery.TableSchema):
-  return WriteToBigQuery.table_schema_to_dict(schema)
-else:
-  raise TypeError('Unexpected schema argument: %s.' % schema)
+  # Dict/schema methods were moved to bigquery_tools, but keep references
+  # here for backward compatibility.
+  get_table_schema_from_string = \
+  staticmethod(bigquery_tools.get_table_schema_from_string)
+  table_schema_to_dict = staticmethod(bigquery_tools.table_schema_to_dict)
+  get_dict_table_schema = staticmethod(bigquery_tools.get_dict_table_schema)
 
 Review comment:
   Moved these to avoid a cyclic import.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
--

[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393642&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393642
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 26/Feb/20 17:38
Start Date: 26/Feb/20 17:38
Worklog Time Spent: 10m 
  Work Description: chunyang commented on pull request #10979: [BEAM-8841] 
Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#discussion_r384650349
 
 

 ##
 File path: sdks/python/apache_beam/io/localfilesystem.py
 ##
 @@ -139,7 +140,7 @@ def _path_open(
 """Helper functions to open a file in the provided mode.
 """
 compression_type = FileSystem._get_compression_type(path, compression_type)
-raw_file = open(path, mode)
+raw_file = io.open(path, mode)
 
 Review comment:
   `open` doesn't provide `io.IOBase` methods like `writable()` in Python 2.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 393642)
Time Spent: 20m  (was: 10m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393646&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393646
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 26/Feb/20 17:38
Start Date: 26/Feb/20 17:38
Worklog Time Spent: 10m 
  Work Description: chunyang commented on issue #10979: [BEAM-8841] Support 
writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-591550523
 
 
   R: @chamikaramj 
   R: @pabloem 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 393646)
Time Spent: 50m  (was: 40m)

> Add ability to perform BigQuery file loads using avro
> -
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
>  Issue Type: Improvement
>  Components: io-py-gcp
>Reporter: Chun Yang
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python 
> SDK. JSON has some disadvantages including size of serialized data and 
> inability to represent NaN and infinity float values.
> BigQuery supports loading files in avro format, which can overcome these 
> disadvantages. The Java SDK already supports loading files using avro format 
> (BEAM-2879) so it makes sense to support it in the Python SDK as well.
> The change will be somewhere around 
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro

2020-02-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393639&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393639
 ]

ASF GitHub Bot logged work on BEAM-8841:


Author: ASF GitHub Bot
Created on: 26/Feb/20 17:31
Start Date: 26/Feb/20 17:31
Worklog Time Spent: 10m 
  Work Description: chunyang commented on pull request #10979: [BEAM-8841] 
Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979
 
 
   This PR modifies `WriteToBigQuery` to be able to do file loads via Avro 
format. Using Avro allows bypassing some limitations of newline-delimited JSON 
loads, including the ability to support NaN and Inf float values.
   
   Changes include:
   * Add a `temp_file_format` option to `WriteToBigQuery` and 
`BigQueryBatchFileLoads` to select which file format to use for loading data.
   * Move implementation of `get_table_schema_from_string`, 
`table_schema_to_dict`, and `get_dict_table_schema` to `bigquery_tools` module.
   * Add `bigquery_avro_tools` module with utilities for converting BigQuery 
`TableSchema` to Avro `RecordSchema` (this is a port of what's available in the 
Java SDK, with modified behavior for logical types).
   * Modify `WriteRecordsToFile` and `WriteGroupedRecordsToFile` to accept 
schema and file format, since in order to be read by BigQuery, the Avro files 
must have schema headers.
   * Parameterize relevant tests to check both JSON and Avro code paths.
   * Add integration test using Avro-based file loads.
   * Introduce `JsonRowWriter` and `AvroRowWriter` classes which implement the 
`io.IOBase` interface and are used by `WriteRecordsToFile` and 
`WriteGroupedRecordsToFile`.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [x] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/ico