[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=440981&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-440981 ]

ASF GitHub Bot logged work on BEAM-8841:
Author: ASF GitHub Bot
Created on: 03/Jun/20 20:57
Worklog Time Spent: 10m
Work Description: pabloem merged pull request #11896: https://github.com/apache/beam/pull/11896

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 440981)
Time Spent: 9h 40m (was: 9.5h)

> Add ability to perform BigQuery file loads using avro
> -----------------------------------------------------
>
> Key: BEAM-8841
> URL: https://issues.apache.org/jira/browse/BEAM-8841
> Project: Beam
> Issue Type: Improvement
> Components: io-py-gcp
> Reporter: Chun Yang
> Assignee: Chun Yang
> Priority: P3
> Fix For: 2.21.0
>
> Time Spent: 9h 40m
> Remaining Estimate: 0h
>
> Currently, JSON format is used for file loads into BigQuery in the Python
> SDK. JSON has some disadvantages, including the size of serialized data and
> the inability to represent NaN and infinity float values.
>
> BigQuery supports loading files in Avro format, which can overcome these
> disadvantages. The Java SDK already supports loading files in Avro format
> (BEAM-2879), so it makes sense to support it in the Python SDK as well.
>
> The change will be somewhere around
> [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554].

This message was sent by Atlassian Jira (v8.3.4#803005)
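The NaN/Infinity limitation cited in the issue description can be demonstrated with Python's standard-library `json` module alone (no Beam dependency): by default it emits the non-standard tokens `NaN`/`Infinity`, which strict JSON parsers reject, and asking for strictly compliant output raises an error instead.

```python
import json

# By default, json.dumps emits the token NaN, which is NOT valid JSON --
# strict parsers (such as BigQuery's JSON load path) cannot consume it.
lenient = json.dumps({"value": float("nan")})
print(lenient)  # {"value": NaN}

# Requesting strictly RFC-compliant output fails outright for NaN/Infinity:
try:
    json.dumps({"value": float("inf")}, allow_nan=False)
    strict_ok = True
except ValueError as err:
    strict_ok = False
    print("rejected:", err)
```

This is the gap that loading via Avro temp files closes, since Avro's binary encoding of IEEE-754 doubles carries NaN and infinity values directly.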
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=440943&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-440943 ]
Created on: 03/Jun/20 19:35 | Worklog Time Spent: 10m
Work Description: pabloem commented on pull request #11896 (https://github.com/apache/beam/pull/11896#issuecomment-638416678): "Run PythonLint PreCommit"
Worklog Id: (was: 440943) | Time Spent: 9.5h (was: 9h 20m)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=440931&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-440931 ]
Created on: 03/Jun/20 19:11 | Worklog Time Spent: 10m
Work Description: pabloem commented on pull request #11896 (https://github.com/apache/beam/pull/11896#issuecomment-638403708): "thanks @chunyang ! this LGTM. I'll merge after lint passes"
Worklog Id: (was: 440931) | Time Spent: 9h 20m (was: 9h 10m)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=440930&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-440930 ]
Created on: 03/Jun/20 19:10 | Worklog Time Spent: 10m
Work Description: pabloem commented on pull request #11896 (https://github.com/apache/beam/pull/11896#issuecomment-638403555): "retest this please"
Worklog Id: (was: 440930) | Time Spent: 9h 10m (was: 9h)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=440874&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-440874 ]
Created on: 03/Jun/20 16:38 | Worklog Time Spent: 10m
Work Description: pabloem commented on pull request #11896 (https://github.com/apache/beam/pull/11896#issuecomment-638314463): "retest this please"
Worklog Id: (was: 440874) | Time Spent: 9h (was: 8h 50m)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=440863&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-440863 ]
Created on: 03/Jun/20 16:32 | Worklog Time Spent: 10m
Work Description: pabloem commented on pull request #11896 (https://github.com/apache/beam/pull/11896#issuecomment-638310807): "retest this please"
Worklog Id: (was: 440863) | Time Spent: 8h 50m (was: 8h 40m)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=440856&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-440856 ]
Created on: 03/Jun/20 16:16 | Worklog Time Spent: 10m
Work Description: chunyang commented on pull request #11896 (https://github.com/apache/beam/pull/11896#issuecomment-638301892): "retest this please"
Worklog Id: (was: 440856) | Time Spent: 8h 40m (was: 8.5h)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=440444&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-440444 ]
Created on: 02/Jun/20 21:06 | Worklog Time Spent: 10m
Work Description: pabloem commented on pull request #11896 (https://github.com/apache/beam/pull/11896#issuecomment-637806282): "retest this please"
Worklog Id: (was: 440444) | Time Spent: 8.5h (was: 8h 20m)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=440407&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-440407 ]
Created on: 02/Jun/20 19:36 | Worklog Time Spent: 10m
Work Description: chunyang commented on pull request #11896 (https://github.com/apache/beam/pull/11896#issuecomment-637762788): "R: @pabloem"
Worklog Id: (was: 440407) | Time Spent: 8h 20m (was: 8h 10m)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=440406&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-440406 ]
Created on: 02/Jun/20 19:36 | Worklog Time Spent: 10m
Work Description: chunyang opened a new pull request #11896: https://github.com/apache/beam/pull/11896
"Make the bigquery_avro_tools_test added in #10979 Python 3 compatible. Somehow this didn't get tested against Python 3 the last time around."
[Post-Commit Tests Status table (Jenkins build badges for the master branch, truncated) omitted]
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398819&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398819 ]
Created on: 06/Mar/20 01:00 | Worklog Time Spent: 10m
Work Description: pabloem commented on issue #10979 "[BEAM-8841] Support writing data to BigQuery via Avro in Python SDK" (https://github.com/apache/beam/pull/10979#issuecomment-595522865): "exciting. thanks @chunyang"
Worklog Id: (was: 398819) | Time Spent: 8h (was: 7h 50m)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398818&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398818 ]
Created on: 06/Mar/20 01:00 | Worklog Time Spent: 10m
Work Description: pabloem commented on pull request #10979: https://github.com/apache/beam/pull/10979
Worklog Id: (was: 398818) | Time Spent: 7h 50m (was: 7h 40m)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398757&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398757 ]
Created on: 05/Mar/20 23:02 | Worklog Time Spent: 10m
Work Description: pabloem commented on issue #10979 (https://github.com/apache/beam/pull/10979#issuecomment-595489779): "yup it seems like flaky/unrelated"
Worklog Id: (was: 398757) | Time Spent: 7h 40m (was: 7.5h)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398756&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398756 ]
Created on: 05/Mar/20 23:02 | Worklog Time Spent: 10m
Work Description: pabloem commented on issue #10979 (https://github.com/apache/beam/pull/10979#issuecomment-595489733): "Run Python PreCommit"
Worklog Id: (was: 398756) | Time Spent: 7.5h (was: 7h 20m)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398750&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398750 ]
Created on: 05/Mar/20 23:00 | Worklog Time Spent: 10m
Work Description: chunyang commented on issue #10979 (https://github.com/apache/beam/pull/10979#issuecomment-595489197): "Flaky/unrelated tests? I can't seem to reproduce locally."
Worklog Id: (was: 398750) | Time Spent: 7h 20m (was: 7h 10m)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398747&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398747 ]
Created on: 05/Mar/20 22:55 | Worklog Time Spent: 10m
Work Description: pabloem commented on issue #10979 (https://github.com/apache/beam/pull/10979#issuecomment-595487536): "Run Python PreCommit"
Worklog Id: (was: 398747) | Time Spent: 7h 10m (was: 7h)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398712&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398712 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 05/Mar/20 21:35 Start Date: 05/Mar/20 21:35 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-595459219 retest this please Issue Time Tracking --- Worklog Id: (was: 398712) Time Spent: 7h (was: 6h 50m)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398687&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398687 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 05/Mar/20 21:04 Start Date: 05/Mar/20 21:04 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-595446366 retest this please Issue Time Tracking --- Worklog Id: (was: 398687) Time Spent: 6h 50m (was: 6h 40m)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398597&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398597 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 05/Mar/20 18:41 Start Date: 05/Mar/20 18:41 Worklog Time Spent: 10m Work Description: chunyang commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-595383596 Thanks for the review @pabloem ! I've rebased off of master and squashed my changes. Issue Time Tracking --- Worklog Id: (was: 398597) Time Spent: 6h 40m (was: 6.5h)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398542&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398542 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 05/Mar/20 17:32 Start Date: 05/Mar/20 17:32 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-595352786 Can you squash your commits? Github has a squashing option, but it would mark me as the author. If you squash them, I can merge and preserve your authorship. Issue Time Tracking --- Worklog Id: (was: 398542) Time Spent: 6.5h (was: 6h 20m)
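The squash workflow discussed above (collapsing several work-in-progress commits into one so the merge preserves the contributor's authorship) can be sketched as follows. This is not the exact sequence chunyang ran; it is a runnable demonstration in a throwaway repository, with illustrative branch and file names:

```shell
# Demonstrate squashing a feature branch into a single commit in a
# throwaway repo. Branch names and commit messages are illustrative.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name "Dev"
git commit -q --allow-empty -m "base"
git branch -M master                 # normalize the default branch name
git checkout -q -b feature
echo a > a.txt; git add a.txt; git commit -q -m "wip 1"
echo b > b.txt; git add b.txt; git commit -q -m "wip 2"
# Squash: keep the working tree and index, drop the two wip commits,
# then record everything as one commit authored by the contributor.
git reset -q --soft master
git commit -q -m "[BEAM-8841] Support writing data to BigQuery via Avro"
git log --oneline
```

On a real PR branch this would be followed by `git push --force-with-lease` to update the remote branch, since the squash rewrites history.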
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398541&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398541 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 05/Mar/20 17:31 Start Date: 05/Mar/20 17:31 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-595352310 LGTM. Thanks so much @chunyang Issue Time Tracking --- Worklog Id: (was: 398541) Time Spent: 6h 20m (was: 6h 10m)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398140&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398140 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 05/Mar/20 05:11 Start Date: 05/Mar/20 05:11 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-595032438 Run Python 3.6 PostCommit Issue Time Tracking --- Worklog Id: (was: 398140) Time Spent: 6h 10m (was: 6h)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398055&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398055 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 05/Mar/20 01:24 Start Date: 05/Mar/20 01:24 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-594978741 Run Python 3.6 PostCommit Issue Time Tracking --- Worklog Id: (was: 398055) Time Spent: 6h (was: 5h 50m)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398036&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398036 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 05/Mar/20 00:55 Start Date: 05/Mar/20 00:55 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-594970995 Run Python 3.6 PostCommit Issue Time Tracking --- Worklog Id: (was: 398036) Time Spent: 5h 50m (was: 5h 40m)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=397142&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397142 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 04/Mar/20 00:06 Start Date: 04/Mar/20 00:06 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-594235682 retest this please Issue Time Tracking --- Worklog Id: (was: 397142) Time Spent: 5h 40m (was: 5.5h)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=397056&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397056 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 03/Mar/20 21:46 Start Date: 03/Mar/20 21:46 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-594186585 retest this please Issue Time Tracking --- Worklog Id: (was: 397056) Time Spent: 5.5h (was: 5h 20m)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=396870&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396870 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 03/Mar/20 17:25 Start Date: 03/Mar/20 17:25 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-594069700 Run Python 3.7 PostCommit Issue Time Tracking --- Worklog Id: (was: 396870) Time Spent: 5h 20m (was: 5h 10m)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=396562&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396562 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 03/Mar/20 02:12 Start Date: 03/Mar/20 02:12 Worklog Time Spent: 10m Work Description: chunyang commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-593729608 I am able to run the integration test `apache_beam.io.gcp.bigquery_file_loads_test:BigQueryFileLoadsIT`, but for some reason, if I use the same procedure to run tests from `apache_beam.io.gcp.bigquery_write_it_test.py:BigQueryWriteIntegrationTests`, I get the following error:

```
[chuck.yang ~/src/beam/sdks/python cyang/avro-bigqueryio+] % ./scripts/run_integration_test.sh --test_opts "--tests=apache_beam.io.gcp.bigquery_write_it_test.py:BigQueryWriteIntegrationTests.test_big_query_write_without_schema --nocapture" --project ... --gcs_location gs://... --kms_key_name "" --streaming false
>>> RUNNING integration tests with pipeline options: --runner=TestDataflowRunner --project=... --staging_location=gs://... --temp_location=gs://... --output=gs://... --sdk_location=build/apache-beam.tar.gz --requirements_file=postcommit_requirements.txt --num_workers=1 --sleep_secs=20
>>> test options: --tests=apache_beam.io.gcp.bigquery_write_it_test.py:BigQueryWriteIntegrationTests.test_big_query_write_without_schema --nocapture
/home/chuck.yang/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-gcp-pytest/py37-gcp-pytest/lib/python3.7/site-packages/setuptools/dist.py:476: UserWarning: Normalizing '2.21.0.dev' to '2.21.0.dev0'
  normalized_version,
running nosetests
running egg_info
INFO:gen_protos:Skipping proto regeneration: all files up to date
writing apache_beam.egg-info/PKG-INFO
writing dependency_links to apache_beam.egg-info/dependency_links.txt
writing entry points to apache_beam.egg-info/entry_points.txt
writing requirements to apache_beam.egg-info/requires.txt
writing top-level names to apache_beam.egg-info/top_level.txt
reading manifest file 'apache_beam.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching 'README.md'
warning: no files found matching 'NOTICE'
warning: no files found matching 'LICENSE'
writing manifest file 'apache_beam.egg-info/SOURCES.txt'
Failure: ImportError (No module named 'apache_beam') ... ERROR
======================================================================
ERROR: Failure: ImportError (No module named 'apache_beam')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/chuck.yang/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-gcp-pytest/py37-gcp-pytest/lib/python3.7/site-packages/nose/failure.py", line 39, in runTest
    raise self.exc_val.with_traceback(self.tb)
  File "/home/chuck.yang/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-gcp-pytest/py37-gcp-pytest/lib/python3.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/home/chuck.yang/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-gcp-pytest/py37-gcp-pytest/lib/python3.7/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/home/chuck.yang/src/beam/sdks/python/test-suites/tox/py37/build/srcs/sdks/python/target/.tox-py37-gcp-pytest/py37-gcp-pytest/lib/python3.7/site-packages/nose/importer.py", line 79, in importFromDir
    fh, filename, desc = find_module(part, path)
  File "/usr/lib/python3.7/imp.py", line 296, in find_module
    raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named 'apache_beam'
----------------------------------------------------------------------
XML: nosetests-.xml
XML: /home/chuck.yang/src/beam/sdks/python/nosetests.xml
----------------------------------------------------------------------
Ran 1 test in 0.002s

FAILED (errors=1)
```

Issue Time Tracking --- Worklog Id: (was: 396562) Time Spent: 5h 10m (was: 5h)
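The failure above boils down to the test runner's virtualenv not having `apache_beam` installed. A quick sanity check before invoking the runner is to verify the module resolves in the active environment; `check_module` below is a hypothetical helper for illustration, not part of the Beam tooling:

```shell
# check_module: hypothetical helper that reports whether a Python module
# can be resolved by the interpreter on PATH. Uses importlib.util.find_spec,
# which returns None (rather than raising) when the module is absent.
check_module() {
  if python3 -c "import importlib.util, sys; sys.exit(0 if importlib.util.find_spec('$1') else 1)"; then
    echo "$1: importable"
  else
    echo "$1: MISSING"
  fi
}

check_module json         # stdlib module, importable in any Python 3 environment
check_module apache_beam  # MISSING unless the SDK is installed in the active venv
```

Running this with the tox virtualenv activated would distinguish "the SDK tarball was never installed into this environment" from a genuine test failure.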
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=396551&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396551 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 03/Mar/20 01:42 Start Date: 03/Mar/20 01:42 Worklog Time Spent: 10m Work Description: chunyang commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-593722179 I think I need to fix a few of the integration tests that don't provide a schema or use `SCHEMA_AUTODETECT`. Issue Time Tracking --- Worklog Id: (was: 396551) Time Spent: 5h (was: 4h 50m)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=396515&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396515 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 03/Mar/20 00:30 Start Date: 03/Mar/20 00:30 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-593703663 Run Python 3.7 PostCommit Issue Time Tracking --- Worklog Id: (was: 396515) Time Spent: 4h 50m (was: 4h 40m)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=396512&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396512 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 03/Mar/20 00:20 Start Date: 03/Mar/20 00:20 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-593700905 Run Python 3.7 PostCommit Issue Time Tracking --- Worklog Id: (was: 396512) Time Spent: 4h 40m (was: 4.5h)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=396485&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396485 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 02/Mar/20 23:40 Start Date: 02/Mar/20 23:40 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-593679995 retest this please Issue Time Tracking --- Worklog Id: (was: 396485) Time Spent: 4.5h (was: 4h 20m)
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=396417&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396417 ]

ASF GitHub Bot logged work on BEAM-8841:
Created on: 02/Mar/20 21:56
Worklog Time Spent: 10m

Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593643579

restest this please

Worklog Id: (was: 396417)
Time Spent: 4h 20m (was: 4h 10m)
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=396367&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396367 ]

ASF GitHub Bot logged work on BEAM-8841:
Created on: 02/Mar/20 20:39
Worklog Time Spent: 10m

Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593609918

restest this please

Worklog Id: (was: 396367)
Time Spent: 4h (was: 3h 50m)
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=396368&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396368 ]

ASF GitHub Bot logged work on BEAM-8841:
Created on: 02/Mar/20 20:39
Worklog Time Spent: 10m

Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593609967

jenkins is the worst : )

Worklog Id: (was: 396368)
Time Spent: 4h 10m (was: 4h)
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=396364&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396364 ]

ASF GitHub Bot logged work on BEAM-8841:
Created on: 02/Mar/20 20:32
Worklog Time Spent: 10m

Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593606703

Retest this please

Worklog Id: (was: 396364)
Time Spent: 3h 50m (was: 3h 40m)
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=396363&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-396363 ]

ASF GitHub Bot logged work on BEAM-8841:
Created on: 02/Mar/20 20:28
Worklog Time Spent: 10m

Work Description: chunyang commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-593605041

retest this please

Worklog Id: (was: 396363)
Time Spent: 3h 40m (was: 3.5h)
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=395202&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395202 ]

ASF GitHub Bot logged work on BEAM-8841:
Created on: 28/Feb/20 18:36
Worklog Time Spent: 10m

Work Description: chamikaramj commented on pull request #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#discussion_r385856505

File path: sdks/python/apache_beam/io/gcp/bigquery.py

@@ -1361,87 +1369,18 @@ def __init__(
     self.triggering_frequency = triggering_frequency
     self.insert_retry_strategy = insert_retry_strategy
     self._validate = validate
+    self._temp_file_format = temp_file_format or bigquery_tools.FileFormat.JSON

Review comment: +1 for making Avro the default for the new sink.

Worklog Id: (was: 395202)
Time Spent: 3.5h (was: 3h 20m)
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=395200&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395200 ]

ASF GitHub Bot logged work on BEAM-8841:
Created on: 28/Feb/20 18:32
Worklog Time Spent: 10m

Work Description: chunyang commented on pull request #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#discussion_r385854644

File path: sdks/python/apache_beam/io/gcp/bigquery.py

@@ -1361,87 +1369,18 @@ def __init__(
     self.triggering_frequency = triggering_frequency
     self.insert_retry_strategy = insert_retry_strategy
     self._validate = validate
+    self._temp_file_format = temp_file_format or bigquery_tools.FileFormat.JSON

Review comment: Oh I didn't realize it was experimental. I'll make the change then!

Worklog Id: (was: 395200)
Time Spent: 3h 20m (was: 3h 10m)
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=395189&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-395189 ]

ASF GitHub Bot logged work on BEAM-8841:
Created on: 28/Feb/20 18:06
Worklog Time Spent: 10m

Work Description: pabloem commented on pull request #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#discussion_r385842690

File path: sdks/python/apache_beam/io/gcp/bigquery.py

@@ -1361,87 +1369,18 @@ def __init__(
     self.triggering_frequency = triggering_frequency
     self.insert_retry_strategy = insert_retry_strategy
     self._validate = validate
+    self._temp_file_format = temp_file_format or bigquery_tools.FileFormat.JSON

Review comment: Since this is technically still experimental, and masked behind a flag, I think it makes sense to make avro the main way of doing it (and simply add a check that the schema was passed). cc: @chamikaramj

Worklog Id: (was: 395189)
Time Spent: 3h 10m (was: 3h)
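The check discussed in this review thread could look roughly like the sketch below. It is illustrative only: the constants mirror the string values of `apache_beam.io.gcp.bigquery_tools.FileFormat` and the schema-autodetect sentinel but are defined locally so the sketch is self-contained, and `validate_temp_file_format` is a hypothetical helper, not Beam API.

```python
# Sketch of the proposed guard: Avro temp files require a concrete
# schema, because the Avro writer needs one up front and BigQuery
# schema autodetection only applies to JSON source files.
JSON = 'NEWLINE_DELIMITED_JSON'   # assumed FileFormat.JSON value
AVRO = 'AVRO'                     # assumed FileFormat.AVRO value
SCHEMA_AUTODETECT = 'SCHEMA_AUTODETECT'


def validate_temp_file_format(temp_file_format, schema):
    """Raise ValueError if Avro temp files are requested without a schema."""
    if temp_file_format == AVRO and schema in (None, SCHEMA_AUTODETECT):
        raise ValueError(
            'A schema is required when writing Avro temp files; '
            'use JSON temp files if you need schema autodetection.')
```

With a guard like this in place, Avro could be the default format while users who rely on autodetection get a clear error instead of a failed load job.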
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=394549&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-394549 ]

ASF GitHub Bot logged work on BEAM-8841:
Created on: 28/Feb/20 00:08
Worklog Time Spent: 10m

Work Description: chunyang commented on pull request #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#discussion_r385442046

File path: sdks/python/apache_beam/io/gcp/bigquery.py

@@ -1361,87 +1369,18 @@ def __init__(
     self.triggering_frequency = triggering_frequency
     self.insert_retry_strategy = insert_retry_strategy
     self._validate = validate
+    self._temp_file_format = temp_file_format or bigquery_tools.FileFormat.JSON

Review comment: AFAICT using Avro has no disadvantages compared to JSON for loading data into BigQuery, but would requiring a schema constitute a breaking API change for semantic versioning purposes? Personally I'm for using Avro as default. I guess when users update Beam, they'll specify a `temp_file_format` explicitly to get the old behavior.

Worklog Id: (was: 394549)
Time Spent: 3h (was: 2h 50m)
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=394527&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-394527 ]

ASF GitHub Bot logged work on BEAM-8841:
Created on: 27/Feb/20 23:24
Worklog Time Spent: 10m

Work Description: pabloem commented on pull request #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#discussion_r384799264

File path: sdks/python/apache_beam/io/gcp/bigquery_test.py

@@ -1025,6 +1027,91 @@ def test_file_loads(self):
         WriteToBigQuery.Method.FILE_LOADS,
         triggering_frequency=20)
+class BigQueryFileLoadsIntegrationTests(unittest.TestCase):
+  BIG_QUERY_DATASET_ID = 'python_bq_file_loads_'
+
+  def setUp(self):
+    self.test_pipeline = TestPipeline(is_integration_test=True)
+    self.runner_name = type(self.test_pipeline.runner).__name__
+    self.project = self.test_pipeline.get_option('project')
+
+    self.dataset_id = '%s%s%s' % (
+        self.BIG_QUERY_DATASET_ID,
+        str(int(time.time())),
+        random.randint(0, 1))
+    self.bigquery_client = bigquery_tools.BigQueryWrapper()
+    self.bigquery_client.get_or_create_dataset(self.project, self.dataset_id)
+    self.output_table = '%s.output_table' % (self.dataset_id)
+    self.table_ref = bigquery_tools.parse_table_reference(self.output_table)
+    _LOGGER.info(
+        'Created dataset %s in project %s', self.dataset_id, self.project)

Review comment: Can you add code to delete the dataset after the test runs?

Worklog Id: (was: 394527)
Time Spent: 2h 50m (was: 2h 40m)
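One way to implement the requested cleanup is to pair the dataset created in setUp with a tearDown that always deletes it, even when an assertion fired mid-test. The sketch below is hypothetical: the client is injected so the pattern runs without GCP, and the `DatasetFixture` class and the stub's `delete_dataset` method name are assumptions of this sketch (the real test would use `bigquery_tools.BigQueryWrapper` and its dataset-deletion helper).

```python
# Sketch of a create/delete pairing for the integration-test dataset.
import random
import time


class DatasetFixture:
    PREFIX = 'python_bq_file_loads_'  # mirrors BIG_QUERY_DATASET_ID above

    def __init__(self, client, project):
        self.client = client
        self.project = project
        # Timestamp plus a random suffix keeps concurrent runs apart,
        # matching the naming scheme in the review diff.
        self.dataset_id = '%s%s%s' % (
            self.PREFIX, int(time.time()), random.randint(0, 1))

    def set_up(self):
        self.client.get_or_create_dataset(self.project, self.dataset_id)

    def tear_down(self):
        # delete_contents=True drops the output tables along with the
        # dataset, so nothing leaks when a test fails partway through.
        self.client.delete_dataset(
            self.project, self.dataset_id, delete_contents=True)
```

Registering `tear_down` in `unittest.TestCase.tearDown` guarantees it runs after every test method, pass or fail.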
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=394526&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-394526 ]

ASF GitHub Bot logged work on BEAM-8841:
Created on: 27/Feb/20 23:24
Worklog Time Spent: 10m

Work Description: pabloem commented on pull request #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#discussion_r384769744

File path: sdks/python/apache_beam/io/gcp/bigquery.py

@@ -1361,87 +1369,18 @@ def __init__(
     self.triggering_frequency = triggering_frequency
     self.insert_retry_strategy = insert_retry_strategy
     self._validate = validate
+    self._temp_file_format = temp_file_format or bigquery_tools.FileFormat.JSON

Review comment: I'm happy to make AVRO the default format if possible. I guess the issue is that users need to provide the schema, right? Otherwise we cannot write the avro files. We could make AVRO the default, and add a check that the schema was provided (i.e. is neither None nor autodetect) - and error out if that's the case? What do you think?

Worklog Id: (was: 394526)
Time Spent: 2h 40m (was: 2.5h)
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=394406&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-394406 ]

ASF GitHub Bot logged work on BEAM-8841:
Created on: 27/Feb/20 18:55
Worklog Time Spent: 10m

Work Description: chunyang commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-592120245

The PreCommit failures look like they're being fixed by #10982. The failing Python 3.7 PostCommit integration tests might be flaky? I was not able to reproduce it--got the following instead:

INFO:apache_beam.io.gcp.tests.bigquery_matcher:Result of query is: [(None, None, None, datetime.date(3000, 12, 31), None, None, None, None), (None, None, None, None, datetime.time(23, 59, 59), None, None, None), (0.33, None, None, None, None, None, None, None), (None, Decimal('10'), None, None, None, None, None, None), (None, None, None, None, None, datetime.datetime(2018, 12, 31, 12, 44, 31), None, None), (None, None, None, None, None, None, datetime.datetime(2018, 12, 31, 12, 44, 31, 744957, tzinfo=), None), (None, None, None, None, None, None, None, 'POINT(30 10)'), (None, None, b'\xab\xac', None, None, None, None, None), (0.33, Decimal('10'), b'\xab\xac', datetime.date(3000, 12, 31), datetime.time(23, 59, 59), datetime.datetime(2018, 12, 31, 12, 44, 31), datetime.datetime(2018, 12, 31, 12, 44, 31, 744957, tzinfo=), 'POINT(30 10)')]

![table](https://user-images.githubusercontent.com/454684/75476534-9ed0e480-594f-11ea-808b-c6dc096a9c00.png)

Worklog Id: (was: 394406)
Time Spent: 2.5h (was: 2h 20m)
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=394321&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-394321 ]

ASF GitHub Bot logged work on BEAM-8841:
Created on: 27/Feb/20 17:09
Worklog Time Spent: 10m

Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-592071449

Run Python 3.7 PostCommit

Worklog Id: (was: 394321)
Time Spent: 2h 20m (was: 2h 10m)
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393904&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393904 ]

ASF GitHub Bot logged work on BEAM-8841:
Created on: 27/Feb/20 01:48
Worklog Time Spent: 10m

Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-591735822

Run Python 3.5 PostCommit

Worklog Id: (was: 393904)
Time Spent: 2h 10m (was: 2h)
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393893&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393893 ]

ASF GitHub Bot logged work on BEAM-8841:
Created on: 27/Feb/20 01:34
Worklog Time Spent: 10m

Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-591731930

Run Python 3.7 PostCommit

Worklog Id: (was: 393893)
Time Spent: 2h (was: 1h 50m)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393890&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393890 ]

ASF GitHub Bot logged work on BEAM-8841:
Author: ASF GitHub Bot
Created on: 27/Feb/20 01:32
Worklog Time Spent: 10m
Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-591731506

> Retest this please

Issue Time Tracking: Worklog Id: (was: 393890) Time Spent: 1h 50m (was: 1h 40m)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393864&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393864 ]

ASF GitHub Bot logged work on BEAM-8841:
Author: ASF GitHub Bot
Created on: 27/Feb/20 00:14
Worklog Time Spent: 10m
Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-591710321

> Run Python 3.7 PostCommit

Issue Time Tracking: Worklog Id: (was: 393864) Time Spent: 1h 40m (was: 1.5h)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393860&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393860 ]

ASF GitHub Bot logged work on BEAM-8841:
Author: ASF GitHub Bot
Created on: 27/Feb/20 00:04
Worklog Time Spent: 10m
Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-591707480

> retest this please

Issue Time Tracking: Worklog Id: (was: 393860) Time Spent: 1.5h (was: 1h 20m)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393743&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393743 ]

ASF GitHub Bot logged work on BEAM-8841:
Author: ASF GitHub Bot
Created on: 26/Feb/20 20:54
Worklog Time Spent: 10m
Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-591638243

> Retest this please

Issue Time Tracking: Worklog Id: (was: 393743) Time Spent: 1h 20m (was: 1h 10m)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393742&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393742 ]

ASF GitHub Bot logged work on BEAM-8841:
Author: ASF GitHub Bot
Created on: 26/Feb/20 20:52
Worklog Time Spent: 10m
Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-591637525

> Yes, I will be thrilled to review this : )

Issue Time Tracking: Worklog Id: (was: 393742) Time Spent: 1h 10m (was: 1h)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393647&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393647 ]

ASF GitHub Bot logged work on BEAM-8841:
Author: ASF GitHub Bot
Created on: 26/Feb/20 17:44
Worklog Time Spent: 10m
Work Description: chamikaramj commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-591553091

> Thanks for the contribution. Pablo, will you be able to review?

Issue Time Tracking: Worklog Id: (was: 393647) Time Spent: 1h (was: 50m)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393644&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393644 ]

ASF GitHub Bot logged work on BEAM-8841:
Author: ASF GitHub Bot
Created on: 26/Feb/20 17:38
Worklog Time Spent: 10m
Work Description: chunyang commented on pull request #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#discussion_r384651096

## File path: sdks/python/scripts/run_integration_test.sh

@@ -198,6 +198,7 @@ if [[ -z $PIPELINE_OPTS ]]; then
   # See: https://github.com/hamcrest/PyHamcrest/issues/131.
   echo "pyhamcrest!=1.10.0,<2.0.0" > postcommit_requirements.txt
   echo "mock<3.0.0" >> postcommit_requirements.txt
+  echo "parameterized>=0.7.1,<0.8.0" >> postcommit_requirements.txt

Review comment: This was added because the `bigquery_test` module imports `_ELEMENTS` from the `bigquery_file_loads_test` module, which uses `parameterized`. Should we refactor `_ELEMENTS` into its own module?

Issue Time Tracking: Worklog Id: (was: 393644) Time Spent: 0.5h (was: 20m)
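The dependency arises because the tests are parameterized over both file formats. The actual Beam tests use the third-party `parameterized` package; the sketch below shows the same pattern with only the standard library (`subTest`), with illustrative element data standing in for the shared `_ELEMENTS` fixture — names here are not the real test names.

```python
import unittest

# Illustrative stand-in for the _ELEMENTS fixture shared between
# bigquery_test and bigquery_file_loads_test.
_ELEMENTS = [{'name': 'a', 'value': 1}, {'name': 'b', 'value': 2}]

class LoadFormatTest(unittest.TestCase):
    def test_elements_per_format(self):
        # The Beam tests run each case under both temp file formats via
        # @parameterized.expand; subTest gives equivalent coverage with
        # no extra dependency.
        for file_format in ('NEWLINE_DELIMITED_JSON', 'AVRO'):
            with self.subTest(file_format=file_format):
                self.assertEqual(len(_ELEMENTS), 2)

suite = unittest.TestLoader().loadTestsFromTestCase(LoadFormatTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Moving `_ELEMENTS` into its own fixture module, as the comment suggests, would drop the `parameterized` requirement from `bigquery_test` entirely.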
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393643&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393643 ]

ASF GitHub Bot logged work on BEAM-8841:
Author: ASF GitHub Bot
Created on: 26/Feb/20 17:38
Worklog Time Spent: 10m
Work Description: chunyang commented on pull request #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#discussion_r384652808

## File path: sdks/python/apache_beam/io/gcp/bigquery_avro_tools.py

@@ -0,0 +1,135 @@
+# (Apache License 2.0 header)
+
+"""Tools used to work with Avro files in the context of BigQuery.
+
+Classes, constants and functions in this file are experimental and have no
+backwards compatibility guarantees.
+
+NOTHING IN THIS FILE HAS BACKWARDS COMPATIBILITY GUARANTEES.
+"""
+
+from __future__ import absolute_import
+from __future__ import division
+
+BIG_QUERY_TO_AVRO_TYPES = {
+    "RECORD": "record",
+    "STRING": "string",
+    "BOOLEAN": "boolean",
+    "BYTES": "bytes",
+    "FLOAT": "double",
+    "INTEGER": "long",
+    "TIME": {
+        "type": "long",
+        "logicalType": "time-micros",

Review comment: The same code in the [Java SDK](https://github.com/apache/beam/blob/911472cdcca7a31d1a8c690d75097b9ded9eb054/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryAvroUtils.java#L68) doesn't seem to support logical types, so the behavior is a bit different here.

Issue Time Tracking: Worklog Id: (was: 393643) Time Spent: 0.5h (was: 20m)
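The mapping quoted in the diff can be exercised in isolation. Below is a minimal sketch using only the types visible above; the `field_to_avro` helper and the null-union wrapping for `NULLABLE` mode are illustrative assumptions about the surrounding conversion code, not part of the quoted diff.

```python
# Subset of the BIG_QUERY_TO_AVRO_TYPES mapping from the diff. Note that
# TIME maps to an Avro *logical* type (time-micros), which per the review
# comment the Java SDK's BigQueryAvroUtils does not emit.
BIG_QUERY_TO_AVRO_TYPES = {
    "STRING": "string",
    "INTEGER": "long",
    "FLOAT": "double",
    "TIME": {"type": "long", "logicalType": "time-micros"},
}

def field_to_avro(name, bq_type, mode="NULLABLE"):
    """Hypothetical helper: convert one BigQuery field to an Avro field."""
    avro_type = BIG_QUERY_TO_AVRO_TYPES[bq_type]
    if mode == "NULLABLE":
        # Assumed handling: nullable BigQuery fields become a union with null.
        avro_type = ["null", avro_type]
    return {"name": name, "type": avro_type}

print(field_to_avro("start", "TIME", mode="REQUIRED"))
```

Because Avro carries `time-micros` natively, a TIME value survives the load without the string round-trip that JSON loads require.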
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393645&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393645 ]

ASF GitHub Bot logged work on BEAM-8841:
Author: ASF GitHub Bot
Created on: 26/Feb/20 17:38
Worklog Time Spent: 10m
Work Description: chunyang commented on pull request #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#discussion_r384653061

## File path: sdks/python/apache_beam/io/gcp/bigquery.py

@@ -1361,87 +1369,18 @@ def __init__(
     self.triggering_frequency = triggering_frequency
     self.insert_retry_strategy = insert_retry_strategy
     self._validate = validate
+    self._temp_file_format = temp_file_format or bigquery_tools.FileFormat.JSON
     self.additional_bq_parameters = additional_bq_parameters or {}
     self.table_side_inputs = table_side_inputs or ()
     self.schema_side_inputs = schema_side_inputs or ()

-  @staticmethod
-  def get_table_schema_from_string(schema):
-    """Transform the string table schema into a
-    :class:`~apache_beam.io.gcp.internal.clients.bigquery.bigquery_v2_messages.TableSchema` instance.
-
-    Args:
-      schema (str): The string schema to be used if the BigQuery table to write
-        has to be created.
-
-    Returns:
-      ~apache_beam.io.gcp.internal.clients.bigquery.bigquery_v2_messages.TableSchema:
-        The schema to be used if the BigQuery table to write has to be created,
-        but in the TableSchema format.
-    """
-    table_schema = bigquery.TableSchema()
-    schema_list = [s.strip() for s in schema.split(',')]
-    for field_and_type in schema_list:
-      field_name, field_type = field_and_type.split(':')
-      field_schema = bigquery.TableFieldSchema()
-      field_schema.name = field_name
-      field_schema.type = field_type
-      field_schema.mode = 'NULLABLE'
-      table_schema.fields.append(field_schema)
-    return table_schema
-
-  @staticmethod
-  def table_schema_to_dict(table_schema):
-    """Create a dictionary representation of table schema for serialization"""
-    def get_table_field(field):
-      """Create a dictionary representation of a table field"""
-      result = {}
-      result['name'] = field.name
-      result['type'] = field.type
-      result['mode'] = getattr(field, 'mode', 'NULLABLE')
-      if hasattr(field, 'description') and field.description is not None:
-        result['description'] = field.description
-      if hasattr(field, 'fields') and field.fields:
-        result['fields'] = [get_table_field(f) for f in field.fields]
-      return result
-
-    if not isinstance(table_schema, bigquery.TableSchema):
-      raise ValueError("Table schema must be of the type bigquery.TableSchema")
-    schema = {'fields': []}
-    for field in table_schema.fields:
-      schema['fields'].append(get_table_field(field))
-    return schema
-
-  @staticmethod
-  def get_dict_table_schema(schema):
-    """Transform the table schema into a dictionary instance.
-
-    Args:
-      schema (~apache_beam.io.gcp.internal.clients.bigquery.bigquery_v2_messages.TableSchema):
-        The schema to be used if the BigQuery table to write has to be created.
-        This can either be a dict or string or in the TableSchema format.
-
-    Returns:
-      Dict[str, Any]: The schema to be used if the BigQuery table to write has
-        to be created, but in the dictionary format.
-    """
-    if (isinstance(schema, (dict, vp.ValueProvider)) or callable(schema) or
-        schema is None):
-      return schema
-    elif isinstance(schema, (str, unicode)):
-      table_schema = WriteToBigQuery.get_table_schema_from_string(schema)
-      return WriteToBigQuery.table_schema_to_dict(table_schema)
-    elif isinstance(schema, bigquery.TableSchema):
-      return WriteToBigQuery.table_schema_to_dict(schema)
-    else:
-      raise TypeError('Unexpected schema argument: %s.' % schema)
+  # Dict/schema methods were moved to bigquery_tools, but keep references
+  # here for backward compatibility.
+  get_table_schema_from_string = \
+      staticmethod(bigquery_tools.get_table_schema_from_string)
+  table_schema_to_dict = staticmethod(bigquery_tools.table_schema_to_dict)
+  get_dict_table_schema = staticmethod(bigquery_tools.get_dict_table_schema)

Review comment: Moved these to avoid a cyclic import.

Issue Time Tracking --
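The relocated `get_table_schema_from_string` logic is easy to sketch standalone. In this sketch, plain dicts stand in for the `bigquery.TableSchema` / `TableFieldSchema` message classes, so it does not depend on the GCP client library; everything else mirrors the parsing shown in the diff.

```python
def table_schema_from_string(schema):
    """Parse a 'name:TYPE,name:TYPE' string into a schema dict.

    Mirrors the moved get_table_schema_from_string: each comma-separated
    entry becomes one field, and every field is marked NULLABLE.
    """
    fields = []
    for field_and_type in (s.strip() for s in schema.split(',')):
        field_name, field_type = field_and_type.split(':')
        fields.append(
            {'name': field_name, 'type': field_type, 'mode': 'NULLABLE'})
    return {'fields': fields}

print(table_schema_from_string('user:STRING, score:FLOAT'))
```

Keeping this logic in `bigquery_tools` lets both `bigquery.py` and the new `bigquery_avro_tools` module use it without importing each other, which is the cyclic import the comment refers to.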
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393642&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393642 ]

ASF GitHub Bot logged work on BEAM-8841:
Author: ASF GitHub Bot
Created on: 26/Feb/20 17:38
Worklog Time Spent: 10m
Work Description: chunyang commented on pull request #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#discussion_r384650349

## File path: sdks/python/apache_beam/io/localfilesystem.py

@@ -139,7 +140,7 @@ def _path_open(
     """Helper functions to open a file in the provided mode.
     """
     compression_type = FileSystem._get_compression_type(path, compression_type)
-    raw_file = open(path, mode)
+    raw_file = io.open(path, mode)

Review comment: `open` doesn't provide `io.IOBase` methods like `writable()` in Python 2.

Issue Time Tracking: Worklog Id: (was: 393642) Time Spent: 20m (was: 10m)
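On Python 3 this distinction disappears (the builtin `open` is literally `io.open` in CPython), but on Python 2 the builtin returned a plain file object without the `io.IOBase` interface. A quick check of the behavior the change relies on:

```python
import io
import os
import tempfile

# In CPython 3 the builtin open is the same object as io.open, so every
# file it returns exposes io.IOBase methods such as writable().
assert open is io.open

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with io.open(path, 'wb') as raw_file:
        # writable() is the io.IOBase method the Python 2 builtin lacked.
        assert raw_file.writable()
        raw_file.write(b'payload')
finally:
    os.remove(path)
```

This matters here because the new `JsonRowWriter`/`AvroRowWriter` classes are written against the `io.IOBase` interface, so the underlying raw file must provide it on both Python versions.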
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393646&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393646 ]

ASF GitHub Bot logged work on BEAM-8841:
Author: ASF GitHub Bot
Created on: 26/Feb/20 17:38
Worklog Time Spent: 10m
Work Description: chunyang commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979#issuecomment-591550523

> R: @chamikaramj
> R: @pabloem

Issue Time Tracking: Worklog Id: (was: 393646) Time Spent: 50m (was: 40m)
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=393639&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-393639 ]

ASF GitHub Bot logged work on BEAM-8841:
Author: ASF GitHub Bot
Created on: 26/Feb/20 17:31
Worklog Time Spent: 10m
Work Description: chunyang commented on pull request #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK
URL: https://github.com/apache/beam/pull/10979

This PR modifies `WriteToBigQuery` to be able to do file loads via Avro format. Using Avro bypasses some limitations of newline-delimited JSON loads, including the inability to represent NaN and Inf float values. Changes include:

* Add a `temp_file_format` option to `WriteToBigQuery` and `BigQueryBatchFileLoads` to select which file format to use for loading data.
* Move the implementation of `get_table_schema_from_string`, `table_schema_to_dict`, and `get_dict_table_schema` to the `bigquery_tools` module.
* Add a `bigquery_avro_tools` module with utilities for converting a BigQuery `TableSchema` to an Avro `RecordSchema` (this is a port of what's available in the Java SDK, with modified behavior for logical types).
* Modify `WriteRecordsToFile` and `WriteGroupedRecordsToFile` to accept a schema and file format, since in order to be read by BigQuery, the Avro files must have schema headers.
* Parameterize relevant tests to check both JSON and Avro code paths.
* Add an integration test using Avro-based file loads.
* Introduce `JsonRowWriter` and `AvroRowWriter` classes, which implement the `io.IOBase` interface and are used by `WriteRecordsToFile` and `WriteGroupedRecordsToFile`.

PR checklist:
- [ ] Choose reviewer(s) and mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable.
- [x] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache Individual Contributor License Agreement.
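The first bullet above adds a format switch that defaults to JSON so existing pipelines are unaffected. The sketch below is an illustrative model of that selection, not the Beam implementation: the `FileFormat` constant names and the writer class names come from the PR text, but the string values and the `choose_row_writer` helper are assumptions.

```python
class FileFormat(object):
    # Constant names as referenced in the PR (bigquery_tools.FileFormat);
    # the string values here are assumptions for illustration.
    JSON = 'NEWLINE_DELIMITED_JSON'
    AVRO = 'AVRO'

def choose_row_writer(temp_file_format=None):
    """Hypothetical helper modeling how temp_file_format picks a writer."""
    # Mirrors `temp_file_format or bigquery_tools.FileFormat.JSON`:
    # JSON stays the default so existing pipelines behave unchanged.
    fmt = temp_file_format or FileFormat.JSON
    if fmt == FileFormat.AVRO:
        return 'AvroRowWriter'   # Avro temp files need a schema header
    return 'JsonRowWriter'

print(choose_row_writer())                 # JsonRowWriter
print(choose_row_writer(FileFormat.AVRO))  # AvroRowWriter
```

In the actual PR, the equivalent decision happens inside `WriteRecordsToFile`/`WriteGroupedRecordsToFile`, which now receive the schema and file format so they can open the appropriate row writer.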