[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=339673&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339673 ]

ASF GitHub Bot logged work on BEAM-2879:

Author: ASF GitHub Bot
Created on: 07/Nov/19 01:22
Start Date: 07/Nov/19 01:22
Worklog Time Spent: 10m
Work Description: steveniemitz commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-550576486

> Thanks. Looks great to me.
>
> To make sure I understood correctly, this will not be enabled for existing users by default and to enable this users have to specify withAvroFormatFunction(), correct?

Correct, with schemas I think we could make this enabled transparently, but for now it's opt-in only.

> Also, can we add a version of BigQueryIOIT so that we can continue to monitor both Avro and JSON based BQ write transforms?

Yeah I can add that in there.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 339673)
Time Spent: 2h 50m (was: 2h 40m)

> Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
>
> Key: BEAM-2879
> URL: https://issues.apache.org/jira/browse/BEAM-2879
> Project: Beam
> Issue Type: Improvement
> Components: io-java-gcp
> Reporter: Black Phoenix
> Assignee: Steve Niemitz
> Priority: Minor
> Labels: starter
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> Before being loaded in BigQuery, temporary files are created and encoded in JSON, which is a costly solution compared to an Avro alternative.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=339693&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339693 ]

ASF GitHub Bot logged work on BEAM-2879:

Author: ASF GitHub Bot
Created on: 07/Nov/19 02:05
Start Date: 07/Nov/19 02:05
Worklog Time Spent: 10m
Work Description: steveniemitz commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-550586372

@chamikaramj are there up-to-date docs anywhere on how to run the BigQueryIOIT tests? The example args in the javadoc are super out of date.

Issue Time Tracking
---
Worklog Id: (was: 339693)
Time Spent: 3h (was: 2h 50m)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=339712&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339712 ]

ASF GitHub Bot logged work on BEAM-2879:

Author: ASF GitHub Bot
Created on: 07/Nov/19 02:45
Start Date: 07/Nov/19 02:45
Worklog Time Spent: 10m
Work Description: steveniemitz commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-550595649

> @chamikaramj are there up-to-date docs anywhere on how to run the BigQueryIOIT tests? The example args in the javadoc are super out of date.

nm I hacked it up enough to get it to run.

Issue Time Tracking
---
Worklog Id: (was: 339712)
Time Spent: 3h 10m (was: 3h)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340544 ]

ASF GitHub Bot logged work on BEAM-2879:

Author: ASF GitHub Bot
Created on: 08/Nov/19 15:33
Start Date: 08/Nov/19 15:33
Worklog Time Spent: 10m
Work Description: chamikaramj commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551873821

Thanks.

Do TODOs in PR description still apply ?

Issue Time Tracking
---
Worklog Id: (was: 340544)
Time Spent: 3h 20m (was: 3h 10m)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340546&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340546 ]

ASF GitHub Bot logged work on BEAM-2879:

Author: ASF GitHub Bot
Created on: 08/Nov/19 15:35
Start Date: 08/Nov/19 15:35
Worklog Time Spent: 10m
Work Description: chamikaramj commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551874483

Also please fixup commits before merging.

Issue Time Tracking
---
Worklog Id: (was: 340546)
Time Spent: 3h 40m (was: 3.5h)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340545&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340545 ]

ASF GitHub Bot logged work on BEAM-2879:

Author: ASF GitHub Bot
Created on: 08/Nov/19 15:34
Start Date: 08/Nov/19 15:34
Worklog Time Spent: 10m
Work Description: chamikaramj commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551873892

Run Java PostCommit

Issue Time Tracking
---
Worklog Id: (was: 340545)
Time Spent: 3.5h (was: 3h 20m)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340552&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340552 ]

ASF GitHub Bot logged work on BEAM-2879:

Author: ASF GitHub Bot
Created on: 08/Nov/19 15:37
Start Date: 08/Nov/19 15:37
Worklog Time Spent: 10m
Work Description: steveniemitz commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551875264

> Thanks.
>
> Do TODOs in PR description still apply ?

nope, I just edited the message

Issue Time Tracking
---
Worklog Id: (was: 340552)
Time Spent: 3h 50m (was: 3h 40m)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340564&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340564 ]

ASF GitHub Bot logged work on BEAM-2879:

Author: ASF GitHub Bot
Created on: 08/Nov/19 15:51
Start Date: 08/Nov/19 15:51
Worklog Time Spent: 10m
Work Description: steveniemitz commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551880744

> Also please fixup commits before merging.

Was this for me or whoever ends up merging it?

Issue Time Tracking
---
Worklog Id: (was: 340564)
Time Spent: 4h (was: 3h 50m)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340568&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340568 ]

ASF GitHub Bot logged work on BEAM-2879:

Author: ASF GitHub Bot
Created on: 08/Nov/19 15:55
Start Date: 08/Nov/19 15:55
Worklog Time Spent: 10m
Work Description: chamikaramj commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551882174

It's for you :)

Issue Time Tracking
---
Worklog Id: (was: 340568)
Time Spent: 4h 10m (was: 4h)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340572&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340572 ]

ASF GitHub Bot logged work on BEAM-2879:

Author: ASF GitHub Bot
Created on: 08/Nov/19 15:57
Start Date: 08/Nov/19 15:57
Worklog Time Spent: 10m
Work Description: steveniemitz commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551883131

coolio, fwiw, the contribution guide is ambiguous wrt who should do the squashing.

> Beam committers can squash all commits in the PR during merge, however if a PR has a mixture of independent changes that should not be squashed, and fixup commits, then the PR author should help squashing fixup commits to maintain a clean commit history.

Issue Time Tracking
---
Worklog Id: (was: 340572)
Time Spent: 4h 20m (was: 4h 10m)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340624&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340624 ]

ASF GitHub Bot logged work on BEAM-2879:

Author: ASF GitHub Bot
Created on: 08/Nov/19 17:46
Start Date: 08/Nov/19 17:46
Worklog Time Spent: 10m
Work Description: chamikaramj commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551924883

Thanks. LGTM.

Yeah, committer can squash and commit if you just need one commit.

Issue Time Tracking
---
Worklog Id: (was: 340624)
Time Spent: 4.5h (was: 4h 20m)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340625&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340625 ]

ASF GitHub Bot logged work on BEAM-2879:

Author: ASF GitHub Bot
Created on: 08/Nov/19 17:47
Start Date: 08/Nov/19 17:47
Worklog Time Spent: 10m
Work Description: chamikaramj commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551924974

Run Java PostCommit

Issue Time Tracking
---
Worklog Id: (was: 340625)
Time Spent: 4h 40m (was: 4.5h)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340626&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340626 ]

ASF GitHub Bot logged work on BEAM-2879:

Author: ASF GitHub Bot
Created on: 08/Nov/19 17:47
Start Date: 08/Nov/19 17:47
Worklog Time Spent: 10m
Work Description: chamikaramj commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551925048

Will merge after post-commit tests pass.

Issue Time Tracking
---
Worklog Id: (was: 340626)
Time Spent: 4h 50m (was: 4h 40m)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340752&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340752 ]

ASF GitHub Bot logged work on BEAM-2879:

Author: ASF GitHub Bot
Created on: 08/Nov/19 21:19
Start Date: 08/Nov/19 21:19
Worklog Time Spent: 10m
Work Description: chamikaramj commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551993467

Unfortunately, looks like BigQueryIOIT is a recently added test that is currently not captured by any of the test suites.

@steveniemitz Will you be able to add a test that writes to BQ using Avro to the BigQueryTornadoesIT that is captured by the Beam Java PostCommit test suite ?

https://github.com/apache/beam/blob/master/examples/java/src/test/java/org/apache/beam/examples/cookbook/BigQueryTornadoesIT.java
https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_Java/lastCompletedBuild/testReport/org.apache.beam.examples.cookbook/BigQueryTornadoesIT/

@mwalenia can you comment on the status of BigQueryIOIT ?

cc: @pabloem

Issue Time Tracking
---
Worklog Id: (was: 340752)
Time Spent: 5h (was: 4h 50m)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340753&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340753 ]

ASF GitHub Bot logged work on BEAM-2879:

Author: ASF GitHub Bot
Created on: 08/Nov/19 21:26
Start Date: 08/Nov/19 21:26
Worklog Time Spent: 10m
Work Description: steveniemitz commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551995495

> @steveniemitz Will you be able to add a test that writes to BQ using Avro to the BigQueryTornadoesIT that is captured by the Beam Java PostCommit test suite ?

honestly at this point I'm not going to prioritize doing so.

Issue Time Tracking
---
Worklog Id: (was: 340753)
Time Spent: 5h 10m (was: 5h)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340755&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340755 ]

ASF GitHub Bot logged work on BEAM-2879:

Author: ASF GitHub Bot
Created on: 08/Nov/19 21:28
Start Date: 08/Nov/19 21:28
Worklog Time Spent: 10m
Work Description: chamikaramj commented on pull request #9665: [BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665

Issue Time Tracking
---
Worklog Id: (was: 340755)
Time Spent: 5.5h (was: 5h 20m)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340754&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340754 ]

ASF GitHub Bot logged work on BEAM-2879:

Author: ASF GitHub Bot
Created on: 08/Nov/19 21:28
Start Date: 08/Nov/19 21:28
Worklog Time Spent: 10m
Work Description: chamikaramj commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551996254

Ok, fair enough. I'll go ahead and merge this. But please consider adding a regularly running integration test in a follow up PR to make sure that this codepath does not become stale/broken.

Issue Time Tracking
---
Worklog Id: (was: 340754)
Time Spent: 5h 20m (was: 5h 10m)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340756&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340756 ]

ASF GitHub Bot logged work on BEAM-2879:

Author: ASF GitHub Bot
Created on: 08/Nov/19 21:30
Start Date: 08/Nov/19 21:30
Worklog Time Spent: 10m
Work Description: steveniemitz commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551996661

maybe we should just make BigQueryIOIT actually run ;)

Issue Time Tracking
---
Worklog Id: (was: 340756)
Time Spent: 5h 40m (was: 5.5h)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=318608&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-318608 ]

ASF GitHub Bot logged work on BEAM-2879:

Author: ASF GitHub Bot
Created on: 25/Sep/19 21:18
Start Date: 25/Sep/19 21:18
Worklog Time Spent: 10m
Work Description: steveniemitz commented on pull request #9665: [BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665

This change enhances BigQueryIO.Write to support writing avro files rather than json when using FILE_LOADS (STREAMING_INSERTS is unchanged).

This is semi-WIP, but I wanted to get the review up sooner to get feedback.

TODO:
- more documentation in BigQueryIO
- unit tests

### Benchmarks
Preliminary results look good. The more CPU constrained a job is, the faster avro becomes. My test dataset is a typical workload of ours, around 2 billion records (~130 GB serialized) representing the result of a combine. My tests read these records from GCS and wrote them to BigQuery. The jobs were run in dataflow with 150 x n1-standard-2 workers.

format | time to start load job | bytes written | BQ slot time (ms)
--- | --- | --- | ---
avro | 6 m 30 s | 126 GB | 35,189,679
json | 8 m 5 s | 712 GB | 96,006,088

- [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
- [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
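
To put the benchmark figures from the PR description into relative terms, the following is a small sketch that computes the JSON-to-Avro ratio for each reported metric. The numbers are copied from the benchmark table; the dictionary layout, key names, and the helper `json_to_avro_ratio` are illustrative assumptions and are not part of the PR or of Beam.

```python
# Figures copied from the benchmark table in the PR description.
# Key names are hypothetical, chosen only for this illustration.
BENCHMARKS = {
    "avro": {
        "seconds_to_load_start": 6 * 60 + 30,  # 6 m 30 s
        "gb_written": 126,
        "slot_ms": 35_189_679,
    },
    "json": {
        "seconds_to_load_start": 8 * 60 + 5,  # 8 m 5 s
        "gb_written": 712,
        "slot_ms": 96_006_088,
    },
}


def json_to_avro_ratio(metric: str) -> float:
    """Return how many times larger the JSON figure is than the Avro figure."""
    return BENCHMARKS["json"][metric] / BENCHMARKS["avro"][metric]


if __name__ == "__main__":
    for metric in ("seconds_to_load_start", "gb_written", "slot_ms"):
        print(f"{metric}: JSON / Avro = {json_to_avro_ratio(metric):.2f}x")
```

On these reported figures, the JSON run wrote roughly 5.7 times as many bytes and used roughly 2.7 times as much BigQuery slot time as the Avro run, while taking about 1.2 times as long to reach the load job. These ratios describe this single benchmark only.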
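To put the benchmark table from the PR description into proportion, the three reported figures can be reduced to json-to-avro ratios. The snippet below is only arithmetic on the numbers quoted in the thread; the class and method names (`BenchmarkRatios`, `ratio`) are made up for illustration and are not part of Beam.

```java
import java.time.Duration;

/** Ratios derived from the benchmark table in the PR description (illustrative only). */
final class BenchmarkRatios {
    /** Returns how many times larger the json figure is than the avro figure. */
    static double ratio(double json, double avro) {
        return json / avro;
    }

    public static void main(String[] args) {
        // Figures copied from the table in the PR description.
        Duration avroTime = Duration.ofMinutes(6).plusSeconds(30);
        Duration jsonTime = Duration.ofMinutes(8).plusSeconds(5);
        double avroGb = 126;
        double jsonGb = 712;
        long avroSlotMs = 35_189_679L;
        long jsonSlotMs = 96_006_088L;

        System.out.printf("time to start load job: %.2fx%n",
            ratio(jsonTime.getSeconds(), avroTime.getSeconds()));
        System.out.printf("bytes written:          %.2fx%n", ratio(jsonGb, avroGb));
        System.out.printf("BQ slot time:           %.2fx%n", ratio(jsonSlotMs, avroSlotMs));
    }
}
```

On these figures, json took roughly 1.2 times as long to reach the load job, wrote roughly 5.7 times as many bytes, and used roughly 2.7 times the BigQuery slot time.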
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=323863&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323863 ] ASF GitHub Bot logged work on BEAM-2879: Author: ASF GitHub Bot Created on: 05/Oct/19 03:50 Start Date: 05/Oct/19 03:50 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9665: [BEAM-2879] Support writing data to BigQuery via avro URL: https://github.com/apache/beam/pull/9665#discussion_r331727151 ## File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryAvroUtilsTest.java ## @@ -243,8 +243,8 @@ public void testConvertBigQuerySchemaToAvroSchema() { Schema.create(Type.NULL), Schema.createRecord( "scion", -"org.apache.beam.sdk.io.gcp.bigquery", "Translated Avro Schema for scion", +"org.apache.beam.sdk.io.gcp.bigquery", Review comment: superfluous change? : ) Issue Time Tracking --- Worklog Id: (was: 323863) Time Spent: 20m (was: 10m)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=323868&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323868 ] ASF GitHub Bot logged work on BEAM-2879: Author: ASF GitHub Bot Created on: 05/Oct/19 04:02 Start Date: 05/Oct/19 04:02 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro URL: https://github.com/apache/beam/pull/9665#issuecomment-538613986

This is just a brain dump of what I'm thinking...

I wonder whether we need the `AvroWriteRequest`, and the Avro schema. I guess we do, as the `InputElement` (whatever it is) + the Avro schema are all one needs to build the `GenericRecord`. Having the `AvroWriteRequest` may help make the formatting function as concise as possible.

As for supporting Beam schemas + avro files, one could have a `useBeamSchemaForAvroFiles()`... though it's a little strange.

Another option is to have `useBeamSchema`, and a pre-coded avro formatting function called something like ... `BigQueryIOUtils.beamRowToAvroRecord()`. Though this is a little awkward too.

---

Overall, I like using `AvroWriteRequest` as input for the avro format function... and for supporting Beam schemas, it may be that `useBeamSchemaForAvroFiles` (or some better name) is the more reasonable option.

WDYT?

Issue Time Tracking --- Worklog Id: (was: 323868) Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=323921&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323921 ] ASF GitHub Bot logged work on BEAM-2879: Author: ASF GitHub Bot Created on: 05/Oct/19 13:13 Start Date: 05/Oct/19 13:13 Worklog Time Spent: 10m Work Description: steveniemitz commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro URL: https://github.com/apache/beam/pull/9665#issuecomment-538648713

Thanks for the thoughts! My comments inline

> This is just a brain dump of what I'm thinking...
>
> I wonder whether we need the `AvroWriteRequest`, and the Avro schema. I guess we do, as the `InputElement` (whatever it is) + the Avro schema are all one needs to build the `GenericRecord`. Having the `AvroWriteRequest` may help make the formatting function as concise as possible

I really went back and forth on this a few times. We could use `SerializableBiFunction` here, but if in the future we ever wanted to add another parameter, it'd be a breaking change. This way we can just add a field to the class. This follows the same pattern as read does, where it takes a `SchemaAndRecord` as an input. You do need both the avro schema and the element though in order to support more advanced cases w/ DynamicDestinations, etc. Plus avro schemas themselves aren't easily serializable (until avro 1.9) so users can't simply create a closure over them. I do hate the name though, if you can think of anything better I'd love to rename this!

> As for supporting Beam schemas + avro files, one could have a `useBeamSchemaForAvroFiles()`... though it's a little strange
>
> Another option is to have `useBeamSchema`, and a pre-coded avro formatting function called something like ... `BigQueryIOUtils.beamRowToAvroRecord()`. Though this is a little awkward too.

Yeah I struggled with this as well. The only thing stopping us from having a version that supports beam schemas is the interface. `useBeamSchemaForAvroFiles` is a pretty reasonable name.

> Overall, I like using `AvroWriteRequest` as input for the avro format function... and for supporting Beam schemas, it may be that `useBeamSchemaForAvroFiles` (or some better name) is the more reasonable option.
> WDYT?

I'd be up for adding that in a follow-up PR. I also have some ideas around `writeGenericRecords()` I want to play around with (that would also use beam schemas, similar to `AvroIO.writeGenericRecords()`).

Issue Time Tracking --- Worklog Id: (was: 323921) Time Spent: 40m (was: 0.5h)
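The trade-off discussed here (a single request object versus a two-argument function) can be sketched in plain Java. This is not the code from the PR: `WriteRequest`, `formatWith`, and the use of a `String` in place of an Avro schema are all assumptions made so the example runs without Beam or Avro on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class RequestObjectSketch {
    /** Hypothetical stand-in for the PR's AvroWriteRequest: bundles what the formatter needs. */
    static final class WriteRequest<T> {
        private final T element;
        private final String schemaJson; // stand-in for an Avro Schema object

        WriteRequest(T element, String schemaJson) {
            this.element = element;
            this.schemaJson = schemaJson;
        }

        T getElement() {
            return element;
        }

        String getSchemaJson() {
            return schemaJson;
        }
        // A later version could add another getter here without changing the
        // Function<WriteRequest<T>, R> signature that user code implements.
    }

    /** Applies a user-supplied format function to one element (hypothetical helper). */
    static <T, R> R formatWith(
            Function<WriteRequest<T>, R> formatFn, T element, String schemaJson) {
        return formatFn.apply(new WriteRequest<>(element, schemaJson));
    }

    public static void main(String[] args) {
        // The user's function receives the element and the schema together.
        Function<WriteRequest<String>, Map<String, Object>> formatFn = request -> {
            Map<String, Object> record = new LinkedHashMap<>();
            record.put("name", request.getElement());
            record.put("schema", request.getSchemaJson());
            return record;
        };
        System.out.println(formatWith(formatFn, "scion", "{\"type\":\"record\"}"));
    }
}
```

With a two-argument function type, adding a third input later would change the functional interface and break every existing lambda; with the request object, existing lambdas keep compiling and simply ignore the new getter.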
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=323922&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323922 ] ASF GitHub Bot logged work on BEAM-2879: Author: ASF GitHub Bot Created on: 05/Oct/19 13:15 Start Date: 05/Oct/19 13:15 Worklog Time Spent: 10m Work Description: steveniemitz commented on pull request #9665: [BEAM-2879] Support writing data to BigQuery via avro URL: https://github.com/apache/beam/pull/9665#discussion_r331746343 ## File path: sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryAvroUtilsTest.java ## @@ -243,8 +243,8 @@ public void testConvertBigQuerySchemaToAvroSchema() { Schema.create(Type.NULL), Schema.createRecord( "scion", -"org.apache.beam.sdk.io.gcp.bigquery", "Translated Avro Schema for scion", +"org.apache.beam.sdk.io.gcp.bigquery", Review comment: oh no actually, this is pretty important! The previous version flipped the namespace and description parameters, resulting in an invalid namespace. See the corresponding change up in BigQueryAvroUtils. Really avro's fault for making a function with 3 string parameters... Issue Time Tracking --- Worklog Id: (was: 323922) Time Spent: 50m (was: 40m)
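The hazard described in this comment, three adjacent `String` parameters that the compiler cannot tell apart, can be illustrated without the Avro library. The sketch below uses a hypothetical `describeRecord(name, doc, namespace)` that mirrors the argument order shown in the corrected diff, plus a namespace check based on the Avro naming rule (dot-separated names, each starting with a letter or underscore). Eager validation is an assumption of this stand-in; the thread does not say whether the Avro version in use rejects the swapped call.

```java
import java.util.regex.Pattern;

final class NamespaceOrderSketch {
    // Avro names start with [A-Za-z_] and continue with [A-Za-z0-9_];
    // a namespace is a dot-separated sequence of such names.
    private static final Pattern NAMESPACE =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*(\\.[A-Za-z_][A-Za-z0-9_]*)*");

    static boolean isValidNamespace(String candidate) {
        return NAMESPACE.matcher(candidate).matches();
    }

    /** Hypothetical stand-in taking (name, doc, namespace); returns the record's full name. */
    static String describeRecord(String name, String doc, String namespace) {
        if (!isValidNamespace(namespace)) {
            throw new IllegalArgumentException("Invalid namespace: " + namespace);
        }
        return namespace + "." + name;
    }

    public static void main(String[] args) {
        String doc = "Translated Avro Schema for scion";
        String namespace = "org.apache.beam.sdk.io.gcp.bigquery";

        // Arguments in the intended order.
        System.out.println(describeRecord("scion", doc, namespace));

        // doc and namespace flipped: this compiles, and only a runtime check notices.
        try {
            describeRecord("scion", namespace, doc);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Both calls type-check, which is why a swap like the one fixed in this PR can sit unnoticed in both the implementation and its test.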
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=323925&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323925 ] ASF GitHub Bot logged work on BEAM-2879: Author: ASF GitHub Bot Created on: 05/Oct/19 13:21 Start Date: 05/Oct/19 13:21 Worklog Time Spent: 10m Work Description: steveniemitz commented on pull request #9665: [BEAM-2879] Support writing data to BigQuery via avro URL: https://github.com/apache/beam/pull/9665#discussion_r331746527 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/AbstractRowWriter.java ## @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; + +import com.google.api.services.bigquery.model.TableRow; +import java.io.IOException; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import java.util.UUID; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.CountingOutputStream; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** Writes {@link TableRow} objects out to a file. Used when doing batch load jobs into BigQuery. */ +abstract class AbstractRowWriter implements AutoCloseable { Review comment: nit: I don't usually write java, is the style convention `AbstractDerp` for base classes like this? or should this just be `RowWriter`? Issue Time Tracking --- Worklog Id: (was: 323925) Time Spent: 1h (was: 50m)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=327023&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-327023 ] ASF GitHub Bot logged work on BEAM-2879: Author: ASF GitHub Bot Created on: 11/Oct/19 18:13 Start Date: 11/Oct/19 18:13 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #9665: [BEAM-2879] Support writing data to BigQuery via avro URL: https://github.com/apache/beam/pull/9665#discussion_r334114662 ## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/AbstractRowWriter.java ## @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.sdk.io.gcp.bigquery; + +import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState; + +import com.google.api.services.bigquery.model.TableRow; +import java.io.IOException; +import java.io.OutputStream; +import java.nio.channels.Channels; +import java.nio.channels.WritableByteChannel; +import java.util.UUID; +import org.apache.beam.sdk.io.FileSystems; +import org.apache.beam.sdk.io.fs.ResourceId; +import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.CountingOutputStream; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** Writes {@link TableRow} objects out to a file. Used when doing batch load jobs into BigQuery. */ +abstract class AbstractRowWriter implements AutoCloseable { Review comment: In my experience, the Abstract is usually not added to the abstract class. I am okay with either. Issue Time Tracking --- Worklog Id: (was: 327023) Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=332337&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332337 ] ASF GitHub Bot logged work on BEAM-2879: Author: ASF GitHub Bot Created on: 23/Oct/19 00:31 Start Date: 23/Oct/19 00:31 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro URL: https://github.com/apache/beam/pull/9665#issuecomment-545211844 LMK if you'd like me to take another look. Issue Time Tracking --- Worklog Id: (was: 332337) Time Spent: 1h 20m (was: 1h 10m)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=332340&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332340 ] ASF GitHub Bot logged work on BEAM-2879: Author: ASF GitHub Bot Created on: 23/Oct/19 00:33 Start Date: 23/Oct/19 00:33 Worklog Time Spent: 10m Work Description: steveniemitz commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro URL: https://github.com/apache/beam/pull/9665#issuecomment-545212207 > LMK if you'd like me to take another look. oh, yeah please do. I don't have much more from my end other than renaming the class above. Issue Time Tracking --- Worklog Id: (was: 332340) Time Spent: 1.5h (was: 1h 20m)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=333695&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333695 ] ASF GitHub Bot logged work on BEAM-2879: Author: ASF GitHub Bot Created on: 24/Oct/19 20:03 Start Date: 24/Oct/19 20:03 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro URL: https://github.com/apache/beam/pull/9665#issuecomment-546079959 Run Java PreCommit Issue Time Tracking --- Worklog Id: (was: 333695) Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=335100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335100 ] ASF GitHub Bot logged work on BEAM-2879: Author: ASF GitHub Bot Created on: 28/Oct/19 18:35 Start Date: 28/Oct/19 18:35 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro URL: https://github.com/apache/beam/pull/9665#issuecomment-547087321 ok this LGTM. @chamikaramj - thoughts? Issue Time Tracking --- Worklog Id: (was: 335100) Time Spent: 1h 50m (was: 1h 40m)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=335108&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335108 ] ASF GitHub Bot logged work on BEAM-2879: Author: ASF GitHub Bot Created on: 28/Oct/19 18:44 Start Date: 28/Oct/19 18:44 Worklog Time Spent: 10m Work Description: steveniemitz commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro URL: https://github.com/apache/beam/pull/9665#issuecomment-547091022 Thanks! I'll do the class rename asap too. Issue Time Tracking --- Worklog Id: (was: 335108) Time Spent: 2h (was: 1h 50m)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=335810&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335810 ] ASF GitHub Bot logged work on BEAM-2879: Author: ASF GitHub Bot Created on: 29/Oct/19 22:58 Start Date: 29/Oct/19 22:58 Worklog Time Spent: 10m Work Description: steveniemitz commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro URL: https://github.com/apache/beam/pull/9665#issuecomment-547665959 Thanks! You can merge this now and I'll do the rename in another PR, or I can get to it tomorrow, up to you! Issue Time Tracking --- Worklog Id: (was: 335810) Time Spent: 2h 10m (was: 2h)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=335815&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335815 ] ASF GitHub Bot logged work on BEAM-2879: Author: ASF GitHub Bot Created on: 29/Oct/19 23:07 Start Date: 29/Oct/19 23:07 Worklog Time Spent: 10m Work Description: pabloem commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro URL: https://github.com/apache/beam/pull/9665#issuecomment-547668425 I don't think we strictly need the rename. But I do want to wait for @chamikaramj to take a look : ) - I don't think he'll have objections, but just in case he thinks of any improvements. Issue Time Tracking --- Worklog Id: (was: 335815) Time Spent: 2h 20m (was: 2h 10m)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=335844&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335844 ] ASF GitHub Bot logged work on BEAM-2879: Author: ASF GitHub Bot Created on: 30/Oct/19 00:49 Start Date: 30/Oct/19 00:49 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #9665: [BEAM-2879] Support writing data to BigQuery via avro URL: https://github.com/apache/beam/pull/9665#issuecomment-547691457 Sorry about the delay. Taking a look. Issue Time Tracking --- Worklog Id: (was: 335844) Time Spent: 2.5h (was: 2h 20m)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=335893&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335893 ]

ASF GitHub Bot logged work on BEAM-2879:
Author: ASF GitHub Bot
Created on: 30/Oct/19 02:27
Start Date: 30/Oct/19 02:27
Worklog Time Spent: 10m
Work Description: chamikaramj commented on pull request #9665: [BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#discussion_r340405122

## File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/AbstractRowWriter.java ##
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.gcp.bigquery;
+
+import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState;
+
+import com.google.api.services.bigquery.model.TableRow;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.util.UUID;
+import org.apache.beam.sdk.io.FileSystems;
+import org.apache.beam.sdk.io.fs.ResourceId;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.CountingOutputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** Writes {@link TableRow} objects out to a file. Used when doing batch load jobs into BigQuery. */
+abstract class AbstractRowWriter implements AutoCloseable {

Review comment:
+1 for just RowWriter or BigQueryRowWriter to be more specific.

Issue Time Tracking
---
Worklog Id: (was: 335893)
Time Spent: 2h 40m (was: 2.5h)
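The quoted diff stops at the class declaration, so the body of the reviewed class is not visible here. As a rough illustration of the pattern its Javadoc and imports suggest (an abstract, closeable writer that counts the bytes it emits and leaves the encoding to subclasses), here is a minimal sketch using only the JDK. Every name in it (RowWriterSketch, LineRowWriterSketch, encode, getByteSize) is hypothetical and is not taken from the pull request; the real class works with TableRow, ResourceId and Guava's CountingOutputStream, none of which are reproduced here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical base class: owns the output stream and tracks how many bytes were written. */
abstract class RowWriterSketch<T> implements AutoCloseable {
  private final OutputStream out;
  private long byteCount = 0;
  private boolean closed = false;

  protected RowWriterSketch(OutputStream out) {
    this.out = out;
  }

  /** Format-specific encoding (for example JSON or Avro), supplied by subclasses. */
  protected abstract byte[] encode(T row) throws IOException;

  /** Encodes one row, writes it to the stream, and updates the byte count. */
  public final void write(T row) throws IOException {
    if (closed) {
      throw new IllegalStateException("writer already closed");
    }
    byte[] bytes = encode(row);
    out.write(bytes);
    byteCount += bytes.length;
  }

  /** Number of encoded bytes written so far. */
  public final long getByteSize() {
    return byteCount;
  }

  @Override
  public void close() throws IOException {
    closed = true;
    out.close();
  }
}

/** Hypothetical subclass: writes each row as one newline-terminated UTF-8 line. */
final class LineRowWriterSketch extends RowWriterSketch<String> {
  LineRowWriterSketch(OutputStream out) {
    super(out);
  }

  @Override
  protected byte[] encode(String row) {
    return (row + "\n").getBytes(StandardCharsets.UTF_8);
  }
}
```

With this shape, a second subclass could swap in a different encoding without touching the byte counting or the close logic, which is presumably the motivation for introducing an abstract writer alongside the existing JSON one.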
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=368351&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-368351 ] ASF GitHub Bot logged work on BEAM-2879: Author: ASF GitHub Bot Created on: 08/Jan/20 18:44 Start Date: 08/Jan/20 18:44 Worklog Time Spent: 10m Work Description: steveniemitz commented on issue #10527: [BEAM-2879] Metric name should not be constant URL: https://github.com/apache/beam/pull/10527#issuecomment-572202864 seems reasonable. Issue Time Tracking --- Worklog Id: (was: 368351) Time Spent: 5h 50m (was: 5h 40m)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=368352&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-368352 ] ASF GitHub Bot logged work on BEAM-2879: Author: ASF GitHub Bot Created on: 08/Jan/20 18:51 Start Date: 08/Jan/20 18:51 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10527: [BEAM-2879] Metric name should not be constant URL: https://github.com/apache/beam/pull/10527#issuecomment-571841009 CC: @apilloud Could you please run following Jenkins jobs/tests? `Run BigQueryIO Batch Performance Test Java Avro` Issue Time Tracking --- Worklog Id: (was: 368352) Time Spent: 6h (was: 5h 50m)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=369256&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369256 ] ASF GitHub Bot logged work on BEAM-2879: Author: ASF GitHub Bot Created on: 09/Jan/20 18:57 Start Date: 09/Jan/20 18:57 Worklog Time Spent: 10m Work Description: apilloud commented on issue #10527: [BEAM-2879] Metric name should not be constant URL: https://github.com/apache/beam/pull/10527#issuecomment-572703985 Run Java PreCommit Issue Time Tracking --- Worklog Id: (was: 369256) Time Spent: 6h 10m (was: 6h)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=369340&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369340 ] ASF GitHub Bot logged work on BEAM-2879: Author: ASF GitHub Bot Created on: 09/Jan/20 20:27 Start Date: 09/Jan/20 20:27 Worklog Time Spent: 10m Work Description: apilloud commented on issue #10527: [BEAM-2879] Metric name should not be constant URL: https://github.com/apache/beam/pull/10527#issuecomment-572740318 Run Java PreCommit Issue Time Tracking --- Worklog Id: (was: 369340) Time Spent: 6h 20m (was: 6h 10m)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=369382&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369382 ] ASF GitHub Bot logged work on BEAM-2879: Author: ASF GitHub Bot Created on: 09/Jan/20 21:34 Start Date: 09/Jan/20 21:34 Worklog Time Spent: 10m Work Description: 11moon11 commented on issue #10527: [BEAM-2879] Metric name should not be constant URL: https://github.com/apache/beam/pull/10527#issuecomment-572767018 R: @apilloud Issue Time Tracking --- Worklog Id: (was: 369382) Time Spent: 6.5h (was: 6h 20m)
[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery
[ https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=369446&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369446 ] ASF GitHub Bot logged work on BEAM-2879: Author: ASF GitHub Bot Created on: 09/Jan/20 22:53 Start Date: 09/Jan/20 22:53 Worklog Time Spent: 10m Work Description: apilloud commented on pull request #10527: [BEAM-2879] Metric name should not be constant URL: https://github.com/apache/beam/pull/10527 Issue Time Tracking --- Worklog Id: (was: 369446) Time Spent: 6h 40m (was: 6.5h)