[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=339673&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339673
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 07/Nov/19 01:22
Start Date: 07/Nov/19 01:22
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-550576486
 
 
   > Thanks. Looks great to me.
   > 
   > To make sure I understood correctly, this will not be enabled for existing 
users by default and to enable this users have to specify 
withAvroFormatFunction(), correct ?

   Correct. With schemas I think we could enable this transparently, but 
for now it's opt-in only.
   
   > Also, can we add a version of BigQueryIOIT so that we can continue to 
monitor both Avro and JSON based BQ write transforms ?
   
   Yeah I can add that in there.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 339673)
Time Spent: 2h 50m  (was: 2h 40m)

> Implement and use an Avro coder rather than the JSON one for intermediary 
> files to be loaded in BigQuery
> 
>
> Key: BEAM-2879
> URL: https://issues.apache.org/jira/browse/BEAM-2879
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Black Phoenix
>Assignee: Steve Niemitz
>Priority: Minor
>  Labels: starter
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Before being loaded in BigQuery, temporary files are created and encoded in 
> JSON. Which is a costly solution compared to an Avro alternative 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=339693&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339693
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 07/Nov/19 02:05
Start Date: 07/Nov/19 02:05
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-550586372
 
 
   @chamikaramj are there up-to-date docs anywhere on how to run the 
BigQueryIOIT tests?  The example args in the javadoc are super out of date.
 



Issue Time Tracking
---

Worklog Id: (was: 339693)
Time Spent: 3h  (was: 2h 50m)



[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-11-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=339712&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-339712
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 07/Nov/19 02:45
Start Date: 07/Nov/19 02:45
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-550595649
 
 
   > @chamikaramj are there up-to-date docs anywhere on how to run the 
BigQueryIOIT tests? The example args in the javadoc are super out of date.
   
   nm I hacked it up enough to get it to run.
 



Issue Time Tracking
---

Worklog Id: (was: 339712)
Time Spent: 3h 10m  (was: 3h)



[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340544&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340544
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 08/Nov/19 15:33
Start Date: 08/Nov/19 15:33
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551873821
 
 
   Thanks. 
   
   Do TODOs in PR description still apply ? 
   
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 340544)
Time Spent: 3h 20m  (was: 3h 10m)



[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340546&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340546
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 08/Nov/19 15:35
Start Date: 08/Nov/19 15:35
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551874483
 
 
   Also please fixup commits before merging.
 



Issue Time Tracking
---

Worklog Id: (was: 340546)
Time Spent: 3h 40m  (was: 3.5h)



[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340545&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340545
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 08/Nov/19 15:34
Start Date: 08/Nov/19 15:34
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551873892
 
 
   Run Java PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 340545)
Time Spent: 3.5h  (was: 3h 20m)



[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340552&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340552
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 08/Nov/19 15:37
Start Date: 08/Nov/19 15:37
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551875264
 
 
   > Thanks.
   > 
   > Do TODOs in PR description still apply ?
   
   nope, I just edited the message
 



Issue Time Tracking
---

Worklog Id: (was: 340552)
Time Spent: 3h 50m  (was: 3h 40m)



[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340564&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340564
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 08/Nov/19 15:51
Start Date: 08/Nov/19 15:51
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551880744
 
 
   > Also please fixup commits before merging.
   
   Was this for me or whoever ends up merging it?
 



Issue Time Tracking
---

Worklog Id: (was: 340564)
Time Spent: 4h  (was: 3h 50m)



[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340568&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340568
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 08/Nov/19 15:55
Start Date: 08/Nov/19 15:55
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551882174
 
 
   It's for you :)
 



Issue Time Tracking
---

Worklog Id: (was: 340568)
Time Spent: 4h 10m  (was: 4h)



[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340572&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340572
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 08/Nov/19 15:57
Start Date: 08/Nov/19 15:57
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551883131
 
 
   coolio, fwiw, the contribution guide is ambiguous wrt who should do the 
squashing.
   
   > Beam committers can squash all commits in the PR during merge, however if 
a PR has a mixture of independent changes that should not be squashed, and 
fixup commits, then the PR author should help squashing fixup commits to 
maintain a clean commit history.
   
 



Issue Time Tracking
---

Worklog Id: (was: 340572)
Time Spent: 4h 20m  (was: 4h 10m)



[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340624&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340624
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 08/Nov/19 17:46
Start Date: 08/Nov/19 17:46
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551924883
 
 
   Thanks. LGTM.
   
   Yeah, committer can squash and commit if you just need one commit. 
 



Issue Time Tracking
---

Worklog Id: (was: 340624)
Time Spent: 4.5h  (was: 4h 20m)



[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340625&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340625
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 08/Nov/19 17:47
Start Date: 08/Nov/19 17:47
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551924974
 
 
   Run Java PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 340625)
Time Spent: 4h 40m  (was: 4.5h)



[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340626&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340626
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 08/Nov/19 17:47
Start Date: 08/Nov/19 17:47
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551925048
 
 
   Will merge after post-commit tests pass.
 



Issue Time Tracking
---

Worklog Id: (was: 340626)
Time Spent: 4h 50m  (was: 4h 40m)



[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340752&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340752
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 08/Nov/19 21:19
Start Date: 08/Nov/19 21:19
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551993467
 
 
   Unfortunately, looks like BigQueryIOIT is a recently added test that is 
currently not captured by any of the test suites.
   
   @steveniemitz Will you be able to add a test that writes to BQ using Avro to 
the BigQueryTornadoesIT that is captured by the Beam Java PostCommit test suite 
?
   
https://github.com/apache/beam/blob/master/examples/java/src/test/java/org/apache/beam/examples/cookbook/BigQueryTornadoesIT.java
   
https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_Java/lastCompletedBuild/testReport/org.apache.beam.examples.cookbook/BigQueryTornadoesIT/
   
   @mwalenia can you comment on the status of BigQueryIOIT ?
   
   cc: @pabloem 
 



Issue Time Tracking
---

Worklog Id: (was: 340752)
Time Spent: 5h  (was: 4h 50m)



[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340753&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340753
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 08/Nov/19 21:26
Start Date: 08/Nov/19 21:26
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551995495
 
 
   > @steveniemitz Will you be able to add a test that writes to BQ using Avro 
to the BigQueryTornadoesIT that is captured by the Beam Java PostCommit test 
suite ?
   
   honestly at this point I'm not going to prioritize doing so.
 



Issue Time Tracking
---

Worklog Id: (was: 340753)
Time Spent: 5h 10m  (was: 5h)



[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340755&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340755
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 08/Nov/19 21:28
Start Date: 08/Nov/19 21:28
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #9665: 
[BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 340755)
Time Spent: 5.5h  (was: 5h 20m)



[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340754&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340754
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 08/Nov/19 21:28
Start Date: 08/Nov/19 21:28
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551996254
 
 
   Ok, fair enough.
   
   I'll go ahead and merge this. But please consider adding a regularly running 
integration test in a follow up PR to make sure that this codepath does not 
become stale/broken.
 



Issue Time Tracking
---

Worklog Id: (was: 340754)
Time Spent: 5h 20m  (was: 5h 10m)



[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=340756&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-340756
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 08/Nov/19 21:30
Start Date: 08/Nov/19 21:30
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-551996661
 
 
   maybe we should just make BigQueryIOIT actually run ;)
 



Issue Time Tracking
---

Worklog Id: (was: 340756)
Time Spent: 5h 40m  (was: 5.5h)

> Implement and use an Avro coder rather than the JSON one for intermediary 
> files to be loaded in BigQuery
> 
>
> Key: BEAM-2879
> URL: https://issues.apache.org/jira/browse/BEAM-2879
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Black Phoenix
>Assignee: Steve Niemitz
>Priority: Minor
>  Labels: starter
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Before being loaded in BigQuery, temporary files are created and encoded in 
> JSON, which is a costly solution compared to an Avro alternative.





[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-09-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=318608&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-318608
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 25/Sep/19 21:18
Start Date: 25/Sep/19 21:18
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on pull request #9665: 
[BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665
 
 
   This change enhances BigQueryIO.Write to support writing Avro files rather 
than JSON when using FILE_LOADS (STREAMING_INSERTS is unchanged).
   
   This is semi-WIP, but I wanted to get the review up sooner to get feedback.
   
   TODO:
   - more documentation in BigQueryIO
   - unit tests
   
   ### Benchmarks
   
   Preliminary results look good.  The more CPU constrained a job is, the 
faster avro becomes.
   
   My test dataset is a typical workload of ours, around 2 billion records 
(~130 GB serialized) representing the result of a combine.  My tests read these 
records from GCS and wrote them to BigQuery.  The jobs were run in dataflow 
with 150 x n1-standard-2 workers.
   
   format | time to start load job | bytes written | BQ slot time (ms)
   --- | --- | --- | ---
   avro | 6 m 30 s | 126 GB | 35,189,679
   json | 8 m 5 s | 712 GB | 96,006,088
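
To put the table in relative terms, the ratios can be worked out directly from the reported figures. The sketch below is plain arithmetic on the numbers above; the class and method names are illustrative only and are not part of the PR.

```java
import java.util.Locale;

/** Works out JSON-to-Avro ratios from the benchmark figures quoted above. */
final class AvroVsJsonRatios {

  /** Returns how many times larger the JSON figure is than the Avro figure. */
  static double ratio(double json, double avro) {
    return json / avro;
  }

  /** Converts a duration given as minutes and seconds into seconds. */
  static int toSeconds(int minutes, int seconds) {
    return minutes * 60 + seconds;
  }

  public static void main(String[] args) {
    double time = ratio(toSeconds(8, 5), toSeconds(6, 30)); // 485 s vs 390 s
    double bytes = ratio(712, 126); // GB written
    double slotTime = ratio(96_006_088, 35_189_679); // BQ slot time in ms
    System.out.printf(Locale.ROOT, "time to start load job: %.2fx%n", time);
    System.out.printf(Locale.ROOT, "bytes written: %.2fx%n", bytes);
    System.out.printf(Locale.ROOT, "BQ slot time: %.2fx%n", slotTime);
  }
}
```

On these figures, JSON wrote about 5.7 times as many bytes and used about 2.7 times the slot time, while its load job started roughly 1.2 times later.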
   
   
   

[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=323863&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323863
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 05/Oct/19 03:50
Start Date: 05/Oct/19 03:50
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#discussion_r331727151
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryAvroUtilsTest.java
 ##
 @@ -243,8 +243,8 @@ public void testConvertBigQuerySchemaToAvroSchema() {
 Schema.create(Type.NULL),
 Schema.createRecord(
 "scion",
-"org.apache.beam.sdk.io.gcp.bigquery",
 "Translated Avro Schema for scion",
+"org.apache.beam.sdk.io.gcp.bigquery",
 
 Review comment:
   superfluous change? : )
 



Issue Time Tracking
---

Worklog Id: (was: 323863)
Time Spent: 20m  (was: 10m)

> Implement and use an Avro coder rather than the JSON one for intermediary 
> files to be loaded in BigQuery
> 
>
> Key: BEAM-2879
> URL: https://issues.apache.org/jira/browse/BEAM-2879
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Black Phoenix
>Assignee: Steve Niemitz
>Priority: Minor
>  Labels: starter
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Before being loaded in BigQuery, temporary files are created and encoded in 
> JSON, which is a costly solution compared to an Avro alternative.





[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=323868&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323868
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 05/Oct/19 04:02
Start Date: 05/Oct/19 04:02
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9665: [BEAM-2879] Support 
writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-538613986
 
 
   This is just a brain dump of what I'm thinking...
   
   I wonder whether we need the `AvroWriteRequest`, and the Avro schema. I 
guess we do, as the `InputElement` (whatever it is) + the Avro schema are all 
one needs to build the `GenericRecord`. Having the `AvroWriteRequest` may help 
make the formatting function as concise as possible
   
   As for supporting Beam schemas + avro files, one could have a 
`useBeamSchemaForAvroFiles()`... though it's a little strange
   
   Another option is to have `useBeamSchema`, and a pre-coded avro formatting 
function called something like ... `BigQueryIOUtils.beamRowToAvroRecord()`. 
Though this is a little awkward too.
   
   ---
   Overall, I like using `AvroWriteRequest` as input for the avro format 
function... and for supporting Beam schemas, it may be that 
`useBeamSchemaForAvroFiles` (or some better name) is the more reasonable 
option.
   WDYT?
 



Issue Time Tracking
---

Worklog Id: (was: 323868)
Time Spent: 0.5h  (was: 20m)

> Implement and use an Avro coder rather than the JSON one for intermediary 
> files to be loaded in BigQuery
> 
>
> Key: BEAM-2879
> URL: https://issues.apache.org/jira/browse/BEAM-2879
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Black Phoenix
>Assignee: Steve Niemitz
>Priority: Minor
>  Labels: starter
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Before being loaded in BigQuery, temporary files are created and encoded in 
> JSON, which is a costly solution compared to an Avro alternative.





[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=323921&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323921
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 05/Oct/19 13:13
Start Date: 05/Oct/19 13:13
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-538648713
 
 
   Thanks for the thoughts!  My comments inline
   
   > This is just a brain dump of what I'm thinking...
   > 
   > I wonder whether we need the `AvroWriteRequest`, and the Avro schema. I 
guess we do, as the `InputElement` (whatever it is) + the Avro schema are all 
one needs to build the `GenericRecord`. Having the `AvroWriteRequest` may help 
make the formatting function as concise as possible
   
   I really went back and forth on this a few times.  We could use 
`SerializableBiFunction` here, but if in the future we ever wanted to add 
another parameter, it'd be a breaking change.  This way we can just add a field 
to the class.  This follows the same pattern as read does, where it takes a 
`SchemaAndRecord` as an input.  You do need both the avro schema and the 
element though in order to support more advanced cases w/ DynamicDestinations, 
etc.  Plus avro schemas themselves aren't easily serializable (until avro 1.9) 
so users can't simply create a closure over them.
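
The compatibility argument above can be sketched with plain JDK types. `WriteRequest` here is a hypothetical stand-in for `AvroWriteRequest`, and the schema is modeled as a `String` rather than a real Avro `Schema`; none of this is the actual Beam code.

```java
import java.util.function.Function;

/** Hypothetical stand-in for AvroWriteRequest: bundles what the format function needs. */
final class WriteRequest<T> {
  private final T element;
  private final String schema; // stand-in for an Avro Schema (an assumption of this sketch)

  WriteRequest(T element, String schema) {
    this.element = element;
    this.schema = schema;
  }

  T getElement() {
    return element;
  }

  String getSchema() {
    return schema;
  }

  // A later field (for example a destination) could be added here with its own getter.
  // Existing format functions would keep compiling, because their signature,
  // Function<WriteRequest<T>, R>, does not change.
}

final class FormatFunctionDemo {
  /** The format function takes the single request object, not (element, schema) separately. */
  static final Function<WriteRequest<String>, String> FORMAT =
      request -> request.getSchema() + ":" + request.getElement();

  static String format(String element, String schema) {
    return FORMAT.apply(new WriteRequest<>(element, schema));
  }

  public static void main(String[] args) {
    System.out.println(format("row-1", "schema-v1"));
  }
}
```

With a two-argument function type instead, adding a third input would change the functional interface itself and break every existing caller, which is the trade-off described above.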
   
   I do hate the name, though; if you can think of anything better, I'd love to 
rename this!

   > As for supporting Beam schemas + avro files, one could have a 
`useBeamSchemaForAvroFiles()`... though it's a little strange
   > 
   > Another option is to have `useBeamSchema`, and a pre-coded avro formatting 
function called something like ... `BigQueryIOUtils.beamRowToAvroRecord()`. 
Though this is a little awkward too.
   
   Yeah I struggled with this as well.  The only thing stopping us from having 
a version that supports beam schemas is the interface.  
`useBeamSchemaForAvroFiles` is a pretty reasonable name. 
   
   > Overall, I like using `AvroWriteRequest` as input for the avro format 
function... and for supporting Beam schemas, it may be that 
`useBeamSchemaForAvroFiles` (or some better name) is the more reasonable 
option.
   > WDYT?
   
   I'd be up for adding that in a follow-up PR.  I also have some ideas around 
`writeGenericRecords()` I want to play around with (that would also use Beam 
schemas, similar to `AvroIO.writeGenericRecords()`).
   
 



Issue Time Tracking
---

Worklog Id: (was: 323921)
Time Spent: 40m  (was: 0.5h)

> Implement and use an Avro coder rather than the JSON one for intermediary 
> files to be loaded in BigQuery
> 
>
> Key: BEAM-2879
> URL: https://issues.apache.org/jira/browse/BEAM-2879
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Black Phoenix
>Assignee: Steve Niemitz
>Priority: Minor
>  Labels: starter
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Before being loaded in BigQuery, temporary files are created and encoded in 
> JSON, which is a costly solution compared to an Avro alternative.





[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=323922&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323922
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 05/Oct/19 13:15
Start Date: 05/Oct/19 13:15
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on pull request #9665: 
[BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#discussion_r331746343
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryAvroUtilsTest.java
 ##
 @@ -243,8 +243,8 @@ public void testConvertBigQuerySchemaToAvroSchema() {
 Schema.create(Type.NULL),
 Schema.createRecord(
 "scion",
-"org.apache.beam.sdk.io.gcp.bigquery",
 "Translated Avro Schema for scion",
+"org.apache.beam.sdk.io.gcp.bigquery",
 
 Review comment:
   oh no actually, this is pretty important!  The previous version flipped the 
namespace and description parameters, resulting in an invalid namespace.  See 
the corresponding change up in BigQueryAvroUtils.
   
   Really avro's fault for making a function with 3 string parameters...
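
Why the swapped arguments matter can be sketched without Avro on the classpath. The Avro specification defines a namespace as a dot-separated sequence of names, each starting with `[A-Za-z_]` and continuing with `[A-Za-z0-9_]`, so a free-text description is not a valid namespace. The `RecordNaming` class and its `describeRecord` helper below are hypothetical; only the naming rule comes from the spec, and the empty namespace is deliberately left out to keep the sketch short.

```java
import java.util.regex.Pattern;

/** Illustrative check of the Avro namespace rule; not part of Beam or Avro. */
final class RecordNaming {
  // A name is [A-Za-z_][A-Za-z0-9_]*; a namespace is such names joined by dots.
  private static final Pattern NAMESPACE =
      Pattern.compile("[A-Za-z_][A-Za-z0-9_]*(\\.[A-Za-z_][A-Za-z0-9_]*)*");

  static boolean isValidNamespace(String candidate) {
    return NAMESPACE.matcher(candidate).matches();
  }

  /** Hypothetical helper using the (name, doc, namespace) ordering of the corrected call. */
  static String describeRecord(String name, String doc, String namespace) {
    if (!isValidNamespace(namespace)) {
      throw new IllegalArgumentException("invalid namespace: " + namespace);
    }
    return namespace + "." + name + " (" + doc + ")";
  }

  public static void main(String[] args) {
    String doc = "Translated Avro Schema for scion";
    String namespace = "org.apache.beam.sdk.io.gcp.bigquery";
    System.out.println(isValidNamespace(namespace)); // package-style string passes
    System.out.println(isValidNamespace(doc)); // free text with spaces does not
    System.out.println(describeRecord("scion", doc, namespace));
  }
}
```

Because all three parameters are plain strings, the compiler accepts either ordering; a check like this is one way such a swap could be caught at runtime.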
 



Issue Time Tracking
---

Worklog Id: (was: 323922)
Time Spent: 50m  (was: 40m)

> Implement and use an Avro coder rather than the JSON one for intermediary 
> files to be loaded in BigQuery
> 
>
> Key: BEAM-2879
> URL: https://issues.apache.org/jira/browse/BEAM-2879
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Black Phoenix
>Assignee: Steve Niemitz
>Priority: Minor
>  Labels: starter
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Before being loaded in BigQuery, temporary files are created and encoded in 
> JSON, which is a costly solution compared to an Avro alternative.





[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-10-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=323925&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-323925
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 05/Oct/19 13:21
Start Date: 05/Oct/19 13:21
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on pull request #9665: 
[BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#discussion_r331746527
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/AbstractRowWriter.java
 ##
 @@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.gcp.bigquery;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState;
+
+import com.google.api.services.bigquery.model.TableRow;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.util.UUID;
+import org.apache.beam.sdk.io.FileSystems;
+import org.apache.beam.sdk.io.fs.ResourceId;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.CountingOutputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** Writes {@link TableRow} objects out to a file. Used when doing batch load 
jobs into BigQuery. */
+abstract class AbstractRowWriter implements AutoCloseable {
 
 Review comment:
   nit: I don't usually write java, is the style convention `AbstractDerp` for 
base classes like this?  or should this just be `RowWriter`?
 



Issue Time Tracking
---

Worklog Id: (was: 323925)
Time Spent: 1h  (was: 50m)

> Implement and use an Avro coder rather than the JSON one for intermediary 
> files to be loaded in BigQuery
> 
>
> Key: BEAM-2879
> URL: https://issues.apache.org/jira/browse/BEAM-2879
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Black Phoenix
>Assignee: Steve Niemitz
>Priority: Minor
>  Labels: starter
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Before being loaded in BigQuery, temporary files are created and encoded in 
> JSON, which is a costly solution compared to an Avro alternative.





[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-10-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=327023&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-327023
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 11/Oct/19 18:13
Start Date: 11/Oct/19 18:13
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#discussion_r334114662
 
 

 ##
 File path: 
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/AbstractRowWriter.java
 ##
 @@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.gcp.bigquery;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState;
+
+import com.google.api.services.bigquery.model.TableRow;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.util.UUID;
+import org.apache.beam.sdk.io.FileSystems;
+import org.apache.beam.sdk.io.fs.ResourceId;
+import 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.CountingOutputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** Writes {@link TableRow} objects out to a file. Used when doing batch load 
jobs into BigQuery. */
+abstract class AbstractRowWriter implements AutoCloseable {
 
 Review comment:
   In my experience, the Abstract is usually not added to the abstract class. I 
am okay with either.
 



Issue Time Tracking
---

Worklog Id: (was: 327023)
Time Spent: 1h 10m  (was: 1h)

> Implement and use an Avro coder rather than the JSON one for intermediary 
> files to be loaded in BigQuery
> 
>
> Key: BEAM-2879
> URL: https://issues.apache.org/jira/browse/BEAM-2879
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Black Phoenix
>Assignee: Steve Niemitz
>Priority: Minor
>  Labels: starter
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Before being loaded in BigQuery, temporary files are created and encoded in 
> JSON, which is a costly solution compared to an Avro alternative.





[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=332337&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332337
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 23/Oct/19 00:31
Start Date: 23/Oct/19 00:31
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9665: [BEAM-2879] Support 
writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-545211844
 
 
   LMK if you'd like me to take another look.
 



Issue Time Tracking
---

Worklog Id: (was: 332337)
Time Spent: 1h 20m  (was: 1h 10m)

> Implement and use an Avro coder rather than the JSON one for intermediary 
> files to be loaded in BigQuery
> 
>
> Key: BEAM-2879
> URL: https://issues.apache.org/jira/browse/BEAM-2879
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Black Phoenix
>Assignee: Steve Niemitz
>Priority: Minor
>  Labels: starter
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Before being loaded in BigQuery, temporary files are created and encoded in 
> JSON, which is a costly solution compared to an Avro alternative.





[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-10-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=332340&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-332340
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 23/Oct/19 00:33
Start Date: 23/Oct/19 00:33
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-545212207
 
 
   > LMK if you'd like me to take another look.
   
   oh, yeah please do.  I don't have much more from my end other than renaming 
the class above.
 



Issue Time Tracking
---

Worklog Id: (was: 332340)
Time Spent: 1.5h  (was: 1h 20m)

> Implement and use an Avro coder rather than the JSON one for intermediary 
> files to be loaded in BigQuery
> 
>
> Key: BEAM-2879
> URL: https://issues.apache.org/jira/browse/BEAM-2879
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Black Phoenix
>Assignee: Steve Niemitz
>Priority: Minor
>  Labels: starter
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Before being loaded in BigQuery, temporary files are created and encoded in 
> JSON, which is a costly solution compared to an Avro alternative.





[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-10-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=333695&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-333695
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 24/Oct/19 20:03
Start Date: 24/Oct/19 20:03
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9665: [BEAM-2879] Support 
writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-546079959
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 333695)
Time Spent: 1h 40m  (was: 1.5h)

> Implement and use an Avro coder rather than the JSON one for intermediary 
> files to be loaded in BigQuery
> 
>
> Key: BEAM-2879
> URL: https://issues.apache.org/jira/browse/BEAM-2879
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Black Phoenix
>Assignee: Steve Niemitz
>Priority: Minor
>  Labels: starter
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Before being loaded in BigQuery, temporary files are created and encoded in 
> JSON, which is a costly solution compared to an Avro alternative.





[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=335100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335100
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 28/Oct/19 18:35
Start Date: 28/Oct/19 18:35
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9665: [BEAM-2879] Support 
writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-547087321
 
 
   ok this LGTM. @chamikaramj - thoughts?
 



Issue Time Tracking
---

Worklog Id: (was: 335100)
Time Spent: 1h 50m  (was: 1h 40m)

> Implement and use an Avro coder rather than the JSON one for intermediary 
> files to be loaded in BigQuery
> 
>
> Key: BEAM-2879
> URL: https://issues.apache.org/jira/browse/BEAM-2879
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Black Phoenix
>Assignee: Steve Niemitz
>Priority: Minor
>  Labels: starter
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Before being loaded in BigQuery, temporary files are created and encoded in 
> JSON, which is a costly solution compared to an Avro alternative.





[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=335108&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335108
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 28/Oct/19 18:44
Start Date: 28/Oct/19 18:44
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-547091022
 
 
   Thanks!  I'll do the class rename asap too.
 



Issue Time Tracking
---

Worklog Id: (was: 335108)
Time Spent: 2h  (was: 1h 50m)

> Implement and use an Avro coder rather than the JSON one for intermediary 
> files to be loaded in BigQuery
> 
>
> Key: BEAM-2879
> URL: https://issues.apache.org/jira/browse/BEAM-2879
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Reporter: Black Phoenix
>Assignee: Steve Niemitz
>Priority: Minor
>  Labels: starter
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Before being loaded in BigQuery, temporary files are created and encoded in 
> JSON, which is a costly solution compared to an Avro alternative.





[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-10-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=335810&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335810
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 29/Oct/19 22:58
Start Date: 29/Oct/19 22:58
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-547665959
 
 
   Thanks!  You can merge this now and I'll do the rename in another PR, or I 
can get to it tomorrow, up to you!
 



Issue Time Tracking
---

Worklog Id: (was: 335810)
Time Spent: 2h 10m  (was: 2h)



[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-10-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=335815&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335815
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 29/Oct/19 23:07
Start Date: 29/Oct/19 23:07
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9665: [BEAM-2879] Support 
writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-547668425
 
 
   I don't think we strictly need the rename. But I do want to wait for 
@chamikaramj to take a look : ) - I don't think he'll have objections, but just 
in case he thinks of any improvements.
 



Issue Time Tracking
---

Worklog Id: (was: 335815)
Time Spent: 2h 20m  (was: 2h 10m)



[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-10-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=335844&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335844
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 30/Oct/19 00:49
Start Date: 30/Oct/19 00:49
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #9665: [BEAM-2879] 
Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#issuecomment-547691457
 
 
   Sorry about the delay. Taking a look.
 



Issue Time Tracking
---

Worklog Id: (was: 335844)
Time Spent: 2.5h  (was: 2h 20m)



[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2019-10-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=335893&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-335893
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 30/Oct/19 02:27
Start Date: 30/Oct/19 02:27
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #9665: 
[BEAM-2879] Support writing data to BigQuery via avro
URL: https://github.com/apache/beam/pull/9665#discussion_r340405122
 
 

 ##
 File path: sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/AbstractRowWriter.java
 ##
 @@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.gcp.bigquery;
+
+import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState;
+
+import com.google.api.services.bigquery.model.TableRow;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.nio.channels.Channels;
+import java.nio.channels.WritableByteChannel;
+import java.util.UUID;
+import org.apache.beam.sdk.io.FileSystems;
+import org.apache.beam.sdk.io.fs.ResourceId;
+import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.io.CountingOutputStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** Writes {@link TableRow} objects out to a file. Used when doing batch load jobs into BigQuery. */
+abstract class AbstractRowWriter implements AutoCloseable {
 
 Review comment:
   +1 for just RowWriter or BigQueryRowWriter to be more specific.
 



Issue Time Tracking
---

Worklog Id: (was: 335893)
Time Spent: 2h 40m  (was: 2.5h)



[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2020-01-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=368351&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-368351
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 08/Jan/20 18:44
Start Date: 08/Jan/20 18:44
Worklog Time Spent: 10m 
  Work Description: steveniemitz commented on issue #10527: [BEAM-2879] 
Metric name should not be constant
URL: https://github.com/apache/beam/pull/10527#issuecomment-572202864
 
 
   seems reasonable.
 



Issue Time Tracking
---

Worklog Id: (was: 368351)
Time Spent: 5h 50m  (was: 5h 40m)



[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2020-01-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=368352&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-368352
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 08/Jan/20 18:51
Start Date: 08/Jan/20 18:51
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #10527: [BEAM-2879] Metric 
name should not be constant
URL: https://github.com/apache/beam/pull/10527#issuecomment-571841009
 
 
   CC: @apilloud 
   Could you please run the following Jenkins jobs/tests?
   `Run BigQueryIO Batch Performance Test Java Avro`
 



Issue Time Tracking
---

Worklog Id: (was: 368352)
Time Spent: 6h  (was: 5h 50m)



[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=369256&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369256
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 09/Jan/20 18:57
Start Date: 09/Jan/20 18:57
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #10527: [BEAM-2879] Metric 
name should not be constant
URL: https://github.com/apache/beam/pull/10527#issuecomment-572703985
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 369256)
Time Spent: 6h 10m  (was: 6h)



[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=369340&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369340
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 09/Jan/20 20:27
Start Date: 09/Jan/20 20:27
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #10527: [BEAM-2879] Metric 
name should not be constant
URL: https://github.com/apache/beam/pull/10527#issuecomment-572740318
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 369340)
Time Spent: 6h 20m  (was: 6h 10m)



[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=369382&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369382
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 09/Jan/20 21:34
Start Date: 09/Jan/20 21:34
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #10527: [BEAM-2879] Metric 
name should not be constant
URL: https://github.com/apache/beam/pull/10527#issuecomment-572767018
 
 
   R: @apilloud 
 



Issue Time Tracking
---

Worklog Id: (was: 369382)
Time Spent: 6.5h  (was: 6h 20m)



[jira] [Work logged] (BEAM-2879) Implement and use an Avro coder rather than the JSON one for intermediary files to be loaded in BigQuery

2020-01-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2879?focusedWorklogId=369446&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-369446
 ]

ASF GitHub Bot logged work on BEAM-2879:


Author: ASF GitHub Bot
Created on: 09/Jan/20 22:53
Start Date: 09/Jan/20 22:53
Worklog Time Spent: 10m 
  Work Description: apilloud commented on pull request #10527: [BEAM-2879] 
Metric name should not be constant
URL: https://github.com/apache/beam/pull/10527
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 369446)
Time Spent: 6h 40m  (was: 6.5h)
