[ https://issues.apache.org/jira/browse/BEAM-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127687#comment-16127687 ]

Eugene Kirpichov commented on BEAM-2768:
----------------------------------------

Could you tell us more about how you're using BigQueryIO.Write? It has many
modes, so it would be best if you could show a code snippet where you apply
BigQueryIO.write() in your pipeline, with all personal data removed but showing
exactly which BigQueryIO API methods you're using. Also, exactly which version
of the Beam SDK are you using? Your links point to the master branch, but the
bug description says 2.0.0, and these versions have very different
implementations of BigQueryIO.Write.

Looking at the current code, the job id *does* contain a random UUID that comes 
from 
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java#L348.
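To illustrate why a per-run random UUID keeps job ids from colliding, here is a
minimal sketch of one possible job-id scheme. It is hypothetical: the class
LoadJobIds, the "beam_load_" prefix and the <token>_<tableHash>_<partition>
layout are assumptions for illustration, not the format BatchLoads actually
uses. It relies only on BigQuery job ids being allowed to contain letters,
digits, underscores and dashes, and on java.util.UUID.randomUUID() returning a
fresh random value on each call.

{code:java}
import java.util.UUID;

/**
 * Hypothetical job-id scheme (illustrative only, not Beam's implementation):
 * one random token per pipeline run, combined with the destination table and
 * partition index of each load job.
 */
final class LoadJobIds {

  /** Drawn once per pipeline run; dashes are stripped to keep the id compact. */
  static String newRunToken() {
    return "beam_load_" + UUID.randomUUID().toString().replace("-", "");
  }

  /**
   * Same inputs give the same id, so a retried partition within one run reuses
   * its id; a new run token gives a different id for the same table/partition.
   */
  static String jobId(String runToken, String tableSpec, int partition) {
    // Hash the table spec so characters such as ':' and '.' never reach the id.
    String tableHash = Integer.toHexString(tableSpec.hashCode());
    return runToken + "_" + tableHash + "_" + partition;
  }

  public static void main(String[] args) {
    String table = "my-project:my_dataset.my_table"; // hypothetical table spec
    String run1 = newRunToken();
    String run2 = newRunToken();
    // Two runs loading the same table and partition get different job ids.
    System.out.println(jobId(run1, table, 0));
    System.out.println(jobId(run2, table, 0));
  }
}
{code}

In this sketch the id stays deterministic within a run, so resubmitting the
same partition does not start a second, duplicate load, while the fresh token
prevents a later run from reusing an earlier run's ids.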

> Fix bigquery.WriteTables generating non-unique job identifiers
> --------------------------------------------------------------
>
>                 Key: BEAM-2768
>                 URL: https://issues.apache.org/jira/browse/BEAM-2768
>             Project: Beam
>          Issue Type: Bug
>          Components: beam-model
>    Affects Versions: 2.0.0
>            Reporter: Matti Remes
>            Assignee: Reuven Lax
>
> BigQueryIO does not create unique job ids for batch inserts, so the BigQuery
> API responds with a 409 Conflict error:
> {code:java}
> Request failed with code 409, will NOT retry: 
> https://www.googleapis.com/bigquery/v2/projects/<project_id>/jobs
> {code}
> The jobs are initiated in the step BatchLoads/SinglePartitionWriteTables,
> by that step's WriteTables ParDo:
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java#L511-L521
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java#L148
> It would probably be a good idea to append a UUID to the job id.
> Edit: This is a major bug that blocks using BigQuery as a sink for bounded input.
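A toy model can make the reported failure mode concrete. The sketch below does
not use the BigQuery client library: JobIdConflictDemo and FakeJobsEndpoint are
hypothetical stand-ins that, like the jobs endpoint in the report, answer 409
when a job id is reused. It compares ids derived only from the step name and
partition with ids that carry a per-run UUID, as the reporter proposes.

{code:java}
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/** Toy model of the 409 failure mode; not the real BigQuery API. */
final class JobIdConflictDemo {

  /** Hypothetical endpoint that remembers every job id it has accepted. */
  static final class FakeJobsEndpoint {
    private final Set<String> seen = new HashSet<>();

    /** Returns an HTTP-like status: 200 for a new id, 409 for a reused one. */
    int insert(String jobId) {
      return seen.add(jobId) ? 200 : 409;
    }
  }

  /** Depends only on step and partition, so it repeats across pipeline runs. */
  static String deterministicId(String step, int partition) {
    return step + "_" + partition;
  }

  /** The same id with a per-run UUID appended, as suggested in the report. */
  static String uuidSuffixedId(String step, int partition, String runUuid) {
    return step + "_" + partition + "_" + runUuid;
  }

  public static void main(String[] args) {
    FakeJobsEndpoint endpoint = new FakeJobsEndpoint();
    String step = "SinglePartitionWriteTables";

    // Two runs loading partition 0 with deterministic ids: the second is rejected.
    System.out.println(endpoint.insert(deterministicId(step, 0))); // 200
    System.out.println(endpoint.insert(deterministicId(step, 0))); // 409

    // Two runs with UUID-suffixed ids: each run draws its own UUID.
    String run1 = UUID.randomUUID().toString().replace("-", "");
    String run2 = UUID.randomUUID().toString().replace("-", "");
    System.out.println(endpoint.insert(uuidSuffixedId(step, 0, run1))); // 200
    System.out.println(endpoint.insert(uuidSuffixedId(step, 0, run2))); // 200
  }
}
{code}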



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
