[ https://issues.apache.org/jira/browse/BEAM-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127852#comment-16127852 ]

Matti Remes commented on BEAM-2768:
-----------------------------------

{code:java}
    public static void loadRowsToBigQuery(String name, PCollection<TableRow> rows,
            DynamicDestinations<TableRow, String> destination) {
        rows.apply(name, BigQueryIO.<TableRow>write()
                .withFormatFunction(new TableRowFormatter())
                .to(destination)
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
    }

// Identity format function: the elements are already TableRows.
public class TableRowFormatter implements SerializableFunction<TableRow, TableRow> {
    @Override
    public TableRow apply(TableRow tableRow) {
        return tableRow;
    }
}
{code}

Apologies for the references; yes, I was intending to point to the 2.0.0 source (I'm using 2.0.0).

The problem might be in the way the UUID is created and stored. The code states that the generated UUID "will be used as the base for all load jobs issued from this instance of the transform":
https://github.com/apache/beam/blob/v2.0.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java#L184

I can indeed confirm from the logs that the job id is the same.
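A possible direction for a fix (a minimal sketch only, not Beam's actual implementation; the class and method names `JobIdGenerator` / `uniqueJobId` are my own): mix a fresh UUID into each load-job id so that retries and per-partition jobs can no longer collide on the same id.

```java
import java.util.UUID;

public class JobIdGenerator {
    // Hypothetical helper: derives a load-job id from the shared base id,
    // the partition number, and a fresh UUID, so every call yields a
    // distinct id and the BigQuery API cannot answer 409 for a reused one.
    public static String uniqueJobId(String baseJobId, int partition) {
        String suffix = UUID.randomUUID().toString().replace("-", "");
        return baseJobId + "_" + partition + "_" + suffix;
    }
}
```

Two calls with identical arguments then produce two distinct job ids, which is exactly the property the current shared-base scheme lacks.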

> Fix bigquery.WriteTables generating non-unique job identifiers
> --------------------------------------------------------------
>
>                 Key: BEAM-2768
>                 URL: https://issues.apache.org/jira/browse/BEAM-2768
>             Project: Beam
>          Issue Type: Bug
>          Components: beam-model
>    Affects Versions: 2.0.0
>            Reporter: Matti Remes
>            Assignee: Reuven Lax
>
> This is a result of BigQueryIO not creating unique job ids for batch inserts,
> so the BigQuery API responds with a 409 conflict error:
> {code:java}
> Request failed with code 409, will NOT retry: 
> https://www.googleapis.com/bigquery/v2/projects/<project_id>/jobs
> {code}
> The jobs are initiated in the step BatchLoads/SinglePartitionWriteTables,
> by that step's WriteTables ParDo:
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java#L511-L521
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java#L148
> It would probably be a good idea to append a UUID as part of the job id.
> Edit: This is a major bug blocking using BigQuery as a sink for bounded input.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)