[ 
https://issues.apache.org/jira/browse/BEAM-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

D Bodych updated BEAM-8257:
---------------------------
    Affects Version/s:     (was: 2.15.0)
                       2.21.0

> BigQueryIO - only first day table can be created, despite having 
> CreateDisposition.CREATE_IF_NEEDED
> ---------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-8257
>                 URL: https://issues.apache.org/jira/browse/BEAM-8257
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>    Affects Versions: 2.21.0
>         Environment:  Dataflow streaming pipeline 
>            Reporter: D Bodych
>            Priority: P2
>
> I have a Dataflow job that processes data from Pub/Sub, defined like this:
> *read from pub/sub -> process (my function) -> group into day windows -> 
> write to BQ*
> I'm using *Write.Method.FILE_LOADS* because of bounded input.
> My job works fine, processing many GBs of data, but it fails and retries 
> forever when it has to create another table. The job is meant to run 
> continuously and create day tables on its own. It handles the first few 
> correctly but then reports indefinitely:
> {code:java}
> Processing stuck in step 
> write-bq/BatchLoads/SinglePartitionWriteTables/ParMultiDo(WriteTables) for at 
> least 05h30m00s without outputting or completing in state finish{code}
> Before this happens it also throws: 
> {code:java}
> Load job <job_id> failed, will retry: {"errorResult": {"message":"Not found: 
> Table <name_of_table> was not found in location US","reason":"notFound"}
> {code}
> The error itself is correct, because this table doesn't exist. The problem is 
> that the job should create it on its own, given the configured option 
> *CreateDisposition.CREATE_IF_NEEDED*.
> The number of day tables that it creates correctly depends on the number of 
> workers. It seems that once a worker creates one table, its 
> *CreateDisposition* changes to *CREATE_NEVER*, causing the problem, but that 
> is only my guess.
> A similar problem was reported here, but without any definite answer:
>  
> https://issues.apache.org/jira/browse/BEAM-3772?focusedCommentId=16387609&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16387609
> The ProcessElement definition here seems to give some clues, but I cannot 
> really say how it works with multiple workers: 
> [https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java#L138]
> I use Apache Beam SDK 2.15.0.
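
For readers unfamiliar with the "day tables" mentioned in the description, the routing from a day window to a per-day table can be sketched roughly as below. This is only an illustration under stated assumptions: the report does not give the actual table names, so the `events` prefix, the `<prefix>_<yyyyMMdd>` scheme, the use of UTC, and the `DayTableName` / `dayTable` names are all hypothetical. In a real pipeline, a helper like this would typically be called from the table function passed to `BigQueryIO.Write.to(...)`. The sketch uses only the JDK and does not touch Beam.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class DayTableName {
    // Hypothetical naming scheme (not taken from the report):
    // <prefix>_<yyyyMMdd>, with the calendar day evaluated in UTC.
    private static final DateTimeFormatter DAY =
        DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    // Maps the start of a day window to the name of the table
    // that should receive that window's rows.
    public static String dayTable(String prefix, Instant windowStart) {
        return prefix + "_" + DAY.format(windowStart);
    }

    public static void main(String[] args) {
        // Two timestamps one second apart, on either side of midnight UTC,
        // resolve to two different day tables.
        System.out.println(dayTable("events", Instant.parse("2019-09-17T23:59:59Z")));
        System.out.println(dayTable("events", Instant.parse("2019-09-18T00:00:00Z")));
    }
}
```

Each new calendar day yields a table name that has never been written before, which is exactly the point at which the reporter expects *CreateDisposition.CREATE_IF_NEEDED* to create the table.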



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
