[
https://issues.apache.org/jira/browse/BEAM-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17548227#comment-17548227
]
Danny McCormick commented on BEAM-8257:
---------------------------------------
This issue has been migrated to https://github.com/apache/beam/issues/19872
> BigQueryIO - only first day table can be created, despite having
> CreateDisposition.CREATE_IF_NEEDED
> ---------------------------------------------------------------------------------------------------
>
> Key: BEAM-8257
> URL: https://issues.apache.org/jira/browse/BEAM-8257
> Project: Beam
> Issue Type: Bug
> Components: io-java-gcp
> Affects Versions: 2.21.0
> Environment: Dataflow streaming pipeline
> Reporter: D Bodych
> Priority: P3
>
> I have a Dataflow job that processes data from Pub/Sub, defined like this:
> *read from pub/sub -> process (my function) -> group into day windows ->
> write to BQ*
> I'm using *Write.Method.FILE_LOADS* because of bounded input.
> My job works fine and processes many GBs of data, but it fails and retries
> forever when it has to create another table. The job is meant to run
> continuously and create day tables on its own. It handles the first few
> tables fine, but then reports the following indefinitely:
> {code:java}
> Processing stuck in step
> write-bq/BatchLoads/SinglePartitionWriteTables/ParMultiDo(WriteTables) for at
> least 05h30m00s without outputting or completing in state finish{code}
> Before this happens it also throws:
> {code:java}
> Load job <job_id> failed, will retry: {"errorResult": {"message":"Not found:
> Table <name_of_table> was not found in location US","reason":"notFound"}
> {code}
> The error itself is correct, because this table doesn't exist. The problem is
> that the job should create it on its own, given the configured option
> *CreateDisposition.CREATE_IF_NEEDED*.
> The number of day tables that it creates without a problem depends
> on the number of workers. It seems that once a worker creates one table, its
> *CreateDisposition* changes to *CREATE_NEVER*, which causes the problem, but
> that is only my guess.
> A similar problem was reported here, but without a definite answer:
>
> https://issues.apache.org/jira/browse/BEAM-3772?focusedCommentId=16387609&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16387609
> The ProcessElement definition here seems to give some clues, but I cannot
> really say how it works with multiple workers:
> [https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java#L138]
> I use Apache Beam SDK 2.15.0.
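
For illustration only, here is a minimal sketch of how a day-windowed write ends up targeting a new table each day, which is why the sink has to be able to create a table more than once over the life of the job. It uses only the Java standard library and is not Beam code. The class name DayTableName, the method tableFor, the "events_" prefix, the UTC zone, and the yyyyMMdd suffix are all assumptions made for this example; the report does not say how the table names are actually built.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class DayTableName {
    // Assumed naming scheme: <prefix> followed by the UTC day as yyyyMMdd.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    // Maps the start of a one-day window to a (hypothetical) table name.
    static String tableFor(String prefix, Instant windowStart) {
        return prefix + DAY.format(windowStart);
    }

    public static void main(String[] args) {
        Instant day1 = Instant.parse("2019-09-17T00:00:00Z");
        Instant day2 = day1.plusSeconds(86_400); // start of the next day window

        // Each day window resolves to a different destination table.
        System.out.println(tableFor("events_", day1));
        System.out.println(tableFor("events_", day2));
    }
}
```

In a real pipeline a function of this kind would typically be passed to the sink as a per-window destination function, so every new day produces a table name that has not been written to before.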