[ https://issues.apache.org/jira/browse/BEAM-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
D Bodych updated BEAM-8257:
---------------------------
    Affects Version/s:     (was: 2.15.0)
                       2.21.0

> BigQueryIO - only first day table can be created, despite having
> CreateDisposition.CREATE_IF_NEEDED
> ---------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-8257
>                 URL: https://issues.apache.org/jira/browse/BEAM-8257
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>    Affects Versions: 2.21.0
>         Environment: Dataflow streaming pipeline
>            Reporter: D Bodych
>            Priority: P2
>
> I have a Dataflow job processing data from Pub/Sub, defined like this:
> *read from pub/sub -> process (my function) -> group into day windows ->
> write to BQ*
> I'm using *Write.Method.FILE_LOADS* because of bounded input.
> My job works fine and processes many GBs of data, but it fails and retries
> forever when it has to create another table. The job is meant to run
> continuously and create day tables on its own. It does fine on the first few,
> but then reports indefinitely:
> {code:java}
> Processing stuck in step
> write-bq/BatchLoads/SinglePartitionWriteTables/ParMultiDo(WriteTables) for at
> least 05h30m00s without outputting or completing in state finish{code}
> Before this happens it also throws:
> {code:java}
> Load job <job_id> failed, will retry: {"errorResult": {"message":"Not found:
> Table <name_of_table> was not found in location US","reason":"notFound"}
> {code}
> The error itself is correct, because this table doesn't exist. The problem is
> that the job should create it on its own, since
> *CreateDisposition.CREATE_IF_NEEDED* is set.
> The number of day tables that it creates without a problem depends on the
> number of workers. It seems that once a worker creates one table, its
> *CreateDisposition* changes to *CREATE_NEVER*, causing the problem, but that is
> only my guess.
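The guess above can be made concrete with a small, self-contained model. This is only a sketch of the suspected behaviour, not Beam's actual code: `DayTableModel`, `Worker`, `load` and the table names are all hypothetical, and the set of strings merely stands in for BigQuery. It shows why, if each worker dropped to CREATE_NEVER after its first successful load, the number of tables created would be capped by the number of workers.

```java
import java.util.HashSet;
import java.util.Set;

public class DayTableModel {
    enum CreateDisposition { CREATE_IF_NEEDED, CREATE_NEVER }

    // Hypothetical worker that models the reporter's guess; NOT Beam's WriteTables.
    static class Worker {
        private final Set<String> existingTables; // stand-in for the tables in BigQuery
        CreateDisposition disposition = CreateDisposition.CREATE_IF_NEEDED;

        Worker(Set<String> existingTables) {
            this.existingTables = existingTables;
        }

        // Returns true if the simulated load succeeds, false on a simulated "notFound".
        boolean load(String table) {
            if (!existingTables.contains(table)) {
                if (disposition == CreateDisposition.CREATE_NEVER) {
                    return false; // models "Not found: Table ... was not found"
                }
                existingTables.add(table); // CREATE_IF_NEEDED: create the missing table
            }
            // The guessed state change: after one successful load, never create again.
            disposition = CreateDisposition.CREATE_NEVER;
            return true;
        }
    }

    public static void main(String[] args) {
        Set<String> tables = new HashSet<>();
        Worker w1 = new Worker(tables);
        Worker w2 = new Worker(tables);

        System.out.println(w1.load("events_day1")); // true: w1 creates the table
        System.out.println(w2.load("events_day2")); // true: w2 creates the table
        System.out.println(w1.load("events_day3")); // false: w1 is now CREATE_NEVER
    }
}
```

With two modelled workers, two day tables are created and the third load fails, which matches the observation that the count of successfully created tables tracks the worker count. Whether Beam really behaves this way is exactly what this issue asks.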
> A similar problem was reported here, but without a definite answer:
> https://issues.apache.org/jira/browse/BEAM-3772?focusedCommentId=16387609&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16387609
> The ProcessElement definition here seems to give some clues, but I cannot
> really say how it works with multiple workers:
> [https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java#L138]
> I use Apache Beam SDK 2.15.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)