[ https://issues.apache.org/jira/browse/BEAM-7693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Juan Urrego updated BEAM-7693:
------------------------------
    Attachment: Screenshot 2019-07-06 at 14.51.36.png

> FILE_LOADS option for inserting rows in BigQuery creates a stuck process in
> Dataflow that saturates all the resources of the Job
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-7693
>                 URL: https://issues.apache.org/jira/browse/BEAM-7693
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-files
>    Affects Versions: 2.13.0
>        Environment: Dataflow
>            Reporter: Juan Urrego
>            Priority: Major
>        Attachments: Screenshot 2019-07-06 at 14.51.36.png
>
> During a streaming job, when you insert records into BigQuery in batch using
> the FILE_LOADS option and one of the load jobs fails, the thread that failed
> gets stuck and eventually saturates the job's resources, making the
> autoscaling option useless (the job uses the maximum number of workers and
> the system latency keeps growing). In some cases the pipeline becomes
> ridiculously slow at processing the incoming events.
> Here is an example:
> {code:java}
> BigQueryIO.writeTableRows()
>     .to(destinationTableSerializableFunction)
>     .withMethod(Method.FILE_LOADS)
>     .withJsonSchema(tableSchema)
>     .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
>     .withWriteDisposition(WriteDisposition.WRITE_APPEND)
>     .withTriggeringFrequency(Duration.standardMinutes(5))
>     .withNumFileShards(25);
> {code}
> The pipeline works like a charm, but the moment I send an invalid TableRow
> (for instance, a required field set to null) the pipeline starts emitting
> these messages:
> {code:java}
> Processing stuck in step FILE_LOADS: <StepName> in
> BigQuery/BatchLoads/SinglePartitionWriteTables/ParMultiDo(WriteTables) for at
> least 10m00s without outputting or completing in state finish
>   at java.lang.Thread.sleep(Native Method)
>   at com.google.api.client.util.Sleeper$1.sleep(Sleeper.java:42)
>   at com.google.api.client.util.BackOffUtils.next(BackOffUtils.java:48)
>   at org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJobManager.nextBackOff(BigQueryHelpers.java:159)
>   at org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJobManager.waitForDone(BigQueryHelpers.java:145)
>   at org.apache.beam.sdk.io.gcp.bigquery.WriteTables$WriteTablesDoFn.finishBundle(WriteTables.java:255)
>   at org.apache.beam.sdk.io.gcp.bigquery.WriteTables$WriteTablesDoFn$DoFnInvoker.invokeFinishBundle(Unknown Source)
> {code}
> It's clear that the step keeps running even after it has failed. The BigQuery
> job reports that the load failed, but Dataflow keeps waiting for a response,
> even though the job is never executed again. At the same time, no message is
> sent to the DropInputs step; even though I created my own dead-letter step,
> the process thinks that nothing has failed yet.
> The only option I have found so far is to pre-validate all the fields before
> writing (see the validation sketch after this message), but I was expecting
> the database to do that for me, especially in some extreme cases (like
> decimal numbers or constraint limitations). Please help fix this issue;
> otherwise the batch option in streaming jobs is almost useless, because I
> can't trust the library itself to manage dead letters properly.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
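For reference, the pre-validation workaround mentioned in the report can be sketched as a ParDo that routes rows with missing required fields to a dead-letter output before BigQueryIO ever sees them. This is only a minimal sketch and not part of the original report: the ValidateRows class name, the REQUIRED_FIELDS list, and the tag names are placeholders to adapt to your own schema and dead-letter sink.

{code:java}
import java.util.Arrays;
import java.util.List;

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

public class ValidateRows {

  // Placeholder list of required columns; in practice derive it from the table schema.
  private static final List<String> REQUIRED_FIELDS = Arrays.asList("id", "created_at");

  public static final TupleTag<TableRow> VALID = new TupleTag<TableRow>() {};
  public static final TupleTag<TableRow> DEAD_LETTER = new TupleTag<TableRow>() {};

  /** Splits incoming rows into a valid output and a dead-letter output. */
  public static PCollectionTuple validate(PCollection<TableRow> rows) {
    return rows.apply("ValidateRows",
        ParDo.of(new DoFn<TableRow, TableRow>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            TableRow row = c.element();
            boolean valid = REQUIRED_FIELDS.stream().allMatch(f -> row.get(f) != null);
            if (valid) {
              c.output(row);              // forwarded to BigQueryIO with FILE_LOADS
            } else {
              c.output(DEAD_LETTER, row); // routed to your own dead-letter sink
            }
          }
        }).withOutputTags(VALID, TupleTagList.of(DEAD_LETTER)));
  }
}
{code}

The main output (result.get(ValidateRows.VALID)) would then feed the BigQueryIO.writeTableRows() transform shown in the report, while result.get(ValidateRows.DEAD_LETTER) can be written to an error table or another sink of your choice.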