[ https://issues.apache.org/jira/browse/BEAM-7693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ismaël Mejía updated BEAM-7693:
-------------------------------
    Status: Open  (was: Triage Needed)

> FILE_LOADS option for inserting rows in BigQuery creates a stuck process in
> Dataflow that saturates all the resources of the Job
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-7693
>                 URL: https://issues.apache.org/jira/browse/BEAM-7693
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-files
>    Affects Versions: 2.13.0
>         Environment: Dataflow
>            Reporter: Juan Urrego
>            Priority: Major
>         Attachments: Screenshot 2019-07-06 at 15.05.04.png, image-2019-07-06-15-04-17-593.png
>
>
> During a streaming job, when you insert records into BigQuery in batches using the FILE_LOADS option and one of the load jobs fails, the thread that failed gets stuck and eventually saturates the job's resources, making the autoscaling option useless (the job uses the maximum number of workers and the system latency keeps going up). In some cases it has become ridiculously slow at processing the incoming events.
> Here is an example:
> {code:java}
> BigQueryIO.writeTableRows()
>     .to(destinationTableSerializableFunction)
>     .withMethod(Method.FILE_LOADS)
>     .withJsonSchema(tableSchema)
>     .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
>     .withWriteDisposition(WriteDisposition.WRITE_APPEND)
>     .withTriggeringFrequency(Duration.standardMinutes(5))
>     .withNumFileShards(25);
> {code}
> The pipeline works like a charm, but the moment I send a bad TableRow (for instance, a required field set to null) the pipeline starts emitting these messages:
> {code:java}
> Processing stuck in step FILE_LOADS: <StepName> in BigQuery/BatchLoads/SinglePartitionWriteTables/ParMultiDo(WriteTables) for at least 10m00s without outputting or completing in state finish
>   at java.lang.Thread.sleep(Native Method)
>   at com.google.api.client.util.Sleeper$1.sleep(Sleeper.java:42)
>   at com.google.api.client.util.BackOffUtils.next(BackOffUtils.java:48)
>   at org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJobManager.nextBackOff(BigQueryHelpers.java:159)
>   at org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers$PendingJobManager.waitForDone(BigQueryHelpers.java:145)
>   at org.apache.beam.sdk.io.gcp.bigquery.WriteTables$WriteTablesDoFn.finishBundle(WriteTables.java:255)
>   at org.apache.beam.sdk.io.gcp.bigquery.WriteTables$WriteTablesDoFn$DoFnInvoker.invokeFinishBundle(Unknown Source)
> {code}
> It is clear that the step keeps running even though it has failed. The BigQuery job reports that the load failed, but Dataflow keeps waiting for a response, even though the job is never executed again.
> !image-2019-07-06-15-04-17-593.png|width=497,height=134!
> At the same time, no message is sent to the DropInputs step; even though I created my own dead-letter step, the process thinks the element hasn't failed yet.
> !Screenshot 2019-07-06 at 15.05.04.png|width=490,height=306!
> The only option I have found so far is to pre-validate all the fields beforehand, but I was expecting the database to do that for me, especially in some edge cases (like decimal numbers or constraint limitations). Please help fix this issue; otherwise the batch option in streaming jobs is almost useless, because I can't trust the library itself to handle dead letters properly. (A sketch of the pre-validation workaround follows this message.)
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
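For reference, here is a minimal sketch of the pre-validation / dead-letter workaround described in the report: a multi-output ParDo that checks each TableRow before it reaches the FILE_LOADS write and routes invalid rows to a separate output. The class name ValidateRows, the field name "requiredField", and the validation rule itself are illustrative assumptions, not part of the original report or of BigQueryIO.

{code:java}
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

public class ValidateRows {
  // Output tags: rows that pass validation vs. rows routed to a dead letter.
  public static final TupleTag<TableRow> VALID = new TupleTag<TableRow>() {};
  public static final TupleTag<TableRow> INVALID = new TupleTag<TableRow>() {};

  public static PCollectionTuple validate(PCollection<TableRow> rows) {
    return rows.apply("PreValidateRows",
        ParDo.of(new DoFn<TableRow, TableRow>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            TableRow row = c.element();
            // Hypothetical rule: reject rows whose required field is missing
            // so they never reach the FILE_LOADS write; adjust to the real schema.
            if (row.get("requiredField") != null) {
              c.output(row);
            } else {
              c.output(INVALID, row);
            }
          }
        }).withOutputTags(VALID, TupleTagList.of(INVALID)));
  }
}
{code}

Only the VALID output would then be applied to the BigQueryIO.writeTableRows() transform shown above, while the INVALID output can be written to whatever dead-letter sink the pipeline already uses. This only works around the symptom; the stuck WriteTables step itself still needs to be addressed.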