[ https://issues.apache.org/jira/browse/BEAM-11061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17548931#comment-17548931 ]
Danny McCormick commented on BEAM-11061: ---------------------------------------- This issue has been migrated to https://github.com/apache/beam/issues/20541 > BigQuery IO - Storage API (DIRECT_READ): Fraction consumed decreasing between > responses > --------------------------------------------------------------------------------------- > > Key: BEAM-11061 > URL: https://issues.apache.org/jira/browse/BEAM-11061 > Project: Beam > Issue Type: Bug > Components: io-java-gcp > Affects Versions: 2.23.0 > Reporter: Jonas Grabber > Priority: P3 > > When reading a couple of million rows (and above 100 Gigabytes) from BigQuery > Storage (DIRECT_READ) with Dataflow and above 8 vCPUs (4x n1-standard-4) the > attached exception is thrown about once per vCPU. > The issue seems to be that the value of fraction_consumed in the > [StreamStatus|https://cloud.google.com/bigquery/docs/reference/storage/rpc/google.cloud.bigquery.storage.v1beta1#streamstatus] > object returned from the Storage API decreased between responses. > I tested this repeatedly with varying amounts of input data, number of > workers and machine types and was able to reproduce the issue repeatedly with > different configurations above 4 -8- vCPUs used (2, 4, 16, 32, 128 and, > n1-highmem-4, n1-standard-4, n1-standard-8, and n1-standard-16). > So far Jobs with 4 -8- vCPUs ran fine. (Update: Latest job with 4 vCPUs, 2x > n1-highmem-4, also threw the exception). > {{Error message from worker: java.io.IOException: Failed to advance reader of > source: name: "projects/REDACTED/locations/eu/streams/REDACTED" > org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.advance(WorkerCustomSources.java:620) > > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.advance(ReadOperation.java:399) > > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:194) > > org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159) > > org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77) > > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.executeWork(BatchDataflowWorker.java:417) > > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:386) > > org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:311) > > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:140) > > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:120) > > org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:107) > java.util.concurrent.FutureTask.run(FutureTask.java:266) > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > java.lang.Thread.run(Thread.java:748) Caused by: > java.lang.IllegalArgumentException: Fraction consumed from the current > response (0.7945484519004822) has to be larger than or equal to the fraction > consumed from the previous response (0.8467302322387695). > org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:440) > > org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.readNextRecord(BigQueryStorageStreamSource.java:242) > > org.apache.beam.sdk.io.gcp.bigquery.BigQueryStorageStreamSource$BigQueryStorageStreamReader.advance(BigQueryStorageStreamSource.java:211) > > org.apache.beam.runners.dataflow.worker.WorkerCustomSources$BoundedReaderIterator.advance(WorkerCustomSources.java:617) > ... 14 more}} -- This message was sent by Atlassian Jira (v8.20.7#820007)