[ https://issues.apache.org/jira/browse/BEAM-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061239#comment-17061239 ]
Luke Cwik edited comment on BEAM-9390 at 3/17/20, 10:57 PM: ------------------------------------------------------------ The issue is that the BigQuerySource only works with initial splitting since initial splitting kicks off the import job. This can be fixed by: 1) Restore the JavaReadViaImpulse expansion for Dataflow runner only when using beam_fn_api 2) Update the JRH to support doing the expansion for SplittableDoFns itself 3) Migrate BigQuerySource to a SplittableDoFn 4) Implement "concat" source over the AvroIO files. 5) Migrate to use the UW and then Dataflow will perform the SDF expansion and disable the JRH version of this test. All but #3 is throw away work but #3 would make the source incompatible with non-portable runners. #5 is likely the easiest to do once it becomes available. was (Author: lcwik): The issue is that the BigQuerySource only works with initial splitting since initial splitting kicks off the import job. This can be fixed by: 1) Restore the JavaReadViaImpulse expansion for Dataflow runner only when using beam_fn_api 2) Update the JRH to support doing the expansion for SplittableDoFns itself 3) Migrate BigQuerySource to a SplittableDoFn 4) Implement "concat" source over the AvroIO files. 5) Migrate to use the UW and then Dataflow will perform the SDF expansion. All but #3 is throw away work but #3 would make the source incompatible with non-portable runners. #5 is likely the easiest to do once it becomes available. > [PostCommit_Java_PortabilityApi] [BigQuery related ITs] > UnsupportedOperationException: BigQuery source must be split before being read > -------------------------------------------------------------------------------------------------------------------------------------- > > Key: BEAM-9390 > URL: https://issues.apache.org/jira/browse/BEAM-9390 > Project: Beam > Issue Type: Bug > Components: test-failures > Reporter: Yueyang Qiu > Assignee: Luke Cwik > Priority: Major > Labels: currently-failing > > Failed tests: > org.apache.beam.examples.cookbook.BigQueryTornadoesIT.testE2EBigQueryTornadoesWithExport > > org.apache.beam.examples.cookbook.BigQueryTornadoesIT.testE2eBigQueryTornadoesWithStorageApi > > org.apache.beam.sdk.io.gcp.bigquery.BigQueryTimePartitioningClusteringIT.testE2EBigQueryClusteringTableFunction > > org.apache.beam.sdk.io.gcp.bigquery.BigQueryTimePartitioningClusteringIT.testE2EBigQueryClusteringDynamicDestinations > > org.apache.beam.sdk.io.gcp.bigquery.BigQueryTimePartitioningClusteringIT.testE2EBigQueryTimePartitioning > > org.apache.beam.sdk.io.gcp.bigquery.BigQueryTimePartitioningClusteringIT.testE2EBigQueryClustering > > org.apache.beam.sdk.io.gcp.bigquery.BigQueryToTableIT.testNewTypesQueryWithReshuffle > > org.apache.beam.sdk.io.gcp.bigquery.BigQueryToTableIT.testLegacyQueryWithoutReshuffle > > org.apache.beam.sdk.io.gcp.bigquery.BigQueryToTableIT.testNewTypesQueryWithoutReshuffle > > org.apache.beam.sdk.io.gcp.bigquery.BigQueryToTableIT.testStandardQueryWithoutCustom > ([https://builds.apache.org/job/beam_PostCommit_Java_PortabilityApi/4226/#showFailuresLink)] > > Example failures: > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: Error received from SDK harness for instruction > -596: java.lang.UnsupportedOperationException: BigQuery source must be split > before being read at > org.apache.beam.sdk.io.gcp.bigquery.BigQuerySourceBase.createReader(BigQuerySourceBase.java:173) > at > org.apache.beam.fn.harness.BoundedSourceRunner.runReadLoop(BoundedSourceRunner.java:159) > at > org.apache.beam.fn.harness.BoundedSourceRunner.start(BoundedSourceRunner.java:146) > ... -- This message was sent by Atlassian Jira (v8.3.4#803005)