helensilva14 opened a new issue, #21893: URL: https://github.com/apache/beam/issues/21893
### What happened?

Hello! My team and I ran into a scenario where we needed to check whether Beam can handle the dynamic creation of BQ tables with partitions using the new API.

Like the Spark BigQuery Connector, the Beam connector supports different ways of writing to BigQuery, currently having [these write methods available](https://beam.apache.org/releases/javadoc/2.39.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.Method.html):

1. FILE_LOADS
2. STORAGE_API_AT_LEAST_ONCE
3. STORAGE_WRITE_API
4. STREAMING_INSERTS

The second and third ones make use of the [BigQuery Storage Write API](https://cloud.google.com/bigquery/docs/write-api) (with the Spark BQ connector, this would be the direct mode).

**Regarding table partitions (default BQ time partitioning):**

- According to the documentation, ALL 4 methods have the option [withTimePartitioning](https://beam.apache.org/releases/javadoc/2.39.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html#withTimePartitioning-com.google.api.services.bigquery.model.TimePartitioning-), which allows only the default BQ partitions (DAY, MONTH, YEAR).
We tested it with the second and third write methods (STORAGE_API_AT_LEAST_ONCE and STORAGE_WRITE_API) and got this error:

```
SEVERE: 2022-06-15T16:03:40.193Z: java.lang.NoSuchMethodError: 'long com.google.cloud.bigquery.storage.v1.StreamWriter.getInflightWaitSeconds()'
	at org.apache.beam.sdk.io.gcp.bigquery.BigQueryServicesImpl$DatasetServiceImpl$1.getInflightWaitSeconds(BigQueryServicesImpl.java:1291)
	at org.apache.beam.sdk.io.gcp.bigquery.StorageApiWriteUnshardedRecords$WriteRecordsDoFn$DestinationState.lambda$flush$1(StorageApiWriteUnshardedRecords.java:342)
	at org.apache.beam.sdk.io.gcp.bigquery.RetryManager$Operation.run(RetryManager.java:131)
	at org.apache.beam.sdk.io.gcp.bigquery.RetryManager.run(RetryManager.java:247)
	at org.apache.beam.sdk.io.gcp.bigquery.StorageApiWriteUnshardedRecords$WriteRecordsDoFn.flushAll(StorageApiWriteUnshardedRecords.java:435)
	at org.apache.beam.sdk.io.gcp.bigquery.StorageApiWriteUnshardedRecords$WriteRecordsDoFn.finishBundle(StorageApiWriteUnshardedRecords.java:495)
```

**Regarding table partitions (custom columns):**

- We know that BigQuery has the option of specifying a column as a time or range partition, and we are interested in that.
- According to the documentation, unlike the Spark BigQuery connector (indirect mode), the Beam connector does not support specifying custom columns as partitions when writing to BQ.
- The Spark BigQuery connector (direct mode) claims that it supports partitions if the table is already created and defined with the specific column as a partition.
- We tried this approach with the Beam connector and got the same error presented above.

**Conclusion:** it seems that the Apache Beam connector implementation that uses the BigQuery Storage Write API has problems and limitations regarding table partitions.

The testing pipeline is provided as a Gist [here](https://gist.github.com/helensilva14/b4f4a33c1c339a866af4f7f4c7918c1e).

We hope this issue can be addressed and would be glad to help/validate. Thanks!
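A note on the stack trace: in general, a `NoSuchMethodError` means the class loaded at runtime does not contain a method that the calling code was compiled against, which often points to mismatched library versions on the classpath. Whether that is the cause here has not been confirmed. The following is a minimal sketch, using only Java reflection, of how one might check which `StreamWriter` is actually loaded and whether it exposes `getInflightWaitSeconds`. The class `MethodProbe` and its helpers are hypothetical names for illustration and are not part of Beam or the BigQuery client.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of Beam or the BigQuery client library.
class MethodProbe {

    /** Returns true if the class can be loaded and has a public method with this name. */
    static boolean hasMethod(String className, String methodName) {
        try {
            // Load without running static initializers.
            Class<?> cls = Class.forName(className, false, MethodProbe.class.getClassLoader());
            for (Method m : cls.getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class is missing or cannot be linked on this classpath.
            return false;
        }
    }

    /** Returns the location (usually a jar) the class was loaded from, if known. */
    static String loadedFrom(String className) {
        try {
            Class<?> cls = Class.forName(className, false, MethodProbe.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null ? "unknown (bootstrap class loader)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // Class and method names are taken from the stack trace above.
        String cls = "com.google.cloud.bigquery.storage.v1.StreamWriter";
        System.out.println("loaded from: " + loadedFrom(cls));
        System.out.println("has getInflightWaitSeconds: " + hasMethod(cls, "getInflightWaitSeconds"));
    }
}
```

Running this with the same classpath as the pipeline would show which jar provides `StreamWriter`. If the method is reported as missing, comparing that jar's version with the one Beam 2.39.0 expects would be a reasonable next step.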
### Issue Priority

Priority: 1

### Issue Component

Component: io-java-gcp

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
