jbandoro commented on issue #31422: URL: https://github.com/apache/beam/issues/31422#issuecomment-3033835318
Similar to the OP but with the Python SDK: we're reading messages from a Pub/Sub subscription (with `beam.io.gcp.pubsub.ReadFromPubSub`), parsing the records, and then writing to BigQuery with:

```python
parsed_messages | "Write to BigQuery" >> beam.io.WriteToBigQuery(
    table=FULL_TABLE_ID,
    schema=TABLE_SCHEMA,
    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
    method=beam.io.WriteToBigQuery.Method.STORAGE_WRITE_API,
    with_auto_sharding=True,
)
```

We're on `apache-beam==2.66.0` with the DataflowRunner, running a streaming pipeline in `streaming_mode_at_least_once` mode, and using a custom Dataflow container image built following the instructions in the [docs](https://cloud.google.com/dataflow/docs/guides/build-container-image).

From my testing, setting `with_auto_sharding` to `False` still resulted in the errors I posted in the comment above. One `WriteToBigQuery` argument that did seem to reduce the frequency of the errors was `triggering_frequency`: setting it to a value above 5 seconds helped (a simplified sketch of the pipeline with that setting is at the end of this comment). I also set some custom metric counters (also sketched below), and it doesn't appear that we're losing any data: the worker retries succeed and all the streaming data is landing in the BigQuery sink.

Let me know if there's anything else that would be helpful or steps I can take, thanks!
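
For reference, the overall shape of the pipeline is roughly the following; the subscription, table ID, schema, and `parse_message` function are simplified placeholders rather than our exact code, and the Dataflow-specific options (runner, project, region, custom container image, `streaming_mode_at_least_once`) are passed on the command line:

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

# Placeholder values; the real subscription, table, and schema differ.
SUBSCRIPTION = "projects/my-project/subscriptions/my-subscription"
FULL_TABLE_ID = "my-project:my_dataset.my_table"
TABLE_SCHEMA = "id:STRING,payload:STRING,publish_ts:TIMESTAMP"


def parse_message(message: bytes) -> dict:
    """Placeholder parser: decode the Pub/Sub payload into a BigQuery row dict."""
    record = json.loads(message.decode("utf-8"))
    return {
        "id": record.get("id"),
        "payload": record.get("payload"),
        "publish_ts": record.get("publish_ts"),
    }


def run() -> None:
    options = PipelineOptions()
    options.view_as(StandardOptions).streaming = True

    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Read from Pub/Sub" >> beam.io.gcp.pubsub.ReadFromPubSub(subscription=SUBSCRIPTION)
            | "Parse" >> beam.Map(parse_message)
            | "Write to BigQuery" >> beam.io.WriteToBigQuery(
                table=FULL_TABLE_ID,
                schema=TABLE_SCHEMA,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                method=beam.io.WriteToBigQuery.Method.STORAGE_WRITE_API,
                with_auto_sharding=True,
                # Raising this above 5 seconds is what reduced the error frequency.
                triggering_frequency=10,
            )
        )


if __name__ == "__main__":
    run()
```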
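
And the custom metric counters are along these lines (the DoFn and counter names here are illustrative, not our exact code):

```python
import apache_beam as beam
from apache_beam.metrics import Metrics


class CountMessages(beam.DoFn):
    """Counts elements flowing toward the BigQuery sink.

    The counter total in the Dataflow monitoring UI can be compared against
    the destination table's row count to check whether anything was dropped.
    """

    def __init__(self):
        # Metrics.counter(namespace, name) creates a distributed counter.
        self.messages_counted = Metrics.counter(self.__class__, "messages_to_bq")

    def process(self, element):
        self.messages_counted.inc()
        yield element
```

This gets slotted in right before the sink, e.g. `... | "Count" >> beam.ParDo(CountMessages()) | "Write to BigQuery" >> ...`, and so far the counter totals line up with what lands in BigQuery.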