ng-oliver opened a new issue, #35978:
URL: https://github.com/apache/beam/issues/35978

   Currently in `WriteToBigQuery(PTransform)`, the argument
   [`retry_strategy`](https://github.com/apache/beam/blob/4c9799388c0386920fa2c058c5b66b8a9b0505bd/sdks/python/apache_beam/io/gcp/bigquery.py#L1444)
   is only effective under `method = streaming_inserts`. It would be very helpful
   if the argument were also effective under `method = storage_write_api`, giving
   users more control over the retry mechanism.
   
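   A minimal sketch of the setup I had in mind, assuming the retry strategy is
   passed through `WriteToBigQuery`'s `insert_retry_policy` argument and the
   `RetryStrategy` constants from `apache_beam.io.gcp.bigquery_tools` (the table
   name and schema here are hypothetical):

   ```python
   import apache_beam as beam
   from apache_beam.io.gcp.bigquery_tools import RetryStrategy

   with beam.Pipeline() as p:
       rows = p | beam.Create([{"id": 1, "payload": "example"}])

       # The ask: honor the retry policy under STORAGE_WRITE_API too, so a
       # permanently failing row (e.g. one over the 10MB limit) fails fast
       # instead of being retried forever.
       _ = rows | beam.io.WriteToBigQuery(
           table="my-project:my_dataset.my_table",  # hypothetical table
           schema="id:INTEGER,payload:STRING",
           method=beam.io.WriteToBigQuery.Method.STORAGE_WRITE_API,
           insert_retry_policy=RetryStrategy.RETRY_NEVER,
       )
   ```
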
   I came across an issue today where:
   - my production database had a row of data that was 26MB
   - when Dataflow read the 26MB row and attempted to write it into the BigQuery
   warehouse, it hit the BigQuery 10MB row size limit under the Storage Write API
   - even though I configured no retries in my Dataflow job, the argument is not
   taken into consideration under `method = storage_write_api`, so Dataflow kept
   retrying indefinitely until I drained the job and started a new one
   
   **Why is it a problem**
   - Because retries are made indefinitely, all downstream error handling
   mechanisms, such as writing to a dead letter table, never execute (see the
   sketch after this list)
   - My only option was to handle this on the software side, making the API calls
   in reasonably sized chunks so that by the time a row of data lands in
   production, it is smaller than 10MB
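
   A sketch of the dead letter path that never gets reached today; I am assuming
   the write result exposes failures via `failed_rows_with_errors` (as it does
   for streaming inserts), and the table and schema names are hypothetical:

   ```python
   import apache_beam as beam
   from apache_beam.io.gcp.bigquery_tools import RetryStrategy

   with beam.Pipeline() as p:
       # One oversized (~26MB) row, like the one that got my job stuck.
       rows = p | beam.Create([{"id": 1, "payload": "x" * (26 * 1024 * 1024)}])

       result = rows | beam.io.WriteToBigQuery(
           table="my-project:my_dataset.main_table",       # hypothetical
           schema="id:INTEGER,payload:STRING",
           method=beam.io.WriteToBigQuery.Method.STORAGE_WRITE_API,
           insert_retry_policy=RetryStrategy.RETRY_NEVER,  # ignored for this method today
       )

       # Intended dead letter branch: rows BigQuery rejects should end up here
       # instead of blocking the main write. Because the write currently retries
       # indefinitely, this branch never runs.
       _ = (
           result.failed_rows_with_errors
           | "HandleFailures" >> beam.Map(print)  # stand-in for a dead letter sink
       )
   ```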
   
   To be fair, the chance of having a row of data > 10MB in production is low,
   but it would still be very valuable to let users effectively handle errors
   under `method = storage_write_api`, regardless of the type of error.

