ajdub508 commented on code in PR #28091:
URL: https://github.com/apache/beam/pull/28091#discussion_r1308634631
##########
sdks/python/apache_beam/io/gcp/bigquery_tools.py:
##########
@@ -732,12 +732,23 @@ def _insert_all_rows(
except (ClientError, GoogleAPICallError) as e:
# e.code contains the numeric http status code.
service_call_metric.call(e.code)
- # Re-reise the exception so that we re-try appropriately.
- raise
- except HttpError as e:
+ # Package exception up with required fields
+ # Set reason to 'invalid' to consider these execptions as RetryStrategy._NON_TRANSIENT_ERRORS
Review Comment:
Thanks @liferoad, I'll take a closer look at that. Let me know if I'm off on
any of this, but my thinking was this: as the code currently stands, the
`ClientError`, `GoogleAPICallError`, and `HttpError` exceptions never get a
chance to be retried anyway. So this change doesn't take away a chance to be
retried; it just makes sure the rows are captured in `failed_rows`, which gives
users a way to disposition the message and ack the Pub/Sub message.
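A minimal sketch of the idea, using hypothetical names (`FakeApiError`, `package_exception_as_errors`, `insert_all_rows`) rather than the actual Beam code: instead of re-raising, the exception is converted into one insertAll-style error entry per row with reason `'invalid'`, so every row in the batch can be reported as failed.

```python
# Hypothetical sketch, not the Beam implementation.


class FakeApiError(Exception):
    """Stand-in for ClientError / GoogleAPICallError / HttpError (assumption)."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code  # numeric HTTP status, like e.code in the diff


def package_exception_as_errors(rows, exc):
    """Build one insertAll-style error entry per row, all with reason 'invalid'."""
    return [
        {"index": i, "errors": [{"reason": "invalid", "message": str(exc)}]}
        for i in range(len(rows))
    ]


def insert_all_rows(rows, send):
    """Call send(rows); on an API error, return packaged errors instead of raising."""
    try:
        send(rows)
        return True, []
    except FakeApiError as e:
        return False, package_exception_as_errors(rows, e)
```

The caller can then look up each failed row by its `index` and emit it to a failed-rows output rather than letting the exception fail the bundle.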
The reason I thought that the `ClientError`, `GoogleAPICallError`, and
`HttpError` exceptions never get a chance to be retried is:
- the exception gets re-raised in `_insert_all_rows`
[here](https://github.com/apache/beam/blob/26b77723445cf383365098b296b9db77409af94c/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L732-L740)
- the exception is not caught by `insert_rows`
[here](https://github.com/apache/beam/blob/26b77723445cf383365098b296b9db77409af94c/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L1267-L1271)
- and the `_flush_batch` method
[here](https://github.com/apache/beam/blob/26b77723445cf383365098b296b9db77409af94c/sdks/python/apache_beam/io/gcp/bigquery.py#L1635-L1647)
doesn't catch it either
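The propagation path above can be sketched with three hypothetical stand-in functions (not the actual Beam implementations): the bottom layer re-raises, the middle layer has no handler, and the top layer only reaches its retry-strategy check when the call returns normally.

```python
# Hypothetical three-layer call chain; names are stand-ins, not Beam's.


class FakeApiError(Exception):
    """Stand-in for ClientError / GoogleAPICallError / HttpError."""


def insert_all_rows_inner(rows, send):
    # Bottom layer: mirrors the current behavior of re-raising API errors.
    try:
        send(rows)
        return True, []
    except FakeApiError:
        # (The real code records a service-call metric here first.)
        raise


def insert_rows(rows, send):
    # Middle layer: no try/except, so any exception passes straight through.
    return insert_all_rows_inner(rows, send)


def flush_batch(rows, send, evaluated):
    # Top layer: the retry-strategy check only runs if insert_rows returns.
    passed, errors = insert_rows(rows, send)
    evaluated.append(errors)  # stand-in for evaluating the RetryStrategy
    return passed
```

When `send` raises, the exception escapes `flush_batch` and `evaluated` stays empty, which is the "retry strategy never runs" situation described here.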
The end result for `ClientError`, `GoogleAPICallError`, and `HttpError`
exceptions is that the exception causes a pipeline error instead of producing a
usable `errors` list, so the RetryStrategy never gets a chance to be evaluated.
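Once the failure is expressed as an `errors` list, a retry strategy can actually be applied to it. The following is a simplified, hypothetical version of a retry-on-transient-error check; the reason set is an assumed subset (it includes `'invalid'`, the reason the diff assigns), and the per-row partition is a simplification rather than how `_flush_batch` necessarily decides.

```python
# Simplified, hypothetical retry-strategy check over insertAll-style errors.

RETRY_ALWAYS = "RETRY_ALWAYS"
RETRY_NEVER = "RETRY_NEVER"
RETRY_ON_TRANSIENT_ERROR = "RETRY_ON_TRANSIENT_ERROR"

# Assumed subset of non-transient reasons.
NON_TRANSIENT_REASONS = {"invalid", "invalidQuery", "notImplemented"}


def should_retry(strategy, reason):
    if strategy == RETRY_ALWAYS:
        return True
    if strategy == RETRY_NEVER:
        return False
    return strategy == RETRY_ON_TRANSIENT_ERROR and reason not in NON_TRANSIENT_REASONS


def split_rows(rows, errors, strategy):
    """Partition errored rows into (to_retry, failed) by each row's first reason."""
    to_retry, failed = [], []
    for entry in errors:
        row = rows[entry["index"]]
        reason = entry["errors"][0]["reason"]
        (to_retry if should_retry(strategy, reason) else failed).append(row)
    return to_retry, failed
```

With `RETRY_ON_TRANSIENT_ERROR`, rows packaged with reason `'invalid'` land in `failed` (the failed-rows output) instead of being retried forever.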
My thought was that it would be better to at least route those rows to the
failed rows tag, where users can choose what to do with them, and to avoid the
issues I've seen with Pub/Sub messages that are never acked.
--
This is an automated message from the Apache Git Service.