ahmedabu98 commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1146050389


##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1552,30 @@ def _flush_batch(self, destination):
       insert_ids = [None for r in rows_and_insert_ids]
     else:
       insert_ids = [r[1] for r in rows_and_insert_ids]
-
     while True:
+      errors = []
+      passed = False
       start = time.time()
-      passed, errors = self.bigquery_wrapper.insert_rows(
-          project_id=table_reference.projectId,
-          dataset_id=table_reference.datasetId,
-          table_id=table_reference.tableId,
-          rows=rows,
-          insert_ids=insert_ids,
-          skip_invalid_rows=True,
-          ignore_unknown_values=self.ignore_unknown_columns)
+      try:
+        passed, errors = self.bigquery_wrapper.insert_rows(
+              project_id=table_reference.projectId,
+              dataset_id=table_reference.datasetId,
+              table_id=table_reference.tableId,
+              rows=rows,
+              insert_ids=insert_ids,
+              skip_invalid_rows=True,
+              ignore_unknown_values=self.ignore_unknown_columns)
+      except (ClientError, GoogleAPICallError) as e:
+        if e.code == 404 and destination in _KNOWN_TABLES:
+          _KNOWN_TABLES.remove(destination)

Review Comment:
   > I believe after raising, the bundle will retry, _create_table_if_needed is called again
   
   Ahh you're right, ignore my suggestion.
   
   > I've also realised, if the create_disposition is 'CREATE_NEVER' and the insert_retry_strategy is set to 'RETRY_ALWAYS', the pipeline will be stuck in a loop, no?
   
   The retry strategy only covers errors returned for individual rows (e.g. a schema mismatch), which come back after BQ has tried to insert them into the table. Those failed rows may be retried according to the strategy (that logic is handled directly in this file), but the errors don't cause the whole bundle to fail.
   
   The error we're looking at here comes from the HTTP request itself (the table doesn't exist). It causes the bundle to fail, and it's up to the runner to decide how to handle that (e.g. DirectRunner fails the pipeline; DataflowRunner retries a failed bundle 3 times, then fails the pipeline).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
