shunping commented on code in PR #32428:
URL: https://github.com/apache/beam/pull/32428#discussion_r1761723747
##########
sdks/python/apache_beam/io/gcp/gcsio.py:
##########
@@ -262,33 +269,18 @@ def delete_batch(self, paths):
succeeded or the relevant exception if the operation failed.
"""
final_results = []
- s = 0
- if not isinstance(paths, list): paths = list(iter(paths))
- while s < len(paths):
- if (s + MAX_BATCH_OPERATION_SIZE) < len(paths):
- current_paths = paths[s:s + MAX_BATCH_OPERATION_SIZE]
- else:
- current_paths = paths[s:]
- current_batch = self.client.batch(raise_exception=False)
- with current_batch:
- for path in current_paths:
- bucket_name, blob_name = parse_gcs_path(path)
- bucket = self.client.bucket(bucket_name)
- bucket.delete_blob(blob_name)
-
- for i, path in enumerate(current_paths):
- error_code = None
- resp = current_batch._responses[i]
- if resp.status_code >= 400 and resp.status_code != 404:
- error_code = resp.status_code
- final_results.append((path, error_code))
-
- s += MAX_BATCH_OPERATION_SIZE
-
+ for path in paths:
+ error_code = None
+ try:
+ self.delete(path)
Review Comment:
I think retrying on batch operations is useful too, but I agree with your
concern about a performance regression. An alternative is to keep both the
old and the new implementations: call the old way (the fast code path)
first, and if anything goes wrong, fall back to the new way (the slow code
path), which increments the throttling counter.
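
A minimal sketch of that two-tier shape, reusing only the names visible in
the diff hunk above (`parse_gcs_path`, `MAX_BATCH_OPERATION_SIZE`,
`self.client.batch`, `self.delete`); the per-path exception handling and the
assumption that `self.delete` retries and bumps the throttling counter are
hypothetical here, not the actual implementation:

```python
def delete_batch(self, paths):
  final_results = []
  if not isinstance(paths, list):
    paths = list(paths)
  for s in range(0, len(paths), MAX_BATCH_OPERATION_SIZE):
    current_paths = paths[s:s + MAX_BATCH_OPERATION_SIZE]
    # Fast code path: one batched request; with raise_exception=False the
    # per-blob outcomes are collected in current_batch._responses.
    current_batch = self.client.batch(raise_exception=False)
    with current_batch:
      for path in current_paths:
        bucket_name, blob_name = parse_gcs_path(path)
        self.client.bucket(bucket_name).delete_blob(blob_name)
    for i, path in enumerate(current_paths):
      error_code = None
      resp = current_batch._responses[i]
      if resp.status_code >= 400 and resp.status_code != 404:
        # Slow code path, only for blobs the batch failed to delete:
        # self.delete is assumed to retry and to increment the
        # throttling counter (hypothetical behavior for this sketch).
        try:
          self.delete(path)
        except Exception as e:  # hypothetical error surface
          error_code = getattr(e, 'code', resp.status_code)
      final_results.append((path, error_code))
  return final_results
```

This keeps the happy path as cheap as the old batch implementation and only
pays the per-request cost for the few paths that actually failed.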
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]