Abacn commented on code in PR #32428:
URL: https://github.com/apache/beam/pull/32428#discussion_r1761710267


##########
sdks/python/apache_beam/io/gcp/gcsio.py:
##########
@@ -262,33 +269,18 @@ def delete_batch(self, paths):
              succeeded or the relevant exception if the operation failed.
     """
     final_results = []
-    s = 0
-    if not isinstance(paths, list): paths = list(iter(paths))
-    while s < len(paths):
-      if (s + MAX_BATCH_OPERATION_SIZE) < len(paths):
-        current_paths = paths[s:s + MAX_BATCH_OPERATION_SIZE]
-      else:
-        current_paths = paths[s:]
-      current_batch = self.client.batch(raise_exception=False)
-      with current_batch:
-        for path in current_paths:
-          bucket_name, blob_name = parse_gcs_path(path)
-          bucket = self.client.bucket(bucket_name)
-          bucket.delete_blob(blob_name)
-
-      for i, path in enumerate(current_paths):
-        error_code = None
-        resp = current_batch._responses[i]
-        if resp.status_code >= 400 and resp.status_code != 404:
-          error_code = resp.status_code
-        final_results.append((path, error_code))
-
-      s += MAX_BATCH_OPERATION_SIZE
-
+    for path in paths:
+      error_code = None
+      try:
+        self.delete(path)

Review Comment:
   If so, I think we do not need to add a throttling metrics counter to batch copy and delete. By construction, if they do not retry, then there is no wait time due to throttling.
   
   Alternatively, we can check the response codes of the batch request: if a resource-exhausted request indeed returns an HTTP error, we can treat the whole blocking operation time as throttled time and report it via the throttling counter.
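   
   For illustration, a rough sketch of that alternative (the counter namespace/name and the 429 check are assumptions for the sketch, not the existing gcsio implementation; `client.batch(raise_exception=False)`, `batch._responses`, and `parse_gcs_path` are taken from the code in this diff):
   
   ```python
   import time
   
   from apache_beam.io.gcp.gcsio import parse_gcs_path
   from apache_beam.metrics.metric import Metrics
   
   # Hypothetical metric namespace/name for illustration; the counter used by
   # gcsio for throttling reporting may be named differently.
   _throttled_secs = Metrics.counter('gcsio', 'cumulativeThrottlingSeconds')
   
   
   def delete_batch_reporting_throttling(client, paths):
     # Time the whole blocking batch call.
     start = time.monotonic()
     batch = client.batch(raise_exception=False)
     with batch:
       for path in paths:
         bucket_name, blob_name = parse_gcs_path(path)
         client.bucket(bucket_name).delete_blob(blob_name)
     elapsed = time.monotonic() - start
   
     # If any sub-request came back 429 (resource exhausted), treat the whole
     # blocking operation time as throttled time and report it.
     if any(resp.status_code == 429 for resp in batch._responses):
       _throttled_secs.inc(int(elapsed))
   
     # Same per-path result shape as the removed code: 404 is not an error.
     results = []
     for path, resp in zip(paths, batch._responses):
       error_code = None
       if resp.status_code >= 400 and resp.status_code != 404:
         error_code = resp.status_code
       results.append((path, error_code))
     return results
   ```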


