andrewsg commented on code in PR #32428:
URL: https://github.com/apache/beam/pull/32428#discussion_r1763556675
##########
sdks/python/apache_beam/io/gcp/gcsio.py:
##########
@@ -245,12 +249,21 @@ def delete(self, path):
bucket_name, blob_name = parse_gcs_path(path)
try:
bucket = self.client.bucket(bucket_name)
- bucket.delete_blob(blob_name)
+ if self._use_blob_generation:
Review Comment:
Since the delete_blob call below passes retry=self._storage_client_retry, you
can skip the generation lookup here, as long as you don't care whether some
other process has unexpectedly created a new version of the object.
I would still recommend wrapping the delete call in a try/except that catches
and ignores 404 errors, though.
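
A minimal sketch of that try/except, not the PR's actual code. `delete_ignoring_missing` is a hypothetical helper name, and the fallback `NotFound` class is an assumption added only so the snippet runs without the GCS client installed. My reading of the motivation is that a retried delete can find the object already gone, so a 404 is treated as success:

```python
try:
    # Real exception class raised by the GCS client on HTTP 404.
    from google.api_core.exceptions import NotFound
except ImportError:
    class NotFound(Exception):
        """Stand-in for google.api_core.exceptions.NotFound (assumption)."""


def delete_ignoring_missing(bucket, blob_name, retry=None):
    """Delete blob_name from bucket (hypothetical helper).

    Returns True if this call deleted the object and False if the
    object was already gone (the 404 is swallowed on purpose).
    """
    try:
        # No generation is passed, so the current live version is
        # deleted; the retry policy is whatever the caller supplies.
        bucket.delete_blob(blob_name, retry=retry)
        return True
    except NotFound:
        return False
```

Any other exception still propagates, so only the "already deleted" case is silenced.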
##########
sdks/python/apache_beam/io/gcp/gcsio.py:
##########
@@ -297,19 +308,32 @@ def copy(self, src, dest):
dest: GCS file path pattern in the form gs://<bucket>/<name>.
Raises:
- TimeoutError: on timeout.
+ Any exceptions during copying
"""
src_bucket_name, src_blob_name = parse_gcs_path(src)
dest_bucket_name, dest_blob_name= parse_gcs_path(dest,
object_optional=True)
src_bucket = self.client.bucket(src_bucket_name)
- src_blob = src_bucket.blob(src_blob_name)
+ if self._use_blob_generation:
Review Comment:
Since you pass retry=... below, you can probably do without this generation
lock. In your case you only really need one of the two.
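
Roughly what the retry-only variant could look like, as a sketch rather than the PR's code. `copy_without_generation` is a hypothetical name, and the `client` and `retry` arguments stand in for `self.client` and `self._storage_client_retry`:

```python
def copy_without_generation(client, src_bucket_name, src_blob_name,
                            dest_bucket_name, dest_blob_name, retry=None):
    """Copy one object, relying on the retry policy alone (hypothetical)."""
    src_bucket = client.bucket(src_bucket_name)
    # bucket.blob() only builds a local handle, so no metadata request
    # is made and no source generation is looked up.
    src_blob = src_bucket.blob(src_blob_name)
    dest_bucket = client.bucket(dest_bucket_name)
    return src_bucket.copy_blob(
        src_blob, dest_bucket, new_name=dest_blob_name, retry=retry)
```

The trade-off is the same as for delete: without a pinned generation, the copy reads whatever version of the source is live at the time of the request.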
##########
sdks/python/apache_beam/io/gcp/gcsio.py:
##########
@@ -297,19 +308,32 @@ def copy(self, src, dest):
dest: GCS file path pattern in the form gs://<bucket>/<name>.
Raises:
- TimeoutError: on timeout.
+ Any exceptions during copying
"""
src_bucket_name, src_blob_name = parse_gcs_path(src)
dest_bucket_name, dest_blob_name= parse_gcs_path(dest,
object_optional=True)
src_bucket = self.client.bucket(src_bucket_name)
- src_blob = src_bucket.blob(src_blob_name)
+ if self._use_blob_generation:
+ src_blob = src_bucket.get_blob(src_blob_name)
+ if src_blob is None:
+ raise NotFound("source blob %s not found during copying" % src)
+ src_generation = getattr(src_blob, "generation", None)
Review Comment:
The generation attribute should only be missing if src_blob is None, and the
check on the line above already handles that case, so I don't think you need
getattr here.
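
A small sketch of the simplification, under the assumption that the generation lookup is kept at all. `source_generation` is a hypothetical helper, and the fallback `NotFound` class exists only so the snippet runs without the GCS client installed:

```python
try:
    from google.api_core.exceptions import NotFound
except ImportError:
    class NotFound(Exception):
        """Stand-in for google.api_core.exceptions.NotFound (assumption)."""


def source_generation(bucket, blob_name):
    """Return the generation of an existing blob (hypothetical helper)."""
    # get_blob() returns None when the object does not exist.
    blob = bucket.get_blob(blob_name)
    if blob is None:
        raise NotFound("source blob %s not found during copying" % blob_name)
    # Past the None check, the blob object has the attribute, so plain
    # attribute access replaces getattr(blob, "generation", None).
    return blob.generation
```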