shahar1 commented on code in PR #61949:
URL: https://github.com/apache/airflow/pull/61949#discussion_r3223655275
##########
providers/google/src/airflow/providers/google/cloud/hooks/gcs.py:
##########
@@ -347,6 +344,16 @@ def download(
 blob = bucket.blob(blob_name=object_name, chunk_size=chunk_size)
 if filename:
+    blob.reload(timeout=timeout)
+    if blob.size is not None:
+        directory = os.path.dirname(filename) or os.getcwd()
+        free_space = shutil.disk_usage(directory).free
+        if free_space < blob.size:
+            raise AirflowException(
+                f"Insufficient disk space to download file. "
+                f"Required: {blob.size} bytes, Available: {free_space} bytes."
+            )
Review Comment:
Could you please replace `AirflowException` with a native Python exception?
Static checks should now fail on it once you rebase on (or merge from) `main`.
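
A minimal sketch of what I have in mind, assuming `OSError` with `errno.ENOSPC` is an acceptable native exception here (any other built-in would also do). The helper name `ensure_free_space` is hypothetical and only used for illustration:

```python
import errno
import os
import shutil


def ensure_free_space(filename: str, required_bytes: int) -> None:
    """Raise a native OSError if the target directory lacks room for the file.

    Hypothetical helper for illustration; not part of the hook.
    """
    directory = os.path.dirname(filename) or os.getcwd()
    free_space = shutil.disk_usage(directory).free
    if free_space < required_bytes:
        # Assumption: ENOSPC ("No space left on device") is the closest match.
        raise OSError(
            errno.ENOSPC,
            f"Insufficient disk space to download file. "
            f"Required: {required_bytes} bytes, Available: {free_space} bytes.",
        )
```

Callers that already catch `OSError` around file operations would then pick this up without importing anything from Airflow.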
##########
providers/google/src/airflow/providers/google/cloud/hooks/gcs.py:
##########
@@ -347,6 +344,16 @@ def download(
 blob = bucket.blob(blob_name=object_name, chunk_size=chunk_size)
if filename:
+ blob.reload(timeout=timeout)
Review Comment:
1. Can we move the `blob.reload(...)` call outside the retry loop?
2. Could you please add a flag (with a docstring entry) for the disk space
check (default: `False`)? It is an extra API call, which might not be relevant
when there are no storage limitations.
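
Roughly the shape I am suggesting, as a sketch only. The flag name `check_disk_space`, the helper name `download_to_file`, and the use of `ConnectionError` as the retried error are assumptions for illustration; `blob` is anything exposing `reload`, `size` and `download_to_filename` the way the GCS client's blob does:

```python
import os
import shutil


def download_to_file(blob, filename, *, check_disk_space=False, num_max_attempts=1, timeout=60):
    """Download ``blob`` to ``filename`` (hypothetical helper, not the hook's code).

    :param check_disk_space: hypothetical flag. When True, one extra metadata
        request (``blob.reload``) is made up front to compare the object size
        with the free space in the target directory. Defaults to False.
    """
    if check_disk_space:
        # Done once, before the retry loop, so retries do not repeat the API call.
        blob.reload(timeout=timeout)
        if blob.size is not None:
            directory = os.path.dirname(filename) or os.getcwd()
            free_space = shutil.disk_usage(directory).free
            if free_space < blob.size:
                raise OSError(
                    f"Insufficient disk space to download file. "
                    f"Required: {blob.size} bytes, Available: {free_space} bytes."
                )

    for attempt in range(1, num_max_attempts + 1):
        try:
            blob.download_to_filename(filename, timeout=timeout)
            return filename
        except ConnectionError:
            # Stand-in for whatever transient errors the real loop retries on.
            if attempt == num_max_attempts:
                raise
```

With the default `check_disk_space=False`, no additional request is made, so existing users see no change in behavior.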
##########
providers/google/src/airflow/providers/google/cloud/hooks/gcs.py:
##########
@@ -347,6 +344,16 @@ def download(
 blob = bucket.blob(blob_name=object_name, chunk_size=chunk_size)
 if filename:
+    blob.reload(timeout=timeout)
+    if blob.size is not None:
+        directory = os.path.dirname(filename) or os.getcwd()
+        free_space = shutil.disk_usage(directory).free
Review Comment:
If the directory doesn't exist, this will fail with Python's `FileNotFoundError`
rather than with whatever the GCS client raises.
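
One possible way to avoid that, assuming we want the disk check to be best-effort and the download path to keep reporting a missing directory the way it did before. The helper name `free_space_or_none` is hypothetical:

```python
import os
import shutil


def free_space_or_none(filename):
    """Return free bytes in the target directory, or None if it does not exist.

    Hypothetical helper for illustration; not part of the hook.
    """
    directory = os.path.dirname(filename) or os.getcwd()
    try:
        return shutil.disk_usage(directory).free
    except FileNotFoundError:
        # Skip the check and let the download itself surface the error.
        return None
```

The caller would then compare against `blob.size` only when the result is not `None`.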