atrbgithub opened a new issue, #34909:
URL: https://github.com/apache/airflow/issues/34909

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### What happened
   
   This affects Airflow 2.7.2. It appears that version 10.9.0 of 
apache-airflow-providers-google fails to list objects in GCS. 
   
   Example to recreate:
   
   ```shell
   pipenv --python 3.8
   pipenv shell
   pip install apache-airflow==2.7.2 apache-airflow-providers-google==10.9.0
   export AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT='google-cloud-platform://'
   ```
   
   Then create the following python test file:
   
   ```python
   from airflow.providers.google.cloud.hooks.gcs import GCSHook
   
   result = GCSHook().list(
       bucket_name='a-test-bucket',
       prefix="a/test/prefix",
       delimiter='.csv'
   )
   
   result = list(result)
   print(result)
   ```
   
   The output of this is:
   ```
   []
   ```
   
   In a different pipenv environment, this works when using Airflow 2.7.1 and 
the 10.7.0 version of the provider:
   
   ```shell
   pipenv --python 3.8
   pipenv shell
   pip install apache-airflow==2.7.1 apache-airflow-providers-google==10.7.0
   export AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT='google-cloud-platform://'
   ```
   
   Use the same Python test file as above. The output is a list of 
files, as expected. 
   
   
[This](https://github.com/apache/airflow/commit/3fa9d46ec74ef8453fcf17fbd49280cb6fb37cef#diff-82854006b5553665046db26d43a9dfa90bec78d4ba93e2d2ca7ff5bf632fa624R832)
 appears to be the commit that introduced the regression. 
   
   The `hooks/gcs.py` file can be patched as follows, which appears to 
force the lazy loading to kick in:
   
   ```python
               print("Forcing loading....")
               all_blobs = list(blobs)
   
               for blob in all_blobs:
                   print(blob.name)
   
               if blobs.prefixes:
                   ids.extend(blobs.prefixes)
               else:
                   ids.extend(blob.name for blob in all_blobs)
   
               page_token = blobs.next_page_token
   
               if page_token is None:
                   # empty next page token
                   break
   ```
   
   Example patch file:
   
   ```
   +++ gcs.py      2023-10-12 11:34:00.774206013 +0000
   @@ -829,12 +829,19 @@
                        versions=versions,
                    )
   
   +            print("Forcing loading....")
   +            all_blobs = list(blobs)
   +
   +            for blob in all_blobs:
   +                print(blob.name)
   +
                if blobs.prefixes:
                    ids.extend(blobs.prefixes)
                else:
   -                ids.extend(blob.name for blob in blobs)
   +                ids.extend(blob.name for blob in all_blobs)
   
                page_token = blobs.next_page_token
   +
                if page_token is None:
                    # empty next page token
                    break
   ```
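
   The behaviour the patch works around can be sketched with a toy iterator 
(the `LazyBlobIterator` class below is hypothetical, not the real 
google-cloud-storage client): an attribute such as `prefixes` is only 
populated as a side effect of consuming the iterator, so checking it before 
any iteration sees an empty set.

   ```python
   # Hypothetical sketch of lazy attribute population, mimicking the
   # behaviour described above. Not the real google-cloud-storage client.
   class LazyBlobIterator:
       def __init__(self, names, prefixes):
           self._names = names
           self._pending_prefixes = set(prefixes)
           self.prefixes = set()  # stays empty until iteration happens

       def __iter__(self):
           # Consuming the iterator fills in prefixes, mimicking how the
           # client populates them while fetching pages.
           self.prefixes |= self._pending_prefixes
           yield from self._names

   blobs = LazyBlobIterator(["a/test/prefix/file.csv"], {"a/test/prefix/"})
   print(bool(blobs.prefixes))  # empty: checked before any iteration
   all_blobs = list(blobs)      # force loading, as in the patch above
   print(bool(blobs.prefixes))  # populated after consumption
   ```

   Under this assumption, forcing `list(blobs)` before the `if 
blobs.prefixes:` check makes the branch see the populated set.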
   
   
   
   
   ### What you think should happen instead
   
   The provider should be able to list files in GCS. 
   
   ### How to reproduce
   
   Please see above for the steps to reproduce. 
   
   ### Operating System
   
   n/a
   
   ### Versions of Apache Airflow Providers
   
   Version 10.9.0 of the Google provider. 
   
   ### Deployment
   
   Other 3rd-party Helm chart
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

