dominikhei opened a new issue, #52920:
URL: https://github.com/apache/airflow/issues/52920

   ### Apache Airflow Provider(s)
   
   amazon
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Apache Airflow version
   
   3.0.2
   
   ### Operating System
   
   linux
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   I was trying to replicate the behavior in #52869 and noticed something interesting.
   
   When executed with `deferrable=True`, no files are passed to the `check_fn` by the S3KeySensor, even though they exist in S3. With `deferrable=False`, the correct files in the bucket are passed. I used the following function:
   
   ```python
   def check_key_not_present(files: list, **kwargs) -> bool:
       print(f"Files received by check_fn: {files}")
       return len(files) == 0
   ```
   
   These are the two tasks:
   
   ```python
   deferrable_sensor = S3KeySensor(
       task_id='deferrable_sensor',
       bucket_name='#########################',
       bucket_key='scripts/extract-raw-data-job.py',
       poke_interval=10,
       timeout=60,
       check_fn=check_key_not_present,
       deferrable=True,
       mode="poke",
       aws_conn_id='aws_default',
       dag=dag,
       trigger_rule='none_skipped'
   )
   
   non_deferrable_sensor = S3KeySensor(
       task_id='non_deferrable_sensor',
       bucket_name='######################',
       bucket_key='scripts/extract-raw-data-job.py',
       poke_interval=10,
       timeout=60,
       check_fn=check_key_not_present,
       deferrable=False,
       mode="poke",
       aws_conn_id='aws_default',
       dag=dag,
       trigger_rule='none_skipped'
   )
   ```
   
   Logs of the non_deferrable_sensor:
   
   `[2025-07-04, 20:24:09] INFO - Files received by check_fn: [{'Size': 363, 'Key': None}]: chan="stdout": source="task"`
   
   Logs of the deferrable_sensor:
   
   `[2025-07-04, 20:23:53] INFO - Files received by check_fn: []: chan="stdout": source="task"`
   
   After my adjustments, the deferrable sensor receives the file:
   
   ```
   INFO - Files received by check_fn: [{'Key': 'xxxxxxxxxxx', 'LastModified': DateTime(2025, 7, 5, 8, 59, 14, tzinfo=Timezone('UTC')), 'ETag': '"9fe1b02003ad4a0a062ff7df14b36d54"', 'ChecksumAlgorithm': ['CRC64NVME'], 'ChecksumType': 'FULL_OBJECT', 'Size': 209, 'StorageClass': 'STANDARD'}]
   ```
   
   The problem lies here:
   
   ```python
   async def get_files_async(
       self,
       client: AioBaseClient,
       bucket: str,
       bucket_keys: str | list[str],
       wildcard_match: bool,
       delimiter: str | None = "/",
   ) -> list[Any]:
       """Get a list of files in the bucket."""
       keys: list[Any] = []
       for key in bucket_keys:
           prefix = key
           if wildcard_match:
               prefix = re.split(r"[\[*?]", key, 1)[0]
   
           paginator = client.get_paginator("list_objects_v2")
           params = {
               "Bucket": bucket,
               "Prefix": prefix,
               "Delimiter": delimiter,
           }
   ```
   

   Why don't we check whether `bucket_keys` is a single string or a list? Is this somehow intentional? If `bucket_keys` is a single string, `for key in bucket_keys` will iterate over its individual characters.
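   To illustrate the suspected behavior (a minimal standalone sketch, not the actual hook code): iterating a plain string yields one-character "keys", so each `list_objects_v2` call would be made with a single-character `Prefix` instead of the full key. One possible fix (an assumption on my side, not the project's chosen patch) is to normalize the argument to a list first:
   
   ```python
   # Iterating a plain string yields its characters, not the key itself.
   bucket_keys = "scripts/extract-raw-data-job.py"
   prefixes = [key for key in bucket_keys]
   print(prefixes[:3])  # ['s', 'c', 'r'] -- each would become a Prefix
   
   # Possible fix: wrap a bare string in a list before iterating.
   if isinstance(bucket_keys, str):
       bucket_keys = [bucket_keys]
   
   prefixes = [key for key in bucket_keys]
   print(prefixes)  # ['scripts/extract-raw-data-job.py']
   ```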


   
   I am opening this issue instead of a direct PR to validate that this is not actually intended and that I am not missing something, because [this test](https://github.com/apache/airflow/blob/main/providers/amazon/tests/unit/amazon/aws/hooks/test_s3.py#L718) makes me question whether this is intended behavior.
   
   ### What you think should happen instead
   
   _No response_
   
   ### How to reproduce
   
   Create a DAG with two S3KeySensors, one deferrable and one not, and pass each a custom `check_fn`.
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

