dominikhei opened a new issue, #52920:
URL: https://github.com/apache/airflow/issues/52920
### Apache Airflow Provider(s)
amazon
### Versions of Apache Airflow Providers
_No response_
### Apache Airflow version
3.0.2
### Operating System
linux
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
_No response_
### What happened
I was trying to replicate the behavior in #52869 and noticed something
interesting.
When executed with `deferrable=True`, no files are passed to the `check_fn`
by the S3KeySensor, even though they exist in S3. With `deferrable=False`,
the correct files in the bucket are passed. I used the following function:
```python
def check_key_not_present(files: list, **kwargs) -> bool:
    print(f"Files received by check_fn: {files}")
    return len(files) == 0
```
These are the two tasks:
```python
deferrable_sensor = S3KeySensor(
    task_id='deferrable_sensor',
    bucket_name='#########################',
    bucket_key='scripts/extract-raw-data-job.py',
    poke_interval=10,
    timeout=60,
    check_fn=check_key_not_present,
    deferrable=True,
    mode="poke",
    aws_conn_id='aws_default',
    dag=dag,
    trigger_rule='none_skipped'
)

non_deferrable_sensor = S3KeySensor(
    task_id='non_deferrable_sensor',
    bucket_name='######################',
    bucket_key='scripts/extract-raw-data-job.py',
    poke_interval=10,
    timeout=60,
    check_fn=check_key_not_present,
    deferrable=False,
    mode="poke",
    aws_conn_id='aws_default',
    dag=dag,
    trigger_rule='none_skipped'
)
```
Logs of non_deferrable_sensor:
`[2025-07-04, 20:24:09] INFO - Files received by check_fn: [{'Size': 363,
'Key': None}]: chan="stdout": source="task"`
Logs of the deferrable_sensor:
`[2025-07-04, 20:23:53] INFO - Files received by check_fn: []:
chan="stdout": source="task"`
After my adjustments:
```
INFO - Files received by check_fn: [{'Key': 'xxxxxxxxxxx', 'LastModified': DateTime(2025, 7, 5, 8, 59, 14, tzinfo=Timezone('UTC')), 'ETag': '"9fe1b02003ad4a0a062ff7df14b36d54"', 'ChecksumAlgorithm': ['CRC64NVME'], 'ChecksumType': 'FULL_OBJECT', 'Size': 209, 'StorageClass': 'STANDARD'}]
```
The problem lies here:
```python
async def get_files_async(
    self,
    client: AioBaseClient,
    bucket: str,
    bucket_keys: str | list[str],
    wildcard_match: bool,
    delimiter: str | None = "/",
) -> list[Any]:
    """Get a list of files in the bucket."""
    keys: list[Any] = []
    for key in bucket_keys:
        prefix = key
        if wildcard_match:
            prefix = re.split(r"[\[*?]", key, 1)[0]
        paginator = client.get_paginator("list_objects_v2")
        params = {
            "Bucket": bucket,
            "Prefix": prefix,
            "Delimiter": delimiter,
        }
```
Why don't we check whether `bucket_keys` is a single string or a list? Is
this somehow intentional? If `bucket_keys` is a single string, `for key in
bucket_keys` will split it into individual characters.
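To illustrate the character-splitting behavior (a standalone snippet, not Airflow code), iterating over a bare string yields one character per loop iteration, so each `list_objects_v2` call would receive a single-character `Prefix`:

```python
# A single key passed as a plain string, as the type hint str | list[str] allows:
bucket_keys = "scripts/extract-raw-data-job.py"

# Iterating over a string yields individual characters, not whole keys.
prefixes = [key for key in bucket_keys]
print(prefixes[:3])  # ['s', 'c', 'r']
```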
I am opening this issue instead of a direct PR to validate that this is not
actually intended and that I am not missing something, because [this
test](https://github.com/apache/airflow/blob/main/providers/amazon/tests/unit/amazon/aws/hooks/test_s3.py#L718)
makes me question whether this is intended behavior.
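For reference, one possible fix would be to normalize `bucket_keys` to a list before iterating. This is only a sketch; the helper name `normalize_bucket_keys` is mine, not part of the provider's API, and the real change would likely go inside `get_files_async` itself:

```python
def normalize_bucket_keys(bucket_keys: "str | list[str]") -> "list[str]":
    # Wrap a single key in a list so iteration yields whole keys
    # instead of individual characters.
    if isinstance(bucket_keys, str):
        return [bucket_keys]
    return list(bucket_keys)

print(normalize_bucket_keys("scripts/extract-raw-data-job.py"))
# ['scripts/extract-raw-data-job.py']
print(normalize_bucket_keys(["a.py", "b.py"]))
# ['a.py', 'b.py']
```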
### What you think should happen instead
_No response_
### How to reproduce
Create a DAG with two S3KeySensors, one deferrable and one not, and pass a
custom `check_fn`.
### Anything else
_No response_
### Are you willing to submit PR?
- [x] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)