pippo995 opened a new pull request, #62078: URL: https://github.com/apache/airflow/pull/62078
## Summary

- `S3Hook.download_file()` writes the S3 object's content to a file via `download_fileobj()` but never calls `flush()` before returning the file path.
- When the caller immediately reads the returned path, the file may contain 0 bytes because the data is still sitting in Python's write buffer.
- Added `file.flush()` after `download_fileobj()` to ensure buffered content is written to disk.

## Details

The original implementation used a `with` context manager, which auto-closes (and therefore flushes) the file. When `preserve_file_name` support was added, the `with` was removed, and the file is now left open and unflushed. This particularly affects small files (under roughly 8 KB, the default I/O buffer size) that fit entirely in the buffer.

The bug is latent in all environments but was exposed by `apache-airflow-providers-common-compat==1.13.1` (PR #61157), which changed the execution timing of `get_hook_lineage_collector()` between `download_fileobj()` and `return file.name`.
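The buffering behavior described above can be reproduced without S3 at all. The sketch below is illustrative (it is not Airflow's actual `S3Hook` code and invents its own function names): a small payload written to a still-open file stays in Python's buffer, so reading the path back yields 0 bytes until `flush()` is called.

```python
# Minimal, standalone sketch of the bug and the fix. The function names
# here are hypothetical; only the buffering behavior mirrors the report.
import os
import tempfile

def download_without_flush(data: bytes):
    # Mimics writing via download_fileobj() and handing back the path
    # while the file object stays open: a small payload fits entirely
    # in the default (~8 KB) write buffer and never reaches disk.
    file = tempfile.NamedTemporaryFile(delete=False)
    file.write(data)   # data stays in Python's write buffer
    return file        # caller reads file.name while file is still open

def download_with_flush(data: bytes):
    file = tempfile.NamedTemporaryFile(delete=False)
    file.write(data)
    file.flush()       # the fix: push buffered bytes to disk
    return file

payload = b"x" * 1024  # small file, fits entirely in the buffer

f = download_without_flush(payload)
print(os.path.getsize(f.name))  # 0 -- content still buffered in memory

g = download_with_flush(payload)
print(os.path.getsize(g.name))  # 1024 -- content is on disk
```

A `with` block (or `file.close()`) would also flush, but the PR keeps the file open because `preserve_file_name` requires returning a path to a live file, so an explicit `flush()` is the minimal fix.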
