rpofuk opened a new issue, #64254:
URL: https://github.com/apache/airflow/issues/64254

   ### Description
   
   Add support for **pre-signed URL based log I/O** as a `RemoteLogIO` 
implementation, where:
   
   - **API server (reader)**: Uses the built-in `S3RemoteLogIO` with direct IAM 
access for reading logs (no change needed).
   - **Worker (uploader)**: Uses a new `RemoteLogIO` that requests a pre-signed 
PUT URL from the API server and uploads via plain HTTP. No AWS credentials 
needed on the worker.
   
   ### How it works
   
   ```
   Worker (after task)          API Server               S3
     |                            |                       |
     |-- POST /presigned-url ---->|                       |
     |                            |-- generate PUT URL -->|
     |<-- { presigned_url } ------|                       |
     |                                                    |
     |------------- HTTP PUT log file ------------------->|
   ```
   
   The API server endpoint that generates pre-signed URLs can enforce **custom 
authorization rules** before issuing the URL - e.g. verifying the worker's 
service account is only allowed to upload logs for DAGs in its bundle.
   
   The only change a worker deployment needs is enabling the new functionality:
   
   ```ini
   [logging]
   remote_log_io_role = worker
   ```
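The worker side then needs no AWS SDK at all. A rough sketch, using only the standard library (the endpoint path and JSON field names are assumptions for illustration, not the actual Airflow API):

```python
# Sketch of the worker-side uploader: ask the API server for a pre-signed
# URL, then PUT the log bytes over plain HTTP -- no AWS credentials needed.
import json
import urllib.request

API_BASE = "https://airflow-api.example.com"  # hypothetical API server

def request_presigned_url(relative_log_path: str) -> str:
    """POST to the (hypothetical) presigned-url endpoint, return the URL."""
    body = json.dumps({"path": relative_log_path}).encode()
    req = urllib.request.Request(
        f"{API_BASE}/presigned-url",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["presigned_url"]

def build_put_request(presigned_url: str, log_bytes: bytes) -> urllib.request.Request:
    """Build the plain-HTTP PUT that uploads the log file to S3."""
    return urllib.request.Request(presigned_url, data=log_bytes, method="PUT")

def upload_log(relative_log_path: str, log_bytes: bytes) -> None:
    url = request_presigned_url(relative_log_path)
    with urllib.request.urlopen(build_put_request(url, log_bytes)):
        pass  # S3 returns 200 on a successful PUT
```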
   
   ### Optional: custom auth hook
   
   By default, the presigned URL endpoints use standard Airflow authentication 
(the requesting user must be authenticated). For deployments that need 
additional authorization logic (e.g. bundle-scoped access, tenant isolation), 
an optional callable can be configured:
   
   ```ini
   [logging]
   presigned_url_auth_hook = mypackage.auth.validate_log_access
   ```
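Such a hook might look like the sketch below. The `(user, dag_id) -> bool` signature and the prefix-based bundle convention are assumptions for illustration; the actual hook contract would be defined as part of the feature.

```python
# Sketch of a custom auth hook, e.g. mypackage/auth.py.
# The signature and the naming convention are illustrative assumptions.
def validate_log_access(user: str, dag_id: str) -> bool:
    """Allow a worker service account to upload logs only for DAGs in
    its own bundle (here: a simple name-prefix convention)."""
    # e.g. account "worker-analytics" may only touch "analytics_*" DAGs
    bundle = user.removeprefix("worker-")
    return user.startswith("worker-") and dag_id.startswith(f"{bundle}_")
```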
   
   I'm already deploying this solution to production on our side and would gladly 
contribute it if it would be accepted. (I don't want to go through the trouble 
of getting approval to open-source it if nobody is interested :))
   
   
   
   ### Use case/motivation
   
   Airflow 3.x introduced `RemoteLogIO` as the protocol for remote log 
upload/download from the supervisor process. Currently, the only built-in 
implementation uses direct S3 access (`S3RemoteLogIO`), which requires the 
worker to have S3 credentials.
   
   In multi-account or zero-trust deployments, workers run in a separate AWS 
account and should **not** be trusted with S3 write credentials. Currently there 
is no way to plug in a custom log upload/download mechanism that uses pre-signed 
URLs instead of direct S3 access.
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

