oboki opened a new pull request, #50175:
URL: https://github.com/apache/airflow/pull/50175

   In environments with multiple workers (e.g., `CeleryExecutor`), task logs 
for previous tries (i.e., not the latest `try_number`) may fail to load with 
the following error:
   
   ```
   Could not read served logs: 404 Client Error: NOT FOUND for url: http://worker-2:8793/log/dag_id=tutorial_dag/run_id=manual__2025-05-04T12:54:25.273420+00:00/task_id=extract/attempt=5.log
   ```
   
   As shown in the screenshots below, `attempt=5` was actually executed on 
`worker-1`, but the Web UI incorrectly tries to fetch the logs from `worker-2`:
   
   
![image](https://github.com/user-attachments/assets/10fdf94a-6ba9-4331-b585-6407a654fdf9)
   
   
![image](https://github.com/user-attachments/assets/38b02c81-f22c-4056-94a9-eed881b60435)
   
   This happens because `_get_log_retrieval_url` generates the log URL based on 
`TaskInstance.hostname`, which only stores the hostname of the latest execution 
attempt. It does not keep track of the history of previous tries.
   
   To fix this, I updated the logic to use the `TaskInstanceHistory` model, 
just as the "Details" tab does, so the correct hostname is used for each 
specific `try_number`.
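
To illustrate the idea, here is a minimal, self-contained sketch of a per-try hostname lookup. It is not the code in this PR and does not use the Airflow ORM: `TryRecord`, `TaskInstanceSketch`, `resolve_hostname`, and `build_log_url` are hypothetical names, and the latest try number (6) is assumed for the example. Only the hostnames, port, and log path come from the error message above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class TryRecord:
    """Hypothetical stand-in for one row of per-try history."""
    try_number: int
    hostname: str


@dataclass
class TaskInstanceSketch:
    """Hypothetical stand-in for a task instance.

    `hostname` only reflects the latest try, which is why using it for
    older tries points at the wrong worker.
    """
    try_number: int
    hostname: str
    history: List[TryRecord] = field(default_factory=list)


def resolve_hostname(ti: TaskInstanceSketch, try_number: int) -> Optional[str]:
    """Return the host that ran `try_number`, or None if it is unknown."""
    if try_number == ti.try_number:
        # The latest try is described by the task instance itself.
        return ti.hostname
    for record in ti.history:
        if record.try_number == try_number:
            return record.hostname
    return None


def build_log_url(hostname: str, log_relative_path: str, port: int = 8793) -> str:
    """Build a served-log URL in the shape seen in the error message."""
    return f"http://{hostname}:{port}/log/{log_relative_path}"


# Assumed scenario: try 5 ran on worker-1, the latest try (6) ran on worker-2.
ti = TaskInstanceSketch(
    try_number=6,
    hostname="worker-2",
    history=[TryRecord(try_number=5, hostname="worker-1")],
)
log_path = (
    "dag_id=tutorial_dag/run_id=manual__2025-05-04T12:54:25.273420+00:00/"
    "task_id=extract/attempt=5.log"
)

wrong_url = build_log_url(ti.hostname, log_path)             # latest host only
right_url = build_log_url(resolve_hostname(ti, 5), log_path)  # host of try 5
```

In this sketch `wrong_url` targets `worker-2` (the behavior described above), while `right_url` targets `worker-1`, the worker that actually ran attempt 5.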
   
   With this change, logs for previous attempts load correctly.
   
   
![image](https://github.com/user-attachments/assets/70b93b44-0760-4e0c-a33e-77346455949a)
   
   
![image](https://github.com/user-attachments/assets/18c7faa5-2722-45eb-8a2e-c4144a97924b)
   
   
![image](https://github.com/user-attachments/assets/99dde47f-44b0-4d3e-80fe-4827bbf3a5cd)


