Subham-KRLX opened a new pull request, #64203:
URL: https://github.com/apache/airflow/pull/64203

   This PR fixes task heartbeat conflicts (HTTP 409/404) that occur in 
containerized or offline environments (e.g., Podman) where hostname resolution 
is unstable, or when a heartbeat races with task termination.
   
   Key Changes:
   
   API Resilience: Modified the ti_heartbeat endpoint to allow heartbeats for 
tasks that have recently transitioned out of RUNNING (e.g., to UP_FOR_RETRY or 
SUCCESS), provided the hostname and pid still match. This prevents the 
supervisor from erroneously killing tasks during final state propagation.
   
   Hostname Stability: Added @cache to get_hostname in the Task SDK and updated 
the Supervisor to cache its hostname at startup. This ensures consistent 
reporting even if the underlying environment's hostname fluctuates.
   
   Impact: Prevents "Server indicated the task shouldn't be running anymore" 
and "Task killed!" errors for tasks that are finishing or retrying normally.
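   
   A minimal sketch of the two changes described above, for illustration only 
and not the code in this PR. TaskRecord, RECENTLY_FINISHED and heartbeat_status 
are hypothetical names, the 204 success status is an assumption, and the sketch 
does not model how "recently" is decided.

```python
import socket
from dataclasses import dataclass
from functools import cache
from typing import Optional


@cache
def get_hostname() -> str:
    """Resolve the hostname once; every later call reuses the first result."""
    return socket.getfqdn()


@dataclass
class TaskRecord:
    # Hypothetical stand-in for a task instance row, not Airflow's model.
    state: str
    hostname: str
    pid: int


# Assumed set of states a task may have just moved into from "running".
RECENTLY_FINISHED = {"up_for_retry", "success"}


def heartbeat_status(ti: Optional[TaskRecord], hostname: str, pid: int) -> int:
    """Return the HTTP status a heartbeat would get under the relaxed rule."""
    if ti is None:
        return 404  # unknown task instance
    if ti.hostname != hostname or ti.pid != pid:
        return 409  # a different process claims this task
    if ti.state == "running" or ti.state in RECENTLY_FINISHED:
        return 204  # accepted (assumed success status)
    return 409  # any other state is still a conflict
```

   With this rule, a heartbeat from the same hostname and pid is still accepted 
after the task moves to "success" or "up_for_retry", while a mismatched hostname 
or pid is rejected as before. Caching get_hostname keeps the value the 
supervisor reports identical across heartbeats.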
   
   closes: #63774
   
   Was generative AI tooling used to co-author this PR?
   
   Yes — Gemini (Code Research and PR Description)
   

