Subham-KRLX opened a new pull request, #64221:
URL: https://github.com/apache/airflow/pull/64221

   This PR resolves critical task heartbeat conflicts (HTTP 409/404) that 
frequently occur in containerized/offline environments during task termination.
   
   Problem: When a task finishes and transitions its state to SUCCESS a final 
heartbeat might fire. The API server correctly returns 409 Conflict but the 
Supervisor would immediately force-kill the already-finishing process causing 
unnecessary "Task killed!" errors.
   
   Solution:
   
   Supervisor Resilience: Modified ActivitySubprocess to check if the child 
process has already exited when a heartbeat conflict occurs. If it has the 
error is ignored as a benign race condition.
   
   Stable Identity: Cached the hostname in the Task SDK to ensure consistent 
reporting even if the container's hostname resolution fluctuates.
   
   Strict API: Reverted previous server side leniency to maintain architectural 
strictness.
   
   Impact: Prevents spurious task failures and premature termination during 
final state propagation.
   
   closes: #63774
   
   Was generative AI tooling used to co-author this PR?
    Yes — Gemini (Troubleshooting)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to