Subham-KRLX opened a new pull request, #64221:
URL: https://github.com/apache/airflow/pull/64221
This PR resolves critical task heartbeat conflicts (HTTP 409/404) that
frequently occur in containerized/offline environments during task termination.
Problem: When a task finishes and transitions its state to SUCCESS, a final
heartbeat might still fire. The API server correctly returns 409 Conflict, but
the Supervisor would immediately force-kill the already-finishing process,
causing unnecessary "Task killed!" errors.
Solution:
Supervisor Resilience: Modified ActivitySubprocess to check whether the child
process has already exited when a heartbeat conflict occurs. If it has, the
error is ignored as a benign race condition.
Stable Identity: Cached the hostname in the Task SDK to ensure consistent
reporting even if the container's hostname resolution fluctuates.
Strict API: Reverted the previous server-side leniency to maintain
architectural strictness.
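
The first two points above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names is_benign_heartbeat_conflict, cached_hostname, and CONFLICT_STATUSES are hypothetical, the child process is modelled as a plain subprocess.Popen rather than the real ActivitySubprocess, and socket.getfqdn() stands in for whatever hostname lookup the Task SDK actually uses.

```python
import functools
import socket
import subprocess
import sys

# Assumption: the heartbeat statuses treated as "conflicts" are the 409/404
# responses mentioned in the description above.
CONFLICT_STATUSES = {404, 409}


def is_benign_heartbeat_conflict(status_code: int, child: subprocess.Popen) -> bool:
    """Hypothetical helper: True if a conflict arrived after the child exited."""
    if status_code not in CONFLICT_STATUSES:
        return False
    # Popen.poll() returns None while the process is still running and the
    # exit code once it has terminated.
    return child.poll() is not None


@functools.lru_cache(maxsize=1)
def cached_hostname() -> str:
    """Hypothetical helper: resolve the hostname once and reuse the result."""
    return socket.getfqdn()


if __name__ == "__main__":
    # A child that exits immediately, standing in for a task that has finished.
    finished = subprocess.Popen([sys.executable, "-c", "pass"])
    finished.wait()
    if is_benign_heartbeat_conflict(409, finished):
        print("conflict after exit: ignore, do not kill")
    print("reporting as host:", cached_hostname())
```

With a check like this, a supervisor would only escalate to a kill when the conflict arrives while the child is still running. Caching the hostname means every heartbeat reports the same identity for the lifetime of the process, even if a later lookup would return something different.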
Impact: Prevents spurious task failures and premature termination during
final state propagation.
closes: #63774
Was generative AI tooling used to co-author this PR?
Yes — Gemini (Troubleshooting)