Subham-KRLX opened a new pull request, #64203: URL: https://github.com/apache/airflow/pull/64203
This PR resolves critical task heartbeat conflicts (HTTP 409/404) that frequently occur in containerized or offline environments (e.g., Podman) where hostname resolution can be unstable or where race conditions exist during task termination.

Key Changes:

- API Resilience: Modified the ti_heartbeat endpoint to allow heartbeats for tasks that have recently transitioned out of RUNNING (e.g., to UP_FOR_RETRY or SUCCESS), provided the hostname and pid still match. This prevents the supervisor from erroneously killing tasks during final state propagation.
- Hostname Stability: Added @cache to get_hostname in the Task SDK and updated the Supervisor to cache its hostname at startup. This ensures consistent reporting even if the underlying environment's hostname fluctuates.

Impact: Prevents "Server indicated the task shouldn't be running anymore" and "Task killed!" errors when tasks are technically finishing or retrying correctly.

closes: #63774

Was generative AI tooling used to co-author this PR? Yes: Gemini (Code Research and PR Description)
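
For reviewers, the two changes described in this PR can be illustrated with a minimal, self-contained sketch. It is not the code from the diff: the names TaskInstanceRecord, GRACE_STATES, heartbeat_status and get_cached_hostname are hypothetical, the 204 success code is an assumption, and the "recently transitioned" time window is left out for brevity. It only shows the decision the description implies: a heartbeat is accepted when the task is RUNNING, or has moved to UP_FOR_RETRY or SUCCESS, as long as hostname and pid still match.

```python
import socket
from dataclasses import dataclass
from functools import cache
from typing import Optional


@cache
def get_cached_hostname() -> str:
    """Resolve the hostname once; later calls reuse the first result.

    Hypothetical stand-in for the cached get_hostname mentioned in the PR.
    """
    return socket.gethostname()


RUNNING = "running"
# Assumption: states for which a late heartbeat from the same process is tolerated.
GRACE_STATES = frozenset({"up_for_retry", "success"})


@dataclass(frozen=True)
class TaskInstanceRecord:
    """Hypothetical view of the fields the server compares on heartbeat."""
    state: str
    hostname: str
    pid: int


def heartbeat_status(ti: Optional[TaskInstanceRecord], hostname: str, pid: int) -> int:
    """Return an illustrative HTTP status for a heartbeat request."""
    if ti is None:
        return 404  # unknown task instance
    same_process = ti.hostname == hostname and ti.pid == pid
    if not same_process:
        return 409  # another process owns this task instance
    if ti.state == RUNNING or ti.state in GRACE_STATES:
        return 204  # accept: running, or finishing/retrying on the same process
    return 409  # any other state is still treated as a conflict


if __name__ == "__main__":
    host = get_cached_hostname()
    ti = TaskInstanceRecord(state="up_for_retry", hostname=host, pid=1234)
    print(heartbeat_status(ti, host, 1234))
```

Because the hostname is cached, the value sent with each heartbeat stays equal to the one recorded at startup, so the hostname comparison above does not fail just because resolution changed mid-run.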
