Pranaykarvi commented on PR #64709: URL: https://github.com/apache/airflow/pull/64709#issuecomment-4186809140
Thanks for the feedback @dabla, and that's a fair point from an architectural perspective! I agree that having a specific operator manage its own heartbeats is not ideal as a general pattern. A cleaner long-term solution would be something like a background heartbeat thread inside `LocalTaskJob` that fires independently of whatever the operator's main thread is doing. That way every long-running operator benefits without needing operator-level changes.

The reason I went this route here is that `GenericTransfer` blocks the Python thread completely during `executemany()` across potentially thousands of paginated batches. During that time there is no gap for any framework-level mechanism to fire a heartbeat, which is what causes the scheduler to incorrectly treat the task as a zombie. So this PR is really meant as a practical short-term fix for users hitting this today, while a proper framework-level solution could be tracked as a separate improvement.

That said, I am fully open to whichever direction the maintainers prefer:

- Keep this as a targeted operator-level fix for now, or
- Close this and open a dedicated issue proposing a background heartbeat thread at the `LocalTaskJob` level as the right long-term fix.

I'd appreciate guidance from a core maintainer on which approach fits better with the project's direction. Happy to do the work either way!
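To make the background-heartbeat idea concrete, here is a minimal sketch in plain Python, not Airflow code. `BackgroundHeartbeat`, `blocking_transfer`, and the `beat` callback are hypothetical names, and `time.sleep` stands in for a blocking `executemany()` call. The sketch assumes the blocking call lets other Python threads run, as `time.sleep` and blocking socket I/O do. A C extension that holds the GIL for the entire call would starve this thread as well.

```python
import threading
import time
from typing import Callable


class BackgroundHeartbeat:
    """Hypothetical helper (not Airflow API): calls `beat` every `interval`
    seconds on a daemon thread until the context manager exits."""

    def __init__(self, beat: Callable[[], None], interval: float) -> None:
        self._beat = beat
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, name="heartbeat", daemon=True)

    def _run(self) -> None:
        # Event.wait returns False on timeout (time to beat again) and
        # True as soon as stop is requested, which ends the loop.
        while not self._stop.wait(self._interval):
            self._beat()

    def __enter__(self) -> "BackgroundHeartbeat":
        self._thread.start()
        return self

    def __exit__(self, *exc: object) -> None:
        # Signal the thread and wait for it so no beat fires after exit.
        self._stop.set()
        self._thread.join()


def blocking_transfer(batches: int, seconds_per_batch: float) -> None:
    """Hypothetical stand-in for a paginated transfer whose main thread
    never returns control to the framework while it runs."""
    for _ in range(batches):
        time.sleep(seconds_per_batch)  # simulates one blocking executemany() call


if __name__ == "__main__":
    beats: list[float] = []
    with BackgroundHeartbeat(lambda: beats.append(time.monotonic()), interval=0.1):
        blocking_transfer(batches=3, seconds_per_batch=0.2)
    print(f"heartbeats sent while the main thread was blocked: {len(beats)}")
```

Using `Event.wait(timeout)` instead of `time.sleep` in the heartbeat loop lets `__exit__` stop the thread promptly rather than waiting out a full interval. In this sketch, if `beat` raises, the heartbeat thread simply dies; a real implementation would need to decide how to surface that failure.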
