Pranaykarvi commented on PR #64709:
URL: https://github.com/apache/airflow/pull/64709#issuecomment-4186809140

   Thanks for the feedback @dabla, and totally fair point from an
   architectural perspective!
   
   I agree that having a specific operator manage its own heartbeats is not
   ideal as a general pattern. A cleaner long-term solution would be
   something like a background heartbeat thread inside `LocalTaskJob` that
   fires independently of what the operator's main thread is doing. That way
   every long-running operator benefits without needing operator-level
   changes.
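
For illustration only, here is a minimal sketch of the background-thread idea using plain `threading`. `BackgroundHeartbeat` and its `beat` callback are hypothetical names, not Airflow APIs, and nothing here reflects how `LocalTaskJob` is actually implemented.

```python
import threading
import time


class BackgroundHeartbeat:
    """Hypothetical helper: calls `beat` every `interval` seconds on a daemon thread."""

    def __init__(self, beat, interval):
        self._beat = beat
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait() returns False on timeout and True once stop is set,
        # so the loop beats once per interval until the context exits.
        while not self._stop.wait(self._interval):
            self._beat()

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc_info):
        self._stop.set()
        self._thread.join()
        return False  # never swallow exceptions from the wrapped work


# Usage: the sleep stands in for an operator's long blocking call.
beats = []
with BackgroundHeartbeat(lambda: beats.append(time.monotonic()), interval=0.05):
    time.sleep(0.3)
```

A thread like this only gets to run while the blocking call releases the GIL, as `time.sleep` does here; whether a given database driver does so during `executemany()` would need to be checked.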
   
   The reason I went this route here is that `GenericTransfer` blocks the
   Python thread completely during `executemany()`, across potentially
   thousands of paginated batches. During that time there is no gap in which
   a framework-level mechanism could fire a heartbeat, so the scheduler
   incorrectly treats the task as a zombie.
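
As a rough sketch of what an operator-level workaround of this kind could look like (not necessarily what this PR does), the loop below sends a throttled heartbeat between batches. `insert_in_batches`, `insert_batch`, `heartbeat`, and `clock` are all hypothetical names for illustration, not Airflow or DB-API functions.

```python
import time
from typing import Callable, Iterable, Sequence


def insert_in_batches(
    batches: Iterable[Sequence[tuple]],
    insert_batch: Callable[[Sequence[tuple]], None],
    heartbeat: Callable[[], None],
    min_interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Insert each batch, calling `heartbeat` at most once per `min_interval` seconds."""
    last_beat = clock()
    total = 0
    for batch in batches:
        insert_batch(batch)  # stands in for the blocking executemany() call
        total += len(batch)
        now = clock()
        if now - last_beat >= min_interval:
            heartbeat()
            last_beat = now
    return total


# Usage with a fake clock that advances 3 "seconds" per reading.
ticks = iter(range(0, 100, 3))
beats = []
rows = insert_in_batches(
    batches=[[(1,), (2,)], [(3,)], [(4,), (5,)], [(6,)]],
    insert_batch=lambda batch: None,
    heartbeat=lambda: beats.append(1),
    min_interval=5.0,
    clock=lambda: next(ticks),
)
```

Throttling by elapsed time rather than by batch count keeps the heartbeat rate stable whether batches are fast or slow, but a single very slow batch can still exceed the zombie threshold.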
   
   So this PR is really meant as a practical short-term fix for users
   hitting this today, while a proper framework-level solution could be
   tracked as a separate improvement.
   
   That said, I am fully open to whatever direction the maintainers prefer:
   - Keep this as a targeted operator-level fix for now
   - Or close this and open a dedicated issue proposing a background
     heartbeat thread at the `LocalTaskJob` level as the right long-term fix
   
   Would love guidance from a core maintainer on which approach fits better
   with the project's direction. Happy to do the work either way!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
