Pranaykarvi opened a new pull request, #64709:
URL: https://github.com/apache/airflow/pull/64709

   Closes #64658
   
   ## Problem
   
   Long-running `GenericTransfer` tasks (>3 hours) were being incorrectly
   killed by the scheduler due to a heartbeat timeout / zombie detection
   false positive.
   
   During the paginated transfer loop, the operator performs long blocking
   work (bulk inserts via `executemany`) without emitting any heartbeat to
   the Airflow metadata DB. The scheduler's
   `_find_and_purge_task_instances_without_heartbeats` routine in
   `scheduler_job_runner.py` checks `last_heartbeat_at` periodically — if
   it goes stale beyond `task_instance_heartbeat_timeout`, the task is
   treated as a zombie and terminated, even though it is actively
   processing data.
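
To make the failure mode concrete, here is a minimal sketch of the staleness comparison described above. It is not the scheduler's actual code: `heartbeat_is_stale` is a hypothetical name, and the 300-second timeout is an illustrative value rather than a claim about any deployment's configuration.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_is_stale(last_heartbeat_at, timeout_seconds, now=None):
    """Return True when the last heartbeat is older than the timeout.

    Hypothetical helper that mirrors the idea of comparing
    `last_heartbeat_at` against `task_instance_heartbeat_timeout`.
    """
    now = now or datetime.now(timezone.utc)
    return now - last_heartbeat_at > timedelta(seconds=timeout_seconds)


# Illustrative values only: a task that last heartbeated at 12:00 UTC.
last_beat = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
timeout = 300  # seconds, assumed for the example

# Four minutes into a blocking bulk insert: still within the timeout.
still_ok = heartbeat_is_stale(last_beat, timeout, now=last_beat + timedelta(minutes=4))

# Twenty minutes in with no heartbeat: the task would look like a zombie.
looks_dead = heartbeat_is_stale(last_beat, timeout, now=last_beat + timedelta(minutes=20))

print(still_ok, looks_dead)
```

Under these assumptions, a task that blocks for longer than the timeout without refreshing its heartbeat is indistinguishable from a dead one, which is the false positive this PR targets.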
   
   This affects both:
   - The paginated path (`execute_complete`, called per page when deferred)
   - The non-paginated multi-SQL path (`execute`, which iterates over a list of SQL statements)
   
   ## Fix
   
   - Added `_emit_transfer_heartbeat()` helper that calls `ti.heartbeat()`
     or `ti.update_heartbeat()` (first match wins via `getattr`) after each
     page in `execute_complete()` and after each SQL batch in `execute()`
   - Helper is best-effort — no-ops cleanly if neither method exists on the
     task instance (no regression for older runtimes)
   - Added docstring note on tuning the following config values for
     long-running transfers:
     - `[scheduler] task_instance_heartbeat_timeout`
     - `[celery_broker_transport_options] visibility_timeout`
     - `[scheduler] task_instance_heartbeat_sec`
   - Added `test_heartbeat_called_during_paginated_transfer` to verify
     heartbeat is called once per page during a paginated transfer
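
The best-effort lookup described in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the code in the PR: `emit_transfer_heartbeat` is written here as a free function, the stub classes are hypothetical stand-ins for a task instance, and swallowing exceptions raised by the heartbeat call is an assumed reading of "best-effort".

```python
import logging
from typing import Any

log = logging.getLogger(__name__)


def emit_transfer_heartbeat(ti: Any) -> bool:
    """Call the first heartbeat method found on `ti`; no-op if none exists.

    Returns True if a heartbeat method was called successfully.
    """
    for name in ("heartbeat", "update_heartbeat"):
        method = getattr(ti, name, None)
        if callable(method):
            try:
                method()
            except Exception:
                # Assumption: a failed heartbeat must never fail the transfer.
                log.warning("Heartbeat via %s() failed; continuing", name)
                return False
            return True  # first match wins
    return False  # neither method exists: nothing to do


class StubTaskInstance:
    """Hypothetical stand-in that counts heartbeat calls."""

    def __init__(self):
        self.beats = 0

    def heartbeat(self):
        self.beats += 1


class LegacyTaskInstance:
    """Hypothetical stand-in for a runtime with no heartbeat method."""


# Simulate a paginated transfer: one heartbeat after each page.
ti = StubTaskInstance()
pages = [["row1", "row2"], ["row3"], ["row4", "row5"]]
for page in pages:
    # ... the bulk insert of `page` would happen here ...
    emit_transfer_heartbeat(ti)

print(ti.beats)
print(emit_transfer_heartbeat(LegacyTaskInstance()))
```

In this sketch the stub records one heartbeat per page, which is the property `test_heartbeat_called_during_paginated_transfer` is described as checking, and the legacy stub shows the no-op path for runtimes that expose neither method.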
   
   ## Testing
   ```bash
   uv run --project providers/common/sql pytest \
     providers/common/sql/tests/unit/common/sql/operators/test_generic_transfer.py \
     -xvs
   ```
   
   ## Related Issues
   
   - Closes #64658
   - Related to #48719
   - Related to #54479

