kaxil opened a new pull request, #67353:
URL: https://github.com/apache/airflow/pull/67353

   Restore `_handle_fail_fast_for_dag` in the MySQL-`TIMESTAMP`-limit branch of
   the reschedule path. Without this, DAGs with `fail_fast=True` silently fail to
   stop sibling tasks when a reschedule date exceeds 2038-01-19 on MySQL.
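
For reference, the 2038-01-19 cutoff is the largest instant a MySQL `TIMESTAMP` column can hold (2**31 - 1 seconds after the Unix epoch). A minimal sketch of such a limit check follows; `exceeds_mysql_timestamp_limit` is a hypothetical helper named for illustration only, and the actual check in Airflow may be written differently.

```python
from datetime import datetime, timezone

# Largest instant a MySQL TIMESTAMP column can store:
# 2**31 - 1 seconds after the Unix epoch, in UTC.
MYSQL_TIMESTAMP_MAX = datetime.fromtimestamp(2**31 - 1, tz=timezone.utc)


def exceeds_mysql_timestamp_limit(reschedule_date: datetime) -> bool:
    """Hypothetical helper: True if the date cannot fit in a MySQL TIMESTAMP."""
    if reschedule_date.tzinfo is None:
        # Assumption for this sketch: naive datetimes are interpreted as UTC.
        reschedule_date = reschedule_date.replace(tzinfo=timezone.utc)
    return reschedule_date > MYSQL_TIMESTAMP_MAX


print(MYSQL_TIMESTAMP_MAX.isoformat())  # 2038-01-19T03:14:07+00:00
```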
   
   ## What was wrong
   
   PR #59686 removed the fail-fast call here with this rationale:
   
   > We skip fail_fast handling in this error case to avoid fetching the TI object
   > while the row is still locked from the earlier with_for_update() query, which
   > might cause deadlock issues in SQLA2. The task is marked as FAILED regardless.
   
   That rationale was incorrect on both counts:
   
   - A transaction cannot deadlock with itself. A plain `session.get(TI, id)` on a
     row already locked by the same transaction acquires no new lock and reads
     freely (Postgres, MySQL 8.0+, SQLite all permit this).
   - "The task is marked as FAILED regardless" is true for the *failing* TI, but
     silently drops the contract for the rest of the DAG. With `fail_fast=True`,
     sibling non-teardown tasks should be stopped -- the skip turned that into a
     no-op.
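
The first point can be illustrated with a small stand-alone sketch using Python's built-in `sqlite3` module. It is only an analogy: SQLite locks the whole database rather than a single row, and the `task_instance` table here is a made-up two-column toy, not Airflow's schema. The connection that holds the write lock still reads its own data without blocking, while a second connection is refused.

```python
import os
import sqlite3
import tempfile

# A file-backed database so that two connections can see the same data.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None gives manual transaction control; timeout=0 makes a
# blocked connection fail immediately instead of waiting.
holder = sqlite3.connect(path, isolation_level=None, timeout=0)
holder.execute("CREATE TABLE task_instance (id INTEGER PRIMARY KEY, state TEXT)")
holder.execute("INSERT INTO task_instance VALUES (1, 'running')")

holder.execute("BEGIN IMMEDIATE")  # this transaction now holds the write lock
holder.execute("UPDATE task_instance SET state = 'failed' WHERE id = 1")

# The lock holder reads through its own lock without blocking.
own_read = holder.execute(
    "SELECT state FROM task_instance WHERE id = 1"
).fetchone()[0]

# A different connection cannot take the same lock while it is held.
other = sqlite3.connect(path, isolation_level=None, timeout=0)
try:
    other.execute("BEGIN IMMEDIATE")
    other_blocked = False
except sqlite3.OperationalError:
    other_blocked = True

holder.execute("COMMIT")
other.close()
holder.close()
```

Contention only arises between distinct transactions; the holder's own read never waits on itself.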
   
   The deadlock that motivated #59686 came from a different code path (`FOR UPDATE`
   expanding to the lazy-joined `dag_run` row), fixed in #67246 by scoping the
   lock with `with_for_update={"of": TI}`. With that scope in place, the fail-fast
   call is safe and matches the file's two existing fail-fast sites.
   
   ## Behavior change
   
   - Before: `fail_fast=True` DAG that reschedules past 2038-01-19 on MySQL ->
     failing TI is marked FAILED, siblings keep running (or stay queued).
   - After: failing TI is marked FAILED *and* sibling non-teardown tasks are
     stopped.
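
The before/after difference can be modeled with a toy sketch. Everything here is hypothetical (`ToyTI`, `fail_task`, and the `"stopped"` placeholder state are invented for illustration); it does not reproduce Airflow's models or the exact states `_handle_fail_fast_for_dag` assigns.

```python
from dataclasses import dataclass


@dataclass
class ToyTI:
    """Toy stand-in for a task instance; not Airflow's TaskInstance."""
    task_id: str
    state: str
    is_teardown: bool = False


UNFINISHED = {"queued", "scheduled", "running"}


def fail_task(tis, failing_id, *, fail_fast, call_fail_fast_handler):
    """Mark one task failed and, optionally, stop its unfinished siblings."""
    for ti in tis:
        if ti.task_id == failing_id:
            ti.state = "failed"
    if fail_fast and call_fail_fast_handler:
        for ti in tis:
            is_sibling = ti.task_id != failing_id
            if is_sibling and not ti.is_teardown and ti.state in UNFINISHED:
                # "stopped" is a placeholder for whatever terminal state
                # the real handler assigns.
                ti.state = "stopped"
    return {ti.task_id: ti.state for ti in tis}


def make_tis():
    return [
        ToyTI("sensor", "running"),
        ToyTI("sibling", "queued"),
        ToyTI("cleanup", "scheduled", is_teardown=True),
    ]


# Before: the handler call is skipped, so siblings are left untouched.
before = fail_task(make_tis(), "sensor", fail_fast=True, call_fail_fast_handler=False)
# After: the handler runs, so non-teardown siblings are stopped.
after = fail_task(make_tis(), "sensor", fail_fast=True, call_fail_fast_handler=True)
```

In this model the teardown task is left alone in both cases, matching the "non-teardown" wording above.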
   
   Silent functional bugfix; MySQL-only code path. The regression test mocks the
   dialect gate so it runs on every backend in CI.
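
A rough sketch of what mocking a dialect gate can look like, using only `unittest.mock`. `ToyRescheduler`, its `dialect_name` method, and the year-based limit check are assumptions made up for this example; the real test and the real gate in Airflow are not shown in this description and may be structured differently.

```python
from types import SimpleNamespace
from unittest import mock


class ToyRescheduler:
    """Hypothetical stand-in for the code under test; not Airflow's API."""

    @staticmethod
    def dialect_name(session) -> str:
        return session.bind.dialect.name

    def reschedule(self, session, year: int) -> str:
        # Only MySQL takes the TIMESTAMP-limit branch. Comparing the year
        # is a simplification of the real limit check.
        if self.dialect_name(session) == "mysql" and year > 2038:
            return "failed"
        return "up_for_reschedule"


# A fake session that reports a non-MySQL backend, as most CI jobs would.
sqlite_session = SimpleNamespace(
    bind=SimpleNamespace(dialect=SimpleNamespace(name="sqlite"))
)
rescheduler = ToyRescheduler()

# Without the mock, the MySQL-only branch is never reached on this backend.
unpatched = rescheduler.reschedule(sqlite_session, 2040)

# With the gate mocked, the same session exercises the MySQL branch.
with mock.patch.object(ToyRescheduler, "dialect_name", return_value="mysql"):
    patched = rescheduler.reschedule(sqlite_session, 2040)
```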
   
   Also drops a second misleading comment in the same function claiming `session.get`
   was avoided to "avoid SQLA2 lock contention issues" -- the code itself is fine;
   the rationale was wrong.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
