Re: [PR] fix(scheduler): catch StaleDataError in verify_integrity to prevent scheduler crash [airflow]

via GitHub Wed, 01 Apr 2026 17:47:06 -0700


Copilot commented on code in PR #64503:
URL: https://github.com/apache/airflow/pull/64503#discussion_r3025330794



##########
airflow-core/src/airflow/models/dagrun.py:
##########
@@ -1879,9 +1880,10 @@ def _create_task_instances(
                     extra_tags={"task_type": task_type},
                 )
             session.flush()
-        except IntegrityError:
+        except (IntegrityError, StaleDataError) as exc:
             self.log.info(

Review Comment:
   `DualStatsManager.incr("task_instance_created", ...)` happens before 
`session.flush()`. If `flush()` raises and the transaction is rolled back, 
these counters will still be incremented even though the TIs were not 
persisted. Consider emitting these stats only after a successful flush (or 
otherwise ensuring they’re not counted on rollback), especially now that 
`StaleDataError` is handled here too.



##########
airflow-core/tests/unit/models/test_dagrun.py:
##########
@@ -1443,6 +1443,25 @@ def mynameis(arg):
         assert indices == [0, 1, 2, 3]
 
 
+def test_verify_integrity_handles_stale_data_error(dag_maker, session):
+    """Test that StaleDataError during _create_task_instances is caught and 
session is rolled back."""
+    from unittest.mock import patch
+
+    from sqlalchemy.orm.exc import StaleDataError
+

Review Comment:
   This test introduces imports inside the test function body (`patch`, 
`StaleDataError`). In this file `unittest.mock` is already imported as `mock`, 
so `mock.patch.object` can be used directly, and `StaleDataError` should be 
imported at module scope to keep imports consistent and avoid per-test import 
overhead.



##########
airflow-core/tests/unit/utils/test_retries.py:
##########
@@ -78,6 +79,29 @@ def test_function(session):
         assert mock_rollback.call_count == 3
         mock_rollback.assert_has_calls([mock.call(), mock.call(), mock.call()])
 
+    @pytest.mark.db_test
+    def test_retry_db_transaction_with_stale_data_error(self, caplog):
+        """Test that StaleDataError is retried just like DBAPIError"""
+        mock_obj = mock.MagicMock()
+        mock_session = mock.MagicMock()
+        mock_rollback = mock.MagicMock()
+        mock_session.rollback = mock_rollback

Review Comment:
   New mocks in this test are created without `spec`/`autospec` (e.g. 
`mock_session = mock.MagicMock()`), which can hide real API breakages by 
allowing arbitrary attribute access. Consider using 
`create_autospec`/`spec_set` for the session/rollback mocks so the test fails 
if the production code calls unexpected methods.



##########
airflow-core/tests/unit/utils/test_retries.py:
##########
@@ -78,6 +79,29 @@ def test_function(session):
         assert mock_rollback.call_count == 3
         mock_rollback.assert_has_calls([mock.call(), mock.call(), mock.call()])
 
+    @pytest.mark.db_test
+    def test_retry_db_transaction_with_stale_data_error(self, caplog):
+        """Test that StaleDataError is retried just like DBAPIError"""

Review Comment:
   In this test, `caplog` is configured/cleared but no log output is asserted 
(unlike `test_retry_db_transaction_with_default_retries`). Either add an 
assertion on the expected retry log lines, or drop the `caplog` parameter and 
the related setup to keep the test focused.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] fix(scheduler): catch StaleDataError in verify_integrity to prevent scheduler crash [airflow]

Reply via email to