AmosG opened a new pull request, #59167: URL: https://github.com/apache/airflow/pull/59167
Fix DAG processor crash on MySQL connection failure during import error recording

Fixes #59166

The DAG processor was crashing when MySQL connection failures occurred while recording DAG import errors to the database. The root cause was missing `session.rollback()` calls after caught exceptions, leaving the SQLAlchemy session in an invalid state. When `session.flush()` was subsequently called, it would raise a new exception that wasn't caught, causing the DAG processor to crash and enter restart loops.

This issue was observed in production environments where the DAG processor would restart 1,259 times in 4 days (~13 restarts/hour), leading to:

- Connection pool exhaustion
- Cascading failures across Airflow components
- Import errors not being recorded in the UI
- System instability

## Changes

- Add `session.rollback()` after caught exceptions in `_update_import_errors()`
- Add `session.rollback()` after caught exceptions in `_update_dag_warnings()`
- Wrap `session.flush()` in try-except with `session.rollback()` on failure
- Add comprehensive unit tests for all failure scenarios
- Update comments to clarify error handling behavior

## Testing

Added 5 new unit tests in `TestDagProcessorCrashFix` class:

- `test_update_dag_parsing_results_handles_db_failure_gracefully`
- `test_update_dag_parsing_results_handles_dag_warnings_db_failure_gracefully`
- `test_update_dag_parsing_results_handles_session_flush_failure_gracefully`
- `test_session_rollback_called_on_import_errors_failure`
- `test_session_rollback_called_on_dag_warnings_failure`

All tests pass and verify that:

1. Database failures don't crash the DAG processor
2. `session.rollback()` is called correctly on failures
3. The processor continues gracefully after errors

## Impact

The fix ensures the DAG processor gracefully handles database connection failures and continues processing other DAGs instead of crashing, preventing production outages from restart loops.
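
## Illustrative sketch

The rollback-after-caught-exception pattern described above can be sketched roughly as follows. This is not the code from the PR: `record_import_errors`, `flush_safely`, and `update_parsing_results` are hypothetical stand-ins for the real Airflow functions, and the session is a `unittest.mock.MagicMock` rather than a real SQLAlchemy session, so only the control flow is shown.

```python
import logging
from unittest import mock

log = logging.getLogger(__name__)


def record_import_errors(session, import_errors):
    """Hypothetical stand-in for the import-error recording step."""
    try:
        for filename, stacktrace in import_errors.items():
            session.add({"filename": filename, "stacktrace": stacktrace})
    except Exception:
        # Without this rollback the session stays in an invalid state and
        # the later flush() raises a second, uncaught exception.
        log.exception("Failed to record import errors; rolling back")
        session.rollback()


def flush_safely(session):
    """Flush the session; on failure roll back and report False."""
    try:
        session.flush()
        return True
    except Exception:
        log.exception("session.flush() failed; rolling back")
        session.rollback()
        return False


def update_parsing_results(session, import_errors):
    """Hypothetical top-level step: never lets a DB error propagate."""
    record_import_errors(session, import_errors)
    return flush_safely(session)


def make_failing_session(method_name):
    """Build a mock session whose named method simulates a lost connection."""
    session = mock.MagicMock()
    getattr(session, method_name).side_effect = RuntimeError(
        "simulated: MySQL server has gone away"
    )
    return session
```

For example, `update_parsing_results(make_failing_session("flush"), {"dag.py": "trace"})` returns `False` instead of raising, and the mock records a single `rollback()` call. Asserting on that recorded call is one plausible way tests such as `test_session_rollback_called_on_import_errors_failure` could check the behavior, though the PR's actual test code is not shown here.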
