Copilot commented on code in PR #65239:
URL: https://github.com/apache/airflow/pull/65239#discussion_r3204634937
##########
airflow-core/src/airflow/utils/db_cleanup.py:
##########
@@ -613,9 +631,22 @@ def run_cleanup(
batch_size=batch_size,
)
session.commit()
+ if ctx.failed:
+ failed_tables.append(table_name)
else:
logger.warning("Table %s not found. Skipping.", table_name)
+ if failed_tables:
+ if error_on_cleanup_failure:
+ raise RuntimeError(
+ f"airflow db clean encountered errors on the following tables
and did not clean them: "
+ f"{failed_tables}. Check the logs above for details."
+ )
Review Comment:
This code raises `RuntimeError`, but the PR description indicates an
`AirflowException` should be raised, and `RuntimeError` will generally produce
a full traceback in the CLI. Consider raising `AirflowException` (or
`SystemExit(1)` with a user-friendly message) for a cleaner CLI failure mode
and consistency with the stated behavior.
##########
airflow-core/src/airflow/utils/db_cleanup.py:
##########
@@ -613,9 +631,22 @@ def run_cleanup(
batch_size=batch_size,
)
session.commit()
Review Comment:
`run_cleanup()` now calls `session.commit()` after `_cleanup_table()`, but
`_cleanup_table()` already commits internally. This adds redundant commits and
makes transaction boundaries harder to reason about; consider removing the
extra commit here (or moving commit responsibility to one place consistently).
##########
airflow-core/docs/howto/usage-cli.rst:
##########
@@ -221,6 +221,31 @@ By default, ``db clean`` will archive purged rows in
tables of the form ``_airfl
When you encounter an error without using ``--skip-archive``,
``_airflow_deleted__<table>__<timestamp>`` would still exist in the DB. You can
use ``db drop-archived`` command to manually drop these tables.
+Detecting cleanup failures
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+By default, ``db clean`` suppresses per-table errors (such as a database
``statement_timeout``
+being exceeded on a very large table) and exits with code 0 even if one or
more tables were not
+cleaned. A WARNING is always emitted in the logs listing which tables were
skipped due to errors.
+
+To make the command exit with a non-zero code whenever any table cleanup fails
— useful when
+``airflow db clean`` is invoked from a DAG task and you want the task to turn
red on failure —
+pass ``--error-on-cleanup-failure``:
Review Comment:
The docs state that a WARNING listing skipped tables is "always" emitted
when per-table cleanup errors occur, but the current implementation raises
immediately when `--error-on-cleanup-failure` is set (before the summary
WARNING). Either adjust the implementation to always log the summary, or
clarify the docs for the `--error-on-cleanup-failure` case.
##########
airflow-core/src/airflow/utils/db_cleanup.py:
##########
@@ -613,9 +631,22 @@ def run_cleanup(
batch_size=batch_size,
)
session.commit()
+ if ctx.failed:
+ failed_tables.append(table_name)
else:
logger.warning("Table %s not found. Skipping.", table_name)
+ if failed_tables:
+ if error_on_cleanup_failure:
+ raise RuntimeError(
+ f"airflow db clean encountered errors on the following tables
and did not clean them: "
+ f"{failed_tables}. Check the logs above for details."
+ )
+ logger.warning(
+ "The following tables were not cleaned due to errors: %s. Check
the logs above for details.",
+ failed_tables,
+ )
Review Comment:
`run_cleanup()` raises before emitting the failure-summary WARNING when
`error_on_cleanup_failure=True`, so operators won't see the table list in logs
despite the docstring/docs saying the summary is always emitted. Consider
logging the summary first (or logging in a `finally`) and then raising so the
message is always present.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]