hkc-8010 commented on code in PR #65239:
URL: https://github.com/apache/airflow/pull/65239#discussion_r3206828382


##########
airflow-core/docs/howto/usage-cli.rst:
##########
@@ -221,6 +221,31 @@ By default, ``db clean`` will archive purged rows in 
tables of the form ``_airfl
 
 When you encounter an error without using ``--skip-archive``,  
``_airflow_deleted__<table>__<timestamp>`` would still exist in the DB. You can 
use  ``db drop-archived`` command to manually drop these tables.
 
+Detecting cleanup failures
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+By default, ``db clean`` suppresses per-table errors (such as a database 
``statement_timeout``
+being exceeded on a very large table) and exits with code 0 even if one or 
more tables were not
+cleaned.  A WARNING is always emitted in the logs listing which tables were 
skipped due to errors.
+
+To make the command exit with a non-zero code whenever any table cleanup fails 
— useful when
+``airflow db clean`` is invoked from a DAG task and you want the task to turn 
red on failure —
+pass ``--error-on-cleanup-failure``:

Review Comment:
   Updated the docs to match the current behavior. The default path still logs 
the summary warning, while `--error-on-cleanup-failure` surfaces the failed 
tables in the raised `RuntimeError` instead of printing a duplicate summary.



##########
airflow-core/src/airflow/utils/db_cleanup.py:
##########
@@ -613,9 +631,22 @@ def run_cleanup(
                     batch_size=batch_size,
                 )
                 session.commit()
+            if ctx.failed:
+                failed_tables.append(table_name)
         else:
             logger.warning("Table %s not found.  Skipping.", table_name)
 
+    if failed_tables:
+        if error_on_cleanup_failure:
+            raise RuntimeError(
+                f"airflow db clean encountered errors on the following tables 
and did not clean them: "
+                f"{failed_tables}. Check the logs above for details."
+            )
+        logger.warning(
+            "The following tables were not cleaned due to errors: %s. Check 
the logs above for details.",
+            failed_tables,
+        )

Review Comment:
   Kept the non-duplicate output behavior here and updated the docs instead. 
With `--error-on-cleanup-failure`, the raised `RuntimeError` includes the 
failed tables, and the per-table warning is still logged above, so the failure 
details are still surfaced without printing a second summary line.



##########
airflow-core/src/airflow/utils/db_cleanup.py:
##########
@@ -613,9 +631,22 @@ def run_cleanup(
                     batch_size=batch_size,
                 )
                 session.commit()

Review Comment:
   Removed the extra `session.commit()` from `run_cleanup()`. 
`_cleanup_table()` already owns the transaction boundaries, and I added a 
regression test to make sure `run_cleanup()` does not commit again afterward.



##########
airflow-core/src/airflow/utils/db_cleanup.py:
##########
@@ -613,9 +631,22 @@ def run_cleanup(
                     batch_size=batch_size,
                 )
                 session.commit()
+            if ctx.failed:
+                failed_tables.append(table_name)
         else:
             logger.warning("Table %s not found.  Skipping.", table_name)
 
+    if failed_tables:
+        if error_on_cleanup_failure:
+            raise RuntimeError(
+                f"airflow db clean encountered errors on the following tables 
and did not clean them: "
+                f"{failed_tables}. Check the logs above for details."
+            )

Review Comment:
   Keeping `RuntimeError` here intentionally. This was discussed earlier in the 
review and `jscheffl` asked that we avoid `AirflowException` in this path, so I 
left the exception type as-is.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to