dstandish commented on code in PR #29058: URL: https://github.com/apache/airflow/pull/29058#discussion_r1087278596
##########
airflow/utils/db_cleanup.py:
##########

@@ -159,6 +170,14 @@ def _do_delete(*, query, orm_model, skip_archive, session):
     logger.debug("delete statement:\n%s", delete.compile())
     session.execute(delete)
     session.commit()
+    if export_to_csv:
+        if not output_path.startswith(AIRFLOW_HOME):
+            output_path = os.path.join(AIRFLOW_HOME, output_path)
+        os.makedirs(output_path, exist_ok=True)
+        _to_csv(
+            target_table=target_table, file_path=f"{output_path}/{target_table_name}.csv", session=session
+        )
+        skip_archive = True

Review Comment:
   I just think it doesn't really hurt to leave the data backed up in a table. A user might not understand that by asking for the data to be exported, it will also be deleted irretrievably. What if something goes wrong with the export? What if the user is running this with `kubectl exec -ti` and the pod dies before they save the data?

   I would let `skip_archive` define the skip-archive behavior _always_, and not have the export behavior determine it. The user can always drop the tables later, but they can't bring them back. We also can't really be sure that our CSV export would actually be importable back into the database without loss.

   BUT that's of course just one man's take
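To make the suggestion concrete, here is a hypothetical sketch of the decoupling the reviewer is asking for: exporting to CSV no longer flips `skip_archive`, so only the user's flag decides whether the archive table is kept. The helper names (`resolve_output_path`, `plan_cleanup`) and the `AIRFLOW_HOME` default are illustrative assumptions, not Airflow's actual implementation.

```python
import os

# Assumed default for illustration; the real value comes from Airflow config.
AIRFLOW_HOME = "/opt/airflow"


def resolve_output_path(output_path: str, airflow_home: str = AIRFLOW_HOME) -> str:
    """Anchor relative export paths under AIRFLOW_HOME, mirroring the diff."""
    if not output_path.startswith(airflow_home):
        output_path = os.path.join(airflow_home, output_path)
    return output_path


def plan_cleanup(export_to_csv: bool, skip_archive: bool) -> dict:
    """Decide what a cleanup run should do (hypothetical helper).

    Per the review: exporting to CSV must NOT imply skip_archive=True.
    The archive table is dropped only when the user explicitly opted out
    of archiving, regardless of whether an export was requested.
    """
    return {
        "export": export_to_csv,
        # Archive is kept unless the user explicitly set skip_archive.
        "keep_archive_table": not skip_archive,
    }
```

With this shape, a user who requests an export but leaves `skip_archive` at its default still ends up with both the CSV file and the backup table, so a failed export is recoverable.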