dstandish commented on code in PR #29058: URL: https://github.com/apache/airflow/pull/29058#discussion_r1087278596
##########
airflow/utils/db_cleanup.py:
##########

@@ -159,6 +170,14 @@ def _do_delete(*, query, orm_model, skip_archive, session):
     logger.debug("delete statement:\n%s", delete.compile())
     session.execute(delete)
     session.commit()
+    if export_to_csv:
+        if not output_path.startswith(AIRFLOW_HOME):
+            output_path = os.path.join(AIRFLOW_HOME, output_path)
+        os.makedirs(output_path, exist_ok=True)
+        _to_csv(
+            target_table=target_table, file_path=f"{output_path}/{target_table_name}.csv", session=session
+        )
+        skip_archive = True

Review Comment:
   I just think it doesn't really hurt to leave the data backed up in a table. A user might not understand that by asking for the data to be exported, it will also be deleted irretrievably. What if something goes wrong with the export? What if the user is running this with `kubectl exec -ti` and the pod dies before they save the data?

   I would let `skip_archive` define the skip-archive behavior _always_, and not have the export behavior determine it. The user can always drop the tables later, but they can't bring them back. We also can't really be sure that our CSV export would actually be importable back into the database without loss.

   BUT that's of course just one man's take
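To make the suggestion concrete, here is a hypothetical sketch of the decoupling the reviewer is asking for: exporting to CSV no longer flips `skip_archive`, so only the user's flag decides whether the archive table is kept. The helper names (`resolve_output_path`, `plan_cleanup`) and the `AIRFLOW_HOME` default are illustrative assumptions, not Airflow's actual implementation.

```python
import os

# Assumed default for illustration; the real value comes from Airflow config.
AIRFLOW_HOME = "/opt/airflow"


def resolve_output_path(output_path: str, airflow_home: str = AIRFLOW_HOME) -> str:
    """Anchor relative export paths under AIRFLOW_HOME, mirroring the diff."""
    if not output_path.startswith(airflow_home):
        output_path = os.path.join(airflow_home, output_path)
    return output_path


def plan_cleanup(export_to_csv: bool, skip_archive: bool) -> dict:
    """Decide what a cleanup run should do (hypothetical helper).

    Per the review: exporting to CSV must NOT imply skip_archive=True.
    The archive table is dropped only when the user explicitly opted out
    of archiving, regardless of whether an export was requested.
    """
    return {
        "export": export_to_csv,
        # Archive is kept unless the user explicitly set skip_archive.
        "keep_archive_table": not skip_archive,
    }
```

With this shape, a user who requests an export but leaves `skip_archive` at its default still ends up with both the CSV file and the backup table, so a failed export is recoverable.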