GitHub user Pedrinhonitz added a comment to the discussion: Need Help with
deleting DagFiles from FileSystem using Airflow CLI
Hello,
From what I understand, you want to delete a DAG's .py file 48 hours after its
execution, whether the run succeeded or failed. I'm not entirely sure why you
want to do this, or whether I've understood correctly.
I did some brief research and found a way to do this: list the DAGs from the
Airflow database, then pipe the result into a shell command that removes the
files. I don't know if this helps, as your use case isn't clear to me. If it
doesn't, please provide more details.
The Airflow version I used for testing was Airflow 3.2.1.
Basically, I copied a file called clean.sql into the container using the
Airflow CLI. The query contained within was the following:
```sql
-- Rank each DAG's runs so that rn = 1 is the most recent one.
WITH last_run AS (
    SELECT
        _dag_run.dag_id,
        _dag_run.state,
        _dag_run.start_date,
        ROW_NUMBER() OVER (
            PARTITION BY _dag_run.dag_id
            ORDER BY _dag_run.start_date DESC NULLS LAST
        ) AS rn
    FROM
        dag_run AS _dag_run
)
-- Keep DAGs whose latest run finished (success or failed) and started over 48 hours ago.
SELECT
    _dag.dag_id,
    _dag.fileloc,
    _last_run.state,
    _last_run.start_date
FROM
    dag AS _dag
INNER JOIN last_run AS _last_run ON
    _last_run.dag_id = _dag.dag_id
    AND _last_run.rn = 1
WHERE
    _last_run.state IN ('success', 'failed')
    AND _last_run.start_date < (NOW() - INTERVAL '48 hours')
    AND _dag.fileloc IS NOT NULL
ORDER BY
    _last_run.start_date ASC;
```
**Leave a blank line at the end of the file, as the command may become confused
otherwise.**
After that, the file is available as /opt/airflow/clean.sql inside my
container, and I ran the following command.
```shell
psql -At "postgresql://airflow:airflow@postgres:5432/airflow" \
  -f /opt/airflow/clean.sql \
  | awk -F'|' 'NF{print $2}' | sort -u | xargs -I{} echo rm -f "{}"
```
This command includes an `echo`, so it only prints the `rm` commands instead of
executing them. I used it to check the output before actually deleting any DAG
files. As you can see in the screenshot, it printed the remove command correctly.
<img width="566" height="57" alt="image"
src="https://github.com/user-attachments/assets/f1185d4f-6519-47f4-b6b3-e571e9b472b6"
/>
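To make the middle of the pipeline easier to follow, here is a small sketch of what the `awk` and `sort -u` stages do to the `psql -At` output. The sample rows and the `extract_filelocs` helper name are made up for illustration; only the column order (dag_id, fileloc, state, start_date) comes from the query above.

```shell
# Hypothetical sample of `psql -At` output: one row per line, columns separated by '|'.
sample_output='example_dag|/opt/airflow/dags/example_dag.py|success|2025-01-01 00:00:00+00
other_dag|/opt/airflow/dags/example_dag.py|failed|2025-01-02 00:00:00+00

third_dag|/opt/airflow/dags/third.py|success|2025-01-03 00:00:00+00'

# extract_filelocs is an illustrative helper, not part of Airflow or psql.
extract_filelocs() {
  # NF skips empty lines; $2 is the fileloc column; sort -u drops duplicate paths.
  awk -F'|' 'NF{print $2}' | sort -u
}

printf '%s\n' "$sample_output" | extract_filelocs
```

With this sample, the two DAGs that share `example_dag.py` collapse into a single path, so each file would be passed to `rm` only once.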
In other words, if you run the same command without the `echo`, it will
remove the DAG Python files returned by the query in the .sql file.
```bash
psql -At "postgresql://airflow:airflow@postgres:5432/airflow" \
  -f /opt/airflow/clean.sql \
  | awk -F'|' 'NF{print $2}' | sort -u | xargs -I{} rm -f "{}"
```
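If you want a bit more protection than a bare `rm -f`, one option is to replace the `xargs` stage with a small shell function. This is only a sketch under assumptions: `remove_dag_files`, `DAGS_DIR`, and `DRY_RUN` are names invented here, and `/opt/airflow/dags` is assumed to be your DAGs folder, so adjust it to your setup. It reads one path per line, skips anything that is not a `.py` file under `DAGS_DIR`, and only deletes when `DRY_RUN` is set to 0.

```shell
# Hypothetical settings for this sketch; neither variable is an Airflow setting.
DAGS_DIR="${DAGS_DIR:-/opt/airflow/dags}"   # assumed DAGs folder
DRY_RUN="${DRY_RUN:-1}"                     # 1 = only print, 0 = really delete

# remove_dag_files is an illustrative helper: it reads file paths from stdin.
remove_dag_files() {
  while IFS= read -r dag_path; do
    [ -n "$dag_path" ] || continue          # ignore empty lines
    case "$dag_path" in
      */../*)
        echo "skip (path traversal): $dag_path" >&2; continue ;;
      "$DAGS_DIR"/*.py)
        ;;                                  # allowed: a .py file under DAGS_DIR
      *)
        echo "skip (outside DAGS_DIR): $dag_path" >&2; continue ;;
    esac
    if [ "$DRY_RUN" = "1" ]; then
      echo "would remove: $dag_path"
    else
      rm -f -- "$dag_path"
    fi
  done
}

# Possible usage, replacing the xargs stage of the command above:
#   psql -At "postgresql://airflow:airflow@postgres:5432/airflow" \
#     -f /opt/airflow/clean.sql \
#     | awk -F'|' 'NF{print $2}' | sort -u | remove_dag_files
```

Reading line by line also keeps paths that contain spaces intact, and the `--` stops `rm` from treating an odd file name as an option.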
**In my research, I haven't found a way to do this directly through the Airflow
CLI, and I don't know if it's supported; we can wait for someone with more
experience with the CLI to help. But if my understanding of your problem is
correct, this should solve it, and it keeps the command quick to run and easy
to read.**
GitHub link:
https://github.com/apache/airflow/discussions/60954#discussioncomment-17065810