Hello everyone,

We have been having issues with the 'airflow db clean' command. Given the
number of DAGs and tasks running every day in our deployments, a large
amount of new data is written to the database each day. When we delete that
data, the way PostgreSQL works means the WALs get replicated to both the
archive storage and the main data storage of the DB instance, which causes
a significant jump in CPU, RAM and disk usage whenever we run the command,
and that leads to all kinds of issues. In one case, after the command had
not been run for a few months due to human error, the cleanup filled up the
database storage and left the database unresponsive, forcing us to move to
our backup database.
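For context, the cleanup we run is essentially the stock command, along
these lines (the cutoff timestamp and table list here are only
illustrative, not our actual settings):

    airflow db clean --clean-before-timestamp '2024-01-01T00:00:00' --tables task_instance,dag_run --yes

Even with a --dry-run first to see what would be removed, the delete itself
is still what generates the WAL volume described above.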

I know that this is the accepted and widely used way of managing the
Airflow database's size. However, as our case shows, it can cause problems
in certain situations: if the database has not been cleaned up for a while,
cleaning it becomes problematic in itself.

We decided to try partitioning the tables and using PostgreSQL's built-in
partition retention (dropping or detaching old partitions), which does not
issue a DELETE query and is lighter, faster and simpler to use. However, we
have run into issues because of the foreign key constraints on some tables:
the partition key must be part of the primary key, so foreign keys
referencing the table must include it as well, which means duplicating
those key columns and making code changes. On top of that, SQLAlchemy
breaks once we change the primary key, and the constraints defined on the
primary key break as well.
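To make that concrete, here is a rough sketch of the kind of DDL we
experimented with. The table and column names (task_run, task_note,
run_date) are simplified stand-ins rather than the actual Airflow schema;
the point is the primary key / foreign key constraint, not the exact
definitions:

    -- Parent table partitioned by range on a timestamp column.
    -- The primary key of a partitioned table must include the partition key.
    CREATE TABLE task_run (
        id        BIGINT NOT NULL,
        run_date  TIMESTAMPTZ NOT NULL,
        state     TEXT,
        PRIMARY KEY (id, run_date)          -- run_date forced into the PK
    ) PARTITION BY RANGE (run_date);

    -- One partition per month.
    CREATE TABLE task_run_2025_01 PARTITION OF task_run
        FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

    -- Any table referencing task_run now has to carry run_date too,
    -- because the foreign key must match the referenced primary key.
    CREATE TABLE task_note (
        task_run_id  BIGINT NOT NULL,
        run_date     TIMESTAMPTZ NOT NULL,
        note         TEXT,
        FOREIGN KEY (task_run_id, run_date)
            REFERENCES task_run (id, run_date)
    );

    -- "Retention" is then dropping (or detaching) the oldest partition
    -- instead of running a large DELETE.
    ALTER TABLE task_run DETACH PARTITION task_run_2025_01;
    DROP TABLE task_run_2025_01;

The drop replaces the DELETE and the WAL churn that comes with it, but as
described above it pushes the partition key into every primary key and
foreign key that touches the table, and the referencing tables have to be
dealt with as well before a partition can actually be dropped.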

In MySQL it is not possible at all to partition tables that have foreign
keys, as this is not supported yet (according to
<https://dev.mysql.com/doc/refman/8.4/en/partitioning-limitations-storage-engines.html>
).

Has anyone else tried to use the database's built-in partition retention
mechanism instead of the 'airflow db clean' command?

Thanks, Natanel.
