Purely theoretically, you could change the log and job tables to be
unlogged - and thus avoid WAL for them.

The drawbacks of this:

* the data in those tables will be lost if you pull the plug or kill -9 the
primary postgres server
* the tables are not available (at all) on replicas - so on failover you
would need a manual data export/import for those tables, rather than
relying on the replicas being "ready to take over immediately". Or accept
data loss.
* the data from those tables will not be present in backups
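
If you do go that route, the switch is a one-line DDL change per table. A
sketch only - and note that switching back with SET LOGGED rewrites the
whole table and WAL-logs the rewrite:

```sql
-- Sketch: skip WAL for the log and job tables (their data will not survive
-- a crash, and will not appear on replicas or in base backups).
ALTER TABLE log SET UNLOGGED;
ALTER TABLE job SET UNLOGGED;

-- Reverting rewrites the entire table and WAL-logs it:
-- ALTER TABLE log SET LOGGED;
-- ALTER TABLE job SET LOGGED;
```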

I am not 100% sure, but I believe loss of data in both tables is not really
catastrophic for Airflow, so it may be an acceptable risk (but you should
likely do a disaster-recovery test to see what happens and how to recover
if, indeed, someone pulls the plug on your postgres server).
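
As an aside, the batched-delete approach discussed further down the thread
can keep each transaction (and its WAL burst) bounded. A rough sketch -
the table and column names here are assumptions, not Airflow's actual
cleanup code - using a ctid subquery, since Postgres DELETE has no LIMIT
clause:

```python
# Sketch only: build a bounded DELETE for one batch of old `log` rows.
# Postgres DELETE has no LIMIT clause, so we select a limited set of ctids
# (physical row ids) in a subquery and delete exactly those rows.

def batched_delete_sql(table: str, ts_column: str, batch_size: int) -> str:
    """Return one batch's DELETE statement with a %s cutoff placeholder."""
    return (
        f"DELETE FROM {table} "
        f"WHERE ctid IN ("
        f"SELECT ctid FROM {table} WHERE {ts_column} < %s "
        f"LIMIT {batch_size})"
    )

print(batched_delete_sql("log", "dttm", 5000))

# A driver would loop with a real connection (e.g. psycopg2), committing
# after each batch and pausing so WAL archiving / replicas can keep up:
#
#   while True:
#       cur.execute(sql, (cutoff,))
#       conn.commit()
#       if cur.rowcount == 0:
#           break
#       time.sleep(1)
```

Small, frequently committed batches trade total runtime for a smoother WAL
rate, which is usually the better deal for replicas and archive storage.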

J,




On Mon, Dec 29, 2025 at 7:16 PM Natanel <[email protected]> wrote:

> Yes, the problem is the manual deletions - we have tried it, and it
> resulted in the same exact issue: the scheduled db procedures which clean
> up the rows marked as deleted do delete them, and so storage is still
> consumed, yet it does not solve the WAL problem. The problematic table is
> actually not task_instance - it is relatively small; the log and job
> tables are the biggest tables, by a factor of 10 or more (cluster
> dependent). The problem with them is the primary key change required.
>
> The smaller batches might help; however, that seems to just delay the
> problem a little rather than solve it, as deleting data with a DELETE
> query (especially a lot of data) is not a very light operation, and I
> think that this is the main issue.
>
> It would be nice if we could use partitions instead, as dropping a
> partition is a lighter operation and does not require us to maintain a
> query and manage our db. I have thought about changing the models; most of
> the changes are relatively simple. For some it is just removing the
> foreign key and relying on ORM-level constraints; for others it requires
> adding a pre-query to enforce the same constraints, but I do not like that
> idea. Maybe there is another way to make airflow "partition-friendly"?
>
> I can't think of a nice way to do so - maybe it does not exist, as the db
> clean is as simple as a DELETE query gets, yet when there is a lot of
> data, it is all duplicated in the WAL.
>
> On Mon, Dec 29, 2025, 19:40 Daniel Standish via dev <
> [email protected]>
> wrote:
>
> > Have you looked at doing manual deletions?  I.e. writing your own sql?
> >
> > The db clean command is probably not "optimal" for all scenarios.
> >
> > So for example, if the main problem table for you is task_instance, you
> > could periodically delete TI records in smaller batches using some
> > appropriate index (whether it exists now or you add it).  Then maybe you
> > would not stress the db as hard.
> >
> > Airflow isn't designed to use partitions, so you may not get good
> > results with that approach.
> >
> >
> >
> > On Mon, Dec 29, 2025 at 7:32 AM Natanel <[email protected]> wrote:
> >
> > > Hello everyone. We have been having issues with the 'airflow db clean'
> > > command. Due to the amount of dags and tasks that are running every
> > > day in our deployments, we get a lot of new data every day, which is
> > > stored in the database. When we delete the data, due to the way PGSQL
> > > works, the WALs get replicated to both the archive storage and the
> > > main data storage of the db instance, which in turn causes a
> > > significant jump in cpu usage, ram usage and disk usage whenever we
> > > run the command. This causes all kinds of issues - we even had it fill
> > > up the db storage once, making the database unresponsive and forcing
> > > us to move to our backup database, after we hadn't run the command for
> > > a few months due to human error.
> > >
> > > As of now, I know that this is the accepted and widely used way of
> > > managing the airflow database's size; however, we noticed that it may
> > > cause issues in certain cases - as in ours, where if the db has not
> > > been cleaned up for a while, cleaning it can be problematic.
> > >
> > > We decided to try partitioning the tables and using pgsql's built-in
> > > retention of partitions, which does not issue a DELETE query and is
> > > lighter, faster and simpler to use. However, we have encountered
> > > issues due to the Foreign Key constraints in some tables: we would
> > > have to duplicate such keys and force code changes (as the foreign key
> > > must include the partition key, since the partition key must be part
> > > of the primary key), and sqlalchemy breaks once we change the primary
> > > key, with the constraints on the primary key breaking as well.
> > >
> > > And in MySQL, due to the foreign keys, it is not possible to partition
> > > tables which include them, as it is not supported yet (according to
> > > https://dev.mysql.com/doc/refman/8.4/en/partitioning-limitations-storage-engines.html
> > > ).
> > >
> > > Has anyone else tried to use the database's built-in partition
> > > retention system instead of the 'airflow db clean' command?
> > >
> > > Thanks, Natanel.
> > >
> >
>
