GitHub user potiuk added a comment to the discussion: Massive metadata table 
even after clean with CLI

That's one option - but then it would be a separate table.  

How exactly are you checking the size of the public.job table (i.e. how do you **know** 
it takes 20 GB, and how did you compare it to before)? It might simply be that 
you have a lot of entries in the job table which are older than the date you 
specified - because of some bug, or maybe some failed attempts of yours to 
generate a lot of jobs. I'd also advise you to take a look at the content of 
that table - it's likely that for some reason you generated A LOT of jobs 
and all of them are "after" the cut-off date you specified.
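
For example, assuming a PostgreSQL metadata DB, something like this would show both things at once. The column name and the cut-off date below are just my guesses / placeholders (`latest_heartbeat` is what I'd expect on the job table, and the date stands in for whatever you passed to `db clean`) - adjust to what you actually have:

```sql
-- Physical size of the job table (includes indexes and TOAST data):
SELECT pg_size_pretty(pg_total_relation_size('public.job'));

-- How many rows sit on each side of the cut-off date:
SELECT
    count(*) FILTER (WHERE latest_heartbeat <  '2025-03-01') AS older_than_cutoff,
    count(*) FILTER (WHERE latest_heartbeat >= '2025-03-01') AS newer_than_cutoff
FROM public.job;
```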

You might have a state of the DB where the typical "tools" provided by the 
Airflow CLI are not enough, and you might have to employ your deployment 
manager's investigative efforts to find out what's in the DB and why it is so 
big, because you likely got into some unusual state.
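
To see what is actually taking the space (again assuming PostgreSQL), a query like this lists the biggest tables in the metadata DB:

```sql
-- Largest tables (including their indexes) in the metadata DB, biggest first:
SELECT relname AS table_name,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 20;
```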

Also, it might be that you generate so many jobs that if you want to keep all of 
them since the 1st of March, they simply WILL take that much space. Not very 
likely for a JOB table, but well - we do not know your case. In that case you 
should either keep a much shorter history or rethink why so many jobs are 
created.

Simply counting how many entries are in the table, looking at the data, and 
aggregating it - by day, type or something similar - could show you where to 
start investigating where the numbers are coming from: whether it's normal and 
expected, or whether you had some unusual event that generated them.
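
A rough sketch of such an aggregation (again PostgreSQL; `start_date` and `job_type` are my guesses at the useful columns - pick whatever your schema has):

```sql
-- Rows per day and job type - a spike on one day or for one type
-- usually shows where the volume comes from:
SELECT date_trunc('day', start_date) AS day,
       job_type,
       count(*) AS num_jobs
FROM public.job
GROUP BY 1, 2
ORDER BY num_jobs DESC
LIMIT 50;
```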

It's all on you, though, to run some DB queries and check it.

GitHub link: 
https://github.com/apache/airflow/discussions/52889#discussioncomment-13667890
