potiuk commented on PR #36483: URL: https://github.com/apache/airflow/pull/36483#issuecomment-1872226453
> > If we can add a feature to automatically clean the data based on the retention parameter that we set in the airflow configuration.
>
> IMHO, the `db clean` command available in the Airflow CLI and the Python methods provided in the module `db_cleanup` are sufficient for the users because they can be easily used in a simple dag with a bash operator for the CLI or Python operator for the Python API. (that's what most of Airflow users do)

Agree here with @hussein-awala. Those are tools we give the Deployment Managers, but they should adjust the usage to their needs and run those tools as and when needed, with the parameters they decide are best for their deployment.

Generally speaking, I think we should stress (and this is partially my motivation) that Airflow (components) and Airflow maintainers will NOT solve the problems of DB retention, configuration and optimization.

Also commenting to @BasPH - I think the part where we explicitly say `Hey Deployment Manager - it's Your job to make sure to keep your database healthy and to analyse if you see some performance issues` is extremely important - and also FAIR. I think the point (and in part that is my motivation behind this change) is not only to help our users see what they should do, but also to make it `crystal clear` that they are responsible for taking care of all those things.

It's extremely complex and rather brittle to develop a generic way, applicable to all cases, to keep the database healthy and optimized. There is a good reason why databases need maintenance and someone to look after them - otherwise it would already be implemented in Postgres and MySQL so that they are completely `worry-free` and handle all usage cases. Airflow and the way it can be used - from a small single-node installation to 100s of nodes and 1000s of DAGs and tasks - does not narrow the DB maintenance problem down to a much smaller and simpler case - not enough for us to provide a "zero maintenance" solution out of the box.

Even @dirrao's request is going in that direction. But I think it's far too much to expect this kind of "zero maintenance" from non-managed, free, open-source software. But some of our users **expect** this will happen. And with that documentation chapter I want to make it crystal clear and set expectations.

Also, mentioning "managed database" in this context is important - those databases **might** be closer to "zero-maintenance". And then if you go to "managed Airflow" - that's even more "zero-maintenance". When you pay someone to manage Airflow, then absolutely - yes, you should expect that you do not have to worry about database maintenance, and whoever provides "managed Airflow" should take care of it - this is precisely what you pay for (among other things) - you pay for the maintenance that you do not have to do.

@BasPH - yep, I hear you. I will try to remove some duplications and restructure it a bit (I just bought Chat GPT 4.5 subscription **just** to see if it can help with that). But the points about "Deployment Manager responsibility" especially, and setting clear expectations about what they have to do, are a crucial part of this change and the main reason I am doing it in the first place.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org