potiuk commented on PR #36483:
URL: https://github.com/apache/airflow/pull/36483#issuecomment-1872226453

   > > If we can add a feature to automatically clean the data based on the 
retention parameter that we set in the airflow configuration.
   > 
   > IMHO, the `db clean` command available in the Airflow CLI and the Python 
methods provided in the module `db_cleanup` are sufficient for the users 
because they can be easily used in a simple dag with a bash operator for the 
CLI or Python operator for the Python API. (that's what most of Airflow users 
do)
   
   Agree here with @hussein-awala. Those are tools we give the Deployment 
Managers, but they should adjust the usage to their needs and run those tools 
as and when needed, with the parameters they decide are best for their 
deployment.
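   
   To illustrate the "simple dag with a bash operator" approach mentioned 
above, here is a minimal sketch of how a Deployment Manager might turn their 
own retention period into an `airflow db clean` call. The helper name 
`build_db_clean_command` and its parameters are hypothetical (they are not 
part of Airflow); only the CLI flags are real. The retention value and table 
list are assumptions that each deployment would choose for itself.

```python
import shlex
from datetime import datetime, timedelta, timezone


def build_db_clean_command(retention_days, now=None, tables=None, dry_run=False):
    """Build an `airflow db clean` command line (hypothetical helper).

    Rows older than `retention_days` (counted back from `now`) are targeted.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    args = [
        "airflow", "db", "clean",
        "--clean-before-timestamp", cutoff.isoformat(),
        "--yes",  # skip the interactive confirmation prompt
    ]
    if tables:
        # Restrict the cleanup to selected tables instead of all supported ones.
        args += ["--tables", ",".join(tables)]
    if dry_run:
        # Only report what would be deleted, without deleting anything.
        args.append("--dry-run")
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    # Example: keep 30 days of DAG run and task instance history.
    print(build_db_clean_command(30, tables=["dag_run", "task_instance"], dry_run=True))
```

   The resulting string could then be used as the `bash_command` of a 
`BashOperator` in a DAG scheduled as often as the deployment requires. 
Starting with `--dry-run` is probably a good idea before deleting anything.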
   
   Generally speaking, I think we should stress (and this is partially my 
motivation) that Airflow (components) and Airflow maintainers will NOT solve 
the problems of DB retention, configuration and optimization. Also commenting 
to @BasPH - I think the part where we explicitly say `Hey Deployment Manager - 
it's Your job to make sure to keep your database healthy and to analyse if you 
see some performance issues` is extremely important - and also FAIR.
   
   My motivation behind this change (at least in part) is not only to help 
our users see what they should do, but also to make it `crystal clear` that 
they are responsible for taking care of all those things.
   
   It's extremely complex and rather brittle to develop a generic way of 
keeping the database healthy and optimized that is applicable to all cases. 
There is a good reason why databases need maintenance and someone to look 
after them - otherwise it would already be implemented in Postgres and MySQL 
so that they are completely `worry-free` and handle all use cases. Airflow and 
the way it can be used - from a small single-node installation to 100s of 
nodes and 1000s of DAGs and tasks - does not narrow the DB maintenance problem 
down to a much smaller and simpler case - not enough for us to provide a 
"zero maintenance" solution out of the box. Even @dirrao's request is going in 
that direction. But I think it's far too much to expect this kind of "zero 
maintenance" from non-managed, free, open-source software.
   
   But some of our users **expect** this will happen. And with that 
documentation chapter I want to make it crystal clear and set expectations.
   
   Also mentioning "managed database" in this context is important - those 
databases **might** be closer to "zero-maintenance". And then if you go to 
"managed Airflow" - that's even more "zero-maintenance". When you pay someone 
to manage Airflow, then absolutely - yes, you should expect that you do not 
have to worry about database maintenance and that whoever provides "managed 
Airflow" takes care of it - this is precisely what you pay for (among other 
things): the maintenance that you do not have to do.
   
   @BasPH - yep I hear you. I will try to remove some duplications and 
restructure it a bit (I just bought Chat GPT 4.5 subscription **just** to see 
if it can help with that). But the points about "Deployment Manager 
responsibility" especially, and setting clear expectations about what they 
have to do, are a crucial part of this change and the main reason I am doing 
it in the first place.
   

