> For example, say we want to support retention in some Airflow tables such
> as task_instance, dag_run and log, it seems reasonable to me to create a
> DAG to periodically clean up the tables

I guess you mean something like https://github.com/teamclairvoyant/airflow-maintenance-dags/tree/master/db-cleanup but just shipped with Airflow?

The reason I don't think this would be best as a DAG in Airflow is that we can do it better/cleaner if it is core to Airflow:

- We don't need to speculatively run a DAG that does nothing
- We don't need to "waste" an executor slot
- It could automatically be done before/after running another task in the DAG
- We don't create extra task instance rows/logs that we then have to clean up too.

That is my thinking on why this sort of built-in functionality should not be a DAG if it is shipped _with_ Airflow.

-ash

> On 31 Mar 2019, at 20:42, Chao-Han Tsai <milton0...@gmail.com> wrote:
>
> Thanks Ash and Kevin for the feedback.
>
> I think there are some utilities that can be solved easily with a DAG
> without introducing more logic to complicate the scheduler code. Also,
> these utilities may run periodically and can be abstracted out with a DAG.
> For example, say we want to support retention in some Airflow tables such
> as task_instance, dag_run and log, it seems reasonable to me to create a
> DAG to periodically clean up the tables.
>
> Would like to learn more about the concerns about introducing these utility
> DAGs.
>
> On Sun, Mar 31, 2019 at 1:17 AM Kevin Yang <yrql...@gmail.com> wrote:
>
>> Agree on having core Airflow-related stuff built into Airflow (like
>> schedule delay instrumentation) and leaving the rest to the cluster
>> maintainer to set up (like log retention). How people handle log retention
>> might be quite different depending on the logging backend. E.g. we use
>> ElasticSearch and we don't even manage the log retention ourselves. Same
>> for stuff like metrics/alert submitting.
>>
>> Just my $0.02
>>
>> Cheers,
>> Kevin Y
>>
>> On Sun, Mar 31, 2019 at 12:48 AM Ash Berlin-Taylor <a...@apache.org> wrote:
>>
>>> Do these need to be DAGs if they are built in to Airflow, or could/should
>>> they be just handled internally by the scheduler?
>>>
>>> -a
>>>
>>> On 31 March 2019 03:57:08 BST, Chao-Han Tsai <milton0...@gmail.com> wrote:
>>>> Hi all,
>>>>
>>>> I have been thinking about adding some DAGs that are for the purpose of
>>>> Airflow cluster operation, DAG schedule delay instrumentation and log
>>>> retention for instance. Currently we have example_dags, should we add
>>>> another directory utility_dags in the repo? We can have a flag in
>>>> airflow.cfg to let the user decide whether to load the utility_dags
>>>> (just like what we did for example_dags).
>>>>
>>>> What do you think?
>>>>
>>>> --
>>>> Chao-Han Tsai
>>>
>>
>
>
> --
>
> Chao-Han Tsai
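
For reference, the table-retention idea discussed in this thread could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses Python's standard sqlite3 module against an in-memory database, the function name purge_old_rows and the table-to-timestamp-column mapping are hypothetical, and it is not Airflow's own cleanup code nor the code in the linked db-cleanup DAG. The same function could be called from a DAG task or from inside the scheduler, which is the design choice being debated above.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# ASSUMPTION: illustrative mapping of table name -> timestamp column.
# Check the real schema of your Airflow version before relying on these names.
RETENTION_COLUMNS = {
    "task_instance": "execution_date",
    "dag_run": "execution_date",
    "log": "dttm",
}


def purge_old_rows(conn, table, max_age_days, now=None):
    """Delete rows older than max_age_days and return how many were removed."""
    if table not in RETENTION_COLUMNS:
        # Only known tables are allowed, since identifiers cannot be bound as
        # SQL parameters and are interpolated into the statement below.
        raise ValueError(f"no retention rule for table {table!r}")
    column = RETENTION_COLUMNS[table]
    now = now or datetime.now(timezone.utc)
    # Timestamps are assumed to be stored as UTC ISO-8601 strings with
    # second precision, so string comparison matches chronological order.
    cutoff = (now - timedelta(days=max_age_days)).isoformat(timespec="seconds")
    cursor = conn.execute(f"DELETE FROM {table} WHERE {column} < ?", (cutoff,))
    conn.commit()
    return cursor.rowcount


def demo():
    """Build a tiny hypothetical log table and apply a 30-day retention."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, dttm TEXT)")
    now = datetime(2019, 3, 31, tzinfo=timezone.utc)
    for age_days in (1, 10, 45, 90):
        stamp = (now - timedelta(days=age_days)).isoformat(timespec="seconds")
        conn.execute("INSERT INTO log (dttm) VALUES (?)", (stamp,))
    deleted = purge_old_rows(conn, "log", max_age_days=30, now=now)
    remaining = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
    conn.close()
    return deleted, remaining


if __name__ == "__main__":
    print(demo())
```

Run as a DAG, a call like this costs an executor slot and writes its own task instance and log rows, which is the overhead Ash lists; run inside the scheduler, it avoids both but adds logic to the scheduler, which is Chao-Han's concern.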