Thanks Ash and Kevin for the feedback. I think there are some utility tasks that can be handled easily with a DAG, without adding more logic that complicates the scheduler code. These utilities also tend to run periodically, which maps naturally onto a DAG. For example, if we want to support retention for some Airflow metadata tables such as task_instance, dag_run, and log, it seems reasonable to me to create a DAG that periodically cleans up those tables.
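To make that concrete, below is a minimal sketch of what such a retention DAG could look like. The dag_id, the purge_old_rows callable, and the 30-day RETENTION_DAYS window are placeholders of mine, and the import paths assume Airflow 1.10; this is only an illustration of the idea, not a concrete proposal.

    # Hypothetical metadata-retention utility DAG (Airflow 1.10-style imports).
    # RETENTION_DAYS, the dag_id, and task_id are made-up names for illustration.
    import logging
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.models import DagRun, Log, TaskInstance
    from airflow.operators.python_operator import PythonOperator
    from airflow.utils.db import provide_session

    log = logging.getLogger(__name__)

    RETENTION_DAYS = 30  # assumed retention window


    @provide_session
    def purge_old_rows(session=None, **context):
        """Delete metadata rows older than the retention window."""
        cutoff = datetime.utcnow() - timedelta(days=RETENTION_DAYS)
        for model, column in [
            (TaskInstance, TaskInstance.execution_date),
            (DagRun, DagRun.execution_date),
            (Log, Log.dttm),
        ]:
            deleted = (
                session.query(model)
                .filter(column < cutoff)
                .delete(synchronize_session=False)
            )
            log.info("Deleted %d rows from %s", deleted, model.__tablename__)
        session.commit()


    with DAG(
        dag_id="airflow_metadata_retention",
        start_date=datetime(2019, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="purge_old_rows",
            python_callable=purge_old_rows,
            provide_context=True,
        )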
Would like to learn more about the concerns around introducing these utility DAGs.

On Sun, Mar 31, 2019 at 1:17 AM Kevin Yang <[email protected]> wrote:

> Agree on having core Airflow related stuff built into Airflow (like
> schedule delay instrumentation) and leaving the rest to the cluster
> maintainer to set up (like log retention). How people handle log retention
> might be quite different depending on the logging backend. E.g. we use
> ElasticSearch and we don't even manage the log retention ourselves. Same
> for stuff like metrics/alert submission.
>
> Just my $0.02
>
> Cheers,
> Kevin Y
>
> On Sun, Mar 31, 2019 at 12:48 AM Ash Berlin-Taylor <[email protected]> wrote:
>
> > Do these need to be DAGs if they are built in to Airflow, or could/should
> > they be just handled internally by the scheduler?
> >
> > -a
> >
> > On 31 March 2019 03:57:08 BST, Chao-Han Tsai <[email protected]> wrote:
> > > Hi all,
> > >
> > > I have been thinking about adding some DAGs that are for the purpose of
> > > Airflow cluster operation, DAG schedule delay instrumentation and log
> > > retention for instance. Currently we have example_dags; should we add
> > > another directory utility_dags in the repo? We can have a flag in
> > > airflow.cfg to let users decide whether to load the utility_dags (just
> > > like what we did for example_dags).
> > >
> > > What do you think?
> > >
> > > --
> > > Chao-Han Tsai

--
Chao-Han Tsai
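For reference, the flag mentioned in my original mail could mirror the existing load_examples option in airflow.cfg; load_utility_dags below is a hypothetical name I am using for illustration, not something that exists today.

    [core]
    # Existing option that controls whether the bundled example DAGs are loaded.
    load_examples = False

    # Hypothetical counterpart for the proposed utility DAGs (name made up here).
    load_utility_dags = False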
