One idea to solve this is to use a daemon that uses inotify to watch for file changes and then reprocesses just the changed files. The hard part is that, without any kind of dependency/build system for DAGs, it can be difficult to tell which DAGs depend on which files.
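A minimal sketch of the change-detection part of that idea. inotify itself requires a third-party binding, so this portable stand-in polls file mtimes instead; the function name `scan_for_changes` is illustrative, not an Airflow API:

```python
# Illustrative sketch only: detect which DAG files changed since the last
# scan by comparing modification times. A real inotify-based daemon would
# get these events pushed by the kernel instead of polling.
import glob
import os


def scan_for_changes(dag_folder, last_mtimes):
    """Return .py files under dag_folder whose mtime changed since the
    previous scan, updating last_mtimes (path -> mtime) in place."""
    changed = []
    for path in glob.glob(os.path.join(dag_folder, "*.py")):
        mtime = os.path.getmtime(path)
        if last_mtimes.get(path) != mtime:
            last_mtimes[path] = mtime
            changed.append(path)
    return changed
```

On the first scan everything is reported as changed; later scans only report files that were actually touched, which is what a reprocess-only-what-changed daemon would feed to the DAG file processor.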
On Mon, Apr 24, 2017 at 1:21 PM, Gerard Toonstra <gtoons...@gmail.com> wrote:
> Hey,
>
> I've seen some people complain about DAG file processing times. An issue
> was raised about this today:
>
> https://issues.apache.org/jira/browse/AIRFLOW-1139
>
> I attempted to provide a good explanation of what's going on. Feel free to
> validate and comment.
>
> I'm noticing that the file processor is a bit naive in the way it
> reprocesses DAGs. It doesn't look at the DAG interval, for example, so it
> looks like it reprocesses all files continuously in one big batch, even if
> we can determine that the next "schedule" for all of a file's dags is in
> the future.
>
> Wondering if a change in the DagFileProcessingManager could optimize
> things a bit here.
>
> In the part where it gets the simple_dags from a file it's currently
> processing:
>
>     for simple_dag in processor.result:
>         simple_dags.append(simple_dag)
>
> the file_path is in the context, and the simple_dags should be able to
> provide the next interval date for each dag in the file.
>
> The idea is to add files to a sorted deque by "next_schedule_datetime"
> (the minimum next interval date), so that when we build the list
> "files_paths_to_queue", we can remove files whose dags we know won't have
> a new dagrun for a while.
>
> One gotcha to resolve after that is dealing with files getting updated
> with new dags, changed dag definitions, renames, and different interval
> schedules.
>
> Worth a PR to glance over?
>
> Rgds,
>
> Gerard
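The "sorted deque by next_schedule_datetime" idea above could be sketched with a min-heap keyed by each file's earliest next interval date. Everything here is illustrative (class and method names included), not the actual DagFileProcessingManager internals:

```python
# Illustrative sketch: keep (next_schedule_datetime, file_path) pairs in a
# min-heap and only queue files whose earliest dag schedule is already due.
# Files whose dags won't run for a while stay parked in the heap.
import heapq
from datetime import datetime


class FileScheduleIndex:
    def __init__(self):
        self._heap = []  # entries: (next_schedule_datetime, file_path)

    def record(self, file_path, next_schedule_datetime):
        # Called after a file is processed: remember the minimum next
        # interval date across all dags defined in that file.
        heapq.heappush(self._heap, (next_schedule_datetime, file_path))

    def files_to_queue(self, now):
        # Pop every file whose earliest dag run is due at `now`; the rest
        # are skipped without being reparsed.
        due = []
        while self._heap and self._heap[0][0] <= now:
            _, path = heapq.heappop(self._heap)
            due.append(path)
        return due
```

The gotcha from the mail still applies: if a file changes on disk (new dags, renames, different intervals), its parked heap entry is stale and would need to be invalidated, e.g. by forcing a re-record whenever the file's mtime changes.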