Hey Airflow Devs: In our organization, we have built a Machine Learning Workbench that uses Airflow as the orchestrator for ML workflows, and we have wrapped the Airflow Python operators to customize their behaviour. These workflows are specified in YAML.
We drop a DAG loader (written in Python) in the default location where Airflow expects DAG files. This loader reads the specified YAML files and converts them into Airflow DAG objects; essentially, we are creating the DAG objects programmatically. To support multiple parsers (YAML, JSON, etc.), we separated DAG creation from loading. But when a DAG is created in a separate module and handed to the DAG loader, Airflow does not pick it up. As an example, suppose I create a DAG, pickle it, and later simply unpickle it and hand it to Airflow. In the current incarnation of Airflow, the creation of the DAG has to happen in the loader file itself. As far as I am concerned, Airflow should not care where or how the DAG object is created, as long as it is a valid DAG object.

The workaround for us is to mix the parser and the loader in the same file and drop that file in the default Airflow dags folder. During dag_bag creation, this file is loaded via the import_modules utility and the DAGs show up in the UI. While this works, it is not clean.

What do the devs think about a solution to this problem? Would saving the DAG to the db and reading it back from the db work? Or do core changes need to happen in dag_bag creation? Could the dag_bag accept a bunch of already-created DAGs?

thanks,
-soma
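PS: for anyone curious, here is a minimal sketch of the workaround (parser and loader mixed in one file under the dags folder). The YAML schema, file paths, and operator choice are made up for illustration; our real operator wrappers are more involved.

# dags/yaml_dag_loader.py -- hypothetical sketch, not our production code
import os
from datetime import datetime

import yaml
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# Directory holding workflow specs, each roughly of the form:
#   dag_id: daily_training
#   schedule: "@daily"
#   tasks:
#     - id: train
#       command: "python train.py"
SPEC_DIR = os.path.join(os.path.dirname(__file__), "workflow_specs")


def build_dag_from_spec(spec):
    """Convert one parsed YAML spec into an Airflow DAG object."""
    dag = DAG(
        dag_id=spec["dag_id"],
        schedule_interval=spec.get("schedule"),
        start_date=datetime(2017, 1, 1),
    )
    for task in spec["tasks"]:
        BashOperator(task_id=task["id"], bash_command=task["command"], dag=dag)
    return dag


# The dag_bag only keeps DAG objects it finds in the global namespace of files
# under the dags folder, which is why creation currently has to happen in (or be
# re-exposed by) the loader: each generated DAG is registered in globals() here.
for fname in os.listdir(SPEC_DIR):
    if fname.endswith((".yaml", ".yml")):
        with open(os.path.join(SPEC_DIR, fname)) as f:
            dag = build_dag_from_spec(yaml.safe_load(f))
        globals()[dag.dag_id] = dag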