In my opinion, this way of searching for DAGs is not ideal. We should be explicitly specifying the DAGs to load somewhere.
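For reference, the file pre-filter being criticised here (the check quoted further down from airflow/models.py) comes down to roughly the following. This is a minimal stand-alone sketch in plain Python, not Airflow's own code. `passes_dag_prefilter`, `LOADER_SOURCE` and the file names are hypothetical, chosen only to show why a loader that builds its DAGs in another module gets skipped.

```python
import tempfile
from pathlib import Path


def passes_dag_prefilter(path) -> bool:
    """Hypothetical helper mirroring the check quoted from airflow/models.py
    (Airflow 1.9/1.10): a .py file is only imported if both b'DAG' and
    b'airflow' occur somewhere in its raw bytes. In this sketch the match is
    a plain, case-sensitive substring test; nothing is parsed or executed."""
    content = Path(path).read_bytes()
    return all(s in content for s in (b"DAG", b"airflow"))


# Hypothetical loader in the style of the one in this thread: the DAG objects
# come from another module, so neither marker string appears in the file.
LOADER_SOURCE = (
    "from my_module import create_dag\n"
    "\n"
    "for file in yaml_files:\n"
    "    dag = create_dag(file)\n"
    "    globals()[dag.dag_id] = dag\n"
)

with tempfile.TemporaryDirectory() as tmp:
    plain = Path(tmp, "loader.py")
    plain.write_text(LOADER_SOURCE)
    # The workaround from the thread: mention both strings, e.g. in a comment.
    hinted = Path(tmp, "loader_hinted.py")
    hinted.write_text("# airflow DAG loader\n" + LOADER_SOURCE)
    plain_ok = passes_dag_prefilter(plain)
    hinted_ok = passes_dag_prefilter(hinted)

print(plain_ok, hinted_ok)  # False True
```

The cheap byte scan is what buys the time savings Kevin describes: files without the markers are never imported. The flip side is that a correct loader is silently ignored unless it happens to contain both strings, which is the argument for specifying the DAG files explicitly.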
> On 25 Nov 2018, at 10:41 am, Kevin Yang <yrql...@gmail.com> wrote:
>
> I believe that is mostly because we want to skip parsing/loading .py files that don't contain DAG definitions to save time, as the scheduler is going to parse/load the .py files over and over again, and some files can take quite long to load.
>
> Cheers,
> Kevin Y
>
> On Fri, Nov 23, 2018 at 12:44 AM soma dhavala <soma.dhav...@gmail.com> wrote:
>
>> Happy to report that the “fix” worked. Thanks, Alex.
>>
>> BTW, I'm wondering why the check was there in the first place. How does it help: does it save time, allow early termination, or something else?
>>
>>> On Nov 23, 2018, at 8:18 AM, Alex Guziel <alex.guz...@airbnb.com> wrote:
>>>
>>> Yup.
>>>
>>> On Thu, Nov 22, 2018 at 3:16 PM soma dhavala <soma.dhav...@gmail.com> wrote:
>>>
>>>> On Nov 23, 2018, at 3:28 AM, Alex Guziel <alex.guz...@airbnb.com> wrote:
>>>>
>>>> It’s because of this:
>>>>
>>>> “When searching for DAGs, Airflow will only consider files where the string “airflow” and “DAG” both appear in the contents of the .py file.”
>>>
>>> I had not noticed it. In airflow/models.py, process_file (in both 1.9 and 1.10) has:
>>> ..
>>> if not all([s in content for s in (b'DAG', b'airflow')]):
>>> ..
>>> which looks for those strings and, if they are not found, returns without loading the DAGs.
>>>
>>> So placing dummy “airflow” and “DAG” strings somewhere in the file will make it work?
>>>
>>>> On Thu, Nov 22, 2018 at 2:27 AM soma dhavala <soma.dhav...@gmail.com> wrote:
>>>>
>>>>> On Nov 22, 2018, at 3:37 PM, Alex Guziel <alex.guz...@airbnb.com> wrote:
>>>>>
>>>>> I think this is what is going on. The DAGs are picked up by local variables, i.e. if you do
>>>>> dag = DAG(...)
>>>>> dag = DAG(...)
>>>>
>>>> from my_module import create_dag
>>>>
>>>> for file in yaml_files:
>>>>     dag = create_dag(file)
>>>>     globals()[dag.dag_id] = dag
>>>>
>>>> Notice that create_dag is in a different module. If it is in the same scope (file), it will be fine.
>>>>
>>>>> Only the second DAG will be picked up.
>>>>>
>>>>> On Thu, Nov 22, 2018 at 2:04 AM Soma S Dhavala <soma.dhav...@gmail.com> wrote:
>>>>> Hey Airflow devs:
>>>>> In our organization, we have built a Machine Learning WorkBench with Airflow as the orchestrator of the ML workflows, and have wrapped Airflow Python operators to customize their behaviour. These workflows are specified in YAML.
>>>>>
>>>>> We drop a DAG loader (written in Python) in the default location where Airflow expects DAG files. This DAG loader reads the specified YAML files and converts them into Airflow DAG objects; essentially, we are creating the DAG objects programmatically. To support multiple parsers (YAML, JSON, etc.), we separated DAG creation from loading. But when a DAG is created (in a separate module) and made available to the DAG loader, Airflow does not pick it up. As an example, consider that I created a DAG and pickled it, and the loader simply unpickles the DAG and gives it to Airflow.
>>>>>
>>>>> However, in the current avatar of Airflow, the creation of the DAG itself has to happen in the loader. As far as I am concerned, Airflow should not care where and how the DAG object is created, so long as it is a valid DAG object. Our workaround is to mix the parser and loader in the same file and drop it in the Airflow default dags folder. During dag_bag creation, this file is loaded with the import_modules utility and shows up in the UI. While this is a solution, it is not clean.
>>>>>
>>>>> What do the devs think about a solution to this problem? Will saving the DAG to the db and reading it from the db work? Or do some core changes need to happen in dag_bag creation? Can dag_bag take a bunch of "created" DAGs?
>>>>>
>>>>> Thanks,
>>>>> -soma
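The second mechanism discussed in the thread, how DAG objects are found once a file has been imported, can be illustrated with a small simulation. It assumes only what Alex describes: after importing the file, the DagBag keeps the DAG instances bound to module-level names. `FakeDAG`, `create_dag`, `collect_dags` and the snippet strings are hypothetical stand-ins, so the sketch runs without Airflow installed; it is not Airflow's implementation.

```python
import types
from typing import List


class FakeDAG:
    """Hypothetical stand-in for Airflow's DAG class: it only carries a dag_id."""

    def __init__(self, dag_id: str) -> None:
        self.dag_id = dag_id


def create_dag(name: str) -> FakeDAG:
    """Hypothetical factory standing in for my_module.create_dag."""
    return FakeDAG(f"yaml_{name}")


def collect_dags(source: str) -> List[str]:
    """Run `source` as a fresh module and return the sorted, de-duplicated
    dag_ids of all FakeDAG objects bound to module-level names."""
    module = types.ModuleType("dag_loader")
    # Make the stand-ins visible to the executed code, as imports would.
    module.__dict__.update(FakeDAG=FakeDAG, create_dag=create_dag)
    exec(source, module.__dict__)
    # Only objects reachable through a global name are seen; a DAG whose
    # name was rebound is simply gone. The set drops duplicates when one
    # object is bound to several names (e.g. `dag` and its dag_id).
    return sorted({obj.dag_id for obj in vars(module).values()
                   if isinstance(obj, FakeDAG)})


REBOUND = "dag = FakeDAG('a')\ndag = FakeDAG('b')\n"
PLAIN_LOOP = "for name in ['x', 'y']:\n    dag = create_dag(name)\n"
GLOBALS_LOOP = (
    "for name in ['x', 'y']:\n"
    "    dag = create_dag(name)\n"
    "    globals()[dag.dag_id] = dag\n"
)

print(collect_dags(REBOUND))       # ['b']
print(collect_dags(PLAIN_LOOP))    # ['yaml_y']
print(collect_dags(GLOBALS_LOOP))  # ['yaml_x', 'yaml_y']
```

In this simulation, rebinding `dag` keeps only the last object, matching "only the second DAG will be picked up", while `globals()[dag.dag_id] = dag` keeps one entry per DAG. Where `create_dag` is defined plays no role here, which is consistent with the thread's eventual finding that the string pre-filter, not the factory's module, was what kept the loader's DAGs out.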