At the risk of oversimplifying things, your DAG definition file is loaded *every* time a DAG (or any task in that DAG) is run. Think of it as a literal Python import of your DAG-defining module: the module is executed, so any module-level variables are created right alongside the DAG objects. That's why your dict is always available. This will still work with Celery, since it follows the same approach, parsing your DAG file in order to run each task.
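As a rough sketch of that loading behavior (the file name, dict contents, and task id below are hypothetical, not Boris's actual code): a module-level dict defined in the DAG file is re-created every time the file is parsed, and is visible to every DAG generated in that file.

# dags/foo_dags.py -- hypothetical illustration, not the original code
from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

# Module-level variable: rebuilt each time this file is parsed (by the
# scheduler, webserver, or a worker), then shared by every DAG below.
my_outer_dict = {'lookup': 'tables', 'go': 'here'}

def make_dag(i):
    dag = DAG('foo_{}'.format(i), start_date=datetime(2017, 1, 1),
              schedule_interval=None)

    def use_dict():
        # The dict is just a module global, so any callable in this file can read it.
        print(my_outer_dict)

    PythonOperator(task_id='use_dict', python_callable=use_dict, dag=dag)
    return dag

for i in range(10):
    dag = make_dag(i)
    globals()[dag.dag_id] = dag  # expose each DAG so the DagBag can find it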
(By the way, this is why it's critical that all parts of your Airflow infrastructure have access to the same DAGS_FOLDER.)

Now, it is true that the DagBag loads DAG objects, but think of it as more of an "index" so that the scheduler/webserver know which DAGs are available. When it's time to actually run one of those DAGs, the executor loads it from the underlying source file.

Jeremiah

On Wed, Mar 22, 2017 at 8:45 AM Boris Tyukin <bo...@boristyukin.com> wrote:

> Hi,
>
> I have a weird question, but it bugs my mind. I have something like the
> code below to generate DAGs dynamically, using Max's example code from the
> FAQ.
>
> It works fine, but I have one large dict (let's call it my_outer_dict) that
> takes over 60 MB in memory, and I need to access it from all generated
> DAGs. Needless to say, I do not want to recreate that dict for every DAG,
> as I want to load it into memory only once.
>
> To my surprise, if I define that dict outside of my DAG definition code, I
> can still access it.
>
> Can someone explain why, and where it is stored? I thought only DAG
> definitions are loaded into the DagBag, and not the variables outside them.
>
> Is it even good practice, and will it still work if I switch to the Celery
> executor?
>
>
> def get_dag(i):
>     dag_id = 'foo_{}'.format(i)
>     dag = DAG(dag_id)
>     ....
>     print my_outer_dict
>
> my_outer_dict = {}
> for i in range(10):
>     dag = get_dag(i)
>     globals()[dag.dag_id] = dag
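To make the DagBag point concrete, here is a rough sketch (the dag_id and folder path are made up, and this is not Airflow's internal code verbatim) of what effectively happens when one of those DAGs is run: the file is re-parsed, which re-executes the module and rebuilds my_outer_dict.

from airflow.models import DagBag

# Parsing the folder re-executes every DAG file in it, including the one
# that builds my_outer_dict and generates the foo_* DAGs.
dag_bag = DagBag(dag_folder='/path/to/dags')
dag = dag_bag.get_dag('foo_3')   # the DAG object created during that parse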