At the risk of oversimplifying things, your DAG definition file is loaded
*every* time a DAG (or any task in that DAG) is run. Think of it as a
literal Python import of your dag-defining module: the whole module is
executed, so any module-level variables are created right along with the
DAG objects. That's why your dict is always available. This will also work
with the Celery executor, since it follows the same approach: each worker
parses your DAG file before running a task from it.
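
For example, here's a minimal sketch (the file name, log message, and dict
contents are made up) of what "loaded every time" means in practice:

    # dags/example_module.py
    import logging
    from airflow import DAG

    # Runs on *every* parse of this file: by the scheduler loop, the
    # webserver, and each worker that executes a task defined here.
    logging.info("example_module.py parsed")

    # Re-created from scratch on each parse; Airflow does not cache it.
    big_lookup = {'a': 1, 'b': 2}

    dag = DAG('example_dag')  # picked up from the module's globals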

(By the way, this is why it's critical that all parts of your Airflow
infrastructure have access to the same DAGS_FOLDER.)
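
Concretely, the dags_folder setting in airflow.cfg must resolve to the same
files on the scheduler, the webserver, and every worker (the path below is
just an example, e.g. an NFS mount or a git checkout synced to each host):

    [core]
    dags_folder = /opt/airflow/dags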

Now, it is true that the DagBag loads DAG objects, but think of it as more
of an "index" that lets the scheduler and webserver know which DAGs are
available. When it's time to actually run one of those DAGs, the executor
loads it fresh from the underlying source file.
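
You can inspect that "index" yourself by building a DagBag by hand (the
dag_id below is a placeholder):

    from airflow.models import DagBag

    bag = DagBag()                    # parses every file under DAGS_FOLDER
    print(sorted(bag.dags.keys()))    # the "index": dag_id -> DAG object
    dag = bag.get_dag('example_dag')  # re-parses the file if it changed on disk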

Jeremiah

On Wed, Mar 22, 2017 at 8:45 AM Boris Tyukin <bo...@boristyukin.com> wrote:

> Hi,
>
> I have a weird question, but it has been bugging me. I have some code like
> the below to generate DAGs dynamically, using Max's example code from the
> FAQ.
>
> It works fine, but I have one large dict (let's call it my_outer_dict) that
> takes over 60 MB in memory, and I need to access it from all generated
> DAGs. Needless to say, I do not want to recreate that dict for every DAG,
> as I want to load it into memory only once.
>
> To my surprise, if I define that dict outside of my dag definition code, I
> can still access it.
>
> Can someone explain why, and where it is stored? I thought only DAG
> definitions are loaded into the DagBag, and not the variables outside them.
>
> Is it even good practice, and will it still work if I switch to the Celery
> executor?
>
>
> from airflow import DAG
>
> def get_dag(i):
>     dag_id = 'foo_{}'.format(i)
>     dag = DAG(dag_id)
>     # ...
>     # the module-level dict is visible here, even when a worker runs a task
>     print(my_outer_dict)
>     return dag
>
> my_outer_dict = {}
> for i in range(10):
>     dag = get_dag(i)
>     globals()[dag.dag_id] = dag
>
