Matthew Bruce created AIRFLOW-6796:
--------------------------------------

             Summary: Serialized DAGs can be incorrectly deleted
                 Key: AIRFLOW-6796
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6796
             Project: Apache Airflow
          Issue Type: Bug
          Components: serialization
    Affects Versions: 1.10.9
            Reporter: Matthew Bruce


With DAG serialization enabled, `SerializedDagModel.remove_deleted_dags`,
called from `DagFileProcessManager.refresh_dag_dir`, can delete the
serialized representations of DAGs that are loaded into a different `.py`
file through a `DagBag` and `globals()`.

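Consider something like this, in `/home/airflow/dags/loader.py`:
```
from airflow import models

dag_bags = [
    models.DagBag('/home/airflow/project-a/dags'),
    models.DagBag('/home/airflow/project-b/dags'),
]

# Expose every DAG in each DagBag (a dag_id -> DAG dict) as a module-level
# global so the scheduler discovers them when it parses this file
for dag_bag in dag_bags:
    globals().update(dag_bag.dags)
```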

where the DAGs themselves are defined in:
`/home/airflow/project-a/dags/dag-a.py`
`/home/airflow/project-b/dags/dag-b.py`


The list of file paths passed to `SerializedDagModel.remove_deleted_dags` will
only contain `/home/airflow/dags/loader.py`. Each DAG's `fileloc`, however,
points to the file that defines it (`dag-a.py` or `dag-b.py`), which is not in
that list, so the method removes the serializations of both DAGs.
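The filtering can be illustrated with a minimal pure-Python sketch. It assumes, as described above, that a serialized DAG is dropped whenever its `fileloc` is not among the file paths found in the DAG folder. `remove_deleted`, the DAG ids and the dictionaries are hypothetical stand-ins, not Airflow's actual code or database schema.

```python
from typing import Dict, List


def remove_deleted(serialized: Dict[str, str], alive_file_paths: List[str]) -> Dict[str, str]:
    """Hypothetical model of the cleanup: keep only serialized DAGs whose
    fileloc is one of the files found in the DAG folder."""
    alive = set(alive_file_paths)
    return {dag_id: fileloc for dag_id, fileloc in serialized.items() if fileloc in alive}


# dag_id -> fileloc (hypothetical ids); each fileloc is the defining file,
# not loader.py, which merely re-exports the DAGs through globals()
serialized_dags = {
    "dag_a": "/home/airflow/project-a/dags/dag-a.py",
    "dag_b": "/home/airflow/project-b/dags/dag-b.py",
}

# Only loader.py lives in the configured DAG folder
alive_paths = ["/home/airflow/dags/loader.py"]

remaining = remove_deleted(serialized_dags, alive_paths)
print(remaining)  # {}
```

Both entries are dropped even though `loader.py` still produces both DAGs on every parse.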

With non-serialized DAGs, Airflow seems to mark DAGs as inactive based on when
the scheduler last processed them. I wonder if we should make these two
mechanisms consistent?
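For comparison, a time-based rule of the kind described above might look like the following sketch. It assumes a hypothetical `last_seen` mapping from `dag_id` to the time any processed file last produced that DAG, plus an expiration cutoff. Neither name comes from Airflow. Because a DAG is kept as long as some file keeps yielding it, DAGs re-exported through `loader.py` would survive even though their `fileloc` lies outside the DAG folder.

```python
from datetime import datetime, timedelta
from typing import Dict, Set


def stale_dag_ids(last_seen: Dict[str, datetime], expiration: datetime) -> Set[str]:
    """Hypothetical rule: a DAG is stale only if no processed file has
    produced it since the expiration cutoff."""
    return {dag_id for dag_id, seen in last_seen.items() if seen < expiration}


now = datetime(2020, 1, 1, 12, 0, 0)

# dag_a and dag_b are re-parsed via loader.py on every pass, so their
# timestamps stay fresh regardless of where their fileloc points
last_seen = {
    "dag_a": now - timedelta(seconds=30),
    "dag_b": now - timedelta(seconds=30),
    "old_dag": now - timedelta(hours=2),
}

expiration = now - timedelta(minutes=10)
print(stale_dag_ids(last_seen, expiration))  # {'old_dag'}
```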



--
This message was sent by Atlassian Jira
(v8.3.4#803005)