Matthew Bruce created AIRFLOW-6796:
--------------------------------------

             Summary: Serialized DAGs can be incorrectly deleted
                 Key: AIRFLOW-6796
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6796
             Project: Apache Airflow
          Issue Type: Bug
          Components: serialization
    Affects Versions: 1.10.9
            Reporter: Matthew Bruce

With serialization of DAGs enabled, `SerializedDagModel.remove_deleted_dags`, called from `DagFileProcessorManager.refresh_dag_dir`, can delete the serializations of DAGs that were loaded via a DagBag and `globals()` from a different `.py` file.

Consider something like this:

`/home/airflow/dags/loader.py`
```
from airflow import models

# Load the DAGs that live outside the scheduler's DAG folder.
dag_bags = [
    models.DagBag('/home/airflow/project-a/dags'),
    models.DagBag('/home/airflow/project-b/dags'),
]
# Expose every loaded DAG as a module-level global of loader.py.
for dag_bag in dag_bags:
    globals().update(dag_bag.dags)
```

with files:

`/home/airflow/project-a/dags/dag-a.py`
`/home/airflow/project-b/dags/dag-b.py`

The list of file paths passed to `SerializedDagModel.remove_deleted_dags` will only contain `/home/airflow/dags/loader.py`, so the method will remove the serializations for the DAGs in `dag-a.py` and `dag-b.py`.

With non-serialized DAGs, Airflow seems to mark DAGs as inactive based on when the scheduler last processed them. I wonder if we should make these two methods consistent?
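
To make the difference between the two cleanup strategies concrete, here is a minimal sketch in plain Python. It does not use Airflow and is not the actual implementation: the table layout, the function names `remove_by_fileloc` and `remove_by_staleness`, the timestamps, and the 10-minute threshold are all hypothetical. It also assumes that each serialized DAG records the file that defined it (`dag-a.py` or `dag-b.py`) rather than `loader.py`.

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for the serialized DAG table:
# dag_id -> (fileloc, last_updated). Not Airflow's real schema.


def remove_by_fileloc(serialized, alive_paths):
    """Keep only rows whose fileloc is one of the files found in the DAG folder."""
    alive = set(alive_paths)
    return {dag_id: row for dag_id, row in serialized.items() if row[0] in alive}


def remove_by_staleness(serialized, now, max_age):
    """Keep only rows that were refreshed within max_age of now."""
    return {
        dag_id: row
        for dag_id, row in serialized.items()
        if now - row[1] <= max_age
    }


# Arbitrary fixed time so the example is deterministic.
now = datetime(2020, 1, 1, 12, 0, 0)
serialized = {
    "dag_a": ("/home/airflow/project-a/dags/dag-a.py", now),
    "dag_b": ("/home/airflow/project-b/dags/dag-b.py", now),
}

# Only loader.py lives in the scheduler's DAG folder.
alive_paths = ["/home/airflow/dags/loader.py"]

by_fileloc = remove_by_fileloc(serialized, alive_paths)
by_staleness = remove_by_staleness(serialized, now, timedelta(minutes=10))

print(sorted(by_fileloc))    # []
print(sorted(by_staleness))  # ['dag_a', 'dag_b']
```

Under these assumptions, the path-based cleanup drops both DAGs even though they were just processed, while a staleness-based cleanup keeps them.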