TerryYin1777 commented on issue #42542:
URL: https://github.com/apache/airflow/issues/42542#issuecomment-2644166002

   We have a similar setup (a Kubernetes deployment with git-sync) and are seeing 
the same issue. After a deep dive into the codebase, I believe the root cause is 
the following:
   
   When git-sync re-syncs, the real path of the DAGs folder changes from:
   
   hash-123/dags/example_dag.py  →  hash-456/dags/example_dag.py
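   A minimal sketch of the effect, assuming a simplified layout where a stable 
link `repo` points at a per-commit directory (`hash-123`, then `hash-456`). The 
directory names and the `simulate_resync` helper are hypothetical, not git-sync's 
exact on-disk structure. It shows that the configured path stays the same while 
`os.path.realpath` returns a different canonical path after each re-sync:

```python
import os
import tempfile


def simulate_resync() -> tuple[str, str, str]:
    """Mimic a re-sync: a stable link is re-pointed to a new hash directory.

    The layout (root/repo -> root/hash-XXX) is a simplified assumption.
    """
    root = tempfile.mkdtemp()
    for rev in ("hash-123", "hash-456"):
        os.makedirs(os.path.join(root, rev, "dags"))

    link = os.path.join(root, "repo")          # stable path Airflow is pointed at
    dags_folder = os.path.join(link, "dags")

    os.symlink(os.path.join(root, "hash-123"), link)
    before = os.path.realpath(dags_folder)     # canonical path, symlinks followed

    # Re-sync: create a new link and rename it over the old one (atomic swap).
    tmp_link = link + ".tmp"
    os.symlink(os.path.join(root, "hash-456"), tmp_link)
    os.replace(tmp_link, link)
    after = os.path.realpath(dags_folder)

    return dags_folder, before, after


if __name__ == "__main__":
    configured, before, after = simulate_resync()
    print("configured:             ", configured)
    print("resolved before re-sync:", before)
    print("resolved after re-sync: ", after)
```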
   
   Although git-sync keeps a symlink at a fixed path that always points to the 
current hash directory, 
[get_dag_directory](https://github.com/apache/airflow/blob/2.9.1/airflow/dag_processing/manager.py#L963)
 resolves the DAGs folder to its canonical path, which therefore differs across 
re-syncs. This resolved path is passed all the way down to the 
[deactivate_deleted_dags](https://github.com/apache/airflow/blob/2.9.1/airflow/models/dag.py#L3830)
 function on the DAG model, which uses it to mark DAGs whose files were deleted 
as inactive (and therefore hidden from the UI). The deactivation does not happen, 
because the new processor_subdir value does not match the processor_subdir the 
deleted DAGs were previously registered with.
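   To make the mismatch concrete, here is a heavily simplified model of the 
filtering described above. It is not Airflow's code: `DagRow` is a hypothetical 
stand-in for a row in the DAG table, the `/git/...` and `/opt/...` paths are 
made up, and the real query may include extra conditions. Only rows registered 
under the current processor_subdir are examined, so a DAG deleted between 
re-syncs keeps its old subdir and is never deactivated:

```python
from dataclasses import dataclass


@dataclass
class DagRow:
    """Hypothetical, trimmed-down stand-in for a row in the DAG table."""
    dag_id: str
    fileloc: str
    processor_subdir: str
    is_active: bool = True


def deactivate_deleted_dags(rows, alive_dag_filelocs, processor_subdir):
    """Simplified model: deactivate rows under this subdir whose file is gone."""
    for row in rows:
        if row.processor_subdir != processor_subdir:
            continue  # registered under a different subdir: never examined
        if row.fileloc not in alive_dag_filelocs:
            row.is_active = False
    return rows


if __name__ == "__main__":
    rows = [
        # Still present and re-parsed after the re-sync, so it carries new paths.
        DagRow("example_dag", "/git/hash-456/dags/example_dag.py", "/git/hash-456/dags"),
        # Deleted in the new commit; its row still carries the old paths.
        DagRow("removed_dag", "/git/hash-123/dags/removed_dag.py", "/git/hash-123/dags"),
    ]
    deactivate_deleted_dags(rows, {"/git/hash-456/dags/example_dag.py"}, "/git/hash-456/dags")
    for row in rows:
        print(row.dag_id, row.is_active)  # removed_dag stays active

    # With a stable (unresolved) subdir, the deleted DAG would be caught.
    stable = [DagRow("removed_dag", "/opt/dags/repo/dags/removed_dag.py", "/opt/dags/repo/dags")]
    deactivate_deleted_dags(stable, set(), "/opt/dags/repo/dags")
    print(stable[0].dag_id, stable[0].is_active)
```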
   
   I'm not sure why the DAG folder path needs to be resolved to its canonical 
path in the first place. I understand this isn't a bug in Airflow itself and 
probably only occurs when git-sync is used for DAG deployment. But given that 
this is a widely adopted combination, would it be possible to add a configuration 
option such as GET_DAG_FOLDER_RESOLVE=False so that the symlinked path is not 
resolved to its canonical path?
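   A rough sketch of what such an opt-out could look like. Both the standalone 
`get_dag_directory(dag_folder, resolve)` helper and the `resolve` switch are 
hypothetical (they are not Airflow's actual method or an existing setting); 
`resolve=False` stands in for the proposed GET_DAG_FOLDER_RESOLVE=False. With 
resolution disabled, the returned path keeps the symlink and stays the same 
across re-syncs:

```python
import os
import tempfile


def get_dag_directory(dag_folder: str, resolve: bool = True) -> str:
    """Hypothetical helper with an opt-out for symlink resolution.

    resolve=True  -> canonical path with symlinks followed (current behaviour).
    resolve=False -> absolute path with symlinks left in place.
    """
    if resolve:
        return os.path.realpath(dag_folder)
    return os.path.abspath(dag_folder)


def demo() -> tuple[bool, bool]:
    """Return (resolved path changed, unresolved path changed) across a re-sync."""
    root = tempfile.mkdtemp()
    for rev in ("hash-123", "hash-456"):
        os.makedirs(os.path.join(root, rev, "dags"))
    link = os.path.join(root, "repo")
    dags_folder = os.path.join(link, "dags")

    os.symlink(os.path.join(root, "hash-123"), link)
    first = (get_dag_directory(dags_folder), get_dag_directory(dags_folder, resolve=False))

    # Simplified re-sync: remove and recreate the link (git-sync swaps it atomically).
    os.remove(link)
    os.symlink(os.path.join(root, "hash-456"), link)
    second = (get_dag_directory(dags_folder), get_dag_directory(dags_folder, resolve=False))

    return first[0] != second[0], first[1] != second[1]


if __name__ == "__main__":
    resolved_changed, unresolved_changed = demo()
    print("resolved changed:  ", resolved_changed)    # True
    print("unresolved changed:", unresolved_changed)  # False
```

   The trade-off is that an unresolved path depends on the symlink staying in 
place, which is exactly what git-sync guarantees, whereas the canonical path is 
only stable when no symlinks are involved.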
   
   


-- 
This is an automated message from the Apache Git Service.