spider-man-tm commented on PR #68678:
URL: https://github.com/apache/airflow/pull/68678#issuecomment-4752324322

   @uranusjr
   
   Thanks for the question. The concrete scenario I had in mind is:
   
   before:  file_a.py → dag_id="hoge"
   
   after:   file_a.py → dag_id="fuga"   (hoge removed, fuga added)
              file_b.py → dag_id="hoge"   (hoge moved here)
   
   If file_b.py is parsed before file_a.py, this PR raises a transient import 
error on file_b.py since hoge → file_a.py is still active in the DB. However, 
this self-heals automatically: once file_a.py is re-parsed, hoge's 
last_parsed_time stops being updated, and DagProcessorManager._scan_stale_dags 
marks it stale after stale_dag_threshold seconds — so the next parse of 
file_b.py succeeds without collision.
   
   So the current implementation is already eventually consistent in 
periodic-parsing environments.
   
   What I'd rather avoid is silently letting the new file overwrite the 
existing entry — that's what current Airflow does today, and it's exactly how 
our team ended up with a broken DAG that nobody noticed. Making the collision 
visible as an import error, even transiently, is the whole point of this PR.
   
   If the transient error window is a concern, one option would be to 
proactively mark disappeared DAGs as stale inside 
update_dag_parsing_results_in_db at parse time, which would reduce the window 
to just the next parse of the original file. Happy to implement that if you 
think it's worth the added complexity.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to