spider-man-tm commented on PR #68678:
URL: https://github.com/apache/airflow/pull/68678#issuecomment-4752324322
@uranusjr
Thanks for the question. The concrete scenario I had in mind is:
before: file_a.py → dag_id="hoge"
after: file_a.py → dag_id="fuga" (hoge removed, fuga added)
file_b.py → dag_id="hoge" (hoge moved here)
If file_b.py is parsed before file_a.py, this PR raises a transient import
error on file_b.py since hoge → file_a.py is still active in the DB. However,
this self-heals automatically: once file_a.py is re-parsed, hoge's
last_parsed_time stops being updated, and DagProcessorManager._scan_stale_dags
marks it stale after stale_dag_threshold seconds — so the next parse of
file_b.py succeeds without collision.
So the current implementation is already eventually consistent in
periodic-parsing environments.
What I'd rather avoid is silently letting the new file overwrite the
existing entry — that's what current Airflow does today, and it's exactly how
our team ended up with a broken DAG that nobody noticed. Making the collision
visible as an import error, even transiently, is the whole point of this PR.
If the transient error window is a concern, one option would be to
proactively mark disappeared DAGs as stale inside
update_dag_parsing_results_in_db at parse time, which would reduce the window
to just the next parse of the original file. Happy to implement that if you
think it's worth the added complexity.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]