spider-man-tm commented on PR #68678:
URL: https://github.com/apache/airflow/pull/68678#issuecomment-4742367272

   @uranusjr 
   
   Thanks for the review. I believe this can happen when `dag_id=hoge` is moved 
from file_a.py to file_b.py, but file_a.py is kept and gets a new `dag_id=fuga` 
defined in it. Here are two approaches:
   
   **Option A: Mark disappeared Dags as stale at parse time**
   
   The fix is to explicitly mark those entries stale inside 
update_dag_parsing_results_in_db:
   
   ```python
     parsed_dag_ids = set(dag_op.dags.keys())
     session.execute(
         update(DagModel)
         .where(
             DagModel.bundle_name == bundle_name,
             DagModel.relative_fileloc == <rel_fileloc_of_parsed_file>,
             DagModel.dag_id.not_in(parsed_dag_ids),
             ~DagModel.is_stale,
         )
         .values(is_stale=True)
     )
   ```
   
   A transient import error may appear if file_b.py is parsed first, but it 
clears up once file_a.py is re-parsed. In Cloud Composer-style environments 
where files are parsed periodically, this resolves automatically.
   
   
   **Option B: Downgrade the collision to a warning and let the new file 
overwrite the entry**
   
   This avoids the transient error but also silences genuine duplicates.
   
   Which approach do you prefer, or is there a better way I'm missing?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to