uranusjr commented on pull request #16960:
URL: https://github.com/apache/airflow/pull/16960#issuecomment-879590535


   > it will set the dag_run's dag_id && execution_date to null (just like 
deleted the dag_run)
   
   Ah I see, that makes perfect sense. So both the original issue and this PR 
is using “delete” in an inaccurate sense; the dag run is not actually deleted, 
but incorrectly modified and becomes unavailable.
   
   I’m far from a SQLAlchemy expert, so I want to summon a fellow maintainer on 
this topic, but I feel the current relationship setup between DagRun and 
TaskInstance is backwards.
   
   Currently we have this on `DagRun`:
   
   
https://github.com/apache/airflow/blob/c46e841519ef2df7dc40ff2596dd49c010514d87/airflow/models/dagrun.py#L92-L97
   
   and this on `TaskInstance`:
   
   
https://github.com/apache/airflow/blob/c46e841519ef2df7dc40ff2596dd49c010514d87/airflow/models/taskinstance.py#L2147
   
   But if I’m understaing SQLAlchemy correctly (a big if), this makes `DagRun` 
a child of `TaskInstance`, thus deleting a `TaskInstance` triggers cascading on 
its associated `DagRun`, setting the foreign key columns to NULL. But the 
hierarchy is the other way around: a `TaskInstance` only makes sense when there 
is an associated `DagRun`, so deleting a `DagRun` should cascade to its 
`TaskInstance` children, but deleting `TaskInstance` shouldn’t affect a 
`DagRun`. So we should set the backref at the other side (on 
`TaskInstance.dag_run`) instead…?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to