uranusjr commented on pull request #16960: URL: https://github.com/apache/airflow/pull/16960#issuecomment-879590535
> it will set the dag_run's dag_id && execution_date to null (just like deleted the dag_run) Ah I see, that makes perfect sense. So both the original issue and this PR is using “delete” in an inaccurate sense; the dag run is not actually deleted, but incorrectly modified and becomes unavailable. I’m far from a SQLAlchemy expert, so I want to summon a fellow maintainer on this topic, but I feel the current relationship setup between DagRun and TaskInstance is backwards. Currently we have this on `DagRun`: https://github.com/apache/airflow/blob/c46e841519ef2df7dc40ff2596dd49c010514d87/airflow/models/dagrun.py#L92-L97 and this on `TaskInstance`: https://github.com/apache/airflow/blob/c46e841519ef2df7dc40ff2596dd49c010514d87/airflow/models/taskinstance.py#L2147 But if I’m understaing SQLAlchemy correctly (a big if), this makes `DagRun` a child of `TaskInstance`, thus deleting a `TaskInstance` triggers cascading on its associated `DagRun`, setting the foreign key columns to NULL. But the hierarchy is the other way around: a `TaskInstance` only makes sense when there is an associated `DagRun`, so deleting a `DagRun` should cascade to its `TaskInstance` children, but deleting `TaskInstance` shouldn’t affect a `DagRun`. So we should set the backref at the other side (on `TaskInstance.dag_run`) instead…? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
