Please not that Fks can be quite slow... Verstuurd vanaf mijn iPad
> Op 10 apr. 2019 om 19:55 heeft Ash Berlin-Taylor <a...@apache.org> het > volgende geschreven: > > I am all for FKs. > > How do you think we should handle the case of "Xcom but missing TIs" (or > similar) that we might run into on installs when a user runs `airflow > upgradedb`? > > -a > >> On 10 Apr 2019, at 18:44, Driesprong, Fokko <fo...@driesprong.frl> wrote: >> >> Reviving this discussion again :-) >> >> Lately, I was digging into the PR of Julian regarding adding FK's to the >> database: https://github.com/apache/airflow/pull/4922 >> >> After digging into the details, I've noticed that the current situation >> regarding referential integrity is bad. It is not uncommon the have >> DagRun's without having the DAG in the database. For example, you can do a >> backfill job before the scheduler persisted the DAG in the database. I also >> think this is often the case in the UI, where we the nuke when some of the >> models haven't been persisted in the database. Therefore I'd like to >> suggest to enforce consistency by foreign keys. This will prevent us from >> having DagRuns without DAGs, but also removing xcom objects of >> TaskInstances that are already removed. >> >> To create an overview of the FK's, I've created subtasks the ticket of >> Peter: https://jira.apache.org/jira/browse/AIRFLOW-3904 >> >> WDYT? >> >> Cheers, Fokko >> >> >> >> Op di 18 sep. 2018 om 21:51 schreef Maxime Beauchemin < >> maximebeauche...@gmail.com>: >> >>> The database migration creating the FK will/would need to have something >>> that either creates dummy missing PKs first, or delete the orphaned keys to >>> insure the operation of creating the FK doesn't error out. Seems like >>> adding dummy keys is a better approach. Then you'll have to make sure that >>> everywhere where FKs are created that there are no edge cases of missing >>> PKs. Then some delete operations in some cases may have to "cascade" >>> properly. >>> >>> The Django Admin had this nice confirm screen on delete that would show you >>> clearly the scope of the cascading operation when deleting objects. To my >>> knowledge Flask-Admin and FAB don't have such a feature. Personally I >>> wouldn't allow cascade unless such a feature is implemented in some way. >>> Note that SQLAlchemy has builtin semantics for specifying how/whether >>> cascading should take place. >>> >>> Personally I think referential integrity is overrated in some cases, >>> especially when using meaningful "business keys" (as opposed to >>> auto-increted "surrogate" keys) as PKs. It also slows down insert >>> operations. For data warehousing (this clearly doesn't apply to the Airflow >>> metadata store), the best practice on most db engine is to **not** enforce >>> FK constraints as it slows down inserts in fact tables and straight out >>> prevents bulk loading. >>> >>> Another approach is to avoid deleting in general, especially referenced >>> fks, and setting up some activity/visibility flag to false instead. >>> >>> Max >>> >>> On Tue, Sep 18, 2018 at 10:47 AM Driesprong, Fokko <fo...@driesprong.frl> >>> wrote: >>> >>>> I'm in favor of having referential integrity. It will add some load in >>>> having to enforce the referential integrity, but it will also make sure >>>> that the database stays clean. Also in Airflow we use transactions which >>>> will make sure that the integrity checks are not validated on every >>>> statement, but after the commit. I'm happy to help with this as well. >>>> >>>> Cheers, Fokko >>>> >>>> Op di 18 sep. 2018 om 11:07 schreef Bolke de Bruin <bdbr...@gmail.com>: >>>> >>>>> Adding these kind of checks which work for integrity well make database >>>>> access pretty slow. In addition it isnt there because in the past there >>>> was >>>>> no strong connection between for example tasks and dagruns, it was more >>>> or >>>>> less just coincidental. There also so some bisecting tools that >>> probably >>>>> have difficulty functioning in a new regime. In other words it is not >>> an >>>>> easy change and it will have operational challenges. >>>>> >>>>>> On 18 Sep 2018, at 11:03, Ash Berlin-Taylor <a...@apache.org> wrote: >>>>>> >>>>>> Ooh good spot. >>>>>> >>>>>> Yes I would be in favour of adding these, but as you say we need to >>>>> thing about how we might migrate old data. >>>>>> >>>>>> Doing this at 2.0.0 and providing a cleanup script (or doing it as >>> part >>>>> of the migration?) is probably the way to go. >>>>>> >>>>>> -ash- >>>>>> >>>>>>> On 17 Sep 2018, at 19:56, Stefan Seelmann <m...@stefan-seelmann.de> >>>>> wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> looking into the DB schema there is almost no referral integrity >>>>>>> enforced at the database level. Many foreign key constraints between >>>>>>> dag, dag_run, task_instance, xcom, dag_pickle, log, etc would make >>>> sense >>>>>>> IMO. >>>>>>> >>>>>>> Is there a particular reason why that's not implemented? >>>>>>> >>>>>>> Introducing it now will be hard, probably any real-world setup has >>>> some >>>>>>> violations. But I'm still in favor of this additional safety net. >>>>>>> >>>>>>> Kind Regards, >>>>>>> Stefan >>>>>> >>>>> >>>>> >>>> >>> >