This is a proposal for addressing dynamic provider discovery in Airflow 2.0, and in 1.10.13 as well.
At the meeting on Monday, we agreed that Airflow 2.0 will be released using a mechanism based on what we have for backport packages. One of the problems to solve was dynamic discovery of packages and some sort of "dependency injection" of providers into core. It was mainly about making provider-specific connections "discoverable" by the core after a provider package is installed. We discussed using the plugin mechanism, but Ash commented that plugins might introduce dependency conflicts, so I went a different route.

I prepared a POC showing what it can look like. It is very much WIP, but I have already tested it in various scenarios (Airflow 2.0 from packages and from sources, Airflow 1.10 with backport packages, and Airflow 1.10 with the "providers" symbolic link workaround), and it seems to handle all the cases. There are some things we will need to change a bit, and some more "dynamic" parts I have not added yet, but it all seems doable and easy.

The current WIP is here:

* Master/2.0 -> https://github.com/apache/airflow/pull/10822
* 1.10 backport -> https://github.com/apache/airflow/pull/10823

There are a few things to note:

- it is very fast: on my PC the added delay is well under a second, because I limit the search to sub-packages of "airflow.providers". For now, this means all providers must be installed in the same "path" (but maybe with Ash's help we can solve this one). The "walk_packages" function from pkgutil imports all the packages (but not modules!) it walks through, so not walking the whole "airflow" package is a huge performance boost (that takes several seconds, because our __init__.py files load pretty much everything in Airflow).
- we could release a new wave of backport packages soon and implement the changes in 1.10.13, so that all those "dynamic features" of the backport packages become available to 1.10 users (that would solve, for example, this problem: https://github.com/apache/airflow/issues/10783). Those packages will not be dynamically discovered on 1.10.12, but they will work nicely in 1.10.13 if we make the changes.
- currently, packages that are one level "deeper" (apache/NN or microsoft/azure) are not discoverable until we create __init__.py in the parent packages, but this can surely be solved (Ash, I'm looking for your help here :) ).
- I have added some extra metadata to the provider-info.py file, which enables a very nice feature: we could add a UI page and a CLI command to display information about installed (and discovered) provider packages. I think that would be super-useful. The get_provider_info() method already returns a dictionary ready to be rendered.

I have not implemented everything: this is just a POC to show that it is possible and to start a discussion on how to improve it. I am sure there are still some things left that I have not found yet :).

There are a few things missing for sure:

- I have not yet solved the "javascript" changes. They can surely be handled with some clever template modifications and generated JavaScript sources, but I am looking for help here from people who are more familiar with Flask/JavaScript and the UI part.
- we also need to add extra links in a similar way.

Happy to get comments, suggestions, improvements (and help!), either here or directly in the PRs.

J.

--
Jarek Potiuk
Polidea | Principal Software Engineer
M: +48 660 796 129
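For reference, a minimal sketch of the discovery idea (not the POC code from the PRs): walk only the sub-packages of one base package with pkgutil.walk_packages, import each provider's info module, and collect the dict returned by its get_provider_info() so a CLI command or UI page could list it. The module name "provider_info", the "description" key and all helper names here are hypothetical. The demo builds a throwaway package tree, which also illustrates the "deeper package" caveat: pkgutil does not treat a directory without __init__.py as a package, so a nested provider such as apache/hive is never reached.

```python
import importlib
import os
import pkgutil
import sys
import tempfile
from types import ModuleType
from typing import Any, Dict

# Hypothetical convention for this sketch: every provider package contains a
# module with this name exposing get_provider_info() -> dict.
PROVIDER_INFO_MODULE = "provider_info"


def discover_providers(base_package: ModuleType) -> Dict[str, Dict[str, Any]]:
    """Map each provider package under base_package to its provider-info dict.

    Only base_package.__path__ is walked (e.g. "airflow.providers"), never the
    whole "airflow" tree. walk_packages imports every *package* it recurses
    into (to read its __path__) but not plain modules, so a narrow root keeps
    the walk cheap.
    """
    providers: Dict[str, Dict[str, Any]] = {}
    prefix = base_package.__name__ + "."
    for module_info in pkgutil.walk_packages(base_package.__path__, prefix=prefix):
        name = module_info.name
        if module_info.ispkg or not name.endswith("." + PROVIDER_INFO_MODULE):
            continue
        module = importlib.import_module(name)
        get_info = getattr(module, "get_provider_info", None)
        if callable(get_info):
            # Key by the provider package itself, e.g. "airflow.providers.google".
            providers[name.rsplit(".", 1)[0]] = get_info()
    return providers


def render_providers(providers: Dict[str, Dict[str, Any]]) -> str:
    """Format discovered providers as a plain-text table for a CLI listing."""
    lines = [f"{'package':<40} description"]
    for package in sorted(providers):
        description = providers[package].get("description", "")  # hypothetical key
        lines.append(f"{package:<40} {description}")
    return "\n".join(lines)


def run_demo() -> Dict[str, Dict[str, Any]]:
    """Build a throwaway package tree in a temp dir and run discovery on it."""
    root = tempfile.mkdtemp()
    files = {
        "demo_providers/__init__.py": "",
        "demo_providers/google/__init__.py": "",
        "demo_providers/google/provider_info.py": (
            "def get_provider_info():\n"
            "    return {'description': 'Google services'}\n"
        ),
        # Deliberately no demo_providers/apache/__init__.py: pkgutil does not
        # treat that directory as a package, so apache/hive is never reached.
        "demo_providers/apache/hive/__init__.py": "",
        "demo_providers/apache/hive/provider_info.py": (
            "def get_provider_info():\n"
            "    return {'description': 'Apache Hive'}\n"
        ),
    }
    for rel_path, content in files.items():
        path = os.path.join(root, rel_path)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as handle:
            handle.write(content)
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    providers = discover_providers(importlib.import_module("demo_providers"))
    print(render_providers(providers))
    return providers


if __name__ == "__main__":
    run_demo()
```

Against a real installation the call would presumably be discover_providers(importlib.import_module("airflow.providers")), with the limitation mentioned above that all providers must then live under that same path.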
