Hello everyone.

TL;DR: I propose adopting Hatchling as the build backend for Airflow (and recommending, but not requiring, Hatch as the frontend). This is our way of switching to a PEP-standard-compliant pyproject.toml for installing Airflow (including in local venvs) and for building the Airflow package.

I have a working implementation that needs polishing, a few less important decisions, and some rather simple TODOs. Here is the draft PR: https://github.com/apache/airflow/pull/36537

I spent the better part of the Xmas/New Year break implementing it. This is something we have been discussing for, literally, years, and several people (including myself) have made unsuccessful attempts in the past to standardise the Python packaging/build process for Airflow around modern, standards-driven tooling. I think I have finally succeeded.

In short, what it means:

When this change is merged, Airflow will have a slick, modern, standards-compliant contributor experience: an editable installation that will **just work**, that works with multiple build frontends, and that makes it very easy to install and manage local virtualenv(s) for contributing to Airflow. The extras structure and the Airflow package configuration will be in one place (pyproject.toml), and it will be much easier to reason about our extras and dependencies.

As a bonus, with tools like Hatch, contributors will get a canonical way of managing local virtualenvs for Airflow development and a very easy, recommended way to manage both Python installations and venvs (but without forcing a single frontend).

From the user's perspective, Airflow packages will be more standardised, with only user extras defined.

For maintainers and PMC members, we will get reproducible builds (similar to what we have now for providers), which means that verifying the provenance of the packages will be easier and more robust (security!).

Why can we do it now when we could not do it before?

This is mostly thanks to the Herculean efforts of the Python Packaging team (hats off to TP for being part of the team and leading a lot of the standardisation efforts there). After a few years of relentless introduction and implementation of many PEPs and the release of new tooling (particularly Hatch, but also Flit, which we already use for providers), it seems Airflow can finally move away from a very complex, completely custom setup.py and from abusing setuptools in ways that its authors and the Packaging team did not originally anticipate.

What problems does the change solve?

My PR meets all the difficult requirements that our custom solution handled, and (mostly thanks to the standardisation efforts of the packaging team) it also fixes a lot of problems we could not solve before. I am happy to have a detailed discussion here, and a more detailed one in the PR (I added a lot more context and documentation there, showing how this will work once we merge it), but here is the list of things such a move provides:

* We use the hatchling build backend, which follows the appropriate PEP standards and works with any "frontend" you choose to install and manage your local installation. You can use modern Hatch, the counterpart to hatchling (highly recommended), but it will also work with just pip, poetry, flit, and any other standards-compliant tool in the future. Contributors do not need to change their habits; it will **just** work.
* Our editable installation has been broken for some time (mostly because we were abusing setuptools and setup.py A LOT). See https://github.com/apache/airflow/issues/30764. This change restores a working editable install of Airflow and gives contributors a first-class experience with local virtualenvs.
* All Airflow package configuration is now merged into a single PEP-compliant pyproject.toml: no more setup.py, setup.cfg, MANIFEST.in.
* The extras are refactored and organized into logical groups, and start to make sense.
I introduced new "editable" extras that let you easily install provider dependencies locally, and reorganized the devel extras to make it easy to understand what you should install in your editable environment to run tests. More importantly, those "devel" extras, while present in pyproject.toml, are stripped (thanks to custom hooks) from the final package, so the final package contains only what matters to our users.
* We use pre-commit to automatically take the provider.yaml dependencies and merge them into pyproject.toml. Thanks to that, provider.yaml remains the single source of truth for provider configuration, while one local installation still allows you to develop all providers together in a very seamless way.
* No more INSTALL_PROVIDERS_FROM_SOURCES hack when you install Airflow for local development. I figured out a nice way to avoid installing the pre-installed providers and to make it super easy to install the dependencies of providers in an editable installation (hint: `pip install -e .[editable_google]`). This is thanks to the custom build hooks the PEP standardized.
* I also recommend Hatch as a Python/venv management tool and used it for testing. It is a great tool for managing both Python installations and virtualenvs. For many people, providing such a canonical way (while following the standards and not forcing Hatch) will greatly simplify setting up their local environment.
* Hatchling supports reproducible builds out of the box, which is great for security, and it will make our package generation much safer and easier to verify (as we do with our providers now).

There are many more details and thoughts (and also some possible future developments) that I am aware of, but this mail is already too long, and we can discuss them in the thread/PR or in future threads.

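To make the extras handling from the list above a bit more concrete, here is a minimal Python sketch of the two ideas: generating "editable" extras from per-provider dependencies, and stripping "devel" extras before the final package metadata is written. This is not the code from the PR and it does not use the Hatchling hook API. The function names, the `editable_` naming rule as applied here, and all package names in the sample data are hypothetical and only serve the illustration.

```python
from typing import Dict, List, Tuple

Extras = Dict[str, List[str]]


def add_editable_provider_extras(extras: Extras, provider_deps: Extras) -> Extras:
    """Return a copy of `extras` with one `editable_<provider>` extra per provider.

    `provider_deps` stands in for dependencies read from provider.yaml files
    (assumption: already parsed into a plain mapping).
    """
    merged = {name: list(deps) for name, deps in extras.items()}
    for provider, deps in provider_deps.items():
        merged[f"editable_{provider}"] = list(deps)
    return merged


def strip_contributor_extras(
    extras: Extras, prefixes: Tuple[str, ...] = ("devel",)
) -> Extras:
    """Return a copy of `extras` without extras whose name starts with a prefix.

    Only "devel" is stripped by default; which other groups a real build
    would strip is an assumption left to the caller.
    """
    return {
        name: list(deps)
        for name, deps in extras.items()
        if not name.startswith(prefixes)
    }


if __name__ == "__main__":
    # Made-up sample data, not real Airflow extras or dependencies.
    declared = {"async": ["example-async-lib"], "devel": ["example-test-runner"]}
    providers = {"google": ["example-google-sdk"]}

    contributor_view = add_editable_provider_extras(declared, providers)
    released_view = strip_contributor_extras(declared)

    print(sorted(contributor_view))  # ['async', 'devel', 'editable_google']
    print(sorted(released_view))     # ['async']
```

Both functions return new mappings instead of mutating their input, so the contributor view (what an editable install would see) and the released view (what ends up in the built package) can be derived from the same declared extras.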
Happy to take any questions, critique, proposals and feedback. I got quite deep into how modern package building works, so I have likely made some mistakes or bad assumptions; some things can probably be improved, or maybe we can take other directions.

It will take some time to discuss the details and merge, and if this is approved it will likely be targeted for Airflow 2.9.

J.