Hello everyone.

TL;DR: I propose adopting Hatchling as the build backend for Airflow
(and recommending, but not requiring, Hatch as the frontend). This is
our way of switching to a PEP-standard-compliant pyproject.toml for
installing Airflow (including in local venvs) and building the Airflow
package.

I have a working implementation that still needs polishing (a few
less important decisions to make and some rather simple TODOs). Here
is the draft PR:
https://github.com/apache/airflow/pull/36537

I spent the better part of the Xmas/New Year's break implementing it.
This is something we have been discussing for, literally, years:
several people (including myself) made several unsuccessful attempts
in the past at standardising the Python packaging/build process for
Airflow around modern, standards-driven tooling.

I think I have finally succeeded.

In short, what it means:

When this change is merged, Airflow will have a slick, modern,
standards-compliant contributor experience: an editable installation
that will **just work** with multiple build frontends, and that makes
it very easy to install and manage local virtualenv(s) for
contributing to Airflow. The extras structure and the Airflow package
configuration will be in one place (pyproject.toml), and it will be
much easier to reason about our extras and dependencies. As a bonus,
with tools like Hatch, contributors will get a canonical and very easy
recommended way of managing both Python installations and venvs for
Airflow development (without being forced into a single frontend).

From the user perspective, Airflow packages will be more standardised,
with just user extras defined. For maintainers and PMC members, we
will get reproducible builds (similar to what we have now for
providers), which means that it will be easier and more robust to
verify the provenance of the packages (security!).

Why can we do it now when we could not do it before?

This is mostly thanks to the Herculean efforts of the Python packaging
team (hats off to TP for being part of the team and leading a lot of
the standardisation efforts there). After a few years of relentless
introduction and implementation of many PEPs and the release of new
tooling (particularly Hatch, but also Flit, which we already use for
providers), it seems Airflow can finally move away from a very
complex, completely custom setup.py and from setuptools being abused
by us in ways that its authors and the packaging team did not
originally anticipate.

What problems does the change solve?

My PR meets all the difficult requirements that our custom solution
handled and, mostly thanks to the standardisation efforts of the
packaging team, also fixes a lot of problems we could not solve
before.

Happy to have a detailed discussion here, and an even more detailed
one in the PR (I added a lot more context and documentation there,
showing how this will work when we merge it), but here is the list of
things such a move provides:

* We are using the hatchling build backend, which follows the
appropriate PEP standards and works with any "frontend" you choose to
install and manage your local installation. You can use modern Hatch,
which is the counterpart to hatchling (highly recommended), but it
will also work with just pip, poetry, flit, and any other
standards-compliant tool in the future. No contributor habits need to
change, it will **just** work.

* Our editable installation has been broken for some time (mostly
because we were abusing setuptools and setup.py A LOT). See
https://github.com/apache/airflow/issues/30764 . This change puts the
shine back on editable installs of Airflow: they work as expected,
giving contributors a first-class experience with local virtualenvs.

* All Airflow package configuration is now merged into a single
PEP-compliant pyproject.toml: no more setup.py, setup.cfg or
MANIFEST.in.

* The extras are refactored and organized into logical groups and
start to make sense. I introduced new "editable" extras to allow you
to easily install provider dependencies locally, and reorganized the
devel extras to make it easy to understand what you should install in
your editable environment to run tests. More importantly, those
"devel" extras, while present in pyproject.toml, are stripped (thanks
to custom hooks) from the final package, so the final package has just
the things that are important to our users.

* We use pre-commit to automatically take the provider.yaml
dependencies and merge them into pyproject.toml. Thanks to that,
provider.yaml remains the single source of truth for provider
configuration, while one local installation still lets you develop all
providers together, in a very seamless way.

* No more INSTALL_PROVIDERS_FROM_SOURCES hack when you install Airflow
for local development. I figured out a nice way to avoid installing
pre-installed providers, and to make it super-easy to install
dependencies of providers in an editable installation (hint: `pip
install -e .[editable_google]`). This is thanks to custom build hooks
that the PEPs standardized.

* I also recommend Hatch as a Python/venv management tool and used it
for testing. It's a great tool for managing both Python installations
and virtualenvs. For many people, providing such a canonical way
(while following the standards and not forcing Hatch) will really
simplify their local environment installation.

* Hatchling supports reproducible builds out-of-the-box, which is
great for security - and it will make our package generation much
safer and easier to verify (as we do with our providers now).
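
To make the two extras-related bullets above a bit more concrete,
here is a minimal sketch in plain Python. It is NOT the code from the
PR: the function names (`merge_provider_extras`, `strip_devel_extras`),
the "devel" prefix rule and the sample dependency names are all made up
for illustration, and the real hooks may work quite differently. It
only shows the general idea of generating `editable_<provider>` extras
from per-provider dependency lists and dropping "devel" extras before
the final package metadata is produced.

```python
from __future__ import annotations


def merge_provider_extras(
    extras: dict[str, list[str]],
    provider_deps: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Return a copy of `extras` with one `editable_<provider>` extra added
    per provider (hypothetical naming, modelled on `editable_google`)."""
    merged = {name: list(deps) for name, deps in extras.items()}
    for provider, deps in sorted(provider_deps.items()):
        # De-duplicate and sort so the generated section is stable between runs.
        merged[f"editable_{provider}"] = sorted(set(deps))
    return merged


def strip_devel_extras(
    extras: dict[str, list[str]],
    prefixes: tuple[str, ...] = ("devel",),
) -> dict[str, list[str]]:
    """Return only the extras whose names do not start with a devel prefix
    (assumed rule; the real selection logic may differ)."""
    return {
        name: deps for name, deps in extras.items() if not name.startswith(prefixes)
    }


if __name__ == "__main__":
    # Made-up data standing in for pyproject.toml extras and provider.yaml files.
    base_extras = {"async": ["gevent"], "devel": ["pytest"], "devel_ci": ["ruff"]}
    provider_deps = {"google": ["example-sdk-b", "example-sdk-a", "example-sdk-a"]}

    print(merge_provider_extras(base_extras, provider_deps))
    print(strip_devel_extras(base_extras))
```

Both functions return new dictionaries rather than mutating their
input, which keeps a pre-commit style "regenerate and compare" check
simple.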

There are many more details and thoughts (and also some possible
future developments) that I am aware of, but this mail is already too
long, and we can discuss them in the thread/PR or in future threads.

Happy to take any questions, critique, proposals and feedback. I got
quite deep into how modern package building works, so I likely made
some mistakes or bad assumptions, and things can probably be improved
or maybe we can take other directions. It will take some time to
discuss the details and merge, and if this one gets approved it's
likely going to be targeted for Airflow 2.9.

J.

