And merged :D.

On Wed, Jan 10, 2024 at 6:56 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> I made a comparison of package files before <> after and added a few
> corrections. Also I've added an extra security layer for CI building of
> airflow packages - it runs inside a fully isolated Docker container.
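
For anyone who wants to repeat a before/after comparison like this on their own builds, here is a minimal sketch using only the standard library. The function names are hypothetical (they are not from the PR), and it assumes you are comparing wheels, which are plain zip archives:

```python
from __future__ import annotations

import zipfile
from typing import Iterable


def archive_names(path: str) -> set[str]:
    """Return the member paths of a wheel (a wheel is a zip archive)."""
    with zipfile.ZipFile(path) as archive:
        return set(archive.namelist())


def diff_file_lists(
    before: Iterable[str], after: Iterable[str]
) -> tuple[list[str], list[str]]:
    """Return (lost, added): paths only in `before`, paths only in `after`."""
    before_set, after_set = set(before), set(after)
    return sorted(before_set - after_set), sorted(after_set - before_set)
```

Calling `diff_file_lists(archive_names(old_wheel), archive_names(new_wheel))` would then list what the new build lost or gained.
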
>
> Would be great to get another quick look /review before I merge it :)
>
> J,
>
> On Wed, Jan 10, 2024 at 1:36 AM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Hey Everyone,
>>
>> I got the PR green: https://github.com/apache/airflow/pull/36537 - I got
>> a really comprehensive review and a number of iterations with Jens (and
>> approval! yay!!) and a number of comments from TP.
>>
>> I would love to have some feedback from others before merging, I still
>> want to (I will do it tomorrow) go through the packages prepared with hatch
>> and make sure we have not lost (or added) too much from the packages and
>> add appropriate inclusions/exclusions  - but other than that, I think it
>> could be merged even today.
>>
>> I'd love some more comments - especially from those who struggled with
>> local venv/editable installation and dependency management/adding provider
>> dependencies recently - as the way it is done now should be WAY simpler and
>> better.
>>
>> Just to repeat what we get with that one:
>>
>> 1. cutting-edge support for Python packaging standards (see previous mail
>> in the thread) - with complete configuration for the project in a single
>> pyproject.toml file. It allows you to use any modern build frontend for
>> development (hatch, pip, poetry, pipenv etc.)
>> 2. nicer integration with IDEs (PyCharm/VS Code etc.) for installation and
>> dependency management
>> 3. nicely and logically organized dependencies - including devel
>> dependencies + extras per provider, nicely managed from provider.yaml
>> 4. seamlessly working `pip install --editable .` (it was hacked before,
>> and not working in recent `pip` versions - now it will **just work**)
>> 5. a way to easily install provider devel dependencies for testing in
>> local venv (`pip install -e ".[amazon,google]"`)
>> 6. hatch as recommended (but not mandatory) frontend that supports
>> out-of-the-box:
>>    a) installing python interpreters (`hatch python install all`)
>>    b) creating local venvs (`hatch env create`, `hatch env shell`, `hatch
>> -e airflow-311 create` and so on)
>>    c) building packages for release (`hatch build -t custom -t wheel -t
>> sdist`)
>>    d) later we will use more things that hatch gives us (reproducible
>> builds, publishing to PyPI, possibly local testing and code formatting,
>> better monorepo organization in the future).
>> 7. Updated documentation for all the above.
>>
>> Note: It does not replace Breeze for reproducing and optimizing our CI
>> build (Breeze has way more optimizations and customisations needed for
>> Airflow). However it makes the LOCAL_VIRTUALENV option of running tests and
>> developing airflow much easier to manage and get it under control.
>>
>> Just as a teaser - here is the output of `hatch env show`:
>>
>>
>> ┏━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
>> ┃ Name        ┃ Type    ┃ Features ┃ Description                                                   ┃
>> ┡━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
>> │ default     │ virtual │ devel    │ Default environment with Python 3.8 for maximum compatibility │
>> ├─────────────┼─────────┼──────────┼───────────────────────────────────────────────────────────────┤
>> │ airflow-38  │ virtual │          │ Environment with Python 3.8. No devel installed.              │
>> ├─────────────┼─────────┼──────────┼───────────────────────────────────────────────────────────────┤
>> │ airflow-39  │ virtual │          │ Environment with Python 3.9. No devel installed.              │
>> ├─────────────┼─────────┼──────────┼───────────────────────────────────────────────────────────────┤
>> │ airflow-310 │ virtual │          │ Environment with Python 3.10. No devel installed.             │
>> ├─────────────┼─────────┼──────────┼───────────────────────────────────────────────────────────────┤
>> │ airflow-311 │ virtual │          │ Environment with Python 3.11. No devel installed              │
>> └─────────────┴─────────┴──────────┴───────────────────────────────────────────────────────────────┘
>>
>> J.
>>
>>
>>
>> On Sun, Jan 7, 2024 at 11:55 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>
>>> Ah... And compared to the original proposal, I simplified it a LOT.
>>> Generally speaking, for both contributors and users, the way you
>>> install Airflow (for regular use or for contribution) is "standard" and
>>> basically just "fixes" what has been broken - i.e. you just install it
>>> as expected:
>>>
>>> * `pip install apache-airflow[google]`  or `pip install .[google]`
>>> will install airflow + google provider (user story)
>>> * `pip install -e .[google]` will install airflow + all google
>>> provider dependencies in editable mode - ready to run tests
>>>
>>> Plus Airflow follows all the PEP-standards so that it is compatible
>>> with all the modern tooling for Python packaging. Here is the list of
>>> PEP's that it makes airflow generally compatible with:
>>>
>>> * `PEP-440 Version Identification and Dependency Specification
>>> <https://www.python.org/dev/peps/pep-0440/>`__
>>> * `PEP-517 A build-system independent format for source trees
>>> <https://www.python.org/dev/peps/pep-0517/>`__
>>> * `PEP-518 Specifying Minimum Build System Requirements for Python
>>> Projects <https://www.python.org/dev/peps/pep-0518/>`__
>>> * `PEP-561 Distributing and Packaging Type Information
>>> <https://www.python.org/dev/peps/pep-0561/>`__
>>> * `PEP-621 Storing project metadata in pyproject.toml
>>> <https://www.python.org/dev/peps/pep-0621/>`__
>>> * `PEP-685 Comparison of extra names for optional distribution
>>> dependencies <https://www.python.org/dev/peps/pep-0685/>`__
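
To make the PEP-685 item concrete: that PEP says extra names are compared after normalization, using the same rule as project names. A minimal sketch of that rule follows. The function name is my own, and the sample extra names in the usage note are only illustrations:

```python
import re


def normalize_extra(name: str) -> str:
    """Normalize an extra name the way PEP 685 specifies.

    Runs of '-', '_' and '.' collapse into a single '-', and the
    result is lowercased.
    """
    return re.sub(r"[-_.]+", "-", name).lower()
```

With this rule, `cncf.kubernetes`, `cncf_kubernetes` and `CNCF-Kubernetes` all refer to the same extra, so tools that follow the PEP treat them identically.
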
>>>
>>> J.
>>>
>>>
>>> On Sun, Jan 7, 2024 at 11:27 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>> >
>>> > Hello everyone,
>>> >
>>> > I iterated quite a bit on the PR and I think it's ready for an even
>>> > more serious review:  https://github.com/apache/airflow/pull/36537 . I
>>> > solved all of the TODOs and teething problems and while it likely
>>> > still has some tests to fix, all the build and packaging pieces, local
>>> > installation and even developer/contributor documentation should be
>>> > already in the state that is ready for serious scrutiny. Thanks to
>>> > Jens and TP for the reviews so far - I addressed all of the comments
>>> > already - and there are just 2 conversations left remaining.
>>> >
>>> > See the comment for status summary:
>>> > https://github.com/apache/airflow/pull/36537#issuecomment-1880193452
>>> >
>>> > BTW. I found it really useful to follow the "unresolved conversation"
>>> > routine - it's really nice to see such things as a summary (see
>>> > attachment) and be able to see that there are still 2 conversations to
>>> > resolve.
>>> > That's the in-progress experiment with conversations which I
>>> > personally like a lot so far. It already saved me from merging a PR
>>> > that still had things to resolve.
>>> >
>>> > J.
>>> >
>>> > On Thu, Jan 4, 2024 at 8:04 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>> > >
>>> > > I slept on it for a few nights, stepped away from it, and I have an
>>> > > idea to simplify it quite a bit - i.e. cut the number of extras by
>>> > > half and make virtually 0 impact on the current editable installation,
>>> > > so you might want to hold on a bit with that (unless you want to see
>>> > > it changing :) ). The whole concept won't change, I just realized that
>>> > > I do not need to add new `editable_` extras to achieve the same effect.
>>> > >
>>> > > I will also attempt to split it a bit to make it easier to review.
>>> > >
>>> > > Hold tight :) - but also feel free to look and comment even now :)
>>> > >
>>> > > And yes. Exciting. It kept me awake a night or two where I could not
>>> > > get to sleep until I finally got it working :D
>>> > >
>>> > > J
>>> > >
>>> > > On Thu, Jan 4, 2024 at 6:52 PM Pierre Jeambrun <pierrejb...@gmail.com> wrote:
>>> > > >
>>> > > > I personally think that this is a great idea. I have been following the
>>> > > > hatch project for a while and I am convinced it has a lot to offer for
>>> > > > airflow. The two big pros for me are its ease of use (backend and front
>>> > > > end) as well as the security aspects it covers (reproducible builds to
>>> > > > name one).
>>> > > >
>>> > > > I will take a look at the PR later this week, but it definitely sounds
>>> > > > exciting.
>>> > > >
>>> > > >
>>> > > >
>>> > > > On Tue 2 Jan 2024 at 20:26, Jarek Potiuk <ja...@potiuk.com> wrote:
>>> > > >
>>> > > > > Hello everyone.
>>> > > > >
>>> > > > > TL;DR: I have a proposal to adopt Hatchling as a build backend (and
>>> > > > > recommend, but not require, Hatch as frontend) for Airflow as our
>>> > > > > way of switching to a PEP-standard compliant pyproject.toml way of
>>> > > > > installing Airflow (including local venvs) and building the Airflow
>>> > > > > package.
>>> > > > >
>>> > > > > I have a working implementation (it needs polishing, a few less
>>> > > > > important decisions, and some rather simple TODOs). Here is the
>>> > > > > draft PR: https://github.com/apache/airflow/pull/36537
>>> > > > >
>>> > > > > I've spent the better part of the Xmas/New Year's break on
>>> > > > > implementing it - something that we've been discussing for -
>>> > > > > literally - years - and several people (including myself) made
>>> > > > > several attempts in the past - unsuccessfully - at standardising
>>> > > > > the Python packaging/build process for Airflow to use modern
>>> > > > > standard-driven tooling.
>>> > > > >
>>> > > > > I think I succeeded. Finally.
>>> > > > >
>>> > > > > In short, what it means:
>>> > > > >
>>> > > > > When this change is merged, Airflow will have a nice, slick,
>>> > > > > modern, standard-compliant contributor's experience - with
>>> > > > > editable installation that will **just work**, that will work with
>>> > > > > multiple build front-ends, and that will make it very easy to
>>> > > > > install and manage local virtualenv(s) to contribute to Airflow.
>>> > > > > The extras structure and airflow configuration will be in one
>>> > > > > place (pyproject.toml) and it will be much easier to reason about
>>> > > > > our extras and dependencies. As a bonus point - with tools like
>>> > > > > Hatch, contributors will get the canonical way of managing local
>>> > > > > virtualenvs for Airflow development and a very easy recommended
>>> > > > > way to manage both Python and venvs (but without forcing a single
>>> > > > > frontend).
>>> > > > >
>>> > > > > From the user perspective Airflow packages will be more
>>> > > > > standardised, with just user extras defined. From the maintainers'
>>> > > > > and PMC members' perspective, we will get reproducible builds
>>> > > > > (similarly as we have now for Providers) - which means that it
>>> > > > > will be easier and more robust to verify provenance of the
>>> > > > > packages (security!)
>>> > > > >
>>> > > > > Why can we do it now and we could not do it before ?
>>> > > > >
>>> > > > > This is mostly thanks to the Herculean efforts of the Python
>>> > > > > Packaging team (hats off to TP being part of the team and leading
>>> > > > > a lot of standardisation efforts there) - after a few years of
>>> > > > > relentless introduction and implementation of many PEPs and
>>> > > > > releasing new tooling (particularly Hatch, but also Flit that we
>>> > > > > already use for providers) it seems Airflow can finally move away
>>> > > > > from a very complex, completely custom setup.py and from
>>> > > > > setuptools being abused by us in ways that its authors and the
>>> > > > > Packaging team did not originally anticipate.
>>> > > > >
>>> > > > > What problems does the change solve?
>>> > > > >
>>> > > > > My PR solves all the difficult requirements of our custom
>>> > > > > solution, but also (mostly thanks to standardisation efforts by
>>> > > > > the packaging team) it improves on a lot of problems we could not
>>> > > > > solve.
>>> > > > >
>>> > > > > Happy to have a detailed discussion here, and a more detailed one
>>> > > > > in the PR (I added a lot more context and documentation showing
>>> > > > > how this will work when we merge it), but here is the list of
>>> > > > > things such a move provides:
>>> > > > >
>>> > > > > * We are using the hatchling build backend, which follows the
>>> > > > > appropriate PEP standards and works with any "frontend" you
>>> > > > > choose to install and manage your local installation. You can use
>>> > > > > modern Hatch, which is the counterpart to hatchling (highly
>>> > > > > recommended), but it will also work with just pip, poetry, flit,
>>> > > > > and any other standard-compliant tool in the future. No habits of
>>> > > > > the contributors need to be changed, it will **just** work.
>>> > > > >
>>> > > > > * our editable installation has been broken for some time (mostly
>>> > > > > because we were abusing setuptools and setup.py A LOT). See
>>> > > > > https://github.com/apache/airflow/issues/30764 . This change puts
>>> > > > > the shine back on making an editable install of airflow work as
>>> > > > > expected and on getting a first-class experience for contributors
>>> > > > > with local virtualenvs
>>> > > > >
>>> > > > > * all Airflow package configuration is now merged into a single
>>> > > > > appropriate PEP-compliant pyproject.toml - no more setup.py,
>>> > > > > setup.cfg, MANIFEST.in.
>>> > > > >
>>> > > > > * the extras are refactored and organized into logical groups and
>>> > > > > start to make sense. I introduced new "editable" extras to allow
>>> > > > > you to easily install provider dependencies locally and
>>> > > > > reorganized devel extras to make it easy to understand what you
>>> > > > > should install in your editable environment to run tests. More
>>> > > > > importantly those "devel" extras - while present in
>>> > > > > pyproject.toml - are stripped off (thanks to custom hooks) from
>>> > > > > the final package - so the final package has just the things that
>>> > > > > are important to our users
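
To illustrate the idea of stripping devel extras from the released package, here is a rough sketch of the filtering step only. It is not the hook from the PR: the function name, the plain-dict input and the assumption that development-only extras share a `devel` name prefix are all mine.

```python
from __future__ import annotations


def strip_devel_extras(
    optional_dependencies: dict[str, list[str]],
    prefix: str = "devel",  # assumed naming convention for dev-only extras
) -> dict[str, list[str]]:
    """Return a copy of the extras mapping without development-only extras."""
    return {
        name: list(requirements)
        for name, requirements in optional_dependencies.items()
        if not name.startswith(prefix)
    }
```

A real build hook would apply something like this to the project metadata at build time, so that the source tree keeps the devel extras while the published wheel and sdist only advertise user-facing ones.
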
>>> > > > >
>>> > > > > * we use pre-commit to automatically take provider.yaml
>>> > > > > dependencies and merge them into pyproject.toml - thanks to that,
>>> > > > > provider.yaml will remain the single source of truth for provider
>>> > > > > configuration, while it also allows one local installation to
>>> > > > > "develop them all together" - and in a very seamless way.
>>> > > > >
>>> > > > > * no more INSTALL_PROVIDERS_FROM_SOURCES hack when you install
>>> > > > > airflow for local development. I figured out a nice way to avoid
>>> > > > > installing pre-installed providers, and to make it super-easy to
>>> > > > > install dependencies of providers in an editable installation
>>> > > > > (hint: `pip install -e .[editable_google]`). This is thanks to the
>>> > > > > custom build hooks that the PEPs standardized.
>>> > > > >
>>> > > > > * I also recommend Hatch as a Python/venv management tool and used
>>> > > > > it for testing - it's a great tool for managing both Python
>>> > > > > installations and virtualenvs. For many people, providing such a
>>> > > > > canonical way (while following the standards and not forcing
>>> > > > > Hatch) will be really great to simplify their local environment
>>> > > > > installation.
>>> > > > >
>>> > > > > * Hatchling supports reproducible builds out-of-the-box, which is
>>> > > > > great for security - and it will make our package generation much
>>> > > > > safer and easier to verify (as we do with our providers now).
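
As an illustration of why reproducible builds help with verification: if a build is reproducible, anyone can rebuild the package from the same sources and check that the result is byte-for-byte identical to the released file. A minimal sketch of that check, with hypothetical function names and standard library only:

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproduced(released_path: str, rebuilt_path: str) -> bool:
    """True when a locally rebuilt artifact matches the released one exactly."""
    return sha256_of(released_path) == sha256_of(rebuilt_path)
```

This only compares two files that already exist on disk. How the rebuild itself is made deterministic is up to the build backend and is not shown here.
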
>>> > > > >
>>> > > > > There are many more details and thoughts (and also some possible
>>> > > > > future developments) that I am aware of, but this mail is already
>>> > > > > too long, and we can discuss it in the thread/PR or future
>>> > > > > threads.
>>> > > > >
>>> > > > > Happy to take any questions, critique, proposals and feedback - I
>>> > > > > got quite deep into how modern package building works, so I likely
>>> > > > > made some mistakes / bad assumptions, or things can be improved,
>>> > > > > or maybe we can take other directions. It will take some time to
>>> > > > > merge and discuss details, and if this one gets approved it's
>>> > > > > likely going to be targeted for Airflow 2.9.
>>> > > > >
>>> > > > > J.
>>> > > > >
>>> > > > >
>>> > > > >
>>> > > > >
>>>
>>
