Personally I don't like to have two versions in the PyPI repo. This also
complicates releases, since we need to test, release and verify two
versions of Airflow. I'm afraid that this might confuse users.

Besides that, it feels a bit like we're reinventing certain mechanisms that
are already in tooling like Poetry.

My 2ct.

Cheers, Fokko

On Mon, 16 Mar 2020 at 11:04, Jarek Potiuk <jarek.pot...@polidea.com> wrote:

> Does anyone think this one needs voting? I think it's an incremental change
> (but one that gives us a stable installation mechanism for any released
> version of Airflow).
>
> J.
>
>
> On Sun, Mar 15, 2020 at 1:17 PM Jarek Potiuk <jarek.pot...@polidea.com>
> wrote:
>
> > I think I have finally found a good and simple solution that I will also
> > be able to use in the production image.
> >
> > It boils down to adding an "apache-airflow-pinned" package in addition to
> > "apache-airflow". This package is the same as apache-airflow but it has
> > pinned dependencies (so it will never fail to install because of
> > transitive dependencies). The original 'apache-airflow' package is
> > untouched - it works as previously, and it will sometimes fail when some
> > transitive dependencies change.
> >
> > With this change we also have a self-managed, standard Python
> > requirements.txt file (every time you update the dependencies in setup.py,
> > it will be regenerated via pre-commit when you rebuild the Breeze image).
> > This will also make IDE integration nicer (a lot of IDEs will
> > automatically manage new requirements from the requirements.txt file).
> >
> > It is designed to be self-managing.
> >
> > The PR is here: https://github.com/apache/airflow/pull/7730 - looking
> > forward to reviews and feedback.
> >
> > I also described it in CONTRIBUTING.rst, so I'll just copy the
> > description here:
> >
> > Pinned Airflow version and requirements.txt file
> > <https://github.com/PolideaInternal/airflow/blob/AIRFLOW-7067-pinned-version-of-airflow/CONTRIBUTING.rst#id18>
> >
> > Airflow is not a standard Python project. Most Python projects fall into
> > one of two types - application or library. As described in [Should I pin
> > my python dependencies versions](
> > https://stackoverflow.com/questions/28509481/should-i-pin-my-python-dependencies-versions),
> > the decision whether to pin (freeze) requirements for a Python project
> > depends on that type. For applications, dependencies should be pinned,
> > but for libraries, they should be open.
> >
> > For an application, pinning the dependencies makes future installation
> > more stable - because new releases of (even transitive) dependencies
> > might otherwise cause installation to fail. For libraries, the
> > dependencies should be open to allow several different libraries with the
> > same requirements to be installed at the same time.
> >
> > The problem is that Apache Airflow is a bit of both - an application to
> > install and a library to use when you are developing your own operators
> > and DAGs.
> >
> > This - seemingly unsolvable - puzzle is solved as follows:
> >
> >    - by default when you install the apache-airflow package - the
> >    dependencies are as open as possible while still allowing
> >    apache-airflow to install. This means that the 'apache-airflow' package
> >    might fail to install in case a direct or transitive dependency is
> >    released that breaks the installation. In such a case, when installing
> >    apache-airflow, you might need to provide additional constraints (for
> >    example pip install apache-airflow==1.10.2 'Werkzeug<1.0.0')
> >    - we have a requirements.txt file generated automatically based on the
> >    set of all latest working and tested requirement versions. You can also
> >    use that file as a constraints file when installing apache-airflow -
> >    either from the sources: pip install -e . --constraint requirements.txt,
> >    or from the PyPI package: pip install apache-airflow --constraint
> >    requirements.txt. Note that this also works with extras, for example
> >    pip install ".[gcp]" --constraint requirements.txt or pip install
> >    "apache-airflow[gcp]" --constraint requirements.txt
> >    - we also release the apache-airflow-pinned package which has all
> >    dependencies frozen. This is guaranteed to work and be installable no
> >    matter whether new versions of direct or transitive dependencies are
> >    released. If you want to only install airflow and start using it -
> >    install the apache-airflow-pinned package (a rough sketch of how such a
> >    package could be put together follows below).
> >
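> > A minimal sketch of what such a pinned distribution could look like (an
> > assumption on my side - the actual implementation lives in the PR above;
> > names and the version number are illustrative): a tiny setup.py that
> > simply re-declares every frozen dependency from requirements.txt as an
> > exact requirement.
> >
> >     # setup.py of the illustrative "apache-airflow-pinned" distribution.
> >     # Sketch only - see https://github.com/apache/airflow/pull/7730 for
> >     # the real implementation.
> >     from pathlib import Path
> >     from setuptools import setup
> >
> >     # Every line of requirements.txt is an exact "package==version" pin.
> >     pinned = [
> >         line.strip()
> >         for line in Path("requirements.txt").read_text().splitlines()
> >         if line.strip() and not line.startswith("#")
> >     ]
> >
> >     setup(
> >         name="apache-airflow-pinned",
> >         version="1.10.6",  # illustrative version number
> >         # Airflow itself at the matching version, plus all frozen deps.
> >         install_requires=["apache-airflow==1.10.6"] + pinned,
> >     )
> >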
> > The requirements.txt file is maintained automatically as we update
> > dependencies and verified at CI time via pre-commit check/generation. It
> > reflects the current set of dependencies installed in the CI image of
> > Apache Airflow. The same set of requirements will be used to produce the
> > production image.
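> >
> > To illustrate the check/generation step, here is a rough sketch of how
> > the file could be regenerated inside the Breeze/CI image (an assumption
> > of the mechanics - the script name and filtering rules are illustrative,
> > not the actual pre-commit hook):
> >
> >     # regenerate_requirements.py - illustrative sketch only
> >     import subprocess
> >
> >     def generate_requirements(output_file: str = "requirements.txt") -> None:
> >         # Snapshot everything currently installed in the image ...
> >         frozen = subprocess.run(
> >             ["pip", "freeze"], check=True, capture_output=True, text=True
> >         ).stdout.splitlines()
> >         # ... and drop airflow itself and editable installs, keeping only
> >         # the pinned (direct and transitive) dependencies.
> >         pinned = [
> >             line for line in frozen
> >             if not line.startswith("apache-airflow") and not line.startswith("-e ")
> >         ]
> >         with open(output_file, "w") as f:
> >             f.write("\n".join(sorted(pinned)) + "\n")
> >
> >     if __name__ == "__main__":
> >         generate_requirements()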
> >
> > J.
> >
> >
> > On Sat, Nov 30, 2019 at 1:33 PM Jarek Potiuk <jarek.pot...@polidea.com>
> > wrote:
> >
> >> Just to update the plan and give a summary for those who did not read
> >> my TL;DR mail (and to include the probot integration). Here is a shorter
> >> summary of my proposal:
> >>
> >>    1. Keep the current process of *pip install -e ".[<extras>]"* unchanged -
> >>    taking into account the limits from setup.py
> >>    2. Add a *requirements.txt* file with a fixed set of requirements
> >>    (always *==*, including transitive deps). Those will be *managed by
> >>    probot*, similarly to dependabot.
> >>    3. PRs in CI will use the requirements.txt file as a constraint file
> >>    (known to work) when running tests, to avoid transitive dependency
> >>    problems
> >>    4. A daily CRON job installs all deps from scratch without
> >>    *requirements.txt* - to detect problems with the current set of
> >>    dependencies early. An example of that from last night [1]: the CRON
> >>    job failed and I could prepare the fix [2] while PRs of contributors
> >>    continued to work.
> >>    5. Airflow installation for development will be guaranteed to work
> >>    with *pip install -e ".[<extras>]" --constraint requirements.txt* from
> >>    master - no matter the state of transitive dependencies
> >>    6. We will modify setup.py to also have a way to take the
> >>    constraints, to guarantee installation works. Something like: *pip
> >>    install "apache-airflow[<extras>,constrained]==1.10.6"*
> >>
> >> I'd love to hear more comments and to ask for your opinion on this
> >> proposal.
> >>
> >> [1] Failed CRON job from last night - likely triggered by "from scratch"
> >> dependencies: https://travis-ci.org/apache/airflow/builds/618832805
> >> [2] Fix for last night's CRON job failure:
> >> https://github.com/apache/airflow/pull/6691
> >> [3] AIP-8 - Split Hooks/Operators to separate packages:
> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303
> >> [4] AIP-21 - Changes in import paths:
> >> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths
> >>
> >> *@Dan Davydov* -> Fully agree we should have something like that. I
> >> think this should be part of AIP-8 [3] when we get there. I hope we can
> >> learn a lot about this once AIP-21 [4] is fully implemented and the
> >> backport packages are generated and used by people as preparation for
> >> the Airflow 2.0 migration (I hope it will happen soon - we are very
> >> close to having automation in place to move all the packages from
> >> contrib - worked on by Kamil Breguła). Having such a (probot-managed)
> >> set of requirements + a way to test it automatically is - I think - an
> >> important prerequisite to start designing it (and to work out a
> >> reasonable approach for cross-dependency problems).
> >>
> >> J.
> >>
> >> On Mon, Nov 18, 2019 at 6:18 PM Dan Davydov <ddavy...@twitter.com.invalid>
> >> wrote:
> >>
> >>> My 2 cents on the long-term plan:
> >>> Once Airflow has DAG isolation (i.e. DAG Python dependencies are
> >>> completely decoupled from Airflow), we should pin the core Airflow deps,
> >>> and package operators separately with version ranges (for the reasons
> >>> Ash mentioned about libraries vs applications).
> >>>
> >>> On Sat, Nov 16, 2019 at 6:22 AM Jarek Potiuk <jarek.pot...@polidea.com>
> >>> wrote:
> >>>
> >>> > After some discussions and some tests by myself and on the Astronomer
> >>> > Docker image, I think I finally have a complete proposal for a good,
> >>> > consistent solution for our dependency management.
> >>> >
> >>> > It uses well-known standards, does not introduce any new tools (like
> >>> > Poetry) and I think it serves all the use cases we have very well.
> >>> >
> >>> > I wanted to summarise it here before I create it as an official AIP. I
> >>> > will need to finalise the approach to complete the work on the
> >>> > official production image.
> >>> >
> >>> > Here are some basic assumptions:
> >>> >
> >>> > 1) We keep the current setup.py-based process which is used to
> >>> > install Airflow for development: *pip install -e ".[extras]"*. It's
> >>> > nice and modular. It has open dependencies (some of them constrained
> >>> > due to compatibility issues). This will break from time to time on a
> >>> > fresh install when some transitive dependencies change (or like with
> >>> > yesterday's pymssql drama). No changes whatsoever for people who
> >>> > continue installing it this way. The same setup.py is used to install
> >>> > Airflow in production (but see point 7 below for constrained builds).
> >>> >
> >>> > 2) On top of setup.py, we will have a standard *requirements.txt*
> >>> > file as well. This will be a snapshot of the current "working"
> >>> > dependencies (including all transitive dependencies!). This file
> >>> > might be generated automatically by pip freeze (with some scripting)
> >>> > and possibly made part of the pre-commit checks to make sure it is
> >>> > self-maintainable when new dependencies are added. With the current
> >>> > Breeze + pre-commit setup we can do it fully automatically, so that it
> >>> > will happen behind the scenes (and it will be verified in CI). This
> >>> > requirements.txt file will be a set of "KNOWN TO BE GOOD"
> >>> > requirements. It will contain ONLY 'dependency==VERSION' entries -
> >>> > specific versions of each requirement. A nice side-effect of the
> >>> > requirements.txt file is that if you use IDEs like PyCharm or VS Code,
> >>> > they will automatically suggest installing missing/outdated
> >>> > requirements in your virtualenv, so you do not have to remember to do
> >>> > it.
> >>> >
> >>> > 3) By having *requirements.txt* we can add a dependabot
> >>> > <https://dependabot.com/> that will upgrade dependencies in this file
> >>> > automatically (it works in a way that it will create a PR every time
> >>> > one of the dependencies is released). It works nicely with some of our
> >>> > other projects - see for example Oozie-2-Airflow here
> >>> > <https://github.com/GoogleCloudPlatform/oozie-to-airflow/commits/master>.
> >>> > It has the benefit that it prepares super-nice, isolated commits/PRs.
> >>> > Those PRs (example here
> >>> > <https://github.com/GoogleCloudPlatform/oozie-to-airflow/pull/438>)
> >>> > have all the information about changes and commits. They are nice and
> >>> > actionable in terms of checking if we have breaking changes before
> >>> > merging (example below). The nice thing about dependabot is that it
> >>> > will create a PR and our CI will test it automatically. So committers
> >>> > will just have to double-check, take a look at the release notes and
> >>> > will be able to merge it straight away. I will have to check if
> >>> > dependabot can be used for Apache repos though (dependabot is owned by
> >>> > GitHub). It's an optional step and we can do it manually if we cannot
> >>> > use dependabot (but that requires some more complex scripting).
> >>> >
> >>> > 4) We will use the requirements.txt file in CI for PRs so that we
> >>> > know CI always uses a "known to be good" set of requirements. Every PR
> >>> > will have to have a good set of requirements to be merged. No more
> >>> > transitive dependency changes.
> >>> >
> >>> > 5) We have a special CRON job run daily in CI and this job will use
> >>> > *pip install -e ".[all]"* (not using the requirements.txt) - this way
> >>> > we will have a mechanism to detect problems with transitive
> >>> > dependencies early - without breaking PR builds.
> >>> >
> >>> > 6) We can use the requirements.txt file as a constraints file as well
> >>> > (!). For example you should be able to run *pip install -e ".[gcp]"
> >>> > --constraint requirements.txt* and it will install the "known to be
> >>> > good" versions but only for the extras we want.
> >>> >
> >>> > 7) I believe I can customise setup.py to add an extra "*constrained*"
> >>> > extra which will also use the information from the requirements.txt.
> >>> > This way users will be able to do `*pip install
> >>> > "apache-airflow[gcp,aws,constrained]==1.10.6"*` and have a "known to
> >>> > be good" installation of Airflow - even if some transitive
> >>> > dependencies break the "clean" install. I am not 100% sure if it is
> >>> > possible exactly this way, but I think in the worst case it will be
> >>> > `*pip install
> >>> > "apache-airflow[constrained,gcp-constrained,aws-constrained]==1.10.6"*`
> >>> > - we can generate a "-constrained" version of each of the extras
> >>> > automatically based on the requirements.txt. I already had a POC for
> >>> > that. And then it should be possible to upgrade each of the
> >>> > dependencies on its own.
> >>> >
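> >>> > A rough sketch of how the "-constrained" extras could be derived from
> >>> > requirements.txt (POC-style assumption on my side, not the final
> >>> > setup.py code - helper names are illustrative):
> >>> >
> >>> >     # inside setup.py - illustrative sketch only
> >>> >     import re
> >>> >     from pathlib import Path
> >>> >
> >>> >     def load_pins(path="requirements.txt"):
> >>> >         # Map package name -> exact "package==version" pin.
> >>> >         pins = {}
> >>> >         for line in Path(path).read_text().splitlines():
> >>> >             line = line.strip()
> >>> >             if line and not line.startswith("#") and "==" in line:
> >>> >                 pins[line.split("==", 1)[0].lower()] = line
> >>> >         return pins
> >>> >
> >>> >     def constrained(deps, pins):
> >>> >         # Replace an open requirement like 'pandas>=0.17.1, <1.0.0'
> >>> >         # with the exact pinned version from requirements.txt.
> >>> >         result = []
> >>> >         for dep in deps:
> >>> >             name = re.split(r"[<>=~\[;\s]", dep, maxsplit=1)[0].lower()
> >>> >             result.append(pins.get(name, dep))
> >>> >         return result
> >>> >
> >>> >     # extras_require could then carry both variants, e.g.
> >>> >     # 'gcp': gcp, 'gcp-constrained': constrained(gcp, load_pins())
> >>> >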
> >>> > 8) Last but not least - we will use the requirements.txt file to
> >>> > build the production image. One of the requirements for a good
> >>> > official image
> >>> > <https://github.com/docker-library/official-images/blob/master/README.md#repeatability>
> >>> > is repeatability - which boils down to dependency pinning. Currently
> >>> > we have no mechanism to enforce that. With requirements.txt this will
> >>> > be possible.
> >>> >
> >>> > Let me know what you think!
> >>> >
> >>> > J.
> >>> >
> >>> >
> >>> > Appendix: example dependabot-generated commit:
> >>> >
> >>> > Bump pylint from 2.4.3 to 2.4.4
> >>> >
> >>> > Bumps [pylint](https://github.com/PyCQA/pylint) from 2.4.3 to 2.4.4.
> >>> > - [Release notes](https://github.com/PyCQA/pylint/releases)
> >>> > - [Changelog](https://github.com/PyCQA/pylint/blob/master/ChangeLog)
> >>> > - [Commits](https://github.com/PyCQA/pylint/compare/pylint-2.4.3...pylint-2.4.4)
> >>> >
> >>> > On Fri, Aug 2, 2019 at 8:34 AM Felix Uellendall <felue...@pm.me.invalid>
> >>> > wrote:
> >>> >
> >>> > > I understand what Ash and Jarek are saying. I actually use Airflow
> >>> > > with custom plugins to have an end product that fully satisfies our
> >>> > > needs, and when writing new hooks and operators I don't want to see
> >>> > > "airflow uses requirement foo=1 but you have foo=2" - but actually
> >>> > > that sometimes also just works. So why not let the developer decide
> >>> > > if they want to risk breaking it?
> >>> > > I like the idea of first defining groups for core and non-core
> >>> > > dependencies. I think only needing to manage the core ones to always
> >>> > > work will be a lot easier.
> >>> > > So I would suggest that the core ones get ranges including minor
> >>> > > releases and the non-core ones get pinned by the default
> >>> > > installation (and CI).
> >>> > >
> >>> > > I think the more dependencies we add (which comes automatically as
> >>> > > Airflow is growing) the more we will be willing to set more specific
> >>> > > version ranges and maybe even pin them in the end.
> >>> > >
> >>> > > Felix
> >>> > >
> >>> > > Sent from ProtonMail mobile
> >>> > >
> >>> > > -------- Original Message --------
> >>> > > On Aug 2, 2019, 06:10, Jarek Potiuk wrote:
> >>> > >
> >>> > > > Ash is totally right - that's exactly the difficulty we face.
> >>> > > > Airflow is both a library and an end product, and this makes the
> >>> > > > usual advice (pin if you are an end product, don't pin if you are
> >>> > > > a library) not really useful. From the very beginning of my
> >>> > > > adventures with Airflow I was for pinning everything (and using
> >>> > > > dependabot or similar - I use it for other projects), but over
> >>> > > > time I realised that this is a very short-sighted approach. It
> >>> > > > does not take into account the "library" point of view. Depending
> >>> > > > on which kind of Airflow user you are, you have contradicting
> >>> > > > requirements. If you are a user who just wants to use Airflow as
> >>> > > > an "end product", you want pinning. If you want to develop your
> >>> > > > own operators or extend existing ones - you use Airflow as a
> >>> > > > "library" and you do not want pinning.
> >>> > > >
> >>> > > > I also proposed at the beginning of this thread that we split
> >>> > > > core requirements (and pin them) and non-core ones (and don't pin
> >>> > > > them). But it ain't easy to separate those two sets in a clear
> >>> > > > way, unfortunately.
> >>> > > >
> >>> > > > That's why the idea of choosing at installation time (and not at
> >>> > > > build time) whether you want to install "loose" or "frozen"
> >>> > > > dependencies is so appealing.
> >>> > > > Possibly the best solution would be that you 'pip install airflow'
> >>> > > > and get the pinned versions, with some other way to get the loose
> >>> > > > ones. But I think we are a bit at the mercy of pip here - this
> >>> > > > does not seem to be possible.
> >>> > > >
> >>> > > > Then - it looks like using extras to add a "pinning" mechanism is
> >>> > > > the next best idea.
> >>> > > >
> >>> > > > I am not afraid of the complexity. We can fully automate
> >>> > > > generating those pinned requirements. I already have some ideas on
> >>> > > > how we can make sure that we keep those requirements in sync while
> >>> > > > developing and how they can end up frozen in a release. I would
> >>> > > > like to run a POC on that, but in short it is another "by-product"
> >>> > > > of the CI image we have now. Our CI image is the perfect source of
> >>> > > > frozen requirements - we know those requirements in the CI image
> >>> > > > are OK and we can use them to generate the standard
> >>> > > > "requirements.txt" file and keep it updated via some local update
> >>> > > > script (and pre-commit hooks) + we can verify that it is updated
> >>> > > > in CI. We can then write a custom setup.py that will use that
> >>> > > > requirements.txt and the existing "extras" to generate "pinned"
> >>> > > > extras automatically. That sounds fully doable, with very limited
> >>> > > > maintenance effort.
> >>> > > >
> >>> > > > J.
> >>> > > >
> >>> > > > On Thu, Aug 1, 2019 at 10:45 PM Qingping Hou <q...@scribd.com>
> >>> > > > wrote:
> >>> > > >
> >>> > > >> On Thu, Aug 1, 2019 at 1:33 PM Chen Tong <cix...@gmail.com>
> >>> > > >> wrote:
> >>> > > >> > It is sometimes hard to distinguish whether something is a
> >>> > > >> > library or an application. Take an operator as an example: a
> >>> > > >> > non-technical person may think of it as a well-built
> >>> > > >> > application, while an engineer may consider it a library and
> >>> > > >> > extend functionality on top of it.
> >>> > > >> >
> >>> > > >>
> >>> > > >> Yeah, I agree. Personally, I would consider operators to be a
> >>> > > >> library due to the expectation that other people will import
> >>> > > >> them in their own projects/source tree.
> >>> > > >>
> >>> > > >> For things like REST endpoint handlers and perhaps the scheduler,
> >>> > > >> it seems safe to assume all changes and improvements will happen
> >>> > > >> within the Airflow source tree. In that case, it's safe to
> >>> > > >> classify that part of the code as application and freeze all its
> >>> > > >> dependencies. The current plugin system might make this slightly
> >>> > > >> more complicated because people can still extend the core with
> >>> > > >> custom code. But again, in an ideal world, plugins should be
> >>> > > >> self-contained and communicate with the core through a
> >>> > > >> well-defined interface ;)
> >>> > > >>
> >>> > > >
> >>> > > > --
> >>> > > >
> >>> > > > Jarek Potiuk
> >>> > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >>> > > >
> >>> > > > M: +48 660 796 129 <+48660796129>
> >>> > > > [image: Polidea] <https://www.polidea.com/>
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> >
> >>> > Jarek Potiuk
> >>> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >>> >
> >>> > M: +48 660 796 129 <+48660796129>
> >>> > [image: Polidea] <https://www.polidea.com/>
> >>> >
> >>>
> >>
> >>
> >> --
> >>
> >> Jarek Potiuk
> >> Polidea <https://www.polidea.com/> | Principal Software Engineer
> >>
> >> M: +48 660 796 129 <+48660796129>
> >> [image: Polidea] <https://www.polidea.com/>
> >>
> >>
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>
> > [image: Polidea] <https://www.polidea.com/>
> >
> >
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>
