Just to update the plan and provide a summary for those who did not read my
TL;DR mail (and to include probot integration), here is a shorter summary of
my proposal:

   1. Keep the current process of *pip install -e .[<extras>]* unchanged -
   taking into account the version limits from setup.py
   2. Add a *requirements.txt* file with a fixed set of requirements (always
   pinned with *==*, including transitive deps). Those will be *managed by
   probot*, similarly to dependabot.
   3. PRs in CI will use the requirements.txt file as a constraint file
   (known to work) when running tests, to avoid transitive dependency
   problems.
   4. A daily CRON job installs all deps from scratch, without
   *requirements.txt*, to detect problems with the current set of
   dependencies early. An example from last night [1]: the CRON job failed
   and I could prepare the fix [2] while contributors' PRs continued to
   work.
   5. Airflow installation for development will be guaranteed to work with
   *pip install -e .[<extras>] --constraint requirements.txt* from master -
   no matter the state of transitive dependencies.
   6. We will modify setup.py to also have a way to take the constraints, to
   guarantee that installation works. Something like: *pip install
   'apache-airflow[<extras>,constrained]==1.10.6'* (see the sketch below).
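
To make point 6 concrete, here is a minimal sketch of how setup.py could
turn the pins from requirements.txt into a "constrained" extra. The helper
name and the exact wiring are my assumptions, not the final implementation:

    # setup.py (fragment) - hypothetical sketch of a "constrained" extra
    from setuptools import setup


    def read_constrained_deps(path="requirements.txt"):
        """Turn pinned 'package==version' lines into a requirement list."""
        with open(path) as requirements_file:
            return [
                line.strip()
                for line in requirements_file
                if line.strip() and not line.startswith("#")
            ]


    setup(
        name="apache-airflow",
        # ... install_requires and the other arguments stay as today ...
        extras_require={
            "gcp": ["google-cloud-storage>=1.16.0"],  # illustrative only
            # Pin everything to the "known to be good" versions:
            "constrained": read_constrained_deps(),
        },
    )

pip would then resolve the "constrained" extra together with any other
extras, so the pins should behave like a constraint file baked into the
package.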

I'd love to hear more comments and ask for your opinion on this proposal.

[1] Failed CRON job from last night - likely triggered by "from scratch"
dependencies: https://travis-ci.org/apache/airflow/builds/618832805
[2] Fix for last night's CRON job failure:
https://github.com/apache/airflow/pull/6691
[3] AIP-8 - Split Hooks/Operators to separate packages:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303
[4] AIP-21 - Changes in import paths:
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths

*@Dan Davydov* -> Fully agree we should have something like that. I think
this should be part of AIP-8 [3] when we get there. I hope we can learn a
lot about those when we get AIP-21 [4] fully implemented and the backport
packages generated and used by people in preparation for the Airflow 2.0
migration (I hope it will happen soon - we are very close to having the
automation in place to move all the packages from contrib, worked on by
Kamil Breguła). Having such a (probot-managed) set of requirements + a way
to test it automatically is - I think - an important prerequisite to start
designing it (and to work out a reasonable approach to cross-dependency
problems).

J.

On Mon, Nov 18, 2019 at 6:18 PM Dan Davydov <[email protected]>
wrote:

> My 2 cents on the long-term plan:
> Once Airflow has DAG isolation (i.e. DAG Python dependencies are
> completely decoupled from Airflow), we should pin the core Airflow deps,
> and package operators separately with version ranges (for the reasons Ash
> mentioned about libraries vs applications).
>
> On Sat, Nov 16, 2019 at 6:22 AM Jarek Potiuk <[email protected]>
> wrote:
>
> > After some discussions and some tests by myself and the Astronomer
> > Docker team, I think I finally have a complete proposal for a good,
> > consistent solution for our dependency management.
> >
> > It uses well-known standards, does not introduce any new tools (like
> > poetry), and I think it serves all our use cases very well.
> >
> > I wanted to summarise it here before I create it as an official AIP. I
> > will need to finalise the approach to complete the work on the official
> > production image.
> >
> > Here are some basic assumptions:
> >
> > 1) We keep the current process of setup.py which is used to install
> > Airflow for development: *'pip install -e .[extras]'*. It's nice and
> > modular. It has open dependencies (some of them constrained due to
> > compatibility issues). This will break from time to time for fresh
> > installs when some transitive dependencies change (as with yesterday's
> > pymssql drama). No changes whatsoever for people who continue installing
> > it this way. The same setup.py is used to install Airflow in production
> > (but see point 7 below for constrained builds).
> >
> > 2) On top of setup.py, we will have a standard *requirements.txt* file
> > as well. This will be a snapshot of the current "working" dependencies
> > (including all transitive dependencies!). This file can be generated
> > automatically by pip freeze (with some scripting) and possibly hooked
> > into pre-commit checks so that it stays up to date when new dependencies
> > are added. With the current Breeze + pre-commit setup we can do it fully
> > automatically, so that it will happen behind the scenes (and it will be
> > verified in CI). This requirements.txt file will be a set of "KNOWN TO
> > BE GOOD" requirements, containing ONLY 'dependency==VERSION' lines -
> > specific versions of each requirement. A nice side-effect of the
> > requirements.txt file is that if you use IDEs like PyCharm or VSCode,
> > they will automatically suggest installing missing/outdated requirements
> > in your virtualenv, so you do not have to remember about it.
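> >
> > A minimal sketch of the freeze script I have in mind (the file name and
> > the details are my assumptions, not a final design):
> >
> >     # update_requirements.py - hypothetical helper; assumes it runs in
> >     # an environment (e.g. the Breeze/CI image) with all extras installed
> >     import subprocess
> >     import sys
> >
> >
> >     def generate_requirements(output_path="requirements.txt"):
> >         """Snapshot the environment into pinned 'pkg==version' lines."""
> >         frozen = subprocess.run(
> >             [sys.executable, "-m", "pip", "freeze"],
> >             capture_output=True, text=True, check=True,
> >         ).stdout.splitlines()
> >         # Drop the editable airflow install itself; keep only the pins.
> >         pins = sorted(line for line in frozen if not line.startswith("-e "))
> >         with open(output_path, "w") as requirements_file:
> >             requirements_file.write("\n".join(pins) + "\n")
> >
> >
> >     if __name__ == "__main__":
> >         generate_requirements()
> >
> > A pre-commit hook could then run this script and fail if the resulting
> > file differs from the committed one.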
> >
> > 3) By having *requirements.txt* we can add dependabot
> > <https://dependabot.com/>, which will upgrade dependencies in this file
> > automatically (it creates a PR every time a new version of one of the
> > dependencies is released). It works nicely with some of our other
> > projects - see for example Oozie-2-Airflow here
> > <https://github.com/GoogleCloudPlatform/oozie-to-airflow/commits/master>.
> > It has the benefit that it prepares super-nice, isolated commits/PRs.
> > Those PRs (example here
> > <https://github.com/GoogleCloudPlatform/oozie-to-airflow/pull/438>) have
> > all the information about changes and commits. They are nice and
> > actionable in terms of checking for breaking changes before merging
> > (example below). The nice thing about dependabot is that it will create
> > a PR and our CI will test it automatically. So committers will just have
> > to double-check, take a look at the release notes, and merge it straight
> > away. I will have to check whether dependabot can be used for Apache
> > repos though (dependabot is owned by GitHub). It's an optional step, and
> > we can do it manually if we cannot use dependabot (but that requires
> > some more complex scripting).
> >
> > 4) We will use the requirements.txt file in CI for PRs, so that we know
> > CI always uses a "known to be good" set of requirements. Every PR will
> > have to have a good set of requirements to be merged. No more surprise
> > transitive dependency changes.
> >
> > 5) We will have a special CRON job run daily in CI, and this job will
> > use *'pip install -e .[all]'* (not using the requirements.txt) - this
> > way we will have a mechanism to detect problems with transitive
> > dependencies early, without breaking PR builds.
> >
> > 6) We can use the requirements.txt file as a constraint file as well
> > (!). For example, you should be able to run *'pip install -e .[gcp]
> > --constraint requirements.txt'* and it will install the "known to be
> > good" versions, but only for those extras you want.
> >
> > 7) I believe I can customise setup.py to add an extra "*constrained*"
> > extra which will also use the information from requirements.txt. This
> > way users will be able to do `*pip install
> > apache-airflow[gcp,aws,constrained]==1.10.6*` and have a "known to be
> > good" installation of Airflow - even if some transitive dependencies
> > break the "clean" install. I am not 100% sure it is possible exactly
> > this way, but I think in the worst case it will be `*pip install
> > apache-airflow[constrained,gcp-constrained,aws-constrained]==1.10.6*` -
> > we can generate a "-constrained" version of each of the extras
> > automatically based on requirements.txt (a sketch follows below). I
> > already had a POC for that. And then it should be possible to upgrade
> > each of the dependencies on its own.
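> >
> > A sketch of how the "-constrained" extras could be generated (the helper
> > names are hypothetical, following the direction of my POC rather than
> > final code):
> >
> >     # Sketch: derive "<extra>-constrained" twins from requirements.txt.
> >     from packaging.requirements import Requirement
> >
> >
> >     def load_pins(path="requirements.txt"):
> >         """Map package name -> pinned version from requirements.txt."""
> >         pins = {}
> >         with open(path) as requirements_file:
> >             for line in requirements_file:
> >                 line = line.strip()
> >                 if line and not line.startswith("#"):
> >                     name, _, version = line.partition("==")
> >                     pins[name.lower()] = version
> >         return pins
> >
> >
> >     def add_constrained_extras(extras, pins):
> >         """For every extra, add an '<extra>-constrained' pinned twin."""
> >         for extra, deps in list(extras.items()):
> >             constrained = []
> >             for dep in deps:
> >                 name = Requirement(dep).name.lower()
> >                 if name in pins:
> >                     constrained.append("{}=={}".format(name, pins[name]))
> >             extras[extra + "-constrained"] = constrained
> >
> > Called from setup.py, this would make "gcp-constrained" and friends
> > available without manual maintenance of the pinned lists.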
> >
> > 8) Last but not least - we will use the requirements.txt file to build
> > the production image. One of the requirements for a good official image
> > <https://github.com/docker-library/official-images/blob/master/README.md#repeatability>
> > is repeatability - which boils down to dependency pinning. Currently we
> > have no mechanism to enforce that. With requirements.txt this will be
> > possible.
> >
> > Let me know what you think!
> >
> > J.
> >
> >
> > Appendix: example dependabot-generated commit:
> >
> > Bump pylint from 2.4.3 to 2.4.4
> >
> > Bumps [pylint](https://github.com/PyCQA/pylint) from 2.4.3 to 2.4.4.
> > - [Release notes](https://github.com/PyCQA/pylint/releases)
> > - [Changelog](https://github.com/PyCQA/pylint/blob/master/ChangeLog)
> > - [Commits](https://github.com/PyCQA/pylint/compare/pylint-2.4.3...pylint-2.4.4)
> >
> > On Fri, Aug 2, 2019 at 8:34 AM Felix Uellendall <[email protected]>
> > wrote:
> >
> > > I understand what Ash and Jarek are saying. I actually use Airflow
> > > with custom plugins to have an end-product that fully satisfies our
> > > needs, and when writing new hooks and operators I don't want to see
> > > "airflow uses requirement foo=1 but you have foo=2" - but actually
> > > that sometimes also just works. So why not let the developer decide if
> > > they want to risk breaking it?
> > > I like the idea of first defining groups for core and non-core
> > > dependencies. I think only needing to manage the core ones to always
> > > work will be a lot easier.
> > > So I would suggest that core ones have ranges including minor
> > > releases, and non-core ones get pinned by the default installation
> > > (and CI).
> > >
> > > I think the more dependencies we add (which happens automatically as
> > > Airflow grows), the more we will be willing to set more specific
> > > version ranges and maybe even pin them in the end.
> > >
> > > Felix
> > >
> > > Sent from ProtonMail mobile
> > >
> > > -------- Original Message --------
> > > On Aug 2, 2019, 06:10, Jarek Potiuk wrote:
> > >
> > > > Ash is totally right - that's exactly the difficulty we face.
> > > > Airflow is both a library and an end product, and this makes the
> > > > usual advice (pin if you are an end-product, don't pin if you are a
> > > > library) not really useful. From the very beginning of my adventures
> > > > with Airflow I was for pinning everything (and using dependabot or
> > > > similar - I use it for other projects), but over time I realised
> > > > that this is a very short-sighted approach. It does not take into
> > > > account the "library" point of view. Depending on which kind of
> > > > Airflow user you are, you have contradicting requirements. If you
> > > > are a user of Airflow who just wants to use it as an "end product",
> > > > you want pinning. If you want to develop your own operators or
> > > > extend existing ones, you use Airflow as a "library" and you do not
> > > > want pinning.
> > > >
> > > > I also proposed at the beginning of this thread that we split core
> > > > requirements (and pin them) from non-core ones (and don't pin
> > > > those). But it ain't easy to separate those two sets in a clear way,
> > > > unfortunately.
> > > >
> > > > That's why the idea of choosing at installation time (and not at
> > > > build time) whether you want to install "loose" or "frozen"
> > > > dependencies is so appealing.
> > > > Possibly the best solution would be that you 'pip install airflow'
> > > > and get the pinned versions, with some other way to get the loose
> > > > ones. But I think we are a bit at the mercy of pip - this does not
> > > > seem to be possible.
> > > >
> > > > Then it looks like using extras to add a "pinning" mechanism is the
> > > > next best idea.
> > > >
> > > > I am not afraid of the complexity. We can fully automate generating
> > > > those pinned requirements. I already have some ideas how we can make
> > > > sure that we keep those requirements in sync while developing, and
> > > > how they can end up frozen in a release. I would like to run a POC
> > > > on that, but in short it is another "by-product" of the CI image we
> > > > have now. Our CI image is the perfect source of frozen requirements
> > > > - we know the requirements in the CI image are OK, and we can use
> > > > them to generate the standard "requirements.txt" file and keep it
> > > > updated via some local update script (and pre-commit hooks) + we can
> > > > verify that they are updated in CI. We can then write a custom
> > > > setup.py that will use that requirements.txt and the existing
> > > > "extras" and generate "pinned" extras automatically. That sounds
> > > > fully doable and with very limited maintenance effort.
> > > >
> > > > J.
> > > >
> > > > On Thu, Aug 1, 2019 at 10:45 PM Qingping Hou <[email protected]>
> > > > wrote:
> > > >
> > > >> On Thu, Aug 1, 2019 at 1:33 PM Chen Tong <[email protected]>
> > > >> wrote:
> > > >> > It is sometimes hard to distinguish whether something is a
> > > >> > library or an application. Take operators as an example: a
> > > >> > non-technical person may think of them as a well-built
> > > >> > application, while an engineer may consider them a library and
> > > >> > extend functionality on top of them.
> > > >> >
> > > >>
> > > >> Yeah, I agree. Personally, I would consider operators to be a
> > > >> library, due to the expectation that other people will import them
> > > >> into their own projects/source tree.
> > > >>
> > > >> For things like REST endpoint handlers and perhaps the scheduler,
> > > >> it seems safe to assume all changes and improvements will happen
> > > >> within the Airflow source tree. In that case, it's safe to classify
> > > >> that part of the code as application and freeze all its
> > > >> dependencies. The current plugin system might make this slightly
> > > >> complicated because people can still extend the core with custom
> > > >> code. But again, in an ideal world, plugins should be
> > > >> self-contained and communicate with the core through a well-defined
> > > >> interface ;)
> > > >>
> > > >
> > > > --
> > > >
> > > > Jarek Potiuk
> > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > >
> > > > M: +48 660 796 129 <+48660796129>
> > > > [image: Polidea] <https://www.polidea.com/>
> >
> >
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>
> > [image: Polidea] <https://www.polidea.com/>
> >
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>
