Re: [DISCUSS] Back to (some) dependency pinning

Jarek Potiuk Sat, 06 Jul 2019 10:04:35 -0700

I think the recent case with werkzeug calls for action here (also see
https://issues.apache.org/jira/browse/AIRFLOW-4903 ). We again ended up
with a released Airflow version that cannot be installed easily because of
a transitive dependency upgrade.


I think this is something we should at least consider for the 2.* version.

The problem is that simply running 'pip install airflow==1.10.3' should
just work. Right now it does not - you have to hack it and manually upgrade
deps (like https://github.com/godatadriven/whirl/issues/50).

I really do not like that changes beyond our control impact a release we
have already made (and that is out there in pip).

I recently read a nice writeup about Python dependency problems
(https://docs.google.com/document/d/1x_VrNtXCup75qA3glDd2fQOB2TakldwjKZ6pXaAjAfg/edit)
and I think the only solution is to pin the "core" packages. This likely
means that we have to be ready to release sub-releases with updated
dependencies for security fixes (like 1.10.4.1 maybe), or change the
semantics a bit towards semver: start releasing 2.0.0, 2.1.0 and then
release security updates as 2.0.1 etc. If those 2.0.1 etc. are released
only because of dependency updates/security bugfixes and some critical
problems, and if we automate it, I don't think it would be a great
problem to release those security-patched versions. We can have services
like pyup (https://pyup.io/) or even GitHub itself monitor dependencies for
us and create PRs automatically to update them.

Would someone actually complain if any of the "core" packages
(install_requires + devel) below got pinned? I am not sure that would
be a big problem for anyone, and even if you need some newer version
(in your operator), you can always upgrade it afterwards and ignore the
fact that airflow has it pinned.
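
If users do upgrade a pinned package afterwards, it helps to be able to see
which pins were overridden. Below is a minimal sketch of such a check. The
function names and the pin values are hypothetical and only for
illustration; nothing like this exists in Airflow today. It relies only on
importlib.metadata from the standard library (Python 3.8+).

```python
from importlib import metadata


def installed_versions(names):
    """Return {name: installed version or None} for the given packages."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in the current environment.
            found[name] = None
    return found


def find_drift(pins, installed):
    """Return {name: (pinned, installed)} for packages that differ from the pin."""
    return {
        name: (pinned, installed.get(name))
        for name, pinned in pins.items()
        if installed.get(name) != pinned
    }


# Placeholder pins, NOT tested or recommended versions.
EXAMPLE_PINS = {"flask": "1.0.0", "jinja2": "2.0.0"}

if __name__ == "__main__":
    drift = find_drift(EXAMPLE_PINS, installed_versions(EXAMPLE_PINS))
    for name, (pinned, actual) in sorted(drift.items()):
        print(f"{name}: pinned {pinned}, installed {actual}")
```

A report like this would only inform; it would not stop anyone from running
with a newer version than the pinned one.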

Here are the dependencies that are the "core" ones:

install_requires:

   -             'alembic',
   -             'cached_property',
   -             'configparser',
   -             'croniter',
   -             'dill',
   -             'dumb-ini',
   -             'flask',
   -             'flask-appbuilder',
   -             'flask-caching',
   -             'flask-login',
   -             'flask-swagger',
   -             'flask-wtf',
   -             'funcsigs',
   -             'gitpython',
   -             'gunicorn',
   -             'iso8601',
   -             'json-merge-patch',
   -             'jinja2',
   -             'lazy_object_proxy',
   -             'markdown',
   -             'pendulum',
   -             'psutil',
   -             'pygments',
   -             'python-daemon',
   -             'python-dateutil',
   -             'requests',
   -             'setproctitle',
   -             'sqlalchemy',
   -             'tabulate',
   -             'tenacity',
   -             'text-unidecode',
   -             'thrift',
   -             'tzlocal',
   -             'unicodecsv',
   -             'zope.deprecation',

Devel:

   -     'beautifulsoup4',
   -     'click',
   -     'codecov',
   -     'flake8',
   -     'freezegun',
   -     'ipdb',
   -     'jira',
   -     'mongomock',
   -     'moto',
   -     'nose',
   -     'nose-ignore-docstring',
   -     'nose-timer',
   -     'parameterized',
   -     'paramiko',
   -     'pylint',
   -     'pysftp',
   -     'pywinrm',
   -     'qds-sdk', -> should be moved to separate qubole
   -     'rednose',
   -     'requests_mock',
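
To make the split concrete, here is a minimal sketch of how setup.py could
keep "core" packages pinned while leaving library-side extras open. The
variable names, the "gcp" extra and all version numbers are placeholders
chosen for illustration, not tested or proposed pins.

```python
# Hypothetical sketch: the versions below are placeholders, not real pins.
CORE_PINS = {
    "alembic": "1.0.0",
    "flask": "1.0.0",
    "sqlalchemy": "1.0.0",
}

# Packages used in the "library" context (operators, DAGs) stay unpinned.
# The extra name and its content are only an example.
EXTRAS = {
    "gcp": ["google-api-python-client"],
}


def pinned_requirements(pins):
    """Render {'name': 'version'} as sorted 'name==version' strings."""
    return [f"{name}=={version}" for name, version in sorted(pins.items())]


# These lists would then be passed to setuptools.setup() as
# install_requires=... and extras_require=... respectively.
install_requires = pinned_requirements(CORE_PINS)
extras_require = EXTRAS

if __name__ == "__main__":
    print(install_requires)
```

Keeping the pins in one mapping would also give tools that open automated
update PRs a single place to change.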

J.


On Mon, Jun 24, 2019 at 3:03 PM Ash Berlin-Taylor <[email protected]> wrote:

> Another suggestion someone (I forget who, sorry) had was that we could
> maintain a full list of _fully tested and supported versions_ (i.e. the
> output of `pip freeze`) - that way people _can_ use other versions if they
> want, but we can at least say "use these versions".
>
> I'm not 100% sure how that would work in practice though, but having it be
> some list we can update without having to do a release is crucial.
>
> -ash
>
> > On 24 Jun 2019, at 10:00, Jarek Potiuk <[email protected]> wrote:
> >
> > With the recent Sphinx problem
> > <https://issues.apache.org/jira/browse/AIRFLOW-4841> we got back our
> > old-time enemy. In this case sphinx autoapi version 1.1.0 was released
> > yesterday and it started to cause our master to fail, causing a kind of
> > emergency rush to fix it, as master (and all PRs based on it) would be
> > broken.
> >
> > I think I have a proposal that can address similar problems without
> > pushing us into emergency mode.
> >
> > *Context:*
> >
> > I wanted to return to an old discussion - how we can prevent unrelated
> > dependencies from causing emergencies on our side, where we have to
> > quickly solve such dependency issues when they break our builds.
> >
> > *Change coming soon:*
> >
> > The problems will be partially addressed with the last stage of AIP-10 (
> > https://github.com/apache/airflow/pull/4938 - pending only a Kubernetes
> > test fix). It effectively freezes installed dependencies as a cached
> > layer of the docker image for builds which do not touch setup.py - so in
> > case setup.py does not change, the dependencies will not be updated to
> > the latest ones.
> >
> > *Possibly even better long-term solution:*
> >
> > I think we should address it a bit better. We had a number of discussions
> > on pinning dependencies (for example here
> > <https://lists.apache.org/thread.html/9e775d11cce6a3473cbe31908a17d7840072125be2dff020ff59a441@%3Cdev.airflow.apache.org%3E>).
> > I think the conclusion there was that Airflow is both a "library" (for
> > DAGs), where dependencies should not be pinned, and an end product (where
> > the dependencies should be pinned). So it's a bit of a catch-22 situation.
> >
> > Looking at the problem with Sphinx, however, it came to me that maybe we
> > can use a hybrid solution. We pin all the libraries (like Sphinx or Flask)
> > that are used merely to build and test the end product, but we do not pin
> > the libraries (like google-api) which are used in the context of the
> > library (writing the operators and DAGs).
> >
> > What do you think? Maybe that will be the best of both worlds? Then we
> > would have to classify the dependencies and maybe restructure setup.py
> > slightly to have an obvious distinction between those two types of
> > dependencies.
> >
> > J.
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>
>
>

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
