The problem with pinning everything is that it makes installing Airflow along 
with other Python modules more fraught.

The usual advice (at least for other languages, I don't know about Python) is 
that end applications should pin their deps exactly, but that libraries should 
be forgiving, so that they are easier to use alongside other things, and for 
instance so that a site operator can install a security fix to a module without 
us having to make a patch release.

And Airflow is both an application and a library.

Not to mention that 100% pinning all of our transitive deps is going to 
introduce version hell for anyone wanting to install something we haven't 
thought of.
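
As a rough illustration of the point above, here is a minimal sketch of how an exact pin blocks a security fix that a version range would accept. It is a toy checker, not pip's resolver and not Airflow's actual setup.py; the dependency (requests), all version numbers, and the helper names (parse, allows, compatible) are hypothetical and chosen only for the example.

```python
# Illustrative only: hypothetical versions and a toy checker, not pip's resolver.


def parse(version):
    """Turn '2.22' or '2.22.0' into (2, 22, 0) so versions compare numerically."""
    parts = [int(part) for part in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def allows(spec, version):
    """Check a version against a comma-separated spec such as '>=2.20,<3'.

    Only the ==, >= and < operators are handled, which is enough for this sketch.
    """
    candidate = parse(version)
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith("=="):
            ok = candidate == parse(clause[2:])
        elif clause.startswith(">="):
            ok = candidate >= parse(clause[2:])
        elif clause.startswith("<"):
            ok = candidate < parse(clause[1:])
        else:
            raise ValueError("unsupported clause: " + clause)
        if not ok:
            return False
    return True


def compatible(specs, version):
    """A version is installable only if every requirement accepts it."""
    return all(allows(spec, version) for spec in specs)


# Hypothetical scenario: a security fix for 'requests' ships as 2.22.1,
# and another package on the same host requires at least that version.
security_fix = "2.22.1"
other_tool = ">=2.22.1"

airflow_pinned = "==2.22.0"   # application-style exact pin
airflow_ranged = ">=2.20,<3"  # library-style range

pinned_ok = compatible([airflow_pinned, other_tool], security_fix)
ranged_ok = compatible([airflow_ranged, other_tool], security_fix)
print(pinned_ok, ranged_ok)
```

With the exact pin, the fixed version cannot be installed next to Airflow until a new Airflow release moves the pin; with the range, the operator can upgrade without waiting for us.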

-a

> On 1 Aug 2019, at 18:24, Qingping Hou <q...@scribd.com> wrote:
> 
> Is there any reason why we don't just pin all dependencies to the exact 
> version?
> 
> I can see the benefit of the current relaxed dependency requirements,
> which is to avoid having to maintain and frequently update frozen
> dependencies. If we are already going down the route of maintaining a
> separate set of frozen dependency requirements, then we might as well just
> use the frozen dependency tree for everything :)
> 
> I personally recommend going with frozen dependencies for production
> Python services. We will get far fewer surprises at build time (and
> sometimes even at runtime). I wish there were better support for
> automatic frozen dependency (essentially lock file) updates from the
> official Python packaging system.
> 
> --QP
> 
> On Thu, Aug 1, 2019 at 10:05 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>> 
>> Hello Everyone,
>> 
>> Just to revive the thread - we had a discussion with Ash today after
>> today's small "spanner" drama, and we came up with a possible solution.
>> 
>> This is something we have yet to try, but it seems that it should be
>> possible to generate additional "pinned" extras (pinned, gcp_api-pinned
>> etc.) - it could also be "frozen" instead of "pinned" if the name sounds
>> better.
>> 
>> This way you would be able to run:
>> 
>>   - `pip install airflow[all-pinned]==1.10.4`
>>   - `pip install airflow[gcp_api-pinned]==1.10.4`
>>   - ...
>> 
>> This way it will always work, no matter whether new versions of dependencies
>> are released. It will install the "frozen" versions of dependencies that we
>> know work for sure. We could update the documentation to add this as the
>> recommended method of standalone installation. Then if you need some other
>> (newer) set of dependencies, you could run a custom pip install to fix
>> certain dependencies.
>> 
>> What do you think? Would that work for the users of Airflow?
>> 
>> J.
>> 
>> On Tue, Jul 9, 2019 at 9:06 PM Driesprong, Fokko <fo...@driesprong.frl>
>> wrote:
>> 
>>> Hi Jarek,
>>> 
>>> Thanks for bringing this up. I certainly think this is a good idea.
>>> Unfortunately I'm on a plane right now, so I'm unable to read the Google
>>> doc.
>>> 
>>> GitHub recently acquired Dependabot, which even supports automatic updates
>>> of dependencies. Then we at least know when something breaks. The only
>>> problem right now is that this bot isn't allowed by the ASF policies since
>>> it requires write access to the repository.
>>> 
>>> Regarding semver: I do often see packages changing the public API in a
>>> minor update without any notice of deprecation. In this case it is
>>> impossible to make this watertight, but at least a more structured process
>>> using something like Dependabot would be a big plus!
>>> 
>>> Cheers, Fokko
>>> 
>>> 
>>> 
>>> On Sun, Jul 7, 2019 at 11:34, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>>> 
>>>> All for a deeper release-cycle discussion. I think after 1.10.4 is out we
>>>> should discuss, agree on, and document the release scheme we are going to
>>>> use. Semver and patching seem like a good idea.
>>>> 
>>>> We already have quite a bit of experience in backporting to the 1.10.x
>>>> branch and it was surprisingly easy - small, focused commits help with
>>>> that. And if we limit patches to dependency updates and security fixes
>>>> only, I don't think it will be a lot of effort.
>>>> 
>>>> Bots and automation are definitely something we should do. The pyup bot is
>>>> great - for one - to automate upgrades of pinned dependencies. We have used
>>>> it in Oozie-to-airflow for quite some time and it takes almost no time to
>>>> upgrade deps regularly:
>>>> 
>>>> https://github.com/GoogleCloudPlatform/oozie-to-airflow/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aclosed+pyup
>>>> - those are automated PRs we got from pyup and it was just enough to do
>>>> "approve" + "merge" after we saw that all the tests passed with the new
>>>> version.
>>>> 
>>>> J.
>>>> 
>>>> 
>>>> 
>>>> On Sat, Jul 6, 2019 at 9:24 PM Philippe Gagnon <philgagn...@gmail.com>
>>>> wrote:
>>>> 
>>>>> I am +1 on pinning core packages, even though this adds a bit of manual
>>>>> labor for maintenance. This latest werkzeug issue highlights why this is
>>>>> a good idea.
>>>>> 
>>>>> Also +1 on changing the versioning scheme to something more akin to
>>>>> semver. The current scheme basically does not support patch-only releases
>>>>> and a 4-part version notation seems a bit much. Overall, I think that
>>>>> patch-only releases would make the project healthier.
>>>>> 
>>>>> Two points though:
>>>>> 
>>>>> 1. I think that there should be a more in-depth discussion about
>>>>> clarifying the release lifecycle policy.
>>>>> 2. This implies a lot more backport-related work, which is a bit of a
>>>>> burden since it is both tedious and boring. Perhaps we could look into
>>>>> having a bot help out with this (similar to
>>>>> https://github.com/miss-islington)?
>>>>> 
>>>>> On Sat, Jul 6, 2019 at 1:04 PM Jarek Potiuk <jarek.pot...@polidea.com>
>>>>> wrote:
>>>>> 
>>>>>> I think the recent case with werkzeug calls for action here (also see
>>>>>> https://issues.apache.org/jira/browse/AIRFLOW-4903 ). We again ended up
>>>>>> with a released Airflow version that cannot be installed easily because
>>>>>> of an upgrade of some transitive dependencies.
>>>>>> 
>>>>>> I think this is something we should at least consider for the 2.*
>>>>>> version.
>>>>>> 
>>>>>> The problem is with simply running 'pip install airflow==1.10.3'. Right
>>>>>> now this will not work - you have to hack it and manually upgrade deps
>>>>>> (like https://github.com/godatadriven/whirl/issues/50).
>>>>>> 
>>>>>> I really do not like that changes beyond our control impact a release
>>>>>> we have already made (and which is out there in pip).
>>>>>> 
>>>>>> I recently read the nice writeup
>>>>>> https://docs.google.com/document/d/1x_VrNtXCup75qA3glDd2fQOB2TakldwjKZ6pXaAjAfg/edit
>>>>>> about Python dependency problems and I think the only solution is to pin
>>>>>> the "core" packages. This likely means that we have to be ready to
>>>>>> release sub-releases with security dependencies updated (like 1.10.4.1
>>>>>> maybe), or change semantics a bit to be closer to semver and start
>>>>>> releasing 2.0.0 - 2.1.0 and then release security updates as 2.0.1 etc.
>>>>>> If those 2.0.1 etc. are released only because of dependency
>>>>>> updates/security bugfixes and some critical problems, and if we automate
>>>>>> it, I don't think it would be a great problem to release those
>>>>>> security-patched versions. We can have services like pyup
>>>>>> (https://pyup.io/) or even GitHub itself monitor dependencies for us and
>>>>>> create PRs automatically to update them.
>>>>>> 
>>>>>> Would someone actually complain if any of the "core" packages
>>>>>> (install_requires + devel) below got pinned? I am not sure that would
>>>>>> be a big problem for anyone, and even if you need (in your operator)
>>>>>> some newer version, you can always upgrade it afterwards and ignore the
>>>>>> fact that Airflow has it pinned.
>>>>>> 
>>>>>> Here are the dependencies that are the "core" ones:
>>>>>> 
>>>>>> install_requires:
>>>>>> 
>>>>>>   -             'alembic',
>>>>>>   -             'cached_property',
>>>>>>   -             'configparser',
>>>>>>   -             'croniter',
>>>>>>   -             'dill',
>>>>>>   -             'dumb-init',
>>>>>>   -             'flask',
>>>>>>   -             'flask-appbuilder',
>>>>>>   -             'flask-caching',
>>>>>>   -             'flask-login',
>>>>>>   -             'flask-swagger',
>>>>>>   -             'flask-wtf',
>>>>>>   -             'funcsigs',
>>>>>>   -             'gitpython',
>>>>>>   -             'gunicorn',
>>>>>>   -             'iso8601',
>>>>>>   -             'json-merge-patch',
>>>>>>   -             'jinja2',
>>>>>>   -             'lazy_object_proxy',
>>>>>>   -             'markdown',
>>>>>>   -             'pendulum',
>>>>>>   -             'psutil',
>>>>>>   -             'pygments',
>>>>>>   -             'python-daemon',
>>>>>>   -             'python-dateutil',
>>>>>>   -             'requests',
>>>>>>   -             'setproctitle',
>>>>>>   -             'sqlalchemy',
>>>>>>   -             'tabulate',
>>>>>>   -             'tenacity',
>>>>>>   -             'text-unidecode',
>>>>>>   -             'thrift',
>>>>>>   -             'tzlocal',
>>>>>>   -             'unicodecsv',
>>>>>>   -             'zope.deprecation',
>>>>>> 
>>>>>> Devel:
>>>>>> 
>>>>>>   -     'beautifulsoup4',
>>>>>>   -     'click',
>>>>>>   -     'codecov',
>>>>>>   -     'flake8',
>>>>>>   -     'freezegun',
>>>>>>   -     'ipdb',
>>>>>>   -     'jira',
>>>>>>   -     'mongomock',
>>>>>>   -     'moto',
>>>>>>   -     'nose',
>>>>>>   -     'nose-ignore-docstring',
>>>>>>   -     'nose-timer',
>>>>>>   -     'parameterized',
>>>>>>   -     'paramiko',
>>>>>>   -     'pylint',
>>>>>>   -     'pysftp',
>>>>>>   -     'pywinrm',
>>>>>>   -     'qds-sdk', -> should be moved to separate qubole
>>>>>>   -     'rednose',
>>>>>>   -     'requests_mock',
>>>>>> 
>>>>>> J.
>>>>>> 
>>>>>> 
>>>>>> On Mon, Jun 24, 2019 at 3:03 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>>>>>> 
>>>>>>> Another suggestion someone (I forget who, sorry) had was that we could
>>>>>>> maintain a full list of _fully tested and supported versions_ (i.e. the
>>>>>>> output of `pip freeze`) - that way people _can_ use other versions if
>>>>>>> they want, but we can at least say "use these versions".
>>>>>>> 
>>>>>>> I'm not 100% sure how that would work in practice though, but having it
>>>>>>> be some list we can update without having to do a release is crucial.
>>>>>>> 
>>>>>>> -ash
>>>>>>> 
>>>>>>>> On 24 Jun 2019, at 10:00, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>>>>>>>> 
>>>>>>>> With the recent Sphinx problem
>>>>>>>> <https://issues.apache.org/jira/browse/AIRFLOW-4841> we got back our
>>>>>>>> old-time enemy. In this case version 1.1.0 of sphinx autoapi was
>>>>>>>> released yesterday and it started causing our master to fail,
>>>>>>>> triggering a kind of emergency rush to fix it, as master (and all PRs
>>>>>>>> based on it) would be broken.
>>>>>>>> 
>>>>>>>> I think I have a proposal that can address similar problems without
>>>>>>>> pushing us into emergency mode.
>>>>>>>> 
>>>>>>>> *Context:*
>>>>>>>> 
>>>>>>>> I wanted to return to an old discussion - how we can keep unrelated
>>>>>>>> dependencies from causing emergencies on our side, where we have to
>>>>>>>> quickly solve such dependency issues when they break our builds.
>>>>>>>> 
>>>>>>>> *Change coming soon:*
>>>>>>>> 
>>>>>>>> The problems will be partially addressed with the last stage of AIP-10
>>>>>>>> (https://github.com/apache/airflow/pull/4938 - pending only the
>>>>>>>> Kubernetes test fix). It effectively freezes installed dependencies as
>>>>>>>> a cached layer of the docker image for builds which do not touch
>>>>>>>> setup.py - so if setup.py does not change, the dependencies will not
>>>>>>>> be updated to the latest ones.
>>>>>>>> 
>>>>>>>> *Possibly even better long-term solution:*
>>>>>>>> 
>>>>>>>> I think we should address it a bit better. We had a number of
>>>>>>>> discussions on pinning dependencies (for example here
>>>>>>>> <https://lists.apache.org/thread.html/9e775d11cce6a3473cbe31908a17d7840072125be2dff020ff59a441@%3Cdev.airflow.apache.org%3E>).
>>>>>>>> I think the conclusion there was that Airflow is both a "library" (for
>>>>>>>> DAGs), where dependencies should not be pinned, and an end product
>>>>>>>> (where the dependencies should be pinned). So it's a bit of a catch-22
>>>>>>>> situation.
>>>>>>>> 
>>>>>>>> Looking at the problem with Sphinx, however, it came to me that maybe
>>>>>>>> we can use a hybrid solution. We pin all the libraries (like Sphinx or
>>>>>>>> Flask) that are used merely to build and test the end product, but we
>>>>>>>> do not pin the libraries (like google-api) which are used in the
>>>>>>>> context of a library (writing the operators and DAGs).
>>>>>>>> 
>>>>>>>> What do you think? Maybe that will be the best of both worlds? Then we
>>>>>>>> would have to classify the dependencies and maybe restructure setup.py
>>>>>>>> slightly to have an obvious distinction between those two types of
>>>>>>>> dependencies.
>>>>>>>> 
>>>>>>>> J.
>>>>>>>> 
>>>>>>>> --
>>>>>>>> 
>>>>>>>> Jarek Potiuk
>>>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>>>>>>> 
>>>>>>>> M: +48 660 796 129 <+48660796129>
>>>>>>>> [image: Polidea] <https://www.polidea.com/>
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
