Hey Jarek,

sounds good, but I would probably go with pinning everything by default and having a "Dependency Bot" test new releases of packages. But given the large amount of computing (= costs) our CI pipeline already uses, we cannot set up a Dependency Bot at the moment, right? :/
So for a start I think something like `all-pinned` works :)

Kind regards,
Felix

On 01/08/2019 at 19:05, Jarek Potiuk wrote:
> Hello Everyone,
>
> Just to revive the thread - we had a discussion with Ash today after
> today's small "spanner" drama, and we came up with a possible solution.
>
> This is something we have yet to try, but it seems that it should be
> possible to generate additional "pinned" extras (pinned, gcp_api-pinned
> etc.) - it could also be "frozen" instead of "pinned" if the name sounds
> better.
>
> This way you would be able to run:
>
>    - `pip install 'airflow[all-pinned]==1.10.4'`
>    - `pip install 'airflow[gcp_api-pinned]==1.10.4'`
>    - ...
>
> This way it will always work, no matter whether new dependencies are
> released. It will install the "frozen" versions of dependencies that we
> know work for sure. We could update the documentation to add this as the
> recommended method of standalone installation. Then if you need some other
> (newer) set of dependencies, you could use a custom pip install to fix
> certain dependencies.
>
> What do you think? Would that work for the users of Airflow?
>
> J.
>
> On Tue, Jul 9, 2019 at 9:06 PM Driesprong, Fokko <fo...@driesprong.frl>
> wrote:
>
>> Hi Jarek,
>>
>> Thanks for bringing this up. I certainly think this is a good idea.
>> Unfortunately I'm on a plane right now, so I'm unable to read the Google
>> doc.
>>
>> GitHub recently acquired Dependabot, which even supports automatic updates
>> of dependencies. Then we at least know when something breaks. The only
>> problem right now is that this bot isn't allowed by the ASF policies since
>> it requires write access to the repository.
>>
>> Regarding semver: I often see packages changing the public API in a
>> minor update without any notice of deprecation. In this case it is
>> impossible to make this watertight, but at least a more structured process
>> using something like Dependabot would be a big plus!
>>
>> Cheers, Fokko
>>
>> On Sun, Jul 7, 2019 at 11:34, Jarek Potiuk <jarek.pot...@polidea.com>
>> wrote:
>>
>>> All for a deeper release-cycle discussion. I think after 1.10.4 is out we
>>> should discuss, agree on, and document the release scheme we are going to
>>> use. Semver and patching seem like a good idea.
>>>
>>> We already have quite some experience in backporting to the 1.10.x branch,
>>> and it was surprisingly easy - small, focused commits help with that. And
>>> if we limit patches to dependency updates and security fixes only, I don't
>>> think it will be a lot of effort.
>>>
>>> Bots and automation are definitely something we should do. The pyup bot
>>> is great - for one - to automate upgrades of pinned dependencies. We have
>>> used it in Oozie-to-airflow for quite some time and it takes almost no
>>> time to upgrade deps regularly:
>>>
>>> https://github.com/GoogleCloudPlatform/oozie-to-airflow/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aclosed+pyup
>>> - those are automated PRs we got from pyup, and it was enough to do
>>> "approve" + "merge" after we saw that all the tests passed with the new
>>> version.
>>>
>>> J.
>>>
>>> On Sat, Jul 6, 2019 at 9:24 PM Philippe Gagnon <philgagn...@gmail.com>
>>> wrote:
>>>
>>>> I am +1 on pinning core packages, even though this adds a bit of manual
>>>> labor for maintenance. This latest werkzeug issue highlights why this is
>>>> a good idea.
>>>>
>>>> Also +1 on changing the versioning scheme to something more akin to
>>>> semver. The current scheme basically does not support patch-only releases
>>>> and a 4-part version notation seems a bit much. Overall, I think that
>>>> patch-only releases would make the project healthier.
>>>>
>>>> Two points though:
>>>>
>>>> 1. I think that there should be a more in-depth discussion about
>>>>    clarifying the release lifecycle policy.
>>>> 2. This implies a lot more backport-related work, which is a bit of a
>>>>    burden since it is both tedious and boring. Perhaps we could look into
>>>>    having a bot help out with this (similar to
>>>>    https://github.com/miss-islington)?
>>>>
>>>> On Sat, Jul 6, 2019 at 1:04 PM Jarek Potiuk <jarek.pot...@polidea.com>
>>>> wrote:
>>>>
>>>>> I think the recent case with werkzeug calls for action here (also see
>>>>> https://issues.apache.org/jira/browse/AIRFLOW-4903 ). We again ended up
>>>>> with a released Airflow version that cannot be installed easily because
>>>>> of some transitive dependency upgrade.
>>>>>
>>>>> I think this is something we should at least consider for the 2.*
>>>>> version. The problem is with simply running 'pip install airflow==1.10.3'.
>>>>> Right now this will not work - you have to hack it and manually upgrade
>>>>> deps (like https://github.com/godatadriven/whirl/issues/50).
>>>>>
>>>>> I really do not like that changes beyond our control impact a release
>>>>> we have already made (and that is out there in pip).
>>>>>
>>>>> I recently read the nice writeup
>>>>> https://docs.google.com/document/d/1x_VrNtXCup75qA3glDd2fQOB2TakldwjKZ6pXaAjAfg/edit
>>>>> about Python dependency problems, and I think the only solution is to pin
>>>>> the "core" packages. This likely means that we have to be ready to
>>>>> release sub-releases with security-related dependencies updated (like
>>>>> 1.10.4.1 maybe), or change semantics a bit towards semver and start
>>>>> releasing 2.0.0 - 2.1.0 and then release security updates as 2.0.1 etc.
>>>>> If those 2.0.1 etc. are released only because of dependency
>>>>> updates/security bugfixes and some critical problems, and if we automate
>>>>> it, I don't think it would be a great problem to release those
>>>>> security-patched versions.
>>>>> We can have services like pyup (https://pyup.io/) or even GitHub itself
>>>>> monitor dependencies for us and create PRs automatically to update them.
>>>>>
>>>>> Would someone actually complain if any of the "core" packages
>>>>> (install_requires + devel) below got pinned? I am not sure that would
>>>>> be a big problem for anyone, and even if you need some newer version (in
>>>>> your operator), you can always upgrade it afterwards and ignore the fact
>>>>> that Airflow has it pinned.
>>>>>
>>>>> Here are the dependencies that are the "core" ones:
>>>>>
>>>>> install_requires:
>>>>>
>>>>>    - 'alembic',
>>>>>    - 'cached_property',
>>>>>    - 'configparser',
>>>>>    - 'croniter',
>>>>>    - 'dill',
>>>>>    - 'dumb-init',
>>>>>    - 'flask',
>>>>>    - 'flask-appbuilder',
>>>>>    - 'flask-caching',
>>>>>    - 'flask-login',
>>>>>    - 'flask-swagger',
>>>>>    - 'flask-wtf',
>>>>>    - 'funcsigs',
>>>>>    - 'gitpython',
>>>>>    - 'gunicorn',
>>>>>    - 'iso8601',
>>>>>    - 'json-merge-patch',
>>>>>    - 'jinja2',
>>>>>    - 'lazy_object_proxy',
>>>>>    - 'markdown',
>>>>>    - 'pendulum',
>>>>>    - 'psutil',
>>>>>    - 'pygments',
>>>>>    - 'python-daemon',
>>>>>    - 'python-dateutil',
>>>>>    - 'requests',
>>>>>    - 'setproctitle',
>>>>>    - 'sqlalchemy',
>>>>>    - 'tabulate',
>>>>>    - 'tenacity',
>>>>>    - 'text-unidecode',
>>>>>    - 'thrift',
>>>>>    - 'tzlocal',
>>>>>    - 'unicodecsv',
>>>>>    - 'zope.deprecation',
>>>>>
>>>>> Devel:
>>>>>
>>>>>    - 'beautifulsoup4',
>>>>>    - 'click',
>>>>>    - 'codecov',
>>>>>    - 'flake8',
>>>>>    - 'freezegun',
>>>>>    - 'ipdb',
>>>>>    - 'jira',
>>>>>    - 'mongomock',
>>>>>    - 'moto',
>>>>>    - 'nose',
>>>>>    - 'nose-ignore-docstring',
>>>>>    - 'nose-timer',
>>>>>    - 'parameterized',
>>>>>    - 'paramiko',
>>>>>    - 'pylint',
>>>>>    - 'pysftp',
>>>>>    - 'pywinrm',
>>>>>    - 'qds-sdk', -> should be moved to separate qubole
>>>>>    - 'rednose',
>>>>>    - 'requests_mock',
>>>>>
>>>>> J.
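
To make the patch-release idea above a bit more concrete, here is a minimal Python sketch. It assumes a plain three-part MAJOR.MINOR.PATCH version and a dict of pinned core packages. The helper names (`bump_patch`, `apply_dependency_update`) and all version numbers are made up for illustration; none of this is existing Airflow tooling.

```python
from typing import Dict, Tuple


def bump_patch(version: str) -> str:
    """Return the next patch version, e.g. 2.0.0 -> 2.0.1."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def apply_dependency_update(
    release: str, pins: Dict[str, str], package: str, new_version: str
) -> Tuple[str, Dict[str, str]]:
    """Update one pin and return the new release number and pin set."""
    if package not in pins:
        raise KeyError(f"{package} is not a pinned core package")
    updated = dict(pins)  # copy, so the pins of the old release stay untouched
    updated[package] = new_version
    return bump_patch(release), updated


# Illustrative values only: these are not real Airflow pins.
pins = {"flask": "1.0.0", "jinja2": "2.10.0"}
release, new_pins = apply_dependency_update("2.0.0", pins, "jinja2", "2.10.1")
print(release, new_pins)
```

A bot such as pyup would supply the `package` and `new_version` arguments through its PR; the release manager would only need to confirm that the tests pass before cutting the patch release.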
>>>>>
>>>>> On Mon, Jun 24, 2019 at 3:03 PM Ash Berlin-Taylor <a...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Another suggestion someone (I forget who, sorry) had was that we could
>>>>>> maintain a full list of _fully tested and supported versions_ (i.e. the
>>>>>> output of `pip freeze`) - that way people _can_ use other versions if
>>>>>> they want, but we can at least say "use these versions".
>>>>>>
>>>>>> I'm not 100% sure how that would work in practice though, but having it
>>>>>> be some list we can update without having to do a release is crucial.
>>>>>>
>>>>>> -ash
>>>>>>
>>>>>>> On 24 Jun 2019, at 10:00, Jarek Potiuk <jarek.pot...@polidea.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> With the recent Sphinx problem
>>>>>>> <https://issues.apache.org/jira/browse/AIRFLOW-4841> we got our
>>>>>>> old-time enemy back. In this case sphinx-autoapi 1.1.0 was released
>>>>>>> yesterday and it started to cause our master to fail, causing a kind of
>>>>>>> emergency rush to fix it, as master (and all PRs based on it) would be
>>>>>>> broken.
>>>>>>>
>>>>>>> I think I have a proposal that can address similar problems without
>>>>>>> pushing us into emergency mode.
>>>>>>>
>>>>>>> *Context:*
>>>>>>>
>>>>>>> I wanted to return to an old discussion - how we can keep unrelated
>>>>>>> dependencies from causing emergencies on our side, where we have to
>>>>>>> quickly solve such dependency issues when they break our builds.
>>>>>>>
>>>>>>> *Change coming soon:*
>>>>>>>
>>>>>>> The problems will be partially addressed by the last stage of AIP-10
>>>>>>> (https://github.com/apache/airflow/pull/4938 - pending only a Kubernetes
>>>>>>> test fix). It effectively freezes installed dependencies as a cached
>>>>>>> layer of the Docker image for builds which do not touch setup.py - so
>>>>>>> if setup.py does not change, the dependencies will not be updated to
>>>>>>> the latest ones.
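
The caching behaviour described above (reuse the installed dependencies as long as setup.py is untouched) can be illustrated with a small Python sketch. This is only an analogy for a content-based cache decision, not the actual AIP-10 or Docker implementation, and the function names are hypothetical.

```python
import hashlib
from typing import Optional


def dependency_cache_key(setup_py_contents: str) -> str:
    """Derive a cache key from the contents of setup.py."""
    return hashlib.sha256(setup_py_contents.encode("utf-8")).hexdigest()


def needs_reinstall(cached_key: Optional[str], setup_py_contents: str) -> bool:
    """Dependencies are reinstalled only when setup.py has changed."""
    return cached_key != dependency_cache_key(setup_py_contents)


# Placeholder file contents, used only to exercise the two functions.
old_setup = "install_requires = ['flask']"
key = dependency_cache_key(old_setup)

print(needs_reinstall(key, old_setup))                   # unchanged setup.py
print(needs_reinstall(key, old_setup + "\n# new dep"))   # modified setup.py
```

The trade-off is the one the message points out: a build that never touches setup.py keeps its old dependency versions, so breakage from a new upstream release only surfaces when the cache is invalidated.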
>>>>>>>
>>>>>>> *Possibly even better long-term solution:*
>>>>>>>
>>>>>>> I think we should address it a bit better. We had a number of
>>>>>>> discussions on pinning dependencies (for example here
>>>>>>> <https://lists.apache.org/thread.html/9e775d11cce6a3473cbe31908a17d7840072125be2dff020ff59a441@%3Cdev.airflow.apache.org%3E>).
>>>>>>> I think the conclusion there was that Airflow is both a "library" (for
>>>>>>> DAGs) - where dependencies should not be pinned - and an end product
>>>>>>> (where the dependencies should be pinned). So it's a bit of a catch-22
>>>>>>> situation.
>>>>>>>
>>>>>>> Looking at the problem with Sphinx, however, it occurred to me that
>>>>>>> maybe we can use a hybrid solution. We pin all the libraries (like
>>>>>>> Sphinx or Flask) that are used merely to build and test the end product,
>>>>>>> but we do not pin the libraries (like google-api) which are used in the
>>>>>>> context of a library (writing the operators and DAGs).
>>>>>>>
>>>>>>> What do you think? Maybe that will be the best of both worlds? Then we
>>>>>>> would have to classify the dependencies and maybe restructure setup.py
>>>>>>> slightly to have an obvious distinction between those two types of
>>>>>>> dependencies.
>>>>>>>
>>>>>>> J.
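
One way the hybrid classification above might look in a restructured setup.py is sketched below in Python. The two groups and the helper `build_requirements` are hypothetical, the version numbers are placeholders rather than tested pins, and `google-api-python-client` is assumed to be the package meant by "google-api".

```python
from typing import Dict, List

# Used only to build and test the end product: pinned to exact versions.
# Placeholder versions, not tested Airflow pins.
BUILD_AND_TEST_PINS: Dict[str, str] = {"sphinx": "1.0.0", "flask": "1.0.0"}

# Used in the "library" context (operators and DAGs): left unpinned.
LIBRARY_DEPENDENCIES: List[str] = ["google-api-python-client"]


def build_requirements(pinned: Dict[str, str], unpinned: List[str]) -> List[str]:
    """Combine exact pins and open requirements into one requirement list."""
    exact = [f"{name}=={version}" for name, version in sorted(pinned.items())]
    return exact + sorted(unpinned)


requirements = build_requirements(BUILD_AND_TEST_PINS, LIBRARY_DEPENDENCIES)
print(requirements)
```

Keeping the two groups in separate, clearly named structures is what gives the "obvious distinction" the message asks for: moving a package between groups is then a one-line change that is easy to review.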
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Jarek Potiuk
>>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
>>>>>>>
>>>>>>> M: +48 660 796 129
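
The "-pinned" extras proposed at the top of the thread, combined with Ash's idea of keeping a `pip freeze` list of tested versions, could be generated roughly as follows. This is a sketch under assumptions: `parse_freeze` and `add_pinned_extras` are hypothetical helpers, the freeze text and versions are invented, and it only handles simple `name==version` lines.

```python
from typing import Dict, List


def parse_freeze(text: str) -> Dict[str, str]:
    """Parse `name==version` lines (as printed by `pip freeze`) into a dict."""
    pins: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, options such as "-e ...", and non-exact lines.
        if not line or line.startswith(("#", "-")) or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.lower()] = version
    return pins


def add_pinned_extras(
    extras: Dict[str, List[str]], pins: Dict[str, str]
) -> Dict[str, List[str]]:
    """Return the extras plus a "<name>-pinned" variant for each of them."""
    result = dict(extras)
    for extra, packages in extras.items():
        result[f"{extra}-pinned"] = [
            f"{pkg}=={pins[pkg.lower()]}" if pkg.lower() in pins else pkg
            for pkg in packages
        ]
    return result


# Invented freeze output and extras, for illustration only.
freeze_output = "google-api-python-client==1.0.0\nhttplib2==0.1.0\n"
extras = {"gcp_api": ["google-api-python-client", "httplib2", "pandas-gbq"]}
all_extras = add_pinned_extras(extras, parse_freeze(freeze_output))
print(all_extras["gcp_api-pinned"])
```

Note that this pins only the direct dependencies listed in each extra. Since the werkzeug breakage came from a transitive dependency, a real implementation would probably also need to include the transitive pins from the freeze list.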