Update - It seems that we won't need the -pinned version eventually. I realized that we need to have slightly different requirements for different python versions.
I just added PR for that: https://github.com/apache/airflow/pull/7841 I also found out (during production image exercise) that we can install airflow predictably in a very simple way (once we release the requirements in 1.10.10): pip install apache-airflow[gcp]==1.10.10 --constraint https://raw.githubusercontent.com/apache/airflow/1.10.10/requirements/requirements-python3.7.txt I think this is simple enough to be used as installation method. I added it to the documentation and I think I am ok with dropping -pinned package altogether. J. On Sun, Mar 22, 2020 at 10:15 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote: > Yesterday we had another master breakage - this time from elasticsearch > releasing MINOR version 7.6 breaking our builds (not it was MINOR version > so should be compatible .... it was not for us). I fixed it quickly > yesterday by limiting it to < 7.6 but for me - this is quite clear that > trying to rely on SemVer being followed by others is a futile effort (at > least in python's world). > > The theory is nice, but it breaks in practice. And it's not really a fault > of the library maintainers. It's simply sometimes not so easy to see how > your APIs are used - and in Python, you cannot prevent using stuff that you > think is an internal detail. This is what happened in elasticsearch case > yesterday - apparently, our plugin was using an "internal" API unknowingly > and some parameters from that API were dropped during refactoring of > elasticsearch library. > > My observation (it's anecdotal though) is that the COVID-19 situation made > people have more time, fewer distractions, fewer things to do, and we > have higher frequency of OSS packages being released recently so we > should protect a bit from more often breakages. > > I think learning from yesterday is: > > * we should merge the requirements.txt solution quickly to prevent further > breakages (I am reading and testing it now) - I think everyone agrees it's > good to have it > * I think we can continue discussing whether apache-airflow-pinned package > should be released or not. I can leave the code building the package but we > can decide about it after some more discussion > > Does it sound good? > > J > > > > On Fri, Mar 20, 2020 at 2:47 PM Jarek Potiuk <jarek.pot...@polidea.com> > wrote: > >> And rebased it right now and fixed automated requirements update. >> >> On Fri, Mar 20, 2020 at 2:28 PM Jarek Potiuk <jarek.pot...@polidea.com> >> wrote: >> >>> Ah BTW. I just noticed that for some reason I pasted an old PR earlier >>> in the thread :(. >>> This is the one with requirements.txt I am talking about: >>> https://github.com/apache/airflow/pull/7730 >>> >>> On Fri, Mar 20, 2020 at 2:26 PM Jarek Potiuk <jarek.pot...@polidea.com> >>> wrote: >>> >>>> Nope. Not blocking. I can work with my branch just requirements.txt is >>>> enough for that :) >>>> >>>> I think the problem with semver is that it is loosely followed - we had >>>> a number of breakages in the past with minor version upgrades :(. >>>> >>>> J. >>>> >>>> >>>> >>>> On Fri, Mar 20, 2020 at 1:27 PM Kaxil Naik <kaxiln...@gmail.com> wrote: >>>> >>>>> Thanks for the detailed explanation Jarek. >>>>> >>>>> How about we have an upper limit for all our dependencies, example >>>>> instead >>>>> of "google-cloud-storage>=1.16", we have >>>>> "google-cloud-storage>=1.16,<2.0" ? >>>>> >>>>> If a dependency breaks compatibility in minor versions, we can't do >>>>> anything about it but if they follow SemVer, we should be safe and the >>>>> first-time installers would have a non-breaking package. WDYT? >>>>> >>>>> Btw I hope this is not blocking you in building a production image as I >>>>> think requirements.txt is solving that? Please let me know if it is >>>>> blocking. >>>>> >>>>> PS: I am also just dumping my ideas to solve this issue. Love to hear >>>>> what >>>>> others think too. >>>>> >>>>> Regards, >>>>> Kaxil >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Thu, Mar 19, 2020 at 2:43 PM Jarek Potiuk <jarek.pot...@polidea.com >>>>> > >>>>> wrote: >>>>> >>>>> > I think we have similar understanding. But let me just clarify >>>>> because I >>>>> > think we think about we think about solving two different problems >>>>> > My proposal is not solving all problems with dependencies - quite the >>>>> > contrary, I want to solve just one specific "repeatability" problem >>>>> - read >>>>> > on :).. >>>>> > >>>>> > 1. A potential source of confusion: using "-pinned" for >>>>> installation but >>>>> > > using "non-pinned" for DAG development. >>>>> > > >>>>> > >>>>> > This could be confusing indeed - but they are the same in fact - >>>>> > just deps might be different over time. >>>>> > >>>>> > 2. Most of the users would still try to install "apache-airflow" >>>>> package >>>>> > > that might have been broken for example because of a dependency >>>>> > release, >>>>> > > either way, we would still have to suggest them to use "pinned" >>>>> > version >>>>> > > >>>>> > >>>>> > True. I thought we might describe it in the README and make it >>>>> prominently >>>>> > explained. Usually people look at the readme in PyPI when they are >>>>> > installing >>>>> > stuff and it does not work: https://pypi.org/project/apache-airflow/ >>>>> . >>>>> > >>>>> > Also - we could of course explain how to use requirements.txt from >>>>> the >>>>> > released >>>>> > version when they are installing it. That would be an extra friction >>>>> point >>>>> > though >>>>> > and maybe having "always installable" version of airflow is a better >>>>> > choice. >>>>> > >>>>> > 3. If they install "pinned" version, it is no longer a library >>>>> again, >>>>> > > that is users won't be able to use new NumPy release or >>>>> matplotlib for >>>>> > > example. In which case we are just circling back to the same >>>>> problem, >>>>> > > "either we risk broken package" while releasing or we risk >>>>> potentially >>>>> > > incompatible versions. >>>>> > > >>>>> > >>>>> > Yep. But maybe it's just a question of naming. Maybe even we could >>>>> name >>>>> > this package differently to indicate that this version is a way to >>>>> quickly >>>>> > install >>>>> > airflow but not to do any serious development with it. >>>>> > >>>>> > So speaking about THE problem I want to solve with the >>>>> > requirements.txt and apache-airflow-pinned package: >>>>> > >>>>> > I really only want to solve "first-time-user" experience here - >>>>> nothing >>>>> > more. I >>>>> > definitely do not want to replace the current installation method for >>>>> > experienced >>>>> > users - for them using --constraint requirements.txt is exactly what >>>>> they >>>>> > need. >>>>> > The only problem I am trying to solve with that is "repeatability" of >>>>> > installation. >>>>> > >>>>> > Maybe "apache-airflow-quickinstall" or something like that would be >>>>> better >>>>> > than "apache-airflow-pinned" or "apache-airflow-repeatable-install" >>>>> or >>>>> > something like that. I think about it as a "flavour" of ariflow >>>>> rather than >>>>> > anything else. I even originally implemented it as [pinned] extra >>>>> where I >>>>> > pinned all requirements. Unfortunately I found that if you have >>>>> > main requirement without limits, adding the same requirement as >>>>> extra with >>>>> > == does not make it pinned :(. That was my original plan. >>>>> > >>>>> > >>>>> > > Btw I have been on "we should have pinned dependency" camp as >>>>> Airflow >>>>> > > should definitely install without breaking since day-1 but I think >>>>> a >>>>> > > separate "-pinned" package won't solve that issue. >>>>> > > >>>>> > >>>>> > Ah yeah we went the same route. I do not think we can solve the >>>>> > "library vs. app" problem easily. This is a bit of >>>>> "eat-and-have-cake" >>>>> > at the same time. I know people have problems >>>>> > with conflicting dependencies when they are trying to install >>>>> libraries >>>>> > with different requirements. And I am not even trying to solve that >>>>> > problem now. Not even close. This requires some other solution >>>>> > (for example separate virtualenvs with different dependencies >>>>> > build from wheels on per-task basis). But that's something much >>>>> further >>>>> > in the future (if at all). >>>>> > >>>>> > >>>>> > > >>>>> > > WDYT? Also please do let me know if I have misunderstood something >>>>> > > (definitely possible :D). >>>>> > > >>>>> > > Regards, >>>>> > > Kaxil >>>>> > >>>>> >>>> >>>> >>>> -- >>>> >>>> Jarek Potiuk >>>> Polidea <https://www.polidea.com/> | Principal Software Engineer >>>> >>>> M: +48 660 796 129 <+48660796129> >>>> [image: Polidea] <https://www.polidea.com/> >>>> >>>> >>> >>> -- >>> >>> Jarek Potiuk >>> Polidea <https://www.polidea.com/> | Principal Software Engineer >>> >>> M: +48 660 796 129 <+48660796129> >>> [image: Polidea] <https://www.polidea.com/> >>> >>> >> >> -- >> >> Jarek Potiuk >> Polidea <https://www.polidea.com/> | Principal Software Engineer >> >> M: +48 660 796 129 <+48660796129> >> [image: Polidea] <https://www.polidea.com/> >> >> > > -- > > Jarek Potiuk > Polidea <https://www.polidea.com/> | Principal Software Engineer > > M: +48 660 796 129 <+48660796129> > [image: Polidea] <https://www.polidea.com/> > > -- Jarek Potiuk Polidea <https://www.polidea.com/> | Principal Software Engineer M: +48 660 796 129 <+48660796129> [image: Polidea] <https://www.polidea.com/>