I am wondering if this will have a negative impact on installations using tools that do not support constraints files like poetry or pip-tools. These tools are not officially supported, but many users may use them nonetheless, so I think it is worth considering before making any changes that affect them. Unfortunately, I have too little experience with them to say at the same time how your proposal will affect it.
pon., 24 sty 2022 o 23:08 Jarek Potiuk <[email protected]> napisał(a): > > Hey Ash (and everyone). > > I think we are really close to some consensus, but we are coming from two > directions. I do agree there will be some of the deps that are more important > to keep "in-check" than the rest. > I think it makes perfect sense to distinguish the dependencies we really > "astrongly feel" are important for airflow vs. those that "are important to > our users". > I am perfectly ok with upper-bounding deps that we really think are important. > > But with >500 deps total for aifrflow (transitive) I think we should have. > shortlist of those we care only - and the rest should be upper-open by > default. > And "short" is a good name, because those are the dependencies that we take > responsibility of manually upgrading when needed. > > I would be happy if we can come up wit the list that is really important and > "core" to airflow - and "really likely" to break things with new major > release. And those are the things we care more as an "application" that our > users care as "libraries". This is my subjective list (those are all things > taht we care : > > * sqlalchemy > * alembic > * flask (and all flask libs) > * flask-app-builder > * werkzeug (yeah. not surprisingly - including last meetup I spoke about > Airflow, my previous presented mentioned that Werkzeug comes up as > "problematic case" and I was only able to confirm that :). > > + all those that we know break things (for example we know a bunch of google > deps <2.0.0 that we know are breaking things already - but we can easily > describe those) > > I think if we upper-bound those (and make appropriate comment in our > setup.py/cfg) this would be pretty "good" setup. And having a shortlist of > those that we want to keep upper-bound makes sense and is manageable. > > WDYT? > > J > > > On Mon, Jan 24, 2022 at 10:30 PM Collin McNulty > <[email protected]> wrote: >> >> Ash, >> >> I think your code snippet will result in getting the latest version of >> pandas even if there is an upper constraint on pandas in apache-airflow. I >> simulated with these commands on the latest pip available. >> >> ``` >> pip install apache-airflow "pandas<1.4" >> pip freeze | grep pandas >> pip install -U pandas >> pip freeze | grep pandas >> ``` >> >> Collin McNulty >> >> On Mon, Jan 24, 2022 at 3:11 PM Ash Berlin-Taylor <[email protected]> wrote: >>> >>> To your inital thoughts: >>> - lower bounds: yeah, introduce them when we need to, so a new package >>> could be added with no lower bound, and then a bound placed only when we >>> discover we need a minium version (include bug fix etc. PR author can >>> choose wether or not to have a lower bound initially.) >>> - not have upper bound by default: see below >>> - only introduce upper bound if breaking change: see below >>> - specify reason for upper bound. yes, 100% to this >>> >>> >>> I think you have created a strawman argument here to say that the >>> '<2.0'constraint is useless', as Panda's documentation[1] say they follow >>> SemVer so 2.0 is going to break _something_. It didn't help us with this >>> specific problem. >>> >>> As you mentioned (possibly only in slack) constraints protect against >>> initial install only, but not future in-place upgrades. Without an upper >>> constraint I could do >>> >>> ``` >>> pip install --constraint $FILE apache-airflow >>> pip install -U pandas >>> ``` >>> >>> And then airflow could be broken. And being able to upgrade without >>> constraints files is the only way to apply a security upgrade to a module >>> (otherwise we'd have to update the constraint files for all supported >>> versions when ever any of our transitive dependencies has a security >>> update, not something we are in a position to do). >>> >>> I think it also depends how we use the dependency in Airflow -- some of >>> them are so core that we don't want anything to break, but others (such as >>> Pandas in this case) where our use of them in Airflow is actually fairly >>> superficial. >>> >>> So I think I would just tone down your middle suggestions slightly -- Upper >>> limits dependencies can be optional. I think the only place we really >>> differ is what is "foreseen". >>> >>> -ash >>> >>> [1]: https://pandas.pydata.org/docs/development/policies.html >>> >>> >>> On Sun, Jan 23 2022 at 17:35:04 +0100, Jarek Potiuk <[email protected]> >>> wrote: >>> >>> Just to illustrate it better. Here is an extremely good example from today: >>> >>> Our main build started to fail today >>> (https://github.com/apache/airflow/runs/4913148486?check_suite_focus=true) >>> because Pandas released a new 1.4.0 version last night: >>> https://pypi.org/project/pandas/1.4.0/ >>> The root cause of the failure was that Pandas 1.4.0 requires SqlAlchemy >>> 1.4.0 or above. But surprisingly - there is no hard limit in the released >>> Pandas. What's more - Pandas does not even mention that it **might** need >>> sqlalchemy at all (and in version >=1.4.0 !). >>> >>> What happens is that in runtime, during running our tests we get this error: >>> `Pandas requires version '1.4.0' or newer of 'sqlalchemy' (version '1.3.24' >>> currently installed).` >>> >>> We already had a limitation (not documented why) on Pandas >>> >>> pandas_requirement = 'pandas>=0.17.1, <2.0' >>> >>> However this `<2.0` was completely useless in this case, because it's 1.4 >>> version that broke parts of Airflow. >>> We assumed that the <2.0 limitation will protect us (apparently) but it did >>> not. However, the failure in tests protected our constraints from being >>> upgraded and in our "main" pandas is still 1.3.4. Our users who follow "use >>> constraints" and those who use Airflow images, are fully protected against >>> those kinds of problems. >>> >>> This particular problem will likely be solved in a few days when Flask >>> Application Builder 3.4.4rc1 >>> (https://pypi.org/project/Flask-AppBuilder/3.4.4rc1/|) will be released (It >>> moves the upper bound limit for sqlalchemy to <1.5 and that was the only >>> thing that blocked us from getting sqlalchemy 1.4). >>> >>> My point is that simply we do not know if any future version of any >>> dependency will break Airflow. And any reasonable assumptions about that >>> guessing from "Major" version is IMHO just wild guessing. And also by >>> adding such limits - we are limiting our users to update to higher version >>> of dependencies when it is harmless (very similarly as Flask Application >>> Builder blocked us in this case from migration to sqlalchemy 1.4). >>> >>> The temporary fix (until FAB 3.4.4 is released) is here: >>> >>> https://github.com/apache/airflow/pull/21045 >>> >>> I believe after FAB 3.4.4 is released we remove the fix we should just get >>> rid of the pandas limitation. Constraints of ours protect our users when >>> they install Airflow "according to our instructions with constraints" - >>> which is the only official way of installing Airflow. We are protecting our >>> users from those kind of events, but at the same time we should relax >>> pretty much all the upper bounds to not block our users in the future in >>> case they need to move to higher versions of dependencies released in the >>> future that we have no idea if they will break, or not Airflow.. >>> >>> Let me know your thoughts - I think this Pandas case is great to illustrate >>> my point. >>> >>> J. >>> >>> >>> >>> >>> On Sun, Jan 23, 2022 at 1:40 AM Jarek Potiuk <[email protected]> wrote: >>>> >>>> Hello everyone, >>>> >>>> TL;DR; I think there is one thing about dependencies that we should have >>>> some agreement on to make some common approach. Namely about using upper >>>> bounds on our dependencies. >>>> Should we set some rules on when we set upper-bounds on the deps ? What >>>> rules should they be? >>>> >>>> Currently we use constraints to make sure Airflow in specific version can >>>> be installed using "golden" set of dependencies. Those are part of our CI, >>>> automatically updated and tests which makes it really nice as they are >>>> "fixed" for installation of particular version but they continuously >>>> upgrade (even across major versions) when they pass tests in CI. >>>> >>>> This all happens pretty much automatically by our CI, except the cases >>>> where our dependencies are upper-limited. We occasionally do some >>>> "setup.py", "setup.cfg" changes manually to bump some upper limits, but >>>> this is more the result of some external request or "occasional" cleanup >>>> to do it - rather than a regular process. >>>> >>>> Now, we have some dependencies that are upper-bound for good reasons and >>>> they are documented. We also have a number of dependencies that are not >>>> upper-limited. But I think this pretty inconsistent. Small excerpt from >>>> setup.cfg: >>>> >>>> # Logging is broken with itsdangerous > 2 >>>> itsdangerous>=1.1.0, <2.0 >>>> # Jinja2 3.1 will remove the 'autoescape' and 'with' extensions, which >>>> would >>>> # break Flask 1.x, so we limit this for future compatibility. Remove >>>> this >>>> # when bumping Flask to >=2. >>>> jinja2>=2.10.1,<3.1 >>>> ... >>>> # python daemon crashes with 'socket operation on non-socket' for >>>> python 3.8+ in version < 2.2.4 >>>> # https://pagure.io/python-daemon/issue/34 >>>> python-daemon>=2.2.4 >>>> python-dateutil>=2.3, <3 >>>> python-nvd3~=0.15.0 >>>> >>>> * itsdangerous is upper limited and the reason is specified in the comment >>>> (though we do not know when we could remove the limit) >>>> >>>> * Jinja is upper-limited and we know not only why but also when it can be >>>> removed >>>> * Python-daemon is not upper-limited but it has a comment why it is >>>> "lower-limited" (which is pretty useless IMHO) >>>> * python-dateutil is upper-limited but we do not know why >>>> * python-nvd3 is also upper limited (~0.15.0 - limits it to any 0.15.x >>>> version but 0.16 could not be installed) >>>> >>>> I think there are many inconsistencies and the way we treat dependency >>>> limits is pretty inconsistent - we have no rules agreed. >>>> >>>> I would love to discuss how we can standardize it or at least set some >>>> rules we can follow. >>>> >>>> My initial thoughts are: >>>> >>>> * we can (but do not have to) have lower bounds if we need to protect and >>>> we know about some limitations, but we do not need to document those. Just >>>> set the limit. The nice thing about lower bounds that they do not "decay" >>>> over time. By default latest "eligible" version is installed by PIP (but >>>> of course if other deps have conflicting upper bounds, keeping lower >>>> limits when unnecessary is a problem. >>>> * by default we should not have upper-bounds. We know it limits our users >>>> and constraints + CI tests are nicely handling the scenario when things >>>> are breaking. >>>> * we should only introduce upper bounds if we know that there is breaking >>>> change (or that it is very likely and "foreseen" - betas/rc2/discussions >>>> about upcoming breaking changes in the dependencies >>>> * when we introduce upper-bounds we should always specify why and what is >>>> the condition to remove them >>>> >>>> WDYT? >>>> >>>> J. >>>> >>>> >>>>
