I am wondering if this will have a negative impact on installations
using tools that do not support constraints files like poetry or
pip-tools. These tools are not officially supported, but many users
may use them nonetheless, so I think it is worth considering before
making any changes that affect them. Unfortunately, I have too little
experience with them to say at the same time how your proposal will
affect it.

pon., 24 sty 2022 o 23:08 Jarek Potiuk <[email protected]> napisał(a):
>
> Hey Ash (and everyone).
>
> I think we are really close to some consensus, but we are coming from two 
> directions. I do agree there will be some of the deps that are more important 
> to keep "in-check" than the rest.
> I think it makes perfect sense to distinguish the dependencies we really 
> "astrongly feel" are important for airflow vs. those that "are important to 
> our users".
> I am perfectly ok with upper-bounding deps that we really think are important.
>
> But with >500 deps total for aifrflow (transitive) I think we should have. 
> shortlist of those we care only - and the rest should be upper-open by 
> default.
> And "short" is a good name, because those are the dependencies that we take 
> responsibility of manually upgrading when needed.
>
> I would be happy if we can come up wit the list that is really important and 
> "core" to airflow - and "really likely" to break things with new major 
> release. And those are the things we care more as an "application" that our 
> users care as "libraries". This is my subjective list (those are all things 
> taht we care :
>
> * sqlalchemy
> * alembic
> * flask (and all flask libs)
> * flask-app-builder
> * werkzeug  (yeah. not surprisingly - including last meetup I spoke about 
> Airflow, my previous presented mentioned that Werkzeug comes up as 
> "problematic case" and I was only able to confirm that :).
>
> + all those that we know break things (for example we know a bunch of google 
> deps <2.0.0 that we know are breaking things already - but we can easily 
> describe those)
>
> I think if we upper-bound those (and make appropriate comment in our 
> setup.py/cfg)  this would be pretty "good" setup. And having a shortlist of 
> those that we want to keep upper-bound makes sense and is manageable.
>
> WDYT?
>
> J
>
>
> On Mon, Jan 24, 2022 at 10:30 PM Collin McNulty 
> <[email protected]> wrote:
>>
>> Ash,
>>
>> I think your code snippet will result in getting the latest version of 
>> pandas even if there is an upper constraint on pandas in apache-airflow. I 
>> simulated with these commands on the latest pip available.
>>
>> ```
>> pip install apache-airflow "pandas<1.4"
>> pip freeze | grep pandas
>> pip install -U pandas
>> pip freeze | grep pandas
>> ```
>>
>> Collin McNulty
>>
>> On Mon, Jan 24, 2022 at 3:11 PM Ash Berlin-Taylor <[email protected]> wrote:
>>>
>>> To your inital thoughts:
>>> - lower bounds: yeah, introduce them when we need to, so a new package 
>>> could be added with no lower bound, and then a bound placed only when we 
>>> discover we need a minium version (include bug fix etc. PR author can 
>>> choose wether or not to have a lower bound initially.)
>>> - not have upper bound by default: see below
>>> - only introduce upper bound if breaking change: see below
>>> - specify reason for upper bound. yes, 100% to this
>>>
>>>
>>> I think you have created a strawman argument here to say that the 
>>> '<2.0'constraint is useless', as Panda's documentation[1] say they follow 
>>> SemVer so 2.0 is going to break _something_. It didn't help us with this 
>>> specific problem.
>>>
>>> As you mentioned (possibly only in slack) constraints protect against 
>>> initial install only, but not future in-place upgrades. Without an upper 
>>> constraint I could do
>>>
>>> ```
>>> pip install --constraint $FILE apache-airflow
>>> pip install -U pandas
>>> ```
>>>
>>> And then airflow could be broken. And being able to upgrade without 
>>> constraints files is the only way to apply a security upgrade to a module 
>>> (otherwise we'd have to update the constraint files for all supported 
>>> versions when ever any of our transitive dependencies has a security 
>>> update, not something we are in a position to do).
>>>
>>> I think it also depends how we use the dependency in Airflow -- some of 
>>> them are so core that we don't want anything to break, but others (such as 
>>> Pandas in this case) where our use of them in Airflow is actually fairly 
>>> superficial.
>>>
>>> So I think I would just tone down your middle suggestions slightly -- Upper 
>>> limits dependencies can be optional. I think the only place we really 
>>> differ is what is "foreseen".
>>>
>>> -ash
>>>
>>> [1]: https://pandas.pydata.org/docs/development/policies.html
>>>
>>>
>>> On Sun, Jan 23 2022 at 17:35:04 +0100, Jarek Potiuk <[email protected]> 
>>> wrote:
>>>
>>> Just to illustrate it better. Here is an extremely good example from today:
>>>
>>> Our main build started to fail today 
>>> (https://github.com/apache/airflow/runs/4913148486?check_suite_focus=true) 
>>> because Pandas released a new 1.4.0 version last  night: 
>>> https://pypi.org/project/pandas/1.4.0/
>>> The root cause of the failure was that Pandas 1.4.0 requires SqlAlchemy 
>>> 1.4.0 or above. But surprisingly -  there is no hard limit in the released 
>>> Pandas. What's more - Pandas does not even mention that it **might** need 
>>> sqlalchemy at all (and in version >=1.4.0 !).
>>>
>>> What happens is that in runtime, during running our tests we get this error:
>>> `Pandas requires version '1.4.0' or newer of 'sqlalchemy' (version '1.3.24' 
>>> currently installed).`
>>>
>>> We already had a limitation (not documented why) on Pandas
>>>
>>> pandas_requirement = 'pandas>=0.17.1, <2.0'
>>>
>>> However this `<2.0` was completely useless in this case, because it's 1.4 
>>> version that broke parts of Airflow.
>>> We assumed that the <2.0 limitation will protect us (apparently) but it did 
>>> not. However, the failure in tests protected our constraints from being 
>>> upgraded and in our "main" pandas is still 1.3.4. Our users who follow "use 
>>> constraints" and those who use Airflow images, are fully protected against 
>>> those kinds of problems.
>>>
>>> This particular problem will likely be solved in a few days when Flask 
>>> Application Builder 3.4.4rc1 
>>> (https://pypi.org/project/Flask-AppBuilder/3.4.4rc1/|) will be released (It 
>>> moves the upper bound limit for sqlalchemy to <1.5 and that was the only 
>>> thing that blocked us from getting sqlalchemy 1.4).
>>>
>>> My point is that simply we do not know if any future version of any 
>>> dependency will break Airflow. And any reasonable assumptions about that 
>>> guessing from "Major" version is IMHO just wild guessing. And also by 
>>> adding such limits - we are limiting our users to update to higher version 
>>> of dependencies when it is harmless (very similarly as Flask Application 
>>> Builder blocked us in this case from migration to sqlalchemy 1.4).
>>>
>>> The temporary fix (until FAB 3.4.4 is released) is here:
>>>
>>> https://github.com/apache/airflow/pull/21045
>>>
>>> I believe after FAB 3.4.4 is released we remove the fix we should just get 
>>> rid of the pandas limitation. Constraints of ours protect our users when 
>>> they install Airflow "according to our instructions with constraints" - 
>>> which is the only official way of installing Airflow. We are protecting our 
>>> users from those kind of events, but at the same time we should relax 
>>> pretty much all the upper bounds to not block our users in the future in 
>>> case they need to move to higher versions of dependencies released in the 
>>> future that we have no idea if they will break, or not Airflow..
>>>
>>> Let me know your thoughts - I think this Pandas case is great to illustrate 
>>> my point.
>>>
>>> J.
>>>
>>>
>>>
>>>
>>> On Sun, Jan 23, 2022 at 1:40 AM Jarek Potiuk <[email protected]> wrote:
>>>>
>>>> Hello everyone,
>>>>
>>>> TL;DR; I think there is one thing about dependencies that we should have 
>>>> some agreement on to make some common approach. Namely about using upper 
>>>> bounds on our dependencies.
>>>> Should we set some rules on when we set upper-bounds on the deps ? What 
>>>> rules should they be?
>>>>
>>>> Currently we use constraints to make sure Airflow in specific version can 
>>>> be installed using "golden" set of dependencies. Those are part of our CI, 
>>>> automatically updated and tests which makes it really nice as they are 
>>>> "fixed" for installation of particular version but they continuously 
>>>> upgrade (even across major versions) when they pass tests in CI.
>>>>
>>>> This all happens pretty much automatically by our CI, except the cases 
>>>> where our dependencies are upper-limited. We occasionally do some 
>>>> "setup.py", "setup.cfg" changes manually to bump some upper limits, but 
>>>> this is more the result of some external request or "occasional" cleanup 
>>>> to do it - rather than a regular process.
>>>>
>>>> Now, we have some dependencies that are upper-bound for good reasons and 
>>>> they are documented. We also have a number of dependencies that are not 
>>>> upper-limited. But I think this pretty inconsistent. Small excerpt from 
>>>> setup.cfg:
>>>>
>>>>     # Logging is broken with itsdangerous > 2
>>>>     itsdangerous>=1.1.0, <2.0
>>>>     # Jinja2 3.1 will remove the 'autoescape' and 'with' extensions, which 
>>>> would
>>>>     # break Flask 1.x, so we limit this for future compatibility. Remove 
>>>> this
>>>>     # when bumping Flask to >=2.
>>>>     jinja2>=2.10.1,<3.1
>>>>     ...
>>>>     # python daemon crashes with 'socket operation on non-socket' for 
>>>> python 3.8+ in version < 2.2.4
>>>>     # https://pagure.io/python-daemon/issue/34
>>>>     python-daemon>=2.2.4
>>>>     python-dateutil>=2.3, <3
>>>>     python-nvd3~=0.15.0
>>>>
>>>> * itsdangerous is upper limited and the reason is specified in the comment 
>>>> (though we do not know when we could remove the limit)
>>>>
>>>> * Jinja is upper-limited and we know not only why but also when it can be 
>>>> removed
>>>> * Python-daemon is not upper-limited but it has a comment why it is 
>>>> "lower-limited" (which is pretty useless IMHO)
>>>> * python-dateutil is upper-limited but we do not know why
>>>> * python-nvd3 is also upper limited (~0.15.0 - limits it to any 0.15.x 
>>>> version but 0.16 could not be installed)
>>>>
>>>> I think there are many inconsistencies and the way we treat dependency 
>>>> limits is pretty inconsistent - we have no rules agreed.
>>>>
>>>> I would love to discuss how we can standardize it or at least set some 
>>>> rules we can follow.
>>>>
>>>> My initial thoughts are:
>>>>
>>>> * we can (but do not have to) have lower bounds if we need to protect and 
>>>> we know about some limitations, but we do not need to document those. Just 
>>>> set the limit. The nice thing about lower bounds that they do not "decay" 
>>>> over time. By default latest "eligible" version is installed by PIP (but 
>>>> of course if other deps have conflicting upper bounds, keeping lower 
>>>> limits when unnecessary is a problem.
>>>> * by default we should not have upper-bounds. We know it limits our users 
>>>> and constraints + CI tests are nicely handling the scenario when things 
>>>> are breaking.
>>>> * we should only introduce upper bounds if we know that there is breaking 
>>>> change (or that it is very likely and "foreseen" - betas/rc2/discussions 
>>>> about upcoming breaking changes in the dependencies
>>>> * when we introduce upper-bounds we should always specify why and what is 
>>>> the condition to remove them
>>>>
>>>> WDYT?
>>>>
>>>> J.
>>>>
>>>>
>>>>

Reply via email to