I recommend keeping no upper-bounds as we recommend using constraints

On Sun, Jan 23, 2022 at 10:05 PM Jarek Potiuk <[email protected]> wrote:

> Just to illustrate it better. Here is an extremely good example from today:
>
> Our main build started to fail today (
> https://github.com/apache/airflow/runs/4913148486?check_suite_focus=true)
> because Pandas released a new 1.4.0 version last  night:
> https://pypi.org/project/pandas/1.4.0/
> The root cause of the failure was that Pandas 1.4.0 requires SqlAlchemy
> 1.4.0 or above. But surprisingly -  there is no hard limit in the
> released Pandas. What's more - Pandas does not even mention that
> it **might** need sqlalchemy at all (and in version >=1.4.0 !).
>
> What happens is that in runtime, during running our tests we get this
> error:
> `Pandas requires version '1.4.0' or newer of 'sqlalchemy' (version
> '1.3.24' currently installed).`
>
> We already had a limitation (not documented why) on Pandas
>
> pandas_requirement = 'pandas>=0.17.1, <2.0'
>
> However this `<2.0` was completely useless in this case, because it's 1.4
> version that broke parts of Airflow.
> We assumed that the <2.0 limitation will protect us (apparently) but it
> did not. However, the failure in tests protected our constraints from being
> upgraded and in our "main" pandas is still 1.3.4. Our users who follow "use
> constraints" and those who use Airflow images, are fully protected against
> those kinds of problems.
>
> This particular problem will likely be solved in a few days when Flask
> Application Builder 3.4.4rc1 (
> https://pypi.org/project/Flask-AppBuilder/3.4.4rc1/|) will be released
> (It moves the upper bound limit for sqlalchemy to <1.5 and that was the
> only thing that blocked us from getting sqlalchemy 1.4).
>
> My point is that simply we do not know if any future version of any
> dependency will break Airflow. And any reasonable assumptions about that
> guessing from "Major" version is IMHO just wild guessing. And also by
> adding such limits - we are limiting our users to update to higher version
> of dependencies when it is harmless (very similarly as Flask Application
> Builder blocked us in this case from migration to sqlalchemy 1.4).
>
> The temporary fix (until FAB 3.4.4 is released) is here:
>
> https://github.com/apache/airflow/pull/21045
>
> I believe after FAB 3.4.4 is released we remove the fix we should just get
> rid of the pandas limitation. Constraints of ours protect our users when
> they install Airflow "according to our instructions with constraints" -
> which is the only official way of installing Airflow. We are protecting our
> users from those kind of events, but at the same time we should relax
> pretty much all the upper bounds to not block our users in the future in
> case they need to move to higher versions of dependencies released in the
> future that we have no idea if they will break, or not Airflow..
>
> Let me know your thoughts - I think this Pandas case is great to
> illustrate my point.
>
> J.
>
>
>
>
> On Sun, Jan 23, 2022 at 1:40 AM Jarek Potiuk <[email protected]> wrote:
>
>> Hello everyone,
>>
>> TL;DR; I think there is one thing about dependencies that we should have
>> some agreement on to make some common approach. Namely about using upper
>> bounds on our dependencies.
>> Should we set some rules on when we set upper-bounds on the deps ? What
>> rules should they be?
>>
>> Currently we use constraints to make sure Airflow in specific version can
>> be installed using "golden" set of dependencies. Those are part of our CI,
>> automatically updated and tests which makes it really nice as they are
>> "fixed" for installation of particular version but they continuously
>> upgrade (even across major versions) when they pass tests in CI.
>>
>> This all happens pretty much automatically by our CI, except the cases
>> where our dependencies are upper-limited. We occasionally do some
>> "setup.py", "setup.cfg" changes manually to bump some upper limits, but
>> this is more the result of some external request or "occasional" cleanup to
>> do it - rather than a regular process.
>>
>> Now, we have some dependencies that are upper-bound for good reasons and
>> they are documented. We also have a number of dependencies that are not
>> upper-limited. But I think this pretty inconsistent. Small excerpt from
>> setup.cfg:
>>
>>     # Logging is broken with itsdangerous > 2
>>     itsdangerous>=1.1.0, <2.0
>>     # Jinja2 3.1 will remove the 'autoescape' and 'with' extensions,
>> which would
>>     # break Flask 1.x, so we limit this for future compatibility. Remove
>> this
>>     # when bumping Flask to >=2.
>>     jinja2>=2.10.1,<3.1
>>     ...
>>     # python daemon crashes with 'socket operation on non-socket' for
>> python 3.8+ in version < 2.2.4
>>     # https://pagure.io/python-daemon/issue/34
>>     python-daemon>=2.2.4
>>     python-dateutil>=2.3, <3
>>     python-nvd3~=0.15.0
>>
>> * itsdangerous is upper limited and the reason is specified in the
>> comment (though we do not know when we could remove the limit)
>>
>> * Jinja is upper-limited and we know not only why but also when it can be
>> removed
>> * Python-daemon is not upper-limited but it has a comment why it is
>> "lower-limited" (which is pretty useless IMHO)
>> * python-dateutil is upper-limited but we do not know why
>> * python-nvd3 is also upper limited (~0.15.0 - limits it to any 0.15.x
>> version but 0.16 could not be installed)
>>
>> I think there are many inconsistencies and the way we treat dependency
>> limits is pretty inconsistent - we have no rules agreed.
>>
>> I would love to discuss how we can standardize it or at least set some
>> rules we can follow.
>>
>> My initial thoughts are:
>>
>> * we can (but do not have to) have lower bounds if we need to protect and
>> we know about some limitations, but we do not need to document those. Just
>> set the limit. The nice thing about lower bounds that they do not "decay"
>> over time. By default latest "eligible" version is installed by PIP (but of
>> course if other deps have conflicting upper bounds, keeping lower limits
>> when unnecessary is a problem.
>> * by default we should not have upper-bounds. We know it limits our users
>> and constraints + CI tests are nicely handling the scenario when things are
>> breaking.
>> * we should only introduce upper bounds if we know that there is breaking
>> change (or that it is very likely and "foreseen" - betas/rc2/discussions
>> about upcoming breaking changes in the dependencies
>> * when we introduce upper-bounds we should always specify why and what is
>> the condition to remove them
>>
>> WDYT?
>>
>> J.
>>
>>
>>
>>

Reply via email to