Ash, I think your code snippet will result in getting the latest version of pandas even if there is an upper constraint on pandas in apache-airflow. I simulated with these commands on the latest pip available.
``` pip install apache-airflow "pandas<1.4" pip freeze | grep pandas pip install -U pandas pip freeze | grep pandas ``` Collin McNulty On Mon, Jan 24, 2022 at 3:11 PM Ash Berlin-Taylor <[email protected]> wrote: > To your inital thoughts: > - lower bounds: yeah, introduce them when we need to, so a new package > could be added with no lower bound, and then a bound placed only when we > discover we need a minium version (include bug fix etc. PR author can > choose wether or not to have a lower bound initially.) > - not have upper bound by default: see below > - only introduce upper bound if breaking change: see below > - specify reason for upper bound. yes, 100% to this > > > I think you have created a strawman argument here to say that the > '<2.0'constraint is useless', as Panda's documentation[1] say they follow > SemVer so 2.0 is going to break _something_. It didn't help us with this > specific problem. > > As you mentioned (possibly only in slack) constraints protect against > initial install only, but not future in-place upgrades. Without an upper > constraint I could do > > ``` > pip install --constraint $FILE apache-airflow > pip install -U pandas > ``` > > And then airflow could be broken. And being able to upgrade without > constraints files is the only way to apply a security upgrade to a module > (otherwise we'd have to update the constraint files for all supported > versions when ever any of our transitive dependencies has a security > update, not something we are in a position to do). > > I think it also depends how we use the dependency in Airflow -- some of > them are so core that we don't want anything to break, but others (such as > Pandas in this case) where our use of them in Airflow is actually fairly > superficial. > > So I think I would just tone down your middle suggestions slightly -- > Upper limits dependencies can be optional. I think the only place we really > differ is what is "foreseen". > > -ash > > [1]: https://pandas.pydata.org/docs/development/policies.html > > > On Sun, Jan 23 2022 at 17:35:04 +0100, Jarek Potiuk <[email protected]> > wrote: > > Just to illustrate it better. Here is an extremely good example from today: > > Our main build started to fail today ( > https://github.com/apache/airflow/runs/4913148486?check_suite_focus=true) > because Pandas released a new 1.4.0 version last night: > https://pypi.org/project/pandas/1.4.0/ > The root cause of the failure was that Pandas 1.4.0 requires SqlAlchemy > 1.4.0 or above. But surprisingly - there is no hard limit in the > released Pandas. What's more - Pandas does not even mention that > it **might** need sqlalchemy at all (and in version >=1.4.0 !). > > What happens is that in runtime, during running our tests we get this > error: > `Pandas requires version '1.4.0' or newer of 'sqlalchemy' (version > '1.3.24' currently installed).` > > We already had a limitation (not documented why) on Pandas > > pandas_requirement = 'pandas>=0.17.1, <2.0' > > However this `<2.0` was completely useless in this case, because it's 1.4 > version that broke parts of Airflow. > We assumed that the <2.0 limitation will protect us (apparently) but it > did not. However, the failure in tests protected our constraints from being > upgraded and in our "main" pandas is still 1.3.4. Our users who follow "use > constraints" and those who use Airflow images, are fully protected against > those kinds of problems. > > This particular problem will likely be solved in a few days when Flask > Application Builder 3.4.4rc1 ( > https://pypi.org/project/Flask-AppBuilder/3.4.4rc1/|) will be released > (It moves the upper bound limit for sqlalchemy to <1.5 and that was the > only thing that blocked us from getting sqlalchemy 1.4). > > My point is that simply we do not know if any future version of any > dependency will break Airflow. And any reasonable assumptions about that > guessing from "Major" version is IMHO just wild guessing. And also by > adding such limits - we are limiting our users to update to higher version > of dependencies when it is harmless (very similarly as Flask Application > Builder blocked us in this case from migration to sqlalchemy 1.4). > > The temporary fix (until FAB 3.4.4 is released) is here: > > https://github.com/apache/airflow/pull/21045 > > I believe after FAB 3.4.4 is released we remove the fix we should just get > rid of the pandas limitation. Constraints of ours protect our users when > they install Airflow "according to our instructions with constraints" - > which is the only official way of installing Airflow. We are protecting our > users from those kind of events, but at the same time we should relax > pretty much all the upper bounds to not block our users in the future in > case they need to move to higher versions of dependencies released in the > future that we have no idea if they will break, or not Airflow.. > > Let me know your thoughts - I think this Pandas case is great to > illustrate my point. > > J. > > > > > On Sun, Jan 23, 2022 at 1:40 AM Jarek Potiuk <[email protected]> wrote: > >> Hello everyone, >> >> TL;DR; I think there is one thing about dependencies that we should have >> some agreement on to make some common approach. Namely about using upper >> bounds on our dependencies. >> Should we set some rules on when we set upper-bounds on the deps ? What >> rules should they be? >> >> Currently we use constraints to make sure Airflow in specific version can >> be installed using "golden" set of dependencies. Those are part of our CI, >> automatically updated and tests which makes it really nice as they are >> "fixed" for installation of particular version but they continuously >> upgrade (even across major versions) when they pass tests in CI. >> >> This all happens pretty much automatically by our CI, except the cases >> where our dependencies are upper-limited. We occasionally do some >> "setup.py", "setup.cfg" changes manually to bump some upper limits, but >> this is more the result of some external request or "occasional" cleanup to >> do it - rather than a regular process. >> >> Now, we have some dependencies that are upper-bound for good reasons and >> they are documented. We also have a number of dependencies that are not >> upper-limited. But I think this pretty inconsistent. Small excerpt from >> setup.cfg: >> >> # Logging is broken with itsdangerous > 2 >> itsdangerous>=1.1.0, <2.0 >> # Jinja2 3.1 will remove the 'autoescape' and 'with' extensions, >> which would >> # break Flask 1.x, so we limit this for future compatibility. Remove >> this >> # when bumping Flask to >=2. >> jinja2>=2.10.1,<3.1 >> ... >> # python daemon crashes with 'socket operation on non-socket' for >> python 3.8+ in version < 2.2.4 >> # https://pagure.io/python-daemon/issue/34 >> python-daemon>=2.2.4 >> python-dateutil>=2.3, <3 >> python-nvd3~=0.15.0 >> >> * itsdangerous is upper limited and the reason is specified in the >> comment (though we do not know when we could remove the limit) >> >> * Jinja is upper-limited and we know not only why but also when it can be >> removed >> * Python-daemon is not upper-limited but it has a comment why it is >> "lower-limited" (which is pretty useless IMHO) >> * python-dateutil is upper-limited but we do not know why >> * python-nvd3 is also upper limited (~0.15.0 - limits it to any 0.15.x >> version but 0.16 could not be installed) >> >> I think there are many inconsistencies and the way we treat dependency >> limits is pretty inconsistent - we have no rules agreed. >> >> I would love to discuss how we can standardize it or at least set some >> rules we can follow. >> >> My initial thoughts are: >> >> * we can (but do not have to) have lower bounds if we need to protect and >> we know about some limitations, but we do not need to document those. Just >> set the limit. The nice thing about lower bounds that they do not "decay" >> over time. By default latest "eligible" version is installed by PIP (but of >> course if other deps have conflicting upper bounds, keeping lower limits >> when unnecessary is a problem. >> * by default we should not have upper-bounds. We know it limits our users >> and constraints + CI tests are nicely handling the scenario when things are >> breaking. >> * we should only introduce upper bounds if we know that there is breaking >> change (or that it is very likely and "foreseen" - betas/rc2/discussions >> about upcoming breaking changes in the dependencies >> * when we introduce upper-bounds we should always specify why and what is >> the condition to remove them >> >> WDYT? >> >> J. >> >> >> >>
