Any more comments here or in the PR? On Sun, Feb 6, 2022 at 1:32 PM Jarek Potiuk <[email protected]> wrote:
> I do not think we reached consensus yet on those but I thought maybe just > attempting it and showing how much more consistent and better it would > look, > I have attempted to describe the policies I had in mind and update our > dependencies in > > https://github.com/apache/airflow/pull/21356 > > Ash - I think you had doubts initially, maybe just reviewing and showing > how it could look might be a good idea to clear up any doubts. > Alternatively maybe we can iterate on the PR (and discussion below) if > there is any counter-proposal (I think the current situation where the > policy is basically "random" is not really a policy). > > As part of the PR I reviewed ALL our upper-bound limits (there were not > too many as I suspected) and tried to either reverse-engineer where the > limits are from (and document them) or left comment and TODO to attempt to > remove the limit (those should be separate PRs and I am happy to attempt at > least some of those as a follow-up). I also removed all upper limits where > there are no ideas on what and when and whether at all the upper limits > need to be applied because we are already at the latest version and there > are no plans from the libraries to release releases with breaking changes > (at least it's not obvious there are). > > Let me know what you think also - others who have not spoken yet - maybe > looking at the PR it will be clearer what I had in mind. > > J. > > > On Tue, Jan 25, 2022 at 10:54 AM Jarek Potiuk <[email protected]> wrote: > >> Actually - if we make a clear policy about what we agree clear policies >> when we do upper-bound and when we don't - I am super happy to write a >> short documentation for Poetry and PIP tools users on exactly how they can >> install airflow reliably (I can even add a small tool that will take our >> constraints an generate the right installation configuration for >> poetry/piptools. >> >> But without a clear policy on what our approach is, that might not be >> straightforward. >> >> J, >> >> >> On Tue, Jan 25, 2022 at 10:48 AM Jarek Potiuk <[email protected]> wrote: >> >>> > I am wondering if this will have a negative impact on installations >>> using tools that do not support constraints files like poetry or >>> pip-tools. These tools are not officially supported, but many users >>> may use them nonetheless, so I think it is worth considering before >>> making any changes that affect them. Unfortunately, I have too little >>> experience with them to say at the same time how your proposal will >>> affect it. >>> >>> I think this is not a change really, just clarification of our policies. >>> I have not counted it but by rough look we already have some ~70% of our >>> dependencies NOT upper bound already. >>> >>> We have no documentation, statement, nothing to explain - why. Why some >>> dependencies are upper bound and some are not. Looks like it is pretty >>> random now. >>> >>> So this is not a "change" but more >>> >>> * "statement on what our policy is" and clear guidelines on how to add >>> future dependencies >>> * possibly "review" the current set and make sure that our current >>> approach follows the policy we define. >>> >>> Re: poetry, piptools - I think we cannot satisfy them if we want to keep >>> our "both application and library" approach. >>> Those tools made an opinionated (and in cases of many projects very >>> reasonable BTW) decision that you are either "library" or "application" but >>> not both - and they propose different approaches for "application" and >>> "library". We are both. None of the solutions proposed by poetry and pip >>> tools work for us. There is no way we can make them work well for us. >>> >>> I really like poetry for one, and I heartily recommend most of >>> "standard" projects to use it as they have some really cool tooling and >>> make it super easy to manage dependencies. We are just different because we >>> are both application and library and poetry does not support that. >>> I think at least - if we clarify which of our dependencies are >>> "application" (and upper-bound them) it will give higher chances for the >>> users that choose to use poetry/pip-tools to install "base" version of >>> airflow - but also will give them a chance to know that they have to >>> manually pin some of the non-application dependencies if they fail for >>> them. Poetry and pip tools users can simply take our constraints and pin >>> them manually when they are installing airflow - this is a solution that I >>> mentioned many times to the users who raised their questions about it. >>> >>> J. >>> >>> >>> On Tue, Jan 25, 2022 at 3:26 AM Kamil Breguła <[email protected]> >>> wrote: >>> >>>> I am wondering if this will have a negative impact on installations >>>> using tools that do not support constraints files like poetry or >>>> pip-tools. These tools are not officially supported, but many users >>>> may use them nonetheless, so I think it is worth considering before >>>> making any changes that affect them. Unfortunately, I have too little >>>> experience with them to say at the same time how your proposal will >>>> affect it. >>>> >>>> pon., 24 sty 2022 o 23:08 Jarek Potiuk <[email protected]> napisał(a): >>>> > >>>> > Hey Ash (and everyone). >>>> > >>>> > I think we are really close to some consensus, but we are coming from >>>> two directions. I do agree there will be some of the deps that are more >>>> important to keep "in-check" than the rest. >>>> > I think it makes perfect sense to distinguish the dependencies we >>>> really "astrongly feel" are important for airflow vs. those that "are >>>> important to our users". >>>> > I am perfectly ok with upper-bounding deps that we really think are >>>> important. >>>> > >>>> > But with >500 deps total for aifrflow (transitive) I think we should >>>> have. shortlist of those we care only - and the rest should be upper-open >>>> by default. >>>> > And "short" is a good name, because those are the dependencies that >>>> we take responsibility of manually upgrading when needed. >>>> > >>>> > I would be happy if we can come up wit the list that is really >>>> important and "core" to airflow - and "really likely" to break things with >>>> new major release. And those are the things we care more as an >>>> "application" that our users care as "libraries". This is my subjective >>>> list (those are all things taht we care : >>>> > >>>> > * sqlalchemy >>>> > * alembic >>>> > * flask (and all flask libs) >>>> > * flask-app-builder >>>> > * werkzeug (yeah. not surprisingly - including last meetup I spoke >>>> about Airflow, my previous presented mentioned that Werkzeug comes up as >>>> "problematic case" and I was only able to confirm that :). >>>> > >>>> > + all those that we know break things (for example we know a bunch of >>>> google deps <2.0.0 that we know are breaking things already - but we can >>>> easily describe those) >>>> > >>>> > I think if we upper-bound those (and make appropriate comment in our >>>> setup.py/cfg) this would be pretty "good" setup. And having a >>>> shortlist of those that we want to keep upper-bound makes sense and is >>>> manageable. >>>> > >>>> > WDYT? >>>> > >>>> > J >>>> > >>>> > >>>> > On Mon, Jan 24, 2022 at 10:30 PM Collin McNulty >>>> <[email protected]> wrote: >>>> >> >>>> >> Ash, >>>> >> >>>> >> I think your code snippet will result in getting the latest version >>>> of pandas even if there is an upper constraint on pandas in apache-airflow. >>>> I simulated with these commands on the latest pip available. >>>> >> >>>> >> ``` >>>> >> pip install apache-airflow "pandas<1.4" >>>> >> pip freeze | grep pandas >>>> >> pip install -U pandas >>>> >> pip freeze | grep pandas >>>> >> ``` >>>> >> >>>> >> Collin McNulty >>>> >> >>>> >> On Mon, Jan 24, 2022 at 3:11 PM Ash Berlin-Taylor <[email protected]> >>>> wrote: >>>> >>> >>>> >>> To your inital thoughts: >>>> >>> - lower bounds: yeah, introduce them when we need to, so a new >>>> package could be added with no lower bound, and then a bound placed only >>>> when we discover we need a minium version (include bug fix etc. PR author >>>> can choose wether or not to have a lower bound initially.) >>>> >>> - not have upper bound by default: see below >>>> >>> - only introduce upper bound if breaking change: see below >>>> >>> - specify reason for upper bound. yes, 100% to this >>>> >>> >>>> >>> >>>> >>> I think you have created a strawman argument here to say that the >>>> '<2.0'constraint is useless', as Panda's documentation[1] say they follow >>>> SemVer so 2.0 is going to break _something_. It didn't help us with this >>>> specific problem. >>>> >>> >>>> >>> As you mentioned (possibly only in slack) constraints protect >>>> against initial install only, but not future in-place upgrades. Without an >>>> upper constraint I could do >>>> >>> >>>> >>> ``` >>>> >>> pip install --constraint $FILE apache-airflow >>>> >>> pip install -U pandas >>>> >>> ``` >>>> >>> >>>> >>> And then airflow could be broken. And being able to upgrade without >>>> constraints files is the only way to apply a security upgrade to a module >>>> (otherwise we'd have to update the constraint files for all supported >>>> versions when ever any of our transitive dependencies has a security >>>> update, not something we are in a position to do). >>>> >>> >>>> >>> I think it also depends how we use the dependency in Airflow -- >>>> some of them are so core that we don't want anything to break, but others >>>> (such as Pandas in this case) where our use of them in Airflow is actually >>>> fairly superficial. >>>> >>> >>>> >>> So I think I would just tone down your middle suggestions slightly >>>> -- Upper limits dependencies can be optional. I think the only place we >>>> really differ is what is "foreseen". >>>> >>> >>>> >>> -ash >>>> >>> >>>> >>> [1]: https://pandas.pydata.org/docs/development/policies.html >>>> >>> >>>> >>> >>>> >>> On Sun, Jan 23 2022 at 17:35:04 +0100, Jarek Potiuk < >>>> [email protected]> wrote: >>>> >>> >>>> >>> Just to illustrate it better. Here is an extremely good example >>>> from today: >>>> >>> >>>> >>> Our main build started to fail today ( >>>> https://github.com/apache/airflow/runs/4913148486?check_suite_focus=true) >>>> because Pandas released a new 1.4.0 version last night: >>>> https://pypi.org/project/pandas/1.4.0/ >>>> >>> The root cause of the failure was that Pandas 1.4.0 requires >>>> SqlAlchemy 1.4.0 or above. But surprisingly - there is no hard limit in >>>> the released Pandas. What's more - Pandas does not even mention that it >>>> **might** need sqlalchemy at all (and in version >=1.4.0 !). >>>> >>> >>>> >>> What happens is that in runtime, during running our tests we get >>>> this error: >>>> >>> `Pandas requires version '1.4.0' or newer of 'sqlalchemy' (version >>>> '1.3.24' currently installed).` >>>> >>> >>>> >>> We already had a limitation (not documented why) on Pandas >>>> >>> >>>> >>> pandas_requirement = 'pandas>=0.17.1, <2.0' >>>> >>> >>>> >>> However this `<2.0` was completely useless in this case, because >>>> it's 1.4 version that broke parts of Airflow. >>>> >>> We assumed that the <2.0 limitation will protect us (apparently) >>>> but it did not. However, the failure in tests protected our constraints >>>> from being upgraded and in our "main" pandas is still 1.3.4. Our users who >>>> follow "use constraints" and those who use Airflow images, are fully >>>> protected against those kinds of problems. >>>> >>> >>>> >>> This particular problem will likely be solved in a few days when >>>> Flask Application Builder 3.4.4rc1 ( >>>> https://pypi.org/project/Flask-AppBuilder/3.4.4rc1/|) will be released >>>> (It moves the upper bound limit for sqlalchemy to <1.5 and that was the >>>> only thing that blocked us from getting sqlalchemy 1.4). >>>> >>> >>>> >>> My point is that simply we do not know if any future version of any >>>> dependency will break Airflow. And any reasonable assumptions about that >>>> guessing from "Major" version is IMHO just wild guessing. And also by >>>> adding such limits - we are limiting our users to update to higher version >>>> of dependencies when it is harmless (very similarly as Flask Application >>>> Builder blocked us in this case from migration to sqlalchemy 1.4). >>>> >>> >>>> >>> The temporary fix (until FAB 3.4.4 is released) is here: >>>> >>> >>>> >>> https://github.com/apache/airflow/pull/21045 >>>> >>> >>>> >>> I believe after FAB 3.4.4 is released we remove the fix we should >>>> just get rid of the pandas limitation. Constraints of ours protect our >>>> users when they install Airflow "according to our instructions with >>>> constraints" - which is the only official way of installing Airflow. We are >>>> protecting our users from those kind of events, but at the same time we >>>> should relax pretty much all the upper bounds to not block our users in the >>>> future in case they need to move to higher versions of dependencies >>>> released in the future that we have no idea if they will break, or not >>>> Airflow.. >>>> >>> >>>> >>> Let me know your thoughts - I think this Pandas case is great to >>>> illustrate my point. >>>> >>> >>>> >>> J. >>>> >>> >>>> >>> >>>> >>> >>>> >>> >>>> >>> On Sun, Jan 23, 2022 at 1:40 AM Jarek Potiuk <[email protected]> >>>> wrote: >>>> >>>> >>>> >>>> Hello everyone, >>>> >>>> >>>> >>>> TL;DR; I think there is one thing about dependencies that we >>>> should have some agreement on to make some common approach. Namely about >>>> using upper bounds on our dependencies. >>>> >>>> Should we set some rules on when we set upper-bounds on the deps ? >>>> What rules should they be? >>>> >>>> >>>> >>>> Currently we use constraints to make sure Airflow in specific >>>> version can be installed using "golden" set of dependencies. Those are part >>>> of our CI, automatically updated and tests which makes it really nice as >>>> they are "fixed" for installation of particular version but they >>>> continuously upgrade (even across major versions) when they pass tests in >>>> CI. >>>> >>>> >>>> >>>> This all happens pretty much automatically by our CI, except the >>>> cases where our dependencies are upper-limited. We occasionally do some >>>> "setup.py", "setup.cfg" changes manually to bump some upper limits, but >>>> this is more the result of some external request or "occasional" cleanup to >>>> do it - rather than a regular process. >>>> >>>> >>>> >>>> Now, we have some dependencies that are upper-bound for good >>>> reasons and they are documented. We also have a number of dependencies that >>>> are not upper-limited. But I think this pretty inconsistent. Small excerpt >>>> from setup.cfg: >>>> >>>> >>>> >>>> # Logging is broken with itsdangerous > 2 >>>> >>>> itsdangerous>=1.1.0, <2.0 >>>> >>>> # Jinja2 3.1 will remove the 'autoescape' and 'with' >>>> extensions, which would >>>> >>>> # break Flask 1.x, so we limit this for future compatibility. >>>> Remove this >>>> >>>> # when bumping Flask to >=2. >>>> >>>> jinja2>=2.10.1,<3.1 >>>> >>>> ... >>>> >>>> # python daemon crashes with 'socket operation on non-socket' >>>> for python 3.8+ in version < 2.2.4 >>>> >>>> # https://pagure.io/python-daemon/issue/34 >>>> >>>> python-daemon>=2.2.4 >>>> >>>> python-dateutil>=2.3, <3 >>>> >>>> python-nvd3~=0.15.0 >>>> >>>> >>>> >>>> * itsdangerous is upper limited and the reason is specified in the >>>> comment (though we do not know when we could remove the limit) >>>> >>>> >>>> >>>> * Jinja is upper-limited and we know not only why but also when it >>>> can be removed >>>> >>>> * Python-daemon is not upper-limited but it has a comment why it >>>> is "lower-limited" (which is pretty useless IMHO) >>>> >>>> * python-dateutil is upper-limited but we do not know why >>>> >>>> * python-nvd3 is also upper limited (~0.15.0 - limits it to any >>>> 0.15.x version but 0.16 could not be installed) >>>> >>>> >>>> >>>> I think there are many inconsistencies and the way we treat >>>> dependency limits is pretty inconsistent - we have no rules agreed. >>>> >>>> >>>> >>>> I would love to discuss how we can standardize it or at least set >>>> some rules we can follow. >>>> >>>> >>>> >>>> My initial thoughts are: >>>> >>>> >>>> >>>> * we can (but do not have to) have lower bounds if we need to >>>> protect and we know about some limitations, but we do not need to document >>>> those. Just set the limit. The nice thing about lower bounds that they do >>>> not "decay" over time. By default latest "eligible" version is installed by >>>> PIP (but of course if other deps have conflicting upper bounds, keeping >>>> lower limits when unnecessary is a problem. >>>> >>>> * by default we should not have upper-bounds. We know it limits >>>> our users and constraints + CI tests are nicely handling the scenario when >>>> things are breaking. >>>> >>>> * we should only introduce upper bounds if we know that there is >>>> breaking change (or that it is very likely and "foreseen" - >>>> betas/rc2/discussions about upcoming breaking changes in the dependencies >>>> >>>> * when we introduce upper-bounds we should always specify why and >>>> what is the condition to remove them >>>> >>>> >>>> >>>> WDYT? >>>> >>>> >>>> >>>> J. >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>
