Any more comments here or in the PR?

On Sun, Feb 6, 2022 at 1:32 PM Jarek Potiuk <[email protected]> wrote:

> I do not think we reached consensus yet on those but I thought maybe just
> attempting it and showing how much more consistent and better it would
> look,
> I have attempted to describe the policies I had in mind and update our
> dependencies in
>
> https://github.com/apache/airflow/pull/21356
>
> Ash - I think you had doubts initially, maybe just reviewing and showing
> how it could look might be a good idea to clear up any doubts.
> Alternatively maybe we can iterate on the PR (and discussion below) if
> there is any counter-proposal (I think the current situation where the
> policy is basically "random" is not really a policy).
>
> As part of the PR I reviewed ALL our upper-bound limits (there were not
> too many as I suspected) and tried to either reverse-engineer where the
> limits are from (and document them) or left comment and TODO to attempt to
> remove the limit (those should be separate PRs and I am happy to attempt at
> least some of those as a follow-up). I also removed all upper limits where
> there are no ideas on what and when and whether at all the upper limits
> need to be applied because we are already at the latest version and there
> are no plans from the libraries to release releases with breaking changes
> (at least it's not obvious there are).
>
> Let me know what you think also - others who have not spoken yet - maybe
> looking at the PR it will be clearer what I had in mind.
>
> J.
>
>
> On Tue, Jan 25, 2022 at 10:54 AM Jarek Potiuk <[email protected]> wrote:
>
>> Actually - if we make a clear policy about what we agree clear policies
>> when we do upper-bound and when we don't -  I am super happy to write a
>> short documentation for Poetry and PIP tools users on exactly how they can
>> install airflow reliably (I can even add a small tool that will take our
>> constraints an generate the right installation configuration for
>> poetry/piptools.
>>
>> But without a clear policy on what our approach is, that might not be
>> straightforward.
>>
>> J,
>>
>>
>> On Tue, Jan 25, 2022 at 10:48 AM Jarek Potiuk <[email protected]> wrote:
>>
>>> > I am wondering if this will have a negative impact on installations
>>> using tools that do not support constraints files like poetry or
>>> pip-tools. These tools are not officially supported, but many users
>>> may use them nonetheless, so I think it is worth considering before
>>> making any changes that affect them. Unfortunately, I have too little
>>> experience with them to say at the same time how your proposal will
>>> affect it.
>>>
>>> I think this is not a change really, just clarification of our policies.
>>> I have not counted it but by rough look we already have some ~70% of our
>>> dependencies NOT upper bound already.
>>>
>>> We have no documentation, statement, nothing to explain - why. Why some
>>> dependencies are upper bound and some are not. Looks like it is pretty
>>> random now.
>>>
>>> So this is not a "change" but more
>>>
>>> * "statement on what our policy is" and clear guidelines on how to add
>>> future dependencies
>>> * possibly "review" the current set and make sure that our current
>>> approach follows the policy we define.
>>>
>>> Re: poetry, piptools - I think we cannot satisfy them if we want to keep
>>> our "both application and library" approach.
>>> Those tools made an opinionated (and in cases of many projects very
>>> reasonable BTW) decision that you are either "library" or "application" but
>>> not both - and they propose different approaches for "application" and
>>> "library". We are both. None of the solutions proposed by poetry and pip
>>> tools work for us. There is no way we can make them work well for us.
>>>
>>> I really like poetry for one, and I heartily recommend most of
>>> "standard" projects to use it as they have some really cool tooling and
>>> make it super easy to manage dependencies. We are just different because we
>>> are both application and library and poetry does not support that.
>>> I think at least - if we clarify which of our dependencies are
>>> "application" (and upper-bound them) it will give higher chances for the
>>> users that choose to use poetry/pip-tools to install "base" version of
>>> airflow - but also will give them a chance to know that they have to
>>> manually pin some of the non-application dependencies if they fail for
>>> them. Poetry and pip tools users can simply take our constraints and pin
>>> them manually when they are installing airflow - this is a solution that I
>>> mentioned many times to the users who raised their questions about it.
>>>
>>> J.
>>>
>>>
>>> On Tue, Jan 25, 2022 at 3:26 AM Kamil Breguła <[email protected]>
>>> wrote:
>>>
>>>> I am wondering if this will have a negative impact on installations
>>>> using tools that do not support constraints files like poetry or
>>>> pip-tools. These tools are not officially supported, but many users
>>>> may use them nonetheless, so I think it is worth considering before
>>>> making any changes that affect them. Unfortunately, I have too little
>>>> experience with them to say at the same time how your proposal will
>>>> affect it.
>>>>
>>>> pon., 24 sty 2022 o 23:08 Jarek Potiuk <[email protected]> napisał(a):
>>>> >
>>>> > Hey Ash (and everyone).
>>>> >
>>>> > I think we are really close to some consensus, but we are coming from
>>>> two directions. I do agree there will be some of the deps that are more
>>>> important to keep "in-check" than the rest.
>>>> > I think it makes perfect sense to distinguish the dependencies we
>>>> really "astrongly feel" are important for airflow vs. those that "are
>>>> important to our users".
>>>> > I am perfectly ok with upper-bounding deps that we really think are
>>>> important.
>>>> >
>>>> > But with >500 deps total for aifrflow (transitive) I think we should
>>>> have. shortlist of those we care only - and the rest should be upper-open
>>>> by default.
>>>> > And "short" is a good name, because those are the dependencies that
>>>> we take responsibility of manually upgrading when needed.
>>>> >
>>>> > I would be happy if we can come up wit the list that is really
>>>> important and "core" to airflow - and "really likely" to break things with
>>>> new major release. And those are the things we care more as an
>>>> "application" that our users care as "libraries". This is my subjective
>>>> list (those are all things taht we care :
>>>> >
>>>> > * sqlalchemy
>>>> > * alembic
>>>> > * flask (and all flask libs)
>>>> > * flask-app-builder
>>>> > * werkzeug  (yeah. not surprisingly - including last meetup I spoke
>>>> about Airflow, my previous presented mentioned that Werkzeug comes up as
>>>> "problematic case" and I was only able to confirm that :).
>>>> >
>>>> > + all those that we know break things (for example we know a bunch of
>>>> google deps <2.0.0 that we know are breaking things already - but we can
>>>> easily describe those)
>>>> >
>>>> > I think if we upper-bound those (and make appropriate comment in our
>>>> setup.py/cfg)  this would be pretty "good" setup. And having a
>>>> shortlist of those that we want to keep upper-bound makes sense and is
>>>> manageable.
>>>> >
>>>> > WDYT?
>>>> >
>>>> > J
>>>> >
>>>> >
>>>> > On Mon, Jan 24, 2022 at 10:30 PM Collin McNulty
>>>> <[email protected]> wrote:
>>>> >>
>>>> >> Ash,
>>>> >>
>>>> >> I think your code snippet will result in getting the latest version
>>>> of pandas even if there is an upper constraint on pandas in apache-airflow.
>>>> I simulated with these commands on the latest pip available.
>>>> >>
>>>> >> ```
>>>> >> pip install apache-airflow "pandas<1.4"
>>>> >> pip freeze | grep pandas
>>>> >> pip install -U pandas
>>>> >> pip freeze | grep pandas
>>>> >> ```
>>>> >>
>>>> >> Collin McNulty
>>>> >>
>>>> >> On Mon, Jan 24, 2022 at 3:11 PM Ash Berlin-Taylor <[email protected]>
>>>> wrote:
>>>> >>>
>>>> >>> To your inital thoughts:
>>>> >>> - lower bounds: yeah, introduce them when we need to, so a new
>>>> package could be added with no lower bound, and then a bound placed only
>>>> when we discover we need a minium version (include bug fix etc. PR author
>>>> can choose wether or not to have a lower bound initially.)
>>>> >>> - not have upper bound by default: see below
>>>> >>> - only introduce upper bound if breaking change: see below
>>>> >>> - specify reason for upper bound. yes, 100% to this
>>>> >>>
>>>> >>>
>>>> >>> I think you have created a strawman argument here to say that the
>>>> '<2.0'constraint is useless', as Panda's documentation[1] say they follow
>>>> SemVer so 2.0 is going to break _something_. It didn't help us with this
>>>> specific problem.
>>>> >>>
>>>> >>> As you mentioned (possibly only in slack) constraints protect
>>>> against initial install only, but not future in-place upgrades. Without an
>>>> upper constraint I could do
>>>> >>>
>>>> >>> ```
>>>> >>> pip install --constraint $FILE apache-airflow
>>>> >>> pip install -U pandas
>>>> >>> ```
>>>> >>>
>>>> >>> And then airflow could be broken. And being able to upgrade without
>>>> constraints files is the only way to apply a security upgrade to a module
>>>> (otherwise we'd have to update the constraint files for all supported
>>>> versions when ever any of our transitive dependencies has a security
>>>> update, not something we are in a position to do).
>>>> >>>
>>>> >>> I think it also depends how we use the dependency in Airflow --
>>>> some of them are so core that we don't want anything to break, but others
>>>> (such as Pandas in this case) where our use of them in Airflow is actually
>>>> fairly superficial.
>>>> >>>
>>>> >>> So I think I would just tone down your middle suggestions slightly
>>>> -- Upper limits dependencies can be optional. I think the only place we
>>>> really differ is what is "foreseen".
>>>> >>>
>>>> >>> -ash
>>>> >>>
>>>> >>> [1]: https://pandas.pydata.org/docs/development/policies.html
>>>> >>>
>>>> >>>
>>>> >>> On Sun, Jan 23 2022 at 17:35:04 +0100, Jarek Potiuk <
>>>> [email protected]> wrote:
>>>> >>>
>>>> >>> Just to illustrate it better. Here is an extremely good example
>>>> from today:
>>>> >>>
>>>> >>> Our main build started to fail today (
>>>> https://github.com/apache/airflow/runs/4913148486?check_suite_focus=true)
>>>> because Pandas released a new 1.4.0 version last  night:
>>>> https://pypi.org/project/pandas/1.4.0/
>>>> >>> The root cause of the failure was that Pandas 1.4.0 requires
>>>> SqlAlchemy 1.4.0 or above. But surprisingly -  there is no hard limit in
>>>> the released Pandas. What's more - Pandas does not even mention that it
>>>> **might** need sqlalchemy at all (and in version >=1.4.0 !).
>>>> >>>
>>>> >>> What happens is that in runtime, during running our tests we get
>>>> this error:
>>>> >>> `Pandas requires version '1.4.0' or newer of 'sqlalchemy' (version
>>>> '1.3.24' currently installed).`
>>>> >>>
>>>> >>> We already had a limitation (not documented why) on Pandas
>>>> >>>
>>>> >>> pandas_requirement = 'pandas>=0.17.1, <2.0'
>>>> >>>
>>>> >>> However this `<2.0` was completely useless in this case, because
>>>> it's 1.4 version that broke parts of Airflow.
>>>> >>> We assumed that the <2.0 limitation will protect us (apparently)
>>>> but it did not. However, the failure in tests protected our constraints
>>>> from being upgraded and in our "main" pandas is still 1.3.4. Our users who
>>>> follow "use constraints" and those who use Airflow images, are fully
>>>> protected against those kinds of problems.
>>>> >>>
>>>> >>> This particular problem will likely be solved in a few days when
>>>> Flask Application Builder 3.4.4rc1 (
>>>> https://pypi.org/project/Flask-AppBuilder/3.4.4rc1/|) will be released
>>>> (It moves the upper bound limit for sqlalchemy to <1.5 and that was the
>>>> only thing that blocked us from getting sqlalchemy 1.4).
>>>> >>>
>>>> >>> My point is that simply we do not know if any future version of any
>>>> dependency will break Airflow. And any reasonable assumptions about that
>>>> guessing from "Major" version is IMHO just wild guessing. And also by
>>>> adding such limits - we are limiting our users to update to higher version
>>>> of dependencies when it is harmless (very similarly as Flask Application
>>>> Builder blocked us in this case from migration to sqlalchemy 1.4).
>>>> >>>
>>>> >>> The temporary fix (until FAB 3.4.4 is released) is here:
>>>> >>>
>>>> >>> https://github.com/apache/airflow/pull/21045
>>>> >>>
>>>> >>> I believe after FAB 3.4.4 is released we remove the fix we should
>>>> just get rid of the pandas limitation. Constraints of ours protect our
>>>> users when they install Airflow "according to our instructions with
>>>> constraints" - which is the only official way of installing Airflow. We are
>>>> protecting our users from those kind of events, but at the same time we
>>>> should relax pretty much all the upper bounds to not block our users in the
>>>> future in case they need to move to higher versions of dependencies
>>>> released in the future that we have no idea if they will break, or not
>>>> Airflow..
>>>> >>>
>>>> >>> Let me know your thoughts - I think this Pandas case is great to
>>>> illustrate my point.
>>>> >>>
>>>> >>> J.
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> On Sun, Jan 23, 2022 at 1:40 AM Jarek Potiuk <[email protected]>
>>>> wrote:
>>>> >>>>
>>>> >>>> Hello everyone,
>>>> >>>>
>>>> >>>> TL;DR; I think there is one thing about dependencies that we
>>>> should have some agreement on to make some common approach. Namely about
>>>> using upper bounds on our dependencies.
>>>> >>>> Should we set some rules on when we set upper-bounds on the deps ?
>>>> What rules should they be?
>>>> >>>>
>>>> >>>> Currently we use constraints to make sure Airflow in specific
>>>> version can be installed using "golden" set of dependencies. Those are part
>>>> of our CI, automatically updated and tests which makes it really nice as
>>>> they are "fixed" for installation of particular version but they
>>>> continuously upgrade (even across major versions) when they pass tests in
>>>> CI.
>>>> >>>>
>>>> >>>> This all happens pretty much automatically by our CI, except the
>>>> cases where our dependencies are upper-limited. We occasionally do some
>>>> "setup.py", "setup.cfg" changes manually to bump some upper limits, but
>>>> this is more the result of some external request or "occasional" cleanup to
>>>> do it - rather than a regular process.
>>>> >>>>
>>>> >>>> Now, we have some dependencies that are upper-bound for good
>>>> reasons and they are documented. We also have a number of dependencies that
>>>> are not upper-limited. But I think this pretty inconsistent. Small excerpt
>>>> from setup.cfg:
>>>> >>>>
>>>> >>>>     # Logging is broken with itsdangerous > 2
>>>> >>>>     itsdangerous>=1.1.0, <2.0
>>>> >>>>     # Jinja2 3.1 will remove the 'autoescape' and 'with'
>>>> extensions, which would
>>>> >>>>     # break Flask 1.x, so we limit this for future compatibility.
>>>> Remove this
>>>> >>>>     # when bumping Flask to >=2.
>>>> >>>>     jinja2>=2.10.1,<3.1
>>>> >>>>     ...
>>>> >>>>     # python daemon crashes with 'socket operation on non-socket'
>>>> for python 3.8+ in version < 2.2.4
>>>> >>>>     # https://pagure.io/python-daemon/issue/34
>>>> >>>>     python-daemon>=2.2.4
>>>> >>>>     python-dateutil>=2.3, <3
>>>> >>>>     python-nvd3~=0.15.0
>>>> >>>>
>>>> >>>> * itsdangerous is upper limited and the reason is specified in the
>>>> comment (though we do not know when we could remove the limit)
>>>> >>>>
>>>> >>>> * Jinja is upper-limited and we know not only why but also when it
>>>> can be removed
>>>> >>>> * Python-daemon is not upper-limited but it has a comment why it
>>>> is "lower-limited" (which is pretty useless IMHO)
>>>> >>>> * python-dateutil is upper-limited but we do not know why
>>>> >>>> * python-nvd3 is also upper limited (~0.15.0 - limits it to any
>>>> 0.15.x version but 0.16 could not be installed)
>>>> >>>>
>>>> >>>> I think there are many inconsistencies and the way we treat
>>>> dependency limits is pretty inconsistent - we have no rules agreed.
>>>> >>>>
>>>> >>>> I would love to discuss how we can standardize it or at least set
>>>> some rules we can follow.
>>>> >>>>
>>>> >>>> My initial thoughts are:
>>>> >>>>
>>>> >>>> * we can (but do not have to) have lower bounds if we need to
>>>> protect and we know about some limitations, but we do not need to document
>>>> those. Just set the limit. The nice thing about lower bounds that they do
>>>> not "decay" over time. By default latest "eligible" version is installed by
>>>> PIP (but of course if other deps have conflicting upper bounds, keeping
>>>> lower limits when unnecessary is a problem.
>>>> >>>> * by default we should not have upper-bounds. We know it limits
>>>> our users and constraints + CI tests are nicely handling the scenario when
>>>> things are breaking.
>>>> >>>> * we should only introduce upper bounds if we know that there is
>>>> breaking change (or that it is very likely and "foreseen" -
>>>> betas/rc2/discussions about upcoming breaking changes in the dependencies
>>>> >>>> * when we introduce upper-bounds we should always specify why and
>>>> what is the condition to remove them
>>>> >>>>
>>>> >>>> WDYT?
>>>> >>>>
>>>> >>>> J.
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>>
>>>

Reply via email to