> I am wondering if this will have a negative impact on installations
using tools that do not support constraints files like poetry or
pip-tools. These tools are not officially supported, but many users
may use them nonetheless, so I think it is worth considering before
making any changes that affect them. Unfortunately, I have too little
experience with them to say at the same time how your proposal will
affect it.

I think this is not a change really, just clarification of our policies. I
have not counted it but by rough look we already have some ~70% of our
dependencies NOT upper bound already.

We have no documentation, statement, nothing to explain - why. Why some
dependencies are upper bound and some are not. Looks like it is pretty
random now.

So this is not a "change" but more

* "statement on what our policy is" and clear guidelines on how to add
future dependencies
* possibly "review" the current set and make sure that our current approach
follows the policy we define.

Re: poetry, piptools - I think we cannot satisfy them if we want to keep
our "both application and library" approach.
Those tools made an opinionated (and in cases of many projects very
reasonable BTW) decision that you are either "library" or "application" but
not both - and they propose different approaches for "application" and
"library". We are both. None of the solutions proposed by poetry and pip
tools work for us. There is no way we can make them work well for us.

I really like poetry for one, and I heartily recommend most of "standard"
projects to use it as they have some really cool tooling and make it super
easy to manage dependencies. We are just different because we are both
application and library and poetry does not support that.
I think at least - if we clarify which of our dependencies are
"application" (and upper-bound them) it will give higher chances for the
users that choose to use poetry/pip-tools to install "base" version of
airflow - but also will give them a chance to know that they have to
manually pin some of the non-application dependencies if they fail for
them. Poetry and pip tools users can simply take our constraints and pin
them manually when they are installing airflow - this is a solution that I
mentioned many times to the users who raised their questions about it.

J.


On Tue, Jan 25, 2022 at 3:26 AM Kamil Breguła <dzaku...@gmail.com> wrote:

> I am wondering if this will have a negative impact on installations
> using tools that do not support constraints files like poetry or
> pip-tools. These tools are not officially supported, but many users
> may use them nonetheless, so I think it is worth considering before
> making any changes that affect them. Unfortunately, I have too little
> experience with them to say at the same time how your proposal will
> affect it.
>
> pon., 24 sty 2022 o 23:08 Jarek Potiuk <ja...@potiuk.com> napisał(a):
> >
> > Hey Ash (and everyone).
> >
> > I think we are really close to some consensus, but we are coming from
> two directions. I do agree there will be some of the deps that are more
> important to keep "in-check" than the rest.
> > I think it makes perfect sense to distinguish the dependencies we really
> "astrongly feel" are important for airflow vs. those that "are important to
> our users".
> > I am perfectly ok with upper-bounding deps that we really think are
> important.
> >
> > But with >500 deps total for aifrflow (transitive) I think we should
> have. shortlist of those we care only - and the rest should be upper-open
> by default.
> > And "short" is a good name, because those are the dependencies that we
> take responsibility of manually upgrading when needed.
> >
> > I would be happy if we can come up wit the list that is really important
> and "core" to airflow - and "really likely" to break things with new major
> release. And those are the things we care more as an "application" that our
> users care as "libraries". This is my subjective list (those are all things
> taht we care :
> >
> > * sqlalchemy
> > * alembic
> > * flask (and all flask libs)
> > * flask-app-builder
> > * werkzeug  (yeah. not surprisingly - including last meetup I spoke
> about Airflow, my previous presented mentioned that Werkzeug comes up as
> "problematic case" and I was only able to confirm that :).
> >
> > + all those that we know break things (for example we know a bunch of
> google deps <2.0.0 that we know are breaking things already - but we can
> easily describe those)
> >
> > I think if we upper-bound those (and make appropriate comment in our
> setup.py/cfg)  this would be pretty "good" setup. And having a shortlist
> of those that we want to keep upper-bound makes sense and is manageable.
> >
> > WDYT?
> >
> > J
> >
> >
> > On Mon, Jan 24, 2022 at 10:30 PM Collin McNulty
> <col...@astronomer.io.invalid> wrote:
> >>
> >> Ash,
> >>
> >> I think your code snippet will result in getting the latest version of
> pandas even if there is an upper constraint on pandas in apache-airflow. I
> simulated with these commands on the latest pip available.
> >>
> >> ```
> >> pip install apache-airflow "pandas<1.4"
> >> pip freeze | grep pandas
> >> pip install -U pandas
> >> pip freeze | grep pandas
> >> ```
> >>
> >> Collin McNulty
> >>
> >> On Mon, Jan 24, 2022 at 3:11 PM Ash Berlin-Taylor <a...@apache.org>
> wrote:
> >>>
> >>> To your inital thoughts:
> >>> - lower bounds: yeah, introduce them when we need to, so a new package
> could be added with no lower bound, and then a bound placed only when we
> discover we need a minium version (include bug fix etc. PR author can
> choose wether or not to have a lower bound initially.)
> >>> - not have upper bound by default: see below
> >>> - only introduce upper bound if breaking change: see below
> >>> - specify reason for upper bound. yes, 100% to this
> >>>
> >>>
> >>> I think you have created a strawman argument here to say that the
> '<2.0'constraint is useless', as Panda's documentation[1] say they follow
> SemVer so 2.0 is going to break _something_. It didn't help us with this
> specific problem.
> >>>
> >>> As you mentioned (possibly only in slack) constraints protect against
> initial install only, but not future in-place upgrades. Without an upper
> constraint I could do
> >>>
> >>> ```
> >>> pip install --constraint $FILE apache-airflow
> >>> pip install -U pandas
> >>> ```
> >>>
> >>> And then airflow could be broken. And being able to upgrade without
> constraints files is the only way to apply a security upgrade to a module
> (otherwise we'd have to update the constraint files for all supported
> versions when ever any of our transitive dependencies has a security
> update, not something we are in a position to do).
> >>>
> >>> I think it also depends how we use the dependency in Airflow -- some
> of them are so core that we don't want anything to break, but others (such
> as Pandas in this case) where our use of them in Airflow is actually fairly
> superficial.
> >>>
> >>> So I think I would just tone down your middle suggestions slightly --
> Upper limits dependencies can be optional. I think the only place we really
> differ is what is "foreseen".
> >>>
> >>> -ash
> >>>
> >>> [1]: https://pandas.pydata.org/docs/development/policies.html
> >>>
> >>>
> >>> On Sun, Jan 23 2022 at 17:35:04 +0100, Jarek Potiuk <ja...@potiuk.com>
> wrote:
> >>>
> >>> Just to illustrate it better. Here is an extremely good example from
> today:
> >>>
> >>> Our main build started to fail today (
> https://github.com/apache/airflow/runs/4913148486?check_suite_focus=true)
> because Pandas released a new 1.4.0 version last  night:
> https://pypi.org/project/pandas/1.4.0/
> >>> The root cause of the failure was that Pandas 1.4.0 requires
> SqlAlchemy 1.4.0 or above. But surprisingly -  there is no hard limit in
> the released Pandas. What's more - Pandas does not even mention that it
> **might** need sqlalchemy at all (and in version >=1.4.0 !).
> >>>
> >>> What happens is that in runtime, during running our tests we get this
> error:
> >>> `Pandas requires version '1.4.0' or newer of 'sqlalchemy' (version
> '1.3.24' currently installed).`
> >>>
> >>> We already had a limitation (not documented why) on Pandas
> >>>
> >>> pandas_requirement = 'pandas>=0.17.1, <2.0'
> >>>
> >>> However this `<2.0` was completely useless in this case, because it's
> 1.4 version that broke parts of Airflow.
> >>> We assumed that the <2.0 limitation will protect us (apparently) but
> it did not. However, the failure in tests protected our constraints from
> being upgraded and in our "main" pandas is still 1.3.4. Our users who
> follow "use constraints" and those who use Airflow images, are fully
> protected against those kinds of problems.
> >>>
> >>> This particular problem will likely be solved in a few days when Flask
> Application Builder 3.4.4rc1 (
> https://pypi.org/project/Flask-AppBuilder/3.4.4rc1/|) will be released
> (It moves the upper bound limit for sqlalchemy to <1.5 and that was the
> only thing that blocked us from getting sqlalchemy 1.4).
> >>>
> >>> My point is that simply we do not know if any future version of any
> dependency will break Airflow. And any reasonable assumptions about that
> guessing from "Major" version is IMHO just wild guessing. And also by
> adding such limits - we are limiting our users to update to higher version
> of dependencies when it is harmless (very similarly as Flask Application
> Builder blocked us in this case from migration to sqlalchemy 1.4).
> >>>
> >>> The temporary fix (until FAB 3.4.4 is released) is here:
> >>>
> >>> https://github.com/apache/airflow/pull/21045
> >>>
> >>> I believe after FAB 3.4.4 is released we remove the fix we should just
> get rid of the pandas limitation. Constraints of ours protect our users
> when they install Airflow "according to our instructions with constraints"
> - which is the only official way of installing Airflow. We are protecting
> our users from those kind of events, but at the same time we should relax
> pretty much all the upper bounds to not block our users in the future in
> case they need to move to higher versions of dependencies released in the
> future that we have no idea if they will break, or not Airflow..
> >>>
> >>> Let me know your thoughts - I think this Pandas case is great to
> illustrate my point.
> >>>
> >>> J.
> >>>
> >>>
> >>>
> >>>
> >>> On Sun, Jan 23, 2022 at 1:40 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> >>>>
> >>>> Hello everyone,
> >>>>
> >>>> TL;DR; I think there is one thing about dependencies that we should
> have some agreement on to make some common approach. Namely about using
> upper bounds on our dependencies.
> >>>> Should we set some rules on when we set upper-bounds on the deps ?
> What rules should they be?
> >>>>
> >>>> Currently we use constraints to make sure Airflow in specific version
> can be installed using "golden" set of dependencies. Those are part of our
> CI, automatically updated and tests which makes it really nice as they are
> "fixed" for installation of particular version but they continuously
> upgrade (even across major versions) when they pass tests in CI.
> >>>>
> >>>> This all happens pretty much automatically by our CI, except the
> cases where our dependencies are upper-limited. We occasionally do some
> "setup.py", "setup.cfg" changes manually to bump some upper limits, but
> this is more the result of some external request or "occasional" cleanup to
> do it - rather than a regular process.
> >>>>
> >>>> Now, we have some dependencies that are upper-bound for good reasons
> and they are documented. We also have a number of dependencies that are not
> upper-limited. But I think this pretty inconsistent. Small excerpt from
> setup.cfg:
> >>>>
> >>>>     # Logging is broken with itsdangerous > 2
> >>>>     itsdangerous>=1.1.0, <2.0
> >>>>     # Jinja2 3.1 will remove the 'autoescape' and 'with' extensions,
> which would
> >>>>     # break Flask 1.x, so we limit this for future compatibility.
> Remove this
> >>>>     # when bumping Flask to >=2.
> >>>>     jinja2>=2.10.1,<3.1
> >>>>     ...
> >>>>     # python daemon crashes with 'socket operation on non-socket' for
> python 3.8+ in version < 2.2.4
> >>>>     # https://pagure.io/python-daemon/issue/34
> >>>>     python-daemon>=2.2.4
> >>>>     python-dateutil>=2.3, <3
> >>>>     python-nvd3~=0.15.0
> >>>>
> >>>> * itsdangerous is upper limited and the reason is specified in the
> comment (though we do not know when we could remove the limit)
> >>>>
> >>>> * Jinja is upper-limited and we know not only why but also when it
> can be removed
> >>>> * Python-daemon is not upper-limited but it has a comment why it is
> "lower-limited" (which is pretty useless IMHO)
> >>>> * python-dateutil is upper-limited but we do not know why
> >>>> * python-nvd3 is also upper limited (~0.15.0 - limits it to any
> 0.15.x version but 0.16 could not be installed)
> >>>>
> >>>> I think there are many inconsistencies and the way we treat
> dependency limits is pretty inconsistent - we have no rules agreed.
> >>>>
> >>>> I would love to discuss how we can standardize it or at least set
> some rules we can follow.
> >>>>
> >>>> My initial thoughts are:
> >>>>
> >>>> * we can (but do not have to) have lower bounds if we need to protect
> and we know about some limitations, but we do not need to document those.
> Just set the limit. The nice thing about lower bounds that they do not
> "decay" over time. By default latest "eligible" version is installed by PIP
> (but of course if other deps have conflicting upper bounds, keeping lower
> limits when unnecessary is a problem.
> >>>> * by default we should not have upper-bounds. We know it limits our
> users and constraints + CI tests are nicely handling the scenario when
> things are breaking.
> >>>> * we should only introduce upper bounds if we know that there is
> breaking change (or that it is very likely and "foreseen" -
> betas/rc2/discussions about upcoming breaking changes in the dependencies
> >>>> * when we introduce upper-bounds we should always specify why and
> what is the condition to remove them
> >>>>
> >>>> WDYT?
> >>>>
> >>>> J.
> >>>>
> >>>>
> >>>>
>

Reply via email to