> That sounds like a really nice improvement :)

Thanks! It's "quite" useful and nice indeed.

The actual impact of "bad" dependency versions on our users is quite low,
especially since we have advocated constraints for years and a lot of people
follow that advice (and if they have problems we always direct them to
constraints, so it's not a "big" deal). That's why it was always "nice to
have" and never "first" priority. But from a "perfection" point of view it
is a nice improvement indeed, because we will now prevent a number of users
from struggling (and even having to fall back to constraints).

But this one will have a much bigger effect on the "ecosystem", especially
in the longer term. Airflow is a "dependency resolution hell". I have
discussed it with Damian Shaw a number of times (including over a beer in
NY :). He uses Airflow as a test-bed for some improvements he implements and
proposes to pip. Also, `uv` recently merged a performance test
case/benchmark based on Airflow dependency resolution:
https://github.com/astral-sh/uv/pull/3643 .
Having no lower bounds on our dependencies has caused multiple problems for
resolvers: pip and uv both struggle and sometimes backtrack a lot when
Airflow is being resolved, and that's precisely because of missing or far
too low lower bounds. With this change, resolving installations where
Airflow and providers are involved will become much more stable, faster and
more predictable, especially once we also attempt the next step, which is
limiting the old provider versions on newer versions of Airflow (as
suggested by Damian in https://github.com/apache/airflow/issues/39100 ).
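
To illustrate why a missing lower bound hurts, here is a minimal sketch. It
is not how pip or uv work internally; the release list is made up and the
helper names `parse` and `candidates` are hypothetical. It only shows that a
lower bound shrinks the set of versions a resolver may have to try, and that
a "lowest direct" pick is simply the oldest version still allowed.

```python
from __future__ import annotations


def parse(version: str) -> tuple[int, ...]:
    """Turn "1.34.0" into (1, 34, 0); handles plain numeric versions only."""
    return tuple(int(part) for part in version.split("."))


def candidates(available: list[str], lower_bound: str | None) -> list[str]:
    """Versions a resolver may still have to consider, oldest first."""
    ordered = sorted(available, key=parse)
    if lower_bound is None:
        return ordered
    return [v for v in ordered if parse(v) >= parse(lower_bound)]


# Hypothetical release history, not the real list of botocore releases.
available = ["1.29.0", "1.31.5", "1.33.0", "1.34.0", "1.34.7"]

unbounded = candidates(available, None)    # every release is a candidate
bounded = candidates(available, "1.34.0")  # only releases at or above the bound
lowest_direct = bounded[0]                 # oldest version the bound allows
```

Without the bound, the oldest release in the list is a legal choice, which
is exactly the situation the new test suite is meant to catch.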

We **might** use the lower bounds from editable dependencies as a "base"
for those limits proposed by Damian, but that's something we will take a
look at after we have a few releases of Airflow following the merge of this
PR and can see whether we need it at all (I have a feeling that just having
lower bounds in Airflow core will help in a number of cases).

J.

On Sat, Jun 1, 2024 at 1:55 PM Pierre Jeambrun <pierrejb...@gmail.com>
wrote:

> Great work! That sounds like a really nice improvement :)
>
> On Sat, Jun 1, 2024 at 10:48, Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > Hello everyone,
> >
> > TL;DR: I have finally got to something we planned when we switched to
> > uv. I have a green PR where we introduced automated management of
> > "lower-bounds" dependencies in Airflow and all providers (thanks to
> > uv's `--resolution lowest-direct` mechanism).
> >
> > The PR is here: https://github.com/apache/airflow/pull/39946 . It's
> > ready to review and merge (green).
> >
> > Thanks to Maciek nagging me on slack and helping with initial checks - I
> > managed to complete it before going to Community Over Code this week.
> >
> > At the end of the email I summarized changes in dependencies that were
> > needed to do it (so basically all the missing lower-bounds that the tests
> > helped to detect and fix).
> >
> > Once it is merged, our CI will run a special
> > "LowestDirectDependencyResolution" test suite that will fail if code
> > tested in a PR, in Airflow or any Provider, uses a feature that requires
> > adding a `>=` limit for a library version (a lower bound). This means
> > that we will finally have "proper" lower bounds for both Airflow and
> > Providers, and there will be no more cases where Airflow or any Provider
> > fails because someone has an old version of a library installed.
> >
> > For example, if you are using Bedrock (Amazon provider): our Amazon
> > provider had botocore>=1.33.0, but the tests found that Bedrock is only
> > available from 1.34, so we had to bump it. Once we merge the PR, such
> > cases will be detected automatically and you will have to fix them
> > before you merge your PRs.
> >
> > This works by downgrading dependencies to the lowest direct ones in the
> > special test suite before our unit tests are run. This is done for
> > Airflow tests and for each provider separately, so we are able to detect
> > missing lower bounds very accurately, separately for core Airflow and
> > for each provider.
> >
> > When the test fails in CI, it will be very easy to reproduce it locally
> > with Breeze. For example, if you work on the google provider and it
> > fails, you run this command:
> >
> > breeze shell --force-lowest-dependencies --test-type "Providers[google]"
> >
> > This will drop you into a Breeze shell, downgrade the google provider
> > dependencies to the lowest "direct" ones, and allow you to run pytest
> > tests there and fix the problem by manually installing newer dependency
> > versions and re-running the tests.
> >
> > Then you can iterate over tests, manually downgrading and upgrading
> > dependencies as you see fit. Eventually, when you figure out the minimum
> > bound, you just add it to provider.yaml, run pre-commit, and then
> > restart the command above to repeat the process.
> >
> > I've added detailed instructions on how to approach fixing "lowest
> > dependencies" problems, and when the tests fail in CI, you will be
> > directed to those instructions. I even described how to effectively use
> > bisecting to easily find the actual version of a dependency that needs
> > to be set in such cases.
> >
> > -------------------------------
> >
> > The list of dependency fixes:
> >
> > Airflow:
> >
> > -    "asgiref",
> > +    "asgiref>=2.3.0",
> > -    "connexion[flask]>=2.10.0,<3.0",
> > +    "connexion[flask]>=2.14.2,<3.0",
> > -    "cryptography>=39.0.0",
> > +    "cryptography>=41.0.0",
> > -    "flask-caching>=1.5.0",
> > +    "flask-caching>=2.0.0",
> > -    "flask-wtf>=0.15",
> > +    "flask-wtf>=1.1.0",
> > -    "flask>=2.2,<2.3",
> > +    "flask>=2.2.1,<2.3",
> > -    "httpx",
> > +    "httpx>=0.18.0",
> > -    "lazy-object-proxy",
> > +    "lazy-object-proxy>=1.2.0",
> > -    "opentelemetry-exporter-otlp",
> > -    "packaging>=14.0",
> > +    "opentelemetry-exporter-otlp>=1.15.0",
> > +    "packaging>=22.0",
> > -    "pluggy>=1.0",
> > -    "psutil>=4.2.0",
> > +    "pluggy>=1.5.0",
> > +    "psutil>=5.8.0",
> > -    "python-dateutil>=2.3",
> > +    "python-dateutil>=2.7.0",
> > +    "requests-toolbelt>=0.4.0",
> > -    "setproctitle>=1.1.8",
> > +    "setproctitle>=1.3.3",
> > -    "tenacity>=6.2.0,!=8.2.0",
> > +    "tenacity>=8.0.0,!=8.2.0",
> >
> > Providers:
> >
> > Amazon:
> >
> > -  - boto3>=1.33.0
> > -  - botocore>=1.33.0
> > +  - boto3>=1.34.0
> > +  - botocore>=1.34.0
> > -  - watchtower>=2.0.1,<4
> > +  - watchtower>=3.0.0,<4
> > -  - asgiref
> > +  - asgiref>=2.3.0
> > -  - jmespath
> > +  - jmespath>=0.7.0
> >
> > Amazon[aiobotocore]
> > -      - aiobotocore[boto3]>=2.5.3
> > +      - aiobotocore[boto3]>=2.10.0
> >
> > Apache Flink:
> > -  - cryptography>=2.0.0
> > +  - cryptography>=41.0.0
> >
> > Apache Hive:
> > -  - thrift>=0.9.2
> > +  - thrift>=0.11.0
> > +  - jmespath>=0.7.0
> >
> > Apache Kylin:
> > -  - kylinpy>=2.6
> > +  - kylinpy>=2.7.0
> >
> > Apache Spark:
> > -  - pyspark
> > +  - pyspark>=3.0.0
> >
> > CNCF Kubernetes:
> > -  - cryptography>=2.0.0
> > +  - cryptography>=41.0.0
> >
> > FAB:
> > -  - jmespath
> > +  - jmespath>=0.7.0
> >
> > Github:
> > -  - PyGithub!=1.58
> > +  - PyGithub>=2.1.1
> >
> > Google:
> > +  - dill>=0.2.3
> > -  - google-analytics-admin
> > +  - google-analytics-admin>=0.9.0
> > -  - google-cloud-bigquery<3.21.0,>=3.0.1
> > +  - google-cloud-bigquery<3.21.0,>=3.4.0
> > -  - google-cloud-run>=0.9.0
> > +  - google-cloud-run>=0.10.0
> > -  - httpx
> > +  - httpx>=0.18.0
> > -  - looker-sdk>=22.2.0
> > -  - pandas-gbq
> > +  - looker-sdk>=22.4.0
> > +  - pandas-gbq>=0.7.0
> > -  - PyOpenSSL
> > -  - python-slugify>=5.0
> > +  - python-slugify>=7.0.0
> > +  - PyOpenSSL>=23.0.0
> > +  - tenacity>=8.1.0
> >
> > Grpc:
> > -  - grpcio>=1.15.0
> > +  - grpcio>=1.38.0
> >
> > Microsoft Azure:
> > -  - azure-mgmt-cosmosdb
> > +  - azure-mgmt-cosmosdb>=3.0.0
> > -  - azure-storage-file-share
> > +  - azure-storage-file-share>=12.7.0
> > -  - azure-synapse-spark
> > +  - azure-synapse-spark>=0.2.0
> >
> > Mongo:
> >
> >  devel-dependencies:
> > -  - mongomock
> > +  - mongomock>=3.12.0
> >
> > MySQL:
> > -  - mysqlclient>=1.3.6
> > +  - mysqlclient>=1.4.0
> >
> > Odbc:
> > -  - pyodbc
> > +  - pyodbc>=4.0.24
> >
> > Pinecone:
> > -  - pinecone-client>=3.0.0
> > +  - pinecone-client>=3.1.0
> >
> > SFTP:
> > -  - paramiko>=2.8.0
> > +  - paramiko>=2.9.0
> >
> > SSH:
> > -  - paramiko>=2.6.0
> > +  - paramiko>=2.9.0
> >
> > Tableau:
> > -  - tableauserverclient
> > +  - tableauserverclient>=0.25
> >
> > Vertica:
> > -  - vertica-python>=0.5.1
> > +  - vertica-python>=0.6.0
> >
> > J.
> >
>
