Thanks for that work - I'm sure it will find many bugs before they are
found by users :)

On Sat, 1 Jun 2024 at 18:52, Jarek Potiuk <ja...@potiuk.com> wrote:

> > That sounds like a really nice improvement :)
>
> Thanks ! It's "quite" useful and nice indeed.
>
> The actual impact of "bad" dependency versions on our users is quite low,
> especially since we have advocated constraints for years and a lot of
> people follow that advice (and if they have problems, we always direct them
> to constraints, so it's not a "big" deal). That's why it was always a "nice
> to have" and never a "first" priority. From a "perfection" point of view it
> is a nice improvement indeed, because it will spare a number of users from
> struggling (and even from having to fall back to constraints).
>
> But this one will have a much bigger effect on the "ecosystem" - especially
> longer term. Airflow is a "dependency resolution hell". I discussed it with
> Damian Shaw a number of times (including over a beer in NY :). He is using
> Airflow as a test-bed for some improvements he implements and proposes to
> pip - and `uv` recently merged a performance test case/benchmark based on
> airflow dependency resolution: https://github.com/astral-sh/uv/pull/3643 .
> Having no lower bindings in our dependencies caused multiple problems for
> resolvers - pip and uv both struggle and sometimes backtrack a lot when
> airflow is being resolved. And that's precisely because of the lack of a
> lower binding, or a binding that is far too low. With this change,
> resolving an installation where airflow and providers are involved will
> become much more stable, faster and more predictable - especially when we
> also attempt the next step, which is to somewhat limit the old provider
> versions on newer versions of Airflow (as suggested by Damian in
> https://github.com/apache/airflow/issues/39100 ).
>
> We **might** use the lower bindings from editable dependencies as a "base"
> for the limits proposed by Damian, but that's something we will look at
> once we have a few releases of airflow after this PR is merged and can see
> whether we need it at all (I have a feeling that just having lower bindings
> in Airflow core will help in a number of cases).
>
> J.
>
> On Sat, Jun 1, 2024 at 1:55 PM Pierre Jeambrun <pierrejb...@gmail.com>
> wrote:
>
> > Great work! That sounds like a really nice improvement :)
> >
> > On Sat, 1 Jun 2024 at 10:48, Jarek Potiuk <ja...@potiuk.com> wrote:
> >
> > > Hello everyone,
> > >
> > > TL;DR: I have finally got to something we planned when we switched to
> > > uv: I have a green PR that introduces automated management of
> > > "lower-bounds" dependencies in Airflow and all providers (thanks to uv's
> > > `--resolution lowest-direct` mechanism).
> > >
> > > The PR is here: https://github.com/apache/airflow/pull/39946 . It's
> > > ready to review and merge (green).
> > >
> > > Thanks to Maciek nagging me on Slack and helping with the initial checks,
> > > I managed to complete it before going to Community Over Code this week.
> > >
> > > At the end of this email I have summarized the dependency changes that
> > > were needed to do it (basically all the missing lower bounds that the
> > > tests helped to detect and fix).
> > >
> > > Once it is merged, our CI will run a special
> > > "LowestDirectDependencyResolution" test suite that will fail if the
> > > tests run in a PR show that Airflow or any Provider uses a feature that
> > > requires adding a `>=` limit for a library version (a lower binding).
> > > This means that we will finally have "proper" lower bindings for both
> > > Airflow and Providers, and there will be no more cases where Airflow or a
> > > Provider fails because someone has an old version of a library
> > > installed.
> > >
> > > For example, if you are using Bedrock (Amazon provider): our Amazon
> > > provider had botocore>=1.33.0, but the tests found that Bedrock is only
> > > available in 1.34, so we had to bump it. Once we merge the PR, such cases
> > > will be detected automatically and you will have to fix them before you
> > > merge your PRs.
> > >
> > > It works like this: in the special test suite, dependencies are
> > > downgraded to the lowest direct ones before our unit tests are run. This
> > > is done for Airflow tests and for each provider separately, so we are
> > > able to detect missing lower bounds very accurately - separately for core
> > > Airflow and separately for each provider.
> > >
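To illustrate what "lowest direct" resolution means, here is a minimal Python sketch. It is not how uv implements it: the helper names (`satisfies`, `lowest_direct`) and the list of available releases are made up for illustration, and only plain numeric versions are handled (real tools follow PEP 440, for example via the `packaging` library).

```python
import operator
import re

# Comparison operators supported by this simplified sketch.
OPS = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, ">": operator.gt, "<": operator.lt,
}


def parse_version(text):
    """Turn '1.34.0' into (1, 34, 0). Plain numeric versions only."""
    return tuple(int(part) for part in text.split("."))


def compare(op, left, right):
    """Compare two version tuples after padding them to equal length."""
    width = max(len(left), len(right))
    left += (0,) * (width - len(left))
    right += (0,) * (width - len(right))
    return OPS[op](left, right)


def satisfies(version, specifier):
    """Check a version against a specifier such as '>=2.2.1,<2.3'."""
    parsed = parse_version(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        if not clause:
            continue  # an empty specifier means "any version"
        match = re.fullmatch(r"(>=|<=|==|!=|>|<)\s*(\d+(?:\.\d+)*)", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        if not compare(match.group(1), parsed, parse_version(match.group(2))):
            return False
    return True


def lowest_direct(available, specifier):
    """Pick the oldest available version that the specifier allows."""
    candidates = [v for v in available if satisfies(v, specifier)]
    if not candidates:
        raise LookupError(f"no version satisfies {specifier!r}")
    return min(candidates, key=parse_version)


# Hypothetical release list, for illustration only.
HYPOTHETICAL_RELEASES = ["0.27.0", "0.7.0", "0.18.0", "0.23.3"]

# With no lower bound the oldest release is chosen; with a bound, the bound wins.
unbounded_choice = lowest_direct(HYPOTHETICAL_RELEASES, "")
bounded_choice = lowest_direct(HYPOTHETICAL_RELEASES, ">=0.18.0")
```

A dependency declared without a lower bound resolves to its oldest release under this strategy, which is what makes a missing `>=` limit visible when the unit tests then run against it.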
> > > When the test fails in CI, it will be very easy to reproduce it locally
> > > with Breeze. For example, if you work on the google provider and it
> > > fails, you run this command:
> > >
> > > breeze shell --force-lowest-dependencies --test-type "Providers[google]"
> > >
> > > This will drop you into a Breeze shell with the google provider
> > > dependencies downgraded to the lowest "direct" ones, where you can run
> > > pytest tests and fix the problem by manually installing newer dependency
> > > versions and re-running the tests.
> > >
> > > Then you can iterate over tests, manually downgrading and upgrading
> > > dependencies as you see fit. Once you have figured out the minimum
> > > binding, you add it to provider.yaml, run pre-commit, and re-run the
> > > command above to verify the fix.
> > >
> > > I've added detailed instructions on how to approach fixing "lowest
> > > dependencies" problems, and when the tests fail in CI, you will be
> > > directed to those instructions. I even described how to use bisecting
> > > effectively to find the actual version of a dependency that needs to be
> > > set in such cases.
> > >
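The bisecting idea mentioned above can be sketched roughly as follows. This is only an illustration, not the procedure from the Airflow instructions: `find_lowest_passing` and `tests_pass` are hypothetical names, the version list is made up, and the sketch assumes that once the tests pass for some version they also pass for every newer one.

```python
def find_lowest_passing(versions, tests_pass):
    """Return the oldest version for which tests_pass(version) is true.

    `versions` must be sorted from oldest to newest. Assumes monotonic
    behaviour: if a version passes, all newer versions pass as well.
    Returns None when even the newest version fails.
    """
    if not versions or not tests_pass(versions[-1]):
        return None
    low, high = 0, len(versions) - 1  # invariant: versions[high] passes
    while low < high:
        mid = (low + high) // 2
        if tests_pass(versions[mid]):
            high = mid       # mid passes, the answer is mid or older
        else:
            low = mid + 1    # mid fails, the answer must be newer
    return versions[low]


# Hypothetical release list, for illustration only.
HYPOTHETICAL_VERSIONS = ["1.31.0", "1.32.0", "1.33.0", "1.34.0", "1.35.0"]


def fake_tests_pass(version):
    """Stand-in for 'install this version and run pytest' (assumption)."""
    major, minor, _patch = (int(part) for part in version.split("."))
    return (major, minor) >= (1, 34)


minimum = find_lowest_passing(HYPOTHETICAL_VERSIONS, fake_tests_pass)
```

In practice `tests_pass` would install the candidate version in the Breeze shell and run the failing tests, so each probe is expensive; bisecting keeps the number of probes logarithmic in the number of releases.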
> > > -------------------------------
> > >
> > > The list of dependency fixes:
> > >
> > > Airflow:
> > >
> > > -    "asgiref",
> > > +    "asgiref>=2.3.0",
> > > -    "connexion[flask]>=2.10.0,<3.0",
> > > +    "connexion[flask]>=2.14.2,<3.0",
> > > -    "cryptography>=39.0.0",
> > > +    "cryptography>=41.0.0",
> > > -    "flask-caching>=1.5.0",
> > > +    "flask-caching>=2.0.0",
> > > -    "flask-wtf>=0.15",
> > > +    "flask-wtf>=1.1.0",
> > > -    "flask>=2.2,<2.3",
> > > +    "flask>=2.2.1,<2.3",
> > > -    "httpx",
> > > +    "httpx>=0.18.0",
> > > -    "lazy-object-proxy",
> > > +    "lazy-object-proxy>=1.2.0",
> > > -    "opentelemetry-exporter-otlp",
> > > -    "packaging>=14.0",
> > > +    "opentelemetry-exporter-otlp>=1.15.0",
> > > +    "packaging>=22.0",
> > > -    "pluggy>=1.0",
> > > -    "psutil>=4.2.0",
> > > +    "pluggy>=1.5.0",
> > > +    "psutil>=5.8.0",
> > > -    "python-dateutil>=2.3",
> > > +    "python-dateutil>=2.7.0",
> > > +    "requests-toolbelt>=0.4.0",
> > > -    "setproctitle>=1.1.8",
> > > +    "setproctitle>=1.3.3",
> > > -    "tenacity>=6.2.0,!=8.2.0",
> > > +    "tenacity>=8.0.0,!=8.2.0",
> > >
> > > Providers:
> > >
> > > Amazon:
> > >
> > > -  - boto3>=1.33.0
> > > -  - botocore>=1.33.0
> > > +  - boto3>=1.34.0
> > > +  - botocore>=1.34.0
> > > -  - watchtower>=2.0.1,<4
> > > +  - watchtower>=3.0.0,<4
> > > -  - asgiref
> > > +  - asgiref>=2.3.0
> > > -  - jmespath
> > > +  - jmespath>=0.7.0
> > >
> > > Amazon[aiobotocore]
> > > -      - aiobotocore[boto3]>=2.5.3
> > > +      - aiobotocore[boto3]>=2.10.0
> > >
> > > Apache Flink:
> > > -  - cryptography>=2.0.0
> > > +  - cryptography>=41.0.0
> > >
> > > Apache Hive:
> > > -  - thrift>=0.9.2
> > > +  - thrift>=0.11.0
> > > +  - jmespath>=0.7.0
> > >
> > > Apache Kylin:
> > > -  - kylinpy>=2.6
> > > +  - kylinpy>=2.7.0
> > >
> > > Apache Spark:
> > > -  - pyspark
> > > +  - pyspark>=3.0.0
> > >
> > > CNCF Kubernetes:
> > > -  - cryptography>=2.0.0
> > > +  - cryptography>=41.0.0
> > >
> > > FAB:
> > > -  - jmespath
> > > +  - jmespath>=0.7.0
> > >
> > > Github:
> > > -  - PyGithub!=1.58
> > > +  - PyGithub>=2.1.1
> > >
> > > Google:
> > > +  - dill>=0.2.3
> > > -  - google-analytics-admin
> > > +  - google-analytics-admin>=0.9.0
> > > -  - google-cloud-bigquery<3.21.0,>=3.0.1
> > > +  - google-cloud-bigquery<3.21.0,>=3.4.0
> > > -  - google-cloud-run>=0.9.0
> > > +  - google-cloud-run>=0.10.0
> > > -  - httpx
> > > +  - httpx>=0.18.0
> > > -  - looker-sdk>=22.2.0
> > > -  - pandas-gbq
> > > +  - looker-sdk>=22.4.0
> > > +  - pandas-gbq>=0.7.0
> > > -  - PyOpenSSL
> > > -  - python-slugify>=5.0
> > > +  - python-slugify>=7.0.0
> > > +  - PyOpenSSL>=23.0.0
> > > +  - tenacity>=8.1.0
> > >
> > > Grpc:
> > > -  - grpcio>=1.15.0
> > > +  - grpcio>=1.38.0
> > >
> > > Microsoft Azure:
> > > -  - azure-mgmt-cosmosdb
> > > +  - azure-mgmt-cosmosdb>=3.0.0
> > > -  - azure-storage-file-share
> > > +  - azure-storage-file-share>=12.7.0
> > > -  - azure-synapse-spark
> > > +  - azure-synapse-spark>=0.2.0
> > >
> > > Mongo:
> > >
> > >  devel-dependencies:
> > > -  - mongomock
> > > +  - mongomock>=3.12.0
> > >
> > > MySQL:
> > > -  - mysqlclient>=1.3.6
> > > +  - mysqlclient>=1.4.0
> > >
> > > Odbc:
> > > -  - pyodbc
> > > +  - pyodbc>=4.0.24
> > >
> > > Pinecone:
> > > -  - pinecone-client>=3.0.0
> > > +  - pinecone-client>=3.1.0
> > >
> > > SFTP:
> > > -  - paramiko>=2.8.0
> > > +  - paramiko>=2.9.0
> > >
> > > SSH:
> > > -  - paramiko>=2.6.0
> > > +  - paramiko>=2.9.0
> > >
> > > Tableau:
> > > -  - tableauserverclient
> > > +  - tableauserverclient>=0.25
> > >
> > > Vertica:
> > > -  - vertica-python>=0.5.1
> > > +  - vertica-python>=0.6.0
> > >
> > > J.
> > >
> >
>
