Follow up from today:

1) I optimised image building for CI a little more (but the gain is only a
single-digit percentage)
2) I also found a way to make uv work for PROD images. It was not working
so far because of our `--user` way of installing packages. For a long time
I have wanted to get rid of that, as it caused many problems, and I believe
I have finally found a good way to do it - with 100% backwards
compatibility, including some of the PythonVirtualenv cases. Previously I
could not use a virtualenv to install Airflow in our PROD image, but it
seems that a small trick with the right location of the venv in our image
does the job.

This one is a bit tricky, because we do not want to switch our users from
`pip` to `uv` for a long time yet. So while in CI most of the PROD images
will be built with `--use-uv` (to save time), there is a separate build and
set of tests that will run with `--no-use-uv`. Regular users will have to
use `--build-arg AIRFLOW_USE_UV` to switch to using uv to build the image.
Bonus point: even in `pip`-built images users will be able to use `uv` for
their installations (this is something our users are already asking for:
https://github.com/apache/airflow/issues/37785 - it seems that uv, like
ruff, is spreading like wildfire).
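
As a rough illustration of the opt-in described above, here is a minimal
Python sketch that assembles a `docker build` command line with and without
the `AIRFLOW_USE_UV` build arg. The value `true`, the image tag and the
helper name `prod_build_command` are assumptions made for this example, not
something taken from the PR.

```python
from typing import List


def prod_build_command(use_uv: bool, tag: str = "my-airflow-prod:dev") -> List[str]:
    """Return the argv for a PROD image build (hypothetical helper).

    Assumptions: the build arg accepts the string "true", and the tag
    is a made-up name used only for this sketch.
    """
    cmd = ["docker", "build", ".", "--tag", tag]
    if use_uv:
        # Opt in to uv; without this build arg the image is built with pip.
        cmd += ["--build-arg", "AIRFLOW_USE_UV=true"]
    return cmd


if __name__ == "__main__":
    print(" ".join(prod_build_command(use_uv=True)))
    print(" ".join(prod_build_command(use_uv=False)))
```

The sketch only builds the argument list and does not run anything, so it is
safe to try without Docker installed.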

The PR is here: https://github.com/apache/airflow/pull/37796. Overall it's
~55% faster to build a PROD image from scratch with uv than with pip on my
machine (2m vs 4m45s) - a percentage gain pretty consistent with the one
seen for the CI image. The end results are pretty much identical images (in
size and, it looks like, in content); they pass our PROD image tests and
airflow works as usual in them.
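
The percentage can be checked from the two timings quoted above. A small
Python sketch of the arithmetic (the function name is only illustrative):

```python
def time_saved_percent(old_seconds: float, new_seconds: float) -> float:
    """Percentage of build time saved when going from old to new."""
    return (old_seconds - new_seconds) / old_seconds * 100


pip_build = 4 * 60 + 45  # 4m45s with pip
uv_build = 2 * 60        # 2m with uv

saved = time_saved_percent(pip_build, uv_build)
speedup = pip_build / uv_build
print(f"time saved: {saved:.0f}%, speed-up: {speedup:.1f}x")
# prints "time saved: 58%, speed-up: 2.4x"
```

With these rounded timings the saving comes out at about 58%, in line with
the rough ~55% figure.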

J.

On Tue, Feb 27, 2024 at 7:58 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> One more update - I am still looking at it and fine-tuning stuff and will
> have a few more things coming
>
> I found out that we were still using `pip` for `pip constraints
> generation` (those are the constraints that our users use).
> I switched that one to `uv` and it now takes 30 seconds instead of more
> than 5 minutes - which is a more than 10x improvement.
>
> Plus - we get all-canonical `pypi` names back, because I also switched to
> `uv pip freeze`, and uv nicely canonicalizes all the constraints
> generated. I am also switching now, with
> https://github.com/apache/airflow/pull/37754, to the new 0.1.11 version,
> which has some bug fixes and new features. That PR also adds an upgrade
> check that will tell us when new versions of `pip` and `uv` are available
> (by failing the canary build job).
>
> J.
>
> On Tue, Feb 27, 2024 at 7:49 PM Oliveira, Niko <oniko...@amazon.com.invalid>
> wrote:
>
>> Fantastic results!
>>
>> > It also means that if you've been using breeze and were sometimes
>> > afraid to hit "y" to rebuild the image, being afraid that it will take
>> > 20 minutes or so - not any more. It should be WAY faster now.
>>
>> I'm very excited about this speed up as well as our CI :)
>>
>> ________________________________
>> From: Jarek Potiuk <ja...@potiuk.com>
>> Sent: Tuesday, February 27, 2024 2:44:14 AM
>> To: dev@airflow.apache.org
>> Subject: RE: [EXTERNAL] [COURRIEL EXTERNE] [DISCUSS] Considering trying
>> out uv for our CI workflows
>>
>> Summarising where we are:
>>
>> After ~24 hrs of operations, it looks really cool and fulfills (and
>> actually exceeds) all my expectations.
>>
>> * Multiple PRs succeeded, and we got quite a few constraints updated
>> automatically after successful canary runs:
>> https://github.com/apache/airflow/commits/constraints-main/ (and they
>> look perfectly fine - pretty much what I'd expect)
>> * I looked through a number of image builds in "canary" runs and the
>> regular 10-12 minute build-image jobs are down to 3-4 minutes
>> * I just did an experiment: on my machine I ran a complete from-scratch
>> CI image build with new dependencies for breeze (with `breeze ci
>> image build --python 3.9 --docker-cache disabled
>> --upgrade-to-newer-dependencies`) and compared it with the v2-8-test
>> branch, where we do not have the change applied yet
>>
>> Results (on my desktop machine: 16 cores, 1Gb network download and a very
>> fast disk):
>>
>> * v2-8-test: 730 s -> *12 minutes*
>> * main: 227 s -> less than *4 minutes (!)*
>>
>> That's 70% (!) faster. This is a complete full rebuild of the image,
>> including installing all dependencies from scratch and attempting to
>> upgrade them to the latest compatible versions. That is the WORST case.
>> Of course it will vary, depending on your network speed and number of
>> CPUs (unlike `pip`, at least for now, `uv` heavily uses parallelism -
>> both for downloads and installation - and that is one of the reasons why
>> the difference is so huge). I'd love to hear the results of such
>> comparisons from others with different machines/networking/disks, to get
>> a few more scientific data points.
>>
>> It also means that if you've been using breeze and were sometimes afraid
>> to hit "y" to rebuild the image, being afraid that it will take 20
>> minutes or so - not any more. It should be WAY faster now.
>>
>> I will also attempt to use `--resolution lowest` soon and try to see if
>> we can have a nice automation in place to bump our min-versions to the
>> "actually working" versions - for all our extras. That would be a major
>> win for our users - as there will never be a case in the future where
>> they upgrade airflow to a newer version and some old dependency remains
>> and is not compatible. It does not happen often,
>>
>> Seeing the speed difference - I am actually going to use `uv pip`
>> regularly for any local installation as well - it should save a LOT of
>> time - especially since, if you have multiple environments, it keeps a
>> single cache for all your installed packages (and their metadata). This
>> means that if you have several virtualenvs installed and switch between
>> them, the installation and reinstallation of packages between those
>> environments should be lightning fast (like single seconds rather than
>> 10s of seconds for the smallest installation). I'd heartily recommend it
>> to anyone.
>>
>> Let's see about the stability. I know there are a few edge cases that
>> are not handled well - Damian helpfully pointed to the
>> "apache-airflow[all]" case that is currently problematic, so I will keep
>> an eye on new versions and fixes (in our CI we are currently pinned to
>> 0.1.10 - so we are shielded from any potential stability problems and we
>> will need to manually upgrade to newer versions when they appear).
>>
>> J.
>>
>
