Follow-up from today:

1) I optimised image building for CI a little more (but the gain is only a single-digit percentage).

2) I also found a way to make uv work for PROD images. It had not worked so far because of our `--user` way of installing packages. I have wanted to get rid of `--user` for a long time because it caused many problems, and I believe I have finally found a good way to do it, with 100% backwards compatibility, including with some of the PythonVirtualenv cases. Previously I could not use a virtualenv to install Airflow in our PROD image, but it seems that a small trick with the right location of the venv in our image does the job with 100% compatibility.

This one is a bit tricky, because we do not want (for a long time) to switch our users from `pip` to `uv`. So while most of the PROD images in CI will be built with `--use-uv` (to save time), there is a separate build and set of tests that will run with `--no-use-uv`. Regular users will have to use `--build-arg AIRFLOW_USE_UV` to switch to using uv to build the image.

Bonus point: even in `pip`-built images, users will be able to use `uv` for their own installations. This is something our users are already asking for (https://github.com/apache/airflow/issues/37785); it seems that uv, much like ruff, is spreading like fire.

The PR is here: https://github.com/apache/airflow/pull/37796

Overall it's ~55% faster to build a PROD image from scratch with uv than with pip on my machine (2m vs 4m45s), a percentage gain pretty consistent with the one for the CI image. The end results are pretty much identical images (in size and, it looks like, in content; they pass our PROD image tests and Airflow works as usual in them).

J.

On Tue, Feb 27, 2024 at 7:58 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> One more update - I am still looking at it and fine-tuning stuff and will
> have a few more things coming.
>
> I found out that we were still using `pip` for `pip constraints
> generation` (those are the constraints that our users use).
> I switched that one to `uv` and it's now 30 seconds instead of more than 5
> minutes - which is more than a 10x improvement.
>
> Plus - we get all-canonical `pypi` names back, because I also switched to
> `uv pip freeze` and uv nicely canonicalizes all the generated constraints.
> I am also switching now, with https://github.com/apache/airflow/pull/37754,
> to the new 0.1.11 version of uv, which has some bug fixes and new features.
> This PR also adds an upgrade check that will tell us when new versions of
> `pip` and `uv` are available (by failing the canary build job).
>
> J.
>
> On Tue, Feb 27, 2024 at 7:49 PM Oliveira, Niko <oniko...@amazon.com.invalid>
> wrote:
>
>> Fantastic results!
>>
>> > It also means that if you've been using breeze and were sometimes
>> > afraid to hit "y" to rebuild the image, being afraid that it will take
>> > 20 minutes or so - not any more. It should be WAY faster now.
>>
>> I'm very excited about this speed up as well as our CI :)
>>
>> ________________________________
>> From: Jarek Potiuk <ja...@potiuk.com>
>> Sent: Tuesday, February 27, 2024 2:44:14 AM
>> To: dev@airflow.apache.org
>> Subject: RE: [EXTERNAL] [COURRIEL EXTERNE] [DISCUSS] Considering trying
>> out uv for our CI workflows
>>
>> Summarising where we are:
>>
>> After ~24 hrs of operations, it looks really cool and fulfills (and
>> actually exceeds) all my expectations.
>>
>> * Multiple PRs succeeded, and we got quite a few constraints updated
>>   automatically after successful canary runs:
>>   https://github.com/apache/airflow/commits/constraints-main/ (and they
>>   look perfectly fine - pretty much what I'd expect)
>> * I looked through a number of image builds in "canary" runs and the
>>   regular 10-12 minute build-image jobs are down to 3-4 minutes
>> * I just did an experiment: on my machine I ran a complete from-scratch
>>   CI image build with new dependencies for breeze (with
>>   `breeze ci image build --python 3.9 --docker-cache disabled --upgrade-to-newer-dependencies`)
>>   and compared it with the v2-8-test branch, where we do not have the
>>   change applied yet
>>
>> Results (on my desktop machine: 16 cores, 1Gb network download and a very
>> fast disk):
>>
>> * v2-8-test: 730 s -> *12 minutes*
>> * main: 227 s -> less than *4 minutes (!)*
>>
>> That's 70% (!) faster. This is a complete full rebuild of the image,
>> including installing all dependencies from scratch and attempting to
>> upgrade them to the latest compatible versions. That is the WORST case.
>> Of course it will vary - depending on the network speed you have and the
>> number of CPUs (unlike `pip`, at least for now, `uv` heavily uses
>> parallelism - both for downloads and installation - and that is one of the
>> reasons why the difference is so huge). I'd love to hear the results of
>> such comparisons from others with different machines/networking/disks -
>> to get a few more scientific data points.
>>
>> It also means that if you've been using breeze and were sometimes afraid
>> to hit "y" to rebuild the image, being afraid that it will take 20 minutes
>> or so - not any more. It should be WAY faster now.
>>
>> I will also proceed to attempt to use `--resolution lowest` soon and
>> try to see if we can have a nice automation in place to bump our
>> min-versions to the "actually working" versions - for all our extras.
>> That would be a major win for our users - as there will never be a case
>> in the future that they upgrade airflow to a newer version and some old
>> dependency remains and is not compatible. It does not happen often,
>>
>> Seeing the speed difference - I am actually going now to regularly use
>> `uv pip` for any local installation as well - it should save a LOT of
>> time - especially since, if you have multiple environments, it keeps a
>> single cache for all your installed packages (and their metadata). This
>> means that if you have several virtualenvs installed and switch between
>> them, the installation and reinstallation of packages in those
>> virtualenvs should be lightning fast (like single seconds rather than 10s
>> of seconds for the smallest installation). I'd heartily recommend it to
>> anyone.
>>
>> Let's see about the stability. I know there are a few edge cases that are
>> not handled well - Damian helpfully pointed out the "apache-airflow[all]"
>> case that is currently problematic, so I will keep an eye on new versions
>> and fixes (in our CI we are currently pinned to 0.1.10 - so we are
>> shielded from any potential stability problems and we will need to
>> manually upgrade to newer versions when they appear).
>>
>> J.
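
For anyone who wants to compare their own numbers with the ones quoted in this thread, here is a minimal Python sketch of the arithmetic. The helper names (`time_saved_percent`, `speedup_factor`) and the labels are made up for illustration only; the inputs are just the timings reported in the messages above, and the 5 minutes for constraints generation is a lower bound, since the message says "more than 5 minutes".

```python
def time_saved_percent(before_s: float, after_s: float) -> float:
    """Share of the original wall-clock time that was eliminated, in percent."""
    return (before_s - after_s) / before_s * 100.0


def speedup_factor(before_s: float, after_s: float) -> float:
    """How many times faster the new run is compared with the old one."""
    return before_s / after_s


# (before, after) in seconds, taken from the timings reported in this thread.
timings = {
    "CI image, full rebuild (v2-8-test vs main)": (730, 227),
    "PROD image from scratch (pip vs uv)": (4 * 60 + 45, 2 * 60),
    # "more than 5 minutes" in the message, so 300 s is only a lower bound.
    "constraints generation (pip vs uv)": (5 * 60, 30),
}

for label, (before, after) in timings.items():
    saved = time_saved_percent(before, after)
    factor = speedup_factor(before, after)
    print(f"{label}: {saved:.0f}% less time, {factor:.1f}x faster")
```

With these inputs the three cases come out at roughly 69%, 58% and 90% less wall-clock time, which is in line with the rounded "70%", "~55%" and "more than 10x" figures in the messages. Replacing the tuples with your own measurements gives directly comparable data points.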