Some more findings. Overall, I can confirm that with `uv` we will get significant savings - 60-70% - on image build times. This will impact both CI and local `breeze` rebuilds.
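
As a rough sanity check of those percentages, here is a minimal Python sketch that recomputes the savings from the estimates quoted further down in this thread. The helper name `saving_percent` is made up for illustration, and the uv figure for dependency updates is taken as 3m 30s, which is my reading of the estimate in the earlier message.

```python
def saving_percent(before_s: float, after_s: float) -> float:
    """Return the build-time saving as a percentage of the original time."""
    if before_s <= 0:
        raise ValueError("before_s must be positive")
    return 100.0 * (before_s - after_s) / before_s


# (scenario, pip seconds, uv seconds) - estimates quoted in this thread,
# not fresh measurements
SCENARIOS = [
    ("update dependencies", 8 * 60, 3 * 60 + 30),
    ("update dependencies, cache hit", 8 * 60, 2.5 * 60),
    ("full re-resolve and reinstall", 27 * 60, 9 * 60),
]

for name, pip_s, uv_s in SCENARIOS:
    print(f"{name}: {saving_percent(pip_s, uv_s):.0f}% saved")
```

With those inputs the three scenarios come out at roughly 56%, 69% and 67%, which is in line with the 60-70% range.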

I am getting closer to a mergeable state. I switched to
https://github.com/apache/airflow/pull/37692 to test the "upgrade to latest
dependencies" workflow and the canary build impact. The PR is getting
greener and greener. I have a few last things to address.

An interesting story is that a flaky CLI test
(tests/cli/commands/test_webserver_command.py::TestCliWebServer::test_cli_webserver_background)
is suddenly significantly more flaky, so I will have to take a look at how
to finally remove the flakiness from it. This is a good thing: the test has
been flaky for quite a while but was very difficult to reproduce, and for
some reason it is now much easier to reproduce (which also means we will
know when we fix it).

Looking at the stats, it seems that a lot (but not all) of the speed
improvement might come from parallel downloading of dependencies, which is
also in the works for pip (https://github.com/pypa/pip/pull/12388) - though
it's not clear how much it will help, as the batch downloader in pip is
involved only after resolution. We will see whether it changes things once
it is implemented.

I am also now switching PROD builds to use uv to see how much we can save,
but I am leaving `pip` as the default for releases and users. The only
difference is CI - I've added a separate step for the `pip` PROD build to
compare and to make sure it's running fine in CI.

The numbers:

* For the "upgrade to newer dependencies" scenario, uv is WAY faster - as I
  thought. In the "current" state of main it is: ~7m pip, 5s (!) uv. Here
  the caching in uv makes a huge difference, and while there is some work
  going on in `pip` and resolvelib (looking at PRs/issues), it's going to be
  quite some time before pip gets similar results. "Upgrade" builds will
  eventually go down from 12m to 5m - which is a major improvement,
  especially for the elapsed time of CI builds.
* From what I see, package installation is super-fast in uv. Installing 614
  packages takes (wait for it) 1s (!), where I saw it taking way over a
  minute with `pip`. This will be hard to beat, I think, with Python vs.
  Rust.

Some notes about differences I saw:

pip and uv lead to slightly different resolutions when upgrading. This is
not a surprise because different heuristics are involved (the resolution
problem is NP-complete (https://research.swtch.com/version-sat) and it's
very inefficient to run the full resolution, so pip and uv each take
slightly different shortcuts to limit the space of possible solutions).
I've done a few PRs adding lower bounds to some dependencies to bring the
two closer, but in the end what we get is "correct" in both cases - I
continue running `pip check` to make sure that whatever uv finds is also
correct according to `pip`.

Nothing really major there. There were literally just a few cases that
required some manual adjustments. Nothing unmanageable in the future
either; I was doing similar tweaks with `pip` as well to help with the
resolution.

Example of differences (left/first is pip, right/second is uv):

    < importlib-resources==5.13.0
    ---
    > importlib-resources==6.1.1

vs.

    < pycountry==23.12.11
    ---
    > pycountry==22.3.5

It means that with `uv` we have a newer version of importlib_resources but
an older version of pycountry. I will handle this one by bumping pycountry
in the facebook provider to > 23.12, as the old version is 1.5 years old.

J.

On Sun, Feb 25, 2024 at 12:52 AM Hussein Awala <[email protected]> wrote:

> That's impressive! I love this tool, not only for reducing CI time but also
> for saving the environment.
> Some of the previous improvements were to further parallelize CI jobs to
> complete the CI faster, but this tool will help reduce the overall time.
>
> Big +1
>
> On Sun, Feb 25, 2024 at 12:34 AM Jarek Potiuk <[email protected]> wrote:
>
> > Hello here.
> >
> > I have a PR https://github.com/apache/airflow/pull/37683 that implements:
> >
> > * ability to choose either uv or pip when building our images
> > * CI images are built with uv by default (but you can use `--no-use-uv`
> >   as a flag and switch back to `pip`)
> > * PROD images are built with pip by default (but you can use `--use-uv`
> >   as a flag and switch to uv)
> >
> > The preliminary tests show indeed that uv not only has a much faster
> > baseline, but also that its use of caching fits extremely well into our
> > strategy of building images, and we will get huge improvements in our CI
> > build timing when using uv.
> >
> > Just for context - our CI images are built using a caching strategy that
> > optimises for:
> >
> > 1) fast building when there are no changes (around 1 minute to build
> >    with pip),
> > 2) slower building when someone adds or modifies a non-conflicting
> >    dependency (around 8 minutes to build, out of which ~6m is pip
> >    resolution and installation),
> > 3) much longer build time when there are conflicting dependencies, or
> >    when we change the Dockerfile or scripts, or when the Python base
> >    image changes (around 27 minutes to build, out of which pip resolving
> >    is ~20m).
> >
> > Those are all `pip` numbers. Currently `pip` does not use resolution
> > caching between the steps. A comparison of some basic installation steps
> > from initial tests shows that uv is way faster:
> >
> > * Resolving and installing airflow with [devel-ci] (610 dependencies):
> >   pip ~6m, uv ~1m 30s
> > * Re-resolving and reinstalling [devel-ci] using the local
> >   pyproject.toml: pip ~4m (cache is not used), uv ~4s (!!!!) - because
> >   the cache is used in this case.
> >
> > I have not yet tested well (but I will once they happen) the --eager
> > upgrade of dependencies (with pip it very much depends, but it's often
> > in the range of 10 minutes) - I expect it not to take more than 2-3
> > minutes with uv.
> >
> > So overall it looks like we are looking at these improvements:
> >
> > 1) Regular builds with no dependency changes: pip ~1m, uv ~1m
> >    (because we are using docker layer caching, and pip resolution and
> >    installation are not used at all)
> > 2) Updating dependencies: 8m with pip will probably go down with uv to
> >    ~3m 30s => 60% improvement, and in many cases to ~2.5m when there are
> >    no remote changes and the cache is used (70% improvement)
> > 3) Re-resolving and reinstalling everything: 27m will probably go down
> >    with uv to ~9m => 67% improvement.
> >
> > If those numbers hold and the resolution quality is comparable to
> > `pip` - then well, it's definitely worth it - and the numbers are very
> > close to what the `uv` authors claimed.
> >
> > I am impressed :)
> >
> > J.
> >
> > On Thu, Feb 22, 2024 at 5:25 AM Amogh Desai <[email protected]> wrote:
> >
> > > I agree with Niko here.
> > >
> > > If someone is willing to give it a try, we should enable it
> > > experimentally and give it a stint for a couple of weeks. If we see
> > > significant results, we can adopt it.
> > >
> > > Thanks & Regards,
> > > Amogh Desai
> > >
> > > On Thu, Feb 22, 2024 at 3:32 AM Oliveira, Niko <[email protected]>
> > > wrote:
> > >
> > > > The Astral folks also seem very focused on it being a
> > > > drop-in/compliant replacement for pip. So I think it's definitely
> > > > worth dropping it in and seeing if we get the expected performance
> > > > improvements. If tests still pass and user-facing constraints and
> > > > install instructions remain unchanged I don't see why not, if
> > > > someone is willing to spend the time on it.
> > > > Never mind the extra features it would give us (I, like others, am
> > > > also very excited about the --resolution=lowest ability).
> > > >
> > > > ________________________________
> > > > From: Andrey Anshin <[email protected]>
> > > > Sent: Tuesday, February 20, 2024 12:26:56 AM
> > > > To: [email protected]
> > > > Subject: RE: [EXTERNAL] [COURRIEL EXTERNE] [DISCUSS] Considering
> > > > trying out uv for our CI workflows
> > > >
> > > > > I share Andrey's skepticism. It's just yet another tool which has
> > > > > an unclear development strategy.
> > > >
> > > > My point was more about a matter of presentation. If someone told
> > > > you "this is a new tool, like a killer of previous tools" then you
> > > > might think "Yeah...yeah...yeah.. yet another replacement for tool
> > > > X... not really interesting". On the other hand, if someone told you
> > > > which cases it might solve for you, then this might change your
> > > > mind.
> > > >
> > > > Especially the promising `--resolution=lowest` option. We always
> > > > want to test something with minimal dependencies because we are not
> > > > sure that it will work with pretty old dependencies, and recently I
> > > > started to work on a POC to collect the minimal versions for Airflow
> > > > and the Providers. And at the moment when I had almost finished it,
> > > > uv was released.
> > > > Well, sometimes it is better to wait a bit and maybe someone will
> > > > invent the same solution 😁 and you don't have to spend your
> > > > personal time.
> > > >
> > > > So as a POC, I'm on it. We still need `pip`, and need to validate
> > > > some stuff with pip, because it is the only officially supported way
> > > > to install Airflow, but if something could be improved in the CI
> > > > then I'm on it. In most cases it would be hidden behind Breeze, and
> > > > many contributors might not even notice that something changed.
> > > >
> > > > On Tue, 20 Feb 2024 at 09:56, Jarek Potiuk <[email protected]> wrote:
> > > >
> > > > > Actually - if you read that blog post, the strategy is clear -
> > > > > they aim to create comprehensive packaging tooling, and the
> > > > > improvements are measured (80-100 times, they claim, when using
> > > > > caching - they (unlike pip) use a lot of local caching, including
> > > > > for resolving dependencies).
> > > > >
> > > > > So I think both arguments are not valid, if you ask me.
> > > > >
> > > > > On Tue, 20 Feb 2024 at 02:37, Alexander Shorin <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > I share Andrey's skepticism. It's just yet another tool which
> > > > > > has an unclear development strategy. Should you make it a free
> > > > > > testing suite? What would the project receive in exchange? A lot
> > > > > > of words about being faster, but how much? Are these
> > > > > > milliseconds worth replacing a stable tool with a new one? And
> > > > > > will it notably improve something?
> > > > > >
> > > > > > I think it's worth trying it just for fun and providing
> > > > > > feedback, but it has a long road to travel to become as stable
> > > > > > as pip.
> > > > > >
> > > > > > --
> > > > > > ,,,^..^,,,
> > > > > >
> > > > > > On Tue, Feb 20, 2024 at 3:06 AM Jarek Potiuk <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > My opinion:
> > > > > > >
> > > > > > > I think there is a place for a number of such tools. For a
> > > > > > > long time the packaging team and the `pip` team have been
> > > > > > > working not only on the `pip` implementation but also (and
> > > > > > > most importantly) on making sure that what `pip` does is the
> > > > > > > beacon of standardisation of packaging APIs and PEPs. It will
> > > > > > > never, IMHO, have a lot of the fancy features that other tools
> > > > > > > might provide (like the ones I mentioned). It will always be
> > > > > > > there to provide a robust and solid CLI to run all packaging
> > > > > > > things, but there are plenty of opportunities to provide
> > > > > > > improved or modified, or more (or less) opinionated ways of
> > > > > > > doing things that address some cases that the `pip` team
> > > > > > > simply will not be able or willing to handle, preferring a
> > > > > > > "pure" standard approach vs. implementing all the optional
> > > > > > > things. For example, the way pre-releases are handled can be
> > > > > > > improved to be more selective. The PEP describing it gives the
> > > > > > > tools an option to add more fancy behaviours (some of which we
> > > > > > > could find useful in our CI tooling). Should `pip` implement
> > > > > > > those? I don't think so. It would distract maintainers from
> > > > > > > other, more important things. It is quite ok to use other
> > > > > > > tooling in places like our CI, where it does some parts of the
> > > > > > > installation better.
> > > > > > >
> > > > > > > For me `pip` is going more in the direction of a `usable
> > > > > > > reference implementation of a package installer` - any
> > > > > > > standard/PEP will not matter if `pip` does not implement it.
> > > > > > > But others might go in different directions and implement some
> > > > > > > less popular features and do it better, faster, with greater
> > > > > > > flexibility. IMHO it's a win-win.
> > > > > > >
> > > > > > > J.
> > > > > > >
> > > > > > > On Mon, Feb 19, 2024 at 11:40 PM Andrey Anshin
> > > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > > Yesterday a friend shared that tool with me, and I said that
> > > > > > > > most likely it would be a niche tool. I said, "who needs yet
> > > > > > > > another installer which claims to resolve all your
> > > > > > > > problems?"
> > > > > > > > I guess I was wrong?
> > > > > > > >
> > > > > > > > On Tue, 20 Feb 2024 at 00:53, Jarek Potiuk <[email protected]>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hey everyone,
> > > > > > > > >
> > > > > > > > > A few days ago the ruff creators released a new tool, uv,
> > > > > > > > > which is an extremely fast (written in Rust) and fully
> > > > > > > > > featured tool, generally fully compatible with `pip`.
> > > > > > > > >
> > > > > > > > > Blog post here: https://astral.sh/blog/uv
> > > > > > > > >
> > > > > > > > > It looks like it has a number of things that would make
> > > > > > > > > our CI cases and tooling quite a bit faster and better,
> > > > > > > > > including a few things that I have implemented
> > > > > > > > > workarounds for and some that I have not implemented
> > > > > > > > > because `pip` had no good solution.
> > > > > > > > >
> > > > > > > > > I looked at the docs and it solves some problems that are
> > > > > > > > > currently difficult or impossible to handle with `pip`:
> > > > > > > > >
> > > > > > > > > * ability to use overrides (which are constraints on
> > > > > > > > >   steroids, allowing us to override limits specified by
> > > > > > > > >   the packages) - this will be very useful to better
> > > > > > > > >   handle our cases with "chicken-egg" providers (for
> > > > > > > > >   example like we had in FAB) where we have pre-release
> > > > > > > > >   packages depending on each other
> > > > > > > > >
> > > > > > > > > * different resolution strategies, including
> > > > > > > > >   --resolution=lowest, which will finally allow us to see
> > > > > > > > >   whether airflow's lower bounds are still holding (i.e.
> > > > > > > > >   will our tests still pass if we use the lowest
> > > > > > > > >   supported version of our dependencies?). This is
> > > > > > > > >   something I have wanted to do for quite some time and I
> > > > > > > > >   recorded an issue for it -
> > > > > > > > >   https://github.com/apache/airflow/issues/35549 - but
> > > > > > > > >   lack of tooling support made it a wish. With
> > > > > > > > >   `--resolution=lowest` it seems like a super-easy thing
> > > > > > > > >   to do.
> > > > > > > > >
> > > > > > > > > * It is said to be many, many times faster - with better
> > > > > > > > >   caching and resolution speeds (as with ruff, they claim
> > > > > > > > >   orders of magnitude speedups in a number of cases). We
> > > > > > > > >   can likely make very good use of it and speed up some
> > > > > > > > >   parts of our CI workflow significantly.
> > > > > > > > >
> > > > > > > > > I will likely do some experimenting with uv in our
> > > > > > > > > toolchain, but wanted to make sure we are all aware of it
> > > > > > > > > - and ask if someone has something against it (and maybe
> > > > > > > > > someone would like to do some work there trying it out -
> > > > > > > > > I will be happy to guide others with a dev/tooling
> > > > > > > > > mindset who are inclined to make some changes there,
> > > > > > > > > review PRs and cooperate on testing those things).
> > > > > > > > >
> > > > > > > > > It's not a user-facing change, and I do not think we want
> > > > > > > > > to get rid of `pip` as an installation tool in general
> > > > > > > > > (in our images and on the user-facing side) - it's mostly
> > > > > > > > > an internal CI tooling improvement I am thinking of.
> > > > > > > > > Maybe at some point in time we can also recommend it for
> > > > > > > > > development workflows, and maybe someday it will gain
> > > > > > > > > enough popularity to think about recommending it to our
> > > > > > > > > users, but definitely not now nor even in the mid-term
> > > > > > > > > future.
> > > > > > > > >
> > > > > > > > > Let me know what you think.
> > > > > > > > >
> > > > > > > > > Repo here: https://github.com/astral-sh/uv
> > > > > > > > >
> > > > > > > > > J.
