Great work! Investments in CI pay dividends to the whole community.

On Sat, Aug 22, 2020 at 8:12 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> Hello everyone,
>
> Just wanted to let you know that last week we merged quite an overhaul of
> our CI architecture in GitHub Actions.
>
> TL;DR: It should be faster and more stable, and it should be super-easy to
> reproduce any CI failure locally.
>
> We should now have noticeably faster, much more stable and, as a side
> effect, easy-to-diagnose CI builds. There are a few PRs left to merge
> (solving some teething problems and adding some optimizations), and we
> might need to implement one workaround for a missing GitHub API, but it
> looks pretty good after a few days of watching.
>
> The gist of the change is that we can now use the new "workflow_run"
> feature of GitHub Actions, which allows us to build each image only once
> and reuse it for all the jobs. Previously those images were built (using
> the latest sources) for every single job; now they are built only once.
>
> Some stats for average runs (the gains are much bigger when Python has
> just released a new patch-level version):
>
> - Prepare image job: 5 minutes 30 seconds -> 1 minute 7 seconds (~80%
>   improvement)
> - Longest job time: 34 minutes -> 29 minutes 30 seconds (~13%
>   improvement in the longest job)
> - Build time saved per build (!): 27 jobs * 4.5 minutes = ~2 hours of
>   machine time saved for each build (!)
>
> This change should also improve overall stability. There were a number of
> cases where building the image failed; this should now be ~10x less
> likely, as we build images only 3 times instead of ~30.
>
> As a result, we are better citizens, and it also means we should have far
> less queuing time when several PRs start in quick succession.
>
> Also, as a side effect but an important one, we now have a super-easy way
> to reproduce any failure in CI. This is the setup I ultimately had in mind
> when I implemented Breeze.
> Now anyone can just log in to the GitHub registry and run this:
>
> `breeze --github-image-id <RUN_ID> --backend <BACKEND> --python <X.Y>`
>
> Then you should be dropped into the EXACT same environment that was used
> for a particular failed "run" in GitHub Actions, including the Airflow
> sources used for it. You do not have to check out the code, etc.
>
> This means that you (or anyone else trying to help) should be able to
> re-run most of the failed tests locally, reproduce the failures, and try
> to fix them.
>
> Documentation with all the details and the commands you can use is coming
> in https://github.com/apache/airflow/pull/10380; happy to get some reviews.
>
> J.
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129
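The headline numbers in the quoted message can be sanity-checked with a few lines of Python. This is a rough back-of-the-envelope sketch, not part of the Airflow CI tooling: the timings are the ones quoted in the email, while the independent-failure model and the 0.5% per-build failure rate (`P_SINGLE`) are hypothetical, chosen only to illustrate why building images 3 times instead of ~30 makes an image-build failure roughly 10x less likely.

```python
"""Back-of-the-envelope check of the CI numbers quoted in the email.

Timings come from the email. The independent-failure model and the
per-build failure rate P_SINGLE are hypothetical, for illustration only.
"""


def percent_improvement(before_s: float, after_s: float) -> float:
    """Relative reduction in duration, as a percentage of the old duration."""
    return 100.0 * (before_s - after_s) / before_s


def p_any_failure(p_single: float, builds: int) -> float:
    """Probability that at least one of `builds` independent builds fails."""
    return 1.0 - (1.0 - p_single) ** builds


# Durations from the email, converted to seconds.
prepare = percent_improvement(5 * 60 + 30, 1 * 60 + 7)  # ~79.7%
longest = percent_improvement(34 * 60, 29 * 60 + 30)    # ~13.2%
saved_minutes = 27 * 4.5                                 # 121.5 min, ~2 h

# Hypothetical chance that a single image build fails (assumption).
P_SINGLE = 0.005
old_risk = p_any_failure(P_SINGLE, 30)  # ~30 image builds per CI run before
new_risk = p_any_failure(P_SINGLE, 3)   # 3 image builds per CI run now

if __name__ == "__main__":
    print(f"prepare image job: {prepare:.1f}% shorter")
    print(f"longest job:       {longest:.1f}% shorter")
    print(f"machine time saved per build: {saved_minutes:.1f} min")
    print(f"image-build failure risk: {old_risk:.2%} -> {new_risk:.2%} "
          f"({old_risk / new_risk:.1f}x lower)")
```

The longest-job figure works out to about 13% rather than 15%. For a small per-build failure rate p, 1 - (1 - p)^n is close to n * p, so the risk ratio approaches 30 / 3 = 10 (about 9.4x at the assumed 0.5%); the exact ratio depends on the real failure rate, which the email does not give.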