Subscribe to builds@a.o and users@infra.a.o. There is also an #asfinfra Slack channel you can follow, and you can take part in the monthly Infra roundtables: https://infra.apache.org/roundtable.html
J.

On Tue, Apr 18, 2023 at 5:22 AM Sai Boorlagadda <sai.boorlaga...@gmail.com> wrote:

> Thanks, Gavin,
>
> How can I be informed or follow the status of this initiative?
>
> Sai
>
> On Fri, 14 Apr 2023 at 08:02, Gavin McDonald <gmcdon...@apache.org> wrote:
>
> > Hi All,
> >
> > Infra is working on self-hosted GitHub runners, provided by Infra to
> > projects and hosted in Azure, and we hope to provide a few varieties of
> > arch/CPU/memory.
> >
> > Will keep this list updated as it progresses.
> >
> > Gav...
> >
> > On Fri, Apr 14, 2023 at 9:23 AM Sai Boorlagadda <sai.boorlaga...@gmail.com> wrote:
> >
> > > Thanks for all the feedback. At this time there are no sponsors for
> > > Geode, so we cannot have self-hosted runners. I have already split the
> > > job running tests into multiple jobs by Gradle module, and even then
> > > one particular Gradle module takes more than 6 hours. Will try to
> > > parallelize and see if that works.
> > >
> > > Sai
> > >
> > > On Thu, 13 Apr 2023 at 23:54, Jarek Potiuk <ja...@potiuk.com> wrote:
> > >
> > > > In many cases it can be done by choosing a bigger machine with more
> > > > CPUs and parallelising, as others mentioned. This is easy if your
> > > > tests are pure unit tests and you can just add a flag like `-n auto`
> > > > from pytest-xdist (a pytest extension that runs your tests in
> > > > parallel across as many CPUs as you have). However, there are cases
> > > > where the limitation is I/O, or your tests simply cannot run in
> > > > parallel because a lot of them rely on shared resources (say, a
> > > > database). But even then you can attempt to do something about it.
> > > >
> > > > In Airflow we solved those problems by custom-parallelising our
> > > > jobs, choosing huge self-hosted runners, and running everything
> > > > in-memory.
> > > > Even though our tests could not be parallelised "per test" (mostly
> > > > for historical reasons, a lot of our tests are not pure unit tests
> > > > and depend on a database), we split the tests into "test types" (8
> > > > of them, but soon more) and run them in parallel - with as many
> > > > parallel types running as we have CPUs. Each test type uses its own
> > > > database instance - this is all orchestrated with docker-compose.
> > > >
> > > > To avoid the inevitable I/O contention with this setup, it all runs
> > > > on a huge tmpfs storage (50 GB or so) - including a Docker instance
> > > > running the databases with tmpfs backing storage, so those databases
> > > > are backed by an in-memory filesystem and are thus super-stable and
> > > > super-fast. Thanks to that, our thousands of tests run really fast
> > > > even though some of them are not pure unit tests. We run it all on a
> > > > large self-hosted runner with 8 CPUs and 64 GB RAM, and thanks to
> > > > that our complete test suite runs in 15 minutes instead of 1.5 hours.
> > > >
> > > > Such a setup achieves two optimisation goals: cheap and fast. Yes,
> > > > we need much more costly, bigger machines, but we need them for a
> > > > shorter time and we use them at 80%-90% utilisation, which is pretty
> > > > high for such cases (we keep optimising it regularly, and I try to
> > > > keep pushing it closer to 100%). As a result, if your hosted runners
> > > > in the cloud are on-demand/ephemeral (usually an 80%-90% cost
> > > > reduction) and you have a fast setup, you can bring them up for 10
> > > > minutes and shut them down when finished, so they cost a fraction of
> > > > what small machines running all the time cost, especially if your
> > > > project has periods when no PRs are running.
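The tmpfs-backed database idea described above can be sketched as a docker-compose fragment. The service name and image version here are illustrative, not Airflow's actual configuration:

```yaml
# Sketch: a throwaway Postgres whose data directory lives on tmpfs,
# so all database I/O happens in RAM (names/versions are illustrative).
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: test
    # Mount the data directory as tmpfs: nothing touches disk, and the
    # database starts fresh on every run - ideal for ephemeral CI.
    tmpfs:
      - /var/lib/postgresql/data
```

Running one such service per parallel "test type" gives each group its own isolated, in-memory database.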
> > > > Also, optimising the speed of tests is even more important than
> > > > optimising their cost, because getting feedback faster is good for
> > > > your contributors - but with this setup we can eat our cake and have
> > > > it too: the cost is low and the tests are fast.
> > > >
> > > > J.
> > > >
> > > > On Fri, Apr 14, 2023 at 1:37 AM Hyukjin Kwon <gurwls...@gmail.com> wrote:
> > > >
> > > > > Just dropping a comment. Apache Spark solved it by splitting the
> > > > > job.
> > > > >
> > > > > As for the number of parallel jobs, Apache Spark added custom
> > > > > logic to the PR builder to link to the GitHub workflow runs in
> > > > > forked repositories - so we reuse the GitHub resources in the PR
> > > > > author's forked repository instead of the ones allocated to the
> > > > > ASF itself.
> > > > >
> > > > > On Fri, Apr 14, 2023 at 8:00 AM sebb <seb...@gmail.com> wrote:
> > > > >
> > > > > > On Thu, 13 Apr 2023 at 20:58, Martin Grigorov <mgrigo...@apache.org> wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > On Thu, Apr 13, 2023 at 7:17 PM Sai Boorlagadda <sai_boorlaga...@apache.org> wrote:
> > > > > > >
> > > > > > > > Hey All! I am part of the Apache Geode project. We have been
> > > > > > > > migrating our pipelines to GitHub Actions and hit a
> > > > > > > > roadblock: the maximum job execution time on
> > > > > > > > non-self-hosted GitHub workers is set to a hard limit
> > > > > > > > <https://docs.github.com/en/actions/learn-github-actions/usage-limits-billing-and-administration>
> > > > > > > > of 6 hours, and one of our jobs
> > > > > > > > <https://github.com/apache/geode/actions/runs/4639012912> is
> > > > > > > > taking more than 6 hours. Are there any pointers on how
> > > > > > > > someone solved this?
> > > > > > > > Or does GitHub provide any increases for Apache Foundation
> > > > > > > > projects?
> > > > > > >
> > > > > > > The only way to "increase the resources" is to use a
> > > > > > > self-hosted runner. But instead of looking at how to use more
> > > > > > > of the free pool, you should try to optimize your build to
> > > > > > > need less! These free resources are shared with all other
> > > > > > > Apache projects, so when your project uses more, another
> > > > > > > project has to wait.
> > > > > > >
> > > > > > > You can start by using a parallel build -
> > > > > > > https://github.com/apache/geode/blob/102e24691eacd2d1d6652a070f14af9f5b42dc0d/.github/workflows/gradle.yml#L254
> > > > > > > Also tune the maxWorkers -
> > > > > > > https://github.com/apache/geode/blob/102e24691eacd2d1d6652a070f14af9f5b42dc0d/.github/workflows/gradle.yml#L256
> > > > > > > The Linux VMs have 2 vCPUs. You can try the macos-latest VM;
> > > > > > > it has 3 vCPUs.
> > > > > > > Another option is to split this job into a few smaller ones.
> > > > > > > Each job has its own 6 hours.
> > > > > >
> > > > > > Also, maybe run some of the jobs manually, rather than on every
> > > > > > commit. At present there are two instances running at the same
> > > > > > time from subsequent commits. At least one of these is a waste
> > > > > > of resources.
> > > > > >
> > > > > > > Good luck!
> > > > > > >
> > > > > > > Martin
>
> --
> *Gavin McDonald*
> Systems Administrator
> ASF Infrastructure Team
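Two of the suggestions in the thread - splitting tests into separate jobs, each with its own 6-hour limit, and not letting runs from superseded commits waste shared resources - can be sketched in one GitHub Actions workflow fragment. The module names and Gradle invocation here are hypothetical, not Geode's actual setup:

```yaml
# Sketch: matrix-split test jobs plus cancellation of superseded runs
# (module names and commands are illustrative, not Geode's real config).
name: tests
on: [push, pull_request]

# Cancel a still-running workflow from a previous commit on the same ref,
# so only the latest commit consumes shared runner capacity.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # Each matrix entry becomes its own job with its own 6-hour limit.
        module: [module-a, module-b, module-c]
    steps:
      - uses: actions/checkout@v3
      - name: Run one module's tests
        run: ./gradlew :${{ matrix.module }}:test --parallel
```

The matrix split is the same idea Spark and Geode apply; `cancel-in-progress` addresses the duplicate concurrent runs sebb points out above.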