Part of the issue here is also the CI limits that apply across the whole Apache organization. This has been discussed for a couple of months now on the Apache build mailing lists, and GitHub has been part of those discussions, trying to figure out a smart path forward.
We were sharing the cache but ran into some strange issues with collisions and unfortunately disabled it. Personally, I think we should limit the macOS runs to configurations that specifically test the build system, which would be a couple for each supported arch and would include things like C++ builds. I see Matias has already started working on the ticket I opened for cancelling previously running builds (a rough sketch of that workflow change, together with re-enabling the ccache cache, is at the bottom of this message). There has been some talk about supporting non-hosted runners, but there are some security issues that still need to be worked out. Once again, see the mailing list for more context on this.

--Brennan

On Tue, Mar 30, 2021, 1:38 PM Matias N. <mat...@imap.cc> wrote:

> Most likely a single very powerful machine could actually be quite a bit
> faster than GH, since we could parallelize much harder and have all the
> resources to ourselves. The problem is that rolling our own can be quite a
> pain to maintain IMHO (unless someone has access to some powerful,
> high-availability spare machine). We would also have to redo all the CI
> handling, since it wouldn't be GitHub's.
>
> I also looked at alternative CI systems, but I think we would consume the
> free credits easily.
>
> Indeed, the macOS build is really slow. But if we start to cut tests we
> lose the assurance given by the automated testing.
>
> Maybe we could try to share the ccache across runs. I personally have
> never seen a ccache collision nor any other issue (I have had ccache
> enabled for years). I remember we tried this, but I'm not sure if the
> result was inconclusive.
>
> Anyway, cancelling previous flows should get us back to a better place
> where we "only" wait ~1.5 hr for the build to complete.
>
> Best,
> Matias
>
> On Tue, Mar 30, 2021, at 17:17, Alan Carvalho de Assis wrote:
> > We definitely need a better server to support the CI; it doesn't have
> > enough processing power to run the CI when there are more than 5 PRs.
> > It doesn't scale well.
> >
> > Also, I think we could keep only one test for macOS because it is too
> > slow! Normally macOS takes more than 2 h to complete.
> >
> > Maybe we could create some distributed CI farm and include low-power
> > hardware (e.g. Raspberry Pi boards) running from our homes to help it,
> > hehehe.
> >
> > Suggestions are welcome!
> >
> > BR,
> >
> > Alan
> >
> > On 3/30/21, Nathan Hartman <hartman.nat...@gmail.com> wrote:
> > > On Tue, Mar 30, 2021 at 3:30 PM Matias N. <mat...@imap.cc> wrote:
> > >>
> > >> It appears we overwhelmed CI. There are a couple of running jobs
> > >> (notably one is a macOS run which is taking about 2 hrs as of now),
> > >> but they are for PRs from at least 12 hours ago. There are a multitude
> > >> of queued runs for many recent PRs. The problem is that new runs (from
> > >> force pushes) do not cancel previous runs, so they remain queued,
> > >> apparently.
> > >
> > > Ouch!
> > >
> > >> I will see what can be done to have new pushes cancel pending runs.
> > >> In the meantime we may have to manually cancel all queued workflows.
> > >> Not sure if there's a mass cancel to be done.
> > >
> > > Thanks for looking into it. Yes, it would be a good thing if new force
> > > pushes could cancel in-progress runs.
> > >
> > > Thanks,
> > > Nathan
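P.S. For anyone who wants to experiment, here is a rough sketch of what both ideas could look like in a GitHub Actions workflow: a concurrency group so new pushes cancel queued/in-progress runs, plus actions/cache for the ccache directory. The workflow name, runner label, cache path, and build step are placeholders, not what our workflows actually use, so treat this as a starting point rather than a drop-in change:

    name: Build (sketch)

    on: [pull_request]

    # New pushes to the same branch/PR cancel any run for that ref that is
    # still queued or in progress, instead of letting runs pile up.
    concurrency:
      group: build-${{ github.ref }}
      cancel-in-progress: true

    jobs:
      build:
        runs-on: ubuntu-20.04   # placeholder runner label
        steps:
          - uses: actions/checkout@v2

          # Restore the ccache directory from the most recent run; the exact
          # key never matches a new commit, so restore-keys does the work.
          - name: Restore ccache
            uses: actions/cache@v2
            with:
              path: ~/.ccache
              key: ccache-${{ runner.os }}-${{ github.sha }}
              restore-keys: |
                ccache-${{ runner.os }}-

          - name: Build
            run: |
              export CCACHE_DIR=$HOME/.ccache
              ccache -z   # zero the hit/miss statistics
              # ... run the usual build here ...
              ccache -s   # print statistics so cache hits show up in the log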