Re: Superset Consumption of ASF Shared GitHub-hosted Runners

Robert Thomson Fri, 05 Jun 2026 01:10:11 -0700

Thanks Evan, that all sounds like great work; hopefully it will make a dent
in the jobs in use.


Kind regards,
-Bob Thomson,
ASF Infrastructure


On Thu, Jun 4, 2026 at 9:25 PM Evan Rusackas <[email protected]> wrote:

> Thanks for the tips.
>
> For your first suggestion, we took a different route to the same goal,
> using a change-detector action and job-level gating, which has advantages
> for our setup, but we do use “paths:” in several workflows.
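>
> A minimal sketch of the change-detector pattern described above. This is
> not taken from the Superset workflows: the job names are hypothetical, and
> the dorny/paths-filter action is an assumption used only for illustration.
>
> ```yaml
> jobs:
>   changes:
>     runs-on: ubuntu-latest
>     outputs:
>       # 'true' when any changed file matches the "frontend" filter below
>       frontend: ${{ steps.filter.outputs.frontend }}
>     steps:
>       - uses: actions/checkout@v4
>       - uses: dorny/paths-filter@v3
>         id: filter
>         with:
>           filters: |
>             frontend:
>               - 'superset-frontend/**'
>
>   frontend-tests:
>     needs: changes
>     # Job-level gate: skipped entirely when no frontend files changed
>     if: needs.changes.outputs.frontend == 'true'
>     runs-on: ubuntu-latest
>     steps:
>       - run: echo "run frontend tests here"
> ```
>
> The detector job still occupies a runner briefly, but the expensive jobs
> behind the gate never start when they are not needed.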
>
> For the second one, we are using concurrency & cancel-in-progress, so all
> set there. However, we’re using “github.run_id” rather than “github.ref”
> there, since on push events, run_id lets every commit to master get fully
> validated, whereas ref would cancel in-progress master validations when
> commits land back-to-back (happening an awful lot right now).
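>
> As a rough sketch of that difference (the expression below is an
> illustration, not copied from the Superset workflows):
>
> ```yaml
> concurrency:
>   # pull_request events: group by PR number, so a new push to the PR
>   # cancels the older run for that PR.
>   # push events: pull_request.number is empty, so the group falls back to
>   # the unique run_id and no earlier run is cancelled.
>   group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.run_id }}
>   cancel-in-progress: true
> ```
>
> With github.ref in place of github.run_id, every push to master would share
> one group, and a newer commit would cancel the run validating the previous
> one.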
>
> All the important PRs mentioned in my last email have landed, and we’re
> just doing touch-ups now. Hopefully the situation has drastically improved,
> though ironically, a ton of PRs need rebasing now, so pardon the CI churn
> while we do so with the current backlog.
>
> Thanks again,
>
> -e-
>
> *Evan Rusackas*
> Preset | preset.io
> On Jun 4, 2026 at 1:09 AM -0700, Bob Thomson <[email protected]>, wrote:
>
> I have been experimenting with pointing Gemini at public repos and
> prompting:
>
> "Analyse the GitHub Actions workflows in this repo
> https://github.com/apache/PROJECT/tree/master/.github and report on
> possible causes of long run time/high number of runs of GitHub Actions"
>
> One output here was:
>
> The Problem: Changes to frontend UI files (.ts, .tsx, .less) frequently
> trigger backend Python unit test runs, and vice versa. Unless paths are
> explicitly managed on every configuration entry, the entire testing suite
> runs for micro-commits affecting only one side of the stack.
> The Fix: Workflows must feature distinct path-routing restrictions:
>
> And the suggested change was:
>
> # For frontend workflows
> on:
>   pull_request:
>     paths:
>       - 'superset-frontend/**'
>
> I am no expert on Actions or this project, but thought I'd pass it on in
> case it is helpful.
>
> A second one was:
>
> concurrency:
>   group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
>   cancel-in-progress: true
>
> This is said to ensure that when workflows are running for an open PR and
> a new commit is pushed to the same PR, the runs from the earlier commit are
> cancelled. Otherwise, an open PR that gets 3 more commits pushed results in
> 3 lots of workflows running for the one PR, 2 of which are redundant.
>
> Hope these are useful, or at least food for thought on other possible
> streamlining improvements.
>
> Kind regards,
> -Bob Thomson
>
>
> On 2026/06/03 18:22:55 Evan Rusackas wrote:
>
> Hi Bob (and all)
>
> Thanks for the heads up on this. I just opened a swath of PRs that should
> cut this down significantly. I’m working with PMC members to
> assess/touch-up/review/merge:
>
>
> 1. This PR takes us from 6 Cypress runners down to 5, and takes
> the /app/prefix smoke test (only running on master now) down from 2 runners
> to 1. https://github.com/apache/superset/pull/40717
> 2. Cypress runners were all spinning up BEFORE they checked to see if they
> were needed. This should fix that:
> https://github.com/apache/superset/pull/40718
> 3. Gating E2E behind pre-commit. That's such a common failure that we
> probably needn't test E2E until it passes. See the caveats here, there are
> some visibility and fork-based PR caveats:
> https://github.com/apache/superset/pull/40719
> 4. Run unit/integration tests on the CURRENT Python version on PRs, and the
> full version matrix (3.10-3.12) on master:
> https://github.com/apache/superset/pull/40722
> 5. Don't run CodeQL checks on docs-only changes:
> https://github.com/apache/superset/pull/40724
> 6. Cancel-in-progress on a few things that churn needlessly on every
> commit: https://github.com/apache/superset/pull/40725
> 7. Only build docker on docker-relevant changes:
> https://github.com/apache/superset/pull/40723
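>
> Item 4 above could be sketched roughly as follows. This is an assumption
> about the general shape, not the contents of the linked PR: the job name is
> hypothetical, and "3.11" is only a placeholder for whichever version counts
> as current.
>
> ```yaml
> jobs:
>   unit-tests:
>     runs-on: ubuntu-latest
>     strategy:
>       matrix:
>         # PRs: one version only ("3.11" is a placeholder assumption).
>         # Any other event, such as a push to master: the full matrix.
>         python-version: ${{ fromJSON(github.event_name == 'pull_request' && '["3.11"]' || '["3.10", "3.11", "3.12"]') }}
>     steps:
>       - uses: actions/checkout@v4
>       - uses: actions/setup-python@v5
>         with:
>           python-version: ${{ matrix.python-version }}
>       - run: echo "run the test suite here"
> ```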
>
> There’s an alternate (radical) solution of just NOT running E2E tests on
> PRs, but only running them on master. That sure would “nip it in the bud”
> cost-wise, but has potential repercussions if we don’t keep a close eye on
> CI on `master`.
>
> TL;DR: We’re whittling, and will ask for fresh reports (in private ASF
> Slack channels, probably) for impact results.
>
>
> -e-
>
> Evan Rusackas
> Preset | preset.io
> On Jun 3, 2026 at 10:29 AM -0700, Bob Thomson <[email protected]>,
> wrote:
>
> Fewer parallel runs is essential, yes. We are at 900/900 GitHub-hosted
> runner jobs/slots just now, and looking at Superset Actions we can see
> nearly 500 completed Superset repo action runs in the last hour, some of
> them up to 25 minutes in execution time. Anything that can be done to
> reduce the share of runner jobs used by Superset is urgent, as we are now
> at max jobs on runners on a daily basis.
>
> Thanks.
>
> Kind regards,
> -Bob Thomson,
> ASF Infrastructure
>
> On 2026/05/22 19:54:16 Evan Rusackas wrote:
>
> Hi Bob (and everyone here),
>
> Thanks for the alert. The unfortunate thing is that this will only get
> worse as we create/fix more things (security, dependabot, etc). Things only
> seem to be ramping up.
>
> So, agreed, we must whittle. Cypress is the obvious killer (about half the
> consumption). We’ll try to find ways to whittle away at this (we’re
> migrating to Playwright, but it takes time). We might also be able to spend
> less compute and more time by optimizing (or removing) some parallelization
> here.
>
> We’re also looking at moving from dependabot for all dependency bumps (a
> LOT of PRs) to `renovate` - which might optimize things a bit (bumping
> dependencies in groups) but we will need to also leave dependabot in place
> for security-driven fixes as well.
>
> As for Cypress tests, we have some “matrixification” happening, that I
> think we can optimize. For the Superset folks reading this, I think we can
> split out the “app_root” tests to JUST run on merges to `master` rather
> than every PR. That’ll save ~50% right there, we just have to keep a better
> eye on CI on `master` (which we haven’t been great about historically, but
> we’re getting better).
>
> Here’s the app_root PR https://github.com/apache/superset/pull/40385
>
> We can also reduce the E2E parallelization shards from 6 to… I dunno… 3 or
> 4. That’ll save a fair bit of setup time spinning up Superset instances.
> Tests will run a bit longer, but consume less overall. Seems like a fair
> tradeoff.
>
> Open to other ideas… maybe running fewer GHA workflows in parallel, and
> having things more sequentially to fail faster (like nothing runs until
> pre-commit passes, for example).
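>
> A minimal sketch of that sequential idea, assuming both jobs live in the
> same workflow file ("needs" does not reach across workflows). The job names
> and commands are placeholders, not Superset's actual configuration.
>
> ```yaml
> jobs:
>   pre-commit:
>     runs-on: ubuntu-latest
>     steps:
>       - uses: actions/checkout@v4
>       - run: echo "run pre-commit checks here"
>
>   e2e:
>     # Starts only after pre-commit succeeds; if pre-commit fails, this job
>     # is skipped and its runners are never allocated.
>     needs: pre-commit
>     runs-on: ubuntu-latest
>     steps:
>       - run: echo "run E2E tests here"
> ```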
>
> Also, least importantly, we don’t have the access to see how we stack up
> against other projects, but I sure am curious.
>
> Anyone's thoughts/PRs welcomed.
>
> Evan Rusackas
> Preset | preset.io
> On May 22, 2026 at 4:46 AM -0700, Robert Thomson <[email protected]>,
> wrote:
>
> Hello, Superset PMC.
>
> In 2024, the ASF introduced the policy for GitHub Actions usage
> across the foundation[1]. The ASF GitHub shared pool of
> GitHub-hosted runners has been at, or very close to, the limit of
> 900 jobs most of the time in the past few weeks and this is the
> case again today.
>
> Your project has been identified as being among the top 5 consumers of
> build time over the past 7 days and we request that you bring your
> usage down by stream-lining long-running builds. Contact Infra for
> a consultation if you are unable to streamline your builds further.
>
> You can use the infra reporting tool[2] to monitor your GHA usage as you
> work on stream-lining, as well as locate any bottlenecks in the workflows.
>
> Infra will allow you two weeks' time (till the 8th of June, 2026) to
> progress this, but should you still be above the limits by then,
> without a viable path forward, we will be limiting your GHA usage.
>
> Kind regards,
> Bob Thomson, on behalf of ASF Infrastructure.
>
>
> [1] https://infra.apache.org/github-actions-policy.html
> [2] https://infra-reports.apache.org/#ghactions&project=superset&hours=24&limit=15&group=name
>
>
>
>
