I think we can disable 4.x post-merge per-commit jobs.
branch-4.x is the integration branch for the 4.x line, not a release
branch. RCs are cut from the numbered branches
(branch-4.0/4.1/4.2/4.3/...), and those keep their own per-merge CI.
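
As a rough sanity check on the numbers quoted below, here is a minimal sketch of the budget arithmetic. The 350k, 250k and 180k figures come from Tian's message. The share of post-merge minutes spent on branch-4.x is not stated anywhere in the thread, so share_4x is an assumption, and the function names are made up for illustration.

```python
# Figures quoted in the thread (runner minutes per week).
WEEKLY_LIMIT = 250_000
WEEKLY_USAGE = 350_000
POST_MERGE_USAGE = 180_000  # reported as "180k+", so treat it as a lower bound


def projected_usage(usage, post_merge, share_4x):
    """Weekly usage after dropping branch-4.x per-commit post-merge jobs.

    share_4x is the ASSUMED fraction of post-merge minutes spent on
    branch-4.x; the thread does not give this split.
    """
    if not 0.0 <= share_4x <= 1.0:
        raise ValueError("share_4x must be between 0 and 1")
    return usage - post_merge * share_4x


def overage(usage, limit=WEEKLY_LIMIT):
    """Minutes per week above the limit (0 if within budget)."""
    return max(0, usage - limit)


if __name__ == "__main__":
    # Try a few hypothetical splits between master and branch-4.x.
    for share in (0.4, 0.5, 0.6):
        after = projected_usage(WEEKLY_USAGE, POST_MERGE_USAGE, share)
        print(f"share={share:.0%}: {after:,.0f} min/week, "
              f"over by {overage(after):,.0f}")
```

The split is the unknown that matters here, so it is worth replacing the guessed values with measured per-branch minutes before drawing any conclusion.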

On Sat, May 23, 2026 at 1:58 PM Yicong Huang <[email protected]> wrote:

> Is it possible to run 4.x branch post merge as a scheduled job, possibly
> daily, instead of after every commit? I think this can quickly cut the CI
> usage.
>
> Best,
> Yicong Huang
>
> On Fri, May 22, 2026 at 11:48 AM Tian Gao via dev <[email protected]>
> wrote:
>
>> Like I mentioned a few weeks ago, we can't afford this. We received the
>> warning from ASF today and took a quick look at our CI usage.
>>
>> We are using about 350k min/week now, and the limit is 250k min/week. The
>> post merge itself took 180k+ min/week because now we have 2 active dev
>> branches.
>>
>> I think we should put some effort into this. There are a few ways to make
>> the situation better:
>>
>> 1. Run fewer tests - We disabled pandas on spark tests for post merge a
>> while ago to comply with the ASF limit.
>> 2. Make tests run faster - I occasionally optimize python tests, not sure
>> if Java tests are being taken care of. Java tests now take significantly
>> more time in our CI.
>> 3. Run tests less frequently - helpful for scheduled CI which we already
>> did, but won't help post merge.
>> 4. Smart testing - this is a bit tricky for post-merge because ideally we
>> want full coverage for each commit. We can probably do some safe
>> heuristics, but it takes time and we could potentially lose coverage.
>> 5. Move scheduled tests to another repo - arrow seems to be doing this.
>> This allows us to use all the ASF budget to run post-merge tests. However,
>> we need some sponsor to achieve this.
>>
>> I think we have 2 weeks to at least temporarily reduce our CI usage under
>> the limit, so we need something fast, then something good.
>>
>> Tian
>>
>> On Mon, May 11, 2026 at 3:14 AM Akira Ajisaka <[email protected]>
>> wrote:
>>
>>> > I'm working on fixing branch-3.5 CI:
>>> > https://github.com/apache/spark/pull/55764. Hopefully I'll complete it
>>> > this week.
>>>
>>> Closed the above PR as a duplicate of
>>> https://github.com/apache/spark/pull/55432.
>>> Sorry for the confusion.
>>>
>>> On Mon, May 11, 2026 at 3:22 PM Akira Ajisaka <[email protected]>
>>> wrote:
>>> >
>>> > > Also on the 3.5 side the CI is super broken so I’m trying to fix it
>>> > > up now, the timing is complicated by the Ubuntu PPA DDoS outages.
>>> >
>>> > I'm working on fixing branch-3.5 CI:
>>> > https://github.com/apache/spark/pull/55764. Hopefully I'll complete it
>>> > this week. The Ubuntu outage seems unrelated.
>>> >
>>> > Anyway, I'm +1 to reduce the frequency on non-active branches.
>>> >
>>> > Thanks,
>>> > Akira
>>> >
>>> > On Fri, May 8, 2026 at 5:30 AM Tian Gao via dev <[email protected]>
>>> wrote:
>>> > >
>>> > > Yeah I'm not surprised that 3.5 is not in its best shape at this
>>> point because we almost did not run tests on it. When we reduce the
>>> coverage for a branch, we will have issues when we try to release. That's
>>> why we should not only make efforts on that side. We should explore all
>>> different ways to make CI better.
>>> > >
>>> > > On Thu, May 7, 2026 at 12:02 PM Holden Karau <[email protected]>
>>> wrote:
>>> > >>
>>> > >> Smarter test selection is probably the magic but it’s going to be
>>> effort. Also on the 3.5 side the CI is super broken so I’m trying to fix it
>>> up now, the timing is complicated by the Ubuntu PPA DDoS outages.
>>> > >>
>>> > >>
>>> > >> Twitter: https://twitter.com/holdenkarau
>>> > >> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>> > >> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9
>>> > >> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>> > >> Pronouns: she/her
>>> > >>
>>> > >> On Thu, May 7, 2026 at 11:28 AM Tian Gao via dev <
>>> [email protected]> wrote:
>>> > >>>
>>> > >>> I definitely agree that we can save a lot of time by optimizing
>>> the CI. But currently, java tests take more time than python tests. They
>>> are comparable but java tests are still observably more expensive. We
>>> should not only focus on python ones.
>>> > >>>
>>> > >>> In the meantime, I'll look at the low-hanging fruit in CI to
>>> make it smarter.
>>> > >>>
>>> > >>> Tian
>>> > >>>
>>> > >>> On Thu, May 7, 2026 at 6:40 AM Ruifeng Zheng <[email protected]>
>>> wrote:
>>> > >>>>
>>> > >>>> I also did some data analysis, and think we should also revisit
>>> the CI:
>>> > >>>> 1, Deduplicate the compile. For example, the pyspark matrix
>>> executes 8 byte-identical SBT compiles in parallel today, costing ~108m of
>>> redundant work per run.
>>> > >>>>    (I am working on a POC:
>>> https://github.com/apache/spark/pull/55726)
>>> > >>>> 2, Smarter test selection. 11% of the most recent 10,000 commits are
>>> test-only changes. Today these trigger the full pyspark matrix because the
>>> dependency
>>> > >>>>    graph in dev/sparktestsupport/modules.py cascades through
>>> dependent_modules regardless of whether the change is in source or tests.
>>> The cascade is correct
>>> > >>>>    for source changes (downstream modules import the source) but
>>> unnecessary for tests (no production code imports test code).
>>> > >>>>
>>> > >>>> On Thu, May 7, 2026 at 5:23 PM Hyukjin Kwon <[email protected]>
>>> wrote:
>>> > >>>>>
>>> > >>>>> For now, I created a PR to reduce the frequency by half:
>>> https://github.com/apache/spark/pull/55729
>>> > >>>>>
>>> > >>>>> On Thu, 7 May 2026 at 07:56, Yicong Huang <
>>> [email protected]> wrote:
>>> > >>>>>>
>>> > >>>>>> I think we need to 1) cut CI pressure and 2) look for more
>>> resources to run CI at the same time.
>>> > >>>>>>
>>> > >>>>>> Cut CIs:
>>> > >>>>>>
>>> > >>>>>> I think the biggest cut would be on the scheduled jobs first.
>>> For instance, change 3.5 and 4.0 scheduled jobs from daily to once every three
>>> days, or even once per week.
>>> > >>>>>> Then for branch 4.x or more active release branches we can do
>>> daily post merge CI, instead of after each commit?
>>> > >>>>>> Meanwhile we can explore ways to run selected tests on the
>>> actual affected code path to avoid full runs.
>>> > >>>>>> And optimize tests themselves so they run faster.
>>> > >>>>>>
>>> > >>>>>> Expand resources:
>>> > >>>>>>
>>> > >>>>>> We can probably move some of the scheduled jobs out to another
>>> repo like what Apache Arrow did.
>>> > >>>>>> I wonder if self hosted runners are acceptable to the
>>> community? This sounds like a longer term solution if we were to introduce
>>> more checks in the future.
>>> > >>>>>>
>>> > >>>>>>
>>> > >>>>>> Best regards,
>>> > >>>>>> Yicong Huang
>>> > >>>>>>
>>> > >>>>>> On Wed, May 6, 2026 at 3:04 PM Hyukjin Kwon <
>>> [email protected]> wrote:
>>> > >>>>>>>
>>> > >>>>>>> We should probably reduce the scheduled build for the time
>>> being.
>>> > >>>>>>>
>>> > >>>>>>> As a reference, I worked in Apache Arrow, and they use an
>>> extra CI run by a third party, e.g., see
>>> > >>>>>>> - PR: https://github.com/apache/arrow/pull/48915
>>> > >>>>>>> - You comment like
>>> https://github.com/apache/arrow/pull/48915#issuecomment-3852062184
>>> > >>>>>>> - It posts the CI link like
>>> https://github.com/apache/arrow/pull/48915#issuecomment-3852079993
>>> > >>>>>>> - The CI is defined at
>>> https://github.com/ursacomputing/crossbow
>>> > >>>>>>>
>>> > >>>>>>> I feel like this can be an alternative if any vendor is
>>> willing to support it.
>>> > >>>>>>>
>>> > >>>>>>> On Thu, 7 May 2026 at 04:09, Tian Gao via dev <
>>> [email protected]> wrote:
>>> > >>>>>>>>
>>> > >>>>>>>> I did some quick calculations, and we can't afford the CI
>>> with our existing infra.
>>> > >>>>>>>>
>>> > >>>>>>>> Per ASF policy (
>>> https://infra.apache.org/github-actions-policy.html),
>>> the maximum weekly runner minutes we have is 250k. That's 1m per month, and
>>> last month, we hit almost the exact number - 1,082,721 minutes.
>>> > >>>>>>>>
>>> > >>>>>>>> Our current CI consists of a few components (all numbers are
>>> per month):
>>> > >>>>>>>> * each commit on the master branch - ~280k
>>> > >>>>>>>> * 4.1 scheduled run - ~200k
>>> > >>>>>>>> * 4.0 scheduled run - ~200k
>>> > >>>>>>>> * 3.5 scheduled run - negligible because we don't run many
>>> tests
>>> > >>>>>>>> * master scheduled run ~ 300k
>>> > >>>>>>>>
>>> > >>>>>>>> With the new release cadence, even if we only do scheduled
>>> run on 4.x (which we shouldn't because it's an active dev branch but that's
>>> another story), we need an extra 200k. With a 6-month maintenance window,
>>> we will always have at least 3 active maintained versions (including LTS)
>>> that require CI.
>>> > >>>>>>>>
>>> > >>>>>>>> If it's just 200k extra, maybe it's manageable. But I really
>>> believe we need tests for the 4.x branch - we should treat that branch more
>>> like master, than say 4.2. Even if we don't do pre-merge check on it, we
>>> should do post-merge check for every commit. Daily check on an active dev
>>> branch sounds a bit too risky to me. That would be another 300k.
>>> > >>>>>>>>
>>> > >>>>>>>> This does not include the discussion about any pre-merge
>>> check for 4.x, which we should actually think about in the future.
>>> > >>>>>>>>
>>> > >>>>>>>> So the question is - how do we deal with that? The solutions
>>> I can think of are
>>> > >>>>>>>> * Get some self-hosted runners to increase our CI capacity,
>>> which is limited by ASF policy
>>> > >>>>>>>> * Optimize our CIs and tests so they take less time to run
>>> > >>>>>>>> * Reduce the coverage of our tests so we can at least test
>>> all branches
>>> > >>>>>>>>
>>> > >>>>>>>> Any idea is welcome.
>>> > >>>>>>>>
>>> > >>>>>>>> Tian
>>>
>>

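The test-only observation quoted above (a change that touches only tests does not need to cascade through dependent_modules) could be sketched roughly as follows. This is an illustration under stated assumptions, not the logic in dev/sparktestsupport/modules.py: the path pattern, the function names and the shape of the dependents mapping are all hypothetical.

```python
import re

# Hypothetical pattern: any path component named "test" or "tests".
# The real repository layout may need a different rule.
TEST_DIR = re.compile(r"(^|/)tests?/")


def is_test_only_change(changed_files):
    """True if the change is non-empty and every file is under a test dir."""
    return bool(changed_files) and all(
        TEST_DIR.search(path) for path in changed_files
    )


def modules_to_test(changed_modules, dependents, test_only):
    """Select modules to run.

    dependents maps a module name to the modules that directly depend on it
    (an assumed structure for this sketch).
    """
    selected = set(changed_modules)
    if test_only:
        # No production code imports test code, so skip the cascade.
        return selected
    # Source change: walk the dependency graph to all downstream modules.
    stack = list(selected)
    while stack:
        for downstream in dependents.get(stack.pop(), ()):
            if downstream not in selected:
                selected.add(downstream)
                stack.append(downstream)
    return selected
```

A mixed change (source plus tests) is deliberately treated as a source change here, which keeps the full cascade and errs on the side of coverage.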