Is it possible to run the 4.x branch post-merge CI as a scheduled job,
possibly daily, instead of after every commit? I think this could quickly
cut our CI usage.
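
A rough way to size this idea, as a sketch only: the three weekly figures
below are the ones quoted in this thread, while `commits_per_week` and
`branch_share` are hypothetical inputs that the thread does not give. The
model also assumes every post-merge run costs the same number of minutes.

```python
# Figures quoted in this thread (runner minutes per week).
WEEKLY_LIMIT = 250_000       # ASF limit
CURRENT_USAGE = 350_000      # "about 350k min/week"
POST_MERGE_USAGE = 180_000   # "180k+ min/week" across 2 active dev branches


def usage_with_daily_branch(commits_per_week: int, branch_share: float = 0.5) -> float:
    """Estimate weekly usage if one branch runs post-merge CI once a day.

    commits_per_week and branch_share are hypothetical inputs: the number of
    commits landing on that branch per week, and the fraction of today's
    post-merge minutes attributed to it.
    """
    if commits_per_week <= 0:
        raise ValueError("commits_per_week must be positive")
    branch_minutes = POST_MERGE_USAGE * branch_share
    minutes_per_run = branch_minutes / commits_per_week
    # At most one run per day, and never more runs than commits.
    daily_runs = min(7, commits_per_week)
    return CURRENT_USAGE - branch_minutes + minutes_per_run * daily_runs


if __name__ == "__main__":
    estimate = usage_with_daily_branch(commits_per_week=70)
    print(f"estimated usage: {estimate:,.0f} min/week (limit {WEEKLY_LIMIT:,})")
```

Comparing the result with `WEEKLY_LIMIT` for a few plausible commit rates
shows how much of the gap a daily schedule alone could close.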

Best,
Yicong Huang

On Fri, May 22, 2026 at 11:48 AM Tian Gao via dev <[email protected]>
wrote:

> Like I mentioned a few weeks ago, we can't afford this. We received the
> warning from ASF today and took a quick look at our CI usage.
>
> We are using about 350k min/week now, and the limit is 250k min/week. The
> post merge itself took 180k+ min/week because now we have 2 active dev
> branches.
>
> I think we should put some effort into this. There are a few ways to make
> the situation better:
>
> 1. Run fewer tests - We disabled pandas-on-Spark tests for post merge a
> while ago to comply with the ASF limit.
> 2. Make tests run faster - I occasionally optimize Python tests, but I'm
> not sure whether Java tests are being taken care of. Java tests now take
> significantly more time in our CI.
> 3. Run tests less frequently - helpful for scheduled CI, which we already
> did, but won't help post merge.
> 4. Smart testing - this is a bit tricky for post-merge because ideally we
> want full coverage for each commit. We can probably do some safe
> heuristics, but it takes time and we could potentially lose coverage.
> 5. Move scheduled tests to another repo - Arrow seems to be doing this.
> This allows us to use all the ASF budget to run post-merge tests. However,
> we need some sponsor to achieve this.
>
> I think we have 2 weeks to at least temporarily reduce our CI usage under
> the limit, so we need something fast, then something good.
>
> Tian
>
> On Mon, May 11, 2026 at 3:14 AM Akira Ajisaka <[email protected]> wrote:
>
>> > I'm working on fixing branch-3.5 CI:
>> https://github.com/apache/spark/pull/55764.
>> Hopefully I'll complete it this week.
>>
>> Closed the above PR as a duplicate of
>> https://github.com/apache/spark/pull/55432.
>> Sorry for the confusion.
>>
>> On Mon, May 11, 2026 at 3:22 PM Akira Ajisaka <[email protected]>
>> wrote:
>> >
>> > > Also on the 3.5 side the CI is super broken so I’m trying to fix it
>> up now, the timing is complicated by the Ubuntu PPA DDoS outages.
>> >
>> > I'm working on fixing branch-3.5 CI:
>> > https://github.com/apache/spark/pull/55764.
>> Hopefully I'll complete it
>> > this week. The Ubuntu outage seems unrelated.
>> >
>> > Anyway, I'm +1 to reduce the frequency on non-active branches.
>> >
>> > Thanks,
>> > Akira
>> >
>> > On Fri, May 8, 2026 at 5:30 AM Tian Gao via dev <[email protected]>
>> wrote:
>> > >
>> > > Yeah I'm not surprised that 3.5 is not in its best shape at this
>> point because we almost did not run tests on it. When we reduce the
>> coverage for a branch, we will have issues when we try to release. That's
>> why we should not only make efforts on that side. We should explore all
>> different ways to make CI better.
>> > >
>> > > On Thu, May 7, 2026 at 12:02 PM Holden Karau <[email protected]>
>> wrote:
>> > >>
>> > >> Smarter test selection is probably the magic but it’s going to be
>> effort. Also on the 3.5 side the CI is super broken so I’m trying to fix it
>> up now, the timing is complicated by the Ubuntu PPA DDoS outages.
>> > >>
>> > >>
>> > >> Twitter: https://twitter.com/holdenkarau
>> > >> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> > >> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9
>> > >> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> > >> Pronouns: she/her
>> > >>
>> > >> On Thu, May 7, 2026 at 11:28 AM Tian Gao via dev <
>> [email protected]> wrote:
>> > >>>
>> > >>> I definitely agree that we can save a lot of time by optimizing the
>> > >>> CI. But currently, Java tests take more time than Python tests. They
>> > >>> are comparable, but Java tests are still observably more expensive. We
>> > >>> should not only focus on Python ones.
>> > >>>
>> > >>> In the meantime, I'll look for low-hanging fruit in the CI to make it
>> > >>> smarter.
>> > >>>
>> > >>> Tian
>> > >>>
>> > >>> On Thu, May 7, 2026 at 6:40 AM Ruifeng Zheng <[email protected]>
>> wrote:
>> > >>>>
>> > >>>> I also did some data analysis, and I think we should also revisit
>> > >>>> the CI:
>> > >>>> 1. Deduplicate the compile. For example, the pyspark matrix executes
>> > >>>>    8 byte-identical SBT compiles in parallel today, costing ~108m of
>> > >>>>    redundant work per run.
>> > >>>>    (I am working on a POC: https://github.com/apache/spark/pull/55726)
>> > >>>> 2. Smarter test selection. 11% of the most recent 10,000 commits are
>> > >>>>    test-only changes. Today these trigger the full pyspark matrix
>> > >>>>    because the dependency graph in dev/sparktestsupport/modules.py
>> > >>>>    cascades through dependent_modules regardless of whether the
>> > >>>>    change is in source or tests. The cascade is correct for source
>> > >>>>    changes (downstream modules import the source) but unnecessary
>> > >>>>    for tests (no production code imports test code).
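
The test-only rule described in point 2 can be sketched on a toy graph. This
is an illustration under stated assumptions, not the code in
dev/sparktestsupport/modules.py: the `DEPENDENTS` mapping, the module names,
and the `is_test_file` heuristic are all hypothetical.

```python
from collections import deque

# Hypothetical graph: module -> modules that depend on it directly.
DEPENDENTS = {
    "core": ["sql"],
    "sql": ["pyspark-sql"],
    "pyspark-sql": ["pyspark-pandas"],
    "pyspark-pandas": [],
}


def is_test_file(path: str) -> bool:
    """Hypothetical heuristic for recognizing test-only files."""
    return "/tests/" in path or "/src/test/" in path


def modules_to_test(changed: dict[str, list[str]]) -> set[str]:
    """Pick modules to test; `changed` maps a module to its changed paths.

    A module with any source change cascades to all of its transitive
    dependents. A module with only test changes is tested by itself.
    """
    selected = set(changed)
    # Only modules with at least one non-test change start a cascade.
    queue = deque(
        module
        for module, paths in changed.items()
        if not all(is_test_file(p) for p in paths)
    )
    expanded = set()
    while queue:
        module = queue.popleft()
        if module in expanded:
            continue
        expanded.add(module)
        for dependent in DEPENDENTS.get(module, []):
            selected.add(dependent)
            queue.append(dependent)
    return selected


if __name__ == "__main__":
    print(modules_to_test({"sql": ["sql/core/src/test/scala/FooSuite.scala"]}))
    print(modules_to_test({"sql": ["sql/core/src/main/scala/Foo.scala"]}))
```

Tracking `expanded` separately from `selected` matters: a module touched only
in its tests must still pass the cascade along when a source change upstream
reaches it.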
>> > >>>>
>> > >>>> On Thu, May 7, 2026 at 5:23 PM Hyukjin Kwon <[email protected]>
>> wrote:
>> > >>>>>
>> > >>>>> For now, I created a PR to reduce the frequency by half:
>> https://github.com/apache/spark/pull/55729
>> > >>>>>
>> > >>>>> On Thu, 7 May 2026 at 07:56, Yicong Huang <[email protected]>
>> wrote:
>> > >>>>>>
>> > >>>>>> I think we need to 1) cut CI pressure and 2) look for more
>> > >>>>>> resources to run CI, both at the same time.
>> > >>>>>>
>> > >>>>>> Cut CIs:
>> > >>>>>>
>> > >>>>>> I think the biggest cut would be on the scheduled jobs first. For
>> > >>>>>> instance, change the 3.5 and 4.0 scheduled jobs from daily to once
>> > >>>>>> every three days, or even once per week.
>> > >>>>>> Then for branch 4.x or more active release branches, can we do
>> > >>>>>> daily post-merge CI instead of after each commit?
>> > >>>>>> Meanwhile, we can explore ways to run selected tests on the actual
>> > >>>>>> affected code path to avoid full runs.
>> > >>>>>> And optimize the tests themselves so they run faster.
>> > >>>>>>
>> > >>>>>> Expand resources:
>> > >>>>>>
>> > >>>>>> We can probably move some of the scheduled jobs out to another
>> repo like what Apache Arrow did.
>> > >>>>>> I wonder if self hosted runners are acceptable to the community?
>> This sounds like a longer term solution if we were to introduce more checks
>> in the future.
>> > >>>>>>
>> > >>>>>>
>> > >>>>>> Best regards,
>> > >>>>>> Yicong Huang
>> > >>>>>>
>> > >>>>>> On Wed, May 6, 2026 at 3:04 PM Hyukjin Kwon <
>> [email protected]> wrote:
>> > >>>>>>>
>> > >>>>>>> We should probably reduce the scheduled build for the time
>> being.
>> > >>>>>>>
>> > >>>>>>> As a reference, I worked on Apache Arrow, and they use an extra CI
>> > >>>>>>> run by a third party, e.g., see
>> > >>>>>>> - PR: https://github.com/apache/arrow/pull/48915
>> > >>>>>>> - You post a comment like
>> > >>>>>>> https://github.com/apache/arrow/pull/48915#issuecomment-3852062184
>> > >>>>>>> - It posts the CI link like
>> > >>>>>>> https://github.com/apache/arrow/pull/48915#issuecomment-3852079993
>> > >>>>>>> - The CI is defined at https://github.com/ursacomputing/crossbow
>> > >>>>>>>
>> > >>>>>>> I feel like this can be an alternative if any vendor is willing
>> to support it.
>> > >>>>>>>
>> > >>>>>>> On Thu, 7 May 2026 at 04:09, Tian Gao via dev <
>> [email protected]> wrote:
>> > >>>>>>>>
>> > >>>>>>>> I did some quick calculations, and we can't afford the CI with
>> our existing infra.
>> > >>>>>>>>
>> > >>>>>>>> Per ASF policy
>> > >>>>>>>> (https://infra.apache.org/github-actions-policy.html), the maximum
>> > >>>>>>>> weekly runner minutes we have is 250k. That's 1M per month, and last
>> > >>>>>>>> month, we hit almost the exact number - 1,082,721 minutes.
>> > >>>>>>>>
>> > >>>>>>>> Our current CI consists of a few components (all numbers are
>> > >>>>>>>> runner minutes per month):
>> > >>>>>>>> * each commit on the master branch - ~280k
>> > >>>>>>>> * 4.1 scheduled run - ~200k
>> > >>>>>>>> * 4.0 scheduled run - ~200k
>> > >>>>>>>> * 3.5 scheduled run - negligible because we don't run many tests
>> > >>>>>>>> * master scheduled run - ~300k
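
The arithmetic behind this breakdown can be checked with a short sketch. The
component figures are the approximate ones listed above; treating the
"negligible" 3.5 run as zero and a month as four weeks (the message's own
250k-to-1M conversion) are simplifying assumptions.

```python
# Approximate runner minutes per month, as listed in the message.
MONTHLY_MINUTES = {
    "master, per commit": 280_000,
    "4.1 scheduled": 200_000,
    "4.0 scheduled": 200_000,
    "3.5 scheduled": 0,  # "negligible" in the message; treated as 0 here
    "master scheduled": 300_000,
}

WEEKLY_LIMIT = 250_000   # ASF weekly limit quoted in the message
WEEKS_PER_MONTH = 4      # assumption: same rounding as the message


def monthly_headroom(extra_minutes: int = 0) -> int:
    """Minutes left under the monthly limit after adding extra_minutes."""
    limit = WEEKLY_LIMIT * WEEKS_PER_MONTH
    return limit - (sum(MONTHLY_MINUTES.values()) + extra_minutes)


if __name__ == "__main__":
    print("headroom today:", monthly_headroom())
    print("headroom with +200k:", monthly_headroom(200_000))
```

A negative result means the listed components plus the extra workload no
longer fit within the limit.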
>> > >>>>>>>>
>> > >>>>>>>> With the new release cadence, even if we only do scheduled run
>> on 4.x (which we shouldn't because it's an active dev branch but that's
>> another story), we need an extra 200k. With a 6-month maintenance window,
>> we will always have at least 3 active maintained versions (including LTS)
>> that require CI.
>> > >>>>>>>>
>> > >>>>>>>> If it's just 200k extra, maybe it's manageable. But I really
>> believe we need tests for the 4.x branch - we should treat that branch more
>> like master, than say 4.2. Even if we don't do pre-merge check on it, we
>> should do post-merge check for every commit. Daily check on an active dev
>> branch sounds a bit too risky to me. That would be another 300k.
>> > >>>>>>>>
>> > >>>>>>>> This does not include the discussion about any pre-merge check
>> for 4.x, which we should actually think about in the future.
>> > >>>>>>>>
>> > >>>>>>>> So the question is - how do we deal with that? The solutions I
>> can think of are
>> > >>>>>>>> * Get some self-hosted runners to increase our CI capacity, which
>> > >>>>>>>> is limited by ASF policy
>> > >>>>>>>> * Optimize our CI and tests so they take less time to run
>> > >>>>>>>> * Reduce the coverage of our tests so we can at least test all
>> branches
>> > >>>>>>>>
>> > >>>>>>>> Any idea is welcome.
>> > >>>>>>>>
>> > >>>>>>>> Tian
>>
>
