(texera) 01/01: fix(ci): trim benchmark full grid to fit daily run under 6h timeout (#5905)

github-bot Tue, 23 Jun 2026 14:15:42 -0700

This is an automated email from the ASF dual-hosted git repository.

github-merge-queue[bot] pushed a commit to branch gh-readonly-queue/main/pr-5905-6433e713a08606eb952581828e8f9c360a763013
in repository https://gitbox.apache.org/repos/asf/texera.git


commit e4f1077a238d491e528b1bb401885f6e82b6274a
Author: Matthew B. <[email protected]>
AuthorDate: Tue Jun 23 14:15:17 2026 -0700

    fix(ci): trim benchmark full grid to fit daily run under 6h timeout (#5905)
    
    ### What changes were proposed in this PR?
    - Drop `batchSize=10000` from the `full`-mode benchmark grid in
    `ArrowFlightActorBench.scala`, taking the daily sweep from 36 configs to
    27 and removing the 9 heaviest configs (30-70 min each) that pushed the
    run past GitHub's 6h job ceiling.
    - Update the now-stale "36-config / ~50-60 min" comments to "27-config /
    ~40 min" in the bench source and `benchmarks.yml`.
    ### Any related issues, documentation, discussions?
    Closes: #5904
    ### How was this PR tested?
    - Non-functional change (benchmark harness grid + CI comments); no
    shipped behavior and no unit test covers the bench grid contents.
    - CI timing verification: trigger the `Benchmarks` workflow via
    `workflow_dispatch` on this branch (the only non-schedule trigger that
    runs `full` mode) and confirm the `Bench` job finishes well under 6h
    (expected ~40-50 min including compile/setup), reaching the publish
    steps.
    ### Was this PR authored or co-authored using generative AI tooling?
    Co-authored with Claude Opus 4.8 in compliance with ASF
---
 .github/workflows/benchmarks.yml                              |  8 ++++----
 .../org/apache/texera/amber/bench/ArrowFlightActorBench.scala | 11 ++++++++---
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/.github/workflows/benchmarks.yml b/.github/workflows/benchmarks.yml
index c74d9cfe48..9d6d62672e 100644
--- a/.github/workflows/benchmarks.yml
+++ b/.github/workflows/benchmarks.yml
@@ -45,7 +45,7 @@
 #     job summary plus uploaded artifact. Publishing on every merge spammed
 #     the repo's Pulse / all-branches commit count with bot commits, so
 #     only the scheduled (daily) run persists the baseline now.
-#   - schedule (daily): runs the full 36-config sweep and is the sole
+#   - schedule (daily): runs the full 27-config sweep and is the sole
 #     writer that publishes to gh-pages (the authoritative long-term
 #     baseline).
 #   - workflow_dispatch: manual full-grid run (no publish; bring-your-own
@@ -53,7 +53,7 @@
 #
 # Two modes via BENCH_MODE env (read by the bench Scala main):
 #   pr   — 3 configs × 20 batches, ~5 min   (PR + push-to-main)
-#   full — 36 configs × 200 batches, ~50-60 min   (schedule + dispatch)
+#   full — 27 configs × 200 batches, ~40 min   (schedule + dispatch)
 #
 # Non-blocking: this workflow is NOT included in required-checks.yml's
 # `required-checks` aggregator, so its result doesn't gate merges even
@@ -76,7 +76,7 @@ on:
   schedule:
     # Daily full-grid baseline refresh, 12:00 UTC (05:00 PDT). PR and
     # post-merge runs use a trimmed 3-config grid to stay around 5 min; the
-    # scheduled run covers the full 36-config sweep that the gh-pages
+    # scheduled run covers the full 27-config sweep that the gh-pages
     # dashboard tracks long-term. Daily (rather than weekly) keeps the
     # baseline fresh and accumulates enough data points to average out CI
     # noise; the extra bot commits on gh-pages are intentionally tolerated.
@@ -178,7 +178,7 @@ jobs:
       JAVA_OPTS: -Xms2048M -Xmx2048M -Xss6M -XX:ReservedCodeCacheSize=256M -Dfile.encoding=UTF-8
       JVM_OPTS: -Xms2048M -Xmx2048M -Xss6M -XX:ReservedCodeCacheSize=256M -Dfile.encoding=UTF-8
       # `pr` mode = 3-config trimmed sweep (~5 min) for PR + post-merge.
-      # `full` mode = 36-config sweep (~50-60 min) for schedule + manual.
+      # `full` mode = 27-config sweep (~40 min) for schedule + manual.
       # Read by the bench Scala main (see GridSpec switch); workflow only
       # decides which mode to pass.
       BENCH_MODE: ${{ (github.event_name == 'schedule' || github.event_name == 'workflow_dispatch') && 'full' || 'pr' }}
diff --git a/amber/src/bench/scala/org/apache/texera/amber/bench/ArrowFlightActorBench.scala b/amber/src/bench/scala/org/apache/texera/amber/bench/ArrowFlightActorBench.scala
index 79d0c8cd7d..0109733589 100644
--- a/amber/src/bench/scala/org/apache/texera/amber/bench/ArrowFlightActorBench.scala
+++ b/amber/src/bench/scala/org/apache/texera/amber/bench/ArrowFlightActorBench.scala
@@ -92,9 +92,14 @@ object ArrowFlightActorBench {
 
   // Sweep grid + iteration counts switch on BENCH_MODE so PR / post-merge
   // checks stay around 5 min while scheduled / manual runs do the full
-  // 36-config grid that the gh-pages dashboard tracks long-term.
+  // 27-config grid that the gh-pages dashboard tracks long-term.
   //   pr   — 3 configs × 20 batches, warmup 5  (~4-5 min in CI)
-  //   full — 36 configs × 200 batches, warmup 20  (~50-60 min in CI)
+  //   full — 27 configs × 200 batches, warmup 20  (~40 min in CI)
+  // The batchSize=10000 row was dropped from the full grid: its 9 configs
+  // (3 schemaWidths x 3 stringLens) ran 30-70 min EACH, pushing the daily
+  // run past GitHub's 6 h job ceiling so it timed out before publishing to
+  // gh-pages. The remaining 10/100/1000 rows are ~10-1000x cheaper per
+  // batch, keeping the full sweep well under an hour.
   // BENCH_NUM_BATCHES, if set, overrides numBatches for the current mode
   // (useful for local smoke).
   private val BenchMode: String = sys.env.getOrElse("BENCH_MODE", "full").toLowerCase
@@ -118,7 +123,7 @@ object ArrowFlightActorBench {
       )
     case _ =>
       GridSpec(
-        batchSizes = Seq(10, 100, 1000, 10000),
+        batchSizes = Seq(10, 100, 1000),
         schemaWidths = Seq(1, 10, 50),
         stringLens = Seq(8, 64, 512),
         numBatches = sys.env.get("BENCH_NUM_BATCHES").map(_.toInt).getOrElse(200),
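
The 36-to-27 config count quoted in the commit message can be checked with a small sketch. It assumes the sweep is the full cross product of the three axes, which is consistent with the "3 schemaWidths x 3 stringLens" wording in the new source comment. `GridAxes` and `GridCount` are hypothetical names used only for illustration; the real `GridSpec` carries additional fields (such as `numBatches`) that are not modeled here.

```scala
// Hypothetical stand-in for the bench's GridSpec: only the three swept axes.
final case class GridAxes(
    batchSizes: Seq[Int],
    schemaWidths: Seq[Int],
    stringLens: Seq[Int]
) {
  // Assumption: every combination of the three axes is one benchmark config.
  def configs: Seq[(Int, Int, Int)] =
    for {
      b <- batchSizes
      w <- schemaWidths
      s <- stringLens
    } yield (b, w, s)
}

object GridCount {
  // Axis values copied from the diff: the full grid before and after this commit.
  val before: GridAxes =
    GridAxes(Seq(10, 100, 1000, 10000), Seq(1, 10, 50), Seq(8, 64, 512))
  val after: GridAxes = before.copy(batchSizes = Seq(10, 100, 1000))

  // Configs present before the change but absent after it.
  val dropped: Seq[(Int, Int, Int)] =
    before.configs.filterNot(after.configs.contains)

  def main(args: Array[String]): Unit = {
    println(s"before:  ${before.configs.size} configs")
    println(s"after:   ${after.configs.size} configs")
    println(s"dropped: ${dropped.size} configs")
  }
}
```

Under that assumption the sizes come out as 4 x 3 x 3 = 36 before and 3 x 3 x 3 = 27 after, and every dropped config has `batchSize` 10000.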
