Re: [PR] ci(benchmarks): weekly-only gh-pages publish + same-runner baseline [texera]

via GitHub Sat, 13 Jun 2026 16:37:30 -0700


Ma77Ball commented on code in PR #5671:
URL: https://github.com/apache/texera/pull/5671#discussion_r3408758125



##########
.github/workflows/benchmarks.yml:
##########
@@ -235,6 +246,61 @@ jobs:
         # inside bin/run-benchmarks.sh and adding a publish step below.
         run: bash bin/run-benchmarks.sh
 
+      - name: Benchmark main baseline in the same runner
+        # PR only: re-run the IDENTICAL trimmed grid against the base-branch
+        # (main) commit this PR targets, in THIS runner, right after the PR
+        # run above. Comparing two runs from the same machine cancels the
+        # cross-runner hardware variance that otherwise dominates CI bench
+        # deltas, so benchmarks-pr-comment.yml can show a trustworthy
+        # main-vs-branch comparison instead of PR-here vs a stored baseline
+        # captured on some other runner.
+        #
+        # The output convention is preserved: the PR's own outputs stay in
+        # bench-results/ untouched; we only ADD main's CSV as
+        # arrow-flight-e2e-main.csv (plus the base SHA in a sidecar file).
+        # The PR-mode grid is deterministic (see GridSpec in
+        # ArrowFlightActorBench.scala), so main's rows key 1:1 against the
+        # PR's rows for the comparison.
+        #
+        # Fail-soft by construction: no `set -e`, and a trap restores the
+        # PR's results plus the original checkout no matter where the main
+        # re-run dies (broken main, compile error, etc). On failure we emit
+        # no main CSV, and the comment workflow falls back to the stored
+        # gh-pages baseline. We also skip entirely if the PR run produced no
+        # CSV (e.g. the bench itself failed upstream).
+        if: ${{ github.event_name == 'pull_request' && !cancelled() }}
+        env:
+          BASE_SHA: ${{ github.event.pull_request.base.sha }}
+        run: |
+          set -uo pipefail
+          if [ ! -f bench-results/arrow-flight-e2e.csv ]; then
+            echo "::warning::no PR bench CSV; skipping same-runner main 
baseline."
+            exit 0
+          fi
+          ORIG_REF=$(git rev-parse HEAD)
+          # Park the PR's outputs; main's re-run writes a fresh bench-results/.
+          mv bench-results bench-results-pr
+          restore() {
+            rm -rf bench-results
+            mv bench-results-pr bench-results 2>/dev/null || true
+            git checkout --force "$ORIG_REF" 2>/dev/null || true
+          }
+          trap restore EXIT
+          if ! git checkout --force "$BASE_SHA"; then
+            echo "::warning::could not check out base SHA $BASE_SHA; skipping 
main baseline."
+            exit 0
+          fi
+          # Regenerate proto bindings against main's protos, then re-bench.
+          bash bin/python-proto-gen.sh || { echo "::warning::main proto-gen 
failed; skipping main baseline."; exit 0; }
+          if bash bin/run-benchmarks.sh && [ -f 
bench-results/arrow-flight-e2e.csv ]; then
+            cp bench-results/arrow-flight-e2e.csv 
bench-results-pr/arrow-flight-e2e-main.csv
+            printf '%s' "$BASE_SHA" > 
bench-results-pr/arrow-flight-e2e-main.commit.txt
+            echo "captured same-runner main baseline at $BASE_SHA"
+          else
+            echo "::warning::main baseline re-run failed; PR comment falls 
back to the gh-pages baseline."

Review Comment:
   Problem: the same-runner baseline checked out main and recompiled Scala, but 
kept the PR's installed Python deps, so it wasn't a clean main measurement. 
Solution: re-sync main's 
`requirements.txt`/`operator-requirements.txt`/`dev-requirements.txt` via `uv 
pip install` after the base checkout (fail-soft: skips the baseline if install 
fails).



##########
.github/workflows/benchmarks.yml:
##########
@@ -273,17 +339,24 @@ jobs:
           retention-days: 14
 
       # Publish to the gh-pages dashboard. auto-push + save-data-file are
-      # both gated on push-to-main / schedule: PR runs only emit the job
-      # summary and the uploaded artifact, never touching the tracked
-      # baseline. Adding a new benchmark = adding one publish block below
-      # matching the JSON filename convention in bin/run-benchmarks.sh.
+      # gated on `schedule` ONLY: the weekly full-grid run is the single
+      # authoritative baseline writer. PR *and* push-to-main runs only emit
+      # the job summary and the uploaded artifact, never touching the
+      # tracked baseline. This is deliberate: each gh-pages write is a bot
+      # commit (one per chart, so two per run), and persisting on every
+      # merge to main flooded the repo's Pulse / all-branches commit count
+      # with `github-action-benchmark` commits. The post-merge run still
+      # gives quick signal via the rendered summary + artifact; only the
+      # weekly sweep persists. Adding a new benchmark = adding one publish
+      # block below matching the JSON filename convention in
+      # bin/run-benchmarks.sh.
       #
-      # `skip-fetch-gh-pages: true` because gh-pages does not exist on
-      # apache/texera yet — without this the action fatals with
-      # `couldn't find remote ref gh-pages` even before the auto-push
-      # decision. auto-push: true on push/schedule still creates the
-      # branch on first write. Once the dashboard is seeded, flip this
-      # to false to re-enable baseline comparison + alert-threshold.
+      # `skip-fetch-gh-pages: true` was originally needed because gh-pages
+      # did not exist; without it the action fatals with `couldn't find
+      # remote ref gh-pages` even before the auto-push decision. The branch
+      # is now seeded, so this can be flipped to false to re-enable baseline
+      # comparison + alert-threshold (a deliberate follow-up). auto-push on
+      # the weekly schedule still appends to the existing branch.

Review Comment:
   Problem: the comment implied flipping `skip-fetch-gh-pages` was just a 
cleanup, not capturing what it actually enables. Solution: rewrote it to state 
it's intentionally off for now and that turning it on compares vs main, alerts 
past threshold, and can block merge, as a later follow-up.



##########
.github/workflows/benchmarks.yml:
##########
@@ -298,8 +371,8 @@ jobs:
           tool: customBiggerIsBetter
           output-file-path: bench-results/arrow-flight-e2e-throughput.json
           github-token: ${{ secrets.GITHUB_TOKEN }}
-          auto-push: ${{ (github.event_name == 'push' && github.ref == 
'refs/heads/main') || github.event_name == 'schedule' }}
-          save-data-file: ${{ (github.event_name == 'push' && github.ref == 
'refs/heads/main') || github.event_name == 'schedule' }}
+          auto-push: ${{ github.event_name == 'schedule' }}
+          save-data-file: ${{ github.event_name == 'schedule' }}

Review Comment:
   Problem: weekly publishing is too infrequent to build a useful baseline. 
Solution: switched to daily (08:00 UTC); can add more cron entries for 
several-times-a-day later if wanted.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] ci(benchmarks): weekly-only gh-pages publish + same-runner baseline [texera]

Reply via email to