acelyc111 commented on PR #2403:
URL:
https://github.com/apache/incubator-pegasus/pull/2403#issuecomment-4614325338
## Update on the Test ASAN/Release failures
After looking deeper, the `defaults.run.shell: bash` fix did **not** resolve
the ZK startup failure. Logs confirm the step now runs under `bash --noprofile
--norc -e -o pipefail {0}` (matching what works elsewhere), but ZK still exits
~1 second after launch with `Starting zookeeper ... FAILED TO START`. So
`shell` is not the root cause.
### Key finding: this is a pre-existing v2.5 problem, not a regression from this PR
I'd been mentally comparing this PR against the most recent successful `Cpp
CI` run on what I thought was a v2.5 PR. Re-checking: **PR #2387
(`limowang/fix/disk_abnormal`, run
[`24069709672`](https://github.com/apache/incubator-pegasus/actions/runs/24069709672))
is actually master-base, not v2.5-base.** It pulled
`apache/pegasus:thirdparties-bin-test-asan-ubuntu2204-master`, not `…-v2.5`.
Querying the `thirdparty-regular-push` workflow's run history directly:
| Branch | Successful image rebuilds | Latest success |
|---|---|---|
| `master` | 33 | 2025-11-25 |
| `build-env-ubuntu-cmake-3` | 1 | 2025-05-06 |
| `build-env-ubuntu-sasl2-modules` | 1 | 2025-09-16 |
| **`v2.5`** | **0** | **(never)** |
v2.5's only run of that workflow was a `push`-triggered run on 2026-04-09
that **ended in `startup_failure` before any job ran** (almost certainly the
same ASF allow-list block this PR exists to fix).
So `apache/pegasus:thirdparties-bin-test-ubuntu2204-v2.5` is a one-shot
image, presumably hand-built when v2.5 was cut and never refreshed. Whatever
is wrong with its JVM / ZK runtime has presumably been wrong for the whole life
of v2.5; the issue only became visible now because **this is the first time
`Test ASAN`/`Test Release` jobs have ever made it past the ASF allow-list /
startup phase on a v2.5 PR.**
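
For anyone who wants to re-check the run-history counts in the table above, a
minimal sketch using the GitHub CLI follows. The function name and the default
workflow file name `thirdparty-regular-push.yml` are assumptions for
illustration (the workflow is only named here as `thirdparty-regular-push`),
and this is not necessarily how the numbers above were obtained.

```shell
# Count successful runs of one workflow on one branch via the GitHub CLI.
# count_successful_runs is a hypothetical helper name; the default workflow
# file name is an assumption and can be overridden with the second argument.
count_successful_runs() {
  local branch="$1"
  local workflow="${2:-thirdparty-regular-push.yml}"
  gh run list \
    --repo apache/incubator-pegasus \
    --workflow "${workflow}" \
    --branch "${branch}" \
    --status success \
    --limit 200 \
    --json databaseId \
    --jq 'length'
}
```

Calling `count_successful_runs v2.5` and `count_successful_runs master` should
print one integer each, which can be compared against the table. It requires an
authenticated `gh`, and `--limit 200` is an arbitrary cap chosen to exceed the
33 runs listed for `master`.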
### Concrete evidence the image is the problem and not anything in this PR
- `Build ASAN` and `Build Release` use the **same** image and pass cleanly.
They never start the JVM, only link against it.
- `Test ASAN/Release` are the only jobs that invoke `./run.sh test`, which
calls `zkServer.sh start`. JVM exits ~1 second after `Using config: ...zoo.cfg`.
- The PR diff only touches `.github/workflows/*.yml` — there is no code path
through which it could affect a Docker Hub image.
### What I just pushed (`945b2bd32`)
A temporary `if: failure()` diagnostics step after each of the two `Unit
Testing` steps that dumps `java -version`, ulimit, the ZK install dir, and
every candidate location of `zookeeper.out`. Once the next run finishes, we'll
have the JVM's actual stderr, which should narrow the failure down to one of: a
broken JDK in the image, a glibc/libc++ mismatch, `dataDir` permissions, or
stack/mmap being denied by container security.
**This commit should be reverted before merging this PR.** It exists only to
diagnose the unrelated v2.5 image issue.
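
For reference, the kind of dump described above can be sketched as a small
bash function. This is an illustration under stated assumptions, not the
contents of `945b2bd32`: the function name `dump_zk_diagnostics` is
hypothetical, and the directories to search are passed in as arguments because
the real ZK install path is not stated here.

```shell
# Hypothetical diagnostics helper: prints the JVM version, resource limits,
# a listing of each given directory, and the tail of every zookeeper.out
# found beneath those directories.
# Usage: dump_zk_diagnostics ZK_DIR [EXTRA_DIR...]
dump_zk_diagnostics() {
  echo "== java -version =="
  if command -v java >/dev/null 2>&1; then
    # java prints its version on stderr; fold it into stdout.
    java -version 2>&1 || true
  else
    echo "java not found on PATH"
  fi

  echo "== ulimit -a =="
  ulimit -a

  local dir
  for dir in "$@"; do
    echo "== listing: ${dir} =="
    ls -la "${dir}" 2>&1 || true
  done

  echo "== zookeeper.out candidates =="
  local f
  find "$@" -type f -name 'zookeeper.out' 2>/dev/null | sort -u |
    while IFS= read -r f; do
      echo "--- ${f} ---"
      tail -n 200 "${f}"
    done || true
}
```

In a workflow this would be called from a step guarded by `if: failure()`, for
example `dump_zk_diagnostics "$ZK_DIR"`, where `ZK_DIR` stands in for wherever
the image installs ZooKeeper.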
### Suggested path forward for this PR
The PR's stated goal — making v2.5 PRs comply with the ASF allow-list and
surface non-startup_failure CI results — is achieved (Lint, Build ASAN, Build
Release, Build with jemalloc, IWYU, Standardization Lint, Module Labeler,
Golang Lint/Test all pass). The remaining `Test ASAN`/`Test Release` failures
are a separate, longstanding v2.5 release-branch infrastructure issue: the
third-party image needs to be rebuilt before C++ unit tests on v2.5 can ever
pass.
A few options for how to land this PR without holding it on the image issue:
1. Land it as-is, treating Test ASAN/Test Release red as a known v2.5 issue
tracked separately.
2. Land it with the diagnostics commit reverted, and open a follow-up issue
(or PR) to rebuild `apache/pegasus:thirdparties-bin-test-ubuntu2204-v2.5` once
the diagnostics output identifies the specific root cause.
3. If preferred, I can add an `if: false` (or a matrix-skip) on `test_ASAN`
/ `test_Release` *for v2.5 only*, with a clear comment, so v2.5 PRs don't show
red until the image is fixed.
Happy to take whichever direction you'd like.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]