[PR] test(e2e): prewarm environment pool to hide per-spec deploy latency [apisix-ingress-controller]

via GitHub Fri, 12 Jun 2026 03:03:49 -0700


AlinsRan opened a new pull request, #2790:
URL: https://github.com/apache/apisix-ingress-controller/pull/2790


   ## Problem
   
   Profiling the e2e pipeline shows the dominant cost is **per-spec environment 
setup**, not teardown or in-test sleeps. Every `It` synchronously runs the 
following on its critical path:
   
   ```
   BeforeEach: deploy APISIX(+etcd) + ingress-controller + httpbin → block ~30-80s for readiness → run body → delete ns
   ```
   
   With ~229 specs, that setup latency — paid once per spec — is the 
bottleneck. Trimming teardown/sleeps only shaves the tail; it does not touch 
the `N_specs × deploy_time` term.
   
   ## Idea
   
   Stop deploying on the critical path. Build environments **ahead of time in 
the background** and have `BeforeEach` pick up a ready one — overlapping the 
next spec's deployment with the current spec's execution (pipelining). 
Isolation is unchanged: each spec still gets its own namespace + controller; we 
only change *when* the environment is built.
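
As a rough illustration of the pipelining idea (not the code in this PR), the sketch below keeps a bounded number of environments either ready or building in the background while a spec consumes one. The names `env`, `pool`, `newPool`, `acquire`, and the `build` callback are hypothetical; the sleeps stand in for the real deploy and spec body.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// env stands in for a provisioned test environment (hypothetical type).
type env struct {
	id  int64
	err error // provisioning failure, if any; the caller would fall back
}

// pool keeps at most depth environments ready or building at any time.
type pool struct {
	ready chan env      // finished environments waiting for a spec
	slots chan struct{} // one token per environment that may still be built
	done  chan struct{} // closed to stop the workers
	next  atomic.Int64  // monotonically increasing environment id
}

func newPool(depth int, build func(id int64) env) *pool {
	p := &pool{
		ready: make(chan env, depth),
		slots: make(chan struct{}, depth),
		done:  make(chan struct{}),
	}
	for i := 0; i < depth; i++ {
		p.slots <- struct{}{}
		go p.worker(build)
	}
	return p
}

// worker builds one environment per slot token it manages to take.
func (p *pool) worker(build func(id int64) env) {
	for {
		select {
		case <-p.slots:
		case <-p.done:
			return
		}
		// The send cannot block: slot tokens bound the number of built envs.
		p.ready <- build(p.next.Add(1))
	}
}

// acquire hands out a ready environment and frees its slot, which lets a
// worker start building the next one while the caller runs its spec.
func (p *pool) acquire() env {
	e := <-p.ready
	p.slots <- struct{}{}
	return e
}

// close stops the workers; leftover environments would still need cleanup.
func (p *pool) close() { close(p.done) }

func main() {
	build := func(id int64) env {
		time.Sleep(20 * time.Millisecond) // stand-in for the deploy
		return env{id: id}
	}
	p := newPool(1, build)
	defer p.close()
	for spec := 1; spec <= 3; spec++ {
		e := p.acquire()
		if e.err != nil {
			fmt.Println("provisioning failed, fall back to synchronous deploy")
			continue
		}
		time.Sleep(20 * time.Millisecond) // stand-in for the spec body
		fmt.Printf("spec %d ran on env %d\n", spec, e.id)
	}
}
```

In this sketch the deploy for the next spec overlaps with the current spec's body, so only the first `acquire` pays the full build time.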
   
   ## What's here
   
   - **`scaffold/envpool.go`** — generic, provider-agnostic pool. A buffered 
channel of depth `D` is kept full by `D` background workers (D=1 ⇒ double 
buffer: one ready while the next builds). `bgTestingT` is a minimal terratest 
`TestingT` so provisioning can run **outside a Ginkgo spec** (no 
`Expect`/`GinkgoT` in background goroutines); panics/failures are captured as 
`pooledEnv.err`. A per-process `AfterSuite` cleans up leftover envs.
   - **`scaffold/apisix_prewarm.go`** — error-style provisioning of the 
**default profile** (namespace + dataplane(+etcd) + 5 tunnels + controller + 
httpbin) and loading it onto the scaffold.
   - **`scaffold/apisix_deployer.go`** — `BeforeEach` acquires a prewarmed env; 
**webhook/custom profiles and any provisioning error fall back to the unchanged 
synchronous deploy**, so correctness is preserved.
   - **`framework/k8s.go`** — readiness polling fix: poll every 2s instead of 
an exponential backoff whose polls landed at 7.5/15.5/31.5/63.5s, i.e. very 
sparsely in exactly the 10-30s window where pods become ready, wasting up to 
~15s per wait. Also adds `EnsureServiceReadyE` for background endpoint waits.
   
   ## Knobs
   
   - `E2E_PREWARM` (default `true`; set `false` to disable and use the original 
synchronous path)
   - `E2E_PREWARM_DEPTH` (default `1`)
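
A minimal sketch of how the two knobs could be read, using the documented defaults. The helper names `prewarmEnabled` and `prewarmDepth` are hypothetical, and falling back to the default on an unparsable or non-positive value is an assumption, not confirmed behavior of this PR.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// lookupFunc matches os.LookupEnv so tests can inject their own values.
type lookupFunc func(key string) (string, bool)

// prewarmEnabled reports whether prewarming is on; the default is true.
func prewarmEnabled(lookup lookupFunc) bool {
	v, ok := lookup("E2E_PREWARM")
	if !ok {
		return true
	}
	enabled, err := strconv.ParseBool(v)
	if err != nil {
		return true // assumption: ignore unparsable values
	}
	return enabled
}

// prewarmDepth returns the pool depth; the default is 1.
func prewarmDepth(lookup lookupFunc) int {
	v, ok := lookup("E2E_PREWARM_DEPTH")
	if !ok {
		return 1
	}
	depth, err := strconv.Atoi(v)
	if err != nil || depth < 1 {
		return 1 // assumption: ignore invalid or non-positive depths
	}
	return depth
}

func main() {
	fmt.Println("prewarm enabled:", prewarmEnabled(os.LookupEnv))
	fmt.Println("prewarm depth:", prewarmDepth(os.LookupEnv))
}
```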
   
   ## Expected effect & limits
   
   - Steady-state per-spec cost drops from `P + B` (deploy + body) toward 
`max(P/D, B)`. For body-heavy specs the deploy is fully hidden; for 
deploy-bound specs the gain is bounded by deploy throughput.
   - Throughput is ultimately capped by **cluster resources**: with 
`E2E_NODES=N` and depth `D`, up to `N × (1 in-use + D building)` environments 
exist at once. The default `D=1` adds at most one in-flight env per process 
over today's behavior. Tune `E2E_PREWARM_DEPTH` / runner size accordingly.
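
The two formulas above can be worked through numerically. The values `P = 50s` and `B = 20s` below are illustrative assumptions (only the ~30-80s readiness range comes from this description), and `max(P/D, B)` is the limit the description says the cost moves toward, not a measured result.

```go
package main

import (
	"fmt"
	"math"
)

// syncCost is the per-spec cost when the deploy sits on the critical path.
func syncCost(p, b float64) float64 { return p + b }

// pipelinedCost is the steady-state bound max(P/D, B) for pool depth d.
func pipelinedCost(p, b float64, d int) float64 {
	return math.Max(p/float64(d), b)
}

// peakEnvs is the upper bound N x (1 in-use + D building) on environments.
func peakEnvs(nodes, depth int) int { return nodes * (1 + depth) }

func main() {
	const p, b = 50.0, 20.0 // seconds; illustrative values only
	fmt.Printf("synchronous:    %.0fs per spec\n", syncCost(p, b))
	for _, d := range []int{1, 2} {
		fmt.Printf("prewarm, D=%d:   %.0fs per spec, up to %d envs on 4 nodes\n",
			d, pipelinedCost(p, b, d), peakEnvs(4, d))
	}
}
```

Under these assumed numbers the spec is deploy-bound at `D=1` (50s instead of 70s), and raising the depth to 2 lowers the bound to 25s at the price of more environments alive at once.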
   
   ## Compatibility / safety
   
   - Confined to `scaffold` + `framework`; the `Deployer` interface and spec 
bodies are unchanged, so downstream provider implementations reuse the pool.
   - Fully gated behind `E2E_PREWARM`, with a synchronous fallback on any provisioning failure.
   
   ## Validation
   
   - `go build ./test/e2e/...`, `go vet ./test/e2e/...`, `gofmt` — all clean.
   - Behavior under a live cluster (including peak resource usage and the 
prewarmed controller reaching Ready) needs this PR's CI run to confirm; 
`E2E_PREWARM=false` is the kill switch if needed.


