mrproliu opened a new pull request, #1188: URL: https://github.com/apache/skywalking-banyandb/pull/1188
## Summary Following https://github.com/apache/skywalking-banyandb/actions/runs/27990546888/job/82841967908. The scheduled **Flaky Test** workflow on `main` intermittently fails in the `banyand/internal/dump/measure` suite: ``` --- FAIL: TestMeasureIndexedTagResolvedFromIndex (61.14s) suite_test.go:362: Condition never satisfied Messages: series index not fully persisted after stop ``` While investigating, the root cause turned out to expose a **real production durability gap** in the series index, not just a slow test. This PR fixes both: - **(A)** makes the test resilient to persister scheduling pressure under `-race`; - **(B)** persists the series index synchronously on graceful shutdown so an offline reader (dump / migration / offline CLI) always sees a complete index. ## Root cause The failing assertion polls the **on-disk** series index (`sidx`) and waits for both written series to become visible: ```go require.Eventually(t, func() bool { count, _ := inverted.ReadOnlyDocCount(sidxPath) // reads the persisted snapshot only return count >= int64(total) // total = 2 }, 60*time.Second, 100*time.Millisecond, ...) ``` `ReadOnlyDocCount` opens a fresh read-only bluge reader, so it only sees documents that have already been **persisted to disk** — not the ones still held in the writer's in-memory segments. Tracing how the series index reaches disk: 1. The standalone write loop calls `IndexDB().Insert(...)` (`write_standalone.go`), which goes through `bluge.Writer.Batch`. 2. The test runs with `--measure-flush-timeout=200ms`. In `measure/metadata.go` this is converted to whole seconds (`flushTimeout.Nanoseconds() / int64(time.Second)`), which **truncates to `0`**, so `BatchWaitSec = 0` and bluge is configured **without** `WithUnsafeBatches`. 3. In "safe batch" mode, bluge's `prepareSegment` blocks the `Batch()` call until the batch is **persisted** (`introduction.persisted` channel). The persistence itself is performed by the background **persister goroutine**; `Batch()` simply blocks on it. 4. Crucially, `bluge.Writer.Close()` does **not** flush pending in-memory segments: it signals the persister to stop (`close(closeCh)` → the persister `break`s without a final flush) and then drops the in-memory root (`replaceRoot(nil)`). Putting the timeline together for the failing run: - `mustAddDataPoints` (part) runs *before* `Insert` (series index) in the same goroutine; the part is flushed to disk by a separate flusher, so the earlier `findPartDirs > 0` check passes. - Meanwhile the `Insert` may still be **blocked** on `<-introduction.persisted`, because the persister has not persisted it yet. - The line-362 poll therefore keeps reading `0/1` on disk until it times out. So the flake is the **bluge persister/introducer goroutine being starved** under the heavy parallel `-race` CI run, which keeps the synchronous `Insert` (and the on-disk count) blocked for the full 60s. With `nap=0` this step is normally sub-millisecond, which is why it only fails occasionally. A self-aggravating factor: the `Eventually` opens a brand-new bluge reader every 100ms (600 times), competing for CPU/IO with the very persister goroutine it is waiting on. ### The production gap this uncovered Because `Close()` discards unpersisted in-memory documents, a database that is **gracefully shut down** while the async persister still has a backlog (in production it runs with unsafe batches and a 5s nap) can leave the on-disk series index **incomplete**. The dump / migration / offline-CLI tools read a quiesced, on-disk database, so they could miss the most recently written series. This is a correctness gap independent of the test flake. ## Changes ### B — durable series index on graceful shutdown - `pkg/index/index.go` — add `Flush() error` to the `Store` interface. - `pkg/index/inverted/inverted.go` — implement `store.Flush()`: submit an empty batch carrying a persisted callback and block until it fires. Because the introducer always advances the epoch, the callback fires once the root snapshot (covering all earlier in-memory segments) is durably on disk. - `banyand/internal/storage/index.go` — add `seriesIndex.Flush()`. - `banyand/internal/storage/segment.go` — in `segmentController.close()` (the graceful-shutdown path), `Flush()` each live segment's index **before** closing it. Segments flagged for deletion are skipped (their directory is removed anyway). This runs **only on shutdown** — not on the idle-reclaim or delete paths, which would otherwise force a synchronous persist on a hot path. ### A — make the test resilient - `banyand/internal/dump/measure/suite_test.go` — poll at a coarse `1s` interval (was `100ms`) and raise the timeout to `90s` (was `60s`). The coarse interval stops the read-only-reader loop from starving the persister it is waiting on. > ⚠️ Implementation note: `Flush()` was first placed in `seriesIndex.Close()`, > which is also called from idle-reclaim and pre-delete paths. Forcing a > synchronous persist on every segment close blew the `storage` suite up to a > 400s timeout. Narrowing it to the shutdown path only brought it back to ~35s. ## Testing All run locally with `-race`: - `go build ./...`, `go vet` — pass - `golangci-lint` (project-pinned v1.64.8) — clean - `banyand/internal/storage` — pass, **35.1s** (was a 400s timeout with the overly-broad flush placement) - `banyand/internal/dump/measure` — pass, 9.9s - `pkg/index/inverted` — pass - `TestMeasureIndexedTagResolvedFromIndex` ×3 — pass - [ ] If this pull request closes/resolves/fixes an existing issue, replace the issue number. Fixes apache/skywalking#<issue number>. - [ ] Update the [`CHANGES` log](https://github.com/apache/skywalking-banyandb/blob/main/CHANGES.md). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
