hanahmily opened a new pull request, #1169: URL: https://github.com/apache/skywalking-banyandb/pull/1169
## Summary

- **Dual catalogs on both sides**: adds `total_batch_*` (one per batch-mode wire stream) and `total_message_*` (one per dispatched message) to both publisher and subscriber, making the two sides directly comparable.
- **New `meter.BatchBuckets`**: extends histogram buckets to 300 s (`[0.1, 0.5, 1, 2.5, 5, 10, 30, 60, 120, 300]`). The existing `DefBuckets` tops out at 10 s, which collapses all migration/relay observations into `+Inf`.
- **Dashboard fix**: flows-table panel (Sub column) swapped from `queue_sub_total_started` (per-batch) to `queue_sub_total_message_finished` (per-message), eliminating the ~950:1 Pub/Sub imbalance visible in batch mode.
- **Metrics reference updated**: new rows for batch and message catalogs in `docs/operation/observability/metrics.md`, plus corrected instrument count for the lifecycle-migration mirror.

## Root cause

In batch-write mode, the publisher counted one tick per message while the subscriber counted one tick per batch. A single liaison→liaison relay stream carries ~950 messages per batch, so the flows table showed `950 msg/s` on the pub side and `1 batch/s` on the sub side: visually a ~950:1 ratio with no indication of why.
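
The mismatch described above can be reproduced with a small simulation. This is only an illustrative sketch: the `counters` type and `relay` function are hypothetical stand-ins, not the real `pubMetrics` or sub `metrics` instruments, and the 950-message batch size is the approximate figure quoted in this PR.

```go
package main

import "fmt"

// counters is a hypothetical stand-in for the two catalogs this PR adds:
// a per-batch count and a per-message count.
type counters struct {
	batches  int // one tick per batch-mode wire stream
	messages int // one tick per dispatched message
}

// relay simulates one side of the queue handling `streams` batch streams
// that each carry `perBatch` messages, ticking both catalogs.
func relay(streams, perBatch int) counters {
	var c counters
	for s := 0; s < streams; s++ {
		c.batches++
		for m := 0; m < perBatch; m++ {
			c.messages++
		}
	}
	return c
}

func main() {
	pub := relay(1, 950)
	sub := relay(1, 950)

	// Old panel: publisher per-message count against subscriber per-batch count.
	fmt.Printf("old pub:sub = %d:%d\n", pub.messages, sub.batches)
	// New panel: per-message count on both sides.
	fmt.Printf("new pub:sub = %d:%d\n", pub.messages, sub.messages)
}
```

Comparing like with like (message to message, or batch to batch) removes the apparent imbalance without changing what either side actually does.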
## Changes

| File | What changed |
|---|---|
| `pkg/meter/meter.go` | Add `BatchBuckets` |
| `banyand/queue/pub/pub.go` | 3 new instruments on `pubMetrics` |
| `banyand/queue/pub/migration_metrics.go` | Mirror: same 3 instruments on `pubMigrationMetrics` |
| `banyand/queue/pub/batch.go` | Capture `batchStart`, wire instruments into `listenBatchResponse` |
| `banyand/queue/sub/server.go` | 5 new instruments on sub `metrics` |
| `banyand/queue/sub/helpers.go` | Tick batch + message catalogs in `handleBatch` / `handleEOF` |
| `banyand/queue/sub/sub.go` | Tick message catalog in non-batch `dispatchMessage` |
| `docs/operation/grafana-fodc-nodes.json` | Swap flows-table Sub column to `total_message_finished` |
| `docs/operation/observability/metrics.md` | New catalog reference rows + corrected counts |

## Test plan

- [ ] `go build ./...`: BUILD_OK
- [ ] `go vet ./banyand/queue/... ./pkg/meter/...`: VET_OK
- [ ] `golangci-lint run ./banyand/queue/... ./pkg/meter/...`: LINT_OK
- [ ] `go test ./banyand/queue/... ./pkg/meter/... -count=1`: all pass (pub 73.9 s, sub 0.11 s, integration 7.8 s)
- [ ] `go test ./test/integration/distributed/lifecycle/`: ok 51.3 s
- [ ] Panel-61 PromQL (8 targets A–H) validate clean with Prometheus parser
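
To illustrate why `BatchBuckets` is needed, the sketch below maps a duration to the first histogram bucket whose upper bound covers it. The `batchBuckets` values are copied from this PR. The `defBuckets` values are an assumption (they mirror the Prometheus client defaults); the PR itself only states that `DefBuckets` tops out at 10 s. `bucketFor` is a hypothetical helper, not part of `pkg/meter`.

```go
package main

import (
	"fmt"
	"sort"
)

// batchBuckets copies the upper bounds listed for meter.BatchBuckets in this PR.
var batchBuckets = []float64{0.1, 0.5, 1, 2.5, 5, 10, 30, 60, 120, 300}

// defBuckets is an ASSUMPTION: these mirror the Prometheus client defaults.
// Only the 10 s top bound is confirmed by the PR text.
var defBuckets = []float64{0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10}

// bucketFor returns the label of the first bucket whose upper bound is
// greater than or equal to v, or "+Inf" when v exceeds every bound.
// bounds must be sorted in ascending order.
func bucketFor(bounds []float64, v float64) string {
	i := sort.SearchFloat64s(bounds, v)
	if i == len(bounds) {
		return "+Inf"
	}
	return fmt.Sprintf("le=%g", bounds[i])
}

func main() {
	// Hypothetical batch durations in seconds.
	for _, d := range []float64{0.3, 45, 240} {
		fmt.Printf("%6.1fs  def: %-8s batch: %s\n",
			d, bucketFor(defBuckets, d), bucketFor(batchBuckets, d))
	}
}
```

Under the assumed default bounds, any observation longer than 10 s lands in `+Inf`, so a 45 s batch and a 240 s batch are indistinguishable; with `batchBuckets` they fall into separate buckets.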
