hanahmily opened a new pull request, #1167:
URL: https://github.com/apache/skywalking-banyandb/pull/1167
## Summary
Two related improvements to the lifecycle service. The first closes a
long-standing "empty sender" gap in the receiver's
`banyandb_queue_sub_total_finished` metrics. The second adds a pair of
last-run gauges that give dashboards an at-a-glance health signal.
## Changes
### Sender identity stamping (the main fix)
The lifecycle's tier-migration publisher was building its queue client via
`pub.NewWithoutMetadata(omr)` and never called `SetSelfNode`. As a result the
wire `SendRequest` carried empty `SenderNode` / `SenderRole` / `SenderTier`,
and the receiving data node recorded:
```
banyandb_queue_sub_total_finished{
group="...", operation="file-sync",
remote_node="", remote_role="lifecycle", remote_tier="", ...
}
```
The liaison already stamps its own identity (see `pkg/cmdsetup/liaison.go:170-171`):
```go
setter.SetSelfNode(node.NodeID, "liaison", liaisonTier)
```
This PR mirrors that pattern in `parseGroup` by deriving the lifecycle's
self identity from already-known inputs (no new CLI flags):
1. `coLocatedDataNodeAddr` (`--grpc-addr`): the gRPC address of the
co-located data node. Match by `GrpcAddress` in the data-node
registry — the production sidecar path.
2. Fall back to label-superset match against `--node-labels`.
3. Fall back to `Labels["type"]`-only match.
4. Empty fallthrough preserves the pre-fix no-op behavior.
The result is `client.SetSelfNode(senderNode, "lifecycle", senderTier)`,
where `senderNode` is the matched data node's `Metadata.Name` and
`senderTier` is its `Labels["type"]`.
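
The fallback chain above can be sketched as follows. This is a minimal illustration, not the code in `parseGroup`: the `dataNode` struct, `isSuperset`, and `resolveSelfIdentity` are hypothetical names, and the sketch assumes that "label-superset match" means the registered node's labels contain every pair given via `--node-labels`.

```go
package main

import "fmt"

// dataNode is a hypothetical, trimmed-down view of a data-node registry entry.
type dataNode struct {
	Name        string // stands in for Metadata.Name
	GrpcAddress string
	Labels      map[string]string
}

// isSuperset reports whether have contains every key/value pair in want.
func isSuperset(have, want map[string]string) bool {
	for k, v := range want {
		if got, ok := have[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// resolveSelfIdentity walks the four-step chain and returns
// (senderNode, senderTier). Two empty strings mean "no match".
func resolveSelfIdentity(nodes []dataNode, grpcAddr string, nodeLabels map[string]string) (string, string) {
	// 1. Exact gRPC address match (the sidecar path).
	if grpcAddr != "" {
		for _, n := range nodes {
			if n.GrpcAddress == grpcAddr {
				return n.Name, n.Labels["type"]
			}
		}
	}
	// 2. Assumed semantics: the node's labels are a superset of nodeLabels.
	if len(nodeLabels) > 0 {
		for _, n := range nodes {
			if isSuperset(n.Labels, nodeLabels) {
				return n.Name, n.Labels["type"]
			}
		}
	}
	// 3. Match on Labels["type"] only.
	if tier := nodeLabels["type"]; tier != "" {
		for _, n := range nodes {
			if n.Labels["type"] == tier {
				return n.Name, tier
			}
		}
	}
	// 4. Empty fallthrough: the caller skips stamping, as before the fix.
	return "", ""
}

func main() {
	// Illustrative registry contents; names and addresses are made up.
	registry := []dataNode{
		{Name: "hot-0", GrpcAddress: "10.0.0.1:17912", Labels: map[string]string{"type": "hot", "zone": "a"}},
		{Name: "warm-0", GrpcAddress: "10.0.0.2:17912", Labels: map[string]string{"type": "warm", "zone": "a"}},
	}
	node, tier := resolveSelfIdentity(registry, "10.0.0.1:17912", nil)
	fmt.Printf("sender node=%q tier=%q\n", node, tier)
}
```

In this sketch the caller would only invoke `SetSelfNode` when the returned node is non-empty, which keeps step 4 a no-op.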
### Last-run gauges
Two new `banyandb_lifecycle_last_run_*` gauges give dashboards an
"is the lifecycle healthy" signal:
* `banyandb_lifecycle_last_run_timestamp_seconds` — wall-clock epoch
(in seconds) of the most recent `action()` call.
* `banyandb_lifecycle_last_run_success` — `1` on a nil error, `0`
otherwise.
Both are stamped from a `defer` that runs when `action()` returns, so every
return path (success, error, recovered panic) updates the two gauges together.
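
A rough sketch of the deferred stamping pattern, under stated assumptions: the `gauge` interface, `memGauge`, the `lifecycle` struct, and the `run` callback are hypothetical stand-ins, and only the name `recordLastRun` is borrowed from the `TestRecordLastRun*` tests mentioned under Verification. The real `action()` signature may differ.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// gauge is a hypothetical minimal gauge interface.
type gauge interface{ Set(v float64) }

// memGauge is an in-memory gauge used only for this illustration.
type memGauge struct{ v float64 }

func (g *memGauge) Set(v float64) { g.v = v }

// errBoom is a sample failure used by the example below.
var errBoom = errors.New("boom")

type lifecycle struct {
	lastRunTimestamp gauge
	lastRunSuccess   gauge
}

// recordLastRun stamps both gauges; it does nothing if either is unset.
func (l *lifecycle) recordLastRun(err error) {
	if l.lastRunTimestamp == nil || l.lastRunSuccess == nil {
		return
	}
	l.lastRunTimestamp.Set(float64(time.Now().Unix()))
	if err == nil {
		l.lastRunSuccess.Set(1)
		return
	}
	l.lastRunSuccess.Set(0)
}

// action runs one cycle. The named return value lets the deferred
// function observe the final error, including one built from a panic.
func (l *lifecycle) action(run func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered panic: %v", r)
		}
		l.recordLastRun(err)
	}()
	return run()
}

func main() {
	ts, success := &memGauge{}, &memGauge{}
	l := &lifecycle{lastRunTimestamp: ts, lastRunSuccess: success}

	_ = l.action(func() error { return nil })
	fmt.Println("after success:", success.v)

	_ = l.action(func() error { return errBoom })
	fmt.Println("after failure:", success.v)
}
```

Registering the `defer` before any work starts is what lets a single code path cover early returns and panics alike.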
### Test infrastructure
* `pkg/test/setup/setup.go`: `DataNode*` helpers allocate an HTTP port
and pass `--http-port` so tests can scrape the data node's
`/metrics`.
* `banyand/queue/sub/server.go`: mounts `/metrics` on the data node's
HTTP router when the metrics registry exposes a Prometheus handler
(parity with the lifecycle's `buildHTTPRouter`).
* `LifecycleSharedContext` gains `DataHTTPURL` and `WarmHTTPURL`.
* Integration test (`test/cases/lifecycle/lifecycle.go`) scrapes the
data node's `/metrics` and asserts that at least one
`banyandb_queue_sub_total_finished` series carries
`remote_node="<ip:port>"` + `remote_role="lifecycle"` +
`remote_tier="hot"` — direct end-to-end evidence the wire messages
round-tripped with sender identity.
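
The scrape assertion can be approximated with a small text parser. This sketch is not the integration test itself: `parseLabels`, `seriesLabels`, and `hasLifecycleSender` are hypothetical helpers, the HTTP fetch of `/metrics` is omitted, and the sample body only mirrors the label shapes shown in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLabels parses the inside of a {...} label set from the Prometheus
// text format. Escaped characters in values are kept as-is, not unescaped.
func parseLabels(s string) map[string]string {
	labels := map[string]string{}
	for len(s) > 0 {
		eq := strings.Index(s, `="`)
		if eq < 0 {
			break
		}
		key := strings.Trim(s[:eq], ", ")
		rest := s[eq+2:]
		end := -1
		for i := 0; i < len(rest); i++ {
			if rest[i] == '\\' {
				i++ // skip the escaped character
				continue
			}
			if rest[i] == '"' {
				end = i
				break
			}
		}
		if end < 0 {
			break
		}
		labels[key] = rest[:end]
		s = rest[end+1:]
	}
	return labels
}

// seriesLabels returns the label set of every sample line for metric.
func seriesLabels(body, metric string) []map[string]string {
	var out []map[string]string
	for _, line := range strings.Split(body, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, metric+"{") {
			continue // also skips "# HELP" and "# TYPE" lines
		}
		closeIdx := strings.LastIndex(line, "}")
		if closeIdx < 0 {
			continue
		}
		out = append(out, parseLabels(line[len(metric)+1:closeIdx]))
	}
	return out
}

// hasLifecycleSender reports whether at least one series carries a
// non-empty remote_node together with the expected role and tier.
func hasLifecycleSender(series []map[string]string, tier string) bool {
	for _, l := range series {
		if l["remote_node"] != "" && l["remote_role"] == "lifecycle" && l["remote_tier"] == tier {
			return true
		}
	}
	return false
}

func main() {
	// Sample exposition text; in a test this would be the /metrics body.
	body := `# TYPE banyandb_queue_sub_total_finished counter
banyandb_queue_sub_total_finished{group="sw_cross_segment",operation="file-sync",remote_node="127.0.0.1:39739",remote_role="lifecycle",remote_tier="hot"} 2
`
	series := seriesLabels(body, "banyandb_queue_sub_total_finished")
	fmt.Println("sender identity present:", hasLifecycleSender(series, "hot"))
}
```

Checking for a non-empty `remote_node` instead of a fixed address keeps the assertion stable when test ports are allocated dynamically.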
## Verification
* `make license-check` ✅
* `make check-req` ✅
* `make build` ✅
* `make lint` ✅ (after two fixup commits)
* `make check` ✅
* `TestPropertyLifecycle` (distributed) — 5/5 specs, 51s ✅
- Real scrape (warm data node, sw_cross_segment group):
```
banyandb_queue_sub_total_finished{
group="sw_cross_segment", operation="file-sync",
remote_node="127.0.0.1:39739", remote_role="lifecycle",
remote_tier="hot"
} 2
```
* Unit tests — 10/10 in `banyand/backup/lifecycle/` (3 new
`TestRecordLastRun*` tests guard the success/failure/nil-gauge paths).
## Commits
* `6d310e84` feat(lifecycle): stamp sender identity on migration publisher
* `acc28a16` chore: fix lint issues
* `f5a48578` feat(lifecycle): add last_run_timestamp_seconds and
last_run_success gauges
* `ec4d3b94` docs: add CHANGES.md entries for lifecycle sender identity and
last-run gauges