arunkumarucet opened a new pull request, #18823:
URL: https://github.com/apache/pinot/pull/18823
## Summary
- Adds a new `TABLE_TENANT_INFO` controller gauge emitted by
`SegmentStatusChecker` that encodes the server tenant name as a key segment in
the JMX metric name:
`pinot.controller.tableTenantInfo.<tableNameWithType>.<serverTenant> = 1`
- Adds a dedicated JMX exporter rule in `controller.yml` that extracts
`table`, `tableType`, `tenant`, and `database` as Prometheus labels from this
metric
- Enables tenant-scoped aggregation of any existing table-level metric in
Prometheus via a `group_left(tenant)` join — no changes to broker/server metric
pipelines required
## Motivation
Previously there was no way to aggregate table-scoped metrics (e.g.
`numDocsScanned`, segment counts) by tenant in Prometheus/Grafana without
scattered, disruptive changes to add a `tenant` tag throughout the metrics
pipeline. This approach exposes the table→tenant mapping as a standalone info
metric that Prometheus can join against:
```promql
sum by (tenant) (
    sum by (table) (pinot_server_numDocsScanned_OneMinuteRate{...})
  * on(table) group_left(tenant)
    pinot_controller_tableTenantInfo_Value
)
```
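To make the join semantics concrete, here is a minimal Java sketch that mimics the query with two in-memory maps. The class name, method name, and sample numbers are hypothetical and exist only for illustration; nothing here is part of the PR.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: mimics "rate * on(table) group_left(tenant) info" followed by sum by (tenant). */
final class TenantJoinSketch {

  /**
   * @param ratePerTable   per-table metric value (the left-hand side of the join)
   * @param tenantPerTable table -> tenant mapping (the info metric, whose value is always 1)
   * @return the metric summed per tenant
   */
  static Map<String, Double> sumByTenant(Map<String, Double> ratePerTable,
                                         Map<String, String> tenantPerTable) {
    Map<String, Double> perTenant = new TreeMap<>();
    for (Map.Entry<String, Double> entry : ratePerTable.entrySet()) {
      String tenant = tenantPerTable.get(entry.getKey());
      if (tenant == null) {
        // No matching info series: the sample drops out, as in PromQL vector matching.
        continue;
      }
      // Multiplying by the info metric's value (1) leaves the rate unchanged
      // and only attaches the tenant label.
      perTenant.merge(tenant, entry.getValue() * 1.0, Double::sum);
    }
    return perTenant;
  }

  public static void main(String[] args) {
    Map<String, Double> rates = Map.of("airlineStats", 10.0, "baseballStats", 5.0);
    Map<String, String> tenants =
        Map.of("airlineStats", "DefaultTenant", "baseballStats", "DefaultTenant");
    System.out.println(sumByTenant(rates, tenants));
  }
}
```

Because the info metric is always 1, the multiplication changes no values; its only effect is to copy the `tenant` label onto each per-table sample before the outer aggregation.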
## Implementation
**Emission strategy:**
- The gauge is written only once per `(table, tenant)` pair — on first
registration or when the tenant changes. It is **not** re-emitted on every
5-minute `SegmentStatusChecker` cycle (early-return when tenant is unchanged).
- `_tableTenantMap` tracks the current tenant per table so stale gauges are
removed on: tenant change, null table config, and table removal
(`nonLeaderCleanup`).
- The new gauge is registered **before** removing the old tenant's gauge on
a tenant change, to avoid a scrape-window gap.
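
The bookkeeping described in the bullets above can be sketched roughly as follows. This is a simplified stand-in under stated assumptions, not the code in this PR: the class `TenantInfoGaugeTracker`, its methods, and the plain `HashMap` used in place of the controller metrics registry are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the tenant-info gauge bookkeeping; not Pinot's actual API. */
final class TenantInfoGaugeTracker {
  private static final String DEFAULT_TENANT = "DefaultTenant";

  // Stand-in for the metrics registry: gauge name -> value.
  private final Map<String, Long> gauges = new HashMap<>();
  // Plays the role of _tableTenantMap: table -> tenant currently exported.
  private final Map<String, String> tableTenantMap = new HashMap<>();

  static String gaugeName(String tableNameWithType, String tenant) {
    return "tableTenantInfo." + tableNameWithType + "." + tenant;
  }

  /** Called once per checker cycle; returns true only if a gauge was written. */
  boolean update(String tableNameWithType, String configuredTenant) {
    String tenant = configuredTenant == null ? DEFAULT_TENANT : configuredTenant;
    String previous = tableTenantMap.get(tableNameWithType);
    if (tenant.equals(previous)) {
      return false; // early return: tenant unchanged, nothing to re-emit
    }
    // Register the new gauge first so a scrape never sees the table without a tenant.
    gauges.put(gaugeName(tableNameWithType, tenant), 1L);
    if (previous != null) {
      gauges.remove(gaugeName(tableNameWithType, previous));
    }
    tableTenantMap.put(tableNameWithType, tenant);
    return true;
  }

  /** Cleanup path for a missing table config or a removed table. */
  void remove(String tableNameWithType) {
    String previous = tableTenantMap.remove(tableNameWithType);
    if (previous != null) {
      gauges.remove(gaugeName(tableNameWithType, previous));
    }
  }

  Set<String> registeredGauges() {
    return new TreeSet<>(gauges.keySet());
  }

  public static void main(String[] args) {
    TenantInfoGaugeTracker tracker = new TenantInfoGaugeTracker();
    tracker.update("airlineStats_OFFLINE", null);      // falls back to DefaultTenant
    tracker.update("airlineStats_OFFLINE", "TenantA"); // tenant change swaps the gauge
    System.out.println(tracker.registeredGauges());
  }
}
```

The ordering inside `update` is the point of the sketch: for a brief moment both gauges exist, which is preferable to a window in which neither does.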
**JMX metric name:**
```
"org.apache.pinot.common.metrics":type="ControllerMetrics",
name="pinot.controller.tableTenantInfo.<tableNameWithType>.<serverTenant>"
```
**Prometheus output (via JMX exporter):**
```
pinot_controller_tableTenantInfo_Value{table="airlineStats",
tableType="OFFLINE", tenant="DefaultTenant"} 1
```
## Test plan
- [ ] `SegmentStatusCheckerTest#tableTenantInfoGaugeNamedTenantTest`: named
server tenant is registered
- [ ] `SegmentStatusCheckerTest#tableTenantInfoGaugeDefaultTenantFallbackTest`:
falls back to `DefaultTenant` when no tenant configured
- [ ] `SegmentStatusCheckerTest#tableTenantInfoGaugeTenantChangeCleansStaleGaugeTest`:
stale gauge removed when tenant changes
- [ ] `SegmentStatusCheckerTest#tableTenantInfoGaugeTableRemovedCleansUpTest`:
gauge cleaned up via `nonLeaderCleanup`
- [ ] `SegmentStatusCheckerTest#tableTenantInfoGaugeRealtimeTableTest`:
REALTIME table type covered
- [ ] Verified locally via batch quickstart: 10 MBeans registered, all
value=1, JMX exporter regex validated against no-database, with-database, and
REALTIME patterns
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.