yashmayya opened a new pull request, #18526: URL: https://github.com/apache/pinot/pull/18526
## Background The Multi-Stage Engine intermediate-stage code in `pinot-query-runtime` emits engine metrics by directly calling `ServerMetrics.get()`. `ServerMetrics` is registered only in `BaseServerStarter`. Two concrete bugs follow from this coupling: 1. **`OpChainSchedulerService.Metrics` NOOP-binding** (`pinot-query-runtime/.../executor/OpChainSchedulerService.java`). The inner class field-initializes five `PinotMeter` handles via `ServerMeter.X.getGlobalMeter()`, which resolves against whichever `ServerMetrics` is registered at that moment. If construction happens before `ServerMetrics.register(...)` runs — which happens in unit tests, embedded query-runtime harnesses, and any non-server context that constructs the scheduler directly — the captured handles are permanently bound to the NOOP registry. The op-chain started/completed counters and the per-op-chain CPU/memory/row counters then silently emit zero for the lifetime of the JVM. No later \`ServerMetrics.register()\` call can fix the captured references. 2. **`MultiStageBrokerRequestHandler` NOOP-binding** (`pinot-broker/.../requesthandler/MultiStageBrokerRequestHandler.java`). Same field-init-time anti-pattern for four `BrokerMeter.MSE_STAGES_*` and `BrokerMeter.MSE_OPCHAINS_*` handles, resolved before `BrokerMetrics.register(...)` is guaranteed to have run. Beyond the bug fixes, the static-singleton coupling makes the MSE engine layer hard to test in isolation: every test that wants to assert metric emission must stand up a full `ServerMetrics` singleton and remember to deregister it. There is no way to install a recording fixture without touching global state. ## Change This PR introduces `MseMetricsEmitter`, a thin pluggable interface in `pinot-common`, that MSE engine code calls instead of `ServerMetrics.get()` directly. The interface is keyed on the existing `ServerMeter` / `ServerTimer` enum constants — no new metric names, no new enums, no new Prometheus namespaces. - `MseMetricsEmitter` exposes three methods: `addMeteredGlobalValue(ServerMeter, long)`, `getMeteredValue(ServerMeter)` (for `MetricsExecutor` handles), `addTimedValue(ServerTimer, long, TimeUnit)`. - `ServerMetricsEmitter` is the default implementation. It forwards every call to `ServerMetrics.get()` at *call time* (not at construction). This eliminates the NOOP-binding hazard. - `BaseServerStarter` registers `ServerMetricsEmitter` immediately after `ServerMetrics.register(_serverMetrics)`. A second `register(...)` call (e.g. an integration test that restarts the server in the same JVM) silently no-ops with an INFO log, matching the `ServerMetrics.register` pattern. - MSE emission sites — `QueryServer.submit()`, `QueryServer` constructor, `QueryRunner.init()`, `OpChainSchedulerService.Metrics`, `MultiStageOperator.Type.updateServerMetrics`, `MailboxSendOperator.updateMetrics` — switch from `ServerMetrics.get()` to `MseMetricsEmitter.get()`. - `OpChainSchedulerService.Metrics` is rewritten to drop the field-cached `PinotMeter` handles entirely; each emission resolves the emitter at call time. Op-chain lifecycle events are coarse (once per stage worker, not per row), so the additional indirection is unmeasurable. - `MultiStageBrokerRequestHandler` is fixed independently of the emitter — the four field-cached handles are replaced with `volatile` fields populated lazily via `BrokerMetrics.get().addMeteredGlobalValue(meter, count, reusedMeter)`. These are broker dispatch counters that stay on `BrokerMetrics`; they do not go through `MseMetricsEmitter`. ## What does NOT change - No metric names change. \`pinot.server.mse*\` Prometheus output is identical. - No JMX exporter rules change. - No new enum entries on `ServerMeter`, `BrokerMeter`, `ServerTimer`, `BrokerTimer`, `ServerGauge`, `BrokerGauge`. - No new Prometheus namespace. - No new feature flag. - `GrpcMailboxServer` is untouched — its existing `InstanceType.BROKER`/`SERVER` branch correctly routes the mailbox memory gauges through `BrokerMetrics` / `ServerMetrics` respectively. ## Compatibility - Source and binary compatible — the only public class signatures touched are on `MultiStageOperator.Type` (a public enum nested inside a `pinot-query-runtime` class; the `updateServerMetrics` method signature changes from `(StatMap, ServerMetrics)` to `(StatMap, MseMetricsEmitter)`). This is an internal execution-layer API; no external caller in the OSS tree exercises it. - Rolling-upgrade safe — registration happens before any MSE code runs; no wire-protocol or schema changes. ## Tests - `MseMetricsEmitterTest` — singleton lifecycle (NOOP fallback, first-write-wins, deregister). - `ServerMetricsEmitterTest` — default impl correctly forwards meter, timer, and `MetricsExecutor` handle lookups to `ServerMetrics`. Includes \`testLateServerMetricsRegistrationStillEmitsCorrectly\` which proves the NOOP-binding hazard is gone (construct emitter → emit → register fresh `ServerMetrics` → re-emit → assert the new registry sees the increment). - `OpChainSchedulerServiceMetricsTest` — regression test for the field-cached NOOP-binding bug. Constructs the scheduler before any metrics registration; registers; drives an op-chain to completion; asserts the live registry sees `MSE_OPCHAINS_STARTED`. - `QueryServerTest` — existing tests pass unchanged, exercising the full `QueryServer.submit()` → `MseMetricsEmitter` → `ServerMetrics` path. Test setup registers `ServerMetricsEmitter` alongside the existing `ServerMetrics` setup. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
