swaminathanmanish opened a new pull request, #18517: URL: https://github.com/apache/pinot/pull/18517
## Summary

`MinionSubTaskHighWaitTime` alert does not self-resolve after the minion queue drains because it is based on `ControllerTimer.SUBTASK_WAITING_TIME`, a histogram. Histogram `_Max` retains its peak value across emission cycles and does not decay when there are no longer any waiting subtasks, causing the alert to stay firing indefinitely.

- Add a new `MAX_SUBTASK_WAIT_TIME_MS` gauge in `ControllerGauge` (per-table, non-global)
- In `TaskMetricsEmitter`, replace the timer emission with per-`(table, taskType)` gauge emission: the max wait time across all waiting subtasks, or `0` when none are waiting
- Clean up the gauge in `removeTableTaskTypeMetrics` when a task type/table is retired
- Update `TaskMetricsEmitterTest` to reflect the new metric and assert correct gauge values

The gauge is written every emit cycle and self-resolves when the queue clears.

Alert rule update (separate config repo):

```
expr: max(pinot_controller_MaxSubtaskWaitTimeMs) by (exported_table, taskType) > 14400000
```

## Test plan

- [ ] `TaskMetricsEmitterTest` updated with assertions for `maxSubtaskWaitTimeMs` gauge values per table (3000ms for waiting table, 0ms for non-waiting table)
- [ ] Run `./mvnw -pl pinot-controller -am -Dtest=TaskMetricsEmitterTest test`
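For reviewers, the gauge semantics described in the Summary (max wait per `(table, taskType)`, reset to `0` when nothing is waiting) can be sketched roughly as below. This is a standalone illustration, not the code in this PR: `MaxWaitGaugeSketch`, `WaitingSubtask`, `computeMaxWaitMs`, and the `table.taskType` key format are all hypothetical names, and no Pinot APIs are used.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and key format are assumptions, not Pinot code.
final class MaxWaitGaugeSketch {

  /** One subtask that is still waiting to be picked up (illustrative type). */
  static final class WaitingSubtask {
    final String table;
    final String taskType;
    final long enqueuedAtMs;

    WaitingSubtask(String table, String taskType, long enqueuedAtMs) {
      this.table = table;
      this.taskType = taskType;
      this.enqueuedAtMs = enqueuedAtMs;
    }
  }

  /**
   * Computes the value to write for every known (table, taskType) key in one
   * emit cycle: the max wait among waiting subtasks, or 0 if none are waiting.
   * Every known key is always written, which is what lets an alert clear.
   */
  static Map<String, Long> computeMaxWaitMs(
      List<String> knownKeys, List<WaitingSubtask> waiting, long nowMs) {
    Map<String, Long> gauges = new HashMap<>();
    for (String key : knownKeys) {
      gauges.put(key, 0L); // default: nothing waiting
    }
    for (WaitingSubtask s : waiting) {
      String key = s.table + "." + s.taskType;
      long waitMs = Math.max(0L, nowMs - s.enqueuedAtMs);
      gauges.merge(key, waitMs, Math::max);
    }
    return gauges;
  }

  public static void main(String[] args) {
    List<String> keys = List.of("tableA.MergeRollupTask", "tableB.MergeRollupTask");

    // Cycle 1: one subtask for tableA has been waiting 3000 ms.
    List<WaitingSubtask> waiting = new ArrayList<>();
    waiting.add(new WaitingSubtask("tableA", "MergeRollupTask", 7_000L));
    Map<String, Long> cycle1 = computeMaxWaitMs(keys, waiting, 10_000L);
    System.out.println(cycle1.get("tableA.MergeRollupTask")); // 3000
    System.out.println(cycle1.get("tableB.MergeRollupTask")); // 0

    // Cycle 2: the queue has drained, so the gauge drops back to 0.
    Map<String, Long> cycle2 = computeMaxWaitMs(keys, List.of(), 20_000L);
    System.out.println(cycle2.get("tableA.MergeRollupTask")); // 0
  }
}
```

The point of the sketch is the contrast with a histogram max: because the value is recomputed and overwritten on every cycle, a drained queue yields `0` and a threshold alert on the gauge can resolve.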
