rhodo opened a new pull request, #15722:
URL: https://github.com/apache/pinot/pull/15722
During large table rebalances, a massive number of state transitions may be
triggered. If a server cannot keep up, the size of its Helix message queue can
grow significantly. This PR adds visibility into the server-side Helix message
queue size.
Some rationale:
- This PR delegates responsibility to each server instance to monitor and
log its own message queue size metrics, instead of relying on the controller.
- It decouples the getHelixServerMessageCount() method from the metrics
  scraping thread. This ensures that:
  - The frequency of metrics scraping does not introduce additional I/O
    pressure on ZooKeeper.
  - ZooKeeper I/O latency does not interfere with the metrics scraping process.
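A minimal sketch of this decoupling follows. It assumes a `LongSupplier` stands in for whatever actually reads the instance's Helix message count (such as the PR's getHelixServerMessageCount()). The class name `MessageQueueSizeMonitor`, its methods, the polling period, and the thread name are hypothetical and are not Pinot's actual implementation. A single background thread polls the count and caches it, so the metrics scraper only reads an in-memory value.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch: polls the Helix message count on a dedicated thread and
 * caches the result, so metric reads never touch ZooKeeper directly.
 */
public class MessageQueueSizeMonitor implements AutoCloseable {
  private final AtomicLong cachedQueueSize = new AtomicLong(0);
  private final LongSupplier messageCountFetcher; // assumed stand-in for the ZooKeeper read
  private final ScheduledExecutorService executor;

  public MessageQueueSizeMonitor(LongSupplier messageCountFetcher) {
    this.messageCountFetcher = messageCountFetcher;
    this.executor = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "helix-message-queue-monitor"); // hypothetical thread name
      t.setDaemon(true);
      return t;
    });
  }

  /** Starts polling; the period is independent of how often metrics are scraped. */
  public void start(long periodMs) {
    executor.scheduleWithFixedDelay(this::refresh, 0, periodMs, TimeUnit.MILLISECONDS);
  }

  /** Runs on the background thread and is the only place that reads the count. */
  void refresh() {
    try {
      cachedQueueSize.set(messageCountFetcher.getAsLong());
    } catch (RuntimeException e) {
      // Keep the last known value. An uncaught exception would suppress all
      // subsequent runs of a task scheduled with scheduleWithFixedDelay.
    }
  }

  /** Called by the metrics scraper; returns the cached value without blocking. */
  public long getQueueSize() {
    return cachedQueueSize.get();
  }

  @Override
  public void close() {
    executor.shutdownNow();
  }
}
```

With this shape, the scraper can call getQueueSize() as often as it likes while ZooKeeper is read only at the monitor's own fixed delay. A slow or failing read delays only the background refresh and leaves the last known value in place.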
## Test
In the quickstart, triggered a segment reload while intentionally blocking the
segment reload handler on the server, and observed the queue size metric go
from 0 to 1. After letting the segment reload go through, the metric returned
to 0.

--
This is an automated message from the Apache Git Service.