Hi,

We've sketched out an initial high-level design for a new monitoring framework for Helix. The primary goal is to reduce our time to detect soft failures, but more generally this is a design for propagating statistics through any Helix-managed system. Any feedback is appreciated.

The document is available on the wiki: https://cwiki.apache.org/confluence/display/HELIX/Helix+Monitoring+Design

Thanks,
Kanak
