Hi,

We're looking at improving health reporting metrics for Helix so that we can hopefully detect failures faster, while keeping reporting lightweight and scalable.

Here's a summary of an exploration I did regarding integrating Helix with Riemann (http://riemann.io), which is a popular monitoring system. It's not done yet, but I'd like to get any feedback and ideas you have.

https://cwiki.apache.org/confluence/display/HELIX/Health+Metrics+Reporting

Thanks,
Kanak
