As one of the authors of Pacemaker in Apache Storm (and of the paper), I am happy to answer any questions about why we built it or how it works. The reality is that Storm was, and by default still is, abusing ZooKeeper by storing a massive amount of metrics in it, instead of the configuration/coordination data it was designed for. Since Storm metrics don't really need strong consistency, or even much in the way of reliability guarantees, we stood up a Netty server in front of a ConcurrentHashMap (quite literally) and then wrote a client that could handle fail-over. It really is meant as a scalability stepping stone until we get to the point where all the metrics go to a TSDB that is actually designed for metrics. But like I said, if you have any questions, I am happy to answer them. Sadly, because of the way IEEE works, neither I nor my employer owns the copyright to that paper any more, so I can't even put up a copy for you to read.
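To make the "server in front of a ConcurrentHashMap plus a fail-over client" idea concrete, here is a minimal in-process sketch. It is not Storm's Pacemaker code: the class and method names (`PacemakerSketch`, `HeartbeatServer`, `FailoverClient`, `setUp`) and the path format are hypothetical, and plain Java objects stand in for the Netty servers so the example runs without a network. Each server is just an unreplicated in-memory map, and the client tries servers in order, moving on when one is unavailable.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only; not the actual Storm Pacemaker implementation.
public class PacemakerSketch {

    /** Stand-in for one server: an in-memory map with no persistence or replication. */
    static final class HeartbeatServer {
        private final ConcurrentHashMap<String, byte[]> store = new ConcurrentHashMap<>();
        private volatile boolean up = true;

        // Simulates the server going down or coming back.
        void setUp(boolean up) { this.up = up; }

        void put(String path, byte[] data) {
            if (!up) throw new IllegalStateException("server down");
            store.put(path, data);
        }

        Optional<byte[]> get(String path) {
            if (!up) throw new IllegalStateException("server down");
            return Optional.ofNullable(store.get(path));
        }
    }

    /** Client that tries each server in order and fails over when one is unavailable. */
    static final class FailoverClient {
        private final List<HeartbeatServer> servers;

        FailoverClient(List<HeartbeatServer> servers) { this.servers = servers; }

        void put(String path, byte[] data) {
            IllegalStateException last = null;
            for (HeartbeatServer s : servers) {
                try {
                    s.put(path, data);
                    return; // first reachable server wins
                } catch (IllegalStateException e) {
                    last = e; // try the next server
                }
            }
            throw new IllegalStateException("all servers unavailable", last);
        }

        Optional<byte[]> get(String path) {
            IllegalStateException last = null;
            for (HeartbeatServer s : servers) {
                try {
                    return s.get(path);
                } catch (IllegalStateException e) {
                    last = e;
                }
            }
            throw new IllegalStateException("all servers unavailable", last);
        }
    }

    public static void main(String[] args) {
        HeartbeatServer primary = new HeartbeatServer();
        HeartbeatServer backup = new HeartbeatServer();
        FailoverClient client = new FailoverClient(List.of(primary, backup));
        String path = "/workerbeats/topo-1/node-a:6700"; // hypothetical key

        client.put(path, "beat-1".getBytes(StandardCharsets.UTF_8));
        primary.setUp(false);
        // The backup never saw beat-1 because nothing is replicated.
        System.out.println(client.get(path).isPresent()); // false
        // The next periodic heartbeat lands on the backup.
        client.put(path, "beat-2".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(client.get(path).get(), StandardCharsets.UTF_8)); // beat-2
    }
}
```

The sketch shows why weak guarantees can be acceptable for this kind of data: after a fail-over the backup briefly has nothing, but if workers rewrite their heartbeats periodically (an assumption of this sketch), the next write repopulates it without any consensus protocol like ZooKeeper's.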
- Bobby

On Thursday, January 26, 2017, 6:44:56 AM CST, ibrahim El-sanosi <ibrahimsaba...@gmail.com> wrote:

Hi folks,

There is a recently published paper, "PaceMaker: When ZooKeeper Arteries Get Clogged in Storm Clusters" [1]. It may be worth reading.

[1] http://ieeexplore.ieee.org/document/7820303/?tp=&arnumber=7820303&contentType=Conference%20Publications&dld=eWFob28uY29t&source=SEARCHALERT

Ibrahim