Okay, thanks Ed, I created the Jira and will look into it soon :)
https://issues.apache.org/jira/browse/ZOOKEEPER-3037

Regards,
Norbert

On Wed, May 9, 2018 at 4:44 PM Edward Ribeiro <[email protected]> wrote:

> +1. Sounds like a really nice-to-have feature. Let's open a ticket and
> open a PR. :)
>
> Ed
>
> On Wed, May 9, 2018 at 11:15, Norbert Kalmar <[email protected]> wrote:
>
> > Hi,
> >
> > I just got a tip that we could improve the logging in ZooKeeper. After a
> > ZK crash or a client timeout, it is sometimes hard to determine from the
> > logs what happened. Knowing whether ZK was responsive at the time would
> > help a lot. For example, ZK might spend a lot of time waiting on GC
> > (there is still some misconception that ZK is a storage system).
> >
> > To help detect this, Hadoop already has a great tool called JVM Pause
> > Monitor. (As the name suggests, it can also be used for monitoring, but
> > it also helps post-mortem in a lot of cases.) Basically it has a daemon
> > that sleeps for one second, and if the sleep time exceeds the 1s by more
> > than the threshold (1s: INFO, 10s: WARN by default - this can be
> > configurable in our case, see below), it will alert/make a log entry.
> > It can also monitor the time GC took.
> >
> > Now, this class is in hadoop-common. I wouldn't want to depend on
> > hadoop-common because of this one feature (it is actually a single
> > class). Since this is a straightforward implementation, and the few
> > commits it had in the past five years were nothing serious, I think we
> > could just copy this class into ZooKeeper and introduce it as a
> > configurable feature that is off by default.
> >
> > The class:
> >
> > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java
> >
> > What do you think?
> >
> > Regards,
> > Norbert
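
For reference, the sleep-and-compare mechanism described in the quoted message could be sketched roughly as below. This is only an illustration under stated assumptions, not Hadoop's JvmPauseMonitor and not the eventual ZooKeeper patch: the class name PauseMonitorSketch, its methods, and the use of java.util.logging are all hypothetical, and the GC-time reporting mentioned above is left out.

```java
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical sketch of a JVM pause detector. Names and structure are
 * assumptions made for illustration; this is not Hadoop or ZooKeeper code.
 */
final class PauseMonitorSketch implements Runnable {
    private static final Logger LOG = Logger.getLogger("PauseMonitorSketch");

    private final long sleepMs;         // intended length of each probe sleep
    private final long infoThresholdMs; // extra delay above which an INFO entry is logged
    private final long warnThresholdMs; // extra delay above which a WARNING entry is logged

    PauseMonitorSketch(long sleepMs, long infoThresholdMs, long warnThresholdMs) {
        this.sleepMs = sleepMs;
        this.infoThresholdMs = infoThresholdMs;
        this.warnThresholdMs = warnThresholdMs;
    }

    /** Maps the time slept beyond the intended sleep to a log level, or null for no entry. */
    static Level classify(long extraMs, long infoThresholdMs, long warnThresholdMs) {
        if (extraMs > warnThresholdMs) {
            return Level.WARNING;
        }
        if (extraMs > infoThresholdMs) {
            return Level.INFO;
        }
        return null;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            long start = System.nanoTime();
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop
                return;
            }
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            long extraMs = elapsedMs - sleepMs; // how much longer than requested we slept
            Level level = classify(extraMs, infoThresholdMs, warnThresholdMs);
            if (level != null) {
                LOG.log(level, "Detected pause in JVM or host of approximately " + extraMs + " ms");
            }
        }
    }

    /** Starts the monitor on a daemon thread so it does not block JVM shutdown. */
    static Thread start(long sleepMs, long infoThresholdMs, long warnThresholdMs) {
        Thread t = new Thread(
                new PauseMonitorSketch(sleepMs, infoThresholdMs, warnThresholdMs),
                "pause-monitor-sketch");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        // 1s sleep with 1s INFO and 10s WARN thresholds, as in the quoted description.
        Thread monitor = start(1000, 1000, 10000);
        Thread.sleep(3000); // stand-in for the real server doing its work
        monitor.interrupt();
    }
}
```

The threshold check is kept in a separate static method so it can be exercised without waiting for a real pause; the thresholds are passed in as arguments here, which is where a configuration option could plug in.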
