+1. Sounds like a really nice-to-have feature. Let's open a ticket and a PR.
:)

Ed

On Wed, May 9, 2018 at 11:15, Norbert Kalmar <nkal...@cloudera.com>
wrote:

> Hi,
>
> I just got a tip that we could improve the logging in ZooKeeper. After a
> ZK crash or client timeout, it is sometimes hard to determine from the logs
> what happened. Knowing whether ZK was responsive at the time would help a
> lot. For example, ZK might spend a lot of time waiting on GC (there is
> still a misconception that ZK is a storage system).
>
> To help detect this, Hadoop already has a great tool called JVM Pause
> Monitor. (As the name suggests, it can also be used for monitoring, but it
> also helps post-mortem in a lot of cases.) Basically, it has a daemon that
> sleeps for one second, and if the sleep time exceeds the 1s by more than
> the threshold (1s: INFO, 10s: WARN by default; these can be made
> configurable in our case, see below), it will alert/make a log entry. It
> can also monitor the time GC took.
>
> Now, this class is in hadoop-common. I wouldn't want to depend on
> hadoop-common because of this one feature (it is actually a single
> class). Since the implementation is straightforward, and the few commits
> it has received in the past five years were nothing serious, I think we
> could just copy this class into ZooKeeper and introduce it as a
> configurable feature that is off by default.
>
> The class:
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java
>
> What do you think?
>
> Regards,
> Norbert
>
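
For reference, the mechanism Norbert describes (sleep for a fixed interval, measure how much longer the sleep actually took, log at INFO or WARN depending on the overshoot) could look roughly like the sketch below. This is not the Hadoop class and not a proposed ZooKeeper patch; the class name `PauseMonitorSketch`, the `classify` helper and the `start` method are made up for illustration, and the 1s sleep interval and 1s/10s thresholds are simply the values quoted in the mail. GC time tracking is left out.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; names are not taken from Hadoop or ZooKeeper.
public class PauseMonitorSketch {

    public enum Level { NONE, INFO, WARN }

    // Sleep interval as described in the mail (assumption, not read from Hadoop).
    static final long SLEEP_INTERVAL_MS = 1000;

    // Decide the log level from how far the sleep overshot its interval.
    public static Level classify(long extraSleepMs, long infoThresholdMs, long warnThresholdMs) {
        if (extraSleepMs > warnThresholdMs) {
            return Level.WARN;
        }
        if (extraSleepMs > infoThresholdMs) {
            return Level.INFO;
        }
        return Level.NONE;
    }

    // Start a daemon thread that repeatedly sleeps and reports long overshoots.
    public static Thread start(long infoThresholdMs, long warnThresholdMs) {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                long begin = System.nanoTime();
                try {
                    Thread.sleep(SLEEP_INTERVAL_MS);
                } catch (InterruptedException e) {
                    return; // stop monitoring when interrupted
                }
                long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin);
                long extraMs = elapsedMs - SLEEP_INTERVAL_MS;
                Level level = classify(extraMs, infoThresholdMs, warnThresholdMs);
                if (level != Level.NONE) {
                    // A real implementation would use the project's logger here.
                    System.err.println(level + ": detected pause of approximately "
                            + extraMs + "ms in JVM or host machine");
                }
            }
        }, "pause-monitor-sketch");
        t.setDaemon(true); // must not keep the JVM alive on shutdown
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread monitor = start(1000, 10000);
        Thread.sleep(3000); // let the monitor run for a few cycles
        monitor.interrupt();
    }
}
```

Keeping the threshold comparison in a separate pure method makes the INFO/WARN decision easy to unit test without actually stalling the JVM.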
