Thanks Patrick, great question.
My understanding is that this tool not only shows if JVM spends too much
time in GC, but if, for any other reason, there is a JVM pause (The tool
only differentiates GC pause from all other pause). This could be slow
fsync (although we do have logs for that) or even server/OS related.

But again, this is just my interpretation. I will ask the source of the
idea, what extra benefits this gives them over java GC log.

I checked ZK, I don't see it enabled by default, but GC logging can be set
with JVM parameters easily, so that shouldn't be a key factor anyway.

Regards,
Norbert

On Wed, May 9, 2018 at 7:57 PM Patrick Hunt <[email protected]> wrote:

> Do you know why they did this rather than just enabling GC logging by
> default? Why re-invent the wheel?
>
> I seem to remember seeing a push do enable GC logging by default a few
> years ago. In particular around the time when the JVM added GC log rolling
> as a feature. Here's an example:
>
> https://batmat.net/2016/10/17/always-enable-gc-logs-and-how-to-enable-logs-rotation-with-hotspot/
> My understanding is that the overhead is so low that it's feasible to do
> this.
>
> Good improvement though regardless which way we go.
>
> Regards,
>
> Patrick
>
> On Wed, May 9, 2018 at 9:36 AM, Andor Molnar <[email protected]> wrote:
>
> > +1 cool!
> >
> >
> > On Wed, May 9, 2018 at 7:59 AM, Norbert Kalmar <[email protected]>
> > wrote:
> >
> > > Okay, thanks Ed, I created the Jira, will look into it soon :)
> > > https://issues.apache.org/jira/browse/ZOOKEEPER-3037
> > >
> > > Regards,
> > > Norbert
> > >
> > > On Wed, May 9, 2018 at 4:44 PM Edward Ribeiro <
> [email protected]>
> > > wrote:
> > >
> > > > +1. Sounds really nice to have feature. Let's open a ticket and open
> a
> > > PR.
> > > > :)
> > > >
> > > > Ed
> > > >
> > > > Em qua, 9 de mai de 2018 11:15, Norbert Kalmar <[email protected]
> >
> > > > escreveu:
> > > >
> > > > > Hi,
> > > > >
> > > > > I just got a tip that we could improve on the logging in ZooKeeper.
> > > > After a
> > > > > ZK crash, or client timeout sometimes it's hard to determine from
> the
> > > > logs
> > > > > what happened. Knowing if ZK was responsive at the time would help
> a
> > > lot.
> > > > > For example, ZK might spend a lot of time waiting on GC (there is
> > still
> > > > > some misconception that ZK is a storage).
> > > > >
> > > > > To help detect this, HADOOP already has a great tool called JVM
> Pause
> > > > > Monitor. (As the name suggest, it can be also used for monitoring,
> > but
> > > it
> > > > > also helps post-mortem in a lot of cases). Basically it has a
> daemon
> > > that
> > > > > sleeps for one second, and if the sleep time exceeds the 1s by more
> > > than
> > > > > the threshold (1s: INFO, 10s: WARN by default - this can be
> > > configurable
> > > > in
> > > > > our case, see below), it will alert/make a log entry. It can also
> > > monitor
> > > > > the time GC took.
> > > > >
> > > > > Now, this class is in the HADOOP-common. I wouldn't want to depend
> on
> > > > > Hadoop-common because of this one feature/class (it is actually a
> > > single
> > > > > class). Since this is a straightforward implementation, and in the
> > past
> > > > > five years the few commits it had is nothing really serious, I
> think
> > we
> > > > > could just copy this class in ZooKeeper, and introduce it as a
> > > > configurable
> > > > > feature, by default it can be off.
> > > > >
> > > > > The class:
> > > > >
> > > > >
> > > > https://github.com/apache/hadoop/blob/trunk/hadoop-
> > > common-project/hadoop-common/src/main/java/org/apache/
> > > hadoop/util/JvmPauseMonitor.java
> > > > >
> > > > > What do You think?
> > > > >
> > > > > Regards,
> > > > > Norbert
> > > > >
> > > >
> > >
> >
>

Reply via email to