I added the suggestions to the jira.

Thanks all!

On Thu, May 10, 2018 at 3:34 AM Prasanth Mathialagan <
[email protected]> wrote:

> Hi,
> This looks cool :) I have a suggestion. It would be nice if we could add
> the current size of the heap or (% of heap used) in the log entry whenever
> sleep threshold had exceeded a lot. It could be helpful.
>
> On Wed, May 9, 2018 at 11:26 AM, Patrick Hunt <[email protected]> wrote:
>
> > On Wed, May 9, 2018 at 11:11 AM, Norbert Kalmar <[email protected]>
> > wrote:
> >
> > > Thanks Patrick, great question.
> > > My understanding is that this tool not only shows if JVM spends too
> much
> > > time in GC, but if, for any other reason, there is a JVM pause (The
> tool
> > > only differentiates GC pause from all other pause). This could be slow
> > > fsync (although we do have logs for that) or even server/OS related.
> > >
> > > But again, this is just my interpretation. I will ask the source of the
> > > idea, what extra benefits this gives them over java GC log.
> > >
> > > I checked ZK, I don't see it enabled by default, but GC logging can be
> > set
> > > with JVM parameters easily, so that shouldn't be a key factor anyway.
> > >
> > >
> > I think that would be a useful change regardless - to make it on by
> default
> > I mean. Also some docs wrt our recommendations, how to troubleshoot,
> etc...
> > Adding a feature is useful, but ensuring people know about it and can
> use
> > it effectively is even more so.
> >
> > Regards,
> >
> > Patrick
> >
> >
> > > Regards,
> > > Norbert
> > >
> > > On Wed, May 9, 2018 at 7:57 PM Patrick Hunt <[email protected]> wrote:
> > >
> > > > Do you know why they did this rather than just enabling GC logging by
> > > > default? Why re-invent the wheel?
> > > >
> > > > I seem to remember seeing a push do enable GC logging by default a
> few
> > > > years ago. In particular around the time when the JVM added GC log
> > > rolling
> > > > as a feature. Here's an example:
> > > >
> > > > https://batmat.net/2016/10/17/always-enable-gc-logs-and-how-
> > > to-enable-logs-rotation-with-hotspot/
> > > > My understanding is that the overhead is so low that it's feasible to
> > do
> > > > this.
> > > >
> > > > Good improvement though regardless which way we go.
> > > >
> > > > Regards,
> > > >
> > > > Patrick
> > > >
> > > > On Wed, May 9, 2018 at 9:36 AM, Andor Molnar <[email protected]>
> > wrote:
> > > >
> > > > > +1 cool!
> > > > >
> > > > >
> > > > > On Wed, May 9, 2018 at 7:59 AM, Norbert Kalmar <
> [email protected]
> > >
> > > > > wrote:
> > > > >
> > > > > > Okay, thanks Ed, I created the Jira, will look into it soon :)
> > > > > > https://issues.apache.org/jira/browse/ZOOKEEPER-3037
> > > > > >
> > > > > > Regards,
> > > > > > Norbert
> > > > > >
> > > > > > On Wed, May 9, 2018 at 4:44 PM Edward Ribeiro <
> > > > [email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > +1. Sounds really nice to have feature. Let's open a ticket and
> > > open
> > > > a
> > > > > > PR.
> > > > > > > :)
> > > > > > >
> > > > > > > Ed
> > > > > > >
> > > > > > > Em qua, 9 de mai de 2018 11:15, Norbert Kalmar <
> > > [email protected]
> > > > >
> > > > > > > escreveu:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I just got a tip that we could improve on the logging in
> > > ZooKeeper.
> > > > > > > After a
> > > > > > > > ZK crash, or client timeout sometimes it's hard to determine
> > from
> > > > the
> > > > > > > logs
> > > > > > > > what happened. Knowing if ZK was responsive at the time would
> > > help
> > > > a
> > > > > > lot.
> > > > > > > > For example, ZK might spend a lot of time waiting on GC
> (there
> > is
> > > > > still
> > > > > > > > some misconception that ZK is a storage).
> > > > > > > >
> > > > > > > > To help detect this, HADOOP already has a great tool called
> JVM
> > > > Pause
> > > > > > > > Monitor. (As the name suggest, it can be also used for
> > > monitoring,
> > > > > but
> > > > > > it
> > > > > > > > also helps post-mortem in a lot of cases). Basically it has a
> > > > daemon
> > > > > > that
> > > > > > > > sleeps for one second, and if the sleep time exceeds the 1s
> by
> > > more
> > > > > > than
> > > > > > > > the threshold (1s: INFO, 10s: WARN by default - this can be
> > > > > > configurable
> > > > > > > in
> > > > > > > > our case, see below), it will alert/make a log entry. It can
> > also
> > > > > > monitor
> > > > > > > > the time GC took.
> > > > > > > >
> > > > > > > > Now, this class is in the HADOOP-common. I wouldn't want to
> > > depend
> > > > on
> > > > > > > > Hadoop-common because of this one feature/class (it is
> > actually a
> > > > > > single
> > > > > > > > class). Since this is a straightforward implementation, and
> in
> > > the
> > > > > past
> > > > > > > > five years the few commits it had is nothing really serious,
> I
> > > > think
> > > > > we
> > > > > > > > could just copy this class in ZooKeeper, and introduce it as
> a
> > > > > > > configurable
> > > > > > > > feature, by default it can be off.
> > > > > > > >
> > > > > > > > The class:
> > > > > > > >
> > > > > > > >
> > > > > > > https://github.com/apache/hadoop/blob/trunk/hadoop-
> > > > > > common-project/hadoop-common/src/main/java/org/apache/
> > > > > > hadoop/util/JvmPauseMonitor.java
> > > > > > > >
> > > > > > > > What do You think?
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Norbert
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to