[ 
https://issues.apache.org/jira/browse/HADOOP-14960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16261861#comment-16261861
 ] 

Misha Dmitriev commented on HADOOP-14960:
-----------------------------------------

[~xkrogen] indeed, the GC time percentage is likely to go up and down all the 
time. Do you mean this code at line 190 in JvmMetrics.java, correct?

{code}
    if (gcTimeMonitor != null) {
      rb.addCounter(GcTimePercentage,
          gcTimeMonitor.getLatestGcData().getGcTimePercentage());
    }
{code}

Replacing {{addCounter}} with {{addGauge}} is trivial. [~xiaochen] how (in 
which ticket) would you recommend to submit a patch for this change?

> Add GC time percentage monitor/alerter
> --------------------------------------
>
>                 Key: HADOOP-14960
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14960
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Misha Dmitriev
>            Assignee: Misha Dmitriev
>             Fix For: 3.0.0, 2.10.0
>
>         Attachments: HADOOP-14960.01.patch, HADOOP-14960.02.patch, 
> HADOOP-14960.03.patch, HADOOP-14960.04.patch
>
>
> Currently class {{org.apache.hadoop.metrics2.source.JvmMetrics}} provides 
> several metrics related to GC. Unfortunately, all these metrics are not as 
> useful as they could be, because they don't answer the first and most 
> important question related to GC and JVM health: what percentage of time my 
> JVM is paused in GC? This percentage, calculated as the sum of the GC pauses 
> over some period, like 1 minute, divided by that period - is the most 
> convenient measure of the GC health because:
> - it is just one number, and it's clear that, say, 1..5% is good, but 80..90% 
> is really bad
> - it allows for easy apple-to-apple comparison between runs, even between 
> different apps
> - when this metric reaches some critical value like 70%, it almost always 
> indicates a "GC death spiral", from which the app can recover only if it 
> drops some task(s) etc.
> The existing "total GC time", "total number of GCs" etc. metrics only give 
> numbers that can be used to rougly estimate this percentage. Thus it is 
> suggested to add a new metric to this class, and possibly allow users to 
> register handlers that will be automatically invoked if this metric reaches 
> the specified threshold.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

Reply via email to