[ 
https://issues.apache.org/jira/browse/HADOOP-7630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108802#comment-13108802
 ] 

Eric Yang commented on HADOOP-7630:
-----------------------------------

bq. Are you saying the simon aggregator could not process fewer than 1k UDP 
packets per second?

No, that is not what I was saying.  On every production cluster, the status 
page shows 93% packet loss for disk metrics.  Disk metrics are emitted per 
disk.  A typical 2000-node cluster used to have 4 disks per node, which comes 
to 8k metrics every 5 seconds.  A single simon aggregator has trouble handling 
the aggregation load at this scale.  Hadoop metrics are supposedly smaller in 
volume than system metrics, but multiplied by the number of metric types (jvm, 
rpc, mapred, hdfs), the number of outgoing UDP packets would reach the same 
scale as the disk metrics unless something is done to reduce the repeated noise.
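
As a rough check of the arithmetic above, here is a minimal Java sketch.  The 
class and method names are hypothetical, and it assumes one metric record per 
disk per emission interval; the 60-second case is only an illustration of how 
a longer period would scale the same record count, not a measured result.

```java
/** Back-of-the-envelope metric rate; names here are illustrative only. */
final class DiskMetricRate {

    /** Records emitted per interval, assuming one record per disk per node. */
    static long recordsPerInterval(long nodes, long disksPerNode) {
        return nodes * disksPerNode;
    }

    /** Average records per second for a given emission interval. */
    static double recordsPerSecond(long nodes, long disksPerNode, long intervalSeconds) {
        return (double) recordsPerInterval(nodes, disksPerNode) / intervalSeconds;
    }

    public static void main(String[] args) {
        // Figures from the comment: 2000 nodes, 4 disks each, 5-second interval.
        System.out.println(recordsPerInterval(2000, 4));
        System.out.println(recordsPerSecond(2000, 4, 5));
        // Hypothetical: the same record count spread over a 60-second period.
        System.out.println(recordsPerSecond(2000, 4, 60));
    }
}
```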

bq. I'm sure you meant simon aggregator.

No, I mean the simon plugin.  We want gauge-like metrics to be in sync at the 
source (MetricsContext) as well as in the plugins.  Internally, the simon 
aggregator uses the last known value, or calculates the missing gap, if 
packets are lost.  I wrote the code to handle missing UDP packets for the 
simon aggregator at management's request.
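
To illustrate the two gap-handling strategies mentioned above, here is a 
minimal Java sketch.  It is not the simon aggregator's code: the class and 
method names are hypothetical, a lost packet is assumed to be represented as 
NaN, and "calculate the missing gap" is assumed to mean linear interpolation 
between the nearest known samples.

```java
/** Illustrative gap filling for a series of gauge samples (NaN = lost packet). */
final class GapFill {

    /** Replace each missing sample with the most recent known value. */
    static double[] carryForward(double[] samples) {
        double[] out = samples.clone();
        for (int i = 1; i < out.length; i++) {
            if (Double.isNaN(out[i])) {
                out[i] = out[i - 1]; // stays NaN if nothing is known yet
            }
        }
        return out;
    }

    /** Fill gaps that have a known sample on both sides by linear interpolation. */
    static double[] interpolate(double[] samples) {
        double[] out = samples.clone();
        int lastKnown = -1;
        for (int i = 0; i < samples.length; i++) {
            if (Double.isNaN(samples[i])) {
                continue;
            }
            if (lastKnown >= 0 && i - lastKnown > 1) {
                double step = (samples[i] - samples[lastKnown]) / (i - lastKnown);
                for (int j = lastKnown + 1; j < i; j++) {
                    out[j] = samples[lastKnown] + step * (j - lastKnown);
                }
            }
            lastKnown = i;
        }
        return out; // leading and trailing gaps are left as NaN
    }

    public static void main(String[] args) {
        double[] series = {10.0, Double.NaN, Double.NaN, 40.0};
        System.out.println(java.util.Arrays.toString(carryForward(series)));
        System.out.println(java.util.Arrays.toString(interpolate(series)));
    }
}
```

Carrying the last value forward suits gauges that change slowly, while 
interpolation needs a later sample to arrive before the gap can be filled.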

bq. My point is that you should not change the current default that has 
potential impact on production monitoring without actually testing it at scale.

This configuration has been verified to work at 40-node scale.  I am sure 
that it would not cause any harm and would only lower the risk of reaching 
the breaking point.

> hadoop-metrics2.properties should have a property *.period set to a default 
> value for metrics
> ---------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-7630
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7630
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: conf
>            Reporter: Arpit Gupta
>            Assignee: Eric Yang
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: HADOOP-7630-trunk.patch, HADOOP-7630.patch
>
>
> Currently the hadoop-metrics2.properties file does not have a value set for 
> *.period.
> This property tells the metrics system how often metrics are refreshed. We 
> should set it to a default of 60.
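
For reference, the proposed setting is a single line, `*.period=60`, in 
hadoop-metrics2.properties.  The Java sketch below reads that line with plain 
java.util.Properties to show how the key parses.  It is not the loader Hadoop 
itself uses, the class name and the fallback value are hypothetical, and the 
value is assumed to be in seconds.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Reads the *.period key from properties text; illustrative only. */
final class PeriodConfig {

    /** Returns the configured period, or the caller's fallback if the key is absent. */
    static int periodSeconds(String propertiesText, int fallback) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String raw = props.getProperty("*.period", String.valueOf(fallback));
        return Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        // The line proposed in this issue.
        System.out.println(periodSeconds("*.period=60\n", 10));
        // With no *.period line, the hypothetical fallback is returned.
        System.out.println(periodSeconds("# no period set\n", 10));
    }
}
```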

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
