[ https://issues.apache.org/jira/browse/HADOOP-7630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108802#comment-13108802 ]
Eric Yang commented on HADOOP-7630:
-----------------------------------

bq. Are you saying simon aggregator could not process less than 1k udp packets per second?

No, that is not what I was saying. On every production cluster, the status page shows 93% packet loss for disk metrics. Disk metrics are emitted per disk. A typical 2000-node cluster used to have 4 disks per node, which works out to 8k metrics every 5 seconds. A single simon aggregator has trouble handling the aggregation load at this scale. Hadoop metrics are supposedly smaller in volume than system metrics, but once multiplied by the number of metric types (jvm, rpc, mapred, hdfs), the number of outgoing udp packets would reach the same scale as the disk metrics unless something is done to reduce the repeated noise.

bq. I'm sure you meant simon aggregator.

No, I mean the simon plugin. We want gauge-like metrics to be in sync at the source (MetricsContext) as well as in the plugins. Internally, the simon aggregator uses the last known value, or calculates the missing gap, when packets are lost. I wrote the code that handles missing udp packets in the simon aggregator at management's request.

bq. My point is that you should not change the current default that has potential impact on production monitoring without actually testing it at scale.

This configuration has been verified to work at 40-node scale. I am sure it would not cause any harm and would only reduce the risk of hitting the breaking point.
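
The disk-metrics figures in the comment can be checked with a little arithmetic. This is a minimal sketch, assuming one udp packet per disk metric per reporting period (the comment does not state the packet-to-metric ratio). The function name `packets_per_second` is hypothetical, and the 60-second period is borrowed from the default proposed in the issue purely for comparison; nothing in the thread says disk metrics follow that setting.

```python
def packets_per_second(nodes: int, disks_per_node: int, period_seconds: float) -> float:
    """Average emission rate if every disk on every node sends one metric per period.

    Assumption: one udp packet per metric; real batching may differ.
    """
    return nodes * disks_per_node / period_seconds


# Figures from the comment: 2000 nodes, 4 disks each, one report every 5 seconds.
rate_5s = packets_per_second(2000, 4, 5)

# Hypothetical comparison: the same cluster reporting once every 60 seconds.
rate_60s = packets_per_second(2000, 4, 60)

print(f"{rate_5s:.0f} packets/s at a 5 s period")
print(f"{rate_60s:.1f} packets/s at a 60 s period")
```

Under these assumptions the 5-second case averages 1600 packets per second, which is already above the 1k per second mentioned in the quoted question.
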
> hadoop-metrics2.properties should have a property *.period set to a default value for metrics
> ---------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-7630
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7630
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: conf
>            Reporter: Arpit Gupta
>            Assignee: Eric Yang
>             Fix For: 0.20.205.0, 0.23.0
>
>         Attachments: HADOOP-7630-trunk.patch, HADOOP-7630.patch
>
>
> Currently the hadoop-metrics2.properties file does not have a value set for *.period.
> This property is used by metrics to determine when the values will refresh. We should set it to a default of 60.