[ https://issues.apache.org/jira/browse/EAGLE-97?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074608#comment-15074608 ]
Libin, Sun commented on EAGLE-97:
---------------------------------

In GC log monitoring, the key information is the stop-the-world event.
The following are sample logs from ParNew GC, CMS GC, and G1 GC:

{code:title=ParNew GC sample log}
2014-06-04T22:21:19.158-0700: 9.952: [GC 9.952: [ParNew: 2365777K->5223K(2831168K), 0.0155080 secs] 2365777K->5223K(100348736K), 0.0156030 secs] [Times: user=0.08 sys=0.05, real=0.02 secs]
{code}

{code:title=CMS GC sample log}
2014-06-04T22:47:31.218-0700: 1582.012: [GC [1 CMS-initial-mark: 78942227K(97517568K)] 79264643K(100348736K), 0.2334170 secs] [Times: user=0.23 sys=0.00, real=0.24 secs]
2014-06-04T22:49:50.603-0700: 1721.397: [GC[YG occupancy: 2777944 K (2831168 K)]1721.398: [Rescan (parallel) , 0.1706730 secs]1721.568: [weak refs processing, 0.0156130 secs] [1 CMS-remark: 83730081K(97517568K)] 86508026K(100348736K), 0.1868130 secs] [Times: user=3.04 sys=0.01, real=0.18 secs]
{code}

{code:title=G1 GC sample log}
0.522: [GC pause (young), 0.15877971 secs]
1.730: [GC pause (mixed), 0.32714353 secs]
   [Eden: 12M(12M)->0B(13M) Survivors: 0B->2048K Heap: 14M(64M)->9739K(64M)]
{code}

From these logs, we can extract the key information into the following stream definition for the alert engine/metric generator to process:

{code:title=gc log stream definition}
timestamp                long
eventType                string
pausedGCTimeSec          double
youngAreaGCed            boolean
youngUsedHeapK           long
youngTotalHeapK          long
tenuredAreaGCed          boolean
tenuredUsedHeapK         long
tenuredTotalHeapK        long
permAreaGCed             boolean
permUsedHeapK            long
permTotalHeapK           long
totalHeapUsageAvailable  boolean
usedTotalHeapK           long
totalHeapK               long
logLine                  string
{code}

> Enable GC Log monitoring for important service like hadoop namenode
> -------------------------------------------------------------------
>
>                 Key: EAGLE-97
>                 URL: https://issues.apache.org/jira/browse/EAGLE-97
>             Project: Eagle
>          Issue Type: New Feature
>    Affects Versions: 0.3.0
>            Reporter: Libin, Sun
>            Assignee: Libin, Sun
>
> Garbage Collection Monitoring refers to the process of figuring out how the JVM
> is running GC.
> When a GC happens, the JVM stops the application in order to execute the GC:
> every thread except those needed for the GC stops its tasks.
> The interrupted tasks resume only after the GC has completed. This stop
> interval is known as "stop-the-world".
> For a service like the namenode, GC affects performance, especially full GC.
> We should avoid full GC, and if a full GC does happen, we should detect it
> ASAP and send out an alert.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
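
To illustrate how a ParNew log line could be mapped onto the stream definition fields in the comment, here is a minimal Python sketch. It is not Eagle's parser: the regular expression, the function name `parse_parnew`, the `"ParNew"` event-type label, and the choice to report post-collection occupancy as the "used" values are all assumptions made for this example. It only handles the ParNew format shown in the sample; CMS and G1 lines would need their own patterns.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical pattern written against the ParNew sample line in the comment;
# it is an illustration, not the pattern Eagle uses.
PARNEW_RE = re.compile(
    r"^(?P<ts>\S+): [\d.]+: \[GC [\d.]+: "
    r"\[ParNew: \d+K->(?P<young_after>\d+)K\((?P<young_total>\d+)K\), [\d.]+ secs\] "
    r"\d+K->(?P<heap_after>\d+)K\((?P<heap_total>\d+)K\), (?P<pause>[\d.]+) secs\]"
)

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

SAMPLE = (
    "2014-06-04T22:21:19.158-0700: 9.952: [GC 9.952: "
    "[ParNew: 2365777K->5223K(2831168K), 0.0155080 secs] "
    "2365777K->5223K(100348736K), 0.0156030 secs] "
    "[Times: user=0.08 sys=0.05, real=0.02 secs]"
)


def parse_parnew(line):
    """Return a dict keyed by the stream definition fields, or None if no match."""
    m = PARNEW_RE.match(line)
    if m is None:
        return None
    dt = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f%z")
    return {
        # Epoch milliseconds, computed with integer arithmetic.
        "timestamp": (dt - EPOCH) // timedelta(milliseconds=1),
        "eventType": "ParNew",  # assumed label
        # Total pause reported at the end of the outer [GC ...] bracket.
        "pausedGCTimeSec": float(m.group("pause")),
        "youngAreaGCed": True,
        # Assumption: "used" means occupancy after the collection.
        "youngUsedHeapK": int(m.group("young_after")),
        "youngTotalHeapK": int(m.group("young_total")),
        # A ParNew line carries no tenured or perm figures.
        "tenuredAreaGCed": False,
        "tenuredUsedHeapK": None,
        "tenuredTotalHeapK": None,
        "permAreaGCed": False,
        "permUsedHeapK": None,
        "permTotalHeapK": None,
        "totalHeapUsageAvailable": True,
        "usedTotalHeapK": int(m.group("heap_after")),
        "totalHeapK": int(m.group("heap_total")),
        "logLine": line,
    }


if __name__ == "__main__":
    event = parse_parnew(SAMPLE)
    for field, value in event.items():
        print(field, value)
```

Lines that do not match the pattern return `None`, so a caller could try a sequence of per-collector parsers and fall through to the next one on a miss.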