[ https://issues.apache.org/jira/browse/HADOOP-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15361880#comment-15361880 ]
Yongjun Zhang commented on HADOOP-13339: ---------------------------------------- Hi [~ozawa], Thanks for pointing me to HADOOP-11361. I was about to update here: Per the trace stack I'm seeing: {code} Caused by: java.lang.NullPointerException at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateAttrCache(MetricsSourceAdapter.java:260) at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:184) at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:155) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBeanInfo(DefaultMBeanServerInterceptor.java:1378) {code} The relevant code is {code} private void updateJmxCache() { boolean getAllMetrics = false; synchronized(this) { if (Time.now() - jmxCacheTS >= jmxCacheTTL) { // temporarilly advance the expiry while updating the cache jmxCacheTS = Time.now() + jmxCacheTTL; // lastRecs might have been set to an object already by another thread. // Track the fact that lastRecs has been reset once to make sure refresh // is correctly triggered. if (lastRecsCleared) { getAllMetrics = true; lastRecsCleared = false; } } else { return; } } if (getAllMetrics) { MetricsCollectorImpl builder = new MetricsCollectorImpl(); getMetrics(builder, true); <== This is where lastRecs is created/assigned non-NULL } synchronized(this) { updateAttrCache(); <=== This is where the NPE happened if (getAllMetrics) { updateInfoCache(); } jmxCacheTS = Time.now(); lastRecs = null; // in case regular interval update is not running <== This is where lastRecs assigned NULL lastRecsCleared = true; } } {code} If one thread enters the above method with {{Time.now() - jmxCacheTS >= jmxCacheTTL}} being false, then {{getAllMetrics}} will continue to be false, then when {{updateAttrCache()}} is called, it will hit NULL {{lastRecs}} Even a single thread like this would hit the issue. So it doesn't seem a pure synchronization issue. And the code prior to HADOOP-12482 appear to have the same issue. Brief look at HADOOP-12482, the logic seems fine, except the hole pointed out here. While we need to examine the synchronization a bit further, have a checking in MetricsSourceAdapter#updateAttrCache (so not to access it when lastRecs is NULL) seems reasonable. What do you think? Thanks. > MetricsSourceAdapter#updateAttrCache may throw NPE due to NULL lastRecs > ----------------------------------------------------------------------- > > Key: HADOOP-13339 > URL: https://issues.apache.org/jira/browse/HADOOP-13339 > Project: Hadoop Common > Issue Type: Bug > Reporter: Yongjun Zhang > Assignee: Yongjun Zhang > > The for loop below may find lastRecs NULL > {code} > private int updateAttrCache() { > LOG.debug("Updating attr cache..."); > int recNo = 0; > int numMetrics = 0; > for (MetricsRecordImpl record : lastRecs) { > for (MetricsTag t : record.tags()) { > setAttrCacheTag(t, recNo); > ++numMetrics; > } > for (AbstractMetric m : record.metrics()) { > setAttrCacheMetric(m, recNo); > ++numMetrics; > } > ++recNo; > } > LOG.debug("Done. # tags & metrics="+ numMetrics); > return numMetrics; > } > {code} > and throws NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org