[ https://issues.apache.org/jira/browse/HADOOP-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15361880#comment-15361880 ]

Yongjun Zhang commented on HADOOP-13339:
----------------------------------------

Hi [~ozawa],

Thanks for pointing me to HADOOP-11361.

I was about to update here:

Per the stack trace I'm seeing:
{code}
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateAttrCache(MetricsSourceAdapter.java:260)
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:184)
        at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:155)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBeanInfo(DefaultMBeanServerInterceptor.java:1378)
{code}

The relevant code is

{code}
  private void updateJmxCache() {
    boolean getAllMetrics = false;
    synchronized(this) {
      if (Time.now() - jmxCacheTS >= jmxCacheTTL) {
        // temporarily advance the expiry while updating the cache
        jmxCacheTS = Time.now() + jmxCacheTTL;
        // lastRecs might have been set to an object already by another thread.
        // Track the fact that lastRecs has been reset once to make sure refresh
        // is correctly triggered.
        if (lastRecsCleared) {
          getAllMetrics = true;
          lastRecsCleared = false;
        }
      }
      else {
        return;
      }
    }

    if (getAllMetrics) {
      MetricsCollectorImpl builder = new MetricsCollectorImpl();
      getMetrics(builder, true); <== This is where lastRecs is created/assigned non-NULL
    }

    synchronized(this) {
      updateAttrCache(); <=== This is where the NPE happened
      if (getAllMetrics) {
        updateInfoCache();
      }
      jmxCacheTS = Time.now();
      lastRecs = null;  // in case regular interval update is not running <== This is where lastRecs is assigned NULL
      lastRecsCleared = true;
    }
  }
{code}

If a thread enters the above method and {{getAllMetrics}} stays false (the 
check {{Time.now() - jmxCacheTS >= jmxCacheTTL}} passes but 
{{lastRecsCleared}} is false; when the check fails the method simply returns), 
then {{getMetrics()}} is skipped, and when {{updateAttrCache()}} is called it 
can hit a NULL {{lastRecs}}.

Even a single thread like this would hit the issue, so it doesn't seem to be a 
pure synchronization issue. The code prior to HADOOP-12482 appears to have the 
same issue.

From a brief look at HADOOP-12482, the logic seems fine except for the hole 
pointed out here. While we need to examine the synchronization a bit further, 
adding a check in MetricsSourceAdapter#updateAttrCache (so that {{lastRecs}} 
is not accessed when it is NULL) seems reasonable.
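
As a rough illustration of such a check (a minimal sketch, not the actual 
patch): the {{Record}} class and the two counting methods below are 
hypothetical stand-ins for MetricsRecordImpl and updateAttrCache, and returning 
0 when {{lastRecs}} is NULL is an assumption about the desired behavior.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, self-contained model of the loop in updateAttrCache().
// None of these names come from Hadoop; they only mimic the shape of the code.
final class AttrCacheNullGuardSketch {

  /** Hypothetical stand-in for MetricsRecordImpl: just lists of names. */
  static final class Record {
    final List<String> tags;
    final List<String> metrics;

    Record(List<String> tags, List<String> metrics) {
      this.tags = tags;
      this.metrics = metrics;
    }
  }

  /** Mirrors the current loop: an enhanced for over a null Iterable throws NPE. */
  static int countUnguarded(Iterable<Record> lastRecs) {
    int numMetrics = 0;
    for (Record record : lastRecs) {
      numMetrics += record.tags.size() + record.metrics.size();
    }
    return numMetrics;
  }

  /** Same loop with the proposed guard: skip the update when lastRecs is null. */
  static int countGuarded(Iterable<Record> lastRecs) {
    if (lastRecs == null) {
      // Assumption: nothing to cache yet, so report zero tags and metrics.
      return 0;
    }
    return countUnguarded(lastRecs);
  }

  public static void main(String[] args) {
    List<Record> recs = Arrays.asList(
        new Record(Arrays.asList("Context", "Hostname"), Arrays.asList("NumOps")));

    System.out.println("guarded, non-null: " + countGuarded(recs));
    System.out.println("guarded, null:     " + countGuarded(null));
    try {
      countUnguarded(null);
    } catch (NullPointerException e) {
      System.out.println("unguarded, null:   NullPointerException");
    }
  }
}
```

The guard only avoids the NPE; whether a stale attribute cache is acceptable in 
that case still depends on the synchronization review mentioned above.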

What do you think?

Thanks.



> MetricsSourceAdapter#updateAttrCache may throw NPE due to NULL lastRecs
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-13339
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13339
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>
> The for loop below may find lastRecs NULL
> {code}
>   private int updateAttrCache() {
>     LOG.debug("Updating attr cache...");
>     int recNo = 0;
>     int numMetrics = 0;
>     for (MetricsRecordImpl record : lastRecs) {
>       for (MetricsTag t : record.tags()) {
>         setAttrCacheTag(t, recNo);
>         ++numMetrics;
>       }
>       for (AbstractMetric m : record.metrics()) {
>         setAttrCacheMetric(m, recNo);
>         ++numMetrics;
>       }
>       ++recNo;
>     }
>     LOG.debug("Done. # tags & metrics="+ numMetrics);
>     return numMetrics;
>   }
> {code}
> and throw an NPE.


