[jira] [Resolved] (HDFS-1878) TestHDFSServerPorts unit test failure - race condition in FSNamesystem.close() causes NullPointerException without serious consequence

Matt Foley (JIRA) Wed, 08 Jun 2011 10:30:49 -0700

     [ https://issues.apache.org/jira/browse/HDFS-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Matt Foley resolved HDFS-1878.
------------------------------

    Resolution: Fixed

Committed to 0.20-security and 0.20-security-205.

> TestHDFSServerPorts unit test failure - race condition in 
> FSNamesystem.close() causes NullPointerException without serious consequence
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1878
>                 URL: https://issues.apache.org/jira/browse/HDFS-1878
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.204.0
>            Reporter: Matt Foley
>            Assignee: Matt Foley
>            Priority: Minor
>             Fix For: 0.20.205.0
>
>         Attachments: 1878-1.patch
>
>
> In 0.20.204, TestHDFSServerPorts was observed to intermittently throw a 
> NullPointerException.  This happens only when FSNamesystem.close() is called, 
> which means the Namenode is shutting down, so it is not a serious bug 
> for 0.20.204.  TestHDFSServerPorts is more likely than normal execution to 
> trigger the race because it runs two Namenodes in the same JVM, which causes 
> more thread interleaving.
> The race is in FSNamesystem.close().  At line 566, we have:
>       if (replthread != null) replthread.interrupt();
>       if (replmon != null) replmon = null;
> Because close() does not wait for the interrupted replthread to exit, 
> replmon can be set to null while replthread is still running.  replthread 
> dereferences replmon in computeDatanodeWork(), which is where the 
> NullPointerException occurs.
> The solution is either to wait on replthread or simply not to null replmon.  
> The latter is preferred, since close() does not wait on any of the sibling 
> Namenode processing threads.
> I'll attach a patch for 0.20.205.
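
The two options described in the report can be sketched in isolation as follows. This is a minimal illustration, not the contents of 1878-1.patch: the classes CloseRaceSketch, Monitor and Namesystem, the methods computeWork(), closeWithJoin() and closeWithoutNulling(), and the workerFailure field are all hypothetical stand-ins; only the names replthread and replmon come from the report.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CloseRaceSketch {

    /** Hypothetical stand-in for the object that replmon refers to. */
    static class Monitor {
        final AtomicInteger ticks = new AtomicInteger();
        void computeWork() { ticks.incrementAndGet(); }
    }

    /** Hypothetical stand-in for FSNamesystem; only shutdown is modeled. */
    static class Namesystem {
        volatile Monitor replmon = new Monitor();
        volatile Throwable workerFailure; // any exception seen by the worker
        final Thread replthread;

        Namesystem() {
            replthread = new Thread(() -> {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        // Would throw NullPointerException if close() set
                        // replmon to null while this loop is still running.
                        replmon.computeWork();
                    }
                } catch (Throwable t) {
                    workerFailure = t;
                }
            });
            replthread.start();
        }

        /** Option 1: wait for the worker to exit before clearing the field. */
        void closeWithJoin() throws InterruptedException {
            replthread.interrupt();
            replthread.join();
            replmon = null;
        }

        /** Option 2 (preferred in the report): interrupt, never null the field. */
        void closeWithoutNulling() {
            replthread.interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Namesystem joined = new Namesystem();
        joined.closeWithJoin();
        System.out.println("join variant failure: " + joined.workerFailure);

        Namesystem kept = new Namesystem();
        kept.closeWithoutNulling();
        kept.replthread.join(); // only so the demo can inspect the result
        System.out.println("no-null variant failure: " + kept.workerFailure);
    }
}
```

In both variants the worker never observes a null replmon: the first orders the null assignment after the thread has exited, and the second removes the assignment entirely, which matches the report's reasoning that close() does not wait on the other processing threads either.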

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
