[ https://issues.apache.org/jira/browse/HDFS-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt Foley resolved HDFS-1878.
------------------------------

    Resolution: Fixed

Committed to 0.20-security and 0.20-security-205.

> TestHDFSServerPorts unit test failure - race condition in
> FSNamesystem.close() causes NullPointerException without serious consequence
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1878
>                 URL: https://issues.apache.org/jira/browse/HDFS-1878
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.204.0
>            Reporter: Matt Foley
>            Assignee: Matt Foley
>            Priority: Minor
>             Fix For: 0.20.205.0
>
>         Attachments: 1878-1.patch
>
>
> In 20.204, TestHDFSServerPorts was observed to intermittently throw a
> NullPointerException. This only happens when FSNamesystem.close() is called,
> which means system termination for the Namenode, so this is not a serious bug
> for .204. TestHDFSServerPorts is more likely than normal execution to
> stimulate the race, because it runs two Namenodes in the same JVM, causing
> more interleaving and more potential to see a race condition.
>
> The race is in FSNamesystem.close(), line 566, we have:
>
>     if (replthread != null) replthread.interrupt();
>     if (replmon != null) replmon = null;
>
> Since the interrupted replthread is not waited on, there is a potential race
> condition with replmon being nulled before replthread is dead, but replthread
> references replmon in computeDatanodeWork() where the NullPointerException
> occurs.
>
> The solution is either to wait on replthread or just don't null replmon. The
> latter is preferred, since none of the sibling Namenode processing threads
> are waited on in close().
>
> I'll attach a patch for .205.
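
For readers without the patch at hand, here is a minimal sketch of the preferred fix described in the report: close() interrupts the worker thread but leaves the monitor reference alone. It is not the Hadoop source or the contents of 1878-1.patch. CloseRaceSketch, ReplMonitor, start() and awaitWorker() are hypothetical names invented for illustration; only replthread, replmon, close() and computeDatanodeWork() are taken from the report, and the real replication loop is assumed to be considerably more involved.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified stand-in for FSNamesystem; not the Hadoop source.
class CloseRaceSketch {
    // Stand-in for the replication monitor state that the worker dereferences.
    static final class ReplMonitor {
        private long rounds;
        void computeDatanodeWork() { rounds++; }
    }

    // Kept non-final and volatile to mirror a field that close() could null.
    private volatile ReplMonitor replmon = new ReplMonitor();
    // Records anything the worker thread throws, so the caller can inspect it.
    private final AtomicReference<Throwable> failure = new AtomicReference<>();
    private Thread replthread;

    void start() {
        replthread = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    // Would throw NullPointerException if close() nulled replmon
                    // between the interrupt check and this dereference.
                    replmon.computeDatanodeWork();
                }
            } catch (Throwable t) {
                failure.set(t);
            }
        }, "replthread");
        replthread.start();
    }

    // Preferred fix from the report: interrupt the worker, leave replmon alone.
    void close() {
        if (replthread != null) replthread.interrupt();
        // Deliberately no "replmon = null" here.
    }

    // Test-only helper: waits for the worker and returns what it threw, or null.
    Throwable awaitWorker() {
        try {
            replthread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return failure.get();
    }

    public static void main(String[] args) {
        CloseRaceSketch ns = new CloseRaceSketch();
        ns.start();
        ns.close();
        System.out.println("worker failure: " + ns.awaitWorker());
    }
}
```

The join in awaitWorker() is there only so the sketch can observe the outcome. The report's point is that close() itself does not need to wait on replthread once the reference is no longer nulled.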