[ https://issues.apache.org/jira/browse/HDFS-7253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yi Liu resolved HDFS-7253.
--------------------------
    Resolution: Invalid

Thanks Carrey for reporting the issue. Marking it as invalid since it's already fixed in HDFS-7045.

> getBlockLocationsUpdateTimes missing handle exception may cause fsLock dead lock
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-7253
>                 URL: https://issues.apache.org/jira/browse/HDFS-7253
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.2.0
>            Reporter: Carrey Zhan
>         Attachments: Tester.java, Trigger.tgz, nn1013.jstack
>
>
> One day my active namenode hung, and I dumped the thread stacks with
> jstack. In the stack dump, I saw that most threads were waiting on
> FSNamesystem.fsLock. Neither the readLock nor the writeLock could be
> acquired, yet no thread was holding the writeLock.
> I tried to access the web interface of this namenode, but the request
> blocked. I then tried to manually fail over from the active node to another
> namenode (ZKFC had not detected that this node was hanging), but that also
> failed. So I killed this namenode to try to recover the production
> environment. That triggered the failover, the standby NN transitioned to
> active, and then the new active namenode hung as well.
> The steps I took after that did not help and can be ignored. In the end, I
> concluded that the hang was caused by incorrect lock handling in
> FSNamesystem.getBlockLocationsUpdateTimes, which I will describe in the first
> comment.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
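
The symptom described in the report (readers and writers both stuck while no thread holds the write lock) is what a leaked read lock tends to look like. The following is only a minimal sketch of that general failure mode, not the actual FSNamesystem code or the HDFS-7045 fix: the class LeakedReadLockSketch and the methods riskyWork, leakyRead, safeRead and writerCanAcquireAfter are hypothetical names, and a plain java.util.concurrent ReentrantReadWriteLock stands in for fsLock.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

public class LeakedReadLockSketch {

    // Hypothetical stand-in for work that fails while the lock is held.
    static void riskyWork() {
        throw new IllegalStateException("simulated failure under the read lock");
    }

    // Buggy shape: the exception escapes before unlock(), so the read hold leaks.
    static void leakyRead(ReentrantReadWriteLock lock) {
        lock.readLock().lock();
        riskyWork();
        lock.readLock().unlock(); // never reached when riskyWork() throws
    }

    // Safe shape: the read lock is released in finally, even on failure.
    static void safeRead(ReentrantReadWriteLock lock) {
        lock.readLock().lock();
        try {
            riskyWork();
        } finally {
            lock.readLock().unlock();
        }
    }

    // Runs the reader on its own thread against a fresh lock, waits for that
    // thread to finish, then reports whether a writer can still acquire the lock.
    static boolean writerCanAcquireAfter(Consumer<ReentrantReadWriteLock> reader) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        Thread t = new Thread(() -> {
            try {
                reader.accept(lock);
            } catch (IllegalStateException expected) {
                // the simulated failure; nothing to do here
            }
        });
        try {
            t.start();
            t.join();
            boolean acquired = lock.writeLock().tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while probing the lock", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("safe:  " + writerCanAcquireAfter(LeakedReadLockSketch::safeRead));
        System.out.println("leaky: " + writerCanAcquireAfter(LeakedReadLockSketch::leakyRead));
    }
}
```

In this sketch the writer succeeds after safeRead and times out after leakyRead, because the leaked read hold is never released even though the reader thread has exited. Once a writer is queued behind such a hold, later readers may queue behind that writer, which would be consistent with the jstack picture in the report.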