[ https://issues.apache.org/jira/browse/HDFS-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502521#comment-13502521 ]
Xiaobo Peng commented on HDFS-4222:
-----------------------------------

Sorry, the previous comment did not format well. I am reposting it with proper formatting.

The following code snippets show a simple way to change FSNamesystem::renameTo in branch-0.23.4. The changes to other methods are similar.

//////////// existing code
{code:borderStyle=solid}
  /** Rename src to dst */
  void renameTo(String src, String dst, Options.Rename... options)
      throws IOException, UnresolvedLinkException {
    ...
    writeLock();
    try {
      renameToInternal(src, dst, options);
      if (auditLog.isInfoEnabled() && isExternalInvocation()) {
        resultingStat = dir.getFileInfo(dst, false);
      }
    } finally {
      writeUnlock();
    }
    ...
  }

  private void renameToInternal(String src, String dst,
      Options.Rename... options) throws IOException {
    ...
    if (isPermissionEnabled) {
      checkParentAccess(src, FsAction.WRITE);
      checkAncestorAccess(dst, FsAction.WRITE);
    }
    ...
  }

  private FSPermissionChecker checkParentAccess(String path, FsAction access
      ) throws AccessControlException, UnresolvedLinkException {
    return checkPermission(path, false, null, access, null, null);
  }

  private FSPermissionChecker checkPermission(String path, boolean doCheckOwner,
      FsAction ancestorAccess, FsAction parentAccess, FsAction access,
      FsAction subAccess) throws AccessControlException, UnresolvedLinkException {
    FSPermissionChecker pc = new FSPermissionChecker(
        fsOwner.getShortUserName(), supergroup);
    if (!pc.isSuper) {
      dir.waitForReady();
      readLock();
      try {
        pc.checkPermission(path, dir.rootDir, doCheckOwner,
            ancestorAccess, parentAccess, access, subAccess);
      } finally {
        readUnlock();
      }
    }
    return pc;
  }
{code}

//////////// proposed changes
{code:borderStyle=solid}
  /** Rename src to dst */
  void renameTo(String src, String dst, Options.Rename... options)
      throws IOException, UnresolvedLinkException {
    ...
    // Create the checker before taking the lock: the instantiation may access LDAP.
    FSPermissionChecker pc = new FSPermissionChecker(
        fsOwner.getShortUserName(), supergroup);
    writeLock();
    try {
      renameToInternal(pc, src, dst, options);
      if (auditLog.isInfoEnabled() && isExternalInvocation()) {
        resultingStat = dir.getFileInfo(dst, false);
      }
    } finally {
      writeUnlock();
    }
    ...
  }

  private void renameToInternal(FSPermissionChecker pc, String src, String dst,
      Options.Rename... options) throws IOException {
    ...
    if (isPermissionEnabled) {
      checkParentAccess(pc, src, FsAction.WRITE);
      checkAncestorAccess(pc, dst, FsAction.WRITE);
    }
    ...
  }

  private FSPermissionChecker checkParentAccess(FSPermissionChecker pc,
      String path, FsAction access
      ) throws AccessControlException, UnresolvedLinkException {
    return checkPermission(pc, path, false, null, access, null, null);
  }

  private FSPermissionChecker checkPermission(FSPermissionChecker pc,
      String path, boolean doCheckOwner,
      FsAction ancestorAccess, FsAction parentAccess, FsAction access,
      FsAction subAccess) throws AccessControlException, UnresolvedLinkException {
    if (!pc.isSuper) {
      dir.waitForReady();
      readLock();
      try {
        pc.checkPermission(path, dir.rootDir, doCheckOwner,
            ancestorAccess, parentAccess, access, subAccess);
      } finally {
        readUnlock();
      }
    }
    return pc;
  }
{code}

> NN is unresponsive and loses heartbeats of DNs when Hadoop is configured to use
> LDAP and LDAP has issues
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-4222
>                 URL: https://issues.apache.org/jira/browse/HDFS-4222
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.3
>            Reporter: Xiaobo Peng
>            Assignee: Xiaobo Peng
>            Priority: Minor
>
> For Hadoop clusters configured to access directory information by LDAP, the
> FSNamesystem calls made on behalf of DFS clients might hang due to LDAP issues
> (including LDAP access issues caused by networking issues) while holding the
> single lock of FSNamesystem.
> That will leave the NN unresponsive and cause it to lose
> the heartbeats from DNs.
> FSNamesystem calls access LDAP when they instantiate FSPermissionChecker.
> The instantiation does not need the FSNamesystem lock, so it could be moved
> out of the lock scope. After the move, a hung DFS client call will no longer
> block other threads by holding the single lock. This is
> especially helpful when we use separate RPC servers for ClientProtocol and
> DatanodeProtocol, since the calls for DatanodeProtocol do not need to access
> LDAP. So even if DFS client calls hang due to LDAP issues, the NN will still be
> able to process the requests (including heartbeats) from DNs.
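
To make the lock-scope argument in the description concrete, here is a minimal standalone sketch. It is not Hadoop code: LockScopeSketch, Checker, tryHeartbeat and renameCount are hypothetical stand-ins (for FSNamesystem, FSPermissionChecker, and a DN heartbeat handler), and the "LDAP lookup" is simulated by a Supplier that blocks on a latch. It only illustrates that a lookup which hangs before the lock is taken does not stop other threads from acquiring the lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical stand-in for FSNamesystem; illustrative only, not Hadoop code. */
final class LockScopeSketch {
  /** Stand-in for FSPermissionChecker: the constructor does the slow lookup. */
  static final class Checker {
    final String user;
    Checker(Supplier<String> userLookup) {
      this.user = userLookup.get();   // may block, e.g. on a directory service
    }
  }

  private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
  private int renames = 0;
  private String lastUser = null;

  /** Proposed ordering: build the checker first, then take the single lock. */
  void renameTo(Supplier<String> userLookup) {
    Checker pc = new Checker(userLookup);   // no lock is held during the lookup
    fsLock.writeLock().lock();
    try {
      lastUser = pc.user;                   // placeholder for the permission check
      renames++;                            // placeholder for the namespace change
    } finally {
      fsLock.writeLock().unlock();
    }
  }

  /** Stand-in for a DN heartbeat: needs the lock but performs no lookup. */
  boolean tryHeartbeat() {
    if (!fsLock.writeLock().tryLock()) {
      return false;                         // lock is held by another thread
    }
    try {
      return true;
    } finally {
      fsLock.writeLock().unlock();
    }
  }

  int renameCount() {
    fsLock.readLock().lock();
    try {
      return renames;
    } finally {
      fsLock.readLock().unlock();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    LockScopeSketch ns = new LockScopeSketch();
    CountDownLatch lookupStarted = new CountDownLatch(1);
    CountDownLatch releaseLookup = new CountDownLatch(1);

    // Client thread whose simulated lookup hangs until releaseLookup is counted down.
    Thread client = new Thread(() -> ns.renameTo(() -> {
      lookupStarted.countDown();
      try {
        releaseLookup.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      return "alice";
    }));
    client.start();

    lookupStarted.await();                  // the client is now stuck in the lookup
    System.out.println("heartbeat while lookup hangs: " + ns.tryHeartbeat());

    releaseLookup.countDown();              // let the lookup finish
    client.join();
    System.out.println("renames: " + ns.renameCount());
  }
}
```

If the Checker were instead constructed after writeLock().lock(), as in the existing code path, the same hanging lookup would make tryHeartbeat() fail for as long as the lookup is stuck.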