[ 
https://issues.apache.org/jira/browse/HDFS-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13582479#comment-13582479
 ] 

Suresh Srinivas commented on HDFS-4222:
---------------------------------------

bq. Nice change. I've quickly scanned the patch and checkSuperuserPrivilege 
internally instantiates a permission checker, which appears to be getting 
called within the namespace lock in a number of places too.
Good catch.

bq. To reduce the size of the patch, would it maybe make sense to use a 
thread-local permission checker singleton? I.e., FsPermissionChecker.init() sets 
the thread-local for the current user, and then 
FsPermissionChecker.getInstance() instead of new FsPermissionChecker returns 
the singleton? Just a thought.
Not sure how to make this work. When does the thread-local variable get 
initialized, and when is it cleared, given that a thread gets reused for 
different current users?
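
To make the lifecycle question concrete, here is a minimal sketch of what a thread-local checker would have to do on every call. This is not HDFS code: `init()` and `getInstance()` follow the names in the quoted suggestion, while `Checker`, `clear()`, `isSet()`, and `handleCall()` are hypothetical names used only for illustration. The point is that a pooled handler thread serves many users, so each call must set the value on entry and remove it on exit.

```java
// Illustrative sketch only; none of these types are the real HDFS classes.
class ThreadLocalCheckerSketch {
    /** Hypothetical stand-in for a per-user permission checker. */
    static final class Checker {
        final String user;
        Checker(String user) { this.user = user; }
    }

    private static final ThreadLocal<Checker> CURRENT = new ThreadLocal<>();

    /** Would have to run at the start of every RPC call, before any lock is taken. */
    static void init(String user) {
        CURRENT.set(new Checker(user));
    }

    /** Returns the checker for the call in progress on this thread. */
    static Checker getInstance() {
        Checker c = CURRENT.get();
        if (c == null) {
            throw new IllegalStateException("no checker initialized for this call");
        }
        return c;
    }

    /** Would have to run at the end of every call, or the next user inherits the value. */
    static void clear() {
        CURRENT.remove();
    }

    static boolean isSet() {
        return CURRENT.get() != null;
    }

    /** Hypothetical RPC entry point showing the required set/clear bracket. */
    static String handleCall(String user) {
        init(user);
        try {
            return getInstance().user;
        } finally {
            clear();
        }
    }

    public static void main(String[] args) {
        // The same thread serves two different users in turn.
        System.out.println(handleCall("alice"));
        System.out.println(handleCall("bob"));
    }
}
```

Every entry point would need this bracket, and any path that skips `clear()` leaks one user's checker into the next call on that thread.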

bq. Another thought might be an option to tell a UGI to "lock-in" its group 
list. Something earlier on at a high level, maybe the NN's RPC server, could 
call UserGroupInformation.getCurrentUser().lockGroups().
Not sure I understood this.
                
> NN is unresponsive and lose heartbeats of DNs when Hadoop is configured to 
> use LDAP and LDAP has issues
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-4222
>                 URL: https://issues.apache.org/jira/browse/HDFS-4222
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 1.0.0, 0.23.3, 2.0.0-alpha
>            Reporter: Xiaobo Peng
>            Assignee: Xiaobo Peng
>            Priority: Minor
>         Attachments: hdfs-4222-branch-0.23.3.patch, HDFS-4222.patch, 
> hdfs-4222-release-1.0.3.patch
>
>
> For Hadoop clusters configured to access directory information by LDAP, the 
> FSNamesystem calls on behalf of DFS clients might hang due to LDAP issues 
> (including LDAP access issues caused by networking issues) while holding the 
> single lock of FSNamesystem. That will leave the NN unresponsive and cause 
> it to lose heartbeats from DNs.
> FSNamesystem calls access LDAP when they instantiate FSPermissionChecker. 
> The instantiation could be moved out of the lock scope, since it does not 
> need the FSNamesystem lock. After the move, a hung DFS client call will no 
> longer block other threads by holding the single lock. This is 
> especially helpful when we use separate RPC servers for ClientProtocol and 
> DatanodeProtocol since the calls for DatanodeProtocol do not need to access 
> LDAP. So even if DFS clients hang due to LDAP issues, the NN will still be 
> able to process the requests (including heartbeats) from DNs.
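
The change described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the attached patch: `LockScopeSketch`, `PermissionChecker`, `groupLookup`, and `holdsReadLock()` are hypothetical stand-ins, and the group lookup function plays the role of the potentially slow LDAP call. The checker is built before the lock is taken, so a stalled lookup delays only the calling thread.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

// Illustrative sketch only; none of these types are the real HDFS classes.
class LockScopeSketch {
    /** Hypothetical checker that resolves the caller's groups when constructed. */
    static final class PermissionChecker {
        final String user;
        final Set<String> groups;

        PermissionChecker(String user, Function<String, Set<String>> groupLookup) {
            this.user = user;
            // Stand-in for the group resolution that may block on LDAP.
            this.groups = groupLookup.apply(user);
        }

        boolean isSuperUser(String supergroup) {
            return groups.contains(supergroup);
        }
    }

    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    private final Function<String, Set<String>> groupLookup;

    LockScopeSketch(Function<String, Set<String>> groupLookup) {
        this.groupLookup = groupLookup;
    }

    /** True if the calling thread currently holds the namespace read lock. */
    boolean holdsReadLock() {
        return fsLock.getReadHoldCount() > 0;
    }

    boolean checkSuperuser(String user, String supergroup) {
        // Construct the checker first: the slow lookup runs with no lock held.
        PermissionChecker pc = new PermissionChecker(user, groupLookup);
        fsLock.readLock().lock();
        try {
            // Only in-memory work happens inside the lock.
            return pc.isSuperUser(supergroup);
        } finally {
            fsLock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        LockScopeSketch ns =
            new LockScopeSketch(user -> Collections.singleton("supergroup"));
        System.out.println(ns.checkSuperuser("alice", "supergroup"));
    }
}
```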

