Xiaoyu Yao created HDFS-13136:
---------------------------------

             Summary: Avoid taking FSN lock while doing group member lookup for 
FSD permission check
                 Key: HDFS-13136
                 URL: https://issues.apache.org/jira/browse/HDFS-13136
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
            Reporter: Xiaoyu Yao
            Assignee: Xiaoyu Yao


The namenode has two locks: the FSN (FSNamesystem) lock and the FSD 
(FSDirectory) lock. Most namenode operations take the FSN lock first and then 
the FSD lock. The permission check is done by FSPermissionChecker at the FSD 
layer, on the assumption that the FSN lock is already held.

The FSPermissionChecker constructor invokes callerUgi.getGroups(), which can 
sometimes take seconds. There are external caching schemes such as SSSD and an 
internal caching scheme for group lookup. However, the delay can still occur 
during a cache refresh, and when it occurs while the FSN lock is held it causes 
severe FSN lock contention and an unresponsive namenode.

Checking the current code, we found that getBlockLocations(..) constructs the 
permission checker, and therefore performs the group lookup, before taking the 
FSN lock, but some methods such as getFileInfo(..) and getContentSummary(..) do 
so while holding it. This ticket is opened to ensure that the group lookup for 
the permission checker happens outside the FSN lock.
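
The ordering difference can be illustrated with a minimal standalone sketch. 
This is not the namenode code: LockOrderSketch, PermissionChecker, 
lookupInsideLock and lookupOutsideLock are hypothetical names, a plain 
ReentrantReadWriteLock stands in for the FSN lock, and a Supplier stands in 
for callerUgi.getGroups(). It only shows where the potentially slow group 
lookup lands relative to the lock.

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical illustration only; none of these names exist in Hadoop.
final class LockOrderSketch {
    // Stand-in for the FSN lock (assumption: a plain read/write lock).
    static final ReentrantReadWriteLock FSN_LOCK = new ReentrantReadWriteLock();

    // Stand-in for a permission checker that resolves groups in its constructor.
    static final class PermissionChecker {
        final String user;
        final List<String> groups;
        // Records whether the constructing thread held the lock during the lookup.
        final boolean builtUnderLock;

        PermissionChecker(String user, Supplier<List<String>> groupLookup) {
            this.user = user;
            this.builtUnderLock = FSN_LOCK.getReadHoldCount() > 0
                    || FSN_LOCK.isWriteLockedByCurrentThread();
            // Potentially slow call (e.g. during a group cache refresh).
            this.groups = groupLookup.get();
        }
    }

    // Problematic ordering: the group lookup runs while the lock is held,
    // so a slow lookup stalls every other thread waiting on the lock.
    static PermissionChecker lookupInsideLock(String user,
                                              Supplier<List<String>> groupLookup) {
        FSN_LOCK.readLock().lock();
        try {
            return new PermissionChecker(user, groupLookup);
        } finally {
            FSN_LOCK.readLock().unlock();
        }
    }

    // Intended ordering: resolve groups first, then take the lock only for
    // the work that actually needs the namespace.
    static PermissionChecker lookupOutsideLock(String user,
                                               Supplier<List<String>> groupLookup) {
        PermissionChecker pc = new PermissionChecker(user, groupLookup);
        FSN_LOCK.readLock().lock();
        try {
            // The permission check and namespace read would happen here.
            return pc;
        } finally {
            FSN_LOCK.readLock().unlock();
        }
    }
}
```

In the second variant a slow group lookup delays only the calling thread, 
since no lock is held while it runs.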
 




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
