[ https://issues.apache.org/jira/browse/HDFS-13136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16365193#comment-16365193 ]
Tsz Wo Nicholas Sze commented on HDFS-13136:
--------------------------------------------

Thanks for the update! +1 on the 002 patch.

> Avoid taking FSN lock while doing group member lookup for FSD permission check
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-13136
>                 URL: https://issues.apache.org/jira/browse/HDFS-13136
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Xiaoyu Yao
>            Assignee: Xiaoyu Yao
>            Priority: Major
>         Attachments: HDFS-13136.001.patch, HDFS-13136.002.patch
>
>
> The namenode has an FSN lock and an FSD lock. Most namenode operations
> take the FSN lock first and then the FSD lock. The permission check is done
> via FSPermissionChecker at the FSD layer, assuming the FSN lock is already held.
> The FSPermissionChecker constructor invokes callerUgi.getGroups(), which can
> sometimes take seconds. There are external cache schemes such as SSSD and an
> internal cache scheme for group lookup. However, the delay can still occur
> during a cache refresh, which causes severe FSN lock contention and an
> unresponsive namenode.
> Checking the current code, we found that getBlockLocations(..) does this
> correctly, but some methods such as getFileInfo(..) and getContentSummary(..)
> do not.
> This ticket was opened to ensure that the group lookup for the permission
> checker happens outside the FSN lock.
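
For readers unfamiliar with the issue, the locking pattern described in the ticket can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in the attached patches: the class GroupLookupOutsideLock, the PermissionChecker stand-in, lookupGroups(), and the two check methods are all hypothetical names, and a plain ReentrantReadWriteLock stands in for the FSN lock. The only point it shows is that the checker (and therefore the group lookup in its constructor) is built before the lock is taken.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class GroupLookupOutsideLock {
    // Stand-in for the namesystem-wide (FSN) lock; not the real HDFS lock class.
    static final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();

    // Records whether the most recent group lookup ran while the lock was held.
    static boolean lookupRanUnderLock = false;

    // Hypothetical stand-in for callerUgi.getGroups(); in a real cluster this
    // may block for seconds during a group cache refresh.
    static List<String> lookupGroups(String user) {
        lookupRanUnderLock = fsnLock.getReadHoldCount() > 0;
        return Arrays.asList("staff", "hdfs-users");
    }

    // Hypothetical stand-in for FSPermissionChecker: the constructor resolves
    // the caller's groups eagerly, as the ticket describes.
    static final class PermissionChecker {
        final List<String> groups;

        PermissionChecker(String user) {
            this.groups = lookupGroups(user);
        }

        boolean inGroup(String group) {
            return groups.contains(group);
        }
    }

    // Problematic shape: the checker is constructed under the lock, so a slow
    // group lookup stalls every other operation that needs the lock.
    static boolean checkInsideLock(String user, String group) {
        fsnLock.readLock().lock();
        try {
            PermissionChecker pc = new PermissionChecker(user);
            return pc.inGroup(group);
        } finally {
            fsnLock.readLock().unlock();
        }
    }

    // Preferred shape: construct the checker first, then take the lock only
    // for the part that reads namespace state.
    static boolean checkOutsideLock(String user, String group) {
        PermissionChecker pc = new PermissionChecker(user);
        fsnLock.readLock().lock();
        try {
            return pc.inGroup(group);
        } finally {
            fsnLock.readLock().unlock();
        }
    }
}
```

In this sketch, checkInsideLock leaves lookupRanUnderLock set to true and checkOutsideLock leaves it false, while both return the same permission result. The trade-off is that the group list is resolved slightly earlier than the namespace read, which is the same ordering the ticket says getBlockLocations(..) already uses.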