[ https://issues.apache.org/jira/browse/HDFS-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950863#comment-17950863 ]
William Montaz commented on HDFS-13904:
---------------------------------------

Hello, we also ran into this issue. It seems to happen on directories with very many direct children, because the lock is not yielded while listing the files inside a single directory; the yield only happens when descending into a subfolder. In other words, summary.yield() is only called from INodeDirectory and never from INodeFile, so if a directory contains a lot of files, the computation never yields the lock, leading to very long lock hold times.
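To make that observation concrete, here is a minimal model of the traversal shape being described. The names (Node, Context, yieldIfNeeded) are hypothetical stand-ins, not the actual INodeDirectory/INodeFile code; the only point is where the yield opportunity sits.

{code}
import java.util.List;

// Hypothetical model of the traversal described in the comment above; not Hadoop source.
class SummaryModel {
  record Node(String name, boolean isDir, List<Node> children) {}

  interface Context {
    void addFile();
    void addDirectory();
    void yieldIfNeeded(); // plays the role of ContentSummaryComputationContext.yield()
  }

  static void summarize(Node node, Context ctx) {
    if (!node.isDir()) {
      ctx.addFile();           // plain files only bump counters; no yield check on this path
      return;
    }
    for (Node child : node.children()) {
      summarize(child, ctx);   // yield opportunities only arise via subdirectories
    }
    ctx.addDirectory();
    ctx.yieldIfNeeded();       // reached once per directory, after all of its children
  }
}
{code}

In this model, a directory that directly contains millions of files is counted in full, under one read-lock acquisition, before the first yield check is ever reached, no matter how low {{dfs.content-summary.limit}} is set.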
> ContentSummary does not always respect processing limit, resulting in long
> lock acquisitions
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-13904
>                 URL: https://issues.apache.org/jira/browse/HDFS-13904
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, namenode
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>            Priority: Major
>
> HDFS-4995 added a config {{dfs.content-summary.limit}} which allows for an
> administrator to set a limit on the number of entries processed during a
> single acquisition of the {{FSNamesystemLock}} during the creation of a
> content summary. This is useful to prevent very long (multiple seconds)
> pauses on the NameNode when {{getContentSummary}} is called on large
> directories.
> However, even on versions with HDFS-4995, we have seen warnings like:
> {code}
> INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem read
> lock held for 9398 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:950)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.readUnlock(FSNamesystemLock.java:188)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.readUnlock(FSNamesystem.java:1486)
> org.apache.hadoop.hdfs.server.namenode.ContentSummaryComputationContext.yield(ContentSummaryComputationContext.java:109)
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeDirectoryContentSummary(INodeDirectory.java:679)
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeContentSummary(INodeDirectory.java:642)
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeDirectoryContentSummary(INodeDirectory.java:656)
> {code}
> happen quite consistently when {{getContentSummary}} was called on a large
> directory on a heavily-loaded NameNode. Such long pauses completely destroy
> the performance of the NameNode. We have the limit set to its default of
> 5000; if it was respected, clearly there would not be a 10-second pause.
> The current {{yield()}} code within {{ContentSummaryComputationContext}}
> looks like:
> {code}
>   public boolean yield() {
>     // Are we set up to do this?
>     if (limitPerRun <= 0 || dir == null || fsn == null) {
>       return false;
>     }
>     // Have we reached the limit?
>     long currentCount = counts.getFileCount() +
>         counts.getSymlinkCount() +
>         counts.getDirectoryCount() +
>         counts.getSnapshotableDirectoryCount();
>     if (currentCount <= nextCountLimit) {
>       return false;
>     }
>     // Update the next limit
>     nextCountLimit = currentCount + limitPerRun;
>     boolean hadDirReadLock = dir.hasReadLock();
>     boolean hadDirWriteLock = dir.hasWriteLock();
>     boolean hadFsnReadLock = fsn.hasReadLock();
>     boolean hadFsnWriteLock = fsn.hasWriteLock();
>     // sanity check.
>     if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock ||
>         hadFsnWriteLock || dir.getReadHoldCount() != 1 ||
>         fsn.getReadHoldCount() != 1) {
>       // cannot relinquish
>       return false;
>     }
>     // unlock
>     dir.readUnlock();
>     fsn.readUnlock("contentSummary");
>     try {
>       Thread.sleep(sleepMilliSec, sleepNanoSec);
>     } catch (InterruptedException ie) {
>     } finally {
>       // reacquire
>       fsn.readLock();
>       dir.readLock();
>     }
>     yieldCount++;
>     return true;
>   }
> {code}
> We believe that this check in particular is the culprit:
> {code}
>     if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock ||
>         hadFsnWriteLock || dir.getReadHoldCount() != 1 ||
>         fsn.getReadHoldCount() != 1) {
>       // cannot relinquish
>       return false;
>     }
> {code}
> The content summary computation will only relinquish the lock if it is
> currently the _only_ holder of the lock. Given the high volume of read
> requests on a heavily loaded NameNode, especially when unfair locking is
> enabled, it is likely there may be another holder of the read lock performing
> some short-lived operation. By refusing to give up the lock in this case, the
> content summary computation ends up never relinquishing the lock.
> We propose to simply remove the readHoldCount checks from this {{yield()}}.
> This should alleviate the case described above by giving up the read lock and
> allowing other short-lived operations to complete (while the content summary
> thread sleeps) so that the lock can finally be given up completely. This has
> the drawback that sometimes, the content summary may give up the lock
> unnecessarily, if the read lock is never actually released by the time the
> thread continues again. The only negative impact from this is to make some
> large content summary operations slightly slower, with the tradeoff of
> reducing NameNode-wide performance impact.
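For illustration only, this is how the sanity check would read with the readHoldCount conditions removed, i.e. one possible reading of the proposal above rather than a committed patch:

{code}
    // sanity check: still require that this thread holds both read locks and
    // neither write lock, but no longer require it to be the sole read-lock holder.
    if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock ||
        hadFsnWriteLock) {
      // cannot relinquish
      return false;
    }
{code}

The rest of {{yield()}} would be unchanged; as noted above, the cost is an occasional unnecessary unlock/sleep/relock cycle that only slows the content summary operation itself.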