[ https://issues.apache.org/jira/browse/HDFS-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950863#comment-17950863 ]
William Montaz commented on HDFS-13904:
---------------------------------------

Hello, we also ran into this issue. It seems to happen on directories with very many direct children, because the lock is not yielded while listing the files inside a single directory; the yield only happens when descending into a subfolder. In other words, summary.yield() is only called from INodeDirectory and never from INodeFile, so if a directory contains a lot of files, the computation never yields the lock, leading to very long lock hold times.
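To make that observation concrete, here is a minimal model of the traversal shape being described. The names (Node, Context, yieldIfNeeded) are hypothetical stand-ins, not the actual INodeDirectory/INodeFile code; the only point is where the yield opportunity sits.

{code}
import java.util.List;

// Hypothetical model of the traversal described in the comment above; not Hadoop source.
class SummaryModel {
  record Node(String name, boolean isDir, List<Node> children) {}

  interface Context {
    void addFile();
    void addDirectory();
    void yieldIfNeeded(); // plays the role of ContentSummaryComputationContext.yield()
  }

  static void summarize(Node node, Context ctx) {
    if (!node.isDir()) {
      ctx.addFile();           // plain files only bump counters; no yield check on this path
      return;
    }
    for (Node child : node.children()) {
      summarize(child, ctx);   // yield opportunities only arise via subdirectories
    }
    ctx.addDirectory();
    ctx.yieldIfNeeded();       // reached once per directory, after all of its children
  }
}
{code}

In this model, a directory that directly contains millions of files is counted in full, under one read-lock acquisition, before the first yield check is ever reached, no matter how low {{dfs.content-summary.limit}} is set.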
> ContentSummary does not always respect processing limit, resulting in long
> lock acquisitions
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-13904
>                 URL: https://issues.apache.org/jira/browse/HDFS-13904
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, namenode
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>            Priority: Major
>
> HDFS-4995 added a config {{dfs.content-summary.limit}} which allows for an
> administrator to set a limit on the number of entries processed during a
> single acquisition of the {{FSNamesystemLock}} during the creation of a
> content summary. This is useful to prevent very long (multiple seconds)
> pauses on the NameNode when {{getContentSummary}} is called on large
> directories.
> However, even on versions with HDFS-4995, we have seen warnings like:
> {code}
> INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem read
> lock held for 9398 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:950)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.readUnlock(FSNamesystemLock.java:188)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.readUnlock(FSNamesystem.java:1486)
> org.apache.hadoop.hdfs.server.namenode.ContentSummaryComputationContext.yield(ContentSummaryComputationContext.java:109)
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeDirectoryContentSummary(INodeDirectory.java:679)
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeContentSummary(INodeDirectory.java:642)
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeDirectoryContentSummary(INodeDirectory.java:656)
> {code}
> happen quite consistently when {{getContentSummary}} was called on a large
> directory on a heavily-loaded NameNode. Such long pauses completely destroy
> the performance of the NameNode. We have the limit set to its default of
> 5000; if it was respected, clearly there would not be a 10-second pause.
> The current {{yield()}} code within {{ContentSummaryComputationContext}}
> looks like:
> {code}
>   public boolean yield() {
>     // Are we set up to do this?
>     if (limitPerRun <= 0 || dir == null || fsn == null) {
>       return false;
>     }
>     // Have we reached the limit?
>     long currentCount = counts.getFileCount() +
>         counts.getSymlinkCount() +
>         counts.getDirectoryCount() +
>         counts.getSnapshotableDirectoryCount();
>     if (currentCount <= nextCountLimit) {
>       return false;
>     }
>     // Update the next limit
>     nextCountLimit = currentCount + limitPerRun;
>     boolean hadDirReadLock = dir.hasReadLock();
>     boolean hadDirWriteLock = dir.hasWriteLock();
>     boolean hadFsnReadLock = fsn.hasReadLock();
>     boolean hadFsnWriteLock = fsn.hasWriteLock();
>     // sanity check.
>     if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock ||
>         hadFsnWriteLock || dir.getReadHoldCount() != 1 ||
>         fsn.getReadHoldCount() != 1) {
>       // cannot relinquish
>       return false;
>     }
>     // unlock
>     dir.readUnlock();
>     fsn.readUnlock("contentSummary");
>     try {
>       Thread.sleep(sleepMilliSec, sleepNanoSec);
>     } catch (InterruptedException ie) {
>     } finally {
>       // reacquire
>       fsn.readLock();
>       dir.readLock();
>     }
>     yieldCount++;
>     return true;
>   }
> {code}
> We believe that this check in particular is the culprit:
> {code}
>     if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock ||
>         hadFsnWriteLock || dir.getReadHoldCount() != 1 ||
>         fsn.getReadHoldCount() != 1) {
>       // cannot relinquish
>       return false;
>     }
> {code}
> The content summary computation will only relinquish the lock if it is
> currently the _only_ holder of the lock. Given the high volume of read
> requests on a heavily loaded NameNode, especially when unfair locking is
> enabled, it is likely there may be another holder of the read lock performing
> some short-lived operation. By refusing to give up the lock in this case, the
> content summary computation ends up never relinquishing the lock.
> We propose to simply remove the readHoldCount checks from this {{yield()}}.
> This should alleviate the case described above by giving up the read lock and
> allowing other short-lived operations to complete (while the content summary
> thread sleeps) so that the lock can finally be given up completely. This has
> the drawback that sometimes, the content summary may give up the lock
> unnecessarily, if the read lock is never actually released by the time the
> thread continues again. The only negative impact from this is to make some
> large content summary operations slightly slower, with the tradeoff of
> reducing NameNode-wide performance impact.
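For illustration only, this is how the sanity check would read with the readHoldCount conditions removed, i.e. one possible reading of the proposal above rather than a committed patch:

{code}
    // sanity check: still require that this thread holds both read locks and
    // neither write lock, but no longer require it to be the sole read-lock holder.
    if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock ||
        hadFsnWriteLock) {
      // cannot relinquish
      return false;
    }
{code}

The rest of {{yield()}} would be unchanged; as noted above, the cost is an occasional unnecessary unlock/sleep/relock cycle that only slows the content summary operation itself.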