[ https://issues.apache.org/jira/browse/HDFS-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16607627#comment-16607627 ]

Arpit Agarwal commented on HDFS-13904:
--------------------------------------

Nice find [~xkrogen]. Yes I think we can remove those non-deterministic checks.

> ContentSummary does not always respect processing limit, resulting in long 
> lock acquisitions
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-13904
>                 URL: https://issues.apache.org/jira/browse/HDFS-13904
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, namenode
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>            Priority: Major
>
> HDFS-4995 added a config {{dfs.content-summary.limit}} which allows an 
> administrator to set a limit on the number of entries processed during a 
> single acquisition of the {{FSNamesystemLock}} during the creation of a 
> content summary. This is useful to prevent very long (multi-second) 
> pauses on the NameNode when {{getContentSummary}} is called on large 
> directories.
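> As a point of reference, a minimal hdfs-site.xml stanza for this setting 
> might look like the following (the property name and its default of 5000 
> come from this issue; the snippet itself is illustrative):
> {code}
> <property>
>   <name>dfs.content-summary.limit</name>
>   <!-- Maximum entries processed per FSNamesystemLock acquisition;
>        5000 is the default. A value <= 0 disables yielding entirely. -->
>   <value>5000</value>
> </property>
> {code}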
> However, even on versions with HDFS-4995, we have seen warnings like:
> {code}
> INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem read lock held for 9398 ms via
> java.lang.Thread.getStackTrace(Thread.java:1552)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:950)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.readUnlock(FSNamesystemLock.java:188)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.readUnlock(FSNamesystem.java:1486)
> org.apache.hadoop.hdfs.server.namenode.ContentSummaryComputationContext.yield(ContentSummaryComputationContext.java:109)
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeDirectoryContentSummary(INodeDirectory.java:679)
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeContentSummary(INodeDirectory.java:642)
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.computeDirectoryContentSummary(INodeDirectory.java:656)
> {code}
> happen quite consistently when {{getContentSummary}} is called on a large 
> directory on a heavily loaded NameNode. Such long pauses completely destroy 
> the performance of the NameNode. We have the limit set to its default of 
> 5000; if it were respected, clearly there would not be a 10-second pause.
> The current {{yield()}} code within {{ContentSummaryComputationContext}} 
> looks like:
> {code}
>   public boolean yield() {
>     // Are we set up to do this?
>     if (limitPerRun <= 0 || dir == null || fsn == null) {
>       return false;
>     }
>     // Have we reached the limit?
>     long currentCount = counts.getFileCount() +
>         counts.getSymlinkCount() +
>         counts.getDirectoryCount() +
>         counts.getSnapshotableDirectoryCount();
>     if (currentCount <= nextCountLimit) {
>       return false;
>     }
>     // Update the next limit
>     nextCountLimit = currentCount + limitPerRun;
>     boolean hadDirReadLock = dir.hasReadLock();
>     boolean hadDirWriteLock = dir.hasWriteLock();
>     boolean hadFsnReadLock = fsn.hasReadLock();
>     boolean hadFsnWriteLock = fsn.hasWriteLock();
>     // sanity check.
>     if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock ||
>         hadFsnWriteLock || dir.getReadHoldCount() != 1 ||
>         fsn.getReadHoldCount() != 1) {
>       // cannot relinquish
>       return false;
>     }
>     // unlock
>     dir.readUnlock();
>     fsn.readUnlock("contentSummary");
>     try {
>       Thread.sleep(sleepMilliSec, sleepNanoSec);
>     } catch (InterruptedException ie) {
>     } finally {
>       // reacquire
>       fsn.readLock();
>       dir.readLock();
>     }
>     yieldCount++;
>     return true;
>   }
> {code}
> We believe that this check in particular is the culprit:
> {code}
>     if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock ||
>         hadFsnWriteLock || dir.getReadHoldCount() != 1 ||
>         fsn.getReadHoldCount() != 1) {
>       // cannot relinquish
>       return false;
>     }
> {code}
> The content summary computation will only relinquish the lock if it is 
> currently the _only_ holder of the lock. Given the high volume of read 
> requests on a heavily loaded NameNode, especially when unfair locking is 
> enabled, it is likely that some other holder of the read lock is performing 
> a short-lived operation at any given moment. By refusing to give up the lock 
> in this case, the content summary computation ends up never relinquishing it.
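> As a standalone illustration (plain {{java.util.concurrent}}, not HDFS 
> code) of why waiting to be the sole reader can stall indefinitely: the 
> read half of a {{ReentrantReadWriteLock}} is shared, so overlapping 
> short-lived readers keep the lock occupied.
> {code}
> import java.util.concurrent.locks.ReentrantReadWriteLock;
> 
> public class SharedReadLockDemo {
>   public static void main(String[] args) throws InterruptedException {
>     // Unfair mode, matching the scenario described above.
>     final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(false);
> 
>     lock.readLock().lock(); // long-running reader, e.g. a content summary
>     Thread shortOp = new Thread(() -> {
>       lock.readLock().lock(); // a second reader is admitted concurrently
>       try {
>         Thread.sleep(100);    // some short-lived read operation
>       } catch (InterruptedException ignored) {
>       } finally {
>         lock.readLock().unlock();
>       }
>     });
>     shortOp.start();
>     Thread.sleep(50); // give the second reader time to acquire
> 
>     // Read locks are shared: both threads hold the lock at once, so a
>     // check that insists on being the sole reader can fail indefinitely
>     // on a busy lock.
>     System.out.println("concurrent read holds: " + lock.getReadLockCount());
>     lock.readLock().unlock();
>     shortOp.join();
>   }
> }
> {code}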
> We propose to simply remove the readHoldCount checks from this {{yield()}}. 
> This should alleviate the case described above by giving up the read lock and 
> allowing other short-lived operations to complete (while the content summary 
> thread sleeps) so that the lock can finally be released completely. The 
> drawback is that the content summary may occasionally give up the lock 
> unnecessarily, if the read lock is never actually fully released by the time 
> the thread resumes. The only negative impact of this is to make some large 
> content summary operations slightly slower, with the tradeoff of reducing the 
> NameNode-wide performance impact.
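> For concreteness, the {{yield()}} sanity check with the readHoldCount 
> conditions removed would keep only the lock-type assertions (a sketch of 
> the proposal, not a final patch):
> {code}
>     if (!hadDirReadLock || !hadFsnReadLock || hadDirWriteLock ||
>         hadFsnWriteLock) {
>       // cannot relinquish: need read locks on both, and no write locks
>       return false;
>     }
> {code}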


