[jira] [Commented] (HDFS-6784) Avoid rescan twice in HDFS CacheReplicationMonitor for one FS Op if it calls setNeedsRescan multiple times.

Colin Patrick McCabe (JIRA) Thu, 31 Jul 2014 11:50:31 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-6784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081264#comment-14081264
 ]


Colin Patrick McCabe commented on HDFS-6784:
--------------------------------------------

bq. If modifyDirective is calling setNeedsRescan multiple times, each time could 
trigger a rescan that completes before the next time.

Sorry, my statement above wasn't quite correct.  Since the rescan needs the FSN 
lock, and the modifyDirective is holding it, the rescan can't complete until 
the modifyDirective is done.

Earlier versions of the {{CRM#rescan}} code did release the FSN lock 
intermittently to allow for greater concurrency.  We changed it to hold the FSN 
lock for the full duration for simplicity.  But we want to start releasing the 
lock inside that function in the future, as I've commented earlier.  So that's 
why I don't like this patch... it relies on the FSN lock being held for the 
whole duration of the rescan.
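
To make the lock dependency concrete, here is a rough sketch, not the actual CRM or FSNamesystem code. {{FsnLockSketch}}, {{fsnLock}} and {{monitorCanStartRescan}} are hypothetical names, and a plain {{ReentrantLock}} stands in for the FSN lock. It only shows that a monitor thread cannot enter the rescan while the operation still holds the lock, which is the property the patch depends on:

```java
import java.util.concurrent.locks.ReentrantLock;

public class FsnLockSketch {
    // Hypothetical stand-in for the FSN lock; the real lock is not modeled here.
    private final ReentrantLock fsnLock = new ReentrantLock();

    /** Returns true if a separate "monitor" thread could take the lock right now. */
    boolean monitorCanStartRescan() {
        final boolean[] acquired = new boolean[1];
        Thread monitor = new Thread(() -> {
            // tryLock() fails immediately if another thread holds the lock.
            if (fsnLock.tryLock()) {
                try {
                    acquired[0] = true; // a rescan would run here
                } finally {
                    fsnLock.unlock();
                }
            }
        });
        monitor.start();
        try {
            monitor.join(); // join() makes the monitor's write visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return acquired[0];
    }

    /** Returns { canStartWhileOpHoldsLock, canStartAfterOpReleasesLock }. */
    public static boolean[] demo() {
        FsnLockSketch s = new FsnLockSketch();
        boolean whileHeld;
        s.fsnLock.lock(); // models modifyDirective holding the lock
        try {
            whileHeld = s.monitorCanStartRescan();
        } finally {
            s.fsnLock.unlock();
        }
        boolean afterRelease = s.monitorCanStartRescan();
        return new boolean[] { whileHeld, afterRelease };
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("rescan can start while op holds lock: " + r[0]);
        System.out.println("rescan can start after op releases lock: " + r[1]);
    }
}
```

If the rescan released the lock partway through, the first result would no longer tell you anything about when the rescan finishes relative to the operation.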

> Avoid rescan twice in HDFS CacheReplicationMonitor for one FS Op if it calls 
> setNeedsRescan multiple times.
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-6784
>                 URL: https://issues.apache.org/jira/browse/HDFS-6784
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: caching
>    Affects Versions: 3.0.0
>            Reporter: Yi Liu
>            Assignee: Yi Liu
>         Attachments: HDFS-6784.001.patch
>
>
> In HDFS CacheReplicationMonitor, a rescan is expensive. Sometimes 
> {{setNeedsRescan}} is called multiple times for one operation; for example, 
> FSNamesystem#modifyCacheDirective calls it 3 times. In the monitor thread of 
> CacheReplicationMonitor, if {{needsRescan}} is true, a rescan happens, but 
> {{needsRescan}} is set to false before the real scan starts. Meanwhile, the 
> 2nd or 3rd call to {{setNeedsRescan}} may set {{needsRescan}} back to true. So 
> after the scan finishes, the next loop iteration triggers a new rescan, which 
> is unnecessary and inefficient. 
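
The sequence described in the issue can be modeled with a small single-threaded simulation. This is only a sketch under assumptions: {{setNeedsRescan}} and {{needsRescan}} are the names used in the issue, while {{RescanFlagSketch}}, {{monitorIteration}} and {{simulate}} are hypothetical, and the real monitor thread, locking and scan are not modeled. Calls that land after the flag is cleared are passed in as a callback:

```java
public class RescanFlagSketch {
    private boolean needsRescan = false;
    private int rescanCount = 0;

    void setNeedsRescan() {
        needsRescan = true;
    }

    /**
     * One iteration of a hypothetical monitor loop. 'afterClear' models
     * setNeedsRescan calls that arrive after the flag was reset.
     */
    void monitorIteration(Runnable afterClear) {
        if (!needsRescan) {
            return; // nothing to do in this iteration
        }
        needsRescan = false; // flag is cleared before the real scan
        afterClear.run();    // later calls from the same operation land here
        rescanCount++;       // the (expensive) rescan itself
    }

    /** Returns how many rescans one operation ends up causing. */
    static int simulate(int callsBeforeCheck, final int callsAfterClear) {
        final RescanFlagSketch crm = new RescanFlagSketch();
        for (int i = 0; i < callsBeforeCheck; i++) {
            crm.setNeedsRescan();
        }
        // First loop iteration, with the remaining calls interleaved.
        crm.monitorIteration(() -> {
            for (int i = 0; i < callsAfterClear; i++) {
                crm.setNeedsRescan();
            }
        });
        // Next loop iteration, with no new calls.
        crm.monitorIteration(() -> { });
        return crm.rescanCount;
    }

    public static void main(String[] args) {
        System.out.println(simulate(1, 2)); // 3 calls split around the reset: 2 rescans
        System.out.println(simulate(3, 0)); // all 3 calls before the check: 1 rescan
    }
}
```

The second rescan only appears when at least one call arrives after the flag has been reset.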



--
This message was sent by Atlassian JIRA
(v6.2#6252)
