[ 
https://issues.apache.org/jira/browse/HDFS-8873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903415#comment-14903415
 ] 

Daniel Templeton commented on HDFS-8873:
----------------------------------------

bq. Can we have a constant here for MS_PER_SEC? I think I commented on this 
earlier

I didn't do that in this patch because I didn't think the 1000 was as prominent 
as before, but it appears that was before I was done adding stuff.  I'll put it 
back.  I'd love to put that constant somewhere like util.Time.  Would that be 
kosher?  Or is it better to keep it low profile and leave it local to 
DirectoryScanner?  I notice there's already HdfsClientConfigKeys.SECOND, but 
using that would introduce a pointless dependency.  Maybe the best answer is to 
keep it local and file a JIRA to consolidate them under util.Time?

bq. Maybe say "throttle" instead of "run limit"?

I was shooting for something that would be meaningful to someone who doesn't 
know the code.  What about "throttle limit," since that echoes the config param?

bq. Does this need to be an object, or can it be a primitive?

Ha. Evolutionary mistake. I'll fix it.

bq. This logic seems flawed.

I don't follow.  (nowMs % 1000) has to be between 0 and 999.  If it's less than 
the throttle limit, we won't enter the loop.  The throttle limit must be 
between 1 and 1000.  (Anything else gets set to 1000 when the scanner is 
created.)  The sleep must therefore be for between 1 and 999 ms, pretty much 
guaranteeing a different result the next time around.
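
To make that argument concrete, here is a minimal standalone sketch of the 
logic as described above.  It is not the code from the patch: the class and 
method names (ThrottleSketch, clampLimit, sleepMs, throttle) are hypothetical, 
and the use of System.currentTimeMillis() as the clock is an assumption.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the throttle check; names are not from the patch. */
final class ThrottleSketch {
    // Local constant, as discussed above, instead of a bare 1000.
    static final long MS_PER_SEC = TimeUnit.SECONDS.toMillis(1);

    /** Any limit outside 1..1000 falls back to 1000, i.e. no throttling. */
    static long clampLimit(long configuredMsPerSec) {
        boolean valid = configuredMsPerSec >= 1 && configuredMsPerSec <= MS_PER_SEC;
        return valid ? configuredMsPerSec : MS_PER_SEC;
    }

    /**
     * Returns 0 when the offset into the current second is below the limit
     * (the scanner may run); otherwise the time left until the next second
     * boundary, which is always between 1 and 999 ms for a non-negative nowMs.
     */
    static long sleepMs(long nowMs, long limitMsPerSec) {
        long offset = nowMs % MS_PER_SEC;      // 0..999
        if (offset < limitMsPerSec) {
            return 0;                          // inside the run window
        }
        return MS_PER_SEC - offset;            // 1..(1000 - limit)
    }

    /** Blocks until the current time falls inside the run window. */
    static void throttle(long limitMsPerSec) throws InterruptedException {
        long wait;
        // Assumption: wall-clock time stands in for whatever clock the scanner uses.
        while ((wait = sleepMs(System.currentTimeMillis(), limitMsPerSec)) > 0) {
            Thread.sleep(wait);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long limit = clampLimit(250);          // run at most 250 ms of every second
        throttle(limit);
        System.out.println("inside run window, limit=" + limit);
    }
}
```

In this sketch the loop is only entered when the offset is at least the limit, 
and the limit is at least 1, so the computed sleep can never be 0 or 1000.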

> throttle directoryScanner
> -------------------------
>
>                 Key: HDFS-8873
>                 URL: https://issues.apache.org/jira/browse/HDFS-8873
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.7.1
>            Reporter: Nathan Roberts
>            Assignee: Daniel Templeton
>         Attachments: HDFS-8873.001.patch, HDFS-8873.002.patch, 
> HDFS-8873.003.patch, HDFS-8873.004.patch, HDFS-8873.005.patch
>
>
> The new 2-level directory layout can make directory scans expensive in terms 
> of disk seeks (see HDFS-8791 for details). 
> It would be good if the directoryScanner() had a configurable duty cycle that 
> would reduce its impact on disk performance (much like the approach in 
> HDFS-8617). 
> Without such a throttle, disks can go 100% busy for many minutes at a time 
> (assuming the common case of all inodes in cache but no directory blocks 
> cached, 64K seeks are required for a full directory listing, which translates 
> to 655 seconds). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
