[ 
https://issues.apache.org/jira/browse/HDFS-8873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903507#comment-14903507
 ] 

Colin Patrick McCabe commented on HDFS-8873:
--------------------------------------------

bq. I didn't do that in this patch because I didn't think the 1000 was as 
prominent as before, but it appears that was before I was done adding stuff. 
I'll put it back. I'd love to put that constant somewhere like util.Time. Would 
that be kosher?

I think it's fine to put it in the DirectoryScanner itself if you want.  I 
don't object to putting it in util.Time either.  Up to you.

bq. I was shooting for something that would be meaningful to someone who 
doesn't know the code. What about "throttle limit," since that echoes the 
config param?

Sure.

bq. I don't follow. (nowMs % 1000) has to be between 0 and 999. If it's less 
than the throttle limit, we won't enter the loop. The throttle limit must be 
between 1 and 1000. (Anything else gets set to 1000 when the scanner is 
created.) The sleep must therefore be for between 1 and 999 ms, pretty much 
guaranteeing a different result the next time around.

Let's say we start the loop at time 5200, with {{throttleLimitMsPerSec = 
100}}.  Then {{while (nowMs % 1000L > throttleLimitMsPerSec)}} evaluates to 
true, since 5200 % 1000 = 200 > 100.

We call sleep with an argument of 800, but sleep actually sleeps for 1000 ms 
instead.  (Remember, Thread#sleep may always sleep for longer than requested.)  
nowMs becomes 6200.  Now {{while (nowMs % 1000L > throttleLimitMsPerSec)}} 
evaluates to true again, since 6200 % 1000 = 200 > 100, so we sleep for 
another 800 ms.  We completely missed our timeslice, and there's no guarantee 
that we'll pick up the next one either.  That's the bug.
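
To make the scenario concrete, here is a minimal simulation of the loop shape 
described above.  It is a sketch, not the patch's code: the class and method 
names, the fake clock, the sleep model, and the way the sleep argument is 
computed (time remaining until the next second boundary) are all assumptions 
made for illustration.

```java
import java.util.function.LongUnaryOperator;

// Hypothetical sketch: simulates the wait loop with a fake clock so that
// sleep overshoot can be modeled deterministically. Not the patch's code.
final class ThrottleSketch {
    static final long MS_PER_SEC = 1000L;

    /**
     * Runs the loop "while (nowMs % 1000 > limit) sleep(...)" against a
     * simulated clock. actualSleep maps the requested sleep length to the
     * length actually slept. Returns the number of sleeps taken before the
     * loop exits, or -1 if it is still looping after maxSleeps sleeps.
     */
    static int sleepsUntilInWindow(long startMs, long limitMsPerSec,
                                   LongUnaryOperator actualSleep,
                                   int maxSleeps) {
        long nowMs = startMs;
        int sleeps = 0;
        while (nowMs % MS_PER_SEC > limitMsPerSec) {
            if (sleeps == maxSleeps) {
                return -1; // gave up: never landed inside the window
            }
            // Assumption: sleep until the next second boundary.
            long requested = MS_PER_SEC - (nowMs % MS_PER_SEC);
            nowMs += actualSleep.applyAsLong(requested);
            sleeps++;
        }
        return sleeps;
    }

    public static void main(String[] args) {
        // Sleep returns exactly on time: 5200 -> 6000, loop exits.
        int exact = sleepsUntilInWindow(5200L, 100L, r -> r, 10);
        // Sleep overshoots by 200 ms each time: 5200 -> 6200 -> 7200 -> ...
        int late = sleepsUntilInWindow(5200L, 100L, r -> r + 200L, 10);
        System.out.println("exact sleep: " + exact);
        System.out.println("overshooting sleep: " + late);
    }
}
```

With these inputs the exact-sleep case exits after one sleep, while the 
overshooting case lands at 200 ms past the second every time and is still 
looping when the cap is reached.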

> throttle directoryScanner
> -------------------------
>
>                 Key: HDFS-8873
>                 URL: https://issues.apache.org/jira/browse/HDFS-8873
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.7.1
>            Reporter: Nathan Roberts
>            Assignee: Daniel Templeton
>         Attachments: HDFS-8873.001.patch, HDFS-8873.002.patch, 
> HDFS-8873.003.patch, HDFS-8873.004.patch, HDFS-8873.005.patch
>
>
> The new 2-level directory layout can make directory scans expensive in terms 
> of disk seeks (see HDFS-8791 for details). 
> It would be good if the directoryScanner() had a configurable duty cycle that 
> would reduce its impact on disk performance (much like the approach in 
> HDFS-8617). 
> Without such a throttle, disks can go 100% busy for many minutes at a time 
> (assuming the common case of all inodes in cache but no directory blocks 
> cached, 64K seeks are required for full directory listing which translates to 
> 655 seconds) 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
