[ 
https://issues.apache.org/jira/browse/HDFS-8873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14905222#comment-14905222
 ] 

Daniel Templeton commented on HDFS-8873:
----------------------------------------

bq. Unlike the modulo solution, it does not have a pathological case where we 
never get to run at all despite a non-zero throttle rate.

I don't see the pathological case.  Assume the throttle is 1ms and we always 
oversleep:

0.000 - thread calls throttle(), no block
-- thread runs --
0.422 - thread calls throttle(), sleep for 578ms
1.999 - thread wakes up from oversleep, run limit this second set to 1000ms
-- thread runs (999 < 1000) --
2.190 - thread calls throttle(), sleep for 810ms
3.106 - thread wakes up from oversleep, run limit this second set to 107ms
-- thread runs (106 < 107) --
etc.

The throttle() method is guaranteed to exit when at least (1000 - limit) ms 
have passed and the calling thread is scheduled again.  What am I missing?

As far as I can tell, the difference between the StopWatch approach and the 
modulo approach is the case when a thread wakes up within (1000 - limit) of the 
end of the second.  In that case, the modulo approach will allow the thread to 
run longer than the StopWatch approach would.  In other words, the modulo 
approach is focused on trying to hold to the per-second duty cycle, and the 
StopWatch approach is trying to hold to the duty cycle regardless of second 
boundaries.  Given that neither the sleep time nor run time is reliable, I 
don't see where it matters much one way or the other.
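
For illustration only, the two ideas can be sketched as pure functions. This is 
not the code from any of the attached patches: the class and method names are 
hypothetical, and the rule for the run limit after an oversleep is an 
assumption inferred from the timeline above (wake-up position within the second 
plus the configured limit, capped at 1000ms).

```java
// Hypothetical sketch; none of these names come from the HDFS-8873 patches.
final class ThrottleSketch {
    static final long SECOND_MS = 1000L;

    // Modulo idea: compare the position inside the current wall-clock second
    // with the run limit; once past it, sleep until the next second boundary.
    static long moduloSleepMs(long nowMs, long runLimitMs) {
        long pos = nowMs % SECOND_MS;
        return pos < runLimitMs ? 0L : SECOND_MS - pos;
    }

    // Assumed rule: after an oversleep, the run limit for the current second
    // is the wake-up position plus the base limit, capped at one second.
    static long moduloLimitAfterWakeMs(long wakeMs, long baseLimitMs) {
        return Math.min(SECOND_MS, wakeMs % SECOND_MS + baseLimitMs);
    }

    // Stopwatch idea: only the time run since the last sleep matters; second
    // boundaries are ignored.
    static long stopWatchSleepMs(long ranMs, long baseLimitMs) {
        return ranMs < baseLimitMs ? 0L : SECOND_MS - baseLimitMs;
    }

    public static void main(String[] args) {
        long base = 1L; // throttle limit of 1ms per second, as in the timeline
        System.out.println(moduloSleepMs(2190L, base));          // 810
        System.out.println(moduloLimitAfterWakeMs(1999L, base)); // 1000
        System.out.println(moduloLimitAfterWakeMs(3106L, base)); // 107
        System.out.println(stopWatchSleepMs(1L, base));          // 999
    }
}
```

Under these assumptions, a thread that wakes at 999ms into a second may keep 
running up to the second boundary with the modulo rule, while the stopwatch 
rule would stop it after the base limit of run time wherever the boundary falls.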

That said, I did just notice that this patch is broken in the case of spurious 
wake-ups.  I'll have a new patch for that (including tests) shortly.
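
As a rough sketch of the usual guard against spurious wake-ups, and not the fix 
that will go into the patch: assuming the throttle blocks with 
Object.wait(long), the wait can be wrapped in a loop that re-checks a monotonic 
deadline. The class and method names below are hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper; it assumes the throttle blocks with Object.wait(long).
final class DeadlineWait {
    // Blocks on the lock until at least waitMs have elapsed, even if wait()
    // returns early. Returns the elapsed time in milliseconds.
    static long waitAtLeast(Object lock, long waitMs) throws InterruptedException {
        long start = System.nanoTime();
        long deadline = start + TimeUnit.MILLISECONDS.toNanos(waitMs);
        synchronized (lock) {
            long remaining;
            // Re-check the deadline after every wake-up, spurious or not.
            while ((remaining = deadline - System.nanoTime()) > 0) {
                // wait(0) would block indefinitely, so wait for at least 1ms.
                lock.wait(Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remaining)));
            }
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

The deadline is computed from System.nanoTime() so that wall-clock adjustments 
do not shorten or lengthen the wait.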

bq. I'm confused about this code...

Good catch.  Copy-paste error, grabbed the wrong key.

> throttle directoryScanner
> -------------------------
>
>                 Key: HDFS-8873
>                 URL: https://issues.apache.org/jira/browse/HDFS-8873
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.7.1
>            Reporter: Nathan Roberts
>            Assignee: Daniel Templeton
>         Attachments: HDFS-8873.001.patch, HDFS-8873.002.patch, 
> HDFS-8873.003.patch, HDFS-8873.004.patch, HDFS-8873.005.patch, 
> HDFS-8873.006.patch
>
>
> The new 2-level directory layout can make directory scans expensive in terms 
> of disk seeks (see HDFS-8791 for details). 
> It would be good if the directoryScanner() had a configurable duty cycle that 
> would reduce its impact on disk performance (much like the approach in 
> HDFS-8617). 
> Without such a throttle, disks can go 100% busy for many minutes at a time 
> (assuming the common case of all inodes in cache but no directory blocks 
> cached, a full directory listing requires 64K seeks, which translates to 
> 655 seconds). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
