[ https://issues.apache.org/jira/browse/HDFS-8873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906603#comment-14906603 ]

Nathan Roberts commented on HDFS-8873:
--------------------------------------

Thanks [~templedf]. I like that the stopwatch class makes this much cleaner. 
Just a couple of comments:
- Shouldn't the isInterrupted() check throw an InterruptedException? Otherwise, 
won't we only break out of one loop level? It would probably be good to test 
shutdown on an actual cluster if possible because, as you point out, we could be 
in here a long time, and we need to make sure this doesn't delay shutdown of the 
datanode. This has been a problem in the past and can have a serious impact on 
rolling upgrades.
- Nit: I find markRunning() and markWaiting() confusing. They seem backwards to 
me, because we call markRunning() just before going to sleep.
- I'm wondering whether we should disallow extremely low duty cycles. It seems 
a scan could take close to 24 hours with the minimum setting. A minimum of 20% 
should keep us within an hour.
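A minimal sketch of the first suggestion, assuming a throttle that lets the 
scanner run for a configurable number of milliseconds per second and then sleeps 
out the rest of that second. The class ScanThrottle, its checkpoint() method and 
the scan() helper are hypothetical names, not the API in the HDFS-8873 patches, 
and System.nanoTime() stands in for the stopwatch class. Throwing 
InterruptedException, instead of breaking out of the current loop, unwinds every 
nesting level of the directory walk at once, so a datanode shutdown is not held 
up by the outer loop.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical duty-cycle throttle for a directory scan. Names are
 * illustrative only; they are not the API in the HDFS-8873 patches.
 */
public class ScanThrottle {
    private static final long PERIOD_NANOS = TimeUnit.SECONDS.toNanos(1);

    private final long runLimitNanos; // run budget per one-second period
    private long periodStartNanos;    // start of the current period

    public ScanThrottle(long runMsPerSec) {
        if (runMsPerSec < 1 || runMsPerSec > 1000) {
            throw new IllegalArgumentException("runMsPerSec must be in [1, 1000]");
        }
        this.runLimitNanos = TimeUnit.MILLISECONDS.toNanos(runMsPerSec);
        this.periodStartNanos = System.nanoTime();
    }

    /**
     * Call between units of scan work. Throws if the thread was interrupted,
     * so every enclosing loop unwinds rather than just the innermost one.
     * Once the run budget is spent, sleeps out the rest of the period.
     */
    public void checkpoint() throws InterruptedException {
        // Thread.interrupted() clears the flag, matching the usual contract
        // that the interrupt is now reported through the exception.
        if (Thread.interrupted()) {
            throw new InterruptedException("directory scan interrupted");
        }
        long elapsed = System.nanoTime() - periodStartNanos;
        if (elapsed >= runLimitNanos) {
            long remaining = PERIOD_NANOS - elapsed;
            if (remaining > 0) {
                TimeUnit.NANOSECONDS.sleep(remaining); // also throws on interrupt
            }
            periodStartNanos = System.nanoTime();
        }
    }

    /** Walks a hypothetical 2-level layout of outer x inner directories. */
    public static int scan(ScanThrottle throttle, int outer, int inner)
            throws InterruptedException {
        int visited = 0;
        for (int i = 0; i < outer; i++) {
            for (int j = 0; j < inner; j++) {
                throttle.checkpoint();
                visited++; // stand-in for listing one directory
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        try {
            // 1000 ms/s is a 100% duty cycle, so this run never sleeps.
            System.out.println("visited " + scan(new ScanThrottle(1000), 4, 4));
            Thread.currentThread().interrupt(); // simulate datanode shutdown
            scan(new ScanThrottle(1000), 4, 4);
        } catch (InterruptedException e) {
            System.out.println("scan aborted: " + e.getMessage());
        }
    }
}
```

Because the exception propagates, the top-level caller decides once how to clean 
up, rather than each loop level re-checking the interrupt flag.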

> throttle directoryScanner
> -------------------------
>
>                 Key: HDFS-8873
>                 URL: https://issues.apache.org/jira/browse/HDFS-8873
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.7.1
>            Reporter: Nathan Roberts
>            Assignee: Daniel Templeton
>         Attachments: HDFS-8873.001.patch, HDFS-8873.002.patch, 
> HDFS-8873.003.patch, HDFS-8873.004.patch, HDFS-8873.005.patch, 
> HDFS-8873.006.patch, HDFS-8873.007.patch, HDFS-8873.008.patch
>
>
> The new 2-level directory layout can make directory scans expensive in terms 
> of disk seeks (see HDFS-8791 for details). 
> It would be good if the directoryScanner() had a configurable duty cycle that 
> would reduce its impact on disk performance (much like the approach in 
> HDFS-8617). 
> Without such a throttle, disks can go 100% busy for many minutes at a time. 
> (Assuming the common case of all inodes in cache but no directory blocks 
> cached, a full directory listing requires 64K seeks, which translates to 
> about 655 seconds.)
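The 655-second figure corresponds to roughly 10 ms per seek (65,536 x 10 ms is 
about 655 s); that per-seek cost is inferred from the numbers above, not stated 
in the issue. A minimal back-of-envelope sketch, assuming a throttled scan's 
wall-clock time is simply the disk-busy time divided by the duty cycle (ignoring 
competing I/O), shows why the 20% floor suggested in the comment keeps a full 
scan under an hour. The class and method names are hypothetical.

```java
/** Back-of-envelope scan-time estimate; the 10 ms seek time is inferred, not measured. */
public class ScanEstimate {
    /** Unthrottled time: one seek per uncached directory block. */
    public static double fullScanSeconds(long seeks, double msPerSeek) {
        return seeks * msPerSeek / 1000.0;
    }

    /** Throttled time: disk-busy time stretched by a duty cycle in (0, 1]. */
    public static double throttledSeconds(double busySeconds, double dutyCycle) {
        if (dutyCycle <= 0 || dutyCycle > 1) {
            throw new IllegalArgumentException("dutyCycle must be in (0, 1]");
        }
        return busySeconds / dutyCycle;
    }

    public static void main(String[] args) {
        // 64K seeks, assuming ~10 ms each (inferred from the 655 s figure).
        double busy = fullScanSeconds(64 * 1024, 10.0);
        System.out.printf("unthrottled: %.0f s%n", busy);
        for (double duty : new double[] {1.0, 0.5, 0.2, 0.1}) {
            double t = throttledSeconds(busy, duty);
            System.out.printf("duty %3.0f%%: %6.0f s (%.1f h)%n", duty * 100, t, t / 3600);
        }
    }
}
```

At a 20% duty cycle the estimate is 655.36 s / 0.2, or about 3,277 s, just under 
the 3,600 s in an hour; at 10% it roughly doubles to about 6,554 s.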



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
