[ https://issues.apache.org/jira/browse/HDFS-8873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901136#comment-14901136 ]
Daniel Templeton commented on HDFS-8873: ---------------------------------------- The scanjob queue is indeed ignoring volume when selecting the next job. I was considering the case where there are volumes of greatly differing sizes, in which case not binding a thread to a volume will result in a better distribution of the load. That's also true when the number of threads exceeds the number of volumes. That said, the point of the JIRA was not to change the load profile of the directory scanner; it was just to insert a throttle. I'll post a changeset with a reduced scope shortly. > throttle directoryScanner > ------------------------- > > Key: HDFS-8873 > URL: https://issues.apache.org/jira/browse/HDFS-8873 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode > Affects Versions: 2.7.1 > Reporter: Nathan Roberts > Assignee: Daniel Templeton > Attachments: HDFS-8873.001.patch, HDFS-8873.002.patch, > HDFS-8873.003.patch, HDFS-8873.004.patch > > > The new 2-level directory layout can make directory scans expensive in terms > of disk seeks (see HDFS-8791) for details. > It would be good if the directoryScanner() had a configurable duty cycle that > would reduce its impact on disk performance (much like the approach in > HDFS-8617). > Without such a throttle, disks can go 100% busy for many minutes at a time > (assuming the common case of all inodes in cache but no directory blocks > cached, 64K seeks are required for full directory listing which translates to > 655 seconds) -- This message was sent by Atlassian JIRA (v6.3.4#6332)