[ https://issues.apache.org/jira/browse/HDFS-8873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14804846#comment-14804846 ]
Colin Patrick McCabe commented on HDFS-8873: -------------------------------------------- Thanks, [~templedf]. Looking better. The configuration looks good. We had a quick offline discussion where [~templedf] pointed out that we could add a selectable throttle type in the future in a compatible way. So we don't need to have the throttle type selector in this patch. {code} 46 public DirectoryScannerThrottle(int limit) { 47 if ((limit <= 0) || (limit >= 1000)) { {code} We use "1000" a lot in this code. Can we have a constant like {{MS_PER_SEC}} to make its role clearer? {code} 109 public synchronized void cycle() throws InterruptedException { 110 while (!open) { 111 try { 112 wait(); 113 } catch (InterruptedException ex) { 114 // Ignore 115 } 116 } {code} Why are we declaring that the function throws {{InterruptedException}} if we just catch it and do nothing? In {{ReportCompiler#run}}, we should probably at least call {{Thread.currentThread.interrupt}} rather than swallowing the {{InterruptedException}} completely. As described in http://www.ibm.com/developerworks/library/j-jtp05236/ , "even noncancelable tasks should attempt to preserve the interrupted status in case code higher up on the call stack wants to act on the interruption after the noncancelable task completes." {code} 84 @VisibleForTesting 85 final Map<String, Stats> stats = new HashMap<>(); {code} It's not clear from looking at this what the String key is all about. Either document that it is a block pool id in the comment, or rename the variable to {{statsPerBpId}} or similar. {code} 508 int d = 0; // index for blockpoolReport 509 int m = 0; // index for memReprot {code} Rename d to blockReportIdx, m to memReportIdx? I don't normally complain about short variable names, but this loop is somewhat complex. {code} 683 // No need to set an initial capacity because the number of entries will 684 // never be that large {code} All very true, but you could have just initialized it with reports.size() and saved some typing :) ThrottleTask / UnthrottleTask: is there any way we could reuse these objects and avoid generating all that garbage? It seems like they are stateless (although they alter state inside the DirectoryScannerThrottle). We could probably just keep one of each inside the parent class and avoid allocating. {code} long mark = Time.monotonicNow(); {code} Please make this markMs. Same for newMark. {code} 913 // Skip all the files that cycle with block name until 914 // getting to the metafile for the block 915 while ((i + 1 < files.length) && files[i + 1].isFile() 916 && files[i + 1].getName().startsWith(blockFile.getName())) { 917 i++; 918 919 if (isBlockMetaFile(blockFile.getName(), files[i].getName())) { 920 metaFile = files[i]; 921 break; 922 } 923 } {code} I realize you didn't change this in your patch, but this is a potentially very expensive operation depending on the directory size. Perhaps we should sort the array of children at the lowest level, so that we know that the meta file is the next entry, if it exists. We might want to do that in a follow-on jira. > throttle directoryScanner > ------------------------- > > Key: HDFS-8873 > URL: https://issues.apache.org/jira/browse/HDFS-8873 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode > Affects Versions: 2.7.1 > Reporter: Nathan Roberts > Assignee: Daniel Templeton > Attachments: HDFS-8873.001.patch, HDFS-8873.002.patch, > HDFS-8873.003.patch > > > The new 2-level directory layout can make directory scans expensive in terms > of disk seeks (see HDFS-8791) for details. > It would be good if the directoryScanner() had a configurable duty cycle that > would reduce its impact on disk performance (much like the approach in > HDFS-8617). > Without such a throttle, disks can go 100% busy for many minutes at a time > (assuming the common case of all inodes in cache but no directory blocks > cached, 64K seeks are required for full directory listing which translates to > 655 seconds) -- This message was sent by Atlassian JIRA (v6.3.4#6332)