[ https://issues.apache.org/jira/browse/HDFS-8873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14744628#comment-14744628 ]
Colin Patrick McCabe commented on HDFS-8873: -------------------------------------------- bq. Encapsulating the throttle stuff requires a reasonable abstraction of what a throttle is. The various kinds of throttles (time, file, iops, ...) are all pretty different and aren't easy to overlay with a single abstraction. I decided to give up on the idea of making the throttle type selectable. The limit therefore always means the same thing, and so I think it's fair to leave it's name as is. I agree that it is probably premature to have a single Throttle base class that can do all the various possible things you might want a Throttle to do. But it doesn't follow from that that we need to give up on the idea of making the throttle type selectable in the future. Also, configuration keys which are specified in terms of milliseconds should always end in ms. It causes a huge amount of confusion when times come without units. It is obvious to you-- the author of the code-- that it is in milliseconds, but it is not obvious to users or other developers. Why not just have a single class called {{TimeBasedThrottle}} which does whatever you want your time-based throttle to do? You can make it a standalone class that doesn't extend or implement anything, and even create an instance of it that does nothing when throttling is turned off. But keep the flexible configuration mechanism that we discussed earlier. That way, if someone wants to do something more elaborate later, they can. {{hadoop-hdfs-project/hadoop-hdfs/now}}: did you intend to put this in the patch? {code} 397 public static final String DFS_DATANODE_DIRECTORYSCAN_THROTTLE_LIMIT_KEY = 398 "dfs.datanode.directoryscan.throttle.limit"; {code} Should end with "ms" to indicate the limit is in milliseconds. Or ideally "ms.per.sec" {code} - if (!retainDiffs) clear(); + + if (!retainDiffs) { + clear(); + } {code} Can we move small whitespace cleanups like this to a follow-on change? It just makes backports a pain because it creates unnecessary conflicts, and tends to obscure what this JIRA is about when people are reading the change log. {code} + } //end for + } //end synchronized + } // end if {code} If we're changing this, then let's get rid of the COBOLisms. A close brace is enough to know that the block is closed. > throttle directoryScanner > ------------------------- > > Key: HDFS-8873 > URL: https://issues.apache.org/jira/browse/HDFS-8873 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode > Affects Versions: 2.7.1 > Reporter: Nathan Roberts > Assignee: Daniel Templeton > Attachments: HDFS-8873.001.patch, HDFS-8873.002.patch > > > The new 2-level directory layout can make directory scans expensive in terms > of disk seeks (see HDFS-8791) for details. > It would be good if the directoryScanner() had a configurable duty cycle that > would reduce its impact on disk performance (much like the approach in > HDFS-8617). > Without such a throttle, disks can go 100% busy for many minutes at a time > (assuming the common case of all inodes in cache but no directory blocks > cached, 64K seeks are required for full directory listing which translates to > 655 seconds) -- This message was sent by Atlassian JIRA (v6.3.4#6332)