[jira] [Commented] (HDFS-8873) throttle directoryScanner

Colin Patrick McCabe (JIRA) Mon, 14 Sep 2015 18:11:43 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-8873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14744628#comment-14744628
 ]


Colin Patrick McCabe commented on HDFS-8873:
--------------------------------------------

bq. Encapsulating the throttle stuff requires a reasonable abstraction of what 
a throttle is. The various kinds of throttles (time, file, iops, ...) are all 
pretty different and aren't easy to overlay with a single abstraction. I 
decided to give up on the idea of making the throttle type selectable. The 
limit therefore always means the same thing, and so I think it's fair to leave 
it's name as is.

I agree that it is probably premature to have a single Throttle base class that 
can do all the various possible things you might want a Throttle to do.  But it 
doesn't follow from that that we need to give up on the idea of making the 
throttle type selectable in the future.

Also, configuration keys which are specified in terms of milliseconds should 
always end in ms.  It causes a huge amount of confusion when times come without 
units.  It is obvious to you-- the author of the code-- that it is in 
milliseconds, but it is not obvious to users or other developers.

Why not just have a single class called {{TimeBasedThrottle}} which does 
whatever you want your time-based throttle to do?  You can make it a standalone 
class that doesn't extend or implement anything, and even create an instance of 
it that does nothing when throttling is turned off.  But keep the flexible 
configuration mechanism that we discussed earlier.  That way, if someone wants 
to do something more elaborate later, they can.

{{hadoop-hdfs-project/hadoop-hdfs/now}}: did you intend to put this in the 
patch?

{code}
397  public static final String  DFS_DATANODE_DIRECTORYSCAN_THROTTLE_LIMIT_KEY =
398           "dfs.datanode.directoryscan.throttle.limit";
{code}
Should end with "ms" to indicate the limit is in milliseconds.  Or ideally 
"ms.per.sec"

{code}
-    if (!retainDiffs) clear();
+
+    if (!retainDiffs) {
+      clear();
+    }
{code}
Can we move small whitespace cleanups like this to a follow-on change?  It just 
makes backports a pain because it creates unnecessary conflicts, and tends to 
obscure what this JIRA is about when people are reading the change log.

{code}
+        } //end for
+      } //end synchronized
+    } // end if
{code}
If we're changing this, then let's get rid of the COBOLisms.  A close brace is 
enough to know that the block is closed.

> throttle directoryScanner
> -------------------------
>
>                 Key: HDFS-8873
>                 URL: https://issues.apache.org/jira/browse/HDFS-8873
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.7.1
>            Reporter: Nathan Roberts
>            Assignee: Daniel Templeton
>         Attachments: HDFS-8873.001.patch, HDFS-8873.002.patch
>
>
> The new 2-level directory layout can make directory scans expensive in terms 
> of disk seeks (see HDFS-8791) for details. 
> It would be good if the directoryScanner() had a configurable duty cycle that 
> would reduce its impact on disk performance (much like the approach in 
> HDFS-8617). 
> Without such a throttle, disks can go 100% busy for many minutes at a time 
> (assuming the common case of all inodes in cache but no directory blocks 
> cached, 64K seeks are required for full directory listing which translates to 
> 655 seconds) 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-8873) throttle directoryScanner

Reply via email to