[ 
https://issues.apache.org/jira/browse/HDFS-8617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596712#comment-14596712
 ] 

Haohui Mai commented on HDFS-8617:
----------------------------------

bq. You can read my related SoCC paper here: 
http://umbrant.com/papers/socc12-cake.pdf . I experimented with ioprio about 3 
years ago as part of this work, and didn't get positive results. We needed 
application-level throttling.

As you mentioned in the evaluation, there are adverse effects on throughput.

I agree that application-level throttling can be useful. The proposed solution, 
however, relies on magic numbers to work. My concern is how to choose the 
magic numbers. Is good performance repeatable? Does it generalize to other 
configurations? It looks to me that currently the answer to both questions is 
no. The proposed solution appears to lower the utilization of the cluster (at 
the cost of making {{checkDirs()}} really slow) to meet the SLOs.

bq. The key issue though, as both Colin and I have mentioned, is that there is 
queuing both in the OS and on disk. ioprio only affects OS-level queuing, and 
disk-level queuing can be quite substantial. Not sure how much more needs to be 
said.

Point taken. Unfortunately, without performance benchmarks and numbers, the 
statements are purely speculative. For example, what do you mean by 
substantial? The size of the NCQ is 32, whereas the OS-level I/O queue can 
hold hundreds or thousands of requests. I would really appreciate it if you 
could run some performance benchmarks and share the numbers.

My concern with the proposal is that the parameters cannot be tuned 
automatically w.r.t. cluster configurations and loads. They have to be 
dynamic. In the longer term it makes a lot of sense to tune these parameters 
based on the length of the I/O queue, avg. processing time, etc. As a first 
step, I think it would be very helpful to simply correlate these parameters 
with simple metrics like the number of transceiver threads.
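
To make the last suggestion concrete, here is a minimal sketch of a throttle 
whose allowed rate shrinks as the number of active transceiver threads grows. 
Everything in it is an assumption for illustration: the class name 
{{LoadAwareThrottle}}, its parameters, the linear interpolation, and the way 
the thread count is supplied. None of it comes from HDFS-8617.000.patch or 
from existing DataNode code.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

/**
 * Hypothetical sketch: derives the permitted disk-check rate from a simple
 * load metric instead of a fixed magic number.
 */
public class LoadAwareThrottle {
    private final double maxOpsPerSec;      // rate allowed when the node is idle (assumed knob)
    private final double minOpsPerSec;      // floor, so the check always makes progress
    private final int saturationThreads;    // thread count at which the floor is reached
    private final IntSupplier activeThreads; // caller-supplied load metric (assumed source)

    public LoadAwareThrottle(double maxOpsPerSec, double minOpsPerSec,
                             int saturationThreads, IntSupplier activeThreads) {
        if (minOpsPerSec <= 0 || maxOpsPerSec < minOpsPerSec || saturationThreads <= 0) {
            throw new IllegalArgumentException("invalid throttle parameters");
        }
        this.maxOpsPerSec = maxOpsPerSec;
        this.minOpsPerSec = minOpsPerSec;
        this.saturationThreads = saturationThreads;
        this.activeThreads = activeThreads;
    }

    /** Linearly interpolates from maxOpsPerSec (idle) down to minOpsPerSec (saturated). */
    public double currentOpsPerSec() {
        int active = Math.max(0, Math.min(activeThreads.getAsInt(), saturationThreads));
        double load = (double) active / saturationThreads;
        return maxOpsPerSec - load * (maxOpsPerSec - minOpsPerSec);
    }

    /** Delay, in nanoseconds, to insert between two consecutive operations. */
    public long delayNanos() {
        return (long) (1_000_000_000L / currentOpsPerSec());
    }

    /** Blocks the caller long enough to respect the current rate. */
    public void acquire() throws InterruptedException {
        TimeUnit.NANOSECONDS.sleep(delayNanos());
    }
}
```

A directory walk would call {{acquire()}} before each I/O operation. The 
idle rate, the floor, and the saturation point are still parameters, but the 
effective rate then follows the observed load rather than staying fixed.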


> Throttle DiskChecker#checkDirs() speed.
> ---------------------------------------
>
>                 Key: HDFS-8617
>                 URL: https://issues.apache.org/jira/browse/HDFS-8617
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: HDFS
>    Affects Versions: 2.7.0
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>         Attachments: HDFS-8617.000.patch
>
>
> As described in HDFS-8564,  {{DiskChecker.checkDirs(finalizedDir)}} is 
> causing excessive I/Os because {{finalizedDirs}} might have up to 64K 
> sub-directories (HDFS-6482).
> This patch proposes to limit the rate of IO operations in 
> {{DiskChecker.checkDirs()}}. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
