[ 
https://issues.apache.org/jira/browse/HDFS-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897066#action_12897066
 ] 

Hong Tang commented on HDFS-1338:
---------------------------------

More thoughts:
- Correcting myself: we should launch enough writers that the # of 
concurrent IO operations exceeds the # of physical disks. I'd suggest a ratio 
between 1.5 and 2. For instance, with 4 disks per node and rep-degree = 3, we 
should launch two DFS writers per node (2 writers x 3 replicas = 6 concurrent 
writes per node on average, i.e. 1.5x the 4 disks).
- A similar principle should apply on the reader side too. We should have 
enough map slots to allow the # of readers to be 1.5x to 2x the # of physical 
drives. Again, with 4 disks per node, we may need 6 map slots. (And each map 
should read one block of data instead of a whole file.)
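
The sizing arithmetic in the two points above can be sketched roughly as 
follows. This is only an illustration, not part of TestDFSIO: the function 
names are hypothetical, and the writer formula assumes every node runs the 
same number of writers, so that each writer contributes rep-degree concurrent 
block writes spread evenly across the cluster.

```python
import math


def writers_per_node(disks, replication, ratio=1.5):
    """Smallest writer count per node such that
    writers * replication >= ratio * disks.

    Assumption (not from the benchmark itself): writes are spread evenly,
    so each writer adds `replication` concurrent disk writes per node
    on average.
    """
    return math.ceil(ratio * disks / replication)


def map_slots_per_node(disks, ratio=1.5):
    """Smallest map-slot count per node such that readers >= ratio * disks.

    Each reader (one map reading one block) counts as one concurrent read.
    """
    return math.ceil(ratio * disks)


if __name__ == "__main__":
    # The example from the comment: 4 disks per node, rep-degree = 3.
    print("writers per node:", writers_per_node(4, 3))
    print("map slots per node:", map_slots_per_node(4))
```

With ratio=2.0 instead of 1.5, the same 4-disk node would come out at 3 
writers and 8 map slots, which brackets the 1.5x to 2x range suggested above.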

> Improve TestDFSIO
> -----------------
>
>                 Key: HDFS-1338
>                 URL: https://issues.apache.org/jira/browse/HDFS-1338
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Arun C Murthy
>
> Currently the read test in TestDFSIO benchmark just opens a large side file 
> and measures the read performance. The MR scheduler has no opportunity to do 
> *any* optimization for the TestDFSIO MR application. The side-effect of this 
> is that it is *very* hard to do any meaningful analysis of the results of the 
> benchmark i.e. to check if node-local or rack-local or off-switch read 
> performance improved/degraded.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
