[ 
https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806461#action_12806461
 ] 

Todd Lipcon commented on HDFS-918:
----------------------------------

I think a main benefit of this work will be scalability with a high number of 
concurrent connection.

A good test would be to write a quick MR job that opens a few hundred 
DFSInputStreams from each mapper, all to the same block of the same file, and 
reads very slowly on them (one byte every few seconds, for example). This 
should be fine on actual disk IO since it will all hit the buffer cache, but 
will demonstrate whether this has a scalability improvement over the 
thread-per-reader.

> Use single Selector and small thread pool to replace many instances of 
> BlockSender for reads
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-918
>                 URL: https://issues.apache.org/jira/browse/HDFS-918
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>            Reporter: Jay Booth
>             Fix For: 0.22.0
>
>         Attachments: hdfs-multiplex.patch
>
>
> Currently, on read requests, the DataXCeiver server allocates a new thread 
> per request, which must allocate its own buffers and leads to 
> higher-than-optimal CPU and memory usage by the sending threads.  If we had a 
> single selector and a small threadpool to multiplex request packets, we could 
> theoretically achieve higher performance while taking up fewer resources and 
> leaving more CPU on datanodes available for mapred, hbase or whatever.  This 
> can be done without changing any wire protocols.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to