[ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806461#action_12806461 ]
Todd Lipcon commented on HDFS-918: ---------------------------------- I think a main benefit of this work will be scalability with a high number of concurrent connection. A good test would be to write a quick MR job that opens a few hundred DFSInputStreams from each mapper, all to the same block of the same file, and reads very slowly on them (one byte every few seconds, for example). This should be fine on actual disk IO since it will all hit the buffer cache, but will demonstrate whether this has a scalability improvement over the thread-per-reader. > Use single Selector and small thread pool to replace many instances of > BlockSender for reads > -------------------------------------------------------------------------------------------- > > Key: HDFS-918 > URL: https://issues.apache.org/jira/browse/HDFS-918 > Project: Hadoop HDFS > Issue Type: Improvement > Components: data-node > Reporter: Jay Booth > Fix For: 0.22.0 > > Attachments: hdfs-multiplex.patch > > > Currently, on read requests, the DataXCeiver server allocates a new thread > per request, which must allocate its own buffers and leads to > higher-than-optimal CPU and memory usage by the sending threads. If we had a > single selector and a small threadpool to multiplex request packets, we could > theoretically achieve higher performance while taking up fewer resources and > leaving more CPU on datanodes available for mapred, hbase or whatever. This > can be done without changing any wire protocols. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.