[ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Booth updated HDFS-918:
---------------------------

    Attachment: hdfs-918-20100201.patch

New patch:

* New configuration params: dfs.datanode.multiplexBlockSender=true, 
dfs.datanode.multiplex.packetSize=32k, dfs.datanode.multiplex.numWorkers=3 
(see the configuration sketch after this list)

* Packet size is tunable, which may allow better performance when larger TCP 
buffers are enabled

* Workers only wake up when a connection is writable

* 3 new class files, minor changes to DataXceiverServer and DataXceiver, 2 
utility classes added to DataTransferProtocol (one stolen from HDFS-881)

* Passes the tests from the earlier comment, plus a new one for files whose 
lengths don't line up with the checksum chunk size, and holds up under some 
load from TestDFSIO

* Still fails all tests relying on SimulatedFSDataset

* Has a large amount of TRACE-level logging in MultiplexedBlockSender, in case 
anybody wants to watch the output

* Adds dependencies for commons-pool and commons-math (for benchmarking code)

* Doesn't yet have benchmarks, but those should be easy now that the 
configuration is all in place
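
For reference, a minimal sketch of how the new keys might be read on the 
datanode side, assuming Hadoop's standard Configuration API and the defaults 
listed above (the exact accessors, defaults, and parsing of "32k" in the patch 
may differ):

import org.apache.hadoop.conf.Configuration;

public class MultiplexConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Whether to use the multiplexed block sender at all.
    boolean useMultiplex =
        conf.getBoolean("dfs.datanode.multiplexBlockSender", true);

    // Packet size in bytes; 32k default per the list above. Treating "32k"
    // as 32 * 1024 bytes is an assumption of this sketch.
    int packetSize = conf.getInt("dfs.datanode.multiplex.packetSize", 32 * 1024);

    // Number of worker threads servicing writable connections.
    int numWorkers = conf.getInt("dfs.datanode.multiplex.numWorkers", 3);

    System.out.println("multiplex=" + useMultiplex
        + " packetSize=" + packetSize + " numWorkers=" + numWorkers);
  }
}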

> Use single Selector and small thread pool to replace many instances of 
> BlockSender for reads
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-918
>                 URL: https://issues.apache.org/jira/browse/HDFS-918
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>            Reporter: Jay Booth
>             Fix For: 0.22.0
>
>         Attachments: hdfs-918-20100201.patch, hdfs-multiplex.patch
>
>
> Currently, on read requests, the DataXceiverServer allocates a new thread 
> per request, which must allocate its own buffers and leads to 
> higher-than-optimal CPU and memory usage by the sending threads.  If we had a 
> single selector and a small threadpool to multiplex request packets, we could 
> theoretically achieve higher performance while taking up fewer resources and 
> leaving more CPU on datanodes available for mapred, hbase or whatever.  This 
> can be done without changing any wire protocols.
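
For illustration, here is a rough, self-contained sketch of the single-selector 
/ small-thread-pool pattern described above: one selector thread registers 
client channels for OP_WRITE and hands writable connections to a fixed pool of 
workers, so no thread sits parked per reader. All class and method names below 
(including BlockReadState) are made up for this sketch and don't reflect the 
actual MultiplexedBlockSender in the patch; a production version would also 
need to queue registrations for the selector thread and track partially 
written packets.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: one selector thread plus a small worker pool.
public class MultiplexedSenderSketch implements Runnable {
  private final Selector selector;
  private final ExecutorService workers;
  private final int packetSize;

  public MultiplexedSenderSketch(int numWorkers, int packetSize) throws IOException {
    this.selector = Selector.open();
    this.workers = Executors.newFixedThreadPool(numWorkers);
    this.packetSize = packetSize;
  }

  // Called once per read request instead of spawning a dedicated sender thread.
  // (A real implementation would hand this to the selector thread via a queue.)
  public void register(SocketChannel client, BlockReadState state) throws IOException {
    client.configureBlocking(false);
    selector.wakeup();
    client.register(selector, SelectionKey.OP_WRITE, state);
  }

  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        selector.select();                      // blocks until some connection is writable
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
          final SelectionKey key = it.next();
          it.remove();
          if (!key.isValid() || !key.isWritable()) {
            continue;
          }
          key.interestOps(0);                   // pause events while a worker owns the key
          workers.submit(new Runnable() {
            public void run() {
              sendOnePacket(key);
            }
          });
        }
      } catch (IOException e) {
        break;
      }
    }
  }

  // A worker writes a single packet of up to packetSize bytes, then re-arms OP_WRITE.
  private void sendOnePacket(SelectionKey key) {
    SocketChannel ch = (SocketChannel) key.channel();
    BlockReadState state = (BlockReadState) key.attachment();
    try {
      ByteBuffer packet = state.nextPacket(packetSize);  // data + checksums from the block file
      if (packet == null) {
        ch.close();                                      // block fully sent
        return;
      }
      ch.write(packet);                     // a real sender would keep the buffer
                                            // in state until fully drained
      key.interestOps(SelectionKey.OP_WRITE);            // wait for the socket to drain again
      key.selector().wakeup();
    } catch (IOException e) {
      key.cancel();
    }
  }

  // Placeholder for per-request state: read position, block file, checksum stream, etc.
  interface BlockReadState {
    ByteBuffer nextPacket(int packetSize) throws IOException;
  }
}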

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
