[ http://issues.apache.org/jira/browse/HADOOP-445?page=all ]
Sameer Paranjpye updated HADOOP-445:
------------------------------------
Component/s: dfs
> Parallel data/socket writing for DFSOutputStream
> ------------------------------------------------
>
> Key: HADOOP-445
> URL: http://issues.apache.org/jira/browse/HADOOP-445
> Project: Hadoop
> Issue Type: Improvement
> Components: dfs
> Affects Versions: 0.5.0
> Reporter: Benjamin Reed
> Attachments: fastClientWrite.patch
>
>
> Currently, a DFS client writes each block in full to local disk before it
> starts transmitting the block to the datanode. Keeping the block on disk lets
> the client retry the write if the datanode fails in the middle of a block
> transfer, but writing to disk first and to the datanode second adds latency.
> Hopefully, the common case is that block transfers to datanodes succeed. This
> patch writes to the datanode and to the disk in parallel. If the write to the
> datanode fails, it falls back to the current behavior.
> In my tests, transferring 237M and 946M datasets with -copyFromLocal, I'm
> seeing a 20-25% improvement in throughput.
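
The idea described above can be sketched roughly as follows. This is not the code in fastClientWrite.patch; it is a minimal illustration under stated assumptions. The class TeeBlockWriter and its methods are hypothetical names, plain java.io streams stand in for the local backup file and the datanode socket, and "in parallel" is taken to mean that each buffer is sent to both destinations as it is produced, not that separate threads are used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch: writes every buffer to a local backup and to the
// datanode stream, and remembers whether the datanode write ever failed.
class TeeBlockWriter {
    private final OutputStream backup;  // stands in for the local backup file
    private final OutputStream remote;  // stands in for the datanode socket stream
    private boolean remoteFailed = false;

    TeeBlockWriter(OutputStream backup, OutputStream remote) {
        this.backup = backup;
        this.remote = remote;
    }

    // Always write to the backup; write to the datanode until it first fails.
    void write(byte[] buf, int off, int len) throws IOException {
        backup.write(buf, off, len);
        if (!remoteFailed) {
            try {
                remote.write(buf, off, len);
            } catch (IOException e) {
                // Stop streaming to the datanode; the backup still has the data.
                remoteFailed = true;
            }
        }
    }

    // True if the block must be re-sent from the backup copy.
    boolean needsReplay() {
        return remoteFailed;
    }

    // Fallback path: copy the whole backup to a (new) datanode stream.
    static void replay(InputStream backupCopy, OutputStream newRemote) throws IOException {
        byte[] buf = new byte[4096];
        int n;
        while ((n = backupCopy.read(buf)) != -1) {
            newRemote.write(buf, 0, n);
        }
        newRemote.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream backup = new ByteArrayOutputStream();
        ByteArrayOutputStream datanode = new ByteArrayOutputStream();
        TeeBlockWriter writer = new TeeBlockWriter(backup, datanode);

        byte[] block = new byte[8192];
        writer.write(block, 0, block.length);

        if (writer.needsReplay()) {
            // Old behavior: send the complete block from the backup copy.
            replay(new ByteArrayInputStream(backup.toByteArray()), datanode);
        }
        System.out.println("replay needed: " + writer.needsReplay());
    }
}
```

In the successful case the block reaches the datanode while it is still being written locally, so the backup is never read back. The backup only costs an extra read when the datanode write fails, which matches the fallback described in the issue.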