[ https://issues.apache.org/jira/browse/HDFS-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398767#comment-13398767 ]
Lars Hofhansl commented on HDFS-1783:
-------------------------------------

Yep. That is exactly the point. HDFS does pipelining to improve throughput at the expense of latency. This patch allows a client to favor latency.
If the client is already operating at the NIC's throughput limit, enabling parallel writes will make things worse.

This patch could be extended in the future to mix direct connections with pipelining. For example, a client could set up a 1-hop (direct) pipeline and a 2-hop pipeline for a replication factor of 3, or two 2-hop pipelines for a replication factor of 4, etc.

We'll be testing this with HBase workloads.
Using traffic shaping is interesting.

> Ability for HDFS client to write replicas in parallel
> -----------------------------------------------------
>
>                 Key: HDFS-1783
>                 URL: https://issues.apache.org/jira/browse/HDFS-1783
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>            Reporter: dhruba borthakur
>            Assignee: Lars Hofhansl
>         Attachments: HDFS-1783-trunk-v2.patch, HDFS-1783-trunk-v3.patch, HDFS-1783-trunk-v4.patch, HDFS-1783-trunk-v5.patch, HDFS-1783-trunk.patch
>
>
> The current implementation of HDFS pipelines the writes to the three replicas. This introduces some latency for realtime latency-sensitive applications. An alternate implementation that allows the client to write all replicas in parallel gives much better response times to these applications.
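
As a rough illustration of the mixed layouts mentioned in the comment (a 1-hop plus a 2-hop pipeline for a replication factor of 3, or two 2-hop pipelines for a replication factor of 4), here is a minimal Python sketch. The names `WritePlan` and `plan_pipelines` and the `max_hops` parameter are hypothetical and are not part of the HDFS-1783 patch or of any HDFS API. The sketch only does the arithmetic of splitting replicas into pipelines, and it assumes a "hop" means one datanode in a chain, as in the comment's "1-hop (direct)" wording.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WritePlan:
    """Hypothetical description of how a client fans out one block write."""

    # Number of datanodes in each pipeline the client opens.
    pipeline_lengths: List[int]

    @property
    def client_streams(self) -> int:
        # The client sends one full copy of the data per pipeline, so this is
        # the factor by which its outbound NIC traffic is multiplied.
        return len(self.pipeline_lengths)

    @property
    def max_hops(self) -> int:
        # The longest chain is the number of sequential datanode hops
        # the slowest replica has to go through.
        return max(self.pipeline_lengths)


def plan_pipelines(replication: int, max_hops: int) -> WritePlan:
    """Split `replication` replicas into pipelines of at most `max_hops` nodes.

    max_hops == replication gives one classic pipeline;
    max_hops == 1 gives fully parallel (direct) writes.
    """
    if replication < 1 or max_hops < 1:
        raise ValueError("replication and max_hops must both be >= 1")
    full, rest = divmod(replication, max_hops)
    lengths = [max_hops] * full + ([rest] if rest else [])
    return WritePlan(lengths)


if __name__ == "__main__":
    for replication, hops in [(3, 3), (3, 1), (3, 2), (4, 2)]:
        plan = plan_pipelines(replication, hops)
        print(replication, hops, plan.pipeline_lengths, plan.client_streams)
```

Under these assumptions, shorter pipelines reduce the number of sequential hops but raise `client_streams`, which is why a client already at its NIC's throughput limit would be worse off with parallel writes.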