[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050231#comment-13050231 ]
Todd Lipcon commented on HDFS-941:
----------------------------------

Ran the following benchmark to compare 0.22 before vs after the application of HDFS-941:
- inserted a 128M file into HDFS
- read it 50 times using "hadoop fs -cat /file > /dev/null" and the unix "time" utility
- recompiled with the patch reverted, restarted NN/DN
- ran same test
- recompiled with the patch included, restarted NN/DN
- ran same test
- recompiled with patch reverted
- ran same test

This resulted in 100 samples for each setup, 50 from each run. The following is the output of a t-test for the important variables:

> t.test(d.22$wall, d.22.with.941$wall)

        Welch Two Sample t-test

data:  d.22$wall and d.22.with.941$wall
t = -0.4932, df = 174.594, p-value = 0.6225
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.011002972  0.006602972
sample estimates:
mean of x mean of y
   1.1937    1.1959

> t.test(d.22$user, d.22.with.941$user)

        Welch Two Sample t-test

data:  d.22$user and d.22.with.941$user
t = -1.5212, df = 197.463, p-value = 0.1298
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.032378364  0.004178364
sample estimates:
mean of x mean of y
   1.3335    1.3476

that is to say, it failed to reject the null hypothesis... in less stat-heavy terms, there's no statistical evidence that this patch makes the test any slower.
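For readers less familiar with the R output above, here is a minimal Python sketch of how the "t" and "df" values of a Welch two-sample t-test are computed. The function name welch_t and the sample timings are hypothetical and for illustration only; they are not the measurements from this benchmark, and the sketch does not compute the p-value or the confidence interval.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    nx, ny = len(x), len(y)
    # Squared standard error of each sample mean (sample variance / n).
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Hypothetical wall-clock times in seconds (NOT the data from this issue).
without_patch = [1.19, 1.21, 1.18, 1.20, 1.19, 1.22]
with_patch = [1.20, 1.19, 1.21, 1.18, 1.20, 1.22]

t, df = welch_t(without_patch, with_patch)
print(f"t = {t:.4f}, df = {df:.3f}")
```

A t value close to zero relative to its degrees of freedom, as in both tests above, corresponds to a large p-value, which is why the null hypothesis of equal means is not rejected.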
> Datanode xceiver protocol should allow reuse of a connection
> ------------------------------------------------------------
>
>                 Key: HDFS-941
>                 URL: https://issues.apache.org/jira/browse/HDFS-941
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node, hdfs client
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: bc Wong
>             Fix For: 0.22.0
>
>         Attachments: 941.22.txt, 941.22.txt, 941.22.v2.txt, 941.22.v3.txt,
> HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch,
> HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch,
> HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt,
> hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a
> client reads to the end of a block successfully) the same connection could be
> reused for a second operation. This should improve random read performance
> significantly.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira