[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050231#comment-13050231
 ] 

Todd Lipcon commented on HDFS-941:
----------------------------------

Ran the following benchmark to compare 0.22 before vs after the application of 
HDFS-941:
- inserted a 128M file into HDFS
- with the patch applied, read it 50 times using "hadoop fs -cat /file > /dev/null",
timing each read with the Unix "time" utility
- recompiled with the patch reverted, restarted NN/DN
- ran same test
- recompiled with the patch included, restarted NN/DN
- ran same test
- recompiled with patch reverted
- ran same test
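The procedure above can be scripted. Below is a minimal Python sketch, assuming a Unix host (the `resource` module is Unix-only). `time_command` and `benchmark` are hypothetical helpers, not part of Hadoop. They collect the same two quantities that "time" reports and that the t-tests compare, wall-clock and user CPU time, for each read in a run.

```python
import resource
import subprocess
import time


def time_command(cmd):
    """Run cmd once and return (wall_seconds, user_seconds), like unix `time`.

    User time comes from the RUSAGE_CHILDREN counters, which only include
    children that have exited and been waited for; subprocess.run waits
    before returning, so the before/after difference covers this one run.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    start = time.perf_counter()
    # Discard output, mirroring "> /dev/null" in the benchmark.
    subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    return wall, after - before


def benchmark(cmd, runs=50):
    """Collect `runs` (wall, user) samples for one build/configuration."""
    samples = [time_command(cmd) for _ in range(runs)]
    wall = [w for w, _ in samples]
    user = [u for _, u in samples]
    return wall, user
```

For example, `wall, user = benchmark(["hadoop", "fs", "-cat", "/file"])` would cover one run. Each setup then gets its 100 samples by concatenating the lists from its two runs; the rebuilds and NN/DN restarts between runs stay manual.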

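This resulted in 100 samples for each setup, 50 from each run. The following is 
the output of a Welch t-test for the important variables, wall-clock time (wall)
and user CPU time (user), where x is 0.22 without the patch and y is 0.22 with
HDFS-941: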


> t.test(d.22$wall, d.22.with.941$wall)

        Welch Two Sample t-test

data:  d.22$wall and d.22.with.941$wall 
t = -0.4932, df = 174.594, p-value = 0.6225
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -0.011002972  0.006602972 
sample estimates:
mean of x mean of y 
   1.1937    1.1959 

> t.test(d.22$user, d.22.with.941$user)

        Welch Two Sample t-test

data:  d.22$user and d.22.with.941$user 
t = -1.5212, df = 197.463, p-value = 0.1298
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 -0.032378364  0.004178364 
sample estimates:
mean of x mean of y 
   1.3335    1.3476 

That is to say, the test failed to reject the null hypothesis (both 95 percent
confidence intervals contain 0). In less stat-heavy terms, there's no
statistical evidence that this patch makes the test any slower.
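For anyone redoing the analysis outside R, here is a minimal Python sketch of the two pieces involved. It is an illustration, not the script used for the numbers above; `welch_t` and `max_slowdown_pct` are hypothetical helper names. The first computes Welch's statistic and degrees of freedom, which is what R's t.test runs by default (var.equal = FALSE). The second reads a reported 95 percent CI as a bound on how much slower the patched build could plausibly be.

```python
import math
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(x), len(y)
    vx = variance(x) / nx  # squared standard error of mean(x)
    vy = variance(y) / ny  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


def max_slowdown_pct(mean_without, ci_low):
    """Largest slowdown, as % of the unpatched mean, inside the reported CI.

    The CI is for mean(without) - mean(with), so a negative lower bound is
    the biggest slowdown of the patched build consistent with the data.
    """
    return max(0.0, -ci_low) / mean_without * 100


# Means of x and CI lower bounds copied from the R output above.
print(f"wall: {max_slowdown_pct(1.1937, -0.011002972):.1f}%")  # wall: 0.9%
print(f"user: {max_slowdown_pct(1.3335, -0.032378364):.1f}%")  # user: 2.4%
```

With the raw samples, `scipy.stats.ttest_ind(x, y, equal_var=False)` should give the same t plus a p-value; the standard library has no t-distribution CDF, which is why this sketch stops at t and df.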

> Datanode xceiver protocol should allow reuse of a connection
> ------------------------------------------------------------
>
>                 Key: HDFS-941
>                 URL: https://issues.apache.org/jira/browse/HDFS-941
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node, hdfs client
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: bc Wong
>             Fix For: 0.22.0
>
>         Attachments: 941.22.txt, 941.22.txt, 941.22.v2.txt, 941.22.v3.txt, 
> HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, 
> HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, 
> HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (e.g. a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.
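The reuse idea can be sketched on the client side as a cache of idle connections per datanode. This is a hypothetical Python illustration, not the HDFS client code: `ConnectionCache`, the `connect` factory, and the `clean` flag are assumed names. The datanode side would correspondingly need to keep reading operations from a connection instead of closing it after the first one.

```python
from collections import defaultdict


class ConnectionCache:
    """Hypothetical client-side cache of idle datanode connections.

    A connection goes back into the cache only if the previous operation
    left the stream in a well-defined state (e.g. the block was read to
    the end); otherwise it is closed, since leftover bytes on the stream
    would be misread by the next operation.
    """

    def __init__(self, connect, max_idle_per_node=4):
        self._connect = connect          # factory: address -> new connection
        self._max_idle = max_idle_per_node
        self._idle = defaultdict(list)   # address -> idle connections

    def get(self, address):
        idle = self._idle[address]
        if idle:
            return idle.pop()            # reuse: no new connection setup
        return self._connect(address)

    def release(self, address, conn, clean):
        idle = self._idle[address]
        if clean and len(idle) < self._max_idle:
            idle.append(conn)
        else:
            conn.close()
```

The `clean` flag is where the "well-defined state" condition lives: a caller that stopped reading mid-block passes `clean=False`, and the connection is discarded rather than reused.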

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira