[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866496#action_12866496
 ] 

Todd Lipcon commented on HDFS-941:
----------------------------------

I ran some benchmarks again tonight using YCSB.

I loaded 1M rows into an HBase table (untimed) on my test cluster. The cluster 
is running a 5-node HDFS, but I only ran one HBase region server, so that I 
could reliably have the same region deployment between test runs. The data fits 
entirely within the buffer cache, so we're just benchmarking DFS overhead and 
not actual seek time.

I ran benchmarks with:
{code}
java -cp build/ycsb.jar:src/com/yahoo/ycsb/db/hbaselib/*:$HBASE_CONF_DIR \
  com.yahoo.ycsb.Client -db com.yahoo.ycsb.db.HBaseClient -threads 40 -t \
  -p columnfamily=test -P workloads/workloadc \
  -p recordcount=$[1000*1000] -p operationcount=$[1000*1000]
{code}
from one of the nodes in the cluster (not the node that ran the region server).

I ran the benchmark twice without the patch and twice with, alternating builds 
and restarting DFS and HBase each time, to make sure I wasn't getting any 
variability due to caching, etc.

Results follow (941-bench-* are the runs with the patch, normal-bench-* are the 
runs without it):

==> 941-bench-1.txt <==
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -threads 40 -t -p 
columnfamily=test -P workloads/workloadc -p recordcount=1000000 -p 
operationcount=1000000
[OVERALL],RunTime(ms), 118197
[OVERALL],Throughput(ops/sec), 8460.451618907417
[READ], Operations, 1000000
[READ], AverageLatency(ms), 4.701651
[READ], MinLatency(ms), 0
[READ], MaxLatency(ms), 1352
[READ], 95thPercentileLatency(ms), 11
[READ], 99thPercentileLatency(ms), 15

==> 941-bench-2.txt <==
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -threads 40 -t -p 
columnfamily=test -P workloads/workloadc -p recordcount=1000000 -p 
operationcount=1000000
[OVERALL],RunTime(ms), 124005
[OVERALL],Throughput(ops/sec), 8064.190960041934
[READ], Operations, 1000000
[READ], AverageLatency(ms), 4.940652
[READ], MinLatency(ms), 0
[READ], MaxLatency(ms), 1337
[READ], 95thPercentileLatency(ms), 12
[READ], 99thPercentileLatency(ms), 16

==> normal-bench-1.txt <==
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -threads 40 -t -p 
columnfamily=test -P workloads/workloadc -p recordcount=1000000 -p 
operationcount=1000000
[OVERALL],RunTime(ms), 182316
[OVERALL],Throughput(ops/sec), 5484.982118958293
[READ], Operations, 1000000
[READ], AverageLatency(ms), 7.267306
[READ], MinLatency(ms), 0
[READ], MaxLatency(ms), 1327
[READ], 95thPercentileLatency(ms), 17
[READ], 99thPercentileLatency(ms), 26

==> normal-bench-2.txt <==
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -threads 40 -t -p 
columnfamily=test -P workloads/workloadc -p recordcount=1000000 -p 
operationcount=1000000
[OVERALL],RunTime(ms), 190053
[OVERALL],Throughput(ops/sec), 5261.690160113231
[READ], Operations, 1000000
[READ], AverageLatency(ms), 7.577673
[READ], MinLatency(ms), 0
[READ], MaxLatency(ms), 1525
[READ], 95thPercentileLatency(ms), 15
[READ], 99thPercentileLatency(ms), 21

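In other words, averaging the two runs of each build, this patch cuts average 
read latency by about 35% (from roughly 7.4 ms to 4.8 ms), with similar gains 
on the 95th and 99th percentile latencies. Throughput improved by about 54% 
(from roughly 5,370 to 8,260 ops/sec).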
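
The relative changes can be rechecked from the raw figures above. A minimal 
sketch in Python, assuming the 941-bench files are the patched runs and the 
normal-bench files are the unpatched ones, and comparing the mean of each pair 
of runs (the helper names are just for illustration):

```python
# Figures copied from the four YCSB result files above.
# Assumption: 941-bench-* = with the patch, normal-bench-* = without it.
patched = [
    {"throughput": 8460.451618907417, "avg_latency_ms": 4.701651},  # 941-bench-1
    {"throughput": 8064.190960041934, "avg_latency_ms": 4.940652},  # 941-bench-2
]
unpatched = [
    {"throughput": 5484.982118958293, "avg_latency_ms": 7.267306},  # normal-bench-1
    {"throughput": 5261.690160113231, "avg_latency_ms": 7.577673},  # normal-bench-2
]


def mean(runs, key):
    """Average one metric over the runs of a build."""
    return sum(run[key] for run in runs) / len(runs)


def percent_change(before, after):
    """Signed relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


latency_change = percent_change(
    mean(unpatched, "avg_latency_ms"), mean(patched, "avg_latency_ms")
)
throughput_change = percent_change(
    mean(unpatched, "throughput"), mean(patched, "throughput")
)

print(f"average latency change: {latency_change:+.1f}%")
print(f"throughput change:      {throughput_change:+.1f}%")
```

Note that a latency reduction and the matching throughput increase are not the 
same percentage: with a fixed number of client threads, throughput scales 
roughly with the inverse of latency.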

This is without any tuning of the keepalive or the socket cache size; I 
imagine tuning those would improve the numbers further.
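
For readers unfamiliar with the two knobs mentioned here, the sketch below 
shows one way a client-side cache of idle datanode connections could behave. 
It is not the code from the patch: the class name, the parameters `capacity` 
and `keepalive_secs`, and the one-socket-per-address simplification are all 
assumptions made for illustration.

```python
import socket
import time
from collections import OrderedDict


class SocketCache:
    """Hypothetical cache of idle connections, keyed by (host, port)."""

    def __init__(self, capacity=16, keepalive_secs=5.0, clock=time.monotonic):
        self.capacity = capacity              # max idle sockets kept around
        self.keepalive_secs = keepalive_secs  # max idle time before discard
        self._clock = clock
        self._idle = OrderedDict()            # addr -> (socket, time returned)

    def get(self, addr):
        """Return a reusable socket for addr, or None if the caller must connect."""
        entry = self._idle.pop(addr, None)
        if entry is None:
            return None
        sock, returned_at = entry
        if self._clock() - returned_at > self.keepalive_secs:
            # Idle too long; the remote side may already have closed it.
            sock.close()
            return None
        return sock

    def put(self, addr, sock):
        """Hand back a socket whose stream was left in a well-defined state."""
        old = self._idle.pop(addr, None)
        if old is not None:
            old[0].close()  # simplification: keep one idle socket per address
        self._idle[addr] = (sock, self._clock())
        while len(self._idle) > self.capacity:
            # Evict the socket that has been idle the longest.
            _, (evicted, _) = self._idle.popitem(last=False)
            evicted.close()
```

In this sketch, a reader would call `get(addr)` before opening a new 
connection and `put(addr, sock)` only after an operation finished cleanly, for 
example after reading to the end of a block. A larger `capacity` or longer 
`keepalive_secs` trades open file descriptors for fewer connection setups.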

> Datanode xceiver protocol should allow reuse of a connection
> ------------------------------------------------------------
>
>                 Key: HDFS-941
>                 URL: https://issues.apache.org/jira/browse/HDFS-941
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node, hdfs client
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: bc Wong
>         Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, 
> HDFS-941-3.patch, HDFS-941-4.patch
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> If an operation leaves the stream in a well-defined state (e.g., a client 
> reads to the end of a block successfully), the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.