[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866496#action_12866496 ]
Todd Lipcon commented on HDFS-941:
----------------------------------

I ran some benchmarks again tonight using YCSB. I loaded 1M rows into an HBase table (untimed) on my test cluster. The cluster is running a 5-node HDFS, but I only ran one HBase region server, so that I could reliably have the same region deployment between test runs. The data fits entirely within the buffer cache, so we're just benchmarking DFS overhead and not actual seek time.

I ran benchmarks with:
{code}
java -cp build/ycsb.jar:src/com/yahoo/ycsb/db/hbaselib/*:$HBASE_CONF_DIR com.yahoo.ycsb.Client -db com.yahoo.ycsb.db.HBaseClient -threads 40 -t -p columnfamily=test -P workloads/workloadc -p recordcount=$[1000*1000] -p operationcount=$[1000*1000]
{code}
from one of the nodes in the cluster (not the same one as ran the region server).

I ran the benchmark twice without the patch and twice with, alternating builds and restarting DFS and HBase each time, to make sure I wasn't getting any variability due to caching, etc.

Results follow:

==> 941-bench-1.txt <==
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -threads 40 -t -p columnfamily=test -P workloads/workloadc -p recordcount=1000000 -p operationcount=1000000
[OVERALL],RunTime(ms), 118197
[OVERALL],Throughput(ops/sec), 8460.451618907417
[READ], Operations, 1000000
[READ], AverageLatency(ms), 4.701651
[READ], MinLatency(ms), 0
[READ], MaxLatency(ms), 1352
[READ], 95thPercentileLatency(ms), 11
[READ], 99thPercentileLatency(ms), 15

==> 941-bench-2.txt <==
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -threads 40 -t -p columnfamily=test -P workloads/workloadc -p recordcount=1000000 -p operationcount=1000000
[OVERALL],RunTime(ms), 124005
[OVERALL],Throughput(ops/sec), 8064.190960041934
[READ], Operations, 1000000
[READ], AverageLatency(ms), 4.940652
[READ], MinLatency(ms), 0
[READ], MaxLatency(ms), 1337
[READ], 95thPercentileLatency(ms), 12
[READ], 99thPercentileLatency(ms), 16

==> normal-bench-1.txt <==
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -threads 40 -t -p columnfamily=test -P workloads/workloadc -p recordcount=1000000 -p operationcount=1000000
[OVERALL],RunTime(ms), 182316
[OVERALL],Throughput(ops/sec), 5484.982118958293
[READ], Operations, 1000000
[READ], AverageLatency(ms), 7.267306
[READ], MinLatency(ms), 0
[READ], MaxLatency(ms), 1327
[READ], 95thPercentileLatency(ms), 17
[READ], 99thPercentileLatency(ms), 26

==> normal-bench-2.txt <==
YCSB Client 0.1
Command line: -db com.yahoo.ycsb.db.HBaseClient -threads 40 -t -p columnfamily=test -P workloads/workloadc -p recordcount=1000000 -p operationcount=1000000
[OVERALL],RunTime(ms), 190053
[OVERALL],Throughput(ops/sec), 5261.690160113231
[READ], Operations, 1000000
[READ], AverageLatency(ms), 7.577673
[READ], MinLatency(ms), 0
[READ], MaxLatency(ms), 1525
[READ], 95thPercentileLatency(ms), 15
[READ], 99thPercentileLatency(ms), 21

In other words, this patch speeds up average latency by nearly 40%, with similar gains on the high percentile latencies. The reads/sec number improved by about 35%. This is without any tuning of the keepalive or the socket cache size - I imagine even more improvement could be made with a bit more tuning, etc.

> Datanode xceiver protocol should allow reuse of a connection
> ------------------------------------------------------------
>
>                 Key: HDFS-941
>                 URL: https://issues.apache.org/jira/browse/HDFS-941
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node, hdfs client
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: bc Wong
>         Attachments: HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch,
> HDFS-941-3.patch, HDFS-941-4.patch
>
>
> Right now each connection into the datanode xceiver only processes one
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a
> client reads to the end of a block successfully) the same connection could be
> reused for a second operation. This should improve random read performance
> significantly.
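
As a cross-check on the percentages quoted in the comment, the sketch below recomputes the relative changes from the four YCSB runs. It is not part of the original comment. It assumes the two runs of each build are simply averaged and that the unpatched ("normal") build is the baseline; the helper names `mean` and `pct_change` are introduced here for illustration. Under those assumptions, average read latency comes out roughly 35% lower and throughput roughly 54% higher with the patch. Taking the patched build as the baseline instead gives about 35% for throughput, so the quoted figure depends on which direction the ratio is taken.

```python
# Figures copied from the four YCSB result files in the comment.
# Assumption: runs of the same build are averaged; "normal" is the baseline.
patched = {
    "Throughput(ops/sec)": [8460.451618907417, 8064.190960041934],
    "AverageLatency(ms)": [4.701651, 4.940652],
    "95thPercentileLatency(ms)": [11, 12],
    "99thPercentileLatency(ms)": [15, 16],
}
normal = {
    "Throughput(ops/sec)": [5484.982118958293, 5261.690160113231],
    "AverageLatency(ms)": [7.267306, 7.577673],
    "95thPercentileLatency(ms)": [17, 15],
    "99thPercentileLatency(ms)": [26, 21],
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def pct_change(new, old):
    """Relative change from old to new, in percent (negative means lower)."""
    return 100.0 * (new - old) / old


# Relative change of each metric, patched build versus unpatched baseline.
changes = {
    metric: pct_change(mean(patched[metric]), mean(normal[metric]))
    for metric in patched
}

for metric, change in changes.items():
    print(f"{metric}: {change:+.1f}%")
```

With only two runs per build, these numbers indicate the size of the effect but say little about run-to-run variance.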