[
https://issues.apache.org/jira/browse/KUDU-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15249270#comment-15249270
]
Binglin Chang commented on KUDU-1235:
-------------------------------------
New test result for the latest patch:
{noformat}
get_perf-itest result on i7-4770 CPU @ 3.40GHz (4 core 8 thread):
Input <scan|get> <seconds to run, 0(exit)> <num thread>: get 16 20
I0420 06:05:29.992522 28700 get_perf-itest.cc:154] Get total: 145032 QPS: 72514
I0420 06:05:31.992616 28700 get_perf-itest.cc:154] Get total: 285910 QPS: 70435.7
I0420 06:05:33.992712 28700 get_perf-itest.cc:154] Get total: 431481 QPS: 72782
I0420 06:05:35.992811 28700 get_perf-itest.cc:154] Get total: 571640 QPS: 70076.1
I0420 06:05:37.992911 28700 get_perf-itest.cc:154] Get total: 715463 QPS: 71907.9
I0420 06:05:39.993012 28700 get_perf-itest.cc:154] Get total: 859175 QPS: 71852.4
I0420 06:05:41.993110 28700 get_perf-itest.cc:154] Get total: 1002069 QPS: 71443.5
I0420 06:05:43.993211 28700 get_perf-itest.cc:154] Get total: 1137329 QPS: 67626.6
Input <scan|get> <seconds to run, 0(exit)> <num thread>: scan 16 20
I0420 06:05:51.713536 28700 get_perf-itest.cc:154] Scan total: 70844 QPS: 35421
I0420 06:05:53.713636 28700 get_perf-itest.cc:154] Scan total: 140419 QPS: 34785.8
I0420 06:05:55.713733 28700 get_perf-itest.cc:154] Scan total: 212848 QPS: 36212.8
I0420 06:05:57.713832 28700 get_perf-itest.cc:154] Scan total: 275949 QPS: 31548.9
I0420 06:05:59.713927 28700 get_perf-itest.cc:154] Scan total: 348015 QPS: 36031.3
I0420 06:06:01.714028 28700 get_perf-itest.cc:154] Scan total: 418685 QPS: 35333.2
I0420 06:06:03.714128 28700 get_perf-itest.cc:154] Scan total: 488378 QPS: 34844.8
I0420 06:06:05.714231 28700 get_perf-itest.cc:154] Scan total: 559647 QPS: 35632.7
result on Xeon(R) CPU E5-2620 0 @ 2.00GHz (12 core 24 thread)
Input <scan|get> <seconds to run, 0(exit)> <num thread>: get 16 20
I0420 11:52:07.616729 16915 get_perf-itest.cc:148] Get total: 73436 QPS: 36716.8
I0420 11:52:09.616859 16915 get_perf-itest.cc:148] Get total: 145494 QPS: 36026.7
I0420 11:52:11.616997 16915 get_perf-itest.cc:148] Get total: 222510 QPS: 38505.3
I0420 11:52:13.617130 16915 get_perf-itest.cc:148] Get total: 292826 QPS: 35155.7
I0420 11:52:15.617260 16915 get_perf-itest.cc:148] Get total: 370233 QPS: 38701
I0420 11:52:17.617399 16915 get_perf-itest.cc:148] Get total: 450170 QPS: 39965.9
I0420 11:52:19.617547 16915 get_perf-itest.cc:148] Get total: 524279 QPS: 37051.8
I0420 11:52:21.617691 16915 get_perf-itest.cc:148] Get total: 600118 QPS: 37916.8
CPU: TS: ~510% client: ~200%
Input <scan|get> <seconds to run, 0(exit)> <num thread>: scan 16 20
I0420 11:52:35.767921 16915 get_perf-itest.cc:148] Scan total: 62086 QPS: 31042
I0420 11:52:37.768052 16915 get_perf-itest.cc:148] Scan total: 117487 QPS: 27698.7
I0420 11:52:39.768182 16915 get_perf-itest.cc:148] Scan total: 180447 QPS: 31477.9
I0420 11:52:41.768314 16915 get_perf-itest.cc:148] Scan total: 241990 QPS: 30769.5
I0420 11:52:43.768456 16915 get_perf-itest.cc:148] Scan total: 304551 QPS: 31278.4
I0420 11:52:45.768621 16915 get_perf-itest.cc:148] Scan total: 355526 QPS: 25485.4
I0420 11:52:47.768771 16915 get_perf-itest.cc:148] Scan total: 410974 QPS: 27722
I0420 11:52:49.768911 16915 get_perf-itest.cc:148] Scan total: 474510 QPS: 31765.6
CPU: TS: ~720% client: ~360%
{noformat}
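For a rough comparison, a per-run average can be taken from the last cumulative total in each log. The sketch below assumes each run lasted the full 16 seconds requested on the input line (the samples are 2 seconds apart, 8 per run); AverageQps is a hypothetical helper for illustration, not something in get_perf-itest.cc.

```cpp
#include <cassert>
#include <cstdint>

// Average operations per second over a whole run, computed from the final
// cumulative "total" counter in the log and the run length in seconds.
// Hypothetical helper, not part of get_perf-itest.cc.
double AverageQps(int64_t final_total, double run_seconds) {
  assert(run_seconds > 0);
  return static_cast<double>(final_total) / run_seconds;
}
```

With the totals above and a 16-second window, this gives roughly 71,083 QPS (get) and 34,978 QPS (scan) on the i7, versus roughly 37,507 QPS (get) and 29,657 QPS (scan) on the Xeon.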
It's interesting that my local machine (4 cores, 3.4 GHz) actually outperforms
the remote server (12 cores, 2.0 GHz).
Originally I suspected lock contention, so I changed the number of tablets from
1 to 10, but this didn't help at all.
Another thought: maybe we should pin tablets to CPU cores, e.g. each handler
thread only serves a fixed collection of tablets (hashed by tablet ID).
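To make the pinning idea concrete, here is a minimal sketch of the tablet-to-handler mapping only. HandlerIndexForTablet is a hypothetical name, not an existing Kudu function, and the sketch says nothing about how requests would actually be routed to that thread or how the thread would be bound to a core.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper (not part of Kudu): maps a tablet ID to the index of
// the handler thread that would exclusively serve that tablet.
// The same tablet ID always maps to the same index within one process.
size_t HandlerIndexForTablet(const std::string& tablet_id,
                             size_t num_handlers) {
  assert(num_handlers > 0);
  return std::hash<std::string>{}(tablet_id) % num_handlers;
}
```

std::hash is only guaranteed to be consistent within a single run of a program, which is enough here because the mapping never leaves the tablet server process. With few tablets (e.g. 10 tablets and 20 handlers) some handlers would stay idle, so this may only pay off when tablets outnumber handler threads.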
> Add Get API
> -----------
>
> Key: KUDU-1235
> URL: https://issues.apache.org/jira/browse/KUDU-1235
> Project: Kudu
> Issue Type: New Feature
> Reporter: Binglin Chang
> Assignee: Binglin Chang
> Attachments: perf-get.svg, perf-scan-opt.svg, perf-scan.svg
>
>
> A Get API is more user-friendly and efficient if users just want a primary key
> lookup.
> I set up a cluster and tested get/scan of a single row using YCSB; the initial
> test shows better performance for get.
> {noformat}
> kudu_workload:
> recordcount=1000000
> operationcount=1000000
> workload=com.yahoo.ycsb.workloads.CoreWorkload
> readallfields=false
> readproportion=1
> updateproportion=0
> scanproportion=0
> insertproportion=0
> requestdistribution=uniform
> use_get_api=false
> load:
> ./bin/ycsb load kudu -P workloads/kudu_workload -p sync_ops=false -p pre_split_num_tablets=1 -p table_name=ycsb_wiki_example -p masterQuorum='c3-kudu-tst-st01.bj:32600' -threads 100
> read test:
> ./bin/ycsb run kudu -P workloads/kudu_workload -p masterQuorum='c3-kudu-tst-st01.bj:32600' -threads 100
> {noformat}
> Get API:
> [OVERALL], RunTime(ms), 21304.0
> [OVERALL], Throughput(ops/sec), 46939.54187007135
> [CLEANUP], Operations, 100.0
> [CLEANUP], AverageLatency(us), 423.57
> [CLEANUP], MinLatency(us), 24.0
> [CLEANUP], MaxLatency(us), 19327.0
> [CLEANUP], 95thPercentileLatency(us), 52.0
> [CLEANUP], 99thPercentileLatency(us), 18815.0
> [READ], Operations, 1000000.0
> [READ], AverageLatency(us), 2065.185152
> [READ], MinLatency(us), 134.0
> [READ], MaxLatency(us), 92159.0
> [READ], 95thPercentileLatency(us), 2391.0
> [READ], 99thPercentileLatency(us), 6359.0
> [READ], Return=0, 1000000
> Scan API:
> [OVERALL], RunTime(ms), 38259.0
> [OVERALL], Throughput(ops/sec), 26137.6408165399
> [CLEANUP], Operations, 100.0
> [CLEANUP], AverageLatency(us), 47.32
> [CLEANUP], MinLatency(us), 16.0
> [CLEANUP], MaxLatency(us), 1837.0
> [CLEANUP], 95thPercentileLatency(us), 41.0
> [CLEANUP], 99thPercentileLatency(us), 158.0
> [READ], Operations, 1000000.0
> [READ], AverageLatency(us), 3595.825249
> [READ], MinLatency(us), 139.0
> [READ], MaxLatency(us), 3139583.0
> [READ], 95thPercentileLatency(us), 3775.0
> [READ], 99thPercentileLatency(us), 7659.0
> [READ], Return=0, 1000000