[
https://issues.apache.org/jira/browse/KUDU-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182624#comment-15182624
]
Todd Lipcon commented on KUDU-1366:
-----------------------------------
Results seem to be a little mixed.
rpc-bench: tcmalloc is faster -- I ran the 10-second benchmark 10 times with
each allocator and did a t-test on the Reqs/second numbers on ve0518:
{code}
data: subset(d, V1 == "jemalloc")$V2 and subset(d, V1 == "tcmalloc")$V2
t = -4.4198, df = 15.415, p-value = 0.000467
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-15759.041 -5520.759
sample estimates:
mean of x mean of y
273947.4 284587.3
{code}
(With 95% confidence, tcmalloc is between roughly 2% and 5.7% faster.)
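
For anyone wanting to check the percentage range, here is a minimal sketch of the arithmetic using only the numbers from the t-test output above. It assumes the percentages are taken relative to the jemalloc mean, which is the baseline that reproduces roughly 2% to 5.7%. The variable and function names are illustrative only.

```python
# Figures copied from the t-test output above (requests/second).
mean_jemalloc = 273947.4
mean_tcmalloc = 284587.3
# 95% confidence interval for (jemalloc mean - tcmalloc mean).
ci_low, ci_high = -15759.041, -5520.759


def pct_of(baseline, delta):
    """Express the magnitude of delta as a percentage of baseline."""
    return 100.0 * abs(delta) / baseline


# Assumption: jemalloc's mean is the baseline for "X% faster".
point_estimate = pct_of(mean_jemalloc, mean_jemalloc - mean_tcmalloc)
smallest_gap = pct_of(mean_jemalloc, ci_high)
largest_gap = pct_of(mean_jemalloc, ci_low)

print(f"point estimate: tcmalloc ahead by {point_estimate:.2f}%")
print(f"95% CI: {smallest_gap:.2f}% to {largest_gap:.2f}%")
```

The non-integer degrees of freedom (15.415) suggest this was a Welch-style t-test, which does not assume equal variances between the two sets of runs.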
----
full_stack-insert-scan-test (with the scanning portion removed -- added an
exit(0) after inserts complete)
jemalloc:
{code}
Performance counter stats for 'bin/full_stack-insert-scan-test
--gtest_filter=*WithDiskStress* --inserts_per_client=2000000
-rows_per_batch=1000':
37872.553861 task-clock # 3.816 CPUs utilized
66,711 context-switches # 0.002 M/sec
1,555 cpu-migrations # 0.041 K/sec
428,024 page-faults # 0.011 M/sec
111,589,329,879 cycles # 2.946 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
167,530,007,382 instructions # 1.50 insns per cycle
31,311,773,016 branches # 826.767 M/sec
128,031,725 branch-misses # 0.41% of all branches
9.925722239 seconds time elapsed
{code}
tcmalloc:
{code}
46228.541106 task-clock # 4.033 CPUs utilized
66,913 context-switches # 0.001 M/sec
1,445 cpu-migrations # 0.031 K/sec
248,879 page-faults # 0.005 M/sec
136,522,804,795 cycles # 2.953 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
144,958,384,819 instructions # 1.06 insns per cycle
27,938,858,661 branches # 604.364 M/sec
115,054,846 branch-misses # 0.41% of all branches
11.463562859 seconds time elapsed
{code}
I saw similar speedups on this test on my laptop (~27s vs ~33s).
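
To make the two perf stat outputs above easier to compare, here is a small sketch that derives relative differences from the reported counters. All values are copied from the output above; the dictionary layout and helper names are just for illustration.

```python
# Counters copied from the two 'perf stat' outputs above.
jemalloc = {
    "elapsed_s": 9.925722239,
    "cycles": 111_589_329_879,
    "instructions": 167_530_007_382,
    "page_faults": 428_024,
}
tcmalloc = {
    "elapsed_s": 11.463562859,
    "cycles": 136_522_804_795,
    "instructions": 144_958_384_819,
    "page_faults": 248_879,
}


def pct_change(new, old):
    """Percentage change going from old to new (negative means lower)."""
    return 100.0 * (new - old) / old


def ipc(stats):
    """Instructions retired per cycle."""
    return stats["instructions"] / stats["cycles"]


# Change for jemalloc relative to tcmalloc, per counter.
changes = {key: pct_change(jemalloc[key], tcmalloc[key]) for key in jemalloc}

for key, value in changes.items():
    print(f"{key:>14}: {value:+.1f}% (jemalloc vs tcmalloc)")
print(f"IPC: jemalloc {ipc(jemalloc):.2f}, tcmalloc {ipc(tcmalloc):.2f}")
```

On these numbers, the jemalloc run retires more instructions and takes more page faults, yet uses fewer cycles and finishes in about 13% less wall-clock time. This is a single run per allocator, so the differences should be read as indicative only.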
I'm inclined to make decisions based on the real workload instead of the RPC
microbenchmark, but it's probably worth further investigation before switching.
A "free" 10% improvement is worth looking at. jemalloc also has some neat APIs
we could use to improve the performance of things like faststring (following
what 'fbstring' in folly does).
> Consider switching to jemalloc
> ------------------------------
>
> Key: KUDU-1366
> URL: https://issues.apache.org/jira/browse/KUDU-1366
> Project: Kudu
> Issue Type: Bug
> Components: build
> Reporter: Todd Lipcon
>
> We spend a fair amount of time in the allocator. While we could spend some
> time trying to use arenas more, it's also worth considering switching
> allocators. I ran a few quick tests with jemalloc 4.1 and it seems like it
> might be better than the version of tcmalloc that we use (and has much more
> active development).