[ 
https://issues.apache.org/jira/browse/KUDU-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182624#comment-15182624
 ] 

Todd Lipcon commented on KUDU-1366:
-----------------------------------

Results seem to be a little mixed.

rpc-bench: tcmalloc is faster -- I ran the 10-second benchmark 10 times each 
and did a t-test of the Reqs/second number on ve0518:
{code}
data:  subset(d, V1 == "jemalloc")$V2 and subset(d, V1 == "tcmalloc")$V2
t = -4.4198, df = 15.415, p-value = 0.000467
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -15759.041  -5520.759
sample estimates:
mean of x mean of y 
 273947.4  284587.3 
{code}
(with 95% confidence, tcmalloc is between 2% and 5.7% faster)
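
The percentage range can be re-derived from the t-test output alone. The following is a minimal sketch in Python, assuming the percentages are taken relative to the jemalloc mean (the comment does not say which mean was used as the base); the variable and function names are introduced here for illustration only. It also checks that the reported t statistic and confidence interval are consistent with each other.

```python
# Values copied from the t-test output above (Reqs/second).
mean_jemalloc = 273947.4
mean_tcmalloc = 284587.3
ci_low, ci_high = -15759.041, -5520.759  # 95% CI for (jemalloc - tcmalloc)
t_stat = -4.4198

# Point estimate of the difference; it should sit at the centre of the CI.
diff = mean_jemalloc - mean_tcmalloc


def pct_of_jemalloc(delta):
    """Express an absolute Reqs/second gap as a percentage of the jemalloc mean.

    Assumption: jemalloc is the baseline. Dividing by the tcmalloc mean
    instead gives slightly smaller percentages.
    """
    return 100.0 * abs(delta) / mean_jemalloc


smallest_gap_pct = pct_of_jemalloc(ci_high)  # CI bound closest to zero
largest_gap_pct = pct_of_jemalloc(ci_low)    # CI bound farthest from zero
point_gap_pct = pct_of_jemalloc(diff)

# Consistency check: t = diff / standard_error, and the CI half-width is
# critical_t * standard_error, so the critical value can be backed out.
std_err = diff / t_stat
half_width = (ci_high - ci_low) / 2.0
implied_critical_t = half_width / std_err

print(f"tcmalloc advantage: {smallest_gap_pct:.2f}% to {largest_gap_pct:.2f}% "
      f"(point estimate {point_gap_pct:.2f}%)")
print(f"standard error: {std_err:.1f} Reqs/second")
print(f"implied critical t: {implied_critical_t:.3f}")
```

The implied critical value lands between the two-sided 95% Student's t quantiles for 15 and 16 degrees of freedom, which matches the reported df of 15.415.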

----

full_stack-insert-scan-test (with the scanning portion removed -- added an 
exit(0) after inserts complete)


jemalloc:
{code}
 Performance counter stats for 'bin/full_stack-insert-scan-test --gtest_filter=*WithDiskStress* --inserts_per_client=2000000 -rows_per_batch=1000':

      37872.553861 task-clock                #    3.816 CPUs utilized          
            66,711 context-switches          #    0.002 M/sec                  
             1,555 cpu-migrations            #    0.041 K/sec                  
           428,024 page-faults               #    0.011 M/sec                  
   111,589,329,879 cycles                    #    2.946 GHz                    
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
   167,530,007,382 instructions              #    1.50  insns per cycle        
    31,311,773,016 branches                  #  826.767 M/sec                  
       128,031,725 branch-misses             #    0.41% of all branches        

       9.925722239 seconds time elapsed
{code}

tcmalloc:
{code}
      46228.541106 task-clock                #    4.033 CPUs utilized          
            66,913 context-switches          #    0.001 M/sec                  
             1,445 cpu-migrations            #    0.031 K/sec                  
           248,879 page-faults               #    0.005 M/sec                  
   136,522,804,795 cycles                    #    2.953 GHz                    
   <not supported> stalled-cycles-frontend 
   <not supported> stalled-cycles-backend  
   144,958,384,819 instructions              #    1.06  insns per cycle        
    27,938,858,661 branches                  #  604.364 M/sec                  
       115,054,846 branch-misses             #    0.41% of all branches        

      11.463562859 seconds time elapsed
{code}

I saw similar speedups on this test on my laptop (~27s vs ~33s).
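
For a side-by-side view, the two perf stat reports above can be reduced to a few ratios. This is a rough sketch in Python; the `PerfRun` class and `pct_change` helper are hypothetical names introduced here, and it assumes both runs used the same command line (the tcmalloc report above omits its header). It compares single runs, so it says nothing about run-to-run variance.

```python
from dataclasses import dataclass


@dataclass
class PerfRun:
    """Selected counters copied from one perf stat report."""
    task_clock_ms: float
    page_faults: int
    cycles: int
    instructions: int
    elapsed_s: float

    @property
    def ipc(self) -> float:
        """Instructions retired per cycle."""
        return self.instructions / self.cycles


jemalloc = PerfRun(37872.553861, 428_024, 111_589_329_879,
                   167_530_007_382, 9.925722239)
tcmalloc = PerfRun(46228.541106, 248_879, 136_522_804_795,
                   144_958_384_819, 11.463562859)


def pct_change(new: float, old: float) -> float:
    """Signed percentage change going from old to new."""
    return 100.0 * (new - old) / old


# All changes are jemalloc relative to tcmalloc; negative means jemalloc is lower.
wall_pct = pct_change(jemalloc.elapsed_s, tcmalloc.elapsed_s)
cpu_pct = pct_change(jemalloc.task_clock_ms, tcmalloc.task_clock_ms)
insn_pct = pct_change(jemalloc.instructions, tcmalloc.instructions)
fault_pct = pct_change(jemalloc.page_faults, tcmalloc.page_faults)

print(f"wall-clock time: {wall_pct:+.1f}%")
print(f"CPU time:        {cpu_pct:+.1f}%")
print(f"instructions:    {insn_pct:+.1f}%")
print(f"page faults:     {fault_pct:+.1f}%")
print(f"IPC: jemalloc {jemalloc.ipc:.2f}, tcmalloc {tcmalloc.ipc:.2f}")
```

In these two runs jemalloc retires more instructions and takes more page faults, yet finishes sooner because it spends fewer cycles, which shows up as the higher instructions-per-cycle figure.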

I'm inclined to make decisions based on the real workload instead of the RPC 
microbenchmark, but it's probably worth further investigation before switching. 
A "free" 10% improvement is worth looking at. jemalloc also has some neat APIs 
we could use to improve the performance of things like faststring (following 
what 'fbstring' in folly does).


> Consider switching to jemalloc
> ------------------------------
>
>                 Key: KUDU-1366
>                 URL: https://issues.apache.org/jira/browse/KUDU-1366
>             Project: Kudu
>          Issue Type: Bug
>          Components: build
>            Reporter: Todd Lipcon
>
> We spend a fair amount of time in the allocator. While we could spend some 
> time trying to use arenas more, it's also worth considering switching 
> allocators. I ran a few quick tests with jemalloc 4.1 and it seems like it 
> might be better than the version of tcmalloc that we use (and has much more 
> active development).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
