[jira] [Commented] (KUDU-636) optimization: we spend a lot of time in alloc/free

ASF subversion and git services (Jira) Fri, 10 Jul 2020 14:17:32 -0700


    [ https://issues.apache.org/jira/browse/KUDU-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155725#comment-17155725 ]


ASF subversion and git services commented on KUDU-636:
------------------------------------------------------

Commit a600f386aa2c341522638acb9af53fd45c469431 in kudu's branch 
refs/heads/master from Todd Lipcon
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=a600f38 ]

KUDU-636. Use Arena for EncodedKeys

This updates EncodedKeyBuilder, RowSetKeyProbe, and EncodedKey to always
allocate from an Arena instead of from the heap. This reduces allocator
contention on the write path significantly and improves memory locality.
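
As a rough illustration of why this helps, here is a minimal bump-pointer
arena sketch. SimpleArena and its methods are hypothetical names for this
example, not Kudu's actual Arena class, and the sketch ignores alignment and
thread safety. The point is that many small key buffers are carved out of one
heap block, so the general-purpose allocator is called once per block rather
than once per key, and consecutive keys sit next to each other in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified arena for illustration only (not Kudu's Arena).
// Memory is released all at once when the arena is destroyed.
class SimpleArena {
 public:
  explicit SimpleArena(size_t block_size = 4096) : block_size_(block_size) {
    assert(block_size > 0);
  }

  // Returns `n` bytes carved from the current block. The heap is touched
  // only when the current block cannot satisfy the request.
  uint8_t* Allocate(size_t n) {
    if (blocks_.empty() || used_ + n > cur_size_) {
      cur_size_ = n > block_size_ ? n : block_size_;
      blocks_.emplace_back(new uint8_t[cur_size_]);
      used_ = 0;
    }
    uint8_t* p = blocks_.back().get() + used_;
    used_ += n;
    return p;
  }

  // Copies the bytes of `s` into the arena and returns a pointer to the copy.
  const uint8_t* CopyBytes(const std::string& s) {
    uint8_t* dst = Allocate(s.size());
    std::memcpy(dst, s.data(), s.size());
    return dst;
  }

  // Number of heap blocks requested so far.
  size_t num_blocks() const { return blocks_.size(); }

 private:
  size_t block_size_;
  size_t cur_size_ = 0;  // Size of the current (last) block.
  size_t used_ = 0;      // Bytes already handed out from the current block.
  std::vector<std::unique_ptr<uint8_t[]>> blocks_;
};
```

With a 64-byte block, two calls such as arena.CopyBytes("key-0001") and
arena.CopyBytes("key-0002") return adjacent pointers and trigger a single
heap allocation between them.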

I measured this by running a tserver under 'perf stat' while using 'kudu perf
loadgen' to insert 80M rows in total from 8 client threads. CPU time on the
tserver dropped by roughly 16% (task-clock went from 269853 msec to 227731
msec).

Before:

 Performance counter stats for './build/latest/bin/kudu tserver run -fs-wal-dir /tmp/ts':

         269853.10 msec task-clock                #    6.862 CPUs utilized
            293066      context-switches          #    0.001 M/sec
             44541      cpu-migrations            #    0.165 K/sec
           2846435      page-faults               #    0.011 M/sec
     1110190206891      cycles                    #    4.114 GHz                      (83.33%)
      201895623339      stalled-cycles-frontend   #   18.19% frontend cycles idle     (83.33%)
      137095475307      stalled-cycles-backend    #   12.35% backend cycles idle      (83.32%)
      894201276095      instructions              #    0.81  insn per cycle
                                                  #    0.23  stalled cycles per insn  (83.33%)
      159095264762      branches                  #  589.562 M/sec                    (83.35%)
         639216492      branch-misses             #    0.40% of all branches          (83.35%)

     255.178068000 seconds user
      14.913394000 seconds sys

After:

 Performance counter stats for './build/latest/bin/kudu tserver run -fs-wal-dir /tmp/ts':

         227730.62 msec task-clock                #    6.212 CPUs utilized
            263824      context-switches          #    0.001 M/sec
             45470      cpu-migrations            #    0.200 K/sec
           3165436      page-faults               #    0.014 M/sec
      931840588715      cycles                    #    4.092 GHz                      (83.25%)
      183214671009      stalled-cycles-frontend   #   19.66% frontend cycles idle     (83.40%)
      111864991317      stalled-cycles-backend    #   12.00% backend cycles idle      (83.35%)
      832636863971      instructions              #    0.89  insn per cycle
                                                  #    0.22  stalled cycles per insn  (83.40%)
      148228107120      branches                  #  650.892 M/sec                    (83.24%)
         563344647      branch-misses             #    0.38% of all branches          (83.35%)

     211.361472000 seconds user
      16.635265000 seconds sys
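
For a quick check of the relative change between the two runs, the sketch
below computes percentage reductions from the counters above. The helper
names PercentReduction and PrintReductions are made up for this example; the
input values are copied from the two 'perf stat' outputs.

```cpp
#include <cassert>
#include <cstdio>

// Relative reduction from `before` to `after`, as a percentage of `before`.
double PercentReduction(double before, double after) {
  assert(before > 0);
  return 100.0 * (before - after) / before;
}

// Prints the reduction for a few counters taken from the runs above.
void PrintReductions() {
  std::printf("task-clock:   %.1f%%\n",
              PercentReduction(269853.10, 227730.62));
  std::printf("user time:    %.1f%%\n",
              PercentReduction(255.178068, 211.361472));
  std::printf("cycles:       %.1f%%\n",
              PercentReduction(1110190206891.0, 931840588715.0));
  std::printf("instructions: %.1f%%\n",
              PercentReduction(894201276095.0, 832636863971.0));
}
```

Calling PrintReductions() gives roughly 15.6% for task-clock, 17.2% for user
time, 16.1% for cycles, and 6.9% for instructions. System time moves the
other way, from 14.9 to 16.6 seconds, but it is a small share of the total.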

Change-Id: Ib46d0e2c31e03a7f319ceb0bf742e08ff74d7683
Reviewed-on: http://gerrit.cloudera.org:8080/16162
Reviewed-by: Alexey Serbin <aser...@cloudera.com>
Tested-by: Todd Lipcon <t...@apache.org>


> optimization: we spend a lot of time in alloc/free
> --------------------------------------------------
>
>                 Key: KUDU-636
>                 URL: https://issues.apache.org/jira/browse/KUDU-636
>             Project: Kudu
>          Issue Type: Improvement
>          Components: perf
>    Affects Versions: Public beta
>            Reporter: Todd Lipcon
>            Priority: Major
>
> Looking at a workload in the cluster, several of the top 10 lines of the perf 
> report are tcmalloc-related. It seems we make poor use of the per-thread 
> free lists and end up with a lot of contention on the central free list. 
> There are a few low-hanging fixes that would likely give a perf boost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
