[ https://issues.apache.org/jira/browse/KUDU-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155725#comment-17155725 ]
ASF subversion and git services commented on KUDU-636:
------------------------------------------------------

Commit a600f386aa2c341522638acb9af53fd45c469431 in kudu's branch refs/heads/master from Todd Lipcon
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=a600f38 ]

KUDU-636. Use Arena for EncodedKeys

This updates EncodedKeyBuilder, RowSetKeyProbe, and EncodedKey to always
allocate from an Arena instead of from the heap. This reduces allocator
contention on the write path significantly and improves memory locality.

I measured by running a tserver under 'perf stat' while using perf
loadgen to insert 80M rows total using 8 client threads. The CPU time on
the tserver was reduced by about 20%.

Before:

 Performance counter stats for './build/latest/bin/kudu tserver run -fs-wal-dir /tmp/ts':

         269853.10 msec task-clock                #    6.862 CPUs utilized
            293066      context-switches          #    0.001 M/sec
             44541      cpu-migrations            #    0.165 K/sec
           2846435      page-faults               #    0.011 M/sec
     1110190206891      cycles                    #    4.114 GHz                      (83.33%)
      201895623339      stalled-cycles-frontend   #   18.19% frontend cycles idle     (83.33%)
      137095475307      stalled-cycles-backend    #   12.35% backend cycles idle      (83.32%)
      894201276095      instructions              #    0.81  insn per cycle
                                                  #    0.23  stalled cycles per insn  (83.33%)
      159095264762      branches                  #  589.562 M/sec                    (83.35%)
         639216492      branch-misses             #    0.40% of all branches          (83.35%)

     255.178068000 seconds user
      14.913394000 seconds sys

After:

 Performance counter stats for './build/latest/bin/kudu tserver run -fs-wal-dir /tmp/ts':

         227730.62 msec task-clock                #    6.212 CPUs utilized
            263824      context-switches          #    0.001 M/sec
             45470      cpu-migrations            #    0.200 K/sec
           3165436      page-faults               #    0.014 M/sec
      931840588715      cycles                    #    4.092 GHz                      (83.25%)
      183214671009      stalled-cycles-frontend   #   19.66% frontend cycles idle     (83.40%)
      111864991317      stalled-cycles-backend    #   12.00% backend cycles idle      (83.35%)
      832636863971      instructions              #    0.89  insn per cycle
                                                  #    0.22  stalled cycles per insn  (83.40%)
      148228107120      branches                  #  650.892 M/sec                    (83.24%)
         563344647      branch-misses             #    0.38% of all branches          (83.35%)

     211.361472000 seconds user
      16.635265000 seconds sys

Change-Id: Ib46d0e2c31e03a7f319ceb0bf742e08ff74d7683
Reviewed-on: http://gerrit.cloudera.org:8080/16162
Reviewed-by: Alexey Serbin <aser...@cloudera.com>
Tested-by: Todd Lipcon <t...@apache.org>


> optimization: we spend a lot of time in alloc/free
> --------------------------------------------------
>
>                 Key: KUDU-636
>                 URL: https://issues.apache.org/jira/browse/KUDU-636
>             Project: Kudu
>          Issue Type: Improvement
>          Components: perf
>    Affects Versions: Public beta
>            Reporter: Todd Lipcon
>            Priority: Major
>
> Looking at a workload in the cluster, several of the top 10 lines of perf
> report are tcmalloc-related. It seems like we don't do a good job of making
> use of the per-thread free-lists, and we end up in a lot of contention on the
> central free list. There are a few low-hanging fruit things we could do to
> improve this for a likely perf boost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)