[Qemu-devel] [PATCH v2 0/6] Reduce lock contention on TCG hot-path

Alex Bennée Tue, 05 Jul 2016 11:50:48 -0700

Hi,

This is the first re-spin of the series posted last week. I've added
several patches that are more aggressive about avoiding bouncing
locks but, to be honest, the numbers don't seem to make them worth
it.


I think the first 3 patches are ready to be taken if the TCG
maintainers want them:

    tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock'
    tcg: set up tb->page_addr before insertion
    tcg: cpu-exec: remove tb_lock from the hot-path

The remaining patches are included for discussion.

I've re-spun the benchmarks with a larger tarball to show the
difference more clearly:

Baseline Run
============

retry.py called with ['./arm-linux-user/qemu-arm', './pigz.armhf', '-c', '-9', 
'linux-4.6.3.tar']
Source code is @ pull-target-arm-20160627-162-ged7e184 or 
heads/misc/docker-linux-user-v4
run 1: ret=0 (PASS), time=32.786249 (1/1)
run 2: ret=0 (PASS), time=32.535492 (2/2)
run 3: ret=0 (PASS), time=33.036394 (3/3)
run 4: ret=0 (PASS), time=33.036447 (4/4)
run 5: ret=0 (PASS), time=33.036706 (5/5)
run 6: ret=0 (PASS), time=33.536869 (6/6)
run 7: ret=0 (PASS), time=33.286681 (7/7)
run 8: ret=0 (PASS), time=35.292143 (8/8)
run 9: ret=0 (PASS), time=33.286727 (9/9)
run 10: ret=0 (PASS), time=32.786092 (10/10)
Results summary:
0: 10 times (100.00%), avg time 33.262 (0.59 varience/0.77 deviation)

Up to and including tcg: cpu-exec: remove tb_lock from the hot-path
===================================================================

Ran command 10 times, 10 passes
retry.py called with ['./arm-linux-user/qemu-arm', './pigz.armhf', '-c', '-9', 
'linux-4.6.3.tar']
Source code is @ pull-target-arm-20160627-165-ga6c4538 or 
heads/misc/docker-linux-user-v4-3-ga6c4538
run 1: ret=0 (PASS), time=29.783023 (1/1)
run 2: ret=0 (PASS), time=29.532725 (2/2)
run 3: ret=0 (PASS), time=29.783066 (3/3)
run 4: ret=0 (PASS), time=29.783209 (4/4)
run 5: ret=0 (PASS), time=29.783338 (5/5)
run 6: ret=0 (PASS), time=30.033726 (6/6)
run 7: ret=0 (PASS), time=32.039076 (7/7)
run 8: ret=0 (PASS), time=29.783116 (8/8)
run 9: ret=0 (PASS), time=30.033237 (9/9)
run 10: ret=0 (PASS), time=30.283845 (10/10)
Results summary:
0: 10 times (100.00%), avg time 30.084 (0.51 varience/0.72 deviation)

The whole series
================

Ran command 10 times, 10 passes
retry.py called with ['./arm-linux-user/qemu-arm', './pigz.armhf', '-c', '-9', 
'linux-4.6.3.tar']
Source code is @ pull-target-arm-20160627-168-ge9609f6 or 
heads/tcg/hot-path-and-misc-cleanups-v2
run 1: ret=0 (PASS), time=29.532766 (1/1)
run 2: ret=0 (PASS), time=29.534664 (2/2)
run 3: ret=0 (PASS), time=29.533659 (3/3)
run 4: ret=0 (PASS), time=29.282399 (4/4)
run 5: ret=0 (PASS), time=30.283774 (5/5)
run 6: ret=0 (PASS), time=30.033609 (6/6)
run 7: ret=0 (PASS), time=30.283790 (7/7)
run 8: ret=0 (PASS), time=29.783237 (8/8)
run 9: ret=0 (PASS), time=30.033356 (9/9)
run 10: ret=0 (PASS), time=32.536344 (10/10)
Results summary:
0: 10 times (100.00%), avg time 30.084 (0.86 varience/0.93 deviation)
Ran command 10 times, 10 passes

I think the variance and deviation calculations are correct now. The
benchmark is run with my retry script:

    https://github.com/stsquad/retry

The command line was:

    retry.py -l pigz.bench -g -n 10 -c -- ./arm-linux-user/qemu-arm \
        ./pigz.armhf -c -9 linux-4.6.3.tar > /dev/null
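
For anyone wanting to cross-check the summary lines, here is a minimal
sketch of the arithmetic. It is not the code from retry.py: the
function name summarise() is made up for illustration, and it assumes
the reported variance is the sample (n-1) variance, which is what
matches the baseline figures above. The times are copied from the
baseline run and the run up to "remove tb_lock from the hot-path".

```python
import math
import statistics

# Wall-clock times (seconds) copied from the benchmark logs above.
BASELINE = [32.786249, 32.535492, 33.036394, 33.036447, 33.036706,
            33.536869, 33.286681, 35.292143, 33.286727, 32.786092]
HOT_PATH = [29.783023, 29.532725, 29.783066, 29.783209, 29.783338,
            30.033726, 32.039076, 29.783116, 30.033237, 30.283845]


def summarise(times):
    """Return (mean, sample variance, standard deviation) of the times.

    Hypothetical helper: assumes an n-1 denominator for the variance.
    """
    mean = statistics.mean(times)
    variance = statistics.variance(times)  # sample variance, n-1
    return mean, variance, math.sqrt(variance)


if __name__ == "__main__":
    for name, times in (("baseline", BASELINE), ("hot-path", HOT_PATH)):
        mean, variance, deviation = summarise(times)
        print("%s: avg time %.3f (%.2f variance/%.2f deviation)"
              % (name, mean, variance, deviation))

    # Relative saving of the first three patches over the baseline.
    base_mean = summarise(BASELINE)[0]
    new_mean = summarise(HOT_PATH)[0]
    print("saving: %.1f%%" % (100.0 * (base_mean - new_mean) / base_mean))
```

On these numbers the first three patches save a little under 10% of
wall-clock time, while the whole series reports the same 30.084
average as the first three patches alone.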

Alex Bennée (5):
  tcg: set up tb->page_addr before insertion
  tcg: cpu-exec: remove tb_lock from the hot-path
  tcg: cpu-exec: factor out TB patching code
  tcg: introduce tb_lock_recursive()
  tcg: cpu-exec: roll-up tb_find_fast/slow

Sergey Fedorov (1):
  tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock'

 cpu-exec.c      | 153 +++++++++++++++++++++++++++++++++-----------------------
 tcg/tcg.h       |   1 +
 translate-all.c |  28 ++++++++---
 3 files changed, 114 insertions(+), 68 deletions(-)

-- 
2.7.4
