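Hi,

This is the first re-spin of the series posted last week. I've added a bunch of additional patches that are more aggressive about avoiding bouncing locks, but to be honest the numbers don't seem to make them worth it: in the benchmarks below the whole series averages the same 30.084s as the first three patches on their own.
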
I think the first 3 patches are ready to take if the TCG maintainers want to:

    tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock'
    tcg: set up tb->page_addr before insertion
    tcg: cpu-exec: remove tb_lock from the hot-path

The remaining patches are included for discussion. I've re-spun the benchmarks with a larger tarball to show the difference more clearly:

Baseline Run
============

retry.py called with ['./arm-linux-user/qemu-arm', './pigz.armhf', '-c', '-9', 'linux-4.6.3.tar']
Source code is @ pull-target-arm-20160627-162-ged7e184 or heads/misc/docker-linux-user-v4
run 1: ret=0 (PASS), time=32.786249 (1/1)
run 2: ret=0 (PASS), time=32.535492 (2/2)
run 3: ret=0 (PASS), time=33.036394 (3/3)
run 4: ret=0 (PASS), time=33.036447 (4/4)
run 5: ret=0 (PASS), time=33.036706 (5/5)
run 6: ret=0 (PASS), time=33.536869 (6/6)
run 7: ret=0 (PASS), time=33.286681 (7/7)
run 8: ret=0 (PASS), time=35.292143 (8/8)
run 9: ret=0 (PASS), time=33.286727 (9/9)
run 10: ret=0 (PASS), time=32.786092 (10/10)
Results summary:
0: 10 times (100.00%), avg time 33.262 (0.59 varience/0.77 deviation)

Up to and including tcg: cpu-exec: remove tb_lock from the hot-path
===================================================================

Ran command 10 times, 10 passes
retry.py called with ['./arm-linux-user/qemu-arm', './pigz.armhf', '-c', '-9', 'linux-4.6.3.tar']
Source code is @ pull-target-arm-20160627-165-ga6c4538 or heads/misc/docker-linux-user-v4-3-ga6c4538
run 1: ret=0 (PASS), time=29.783023 (1/1)
run 2: ret=0 (PASS), time=29.532725 (2/2)
run 3: ret=0 (PASS), time=29.783066 (3/3)
run 4: ret=0 (PASS), time=29.783209 (4/4)
run 5: ret=0 (PASS), time=29.783338 (5/5)
run 6: ret=0 (PASS), time=30.033726 (6/6)
run 7: ret=0 (PASS), time=32.039076 (7/7)
run 8: ret=0 (PASS), time=29.783116 (8/8)
run 9: ret=0 (PASS), time=30.033237 (9/9)
run 10: ret=0 (PASS), time=30.283845 (10/10)
Results summary:
0: 10 times (100.00%), avg time 30.084 (0.51 varience/0.72 deviation)

The whole series
================

Ran command 10 times, 10 passes
retry.py called with ['./arm-linux-user/qemu-arm', './pigz.armhf', '-c', '-9', 'linux-4.6.3.tar']
Source code is @ pull-target-arm-20160627-168-ge9609f6 or heads/tcg/hot-path-and-misc-cleanups-v2
run 1: ret=0 (PASS), time=29.532766 (1/1)
run 2: ret=0 (PASS), time=29.534664 (2/2)
run 3: ret=0 (PASS), time=29.533659 (3/3)
run 4: ret=0 (PASS), time=29.282399 (4/4)
run 5: ret=0 (PASS), time=30.283774 (5/5)
run 6: ret=0 (PASS), time=30.033609 (6/6)
run 7: ret=0 (PASS), time=30.283790 (7/7)
run 8: ret=0 (PASS), time=29.783237 (8/8)
run 9: ret=0 (PASS), time=30.033356 (9/9)
run 10: ret=0 (PASS), time=32.536344 (10/10)
Results summary:
0: 10 times (100.00%), avg time 30.084 (0.86 varience/0.93 deviation)
Ran command 10 times, 10 passes

I think the variance and deviation calculations are correct now. The benchmark is run with my retry script:

  https://github.com/stsquad/retry

The command line was:

  retry.py -l pigz.bench -g -n 10 -c -- ./arm-linux-user/qemu-arm \
    ./pigz.armhf -c -9 linux-4.6.3.tar > /dev/null

Alex Bennée (5):
  tcg: set up tb->page_addr before insertion
  tcg: cpu-exec: remove tb_lock from the hot-path
  tcg: cpu-exec: factor out TB patching code
  tcg: introduce tb_lock_recursive()
  tcg: cpu-exec: roll-up tb_find_fast/slow

Sergey Fedorov (1):
  tcg: Ensure safe tb_jmp_cache lookup out of 'tb_lock'

 cpu-exec.c      | 153 +++++++++++++++++++++++++++++++++-----------------------
 tcg/tcg.h       |   1 +
 translate-all.c |  28 ++++++++---
 3 files changed, 114 insertions(+), 68 deletions(-)

-- 
2.7.4
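
As a cross-check on the "Results summary" lines, here is a minimal sketch that recomputes the average, variance and deviation from the run times listed in the three logs. It assumes the summaries use the sample (n-1) variance, which is what Python's statistics.variance() and statistics.stdev() compute. The RUNS table and the summarise() helper are names made up for this illustration; they are not taken from retry.py.

```python
import statistics

# Run times in seconds, copied from the three benchmark logs.
RUNS = {
    "baseline": [
        32.786249, 32.535492, 33.036394, 33.036447, 33.036706,
        33.536869, 33.286681, 35.292143, 33.286727, 32.786092,
    ],
    "first three patches": [
        29.783023, 29.532725, 29.783066, 29.783209, 29.783338,
        30.033726, 32.039076, 29.783116, 30.033237, 30.283845,
    ],
    "whole series": [
        29.532766, 29.534664, 29.533659, 29.282399, 30.283774,
        30.033609, 30.283790, 29.783237, 30.033356, 32.536344,
    ],
}


def summarise(times):
    """Return (mean, sample variance, sample standard deviation)."""
    return (
        statistics.mean(times),
        statistics.variance(times),  # divides by n - 1
        statistics.stdev(times),     # square root of the sample variance
    )


for name, times in RUNS.items():
    mean, var, dev = summarise(times)
    print(f"{name}: avg time {mean:.3f} ({var:.2f} variance/{dev:.2f} deviation)")
```

Run as-is, this reproduces the reported figures (33.262 with 0.59/0.77, 30.084 with 0.51/0.72, and 30.084 with 0.86/0.93), so the summaries are consistent with the n-1 form. Dividing by n instead would give 0.53 rather than 0.59 for the baseline variance.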