Hi, all.

I think the code generated for the qemu_ld/st IRs is relatively heavy: up to 12 instructions for the TLB hit case on an i386 host. This patch series improves the code quality of the TCG qemu_ld/st IRs by reducing jumps and improving locality. The main idea is simple and has already been described in the comments in tcg-target.c: separate the slow path (TLB miss case) and generate it at the end of the TB.
For example, the code generated for qemu_ld changes as follows.

Before:
(1) TLB check
(2) If hit fall through, else jump to TLB miss case (5)
(3) TLB hit case: Load value from host memory
(4) Jump to next code (6)
(5) TLB miss case: call MMU helper
(6) ... (next code)

After:
(1) TLB check
(2) If hit fall through, else jump to TLB miss case (7)
(3) TLB hit case: Load value from host memory
(4) ... (next code)
...
(7) TLB miss case: call MMU helper
(8) Return to next code (4)

The following performance results were measured on qemu 1.0. Although there is some measurement error, the improvements are not negligible.

* EEMBC CoreMark (before -> after)
  - Guest: i386, Linux (Tizen platform)
  - Host: Intel Core2 Quad 2.4GHz, 2GB RAM, Linux
  - Results: 1135.6 -> 1179.9 (+3.9%)

* nbench (before -> after)
  - Guest: i386, Linux (linux-0.2.img included in QEMU source)
  - Host: Intel Core2 Quad 2.4GHz, 2GB RAM, Linux
  - Results
    . MEMORY INDEX: 1.6782 -> 1.6818 (+0.2%)
    . INTEGER INDEX: 1.8258 -> 1.877 (+2.8%)
    . FLOATING-POINT INDEX: 0.5944 -> 0.5954 (+0.2%)

The features are summarized as follows.
- All the changes are wrapped by the macro "CONFIG_QEMU_LDST_OPTIMIZATION" and are disabled by default.
- They are enabled by "configure --enable-ldst-optimization" and need CONFIG_SOFTMMU.
- They do not work with CONFIG_TCG_PASS_AREG0, because it seems better to apply them after the areg0 code has stabilized.
- Currently, they support only x86 and x86-64 hosts and have been tested with x86 and ARM Linux targets on x86/x86-64 host platforms.
- A build test has been done for all targets.

In addition, I have tried to remove the generated code that calls the MMU helpers for the TLB miss case from the end of the TB, but have not found a good solution yet. In my opinion, removing the calling code could degrade TLB hit performance, because the runtime parameters (such as data, MMU index, and return address) would have to be set up in registers or on the stack even though they are not used in the TLB hit case. This remains a further issue.
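To make the "After" layout described above concrete, here is a minimal, self-contained sketch of the bookkeeping it implies: the fast path is emitted inline, a record of each pending slow path is kept, and all slow paths are emitted and patched in once the rest of the TB has been generated. This is only a toy model under my own assumptions. The names (TB, Op, SlowPath, gen_qemu_ld, finalize_tb) and the op-index representation are hypothetical and are not the data structures or functions used in the patches.

```c
#include <assert.h>

/* Hypothetical toy model: "code" is an array of abstract ops, not host insns. */
enum { MAX_OPS = 64, MAX_SLOW = 16 };

typedef enum {
    OP_TLB_CHECK, OP_JMP_IF_MISS, OP_LOAD_HOST,
    OP_NEXT, OP_CALL_HELPER, OP_JMP_BACK
} OpKind;

typedef struct { OpKind kind; int target; } Op;  /* target: op index, or -1 */

/* One deferred slow path: which branch to patch and where to return. */
typedef struct { int branch_idx; int return_idx; } SlowPath;

typedef struct {
    Op ops[MAX_OPS];        int n_ops;
    SlowPath slow[MAX_SLOW]; int n_slow;
} TB;

/* Append one op and return its index. */
static int emit(TB *tb, OpKind kind, int target)
{
    assert(tb->n_ops < MAX_OPS);
    tb->ops[tb->n_ops] = (Op){ kind, target };
    return tb->n_ops++;
}

/* Emit only the TLB hit path inline; remember the miss path for later. */
static void gen_qemu_ld(TB *tb)
{
    emit(tb, OP_TLB_CHECK, -1);
    int br = emit(tb, OP_JMP_IF_MISS, -1);   /* target patched in finalize_tb */
    emit(tb, OP_LOAD_HOST, -1);
    assert(tb->n_slow < MAX_SLOW);
    /* The hit path falls through, so the return point is the next op. */
    tb->slow[tb->n_slow++] = (SlowPath){ br, tb->n_ops };
}

/* Stand-in for whatever code follows the load in the TB. */
static void gen_next_code(TB *tb)
{
    emit(tb, OP_NEXT, -1);
}

/* At the end of the TB: emit every slow path and patch its branch. */
static void finalize_tb(TB *tb)
{
    for (int i = 0; i < tb->n_slow; i++) {
        SlowPath *sp = &tb->slow[i];
        tb->ops[sp->branch_idx].target = tb->n_ops;  /* miss jumps here */
        emit(tb, OP_CALL_HELPER, -1);
        emit(tb, OP_JMP_BACK, sp->return_idx);       /* back to next code */
    }
    tb->n_slow = 0;
}
```

In this model, calling gen_qemu_ld() and gen_next_code() twice each and then finalize_tb() leaves the two hit paths contiguous at the start of the op array with no unconditional jump between the load and the next code, while both helper calls sit together at the end.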
Yeongkyoon Lee (4):
  tcg: add declarations and templates of extended MMU helpers
  tcg: add extended MMU helpers to targets
  tcg: add optimized TCG qemu_ld/st generation
  configure: add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization

 configure                     |   15 ++
 softmmu_defs.h                |   13 ++
 softmmu_template.h            |   51 +++++--
 target-alpha/mem_helper.c     |   22 +++
 target-arm/op_helper.c        |   23 +++
 target-cris/op_helper.c       |   22 +++
 target-i386/mem_helper.c      |   22 +++
 target-lm32/op_helper.c       |   23 +++-
 target-m68k/op_helper.c       |   22 +++
 target-microblaze/op_helper.c |   22 +++
 target-mips/op_helper.c       |   22 +++
 target-ppc/mem_helper.c       |   22 +++
 target-s390x/op_helper.c      |   22 +++
 target-sh4/op_helper.c        |   22 +++
 target-sparc/ldst_helper.c    |   23 +++
 target-xtensa/op_helper.c     |   22 +++
 tcg/i386/tcg-target.c         |  328 +++++++++++++++++++++++++++++++++++++++++
 tcg/tcg.c                     |   12 ++
 tcg/tcg.h                     |   35 +++++
 19 files changed, 732 insertions(+), 11 deletions(-)

__________________________________
Principal Engineer
VM Team
Yeongkyoon Lee

S-Core Co., Ltd.
D.L.: +82-31-696-7249
M.P.: +82-10-9965-1265
__________________________________