Hi, all. Here is the 3rd version of the series optimizing TCG qemu_ld/st code generation.

v3:
  - Support CONFIG_TCG_PASS_AREG0 (expected to gain more performance
    than others)
  - Remove the configure option "--enable-ldst-optimization"
  - Make the optimization the default on i386 and x86_64 hosts
  - Fix some typos and apply checkpatch.pl before committing
  - Test i386, arm and sparc softmmu targets on i386 and x86_64 hosts
  - Test linux-user-test-0.3

v2:
  - Follow the submit rule of qemu

v1:
  - Initial commit request

I think the code generated from qemu_ld/st IRs is relatively heavy: up to
12 instructions for the TLB hit case on an i386 host.
This patch series improves the code quality of TCG qemu_ld/st IRs by
reducing jumps and enhancing locality.
The main idea is simple and has already been described in the comments in
tcg-target.c: separate the slow path (TLB miss case) and generate it at
the end of the TB.

For example, the code generated from qemu_ld changes as follows.

Before:
(1) TLB check
(2) If hit fall through, else jump to TLB miss case (5)
(3) TLB hit case: Load value from host memory
(4) Jump to next code (6)
(5) TLB miss case: call MMU helper
(6) ... (next code)

After:
(1) TLB check
(2) If hit fall through, else jump to TLB miss case (7)
(3) TLB hit case: Load value from host memory
(4) ... (next code)
...
(7) TLB miss case: call MMU helper
(8) Return to next code (4)

The following performance results were measured on qemu 1.0.
Although there is some measurement error, the improvement is not
negligible.

* EEMBC CoreMark (before -> after)
  - Guest: i386, Linux (Tizen platform)
  - Host: Intel Core2 Quad 2.4GHz, 2GB RAM, Linux
  - Results: 1135.6 -> 1179.9 (+3.9%)

* nbench (before -> after)
  - Guest: i386, Linux (linux-0.2.img included in QEMU source)
  - Host: Intel Core2 Quad 2.4GHz, 2GB RAM, Linux
  - Results
    . MEMORY INDEX: 1.6782 -> 1.6818 (+0.2%)
    . INTEGER INDEX: 1.8258 -> 1.877 (+2.8%)
    . FLOATING-POINT INDEX: 0.5944 -> 0.5954 (+0.2%)

The features are summarized as follows.
- The changes are wrapped by the macro "CONFIG_QEMU_LDST_OPTIMIZATION" and
  are enabled by default on i386/x86_64 hosts
- Forced removal of the macro will cause a compilation error on
  i386/x86_64 hosts
- Support working with CONFIG_TCG_PASS_AREG0

In addition, I have tried to remove the generated code that calls the MMU
helpers for the TLB miss case from the end of the TB, but have not found
a good solution yet.
In my opinion, TLB hit case performance could be degraded if the calling
code is removed, because the runtime parameters (such as data, mmu index
and return address) would have to be set in registers or on the stack
even though they are not used in the TLB hit case.
This remains a further issue.

Yeongkyoon Lee (3):
  configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization
  tcg: Add declarations and templates of extended MMU helpers
  tcg: Optimize qemu_ld/st by generating slow paths at the end of a block

 configure             |    8 +
 softmmu_defs.h        |   64 +++++++
 softmmu_header.h      |   31 ++++
 softmmu_template.h    |   52 +++++-
 tcg/i386/tcg-target.c |  475 +++++++++++++++++++++++++++++++------------------
 tcg/tcg.c             |   12 ++
 tcg/tcg.h             |   35 ++++
 7 files changed, 500 insertions(+), 177 deletions(-)

-- 
1.7.4.1
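
As an illustration only (this is not code from the series), the "Before"
and "After" layouts of qemu_ld can be modeled by a toy C program. All
names here (Block, gen_ld_inline, gen_ld_deferred, finish_block,
count_taken_jumps) are hypothetical and do not correspond to real TCG
functions. The sketch assumes each load is reduced to a few abstract ops
and only counts how many jumps are actually taken at run time.

```c
#include <assert.h>

/* Toy model only: these op kinds and structures are assumptions made for
 * illustration and are not QEMU's real TCG data structures. */
enum { MAX_OPS = 64, MAX_SLOW = 16 };

typedef enum {
    OP_TLB_CHECK, OP_JMP_IF_MISS, OP_LOAD, OP_JMP, OP_CALL_HELPER, OP_EXIT
} OpKind;

typedef struct { OpKind kind; int target; } Op;

typedef struct {
    Op  ops[MAX_OPS];
    int n_ops;
    int miss_branch[MAX_SLOW]; /* index of the branch to patch later */
    int resume_at[MAX_SLOW];   /* index of the code following the load */
    int n_slow;
} Block;

static int emit(Block *b, OpKind kind, int target)
{
    assert(b->n_ops < MAX_OPS);
    b->ops[b->n_ops].kind = kind;
    b->ops[b->n_ops].target = target;
    return b->n_ops++;
}

/* "Before": the miss case sits inline, so the hit case must jump over it. */
static void gen_ld_inline(Block *b)
{
    emit(b, OP_TLB_CHECK, -1);
    int br = emit(b, OP_JMP_IF_MISS, -1);
    emit(b, OP_LOAD, -1);
    int jmp = emit(b, OP_JMP, -1);
    b->ops[br].target = emit(b, OP_CALL_HELPER, -1);
    b->ops[jmp].target = b->n_ops;          /* next code */
}

/* "After": only record the miss case; it is generated by finish_block(). */
static void gen_ld_deferred(Block *b)
{
    assert(b->n_slow < MAX_SLOW);
    emit(b, OP_TLB_CHECK, -1);
    b->miss_branch[b->n_slow] = emit(b, OP_JMP_IF_MISS, -1);
    emit(b, OP_LOAD, -1);
    b->resume_at[b->n_slow] = b->n_ops;     /* next code */
    b->n_slow++;
}

/* End the block, then append every recorded slow path after it. */
static void finish_block(Block *b)
{
    emit(b, OP_EXIT, -1);
    for (int i = 0; i < b->n_slow; i++) {
        b->ops[b->miss_branch[i]].target = emit(b, OP_CALL_HELPER, -1);
        emit(b, OP_JMP, b->resume_at[i]);   /* return to next code */
    }
}

/* Interpret the block; hits[i] != 0 means the i-th access hits the TLB. */
static int run(const Block *b, const int *hits)
{
    int pc = 0, access = 0, hit = 1, jumps = 0;
    for (;;) {
        const Op *op = &b->ops[pc];
        switch (op->kind) {
        case OP_TLB_CHECK:
            hit = hits[access++]; pc++; break;
        case OP_JMP_IF_MISS:
            if (hit) { pc++; } else { pc = op->target; jumps++; }
            break;
        case OP_JMP:
            pc = op->target; jumps++; break;
        case OP_EXIT:
            return jumps;
        default:                            /* OP_LOAD, OP_CALL_HELPER */
            pc++; break;
        }
    }
}

/* Build a block of n_loads loads in the chosen layout and count the
 * jumps taken while running it. */
int count_taken_jumps(int deferred, int n_loads, const int *hits)
{
    Block b = {0};
    for (int i = 0; i < n_loads; i++) {
        if (deferred) gen_ld_deferred(&b); else gen_ld_inline(&b);
    }
    finish_block(&b);
    return run(&b, hits);
}
```

In this model, a block of three loads that all hit the TLB takes three
jumps with the inline layout and none with the deferred layout. A miss
costs one extra taken jump in the deferred layout (branch out plus return),
which is the trade-off the series accepts to keep the hit path short.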