This patch series reworks the liveness analysis and the register allocator in order to generate better code, mainly by avoiding a lot of move instructions. I have measured a 9% performance improvement in user mode and 4% in system mode.
The idea behind this patch series is to free registers as soon as the temps are not used anymore, instead of waiting for the end of a basic block or an op with side effects. In addition, temps are copied to memory as soon as they are not going to be written anymore. This way even globals can be marked as "dead", which avoids moves to a new register when inputs and outputs are aliased. Finally, qemu_ld/st operations do not save globals back to memory, but only copy them there. In case of an exception the globals have the correct values, and otherwise they do not have to be reloaded.

Overall this greatly reduces the number of moves emitted and spreads them all over the TBs, which increases performance on in-order CPUs. It also reduces register spilling, especially on CPUs with few registers.

In practice it means the liveness analysis now provides more information to the register allocator, in particular when to sync the memory version of a temp with the content of the associated register. The two are therefore now quite tightly linked, and for some functions the code exists in two versions: one used when the liveness analysis is enabled, which only does some checks with assert(), and the other used when it is disabled. It might be possible to keep only one version, but that implies de-optimizing the case where the liveness analysis is disabled. In any case the checks with assert() should be kept, as they are quite useful to make sure nothing subtly breaks.
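To illustrate the idea, here is a toy sketch of such a backward pass over one basic block. It is not the code from tcg/tcg.c: ToyOp, toy_liveness() and the TOY_* constants are hypothetical names, each op is assumed to have exactly one output followed by its inputs, and the first TOY_NB_GLOBALS temps are assumed to be the globals. For each argument the pass records whether its temp is dead after the op, and for the output whether it has to be copied to memory there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NB_TEMPS   8   /* hypothetical: total number of temps */
#define TOY_NB_GLOBALS 2   /* hypothetical: temps 0..1 are globals */
#define TOY_MAX_ARGS   3   /* args[0] is the output, the rest are inputs */

typedef struct ToyOp {
    int nargs;                 /* 1 output + (nargs - 1) inputs */
    int args[TOY_MAX_ARGS];    /* temp indexes */
    bool dead[TOY_MAX_ARGS];   /* result: temp is not used after this op */
    bool sync_out;             /* result: copy the output to memory here */
} ToyOp;

/* Walk one basic block backwards and fill in dead[] and sync_out. */
void toy_liveness(ToyOp *ops, size_t nb_ops)
{
    bool dead[TOY_NB_TEMPS];      /* value is not read later in the block */
    bool need_mem[TOY_NB_TEMPS];  /* memory copy must be valid later on */

    /* At the end of the block nothing is read anymore, but the globals
       must be valid in memory. */
    for (int t = 0; t < TOY_NB_TEMPS; t++) {
        dead[t] = true;
        need_mem[t] = (t < TOY_NB_GLOBALS);
    }

    for (size_t i = nb_ops; i-- > 0; ) {
        ToyOp *op = &ops[i];
        assert(op->nargs >= 1 && op->nargs <= TOY_MAX_ARGS);

        /* Output: it is dead if nobody reads it later, and it must be
           synced if this is the last write before memory has to be valid. */
        int out = op->args[0];
        assert(out >= 0 && out < TOY_NB_TEMPS);
        op->dead[0] = dead[out];
        op->sync_out = need_mem[out];
        dead[out] = true;        /* earlier values are overwritten here */
        need_mem[out] = false;   /* earlier writes need no sync */

        /* Inputs: an input that is dead at this point is used for the
           last time; for earlier ops the temp is live. */
        for (int a = 1; a < op->nargs; a++) {
            int in = op->args[a];
            assert(in >= 0 && in < TOY_NB_TEMPS);
            op->dead[a] = dead[in];
            dead[in] = false;
        }
    }
}
```

With the two ops "t2 = g0 op g1" followed by "g0 = t2 op g1" (args {2, 0, 1} and {0, 2, 1}), the first op sees g0 as a dead input because g0 is overwritten later, so a register allocator could reuse its register for the output. The second op is the last write to g0, so its output is flagged for a sync to memory and can then be treated as dead.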

Changes v2 -> v3:
- rebased against master
- patch 7: fixed indentation
- patch 14 and later:
  renamed TCG_CALL_NO_RG into TCG_CALL_NO_RWG
  renamed TCG_CALL_NO_WGSE into TCG_CALL_NO_WG_SE
  renamed TCG_CALL_NO_RGSE into TCG_CALL_NO_RWG_SE

Aurelien Jarno (26):
  tcg: add temp_dead()
  tcg: add tcg_reg_sync()
  tcg: add temp_sync()
  tcg: sync output arguments on liveness request
  tcg: rework liveness analysis
  tcg: improve tcg_reg_alloc_movi()
  tcg: rewrite tcg_reg_alloc_mov()
  tcg: always mark dead input arguments as dead
  tcg: start with local temps in TEMP_VAL_MEM state
  tcg: don't explicitly save globals and temps
  tcg: fix some op flags
  tcg: forbid ld/st function to modify globals
  tcg: synchronize globals for ops with side effects
  tcg: rework TCG helper flags
  target-alpha: rename helper flags
  target-arm: rename helper flags
  target-cris: rename helper flags
  target-i386: rename helper flags
  target-microblaze: rename helper flags
  target-mips: rename helper flags
  target-ppc: rename helper flags
  target-s390x: rename helper flags
  target-sh4: rename helper flags
  target-sparc: rename helper flags
  target-xtensa: rename helper flags
  tcg: remove compatiblity call flags

 target-alpha/helper.h      |  176 ++++++++---------
 target-arm/helper.h        |   18 +-
 target-cris/helper.h       |   18 +-
 target-i386/helper.h       |    4 +-
 target-microblaze/helper.h |    6 +-
 target-mips/helper.h       |  106 +++++------
 target-ppc/helper.h        |   38 ++--
 target-s390x/helper.h      |   76 ++++----
 target-sh4/helper.h        |    6 +-
 target-sparc/helper.h      |   50 ++---
 target-xtensa/helper.h     |   16 +-
 tcg/README                 |   22 ++-
 tcg/optimize.c             |    3 +-
 tcg/tcg-op.h               |   18 +-
 tcg/tcg-opc.h              |   29 ++-
 tcg/tcg.c                  |  449 +++++++++++++++++++++++++++-----------------
 tcg/tcg.h                  |   29 ++-
 17 files changed, 595 insertions(+), 469 deletions(-)

-- 
1.7.10.4