I've been looking at this problem off and on for the last week or so, prompted by the sparc performance work. Although I haven't been able to get a proper sparc64 guest install working, I see the exact same problem with a mips guest.

On alpha or x86, which seem to perform well, perf numbers for the executable show about 30% of the execution time spent in cpu_exec. For mips, on the other hand, we spend about 30% of the time in routines related to tcg (re-)translation.

Aurelien has a patch in his own branches that attempts to mitigate this on mips by shadow caching more tlb entries. While this does improve performance a bit, it employs a linear search through a large buffer, with the effect that r4k_map_address itself shows up at around 30% in the perf numbers. (One could probably improve things by hashing the data in that array, rather than a linear search, but...)

In the past we've talked about getting rid of retranslation entirely. It's clever, but it certainly has its share of problems. I gave it a go this weekend.

The following isn't quite right. It fails to boot on sparc even with our tiny test kernel. It also triggers an abort on mips, eventually. But mips does get all the way through to a prompt, and in the process I can see that perf results are quite different -- much more like results I see for alpha.

Thoughts on the approach?
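
To illustrate the general idea of replacing retranslation with saved per-instruction data, here is a minimal sketch. It is not code from the series: the names TBRecord, InsnRecord, tb_record_insn, tb_restore_state, MAX_INSNS and INSN_WORDS are all hypothetical, and the layout (one guest pc plus one target-specific word per instruction, keyed by the end offset of the host code) is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_INSNS  512  /* hypothetical cap on guest insns per block */
#define INSN_WORDS 2    /* hypothetical: guest pc + one target-specific word */

/* Data saved for one guest instruction at translation time. */
typedef struct {
    uintptr_t host_end;          /* offset just past this insn's host code */
    uint64_t  data[INSN_WORDS];  /* [0] = guest pc, [1] = extra state */
} InsnRecord;

/* Per-block table; stands in for whatever the real block would carry. */
typedef struct {
    InsnRecord insn[MAX_INSNS];
    size_t     num_insns;
} TBRecord;

/* Called once per guest instruction while the block is being translated. */
static void tb_record_insn(TBRecord *tb, uintptr_t host_end,
                           uint64_t pc, uint64_t extra)
{
    assert(tb->num_insns < MAX_INSNS);
    InsnRecord *r = &tb->insn[tb->num_insns++];
    r->host_end = host_end;
    r->data[0] = pc;
    r->data[1] = extra;
}

/*
 * On a fault at host offset host_off, look up the saved data instead of
 * translating the block a second time.  Returns 0 and fills *pc / *extra
 * on success, or -1 if host_off lies beyond the recorded host code.
 */
static int tb_restore_state(const TBRecord *tb, uintptr_t host_off,
                            uint64_t *pc, uint64_t *extra)
{
    for (size_t i = 0; i < tb->num_insns; i++) {
        if (host_off < tb->insn[i].host_end) {
            *pc = tb->insn[i].data[0];
            *extra = tb->insn[i].data[1];
            return 0;
        }
    }
    return -1;
}
```

The scan here is bounded by the number of instructions in one block and does no decoding, which is the contrast with running the translator again just to rediscover the guest pc.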

r~

Richard Henderson (20):
  tcg: Rename debug_insn_start to insn_start
  target-*: Unconditionally emit tcg_gen_insn_start
  tcg: Allow extra data to be attached to insn_start
  target-arm: Add condexec state to insn_start
  target-i386: Add cc_op state to insn_start
  target-mips: Add delayed branch state to insn_start
  target-s390x: Add cc_op state to insn_start
  target-sh4: Add flags state to insn_start
  target-cris: Mirror gen_opc_pc into insn_start
  target-sparc: Tidy gen_branch_a interface
  target-sparc: Split out gen_branch_n
  target-sparc: Remove gen_opc_jump_pc
  target-sparc: Add npc state to insn_start
  tcg: Merge cpu_gen_code into tb_gen_code
  target-*: Drop cpu_gen_code define
  tcg: Add TCG_MAX_INSNS
  tcg: Pass data argument to restore_state_to_opc
  tcg: Save insn data and use it in cpu_restore_state_from_tb
  tcg: Remove gen_intermediate_code_pc
  tcg: Remove tcg_gen_code_search_pc

 include/exec/exec-all.h       |   6 +-
 target-alpha/cpu.h            |   1 -
 target-alpha/translate.c      |  55 +++-------
 target-arm/cpu.h              |   2 +-
 target-arm/translate-a64.c    |  39 ++-----
 target-arm/translate.c        |  75 ++++---------
 target-arm/translate.h        |   8 +-
 target-cris/cpu.h             |   1 -
 target-cris/translate.c       |  64 +++---------
 target-cris/translate_v10.c   |   3 -
 target-i386/cpu.h             |   2 +-
 target-i386/translate.c       |  86 ++++-----------
 target-lm32/cpu.h             |   1 -
 target-lm32/translate.c       |  55 ++--------
 target-m68k/cpu.h             |   1 -
 target-m68k/translate.c       |  64 +++---------
 target-microblaze/cpu.h       |   1 -
 target-microblaze/translate.c |  56 +++-------
 target-mips/cpu.h             |   2 +-
 target-mips/translate.c       |  73 ++++---------
 target-moxie/cpu.h            |   1 -
 target-moxie/translate.c      |  65 ++++--------
 target-openrisc/cpu.h         |   1 -
 target-openrisc/translate.c   |  54 ++--------
 target-ppc/cpu.h              |   1 -
 target-ppc/translate.c        |  56 +++-------
 target-s390x/cpu.h            |   2 +-
 target-s390x/translate.c      |  61 +++--------
 target-sh4/cpu.h              |   2 +-
 target-sh4/translate.c        |  71 ++++---------
 target-sparc/cpu.h            |   2 +-
 target-sparc/translate.c      | 189 ++++++++++++++-------------------
 target-tricore/translate.c    |  53 ++++------
 target-unicore32/translate.c  |  57 +++-------
 target-xtensa/cpu.h           |   1 -
 target-xtensa/translate.c     |  52 ++-------
 tcg/tcg-op.h                  |  52 +++++++--
 tcg/tcg-opc.h                 |   4 +-
 tcg/tcg.c                     |  96 ++++++++---------
 tcg/tcg.h                     |  14 ++-
 tci.c                         |   9 --
 translate-all.c               | 237 ++++++++++++++++++++++++------------------
 42 files changed, 578 insertions(+), 1097 deletions(-)

-- 
2.4.3