Hello,

On Wed, Sep 14, 2016 at 11:56 AM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> Computing TranslationBlock flags is pretty expensive on ARM, especially
> 32-bit. Because tbflags are computed on every tb lookup, it is not
> unlikely to see cpu_get_tb_cpu_state close to the top of the profile
> now that QHT makes the hash table much more efficient.
>
> However, most tbflags only change when the EL is switched or after
> MSR instructions. Based on this observation, this series caches these
> tbflags in CPUARMState, resulting in a 10-15% speedup on 32-bit code.
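
For anyone skimming the thread, the caching idea described in the quoted paragraph can be sketched roughly as below. This is only a toy model under my own assumptions, not the code from the series: ToyCPUState, the toy_* functions, and the bit layout are all hypothetical stand-ins, and the split between "slow" and "fast" flags is simplified to one bit each.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified CPU state; this is NOT QEMU's CPUARMState. */
typedef struct {
    uint32_t el;           /* current exception level (0..3) */
    uint32_t sysreg;       /* stand-in for a system register written by MSR */
    uint32_t fast_bit;     /* stand-in for state that may change at any time */
    uint32_t cached_flags; /* flags that only change on EL switch or MSR */
} ToyCPUState;

/* The "expensive" part: derive the slow-changing flags from scratch. */
static uint32_t toy_compute_static_flags(const ToyCPUState *env)
{
    uint32_t flags = 0;
    flags |= (env->el & 0x3u);            /* bits 0-1: exception level */
    flags |= (env->sysreg & 0x1u) << 2;   /* bit 2: some control bit */
    return flags;
}

/* Refresh the cache; called only from the rare events below. */
static void toy_rebuild_flags(ToyCPUState *env)
{
    env->cached_flags = toy_compute_static_flags(env);
}

/* Rare event 1: exception level switch. */
static void toy_set_el(ToyCPUState *env, uint32_t el)
{
    env->el = el;
    toy_rebuild_flags(env);
}

/* Rare event 2: a system register write (MSR-like). */
static void toy_write_sysreg(ToyCPUState *env, uint32_t value)
{
    env->sysreg = value;
    toy_rebuild_flags(env);
}

/* Hot path: runs on every lookup, so it only ORs in the fast bit. */
static uint32_t toy_get_tb_flags(const ToyCPUState *env)
{
    /* Debug check: the cache must match a full recomputation. */
    assert(env->cached_flags == toy_compute_static_flags(env));
    return env->cached_flags | ((env->fast_bit & 0x1u) << 3);
}
```

The point of the design is that toy_get_tb_flags() does almost no work, while the cost of recomputation moves to toy_set_el() and toy_write_sysreg(), which run far less often. The assert is only there to catch a missed invalidation in debug builds.
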

I like that patch! I quickly tested with some softmmu images on both
AArch32 and AArch64 and I can confirm the speedup.

As far as your patch goes:

Tested-by: Laurent Desnogues <laurent.desnog...@gmail.com>
Reviewed-by: Laurent Desnogues <laurent.desnog...@gmail.com>

Thanks,

Laurent

PS - BTW, I couldn't run any user mode program since they segfault on
mainline for some reason I have no time to look into. The v2.7.0 tag
works.

> Paolo
>
> Paolo Bonzini (3):
>   target-arm: introduce cpu_dynamic_tb_cpu_flags
>   target-arm: add env->tbflags
>   target-arm: cache most tbflags
>
>  target-arm/cpu.c           |  2 ++
>  target-arm/cpu.h           | 58 ++++++++++++++++++++++++++++++++--------------
>  target-arm/helper.c        |  2 ++
>  target-arm/helper.h        |  1 +
>  target-arm/op_helper.c     |  7 ++++++
>  target-arm/translate-a64.c |  4 ++++
>  target-arm/translate.c     | 12 ++++++++--
>  target-arm/translate.h     |  1 +
>  8 files changed, 68 insertions(+), 19 deletions(-)
>
> --
> 2.7.4