On Mon, Sep 26, 2016 at 12:04 PM, Laurent Desnogues <laurent.desnog...@gmail.com> wrote:
> Hello,
>
> On Wed, Sep 14, 2016 at 11:56 AM, Paolo Bonzini <pbonz...@redhat.com> wrote:
>> Computing TranslationBlock flags is pretty expensive on ARM, especially
>> 32-bit. Because tbflags are computed on every tb lookup, it is not
>> unlikely to see cpu_get_tb_cpu_state close to the top of the profile
>> now that QHT makes the hash table much more efficient.
>>
>> However, most tbflags only change when the EL is switched or after
>> MSR instructions. Based on this observation, this series caches these
>> tbflags in CPUARMState, resulting in a 10-15% speedup on 32-bit code.
>
> I like that patch!
>
> I quickly tested with some softmmu images on both AArch32 and AArch64
> and I can confirm the speedup.
>
> As far as your patch goes:
>
> Tested-by: Laurent Desnogues <laurent.desnog...@gmail.com>
> Reviewed-by: Laurent Desnogues <laurent.desnog...@gmail.com>
>
> Thanks,
>
> Laurent
>
> PS - BTW, I couldn't run any user mode program since they segfault on
> mainline for some reason I have no time to look into. The v2.7.0 tag
> works.
It turned out this was a mistake on my side.

I ran one SPEC2k test with the patch in user mode, and got a few percent
improvements for both AArch32 and AArch64.

Thanks,

Laurent

>
>> Paolo
>>
>> Paolo Bonzini (3):
>>   target-arm: introduce cpu_dynamic_tb_cpu_flags
>>   target-arm: add env->tbflags
>>   target-arm: cache most tbflags
>>
>>  target-arm/cpu.c           |  2 ++
>>  target-arm/cpu.h           | 58 ++++++++++++++++++++++++++++++++--------------
>>  target-arm/helper.c        |  2 ++
>>  target-arm/helper.h        |  1 +
>>  target-arm/op_helper.c     |  7 ++++++
>>  target-arm/translate-a64.c |  4 ++++
>>  target-arm/translate.c     | 12 ++++++++--
>>  target-arm/translate.h     |  1 +
>>  8 files changed, 68 insertions(+), 19 deletions(-)
>>
>> --
>> 2.7.4
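
For readers following the thread, the caching idea from the cover letter can be sketched roughly as below. This is only a toy model and not the code from the series: ToyCPUState, the FLAG_* bit layout and all toy_* functions are hypothetical names made up for illustration, and the stand-in "sctlr" field merely plays the role of a system register written by MSR. The point it tries to show is that the expensive derivation runs only on an EL switch or a system-register write, while every lookup just combines the cached word with the few flags that change frequently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag layout; this is NOT QEMU's actual tbflags encoding. */
#define FLAG_EL_MASK  0x3u        /* bits 0-1: current exception level */
#define FLAG_MMU_ON   (1u << 2)   /* derived from a control register */
#define FLAG_THUMB    (1u << 3)   /* changes often: computed per lookup */

/* Hypothetical CPU state, standing in for a much larger real structure. */
typedef struct {
    unsigned el;            /* current exception level, 0..3 */
    uint32_t sctlr;         /* stand-in control register; bit 0 = MMU enable */
    int thumb;              /* instruction-set state, may change on branches */
    uint32_t cached_flags;  /* flags that depend only on el and sctlr */
    unsigned recomputes;    /* counts slow-path runs, for illustration only */
} ToyCPUState;

/* Slow path: derive the rarely-changing flags from the CPU state. */
static void toy_rebuild_cached_flags(ToyCPUState *env)
{
    uint32_t flags = env->el & FLAG_EL_MASK;

    if (env->sctlr & 1u) {
        flags |= FLAG_MMU_ON;
    }
    env->cached_flags = flags;
    env->recomputes++;
}

/* Reset to EL0 with the MMU off and fill the cache once. */
static void toy_cpu_init(ToyCPUState *env)
{
    env->el = 0;
    env->sctlr = 0;
    env->thumb = 0;
    env->recomputes = 0;
    toy_rebuild_cached_flags(env);
}

/* EL switch: one of the two events that make the cached flags stale. */
static void toy_set_el(ToyCPUState *env, unsigned el)
{
    assert(el <= 3);
    env->el = el;
    toy_rebuild_cached_flags(env);
}

/* System-register write (the MSR case): the other invalidating event. */
static void toy_write_sctlr(ToyCPUState *env, uint32_t value)
{
    env->sctlr = value;
    toy_rebuild_cached_flags(env);
}

/* Fast path, run on every lookup: cached flags plus the dynamic ones. */
static uint32_t toy_get_tb_flags(const ToyCPUState *env)
{
    uint32_t flags = env->cached_flags;

    if (env->thumb) {
        flags |= FLAG_THUMB;
    }
    return flags;
}
```

In this sketch, any number of toy_get_tb_flags() calls between two EL or register changes leaves the recomputes counter untouched, which is where a saving of the kind reported above would come from. Which flags are safe to cache and which must stay dynamic is a design decision of the real series and is not something this toy model settles.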