The motivation here is reducing the total overhead of tlb flushes.

Before a few patches went into target-arm.next, I measured total
tlb flush overhead for aarch64 at 25%.  This series appears to reduce
that to about 5% (I do need to re-run the control tests properly,
rather than just watch perf top as I'm doing now).

The final patch is something of an RFC.  I'd like to know what
benchmark was used when pending_tlb_flushes was put in, but I have
not done any archaeology to find out.  I suspect that it does not
make any measurable difference beyond tlb_c.dirty, and I think
the code is a bit cleaner without it.
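
To illustrate the reasoning: once there is a per-mmu_idx dirty mask, a
flush request that targets TLBs which are still clean can simply be
elided, which covers the case pending flush tracking would otherwise
handle.  Here is a minimal, self-contained sketch of that idea; it is
not QEMU code, and the names (TLBCommonSketch, tlb_flush_by_mmuidx_sketch,
etc.) are hypothetical stand-ins for the real tlb_c fields and flush paths.

/*
 * Sketch only: a per-CPU dirty mask records which mmu_idx's have had
 * entries installed since the last flush; an incoming flush request is
 * masked against it, so redundant flushes are elided without a separate
 * pending-flush counter.
 */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint16_t dirty;          /* mmu_idx's with entries since last flush */
    unsigned partial_flush;  /* stats, as in the "partial" counter */
    unsigned elided_flush;   /* stats, as in the "elided" counter */
} TLBCommonSketch;

/* Called when a TLB entry is installed for mmu_idx. */
static void tlb_set_dirty_sketch(TLBCommonSketch *c, int mmu_idx)
{
    c->dirty |= 1u << mmu_idx;
}

/* Flush request for a set of mmu_idx's, filtered against the dirty mask. */
static void tlb_flush_by_mmuidx_sketch(TLBCommonSketch *c, uint16_t asked)
{
    uint16_t to_clean = asked & c->dirty;

    if (to_clean == 0) {
        /* Nothing installed in the requested TLBs since the last flush. */
        c->elided_flush++;
        return;
    }

    c->dirty &= ~to_clean;
    c->partial_flush++;
    /* ... here the real code would clear the selected TLBs ... */
    printf("flushed mmu_idx mask 0x%x\n", (unsigned)to_clean);
}

int main(void)
{
    TLBCommonSketch c = { 0 };

    tlb_set_dirty_sketch(&c, 1);
    tlb_flush_by_mmuidx_sketch(&c, 1u << 1);  /* does real work */
    tlb_flush_by_mmuidx_sketch(&c, 1u << 1);  /* elided: still clean */

    printf("partial=%u elided=%u\n", c.partial_flush, c.elided_flush);
    return 0;
}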


r~


Richard Henderson (10):
  cputlb: Move tlb_lock to CPUTLBCommon
  cputlb: Remove tcg_enabled hack from tlb_flush_nocheck
  cputlb: Move cpu->pending_tlb_flush to env->tlb_c.pending_flush
  cputlb: Split large page tracking per mmu_idx
  cputlb: Move env->vtlb_index to env->tlb_d.vindex
  cputlb: Merge tlb_flush_nocheck into tlb_flush_by_mmuidx_async_work
  cputlb: Merge tlb_flush_page into tlb_flush_page_by_mmuidx
  cputlb: Count "partial" and "elided" tlb flushes
  cputlb: Filter flushes on already clean tlbs
  cputlb: Remove tlb_c.pending_flushes

 include/exec/cpu-defs.h   |  51 +++++-
 include/exec/cputlb.h     |   2 +-
 include/qom/cpu.h         |   6 -
 accel/tcg/cputlb.c        | 354 +++++++++++++++-----------------------
 accel/tcg/translate-all.c |   8 +-
 5 files changed, 184 insertions(+), 237 deletions(-)

-- 
2.17.2

