On 11/28/2013 04:12 AM, Richard Henderson wrote:
2. Why not use a larger TLB? Currently the TLB has 1<<8 entries. The
TLB lookup is 10 x86 instructions, but every miss needs ~450 instructions; I
measured this using Intel PIN. So even if the miss rate is low (say 3%), the
overall time spent in cpu_x86_handle_mmu_fault is still significant.
I'd be interested to experiment with different TLB sizes, to see what effect
that has on performance.  But I suspect that lack of TLB contexts means that we
wind up flushing the TLB more often than real hardware does, and therefore a
larger TLB merely takes longer to flush.



You could use a generation counter to flush the TLB in O(1) by incrementing the counter. That slows down the fast path though. Maybe you can do that for the larger second level TLB only.
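
To make the idea concrete, here is a rough sketch of what I mean. It is not QEMU code: the TLB/TLBEntry types, the tlb_* function names, the direct-mapped layout and the 4 KiB page size are all made up for illustration. Each entry records the generation in which it was filled, and a lookup treats any entry from an older generation as invalid. The extra compare in tlb_lookup is the fast-path cost mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* All names and sizes below are hypothetical, chosen for illustration. */
#define TLB_BITS  8
#define TLB_SIZE  (1u << TLB_BITS)
#define PAGE_BITS 12
#define PAGE_MASK ((UINT64_C(1) << PAGE_BITS) - 1)

typedef struct {
    uint64_t vpage;  /* virtual page number */
    uint64_t ppage;  /* physical page number */
    uint32_t gen;    /* generation in which this entry was filled */
} TLBEntry;

typedef struct {
    TLBEntry entries[TLB_SIZE];
    uint32_t gen;    /* current generation; 0 is reserved for "never valid" */
} TLB;

static void tlb_init(TLB *tlb)
{
    memset(tlb, 0, sizeof(*tlb));
    tlb->gen = 1;    /* zeroed entries (gen 0) can never match */
}

/* O(1) flush: bump the generation so every existing entry becomes stale.
 * Only when the counter wraps do we pay for a real clear. */
static void tlb_flush(TLB *tlb)
{
    if (++tlb->gen == 0) {
        memset(tlb->entries, 0, sizeof(tlb->entries));
        tlb->gen = 1;
    }
}

static void tlb_fill(TLB *tlb, uint64_t vaddr, uint64_t paddr)
{
    uint64_t vpage = vaddr >> PAGE_BITS;
    TLBEntry *e = &tlb->entries[vpage & (TLB_SIZE - 1)];

    e->vpage = vpage;
    e->ppage = paddr >> PAGE_BITS;
    e->gen = tlb->gen;
}

/* Returns true on a hit and stores the translated address in *paddr. */
static bool tlb_lookup(const TLB *tlb, uint64_t vaddr, uint64_t *paddr)
{
    uint64_t vpage = vaddr >> PAGE_BITS;
    const TLBEntry *e = &tlb->entries[vpage & (TLB_SIZE - 1)];

    /* The generation compare is the extra work on the fast path. */
    if (e->gen != tlb->gen || e->vpage != vpage) {
        return false;
    }
    *paddr = (e->ppage << PAGE_BITS) | (vaddr & PAGE_MASK);
    return true;
}
```

With this layout the flush cost no longer depends on the TLB size, except for the rare wraparound of the counter, where a full clear is needed so that old entries cannot alias the new generation.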
