After reading this paper [1], I wondered how far one could push the idea of dynamic TLB resizing. We discussed it briefly in this thread:
  https://lists.gnu.org/archive/html/qemu-devel/2018-09/msg02340.html

Since then, (1) rth helped me (thanks!) with TCG backend code, and (2) I've abandoned the idea of substituting malloc for memset, and instead focused on dynamically resizing the TLBs. The rationale is that if a process touches a lot of memory, having a large TLB will pay off, since the perf gains will dwarf the increased cost of flushing via memset.

This series shows that the indirection necessary to do this does not cause a perf decrease, at least for x86_64 hosts.

This series is incomplete, since it only implements changes to the i386 backend, and it probably only works on x86_64. But the whole point is to (1) see whether the performance gains are worth it, and (2) discuss how crazy this approach is. I was looking for things to break badly, but so far I've found no obvious issues. There might, however, be assumptions about the TLB size baked into the code that I have missed, so please point those out if they exist.

Performance numbers are in the last patch.

You can fetch this series from:
  https://github.com/cota/qemu/tree/tlb-dyn

Note that it applies on top of my tlb-lock-v3 series:
  https://lists.gnu.org/archive/html/qemu-devel/2018-10/msg01087.html

Thanks,

		Emilio

[1] "Optimizing Memory Translation Emulation in Full System Emulators", Tong et al, TACO'15
    https://dl.acm.org/citation.cfm?id=2686034
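
For readers who want a concrete picture of the trade-off described above (a larger TLB means fewer misses but a more expensive memset on flush), here is a minimal standalone sketch. It is not the code in this series: the SketchTLB type, the sketch_tlb_* helpers, the 6..16 bit size limits, and the 70%/30% use-rate thresholds are all hypothetical and chosen only for illustration. The idea shown is that the table size becomes a runtime value (a mask loaded on every lookup, which is the extra indirection), and that the size is reconsidered at flush time, when the table is about to be wiped anyway.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_BITS 12   /* assumed 4 KiB guest pages */
#define MIN_BITS  6    /* hypothetical lower bound: 64 entries */
#define MAX_BITS  16   /* hypothetical upper bound: 65536 entries */

typedef struct {
    uint64_t tag;      /* virtual page number; UINT64_MAX means empty */
    uint64_t addend;   /* payload, e.g. a guest-to-host offset */
} SketchTLBEntry;

typedef struct {
    SketchTLBEntry *table;
    unsigned bits;     /* log2 of the number of entries */
    size_t mask;       /* entries - 1, read on every lookup */
    size_t used;       /* entries filled since the last flush */
} SketchTLB;

static int sketch_tlb_init(SketchTLB *tlb, unsigned bits)
{
    size_t n = (size_t)1 << bits;

    tlb->table = malloc(n * sizeof(*tlb->table));
    if (!tlb->table) {
        return -1;
    }
    memset(tlb->table, 0xff, n * sizeof(*tlb->table)); /* all empty */
    tlb->bits = bits;
    tlb->mask = n - 1;
    tlb->used = 0;
    return 0;
}

static void sketch_tlb_fill(SketchTLB *tlb, uint64_t vaddr, uint64_t addend)
{
    uint64_t vpn = vaddr >> PAGE_BITS;
    SketchTLBEntry *e = &tlb->table[(size_t)vpn & tlb->mask];

    if (e->tag == UINT64_MAX) {
        tlb->used++;   /* count slots that go from empty to occupied */
    }
    e->tag = vpn;
    e->addend = addend;
}

/* Returns 1 and stores the payload on a hit, 0 on a miss. */
static int sketch_tlb_lookup(const SketchTLB *tlb, uint64_t vaddr,
                             uint64_t *addend)
{
    uint64_t vpn = vaddr >> PAGE_BITS;
    const SketchTLBEntry *e = &tlb->table[(size_t)vpn & tlb->mask];

    if (e->tag != vpn) {
        return 0;
    }
    *addend = e->addend;
    return 1;
}

/* Flush; grow if the table was mostly full, shrink if mostly empty. */
static void sketch_tlb_flush(SketchTLB *tlb)
{
    size_t n = tlb->mask + 1;
    unsigned bits = tlb->bits;

    if (tlb->used * 10 > n * 7 && bits < MAX_BITS) {
        bits++;        /* use rate above 70%: double */
    } else if (tlb->used * 10 < n * 3 && bits > MIN_BITS) {
        bits--;        /* use rate below 30%: halve */
    }
    if (bits != tlb->bits) {
        size_t new_n = (size_t)1 << bits;
        SketchTLBEntry *t = realloc(tlb->table, new_n * sizeof(*t));

        if (t) {       /* on allocation failure keep the old size */
            tlb->table = t;
            tlb->bits = bits;
            tlb->mask = new_n - 1;
            n = new_n;
        }
    }
    memset(tlb->table, 0xff, n * sizeof(*tlb->table));
    tlb->used = 0;
}
```

Resizing only at flush time means no entries ever need to be rehashed, and a guest that touches little memory keeps a small table that is cheap to memset.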