This is not a clean patch, but it does fix a problem I hit with TB invalidation when the target software writes to memory that contains TBs.
Lockup messages are triggering in Linux because clearing a freed code page takes a long time: the clearing incurs a large number of notdirty notifier calls, which massively slows things down. Linux might have a bug here too, because it seems to hang indefinitely in some cases, but even if it didn't, the latency of clearing these pages is very high. This showed up when running KVM on the emulated machine and starting and stopping guests, which frees lots of instruction pages. Usually, if you're just running Linux, executable pages remain in the pagecache, so you hit fewer of these bombs in the kernel memory allocator. But page reclaim, JITs, deleting executable files, etc. could trigger it too.

Invalidating all TBs from the page on any hit seems to avoid the problem and generally speeds things up.

How important is the precise invalidation? These days I assume the tricky kind of SMC that frequently writes code close to where it's executing is pretty rare, and might not be something we really care about for performance. Could we remove sub-page TB invalidation entirely?

Thanks,
Nick
---
 accel/tcg/tb-maint.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index cc0f5afd47..d9a76b1665 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -1107,6 +1107,9 @@ tb_invalidate_phys_page_range__locked(struct page_collection *pages,
     TranslationBlock *current_tb = retaddr ? tcg_tb_lookup(retaddr) : NULL;
 #endif /* TARGET_HAS_PRECISE_SMC */
 
+    start &= TARGET_PAGE_MASK;
+    last |= ~TARGET_PAGE_MASK;
+
     /* Range may not cross a page. */
     tcg_debug_assert(((start ^ last) & TARGET_PAGE_MASK) == 0);
 
-- 
2.45.2
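
For illustration, here is a minimal standalone sketch of what the two added lines compute, assuming a 4 KiB target page (TARGET_PAGE_BITS and TARGET_PAGE_MASK are redefined locally so the example compiles on its own; in QEMU they come from the target's page definitions):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define TARGET_PAGE_BITS 12
#define TARGET_PAGE_MASK ((uint64_t)-1 << TARGET_PAGE_BITS)

int main(void)
{
    /* A 4-byte guest store at 0x1234 lands in page 0x1000..0x1fff. */
    uint64_t start = 0x1234, last = 0x1237;

    start &= TARGET_PAGE_MASK;   /* round down to page start: 0x1000 */
    last |= ~TARGET_PAGE_MASK;   /* round up to page end:     0x1fff */

    printf("invalidate [0x%" PRIx64 ", 0x%" PRIx64 "]\n", start, last);
    return 0;
}

With this rounding, the first faulting store drops every TB on the page rather than just the TBs intersecting the store, so the remaining stores of the same page-clearing loop should no longer take the notdirty slow path.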