On 3/12/20 3:10 PM, Alistair Francis wrote:
>> I still think this must be a guest (or nested guest) bug related to clearing
>> PTE bits and failing to flush the TLB properly.
>
> It think so as well now. I have changed the Linux guest and Hypervisor
> to be very aggressive with flushing but still can't get guest user
> space working. I'll keep digging and see if I can figure out what's
> going on.
>
>>
>> I don't see how it could be a qemu tlb flushing bug. The only primitive,
>> sfence.vma, is quite heavy-handed and explicitly local to the thread.
>
> Yes, both sfence and hfence flush all TLBs, so that doesn't seem to be
> the problem.

Here's an idea: change the tlb_flush() calls to tlb_flush_all_cpus_synced().
If that works, it suggests a guest interprocessor interrupt bug in the TLB
shoot-down.

r~
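To illustrate why that swap is diagnostic, here is a toy model, not QEMU code. Every name in it (ToyTlb, toy_translate, toy_flush_local, toy_flush_all, toy_remote_is_stale) is hypothetical, and it assumes a single shared PTE and two vCPUs. It only shows that a local-only flush leaves a stale entry on the other vCPU unless the guest's shoot-down IPI reaches it, whereas flushing every vCPU hides a missing or broken IPI.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_CPUS 2   /* hypothetical vCPU count for this toy model */

/* Toy per-vCPU TLB: one cached translation plus a valid bit. */
typedef struct {
    bool valid;
    unsigned long cached_pa;
} ToyTlb;

static ToyTlb tlbs[NUM_CPUS];
static unsigned long page_table_pa;   /* the single "PTE" shared by all vCPUs */

/* Translate on one vCPU, refilling its TLB from the page table on a miss. */
static unsigned long toy_translate(int cpu)
{
    assert(cpu >= 0 && cpu < NUM_CPUS);
    if (!tlbs[cpu].valid) {
        tlbs[cpu].cached_pa = page_table_pa;
        tlbs[cpu].valid = true;
    }
    return tlbs[cpu].cached_pa;
}

/* Stand-in for a flush that is local to the executing vCPU. */
static void toy_flush_local(int cpu)
{
    assert(cpu >= 0 && cpu < NUM_CPUS);
    tlbs[cpu].valid = false;
}

/* Stand-in for a flush that reaches every vCPU. */
static void toy_flush_all(void)
{
    for (int i = 0; i < NUM_CPUS; i++) {
        tlbs[i].valid = false;
    }
}

/*
 * vCPU 0 changes the PTE and flushes, but the guest "forgets" to send the
 * shoot-down IPI to vCPU 1. Returns true if vCPU 1 still sees the old
 * translation afterwards.
 */
static bool toy_remote_is_stale(bool flush_all)
{
    toy_flush_all();                 /* start from empty TLBs */
    page_table_pa = 0x1000;
    toy_translate(0);                /* both vCPUs cache the old mapping */
    toy_translate(1);

    page_table_pa = 0x2000;          /* vCPU 0 rewrites the PTE */
    if (flush_all) {
        toy_flush_all();
    } else {
        toy_flush_local(0);          /* no IPI: vCPU 1 is never flushed */
    }
    return toy_translate(1) != page_table_pa;
}
```

In this model toy_remote_is_stale(false) reports a stale entry and toy_remote_is_stale(true) does not. That is the same signal the experiment looks for: if the global flush makes the symptom disappear, suspect the guest's IPI path rather than the flush primitive.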