On Thu, May 30, 2019 at 11:37 PM Nadav Amit <na...@vmware.com> wrote:
>
> When we flush userspace mappings, we can defer the TLB flushes, as long
> the following conditions are met:
>
> 1. No tables are freed, since otherwise speculative page walks might
> cause machine-checks.
>
> 2. No one would access userspace before flush takes place. Specifically,
> NMI handlers and kprobes would avoid accessing userspace.
>

I think I need to ask the big-picture question. When someone calls
flush_tlb_mm_range() (or one of the other entry points), and no page
tables were freed, they want the guarantee that future accesses
(initiated observably after the flush returns) will not use paging
entries that were replaced by stores ordered before
flush_tlb_mm_range(). We also need the guarantee that any effects from
any memory access using the old paging entries will become globally
visible before flush_tlb_mm_range() returns.

I'm wondering if receipt of an IPI is enough to guarantee any of this.
If CPU 1 sets a dirty bit and CPU 2 writes to the APIC to send an IPI to
CPU 1, at what point is CPU 2 guaranteed to be able to observe the dirty
bit? An interrupt entry today is fully serializing by the time it
finishes, but interrupt entries are epically slow, and I don't know if
the APIC waits long enough. Heck, what if IRQs are off on the remote
CPU? There are a handful of places where we touch user memory with IRQs
off, and it's (sadly) possible for user code to turn off IRQs with
iopl().

I *think* that Intel has stated recently that SMT siblings are
guaranteed to stop speculating when you write to the APIC ICR to poke
them, but SMT is very special.

My general conclusion is that I think the code needs to document what is
guaranteed and why.

--Andy