On Wed, Aug 02, 2017 at 10:45:51AM +0200, Peter Zijlstra wrote:
> On Wed, Aug 02, 2017 at 09:15:23AM +0100, Will Deacon wrote:
> > On Wed, Aug 02, 2017 at 10:11:06AM +0200, Peter Zijlstra wrote:
> > > arm64 looks good too, although it plays silly games with the first
> > > barrier, but I trust that to be sufficient.
> >
> > The first barrier only orders prior stores for us, because page table
> > updates are made using stores. A prior load could be reordered past the
> > invalidation, but can't make it past the second barrier.
>
> So then you rely on the program not having any loads pending to the
> address you're about to invalidate, right? Otherwise we can do the TLBI
> and then the load to insta-repopulate the TLB entry you just wanted
> dead.
>
> That later DSB ISH is too late for that.
>
> Isn't that somewhat fragile?
We only initiate the TLB invalidation after the page table update is
observable to the page table walker, so any repopulation will cause a fill
using the new page table entry.

> > I really think we should avoid defining TLB invalidation in terms of
> > smp_mb() because it's a lot more subtle than that.
>
> I'm tempted to say stronger, smp_mb() only provides order, we want full
> serialization. Everything before stays before and _completes_ before.
> Everything after happens after (if the primitives actually do something
> at all of course, sparc64 for instance has no-op flush_tlb*).
>
> While such semantics might be slightly too strong for what we currently
> need, it is what powerpc, x86 and arm currently implement and are fairly
> easy to reason about. If we weaken it, stuff gets confusing again.

My problem with this is that we're strengthening the semantics for no
actual use-case, but at the same time this will have a real performance
impact.

Will