On 01/02/2021 05:58, Nadav Amit wrote:
>> On Jan 31, 2021, at 4:10 AM, Andrew Cooper <andrew.coop...@citrix.com> wrote:
>>
>> On 31/01/2021 01:07, Andy Lutomirski wrote:
>>> Adding Andrew Cooper, who has a distressingly extensive understanding
>>> of the x86 PTE magic.
>> Pretty sure it is all learning things the hard way...
>>
>>> On Sat, Jan 30, 2021 at 4:16 PM Nadav Amit <nadav.a...@gmail.com> wrote:
>>>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>>>> index 632d5a677d3f..b7473d2c9a1f 100644
>>>> --- a/mm/mprotect.c
>>>> +++ b/mm/mprotect.c
>>>> @@ -139,7 +139,8 @@ static unsigned long change_pte_range(struct
>>>> mmu_gather *tlb,
>>>> ptent = pte_mkwrite(ptent);
>>>> }
>>>> ptep_modify_prot_commit(vma, addr, pte, oldpte,
>>>> ptent);
>>>> - tlb_flush_pte_range(tlb, addr, PAGE_SIZE);
>>>> + if (pte_may_need_flush(oldpte, ptent))
>>>> + tlb_flush_pte_range(tlb, addr, PAGE_SIZE);
>> You're choosing to avoid the flush, based on A/D bits read ahead of the
>> actual modification of the PTE.
>>
>> In this example, another thread can write into the range (sets A and D),
>> and get a suitable TLB entry which goes unflushed while the rest of the
>> kernel thinks the memory is write-protected and clean.
>>
>> The only safe way to do this is to use XCHG/etc to modify the PTE, and
>> base flush calculations on the results. Atomic operations are ordered
>> with A/D updates from pagewalks on other CPUs, even on AMD where A
>> updates are explicitly not ordered with regular memory reads, for
>> performance reasons.
> Thanks Andrew for the feedback, but I think the patch does it exactly in
> this safe manner that you describe (at least on native x86, but I see a
> similar path elsewhere as well):
>
> oldpte = ptep_modify_prot_start()
> -> __ptep_modify_prot_start()
> -> ptep_get_and_clear
> -> native_ptep_get_and_clear()
> -> xchg()
>
> Note that the xchg() will clear the PTE (i.e., making it non-present), and
> no further updates of A/D are possible until ptep_modify_prot_commit() is
> called.
>
> On non-SMP setups this is not atomic (no xchg), but since we hold the lock,
> we should be safe.
>
> I guess you are right and a pte_may_need_flush() deserves a comment to
> clarify that oldpte must be obtained by an atomic operation to ensure no A/D
> bits are lost (as you say).
>
> Yet, I do not see a correctness problem. Am I missing something?

No(ish) - I failed to spot that path.

But native_ptep_get_and_clear() is broken on !SMP builds. It needs to be
an XCHG even in that case, to spot A/D updates from prefetch or
shared-virtual-memory DMA.

~Andrew
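
For illustration, the ordering discussed in this thread can be modelled in
userspace with C11 atomics. This is only a sketch under stated assumptions,
not kernel code: pte_slot_t, sketch_get_and_clear(), sketch_may_need_flush()
and sketch_demo() are hypothetical names, the flush policy is a simplified
stand-in rather than the patch's pte_may_need_flush(), and a concurrent
hardware page walk is imitated with atomic_fetch_or(). It shows why the flush
decision has to be based on the value returned by the exchange rather than on
an earlier plain read.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 PTE flag bits at their architectural positions. */
#define PTE_PRESENT  (UINT64_C(1) << 0)
#define PTE_RW       (UINT64_C(1) << 1)
#define PTE_ACCESSED (UINT64_C(1) << 5)
#define PTE_DIRTY    (UINT64_C(1) << 6)

/* Hypothetical stand-in for one page-table slot. */
typedef _Atomic uint64_t pte_slot_t;

/* Atomically fetch the old PTE and leave the slot non-present (zero). */
static uint64_t sketch_get_and_clear(pte_slot_t *slot)
{
    return atomic_exchange(slot, 0);
}

/*
 * Simplified, illustrative policy (an assumption of this sketch): a flush
 * may be needed if the old entry was present and accessed, and the new
 * entry removes write permission or drops the dirty bit.
 */
static bool sketch_may_need_flush(uint64_t oldpte, uint64_t newpte)
{
    if (!(oldpte & PTE_PRESENT) || !(oldpte & PTE_ACCESSED))
        return false;
    if ((oldpte & PTE_RW) && !(newpte & PTE_RW))
        return true;
    if ((oldpte & PTE_DIRTY) && !(newpte & PTE_DIRTY))
        return true;
    return false;
}

/* Returns 0 when the stale read misses a flush that the exchange catches. */
static int sketch_demo(void)
{
    pte_slot_t slot = PTE_PRESENT | PTE_RW;
    uint64_t newpte = PTE_PRESENT;              /* write-protect the page */

    /* Racy variant: flags sampled before the PTE is actually modified. */
    uint64_t stale = atomic_load(&slot);

    /* Imitates another CPU's page walk setting A and D in between. */
    atomic_fetch_or(&slot, PTE_ACCESSED | PTE_DIRTY);

    /* Safe variant: the exchange returns the final A/D state. */
    uint64_t oldpte = sketch_get_and_clear(&slot);

    /* The slot stays non-present until the new value is committed. */
    assert(atomic_load(&slot) == 0);
    atomic_store(&slot, newpte);

    bool stale_flush = sketch_may_need_flush(stale, newpte);
    bool xchg_flush = sketch_may_need_flush(oldpte, newpte);
    return (!stale_flush && xchg_flush) ? 0 : 1;
}
```

In this model the stale read sees a clean, never-accessed entry and would skip
the flush, while the value returned by the exchange carries the A and D bits
and requests it.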