Re: [RFC PATCH v2 0/2] mm: fix races due to deferred TLB flushes

2021-03-02 Thread Peter Xu
On Fri, Dec 25, 2020 at 01:25:27AM -0800, Nadav Amit wrote:
> From: Nadav Amit 
> 
> This patch-set went from v1 to RFCv2, as there is still an ongoing
> discussion regarding the way of solving the recently found races due to
> deferred TLB flushes. These patches are only sent for reference for now,
> and can be applied later if no better solution is taken.
> 
> In a nutshell, write-protecting PTEs with deferred TLB flushes was mostly
> performed while holding mmap_lock for write. This prevented concurrent
> page-fault handler invocations from mistakenly assuming that a page is
> write-protected when in fact, due to the deferred TLB flush, another CPU
> could still write to the page. Such a write can cause memory
> corruption if it takes place after the page was copied (in
> cow_user_page()), and before the PTE was flushed (by wp_page_copy()).
> 
> However, the userfaultfd and soft-dirty mechanisms did not take
> mmap_lock for write, but only for read, which made such races possible.
> Since commit 09854ba94c6a ("mm: do_wp_page() simplification") these
> races became more likely to take place as non-COW'd pages are more
> likely to be COW'd instead of being reused. Both of the races that
> these patches are intended to resolve were produced on v5.10.
> 
> To avoid the performance overhead, some alternative solutions that do not
> require acquiring mmap_lock for write were proposed, specifically for
> userfaultfd. So far no better solution that can be backported was
> proposed for the soft-dirty case.
> 
> v1->RFCv2:
> - Better (i.e., correct) description of the userfaultfd buggy case [Yu]
> - Patch for the soft-dirty case
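
The interleaving described in the cover letter can be illustrated with a
small model. This is only a toy, single-threaded sketch and not kernel code:
the Page, Pte and Cpu classes and the simulate() helper are hypothetical
names invented for illustration, and the one-entry TLB is a simplifying
assumption. Only the step names cow_user_page() and wp_page_copy() come from
the text above.

```python
from dataclasses import dataclass


@dataclass
class Page:
    data: str


@dataclass
class Pte:
    page: Page
    writable: bool


class Cpu:
    """Hypothetical CPU with a one-entry TLB caching (page, writable)."""

    def __init__(self):
        self.tlb = None

    def write(self, pte, value):
        if self.tlb is None:
            # TLB miss: walk the page table and cache the translation.
            self.tlb = (pte.page, pte.writable)
        page, writable = self.tlb
        if not writable:
            return False  # a real CPU would take a write-protect fault here
        page.data = value
        return True

    def flush(self):
        self.tlb = None


def simulate(flush_before_copy):
    old = Page("A")
    pte = Pte(old, writable=True)
    cpu0 = Cpu()
    cpu0.write(pte, "A")  # cpu0 now caches a writable translation

    # Write-protect the PTE; the TLB flush is deferred unless requested.
    pte.writable = False
    if flush_before_copy:
        cpu0.flush()

    # A fault handler on another CPU sees the write-protected PTE and
    # copies the page (the cow_user_page() step).
    new = Page(old.data)

    # cpu0 writes after the copy, possibly through its stale TLB entry.
    landed = cpu0.write(pte, "B")

    # The handler installs the copy and flushes (the wp_page_copy() step).
    pte.page, pte.writable = new, True
    cpu0.flush()

    return landed, old.data, new.data


if __name__ == "__main__":
    print(simulate(flush_before_copy=False))
    print(simulate(flush_before_copy=True))
```

With the flush deferred, the write from cpu0 lands on the old page after the
copy was taken, so the page that ends up mapped never sees it. With the flush
done before the copy, the same write is refused in this model and nothing is
lost.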

Nadav,

Do you plan to post a new version to fix the TLB corruption issue that this
series wanted to solve?

Thanks,

-- 
Peter Xu



Re: [RFC PATCH v2 0/2] mm: fix races due to deferred TLB flushes

2021-03-02 Thread Nadav Amit


> On Mar 2, 2021, at 2:13 PM, Peter Xu wrote:
> 
> Nadav,
> 
> Do you plan to post a new version to fix the TLB corruption issue that this
> series wanted to solve?

Yes, yes. Sorry for that. Will do so later today.

Regards,
Nadav

