On Tue, Nov 10, 2020 at 07:44:09PM -0400, Jason Gunthorpe wrote: > Since commit 70e806e4e645 ("mm: Do early cow for pinned pages during > fork() for ptes") pages under a FOLL_PIN will not be write protected > during COW for fork. This means that pages returned from > pin_user_pages(FOLL_WRITE) should not become write protected while the pin > is active. > > However, there is a small race where get_user_pages_fast(FOLL_PIN) can > establish a FOLL_PIN at the same time copy_present_page() is write > protecting it: > > CPU 0 CPU 1 > get_user_pages_fast() > internal_get_user_pages_fast() > copy_page_range() > pte_alloc_map_lock() > copy_present_page() > atomic_read(has_pinned) == 0 > page_maybe_dma_pinned() == false > atomic_set(has_pinned, 1); > gup_pgd_range() > gup_pte_range() > pte_t pte = gup_get_pte(ptep) > pte_access_permitted(pte) > try_grab_compound_head() > pte = pte_wrprotect(pte) > set_pte_at(); > pte_unmap_unlock() > // GUP now returns with a write protected page > > The first attempt to resolve this by using the write protect caused > problems (and was missing a barrrier), see commit f3c64eda3e50 ("mm: avoid > early COW write protect games during fork()") > > Instead wrap copy_p4d_range() with the write side of a seqcount and check > the read side around gup_pgd_range(). If there is a collision then > get_user_pages_fast() fails and falls back to slow GUP. > > Slow GUP is safe against this race because copy_page_range() is only > called while holding the exclusive side of the mmap_lock on the src > mm_struct. > > Fixes: f3c64eda3e50 ("mm: avoid early COW write protect games during fork()") > Suggested-by: Linus Torvalds <torva...@linux-foundation.org> > Link: > https://lore.kernel.org/r/CAHk-=wi=icnycarbpgjkvju9eyyez13n64tzyldob8cp5q_...@mail.gmail.com > Reviewed-by: John Hubbard <jhubb...@nvidia.com> > Reviewed-by: Jan Kara <j...@suse.cz> > Signed-off-by: Jason Gunthorpe <j...@nvidia.com>
Reviewed-by: Peter Xu <pet...@redhat.com> -- Peter Xu