On Mon, Sep 21, 2020 at 11:41:16PM -0700, John Hubbard wrote:
> On 9/21/20 2:20 PM, Peter Xu wrote:
> ...
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 7ff29cc3d55c..c40aac0ad87e 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -1074,6 +1074,23 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> >  	src_page = pmd_page(pmd);
> >  	VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
> > +
> > +	/*
> > +	 * If this page is a potentially pinned page, split and retry the fault
> > +	 * with smaller page size. Normally this should not happen because the
> > +	 * userspace should use MADV_DONTFORK upon pinned regions. This is a
> > +	 * best effort that the pinned pages won't be replaced by another
> > +	 * random page during the coming copy-on-write.
> > +	 */
> > +	if (unlikely(READ_ONCE(src_mm->has_pinned) &&
> > +		     page_maybe_dma_pinned(src_page))) {
[...]
> > +		pte_free(dst_mm, pgtable);
> > +		spin_unlock(src_ptl);
> > +		spin_unlock(dst_ptl);
> > +		__split_huge_pmd(vma, src_pmd, addr, false, NULL);
> > +		return -EAGAIN;
> > +	}
>
>
> Why wait until we are so deep into this routine to detect this and unwind?
> It seems like if you could do a check near the beginning of this routine, and
> handle it there, with less unwinding? In fact, after taking only the src_ptl,
> the check could be made, right?

Because this is where we have fetched the page from the pmd, so I can
reference src_page directly. Also, I think I'd at least need to check for
swp entries first? So it still seems easier to keep the check here,
considering it's an unlikely path.

Thanks,

-- 
Peter Xu
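A simplified userspace model of the ordering argued for above: the pinned test needs a page, and there is only a page to inspect once swap entries and non-huge pmds have been ruled out. This is a sketch, not kernel code; `struct model_pmd`, `model_copy_huge_pmd()` and `MODEL_EAGAIN` are hypothetical names, and having the non-huge case also return -EAGAIN is an assumption based on how mm/huge_memory.c handled that case around this time, not something stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified userspace model of the check ordering in copy_huge_pmd().
 * NOT kernel code: every type, constant and function here is a
 * hypothetical stand-in chosen for illustration.
 */
enum model_pmd_kind {
	MODEL_PMD_NONE,  /* empty, or already split into a page table */
	MODEL_PMD_SWAP,  /* swap/migration entry: no struct page to inspect */
	MODEL_PMD_HUGE,  /* present huge pmd mapping a compound page */
};

struct model_pmd {
	enum model_pmd_kind kind;
	bool page_maybe_pinned; /* stand-in for page_maybe_dma_pinned() */
	bool was_split;         /* set when the model "splits" the pmd */
};

#define MODEL_EAGAIN 11 /* hypothetical stand-in for EAGAIN */

/*
 * Returns 0 when the entry was handled at pmd level, or -MODEL_EAGAIN
 * when the caller should fall back to copying at PTE granularity.
 */
static int model_copy_huge_pmd(struct model_pmd *src, bool mm_has_pinned)
{
	assert(src != 0);

	/* (locks would be taken here) */

	/* 1. Swap entries carry no page, so they are handled first. */
	if (src->kind == MODEL_PMD_SWAP)
		return 0;

	/* 2. Not a present huge pmd (e.g. raced with a split): bail out. */
	if (src->kind != MODEL_PMD_HUGE)
		return -MODEL_EAGAIN;

	/* 3. Only now is there a page to look at: the pinned check. */
	if (mm_has_pinned && src->page_maybe_pinned) {
		/* Unwind (free prealloc, drop locks), then split the pmd. */
		src->kind = MODEL_PMD_NONE;
		src->was_split = true;
		return -MODEL_EAGAIN;
	}

	/* 4. Normal path: share the huge page between parent and child. */
	return 0;
}
```

For example, a huge pmd whose page looks pinned, in an mm with has_pinned set, comes back as -MODEL_EAGAIN with was_split set, while the same entry in an mm that never pinned anything is handled at pmd level. Moving step 3 before steps 1 and 2 would mean inspecting a page that may not exist.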