On 25 Apr 21:56, David Hildenbrand wrote:
> On 25.04.24 17:19, Guillaume Morin wrote:
> > On 24 Apr 23:00, David Hildenbrand wrote:
> > > > One issue here is that FOLL_FORCE|FOLL_WRITE is not implemented for
> > > > hugetlb mappings. However, this was also on my TODO and I have a
> > > > draft patch that implements it.
> > >
> > > Yes, I documented it back then and added sanity checks in GUP code to
> > > fence it off. Shouldn't be too hard to implement (famous last words)
> > > and would be the cleaner thing to use here once I manage to switch
> > > over to FOLL_WRITE|FOLL_FORCE to break COW.
> >
> > Yes, my patch seems to be working. The hugetlb code is pretty simple,
> > and it allows ptrace and the /proc/<pid>/mem file to work on private
> > executable hugetlb mappings.
> >
> > There is one thing I am unclear about, though: hugetlb enforces that
> > huge_pte_write() is true on FOLL_WRITE in both the fault and
> > follow_page_mask paths. I am not sure whether we can simply assume in
> > the hugetlb code that if the pte is not writable and this is a write
> > fault, then we are in the FOLL_FORCE|FOLL_WRITE case. Or do we want to
> > keep the checks and simply not enforce them for FOLL_FORCE|FOLL_WRITE?
> >
> > The latter is more complicated in the fault path because there is no
> > FAULT_FLAG_FORCE flag.
>
> I just pushed something to
> https://github.com/davidhildenbrand/linux/tree/uprobes_cow
>
> Only very lightly tested so far. Expect the worst :)

I'll try it out and send you the hugetlb bits.

> I still detest having the zapping logic there, but to get it all right
> I don't see a clean way around that.
>
> For hugetlb, we'd primarily have to implement the
> mm_walk_ops->hugetlb_entry() callback (well, and FOLL_FORCE).

For FOLL_FORCE, here is my draft. Let me know if this is what you had in
mind.

diff --git a/mm/gup.c b/mm/gup.c
index 1611e73b1121..ac60e0ae64e8 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1056,9 +1056,6 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 	if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) {
 		if (!(gup_flags & FOLL_FORCE))
 			return -EFAULT;
-		/* hugetlb does not support FOLL_FORCE|FOLL_WRITE. */
-		if (is_vm_hugetlb_page(vma))
-			return -EFAULT;
 		/*
 		 * We used to let the write,force case do COW in a
 		 * VM_MAYWRITE VM_SHARED !VM_WRITE vma, so ptrace could
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3548eae42cf9..73f86eddf888 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5941,7 +5941,8 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 		       struct folio *pagecache_folio, spinlock_t *ptl,
 		       struct vm_fault *vmf)
 {
-	const bool unshare = flags & FAULT_FLAG_UNSHARE;
+	const bool make_writable = !(flags & FAULT_FLAG_UNSHARE) &&
+				   (vma->vm_flags & VM_WRITE);
 	pte_t pte = huge_ptep_get(ptep);
 	struct hstate *h = hstate_vma(vma);
 	struct folio *old_folio;
@@ -5959,16 +5960,9 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * can trigger this, because hugetlb_fault() will always resolve
 	 * uffd-wp bit first.
 	 */
-	if (!unshare && huge_pte_uffd_wp(pte))
+	if (make_writable && huge_pte_uffd_wp(pte))
 		return 0;
 
-	/*
-	 * hugetlb does not support FOLL_FORCE-style write faults that keep the
-	 * PTE mapped R/O such as maybe_mkwrite() would do.
-	 */
-	if (WARN_ON_ONCE(!unshare && !(vma->vm_flags & VM_WRITE)))
-		return VM_FAULT_SIGSEGV;
-
 	/* Let's take out MAP_SHARED mappings first. */
 	if (vma->vm_flags & VM_MAYSHARE) {
 		set_huge_ptep_writable(vma, haddr, ptep);
@@ -5989,7 +5983,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 			folio_move_anon_rmap(old_folio, vma);
 			SetPageAnonExclusive(&old_folio->page);
 		}
-		if (likely(!unshare))
+		if (likely(make_writable))
 			set_huge_ptep_writable(vma, haddr, ptep);
 
 		delayacct_wpcopy_end();
@@ -6094,7 +6088,8 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
 	spin_lock(ptl);
 	ptep = hugetlb_walk(vma, haddr, huge_page_size(h));
 	if (likely(ptep && pte_same(huge_ptep_get(ptep), pte))) {
-		pte_t newpte = make_huge_pte(vma, &new_folio->page, !unshare);
+		pte_t newpte = make_huge_pte(vma, &new_folio->page,
+					     make_writable);
 
 		/* Break COW or unshare */
 		huge_ptep_clear_flush(vma, haddr, ptep);
@@ -6883,6 +6878,18 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
 }
 #endif /* CONFIG_USERFAULTFD */
 
+static bool is_force_follow(struct vm_area_struct *vma, unsigned int flags,
+			    struct page *page)
+{
+	if (vma->vm_flags & VM_WRITE)
+		return false;
+
+	if (!(flags & FOLL_FORCE))
+		return false;
+
+	return page && PageAnon(page) && page_mapcount(page) == 1;
+}
+
 struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 				      unsigned long address, unsigned int flags,
 				      unsigned int *page_mask)
@@ -6907,11 +6913,11 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 
 	if (!huge_pte_write(entry)) {
 		if (flags & FOLL_WRITE) {
-			page = NULL;
-			goto out;
-		}
-
-		if (gup_must_unshare(vma, flags, page)) {
+			if (!is_force_follow(vma, flags, page)) {
+				page = NULL;
+				goto out;
+			}
+		} else if (gup_must_unshare(vma, flags, page)) {
 			/* Tell the caller to do unsharing */
 			page = ERR_PTR(-EMLINK);
 			goto out;

> Likely vaddr and PAGE_SIZE in uprobe_write_opcode() would have to be
> expanded to cover the full hugetlb page.
>
> --
> Cheers,
>
> David / dhildenb

--
Guillaume Morin <guilla...@morinfr.org>