On 30 Apr 20:21, David Hildenbrand wrote:
> Sorry for not replying earlier, was busy with other stuff. I'll try getting
> that stuff into shape and send it out soonish.
No worries. Let me know what you think of the FOLL_FORCE patch when you
have a sec.

> > I went with using one write uprobe function with some additional
> > branches. I went back and forth between that and making them 2 different
> > functions.
>
> All the folio_test_hugetlb() special casing is a bit suboptimal. Likely we
> want a separate variant, because we should be using hugetlb PTE functions
> consistently (e.g., huge_pte_uffd_wp() vs pte_uffd_wp(), softdirty does not
> exist etc.)

Ok, I'll give this a whirl and send something probably tomorrow.

> > diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> > index 2f4e88552d3f..8a33e380f7ea 100644
> > --- a/fs/hugetlbfs/inode.c
> > +++ b/fs/hugetlbfs/inode.c
> > @@ -83,6 +83,10 @@ static const struct fs_parameter_spec hugetlb_fs_parameters[] = {
> > 	{}
> > };
> > +bool hugetlbfs_mapping(struct address_space *mapping) {
> > +	return mapping->a_ops == &hugetlbfs_aops;
>
> is_vm_hugetlb_page() might be what you are looking for.

I use hugetlbfs_mapping() in __uprobe_register(), which does an early
return based only on the mapping when it's not supported. There is no
vma that I can get at this point (afaict). I could refactor so we check
this once we have a vma, but it looked cleaner to introduce it since
there is already shmem_mapping().

> > }
> > -static void copy_from_page(struct page *page, unsigned long vaddr, void *dst, int len)
> > +static void copy_from_page(struct page *page, unsigned long vaddr, void *dst, int len, unsigned long page_mask)
> > {
> > 	void *kaddr = kmap_atomic(page);
> > -	memcpy(dst, kaddr + (vaddr & ~PAGE_MASK), len);
> > +	memcpy(dst, kaddr + (vaddr & ~page_mask), len);
> > 	kunmap_atomic(kaddr);
> > }
> >
> > -static void copy_to_page(struct page *page, unsigned long vaddr, const void *src, int len)
> > +static void copy_to_page(struct page *page, unsigned long vaddr, const void *src, int len, unsigned long page_mask)
> > {
> > 	void *kaddr = kmap_atomic(page);
> > -	memcpy(kaddr + (vaddr & ~PAGE_MASK), src, len);
> > +	memcpy(kaddr + (vaddr & ~page_mask), src, len);
> > 	kunmap_atomic(kaddr);
> > }
>
> These two changes really are rather ugly ...
>
> And why are they even required? We get a PAGE_SIZE-based subpage of a
> hugetlb page. We only kmap that one and copy within that one.
>
> In other words, I don't think the copy_from_page() and copy_to_page()
> changes are even required when we consistently work on subpages and not
> suddenly on head pages.

The main reason is that the previous __replace_page worked directly on
the full huge page, so adjusting after gup seemed to make more sense to
me. But now I guess it's not that useful (especially since we're going
with a different version of write_uprobe). I'll fix it.

(...)

> > {
> > 	struct uwo_data *data = walk->private;;
> > 	const bool is_register = !!is_swbp_insn(&data->opcode);
> > @@ -415,9 +417,12 @@ static int __write_opcode_pte(pte_t *ptep, unsigned long vaddr,
> > 	/* Unmap + flush the TLB, such that we can write atomically .*/
> > 	flush_cache_page(vma, vaddr, pte_pfn(pte));
> > -	pte = ptep_clear_flush(vma, vaddr, ptep);
> > +	if (folio_test_hugetlb(folio))
> > +		pte = huge_ptep_clear_flush(vma, vaddr, ptep);
> > +	else
> > +		pte = ptep_clear_flush(vma, vaddr, ptep);
> > 	copy_to_page(page, data->opcode_vaddr, &data->opcode,
> > -		     UPROBE_SWBP_INSN_SIZE);
> > +		     UPROBE_SWBP_INSN_SIZE, page_mask);
> > 	/* When unregistering, we may only zap a PTE if uffd is disabled ...
> > 	 */
> > 	if (is_register || userfaultfd_missing(vma))
> > @@ -443,13 +448,18 @@ static int __write_opcode_pte(pte_t *ptep, unsigned long vaddr,
> > 	if (!identical || folio_maybe_dma_pinned(folio))
> > 		goto remap;
> > -	/* Zap it and try to reclaim swap space. */
> > -	dec_mm_counter(mm, MM_ANONPAGES);
> > -	folio_remove_rmap_pte(folio, page, vma);
> > -	if (!folio_mapped(folio) && folio_test_swapcache(folio) &&
> > -	    folio_trylock(folio)) {
> > -		folio_free_swap(folio);
> > -		folio_unlock(folio);
> > +	if (folio_test_hugetlb(folio)) {
> > +		hugetlb_remove_rmap(folio);
> > +		large = false;
> > +	} else {
> > +		/* Zap it and try to reclaim swap space. */
> > +		dec_mm_counter(mm, MM_ANONPAGES);
> > +		folio_remove_rmap_pte(folio, page, vma);
> > +		if (!folio_mapped(folio) && folio_test_swapcache(folio) &&
> > +		    folio_trylock(folio)) {
> > +			folio_free_swap(folio);
> > +			folio_unlock(folio);
> > +		}
> > 	}
> > 	folio_put(folio);
> > @@ -461,11 +471,29 @@ static int __write_opcode_pte(pte_t *ptep, unsigned long vaddr,
> > 	 */
> > 	smp_wmb();
> > 	/* We modified the page. Make sure to mark the PTE dirty. */
> > -	set_pte_at(mm, vaddr, ptep, pte_mkdirty(pte));
> > +	if (folio_test_hugetlb(folio))
> > +		set_huge_pte_at(mm, vaddr, ptep, huge_pte_mkdirty(pte),
> > +				(~page_mask) + 1);
> > +	else
> > +		set_pte_at(mm, vaddr, ptep, pte_mkdirty(pte));
> > 	return UWO_DONE;
> > }
> > +static int __write_opcode_hugetlb(pte_t *ptep, unsigned long hmask,
> > +				  unsigned long vaddr,
> > +				  unsigned long next, struct mm_walk *walk)
> > +{
> > +	return __write_opcode(ptep, vaddr, hmask, walk);
> > +}
> > +
> > +static int __write_opcode_pte(pte_t *ptep, unsigned long vaddr,
> > +			      unsigned long next, struct mm_walk *walk)
> > +{
> > +	return __write_opcode(ptep, vaddr, PAGE_MASK, walk);
> > +}
> > +
> > static const struct mm_walk_ops write_opcode_ops = {
> > +	.hugetlb_entry	= __write_opcode_hugetlb,
> > 	.pte_entry	= __write_opcode_pte,
> > 	.walk_lock	= PGWALK_WRLOCK,
> > };
> > @@ -492,7 +520,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
> > 		unsigned long opcode_vaddr, uprobe_opcode_t opcode)
> > {
> > 	struct uprobe *uprobe = container_of(auprobe, struct uprobe, arch);
> > -	const unsigned long vaddr = opcode_vaddr & PAGE_MASK;
> > +	unsigned long vaddr = opcode_vaddr & PAGE_MASK;
> > 	const bool is_register = !!is_swbp_insn(&opcode);
> > 	struct uwo_data data = {
> > 		.opcode = opcode,
> > @@ -503,6 +531,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
> > 	struct mmu_notifier_range range;
> > 	int ret, ref_ctr_updated = 0;
> > 	struct page *page;
> > +	unsigned long page_size = PAGE_SIZE;
> > 	if (WARN_ON_ONCE(!is_cow_mapping(vma->vm_flags)))
> > 		return -EINVAL;
> > @@ -521,7 +550,14 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct vm_area_struct *vma,
> > 	if (ret != 1)
> > 		goto out;
> > -	ret = verify_opcode(page, opcode_vaddr, &opcode);
> > +
> > +	if (is_vm_hugetlb_page(vma)) {
> > +		struct hstate *h = hstate_vma(vma);
> > +		page_size = huge_page_size(h);
> > +		vaddr &= huge_page_mask(h);
> > +		page = compound_head(page);
>
> I think we should only adjust the range we pass to the mmu notifier and for
> walking the VMA range. But we should not adjust vaddr.
>
> Further, we should not adjust the page if possible ... ideally, we'll treat
> hugetlb folios just like large folios here and operate on subpages.
>
> Inside __write_opcode(), we can derive the page of interest from
> data->opcode_vaddr.
Here you mean __write_opcode_hugetlb(), right? Since we're going with
the two independent variants. Just want to be 100% sure I am following.

> find_get_page() might need some thought, if it won't return a subpage of a
> hugetlb folio. Should be solvable by a wrapper, though.

We can zero out the sub-huge-page bits with the huge page mask in
vaddr_to_offset() in the hugetlb variant, like I do in __copy_insn(), and
that should work, no? Or would you prefer a wrapper? (Rough, untested
sketch below.)

Guillaume.

-- 
Guillaume Morin <guilla...@morinfr.org>
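P.S. A rough, untested sketch of the vaddr_to_offset() masking idea above,
assuming a hugetlb-only variant (the helper name is just for illustration,
and it assumes <linux/hugetlb.h> is available in uprobes.c):

static loff_t hugetlb_vaddr_to_offset(struct vm_area_struct *vma,
				      unsigned long vaddr)
{
	/* Mask off the sub-huge-page bits so the resulting file offset
	 * is aligned to the start of the hugetlb folio. */
	unsigned long hmask = huge_page_mask(hstate_vma(vma));

	return ((loff_t)vma->vm_pgoff << PAGE_SHIFT) +
	       ((vaddr & hmask) - vma->vm_start);
}

Whether that is enough for the find_get_page() path, or whether a wrapper
is still cleaner, is exactly the open question above.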