* Jeff Xu <jef...@chromium.org> [240821 12:33]:
> On Wed, Aug 21, 2024 at 9:24 AM Pedro Falcato <pedro.falc...@gmail.com> wrote:
> >
> > On Wed, Aug 21, 2024 at 5:16 PM Jeff Xu <jef...@chromium.org> wrote:
> > >
> > > On Fri, Aug 16, 2024 at 5:18 PM Pedro Falcato <pedro.falc...@gmail.com>
> > > wrote:
> > > >
> > > > We were doing an extra mmap tree traversal just to check if the entire
> > > > range is modifiable. This can be done when we iterate through the VMAs
> > > > instead.
> > > >
> > > > Signed-off-by: Pedro Falcato <pedro.falc...@gmail.com>
> > > > ---
> > > >  mm/mmap.c | 11 +----------
> > > >  mm/vma.c  | 19 ++++++++++++-------
> > > >  2 files changed, 13 insertions(+), 17 deletions(-)
> > > >
> > > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > > index 3af256bacef3..30ae4cb5cec9 100644
> > > > --- a/mm/mmap.c
> > > > +++ b/mm/mmap.c
> > > > @@ -1740,16 +1740,7 @@ int do_vma_munmap(struct vma_iterator *vmi,
> > > > struct vm_area_struct *vma,
> > > >                 unsigned long start, unsigned long end, struct
> > > > list_head *uf,
> > > >                 bool unlock)
> > > >  {
> > > > -       struct mm_struct *mm = vma->vm_mm;
> > > > -
> > > > -       /*
> > > > -        * Check if memory is sealed, prevent unmapping a sealed VMA.
> > > > -        * can_modify_mm assumes we have acquired the lock on MM.
> > > > -        */
> > > > -       if (unlikely(!can_modify_mm(mm, start, end)))
> > > > -               return -EPERM;
> > > Another approach to improve perf is to clone the vmi (since it
> > > already point to the first vma), and pass the cloned vmi/vma into
> > > can_modify_mm check, that will remove the cost of re-finding the first
> > > VMA.
> > >
> > > The can_modify_mm then continues from cloned VMI/vma till the end of
> > > address range, there will be some perf cost there. However, most
> > > address ranges in the real world are within a single VMA, in
> > > practice, the perf cost is the same as checking the single VMA, 99.9%
> > > case.
> > >
> > > This will help preserve the nice sealing feature (if one of the vma is
> > > sealed, the entire address range is not modified)
> >
> > Please drop it. No one wants to preserve this. Everyone is in sync
> > when it comes to the solution except you.
>
> Still, this is another option that will very likely address the perf issue.
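A minimal userspace sketch of the iterator-cloning alternative quoted above may help compare the costs. Everything in it is hypothetical: struct sim_vmi stands in for struct vma_iterator, struct sim_vma and SIM_VM_SEALED stand in for vm_area_struct and the sealed flag, and the VMAs sit in a sorted array rather than the maple tree. Passing the iterator by value plays the role of the "clone": it avoids re-finding the first VMA, but the pre-check is still a separate walk over the range before the real unmap loop begins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's sealed-VMA flag. */
#define SIM_VM_SEALED 0x1UL

/* Hypothetical VMA: covers [start, end). */
struct sim_vma {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

/* Hypothetical iterator over a sorted VMA array; copying it is a plain struct copy. */
struct sim_vmi {
	const struct sim_vma *vmas;
	size_t nr;
	size_t pos; /* index of the next VMA to return */
};

/* Return the next VMA that starts below end, or NULL once the range is exhausted. */
static const struct sim_vma *sim_vmi_next(struct sim_vmi *vmi, unsigned long end)
{
	if (vmi->pos >= vmi->nr || vmi->vmas[vmi->pos].start >= end)
		return NULL;
	return &vmi->vmas[vmi->pos++];
}

/*
 * Pre-check on a cloned iterator: 'clone' is passed by value, so the walk
 * starts at the caller's current VMA (no re-lookup) and leaves the caller's
 * iterator untouched. It is still a second pass over the same VMAs.
 */
static bool sim_can_modify_from(struct sim_vmi clone, unsigned long end)
{
	const struct sim_vma *vma;

	assert(clone.pos <= clone.nr);
	while ((vma = sim_vmi_next(&clone, end)) != NULL) {
		if (vma->flags & SIM_VM_SEALED)
			return false;
	}
	return true;
}
```

With an unsealed VMA at [0x1000, 0x3000) followed by a sealed one at [0x3000, 0x5000) and the iterator at index 0, sim_can_modify_from(vmi, 0x3000) returns true, sim_can_modify_from(vmi, 0x5000) returns false, and vmi.pos is still 0 afterwards, so the caller's iterator is ready for the unmap loop that follows.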

The cost of cloning the vmi is a memory copy, while the cost of not
cloning the vmi is a re-walk of the tree. Neither of these is free.
Both can be avoided by checking the vma flag during the existing loop,
which is what is done in this patch set. This is obviously lower cost
than either of the above options, since they do extra work and still
have to check the vma flag(s).

I think you are confusing the behaviour of the munmap() call when you
state 'partial success' with a potential split operation that may
happen prior to aborting a munmap() call. What could happen in the
failure scenario is that you'd end up with two vmas instead of one
mapping a particular area - but the mseal flag is checked prior to
allowing a split to happen, so it'll only split non-mseal()'ed vmas.

So what mseal() used to avoid is a call that could potentially split a
vma that isn't mseal()'ed, while this patch will allow it to be split.
This is how it has been for a very long time and it's okay.

Considering how this affects the security model of mseal(), it means
the attacker could still accomplish the same feat of splitting that
first vma by altering the call to munmap() to avoid an mseal()'ed vma,
so there isn't much lost or gained here security-wise - if any.

Thanks,
Liam
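The two points above, folding the seal check into the walk munmap() already performs and a split surviving an aborted call, can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the kernel code: struct sim_mm, struct sim_vma, SIM_VM_SEALED, sim_split() and sim_munmap() are hypothetical stand-ins for the mm, vm_area_struct, the sealed flag and the split/munmap paths, and the VMAs live in a sorted fixed-size array rather than the maple tree.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the sealed flag and a small, fixed-size address space. */
#define SIM_VM_SEALED 0x1UL
#define SIM_MAX_VMAS 16

struct sim_vma {
	unsigned long start; /* VMA covers [start, end) */
	unsigned long end;
	unsigned long flags;
};

struct sim_mm {
	struct sim_vma vmas[SIM_MAX_VMAS]; /* sorted, non-overlapping */
	size_t nr;
};

/*
 * Split vmas[i] at addr into [start, addr) and [addr, end). The seal flag is
 * checked before anything is modified, so a sealed VMA is never split.
 */
static int sim_split(struct sim_mm *mm, size_t i, unsigned long addr)
{
	if (mm->vmas[i].flags & SIM_VM_SEALED)
		return -EPERM;
	if (mm->nr == SIM_MAX_VMAS)
		return -ENOMEM;
	memmove(&mm->vmas[i + 1], &mm->vmas[i], (mm->nr - i) * sizeof(mm->vmas[0]));
	mm->nr++;
	mm->vmas[i].end = addr;
	mm->vmas[i + 1].start = addr;
	return 0;
}

/* Unmap [start, end) in one pass, checking the seal flag as each VMA is visited. */
static int sim_munmap(struct sim_mm *mm, unsigned long start, unsigned long end)
{
	size_t i = 0, first;
	int err;

	assert(start < end);
	while (i < mm->nr && mm->vmas[i].end <= start)
		i++; /* skip VMAs entirely below the range */

	/* Range starts inside vmas[i]: split off the part below start. */
	if (i < mm->nr && mm->vmas[i].start < start) {
		err = sim_split(mm, i, start);
		if (err)
			return err;
		i++; /* vmas[i] now begins exactly at start */
	}
	first = i;

	for (; i < mm->nr && mm->vmas[i].start < end; i++) {
		/* No pre-walk and no cloned iterator: the check rides on this loop. */
		if (mm->vmas[i].flags & SIM_VM_SEALED)
			return -EPERM; /* the split at start, if any, is NOT undone */
		if (mm->vmas[i].end > end) {
			err = sim_split(mm, i, end);
			if (err)
				return err;
		}
	}

	/* Every VMA in vmas[first, i) is modifiable: drop them. */
	if (i < mm->nr)
		memmove(&mm->vmas[first], &mm->vmas[i], (mm->nr - i) * sizeof(mm->vmas[0]));
	mm->nr -= i - first;
	return 0;
}
```

With an unsealed VMA at [0x1000, 0x4000) followed by a sealed VMA at [0x4000, 0x6000), sim_munmap(&mm, 0x2000, 0x6000) returns -EPERM, yet the first VMA has already been split at 0x2000, so the model holds three VMAs instead of two; nothing is unmapped and the sealed VMA is left whole. That is the "two vmas instead of one" outcome described above.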