Re: [PATCH 09/41] mm: rcu safe VMA freeing

2023-01-17 Thread Suren Baghdasaryan
On Tue, Jan 17, 2023 at 6:25 AM Michal Hocko wrote:
>
> On Mon 09-01-23 12:53:04, Suren Baghdasaryan wrote:
> [...]
> >  void vm_area_free(struct vm_area_struct *vma)
> >  {
> >   free_anon_vma_name(vma);
> > +#ifdef CONFIG_PER_VMA_LOCK
> > +	call_rcu(&vma->vm_rcu, __vm_area_free);
> > +#else
> >   kmem_cache_free(vm_area_cachep, vma);
> > +#endif
>
> Is it safe to have vma with already freed vma_name? I suspect this is
> safe because of mmap_lock but is there any reason to split the freeing
> process and have this potential UAF lurking?

It should be safe because the VMA is either locked or has been isolated
while locked, so no page fault handler should have access to it. But
you are right, moving free_anon_vma_name() into __vm_area_free() does
seem safer. Will make the change in the next rev.
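
For reference, a sketch of what that would look like (assuming
free_anon_vma_name() is safe to call from the RCU callback context;
not the actual next revision):

```c
#ifdef CONFIG_PER_VMA_LOCK
static void __vm_area_free(struct rcu_head *head)
{
	struct vm_area_struct *vma = container_of(head, struct vm_area_struct,
						  vm_rcu);

	/* Runs after any concurrent rcu_read_lock() readers are done
	 * with the VMA, so the name is freed together with the VMA. */
	free_anon_vma_name(vma);
	kmem_cache_free(vm_area_cachep, vma);
}
#endif

void vm_area_free(struct vm_area_struct *vma)
{
#ifdef CONFIG_PER_VMA_LOCK
	call_rcu(&vma->vm_rcu, __vm_area_free);
#else
	free_anon_vma_name(vma);
	kmem_cache_free(vm_area_cachep, vma);
#endif
}
```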

>
> >  }
> >
> >  static void account_kernel_stack(struct task_struct *tsk, int account)
> > --
> > 2.39.0
>
> --
> Michal Hocko
> SUSE Labs


Re: [PATCH 09/41] mm: rcu safe VMA freeing

2023-01-17 Thread Michal Hocko
On Mon 09-01-23 12:53:04, Suren Baghdasaryan wrote:
[...]
>  void vm_area_free(struct vm_area_struct *vma)
>  {
>   free_anon_vma_name(vma);
> +#ifdef CONFIG_PER_VMA_LOCK
> +	call_rcu(&vma->vm_rcu, __vm_area_free);
> +#else
>   kmem_cache_free(vm_area_cachep, vma);
> +#endif

Is it safe to have a vma with an already freed vma_name? I suspect this
is safe because of mmap_lock, but is there any reason to split the
freeing process and have this potential UAF lurking?

>  }
>  
>  static void account_kernel_stack(struct task_struct *tsk, int account)
> -- 
> 2.39.0

-- 
Michal Hocko
SUSE Labs


[PATCH 09/41] mm: rcu safe VMA freeing

2023-01-09 Thread Suren Baghdasaryan
From: Michel Lespinasse 

This prepares for handling page faults under the VMA lock, looking up VMAs
under the protection of an rcu read lock instead of the usual mmap read lock.

Signed-off-by: Michel Lespinasse 
Signed-off-by: Suren Baghdasaryan 
---
 include/linux/mm_types.h | 13 ++++++++++---
 kernel/fork.c            | 13 +++++++++++++
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 4b6bce73fbb4..d5cdec1314fe 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -535,9 +535,16 @@ struct anon_vma_name {
 struct vm_area_struct {
/* The first cache line has the info for VMA tree walking. */
 
-   unsigned long vm_start; /* Our start address within vm_mm. */
-   unsigned long vm_end;   /* The first byte after our end address
-  within vm_mm. */
+   union {
+   struct {
+   /* VMA covers [vm_start; vm_end) addresses within mm */
+   unsigned long vm_start;
+   unsigned long vm_end;
+   };
+#ifdef CONFIG_PER_VMA_LOCK
+   struct rcu_head vm_rcu; /* Used for deferred freeing. */
+#endif
+   };
 
struct mm_struct *vm_mm;/* The address space we belong to. */
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 58aab6c889a4..5986817f393c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -479,10 +479,23 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
return new;
 }
 
+#ifdef CONFIG_PER_VMA_LOCK
+static void __vm_area_free(struct rcu_head *head)
+{
+   struct vm_area_struct *vma = container_of(head, struct vm_area_struct,
+ vm_rcu);
+   kmem_cache_free(vm_area_cachep, vma);
+}
+#endif
+
 void vm_area_free(struct vm_area_struct *vma)
 {
free_anon_vma_name(vma);
+#ifdef CONFIG_PER_VMA_LOCK
+	call_rcu(&vma->vm_rcu, __vm_area_free);
+#else
kmem_cache_free(vm_area_cachep, vma);
+#endif
 }
 
 static void account_kernel_stack(struct task_struct *tsk, int account)
-- 
2.39.0