spapr: Cache mm in tce_container

Alexey Kardashevskiy Sun, 23 Oct 2016 21:26:42 -0700

On 20/10/16 18:31, Nicholas Piggin wrote:
> On Thu, 20 Oct 2016 14:03:49 +1100
> Alexey Kardashevskiy <a...@ozlabs.ru> wrote:
> 
>> In some situations the userspace memory context may live longer than
>> the userspace process itself so if we need to do proper memory context
>> cleanup, we better cache @mm and use it later when the process is gone
>> (@current or @current->mm is NULL).
>>
>> This references mm and stores the pointer in the container; this is done
>> when a container is just created so checking for !current->mm in other
>> places becomes pointless.
>>
>> This replaces current->mm with container->mm everywhere except debug
>> prints.
>>
>> This adds a check that current->mm is the same as the one stored in
>> the container to prevent userspace from registering memory in other
>> processes.
>>
>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
>> ---
>>  drivers/vfio/vfio_iommu_spapr_tce.c | 127 
>> ++++++++++++++++++++----------------
>>  1 file changed, 71 insertions(+), 56 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c 
>> b/drivers/vfio/vfio_iommu_spapr_tce.c
>> index d0c38b2..6b0b121 100644
>> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
>> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
>> @@ -31,49 +31,46 @@
> 
> Does it make sense to move the rest of these hunks into patch 2?
> I think they're similarly just moving the mm reference into callers.



Patch #2 is moving chunks between 2 maintainership areas - ppc64 and vfio,
this one changes only vfio code, usually it is easier to split patches this
way.

> 
> 
>>  static void tce_iommu_detach_group(void *iommu_data,
>>              struct iommu_group *iommu_group);
>>  
>> -static long try_increment_locked_vm(long npages)
>> +static long try_increment_locked_vm(struct mm_struct *mm, long npages)
>>  {
>>      long ret = 0, locked, lock_limit;
>>  
>> -    if (!current || !current->mm)
>> -            return -ESRCH; /* process exited */
>> -
>>      if (!npages)
>>              return 0;
>>  
>> -    down_write(&current->mm->mmap_sem);
>> -    locked = current->mm->locked_vm + npages;
>> +    down_write(&mm->mmap_sem);
>> +    locked = mm->locked_vm + npages;
>>      lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
>>      if (locked > lock_limit && !capable(CAP_IPC_LOCK))
>>              ret = -ENOMEM;
>>      else
>> -            current->mm->locked_vm += npages;
>> +            mm->locked_vm += npages;
>>  
>>      pr_debug("[%d] RLIMIT_MEMLOCK +%ld %ld/%ld%s\n", current->pid,
>>                      npages << PAGE_SHIFT,
>> -                    current->mm->locked_vm << PAGE_SHIFT,
>> +                    mm->locked_vm << PAGE_SHIFT,
>>                      rlimit(RLIMIT_MEMLOCK),
>>                      ret ? " - exceeded" : "");
>>  
>> -    up_write(&current->mm->mmap_sem);
>> +    up_write(&mm->mmap_sem);
>>  
>>      return ret;
>>  }
>>  
>> -static void decrement_locked_vm(long npages)
>> +static void decrement_locked_vm(struct mm_struct *mm, long npages)
>>  {
>> -    if (!current || !current->mm || !npages)
>> +    if (!mm || !npages)
>>              return; /* process exited */
> 
> I know you're trying to be defensive and change as little logic as possible,
> but some cases should be an error, and I think some of the "process exited"
> comments were wrong anyway.
> 
> Maybe pull the !mm test into the caller and make it WARN_ON?


No, the next patch should just drop this check as I am going to have a
valid mm pointer in a container all its lifetime.


> 
> 
>> @@ -317,6 +311,9 @@ static void *tce_iommu_open(unsigned long arg)
>>              return ERR_PTR(-EINVAL);
>>      }
>>  
>> +    if (!current->mm)
>> +            return ERR_PTR(-ESRCH); /* process exited */
> 
> A userspace thread in the kernel can't have its mm disappear, unless you
> are actually in the exit code. !current->mm is more like a test for a kernel
> thread.

Sorry, I am not following you here. I am going to use @mm, I need to check
if it is not NULL for whatever reason, I do this here, once, but it is
pointless anyway?


> 
> 
>> +
>>      container = kzalloc(sizeof(*container), GFP_KERNEL);
>>      if (!container)
>>              return ERR_PTR(-ENOMEM);
>> @@ -326,13 +323,17 @@ static void *tce_iommu_open(unsigned long arg)
>>  
>>      container->v2 = arg == VFIO_SPAPR_TCE_v2_IOMMU;
>>  
>> +    container->mm = current->mm;
>> +    atomic_inc(&container->mm->mm_count);
>> +
>>      return container;
> 
> It's a nitpick if you respin the patch, but I guess it would better be
> described as a reference than a cache of the object. "have tce_container
> take a reference to mm_struct".

Ok, will do!


> 
> 
>> @@ -515,13 +526,16 @@ static long tce_iommu_build_v2(struct tce_container 
>> *container,
>>      unsigned long hpa;
>>      enum dma_data_direction dirtmp;
>>  
>> +    if (container->mm != current->mm)
>> +            return -ESRCH;
> 
> Good, is this condition now enforced on all entrypoints that use
> container->mm (except the final teardown)? (The mlock/rlimit stuff,
> as we talked about before, doesn't make sense if not).

After having a chat with Paul, I'll move this check (slightly improved) to
the beginning of tce_iommu_ioctl().



-- 
Alexey

Re: [PATCH kernel v3 3/4] vfio/spapr: Cache mm in tce_container

Reply via email to