On Tue, 8 Nov 2016 19:43:25 +0530
Kirti Wankhede <kwankh...@nvidia.com> wrote:

> On 11/8/2016 2:33 AM, Alex Williamson wrote:
> > On Sat, 5 Nov 2016 02:40:43 +0530
> > Kirti Wankhede <kwankh...@nvidia.com> wrote:
> >   
> 
> ...
> 
> >>  static int vfio_dma_do_map(struct vfio_iommu *iommu,
> >>                       struct vfio_iommu_type1_dma_map *map)
> >>  {
> >>    dma_addr_t iova = map->iova;
> >>    unsigned long vaddr = map->vaddr;
> >>    size_t size = map->size;
> >> -  long npage;
> >>    int ret = 0, prot = 0;
> >>    uint64_t mask;
> >>    struct vfio_dma *dma;
> >> -  unsigned long pfn;
> >> +  struct vfio_addr_space *addr_space;
> >> +  struct mm_struct *mm;
> >> +  bool free_addr_space_on_err = false;
> >>  
> >>    /* Verify that none of our __u64 fields overflow */
> >>    if (map->size != size || map->vaddr != vaddr || map->iova != iova)
> >> @@ -608,47 +685,56 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> >>    mutex_lock(&iommu->lock);
> >>  
> >>    if (vfio_find_dma(iommu, iova, size)) {
> >> -          mutex_unlock(&iommu->lock);
> >> -          return -EEXIST;
> >> +          ret = -EEXIST;
> >> +          goto do_map_err;
> >> +  }
> >> +
> >> +  mm = get_task_mm(current);
> >> +  if (!mm) {
> >> +          ret = -ENODEV;  
> > 
> > -EFAULT?
> >  
> 
> The -ENODEV return comes from the original code in vfio_pin_pages():
>         if (!current->mm)
>                 return -ENODEV;
> 
> I considered changing it to -EFAULT, but went back to -ENODEV to stay
> consistent with the original error code.
> 
> Should I still change this return to -EFAULT?

Let's keep -ENODEV to limit code churn, I guess.
 
> >> +          goto do_map_err;
> >> +  }
> >> +
> >> +  addr_space = vfio_find_addr_space(iommu, mm);
> >> +  if (addr_space) {
> >> +          atomic_inc(&addr_space->ref_count);
> >> +          mmput(mm);
> >> +  } else {
> >> +          addr_space = kzalloc(sizeof(*addr_space), GFP_KERNEL);
> >> +          if (!addr_space) {
> >> +                  ret = -ENOMEM;
> >> +                  goto do_map_err;
> >> +          }
> >> +          addr_space->mm = mm;
> >> +          atomic_set(&addr_space->ref_count, 1);
> >> +          list_add(&addr_space->next, &iommu->addr_space_list);
> >> +          free_addr_space_on_err = true;
> >>    }
> >>  
> >>    dma = kzalloc(sizeof(*dma), GFP_KERNEL);
> >>    if (!dma) {
> >> -          mutex_unlock(&iommu->lock);
> >> -          return -ENOMEM;
> >> +          if (free_addr_space_on_err) {
> >> +                  mmput(mm);
> >> +                  list_del(&addr_space->next);
> >> +                  kfree(addr_space);
> >> +          }
> >> +          ret = -ENOMEM;
> >> +          goto do_map_err;
> >>    }
> >>  
> >>    dma->iova = iova;
> >>    dma->vaddr = vaddr;
> >>    dma->prot = prot;
> >> +  dma->addr_space = addr_space;
> >> +  get_task_struct(current);
> >> +  dma->task = current;
> >> +  dma->mlock_cap = capable(CAP_IPC_LOCK);  
> > 
> > 
> > How do you reason we can cache this?  Does the fact that the process
> > had this capability at the time that it did a DMA_MAP imply that it
> > necessarily still has this capability when an external user (vendor
> > driver) tries to pin pages?  I don't see how we can make that
> > assumption.
> > 
> >   
> 
> Will a process change its MEMLOCK limit at runtime? I think it shouldn't;
> correct me if I'm wrong. QEMU doesn't do that, right?

What QEMU does or doesn't do isn't relevant; the question is whether a
process could change CAP_IPC_LOCK at runtime.  It seems plausible to me.

> The function capable() checks the current task's capabilities. But
> vfio_pin_pages() may be called from another task, while the pages are
> pinned from the address space of the task that mapped them. So we can't
> use capable() in vfio_pin_pages().
> 
> If this capability shouldn't be cached, we have to use has_capability()
> with dma->task as the argument in vfio_pin_pages():
> 
>  bool has_capability(struct task_struct *t, int cap)

Yep, that sounds better.  Thanks,

Alex
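
To illustrate the caching concern discussed above, here is a minimal
userspace sketch, not kernel code. The names task_model, dma_model,
dma_model_map, pin_allowed_cached and pin_allowed_live are hypothetical
stand-ins invented for this example. A plain bool models the task's
CAP_IPC_LOCK state, so the sketch only shows why a snapshot taken at
DMA_MAP time can go stale, while a check made at pin time (what
has_capability() on dma->task would presumably provide in the kernel)
follows the task's current state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a task; not the kernel's task_struct. */
struct task_model {
    bool cap_ipc_lock;          /* models whether the task holds CAP_IPC_LOCK now */
};

/* Hypothetical stand-in for the per-mapping state discussed in the thread. */
struct dma_model {
    struct task_model *task;    /* task that performed the map */
    bool mlock_cap;             /* snapshot of the capability taken at map time */
};

/* Record the mapping task and snapshot its capability, as the patch does. */
static void dma_model_map(struct dma_model *dma, struct task_model *task)
{
    dma->task = task;
    dma->mlock_cap = task->cap_ipc_lock;
}

/* Cached check: answers with whatever was true at map time. */
static bool pin_allowed_cached(const struct dma_model *dma)
{
    return dma->mlock_cap;
}

/* Live check: consults the mapping task's state at pin time. */
static bool pin_allowed_live(const struct dma_model *dma)
{
    return dma->task->cap_ipc_lock;
}

/* Walk through the scenario: map, drop the capability, then pin. */
static void demo(void)
{
    struct task_model task = { .cap_ipc_lock = true };
    struct dma_model dma;

    dma_model_map(&dma, &task);
    assert(pin_allowed_cached(&dma) && pin_allowed_live(&dma));

    task.cap_ipc_lock = false;          /* capability dropped after the map */
    assert(pin_allowed_cached(&dma));   /* stale: still reports true */
    assert(!pin_allowed_live(&dma));    /* live check sees the drop */
}
```

In this model the two checks agree only as long as the task's capability
does not change after the map, which is exactly the assumption questioned
above.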
