On Tue, 8 Nov 2016 19:43:25 +0530 Kirti Wankhede <kwankh...@nvidia.com> wrote:
> On 11/8/2016 2:33 AM, Alex Williamson wrote:
> > On Sat, 5 Nov 2016 02:40:43 +0530
> > Kirti Wankhede <kwankh...@nvidia.com> wrote:
> > 
> > ...
> > 
> >>  static int vfio_dma_do_map(struct vfio_iommu *iommu,
> >> 			   struct vfio_iommu_type1_dma_map *map)
> >>  {
> >>  	dma_addr_t iova = map->iova;
> >>  	unsigned long vaddr = map->vaddr;
> >>  	size_t size = map->size;
> >> -	long npage;
> >>  	int ret = 0, prot = 0;
> >>  	uint64_t mask;
> >>  	struct vfio_dma *dma;
> >> -	unsigned long pfn;
> >> +	struct vfio_addr_space *addr_space;
> >> +	struct mm_struct *mm;
> >> +	bool free_addr_space_on_err = false;
> >>  
> >>  	/* Verify that none of our __u64 fields overflow */
> >>  	if (map->size != size || map->vaddr != vaddr || map->iova != iova)
> >> @@ -608,47 +685,56 @@ static int vfio_dma_do_map(struct vfio_iommu *iommu,
> >>  	mutex_lock(&iommu->lock);
> >>  
> >>  	if (vfio_find_dma(iommu, iova, size)) {
> >> -		mutex_unlock(&iommu->lock);
> >> -		return -EEXIST;
> >> +		ret = -EEXIST;
> >> +		goto do_map_err;
> >> +	}
> >> +
> >> +	mm = get_task_mm(current);
> >> +	if (!mm) {
> >> +		ret = -ENODEV;
> > 
> > -EFAULT?
> > 
> 
> The -ENODEV return is in the original code, from vfio_pin_pages():
> 
>     if (!current->mm)
>         return -ENODEV;
> 
> Once I thought of changing it to -EFAULT, but then changed it back to
> -ENODEV to be consistent with the original error code.
> 
> Should I still change this return to -EFAULT?

Let's keep ENODEV for less code churn, I guess.

> >> +		goto do_map_err;
> >> +	}
> >> +
> >> +	addr_space = vfio_find_addr_space(iommu, mm);
> >> +	if (addr_space) {
> >> +		atomic_inc(&addr_space->ref_count);
> >> +		mmput(mm);
> >> +	} else {
> >> +		addr_space = kzalloc(sizeof(*addr_space), GFP_KERNEL);
> >> +		if (!addr_space) {
> >> +			ret = -ENOMEM;
> >> +			goto do_map_err;
> >> +		}
> >> +		addr_space->mm = mm;
> >> +		atomic_set(&addr_space->ref_count, 1);
> >> +		list_add(&addr_space->next, &iommu->addr_space_list);
> >> +		free_addr_space_on_err = true;
> >>  	}
> >>  
> >>  	dma = kzalloc(sizeof(*dma), GFP_KERNEL);
> >>  	if (!dma) {
> >> -		mutex_unlock(&iommu->lock);
> >> -		return -ENOMEM;
> >> +		if (free_addr_space_on_err) {
> >> +			mmput(mm);
> >> +			list_del(&addr_space->next);
> >> +			kfree(addr_space);
> >> +		}
> >> +		ret = -ENOMEM;
> >> +		goto do_map_err;
> >>  	}
> >>  
> >>  	dma->iova = iova;
> >>  	dma->vaddr = vaddr;
> >>  	dma->prot = prot;
> >> +	dma->addr_space = addr_space;
> >> +	get_task_struct(current);
> >> +	dma->task = current;
> >> +	dma->mlock_cap = capable(CAP_IPC_LOCK);
> > 
> > How do you reason we can cache this?  Does the fact that the process
> > had this capability at the time that it did a DMA_MAP imply that it
> > necessarily still has this capability when an external user (vendor
> > driver) tries to pin pages?  I don't see how we can make that
> > assumption.
> > 
> 
> Would a process change its MEMLOCK limit at runtime? I think it
> shouldn't; correct me if I'm wrong. QEMU doesn't do that, right?

What QEMU does or doesn't do isn't relevant; the question is whether a
process could change CAP_IPC_LOCK at runtime.  It seems plausible to me.

> The function capable() checks the current task's capability. But
> vfio_pin_pages() could be called from another task, while the pages
> are pinned from the address space of the task that mapped them. So we
> can't use capable() in vfio_pin_pages().
> 
> If this capability shouldn't be cached, we have to use has_capability()
> with dma->task as the argument in vfio_pin_pages():
> 
>     bool has_capability(struct task_struct *t, int cap)

Yep, that sounds better.  Thanks,

Alex
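
For illustration, here is a minimal sketch of the has_capability()
approach agreed on above.  The helper name vfio_check_locked_vm(), the
npage argument, and the locking assumptions are hypothetical, not from
the patch; dma->task and dma->addr_space follow the structures in the
quoted diff.

#include <linux/capability.h>
#include <linux/mm.h>
#include <linux/sched.h>

/*
 * Hypothetical sketch, not part of the patch above: check the mapping
 * task's capability at pin time instead of caching it at DMA_MAP time.
 * The caller is assumed to hold references on dma->task and on
 * dma->addr_space->mm, and to serialize locked_vm updates (e.g. under
 * down_write(&mm->mmap_sem)).
 */
static int vfio_check_locked_vm(struct vfio_dma *dma, long npage)
{
	struct mm_struct *mm = dma->addr_space->mm;
	unsigned long limit;

	/*
	 * Query the capability of the task that created the mapping,
	 * which may differ from current when a vendor driver pins
	 * pages, and which may have gained or dropped CAP_IPC_LOCK
	 * since DMA_MAP.
	 */
	if (has_capability(dma->task, CAP_IPC_LOCK))
		return 0;

	limit = task_rlimit(dma->task, RLIMIT_MEMLOCK) >> PAGE_SHIFT;
	if (mm->locked_vm + npage > limit)
		return -ENOMEM;

	return 0;
}

Checking at pin time this way would remove the need for the cached
dma->mlock_cap field, and reading the rlimit from dma->task rather than
current keeps the accounting tied to the address space that actually
holds the pinned pages.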