Re: [RFC PATCH 29/30] vfio: Add support for Shared Virtual Memory
Hi Jean, On 27.02.2017 20:54, Jean-Philippe Brucker wrote: Add two new ioctl for VFIO devices. VFIO_DEVICE_BIND_TASK creates a bond between a device and a process address space, identified by a device-specific ID named PASID. This allows the device to target DMA transactions at the process virtual addresses without a need for mapping and unmapping buffers explicitly in the IOMMU. The process page tables are shared with the IOMMU, and mechanisms such as PCI ATS/PRI may be used to handle faults. VFIO_DEVICE_UNBIND_TASK removed a bond identified by a PASID. Also add a capability flag in device info to detect whether the system and the device support SVM. Users need to specify the state of a PASID when unbinding, with flags VFIO_PASID_RELEASE_FLUSHED and VFIO_PASID_RELEASE_CLEAN. Even for PCI, PASID invalidation is specific to each device and only partially covered by the specification: * Device must have an implementation-defined mechanism for stopping the use of a PASID. When this mechanism finishes, the device has stopped issuing transactions for this PASID and all transactions for this PASID have been flushed to the IOMMU. * Device may either wait for all outstanding PRI requests for this PASID to finish, or issue a Stop Marker message, a barrier that separates PRI requests affecting this instance of the PASID from PRI requests affecting the next instance. In the first case, we say that the PASID is "clean", in the second case it is "flushed" (and the IOMMU has to wait for the Stop Marker before reassigning the PASID.) We expect similar distinctions for platform devices. Ideally there should be a callback for each PCI device, allowing the IOMMU to ask the device to stop using a PASID. When the callback returns, the PASID is either flushed or clean and the return value tells which. For the moment I don't know how to implement this callback for PCI, so if the user forgets to call unbind with either "clean" or "flushed", the PASID is never reused. For platform devices, it might be simpler to implement since we could associate an invalidate_pasid callback to a DT compatible string, as is currently done for reset. Signed-off-by: Jean-Philippe Brucker --- drivers/vfio/pci/vfio_pci.c | 24 ++ drivers/vfio/vfio.c | 104 include/uapi/linux/vfio.h | 55 +++ 3 files changed, 183 insertions(+) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 324c52e3a1a4..3d7733f94891 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -623,6 +624,26 @@ int vfio_pci_register_dev_region(struct vfio_pci_device *vdev, return 0; } [...] kfree(device); @@ -1622,6 +1651,75 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep) return 0; } +static long vfio_svm_ioctl(struct vfio_device *device, unsigned int cmd, + unsigned long arg) +{ + int ret; + unsigned long minsz; + + struct vfio_device_svm svm; + struct vfio_task *vfio_task; + + minsz = offsetofend(struct vfio_device_svm, pasid); + + if (copy_from_user(&svm, (void __user *)arg, minsz)) + return -EFAULT; + + if (svm.argsz < minsz) + return -EINVAL; + + if (cmd == VFIO_DEVICE_BIND_TASK) { + struct task_struct *task = current; + + ret = iommu_bind_task(device->dev, task, &svm.pasid, 0, NULL); + if (ret) + return ret; + + vfio_task = kzalloc(sizeof(*vfio_task), GFP_KERNEL); + if (!vfio_task) { + iommu_unbind_task(device->dev, svm.pasid, + IOMMU_PASID_CLEAN); + return -ENOMEM; + } + + vfio_task->pasid = svm.pasid; + + mutex_lock(&device->tasks_lock); + list_add(&vfio_task->list, &device->tasks); + mutex_unlock(&device->tasks_lock); + + } else { + int flags = 0; + + if (svm.flags & ~(VFIO_SVM_PASID_RELEASE_FLUSHED | + VFIO_SVM_PASID_RELEASE_CLEAN)) + return -EINVAL; + + if (svm.flags & VFIO_SVM_PASID_RELEASE_FLUSHED) + flags = IOMMU_PASID_FLUSHED; + else if (svm.flags & VFIO_SVM_PASID_RELEASE_CLEAN) + flags = IOMMU_PASID_CLEAN; + + mutex_lock(&device->tasks_lock); + list_for_each_entry(vfio_task, &device->tasks, list) { + if (vfio_task->pasid != svm.pasid) + continue; + + ret = iommu_unbind_task(device->dev, svm.pasid, flags); + if (ret) +
Re: [PATCH] arm64/dma-mapping: fix DMA_ATTR_FORCE_CONTIGUOUS mmaping code
On Mon, Apr 24, 2017 at 09:58:23AM -0700, Laura Abbott wrote: > On 04/21/2017 09:12 AM, Catalin Marinas wrote: > > On Wed, Mar 29, 2017 at 01:55:32PM +0100, Robin Murphy wrote: > >> On 29/03/17 11:05, Andrzej Hajda wrote: > >>> In case of DMA_ATTR_FORCE_CONTIGUOUS allocations vm_area->pages > >>> is invalid. __iommu_mmap_attrs and __iommu_get_sgtable cannot use > >>> it. In first case temporary pages array is passed to iommu_dma_mmap, > >>> in 2nd case single entry sg table is created directly instead > >>> of calling helper. > >>> > >>> Fixes: 44176bb ("arm64: Add support for DMA_ATTR_FORCE_CONTIGUOUS to > >>> IOMMU") > >>> Signed-off-by: Andrzej Hajda > >>> --- > >>> Hi, > >>> > >>> I am not familiar with this framework so please don't be too cruel ;) > >>> Alternative solution I see is to always create vm_area->pages, > >>> I do not know which approach is preferred. > >>> > >>> Regards > >>> Andrzej > >>> --- > >>> arch/arm64/mm/dma-mapping.c | 40 ++-- > >>> 1 file changed, 38 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c > >>> index f7b5401..bba2bc8 100644 > >>> --- a/arch/arm64/mm/dma-mapping.c > >>> +++ b/arch/arm64/mm/dma-mapping.c > >>> @@ -704,7 +704,30 @@ static int __iommu_mmap_attrs(struct device *dev, > >>> struct vm_area_struct *vma, > >>> return ret; > >>> > >>> area = find_vm_area(cpu_addr); > >>> - if (WARN_ON(!area || !area->pages)) > >>> + if (WARN_ON(!area)) > >> > >> From the look of things, it doesn't seem strictly necessary to change > >> this, but whether that's a good thing is another matter. I'm not sure > >> that dma_common_contiguous_remap() should really be leaving a dangling > >> pointer in area->pages as it apparently does... :/ > > > > On this specific point, I don't think area->pages should be set either > > (cc'ing Laura). As in the vmalloc vs vmap case, area->pages when pages > > need to be freed (via vfree), which is not the case here. The > > dma_common_pages_remap() would need to set area->pages when called > > directly from the iommu DMA ops. Proposal below, not tested with the > > iommu ops. I assume the patch would cause __iommu_mmap_attrs() to return > > -ENXIO if DMA_ATTR_FORCE_CONTIGUOUS is set. [...] > From a quick glance, this looks okay. I can give a proper tag when > the fix to allow mmaping comes in since I'm guessing -ENXIO is not the > end goal. I'll post a proper patch and description. Strictly speaking, the above fix is independent and we should rather get -ENXIO on arm64's __iommu_mmap_attrs than dereferencing an already freed pointer. However, I'm not sure anyone else would trigger this. Even on arm64, once we fix the mmap case with DMA_ATTR_FORCE_CONTIGUOUS, we wouldn't dereference the dangling area->pages pointer. -- Catalin ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v5 2/2] PCI: quirks: Fix ThunderX2 dma alias handling
On Tue, Apr 25, 2017 at 8:03 AM, Jayachandran C wrote: > On Fri, Apr 21, 2017 at 12:57:05PM -0500, Bjorn Helgaas wrote: >> I did notice that all the Root Port devices claim to *not* be connected to >> slots, which doesn't seem right. For example, >> >> 12:00.0 PCI bridge: Broadcom Corporation Device 9084 >> Bus: primary=12, secondary=13, subordinate=14, sec-latency=0 >> Capabilities: [ac] Express (v2) Root Port (Slot-), MSI 00 >> >> 13:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ >> Network Connection >> >> It seems strange because the 12:00.0 Root Port looks like it probably >> *does* lead to a slot where the NIC is plugged in. Or is that NIC really >> soldered down? >> >> But I assume there are *some* PCIe slots, so at some of those Root Ports >> should advertise "Slot+" (which by itself does not imply hotplug support, >> if that's the concern). > > The Root Ports are connected to a slot, so I am not sure why the slot > implemented > bit is not set. There seems to be nothing useful in the slot capabilites, so > this > may be ok for now. I have reported this to the hardware team. Thanks. The "Slot Implemented" bit and the slot registers aren't essential if you don't want to support hotplug on those slots. But even without hotplug, they do contain things like a slot number, which may be useful in the user interface. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v5 2/2] PCI: quirks: Fix ThunderX2 dma alias handling
On Fri, Apr 21, 2017 at 12:57:05PM -0500, Bjorn Helgaas wrote: > On Fri, Apr 21, 2017 at 05:05:41PM +, Jayachandran C wrote: > > On Fri, Apr 21, 2017 at 10:48:15AM -0500, Bjorn Helgaas wrote: > > > On Mon, Apr 17, 2017 at 12:47 PM, Jayachandran C > > > wrote: > > > > On Fri, Apr 14, 2017 at 09:00:06PM -0500, Bjorn Helgaas wrote: > > > > >> Could you collect "lspci -vv" output from this system? I'd like to > > > >> archive that as background for this IOMMU issue and the ASPM tweaks I > > > >> suspect we'll have to do. I *wish* we had more information about that > > > >> VIA thing, because I suspect we could get rid of it if we had more > > > >> details. > > > > > > > > The full logs are slightly large, so I have kept them at: > > > > https://github.com/jchandra-cavm/thunderx2/blob/master/logs/ > > > > The lspci -vv output is lspci-vv.txt and lspci -tvn output is > > > > lspci-tvn.txt > > > > > > > > The output is from 2 socket system, the cards are not on the first slot > > > > like the example above, so the bus and device numbers are different. > > > > > > Can somebody with this system collect the "lspci -" output as well? > > > > > > I'm making some lspci changes to handle the PCI-to-PCIe bridge > > > correctly, and I can use the "lspci -" output to create an lspci > > > test case. > > > > [Sorry was AFK for a few days] > > > > I have updated the above directory with the log. Also tested your next > > branch > > and it works fine on ThunderX2. > > Thanks! > > With regard to my lspci changes, they add "Slot-" here: > >01:0a.0 PCI bridge: Broadcom Corporation Device 9039 >... > - Capabilities: [40] Express (v2) PCI/PCI-X to PCI-Express Bridge, MSI 00 > + Capabilities: [40] Express (v2) PCI/PCI-X to PCI-Express Bridge > (Slot-), MSI 00 > > for all your PCI-to-PCIe bridges. I assume the "Slot-" is correct, i.e., > the link is not connected to a slot, right? This comes from the "Slot > Implemented" bit in the PCIe Capabilities Register. Yes, Slot- should be correct. > I did notice that all the Root Port devices claim to *not* be connected to > slots, which doesn't seem right. For example, > > 12:00.0 PCI bridge: Broadcom Corporation Device 9084 > Bus: primary=12, secondary=13, subordinate=14, sec-latency=0 > Capabilities: [ac] Express (v2) Root Port (Slot-), MSI 00 > > 13:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ > Network Connection > > It seems strange because the 12:00.0 Root Port looks like it probably > *does* lead to a slot where the NIC is plugged in. Or is that NIC really > soldered down? > > But I assume there are *some* PCIe slots, so at some of those Root Ports > should advertise "Slot+" (which by itself does not imply hotplug support, > if that's the concern). The Root Ports are connected to a slot, so I am not sure why the slot implemented bit is not set. There seems to be nothing useful in the slot capabilites, so this may be ok for now. I have reported this to the hardware team. Thanks for the review and the comments, JC. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu: arm-smmu: correct sid to mask
On Mon, Apr 24, 2017 at 04:54:54PM +0100, Will Deacon wrote: > On Fri, Apr 21, 2017 at 05:03:36PM +0800, Peng Fan wrote: > > - sid, smmu->smr_mask_mask); > > + mask, smmu->smr_mask_mask); > > goto out_free; > > Looks like a copy-paste error to me: > > Acked-by: Will Deacon > > Joerg: do you mind picking this one up, please? Applied, thanks. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2] iommu/arm-smmu: Return IOVA in iova_to_phys when SMMU is bypassed
From: Sunil Goutham For software initiated address translation, when domain type is IOMMU_DOMAIN_IDENTITY i.e SMMU is bypassed, mimic HW behavior i.e return the same IOVA as translated address. This patch is an extension to Will Deacon's patchset "Implement SMMU passthrough using the default domain". Signed-off-by: Sunil Goutham --- V2 - As per Will's suggestion applied fix to SMMUv3 driver as well. drivers/iommu/arm-smmu-v3.c | 3 +++ drivers/iommu/arm-smmu.c| 3 +++ 2 files changed, 6 insertions(+) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 05b4592..d412bdd 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -1714,6 +1714,9 @@ arm_smmu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova) struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops; + if (domain->type == IOMMU_DOMAIN_IDENTITY) + return iova; + if (!ops) return 0; diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index bfab4f7..81088cd 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -1459,6 +1459,9 @@ static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain, struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); struct io_pgtable_ops *ops= smmu_domain->pgtbl_ops; + if (domain->type == IOMMU_DOMAIN_IDENTITY) + return iova; + if (!ops) return 0; -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu