Re: [RFC PATCH 29/30] vfio: Add support for Shared Virtual Memory

2017-04-25 Thread Tomasz Nowicki

Hi Jean,

On 27.02.2017 20:54, Jean-Philippe Brucker wrote:

Add two new ioctl for VFIO devices. VFIO_DEVICE_BIND_TASK creates a bond
between a device and a process address space, identified by a
device-specific ID named PASID. This allows the device to target DMA
transactions at the process virtual addresses without a need for mapping
and unmapping buffers explicitly in the IOMMU. The process page tables are
shared with the IOMMU, and mechanisms such as PCI ATS/PRI may be used to
handle faults. VFIO_DEVICE_UNBIND_TASK removed a bond identified by a
PASID.

Also add a capability flag in device info to detect whether the system and
the device support SVM.

Users need to specify the state of a PASID when unbinding, with flags
VFIO_PASID_RELEASE_FLUSHED and VFIO_PASID_RELEASE_CLEAN. Even for PCI,
PASID invalidation is specific to each device and only partially covered
by the specification:

* Device must have an implementation-defined mechanism for stopping the
  use of a PASID. When this mechanism finishes, the device has stopped
  issuing transactions for this PASID and all transactions for this PASID
  have been flushed to the IOMMU.

* Device may either wait for all outstanding PRI requests for this PASID
  to finish, or issue a Stop Marker message, a barrier that separates PRI
  requests affecting this instance of the PASID from PRI requests
  affecting the next instance. In the first case, we say that the PASID is
  "clean", in the second case it is "flushed" (and the IOMMU has to wait
  for the Stop Marker before reassigning the PASID.)

We expect similar distinctions for platform devices. Ideally there should
be a callback for each PCI device, allowing the IOMMU to ask the device to
stop using a PASID. When the callback returns, the PASID is either flushed
or clean and the return value tells which.

For the moment I don't know how to implement this callback for PCI, so if
the user forgets to call unbind with either "clean" or "flushed", the
PASID is never reused. For platform devices, it might be simpler to
implement since we could associate an invalidate_pasid callback to a DT
compatible string, as is currently done for reset.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/vfio/pci/vfio_pci.c |  24 ++
 drivers/vfio/vfio.c | 104 
 include/uapi/linux/vfio.h   |  55 +++
 3 files changed, 183 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 324c52e3a1a4..3d7733f94891 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -623,6 +624,26 @@ int vfio_pci_register_dev_region(struct vfio_pci_device 
*vdev,
return 0;
 }



[...]



kfree(device);
@@ -1622,6 +1651,75 @@ static int vfio_device_fops_release(struct inode *inode, 
struct file *filep)
return 0;
 }

+static long vfio_svm_ioctl(struct vfio_device *device, unsigned int cmd,
+  unsigned long arg)
+{
+   int ret;
+   unsigned long minsz;
+
+   struct vfio_device_svm svm;
+   struct vfio_task *vfio_task;
+
+   minsz = offsetofend(struct vfio_device_svm, pasid);
+
+   if (copy_from_user(&svm, (void __user *)arg, minsz))
+   return -EFAULT;
+
+   if (svm.argsz < minsz)
+   return -EINVAL;
+
+   if (cmd == VFIO_DEVICE_BIND_TASK) {
+   struct task_struct *task = current;
+
+   ret = iommu_bind_task(device->dev, task, &svm.pasid, 0, NULL);
+   if (ret)
+   return ret;
+
+   vfio_task = kzalloc(sizeof(*vfio_task), GFP_KERNEL);
+   if (!vfio_task) {
+   iommu_unbind_task(device->dev, svm.pasid,
+ IOMMU_PASID_CLEAN);
+   return -ENOMEM;
+   }
+
+   vfio_task->pasid = svm.pasid;
+
+   mutex_lock(&device->tasks_lock);
+   list_add(&vfio_task->list, &device->tasks);
+   mutex_unlock(&device->tasks_lock);
+
+   } else {
+   int flags = 0;
+
+   if (svm.flags & ~(VFIO_SVM_PASID_RELEASE_FLUSHED |
+ VFIO_SVM_PASID_RELEASE_CLEAN))
+   return -EINVAL;
+
+   if (svm.flags & VFIO_SVM_PASID_RELEASE_FLUSHED)
+   flags = IOMMU_PASID_FLUSHED;
+   else if (svm.flags & VFIO_SVM_PASID_RELEASE_CLEAN)
+   flags = IOMMU_PASID_CLEAN;
+
+   mutex_lock(&device->tasks_lock);
+   list_for_each_entry(vfio_task, &device->tasks, list) {
+   if (vfio_task->pasid != svm.pasid)
+   continue;
+
+   ret = iommu_unbind_task(device->dev, svm.pasid, flags);
+   if (ret)
+ 

Re: [PATCH] arm64/dma-mapping: fix DMA_ATTR_FORCE_CONTIGUOUS mmaping code

2017-04-25 Thread Catalin Marinas
On Mon, Apr 24, 2017 at 09:58:23AM -0700, Laura Abbott wrote:
> On 04/21/2017 09:12 AM, Catalin Marinas wrote:
> > On Wed, Mar 29, 2017 at 01:55:32PM +0100, Robin Murphy wrote:
> >> On 29/03/17 11:05, Andrzej Hajda wrote:
> >>> In case of DMA_ATTR_FORCE_CONTIGUOUS allocations vm_area->pages
> >>> is invalid. __iommu_mmap_attrs and __iommu_get_sgtable cannot use
> >>> it. In first case temporary pages array is passed to iommu_dma_mmap,
> >>> in 2nd case single entry sg table is created directly instead
> >>> of calling helper.
> >>>
> >>> Fixes: 44176bb ("arm64: Add support for DMA_ATTR_FORCE_CONTIGUOUS to 
> >>> IOMMU")
> >>> Signed-off-by: Andrzej Hajda 
> >>> ---
> >>> Hi,
> >>>
> >>> I am not familiar with this framework so please don't be too cruel ;)
> >>> Alternative solution I see is to always create vm_area->pages,
> >>> I do not know which approach is preferred.
> >>>
> >>> Regards
> >>> Andrzej
> >>> ---
> >>>  arch/arm64/mm/dma-mapping.c | 40 ++--
> >>>  1 file changed, 38 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> >>> index f7b5401..bba2bc8 100644
> >>> --- a/arch/arm64/mm/dma-mapping.c
> >>> +++ b/arch/arm64/mm/dma-mapping.c
> >>> @@ -704,7 +704,30 @@ static int __iommu_mmap_attrs(struct device *dev, 
> >>> struct vm_area_struct *vma,
> >>>   return ret;
> >>>  
> >>>   area = find_vm_area(cpu_addr);
> >>> - if (WARN_ON(!area || !area->pages))
> >>> + if (WARN_ON(!area))
> >>
> >> From the look of things, it doesn't seem strictly necessary to change
> >> this, but whether that's a good thing is another matter. I'm not sure
> >> that dma_common_contiguous_remap() should really be leaving a dangling
> >> pointer in area->pages as it apparently does... :/
> > 
> > On this specific point, I don't think area->pages should be set either
> > (cc'ing Laura). As in the vmalloc vs vmap case, area->pages when pages
> > need to be freed (via vfree), which is not the case here. The
> > dma_common_pages_remap() would need to set area->pages when called
> > directly from the iommu DMA ops. Proposal below, not tested with the
> > iommu ops. I assume the patch would cause __iommu_mmap_attrs() to return
> > -ENXIO if DMA_ATTR_FORCE_CONTIGUOUS is set.
[...]
> From a quick glance, this looks okay. I can give a proper tag when
> the fix to allow mmaping comes in since I'm guessing -ENXIO is not the
> end goal.

I'll post a proper patch and description. Strictly speaking, the above
fix is independent and we should rather get -ENXIO on arm64's
__iommu_mmap_attrs than dereferencing an already freed pointer. However,
I'm not sure anyone else would trigger this. Even on arm64, once we fix
the mmap case with DMA_ATTR_FORCE_CONTIGUOUS, we wouldn't dereference
the dangling area->pages pointer.

-- 
Catalin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 2/2] PCI: quirks: Fix ThunderX2 dma alias handling

2017-04-25 Thread Bjorn Helgaas via iommu
On Tue, Apr 25, 2017 at 8:03 AM, Jayachandran C
 wrote:
> On Fri, Apr 21, 2017 at 12:57:05PM -0500, Bjorn Helgaas wrote:

>> I did notice that all the Root Port devices claim to *not* be connected to
>> slots, which doesn't seem right.  For example,
>>
>>   12:00.0 PCI bridge: Broadcom Corporation Device 9084
>>   Bus: primary=12, secondary=13, subordinate=14, sec-latency=0
>>   Capabilities: [ac] Express (v2) Root Port (Slot-), MSI 00
>>
>>   13:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ 
>> Network Connection
>>
>> It seems strange because the 12:00.0 Root Port looks like it probably
>> *does* lead to a slot where the NIC is plugged in.  Or is that NIC really
>> soldered down?
>>
>> But I assume there are *some* PCIe slots, so at some of those Root Ports
>> should advertise "Slot+" (which by itself does not imply hotplug support,
>> if that's the concern).
>
> The Root Ports are connected to a slot, so I am not sure why the slot 
> implemented
> bit is not set. There seems to be nothing useful in the slot capabilites, so 
> this
> may be ok for now. I have reported this to the hardware team.

Thanks.  The "Slot Implemented" bit and the slot registers aren't
essential if you don't want to support hotplug on those slots.  But
even without hotplug, they do contain things like a slot number, which
may be useful in the user interface.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 2/2] PCI: quirks: Fix ThunderX2 dma alias handling

2017-04-25 Thread Jayachandran C
On Fri, Apr 21, 2017 at 12:57:05PM -0500, Bjorn Helgaas wrote:
> On Fri, Apr 21, 2017 at 05:05:41PM +, Jayachandran C wrote:
> > On Fri, Apr 21, 2017 at 10:48:15AM -0500, Bjorn Helgaas wrote:
> > > On Mon, Apr 17, 2017 at 12:47 PM, Jayachandran C
> > >  wrote:
> > > > On Fri, Apr 14, 2017 at 09:00:06PM -0500, Bjorn Helgaas wrote:
> 
> > > >> Could you collect "lspci -vv" output from this system?  I'd like to
> > > >> archive that as background for this IOMMU issue and the ASPM tweaks I
> > > >> suspect we'll have to do.  I *wish* we had more information about that
> > > >> VIA thing, because I suspect we could get rid of it if we had more
> > > >> details.
> > > >
> > > > The full logs are slightly large, so I have kept them at:
> > > > https://github.com/jchandra-cavm/thunderx2/blob/master/logs/
> > > > The lspci -vv output is lspci-vv.txt and lspci -tvn output is 
> > > > lspci-tvn.txt
> > > >
> > > > The output is from 2 socket system, the cards are not on the first slot
> > > > like the example above, so the bus and device numbers are different.
> > > 
> > > Can somebody with this system collect the "lspci -" output as well?
> > > 
> > > I'm making some lspci changes to handle the PCI-to-PCIe bridge
> > > correctly, and I can use the "lspci -" output to create an lspci
> > > test case.
> > 
> > [Sorry was AFK for a few days]
> > 
> > I have updated the above directory with the log. Also tested your next 
> > branch
> > and it works fine on ThunderX2.
> 
> Thanks!
> 
> With regard to my lspci changes, they add "Slot-" here:
> 
>01:0a.0 PCI bridge: Broadcom Corporation Device 9039
>...
>   -   Capabilities: [40] Express (v2) PCI/PCI-X to PCI-Express Bridge, MSI 00
>   +   Capabilities: [40] Express (v2) PCI/PCI-X to PCI-Express Bridge 
> (Slot-), MSI 00
> 
> for all your PCI-to-PCIe bridges.  I assume the "Slot-" is correct, i.e.,
> the link is not connected to a slot, right?  This comes from the "Slot
> Implemented" bit in the PCIe Capabilities Register.

Yes, Slot- should be correct.
 
> I did notice that all the Root Port devices claim to *not* be connected to
> slots, which doesn't seem right.  For example,
> 
>   12:00.0 PCI bridge: Broadcom Corporation Device 9084
>   Bus: primary=12, secondary=13, subordinate=14, sec-latency=0
>   Capabilities: [ac] Express (v2) Root Port (Slot-), MSI 00
> 
>   13:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ 
> Network Connection
> 
> It seems strange because the 12:00.0 Root Port looks like it probably
> *does* lead to a slot where the NIC is plugged in.  Or is that NIC really
> soldered down?
> 
> But I assume there are *some* PCIe slots, so at some of those Root Ports
> should advertise "Slot+" (which by itself does not imply hotplug support,
> if that's the concern).

The Root Ports are connected to a slot, so I am not sure why the slot 
implemented
bit is not set. There seems to be nothing useful in the slot capabilites, so 
this
may be ok for now. I have reported this to the hardware team.

Thanks for the review and the comments,
JC.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu: arm-smmu: correct sid to mask

2017-04-25 Thread Joerg Roedel
On Mon, Apr 24, 2017 at 04:54:54PM +0100, Will Deacon wrote:
> On Fri, Apr 21, 2017 at 05:03:36PM +0800, Peng Fan wrote:
> > -   sid, smmu->smr_mask_mask);
> > +   mask, smmu->smr_mask_mask);
> > goto out_free;
> 
> Looks like a copy-paste error to me:
> 
> Acked-by: Will Deacon 
> 
> Joerg: do you mind picking this one up, please?

Applied, thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2] iommu/arm-smmu: Return IOVA in iova_to_phys when SMMU is bypassed

2017-04-25 Thread sunil . kovvuri
From: Sunil Goutham 

For software initiated address translation, when domain type is
IOMMU_DOMAIN_IDENTITY i.e SMMU is bypassed, mimic HW behavior
i.e return the same IOVA as translated address.

This patch is an extension to Will Deacon's patchset
"Implement SMMU passthrough using the default domain".

Signed-off-by: Sunil Goutham 
---

V2
- As per Will's suggestion applied fix to SMMUv3 driver as well.

 drivers/iommu/arm-smmu-v3.c | 3 +++
 drivers/iommu/arm-smmu.c| 3 +++
 2 files changed, 6 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 05b4592..d412bdd 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1714,6 +1714,9 @@ arm_smmu_iova_to_phys(struct iommu_domain *domain, 
dma_addr_t iova)
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
 
+   if (domain->type == IOMMU_DOMAIN_IDENTITY)
+   return iova;
+
if (!ops)
return 0;
 
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index bfab4f7..81088cd 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1459,6 +1459,9 @@ static phys_addr_t arm_smmu_iova_to_phys(struct 
iommu_domain *domain,
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
struct io_pgtable_ops *ops= smmu_domain->pgtbl_ops;
 
+   if (domain->type == IOMMU_DOMAIN_IDENTITY)
+   return iova;
+
if (!ops)
return 0;
 
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu