Re: [PATCH 2/2] iommu/vt-d: Replace intel SVM APIs with generic SVA APIs

2020-02-25 Thread Jean-Philippe Brucker
On Tue, Feb 25, 2020 at 11:10:34AM -0800, Christoph Hellwig wrote:
> On Mon, Feb 24, 2020 at 03:26:37PM -0800, Jacob Pan wrote:
> > This patch is an initial step to replace Intel SVM code with the
> > following IOMMU SVA ops:
> > intel_svm_bind_mm() => iommu_sva_bind_device()
> > intel_svm_unbind_mm() => iommu_sva_unbind_device()
> > intel_svm_is_pasid_valid() => iommu_sva_get_pasid()
> 
> How did either set of APIs end up being added to the kernel without
> users to start with?
> 
> Please remove both the old intel and the iommu ones given that neither
> has any users.

The hisilicon crypto accelerator will be using the new API in v5.7
https://lore.kernel.org/linux-iommu/1581407665-13504-1-git-send-email-zhangfei@linaro.org/

Thanks,
Jean
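
For readers unfamiliar with the generic interface, here is a minimal
driver-side sketch of the three SVA calls named above (illustration
only: error handling is trimmed and the drvdata argument is assumed
unused):

    #include <linux/err.h>
    #include <linux/iommu.h>
    #include <linux/sched.h>

    static int example_sva_bind(struct device *dev)
    {
            struct iommu_sva *handle;
            int pasid;

            /* replaces intel_svm_bind_mm(): bind current->mm to the device */
            handle = iommu_sva_bind_device(dev, current->mm, NULL);
            if (IS_ERR(handle))
                    return PTR_ERR(handle);

            /* replaces intel_svm_is_pasid_valid(): query the bound PASID */
            pasid = iommu_sva_get_pasid(handle);

            /* ... program 'pasid' into the device's work queues ... */

            /* replaces intel_svm_unbind_mm() */
            iommu_sva_unbind_device(handle);
            return 0;
    }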


[PATCH v2] uacce: unmap remaining mmapping from user space

2020-02-25 Thread Zhangfei Gao
When the uacce parent device module is removed, the user application
may still hold the mmapped area, which could then be accessed unsafely.
On rmmod, the parent device driver calls uacce_remove, which unmaps all
remaining mappings from user space for safety. VM_FAULT_SIGBUS is also
reported to user space accordingly.

Suggested-by: Dave Jiang 
Signed-off-by: Zhangfei Gao 
---
 v2: Unmap before put_queue, where memory is freed, commented from Zaibo.

 drivers/misc/uacce/uacce.c | 16 ++++++++++++++++
 include/linux/uacce.h      |  2 ++
 2 files changed, 18 insertions(+)

diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index ffced4d..d39307f 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -224,6 +224,7 @@ static int uacce_fops_open(struct inode *inode, struct file *filep)
 
init_waitqueue_head(&q->wait);
filep->private_data = q;
+   uacce->inode = inode;
q->state = UACCE_Q_INIT;
 
return 0;
@@ -253,6 +254,14 @@ static int uacce_fops_release(struct inode *inode, struct file *filep)
return 0;
 }
 
+static vm_fault_t uacce_vma_fault(struct vm_fault *vmf)
+{
+   if (vmf->flags & (FAULT_FLAG_MKWRITE | FAULT_FLAG_WRITE))
+   return VM_FAULT_SIGBUS;
+
+   return 0;
+}
+
 static void uacce_vma_close(struct vm_area_struct *vma)
 {
struct uacce_queue *q = vma->vm_private_data;
@@ -265,6 +274,7 @@ static void uacce_vma_close(struct vm_area_struct *vma)
 }
 
 static const struct vm_operations_struct uacce_vm_ops = {
+   .fault = uacce_vma_fault,
.close = uacce_vma_close,
 };
 
@@ -556,6 +566,12 @@ void uacce_remove(struct uacce_device *uacce)
 
if (!uacce)
return;
+   /*
+* unmap remaining mappings from user space, preventing users from still
+* accessing the mmapped area while the parent device is already removed
+*/
+   if (uacce->inode)
+   unmap_mapping_range(uacce->inode->i_mapping, 0, 0, 1);
 
/* ensure no open queue remains */
mutex_lock(&uacce->mm_lock);
diff --git a/include/linux/uacce.h b/include/linux/uacce.h
index 904a461..0e215e6 100644
--- a/include/linux/uacce.h
+++ b/include/linux/uacce.h
@@ -98,6 +98,7 @@ struct uacce_queue {
  * @priv: private pointer of the uacce
  * @mm_list: list head of uacce_mm->list
  * @mm_lock: lock for mm_list
+ * @inode: core vfs
  */
 struct uacce_device {
const char *algs;
@@ -113,6 +114,7 @@ struct uacce_device {
void *priv;
struct list_head mm_list;
struct mutex mm_lock;
+   struct inode *inode;
 };
 
 /**
-- 
2.7.4
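
As context for the unmap_mapping_range(mapping, 0, 0, 1) call above:
holebegin 0 with holelen 0 means "from offset 0 to the end of the
file", and even_cows=1 also drops private COW copies, so every user
mapping of the char device is zapped. A user-space sketch of the
resulting behaviour (the device path is hypothetical):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
            int fd = open("/dev/hisi_zip-0", O_RDWR); /* hypothetical uacce node */
            void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);

            if (fd < 0 || p == MAP_FAILED)
                    return 1;

            sleep(30); /* rmmod the parent module now: uacce_remove() zaps the mapping */

            /* With the patch, this write faults into uacce_vma_fault()
             * and the process gets SIGBUS instead of touching a device
             * that no longer exists. */
            *(volatile int *)p = 0;
            return 0;
    }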



Re: [RESEND,PATCH 03/13] iommu/mediatek: Add mtk_iommu_pgtable structure

2020-02-25 Thread chao hao
On Sat, 2020-02-15 at 20:17 +0800, Yong Wu wrote:
> On Tue, 2019-12-31 at 17:39 +0800, chao hao wrote:
> > On Mon, 2019-12-16 at 20:13 +0800, Yong Wu wrote:
> > > On Mon, 2019-11-04 at 19:52 +0800, Chao Hao wrote:
> > > > Starting with this patch, we will change the SW architecture
> > > > to support multiple domains. The SW architecture changes a lot,
> > > > so we need to modify it bit by bit across more than one patch.
> > > > The new SW overall architecture is as below:
> > > > 
> > > >          iommu0           iommu1
> > > >             |                |
> > > >             ------------------
> > > >                      |
> > > >             mtk_iommu_pgtable
> > > >                      |
> > > >      -----------------------------------------
> > > >      |                 |                     |
> > > > mtk_iommu_domain1  mtk_iommu_domain2  mtk_iommu_domain3
> > > >      |                 |                     |
> > > > iommu_group1       iommu_group2        iommu_group3
> > > >      |                 |                     |
> > > > iommu_domain1      iommu_domain2       iommu_domain3
> > > >      |                 |                     |
> > > > iova region1(normal)  iova region2(CCU)  iova region3(VPU)
> > > > 
> > > > In the current structure, no matter how many iommus there are,
> > > > they use the same page table, to simplify use of the module.
> > > > In order to make the software architecture more explicit, this
> > > > patch creates a global mtk_iommu_pgtable structure to describe
> > > > the page table, which all the iommus use.
> > > 
> > > Thanks for the hard work on this file. Actually, this patch and the later
> > > ones confuse me. Why do you make this flow change? Is it for making
> > > the code "more explicit", or for adding multi-domain support in 13/13?
> > > 
> > > IMHO, the change is unnecessary.
> > > a) For me, this change has no improvement. Currently we use a global
> > > mtk_iommu_get_m4u_data to get the M4U data. I would be very glad if you
> > > could get rid of it. But in this patchset you use another global,
> > > mtk_iommu_pgtable, instead. For me, that is no improvement.
> > 
> > Thanks for your advice!
> > 
> > In the current SW arch, all the IOMMU HW use the same page table; we can
> > use a global mtk_iommu_pgtable to describe the page table information,
> 
> What's your plan if the 4GB iova range is not enough for us in future?
> Do you plan to add a new global mtk_iommu_pgtable again?
> 
> > and all the IOMMUs attach to it. I think that is clearer and
> > unambiguous. For beginners, it may also be easier to understand.
> 
> I still don't get the necessity of this change. It is only for making
> the code clear from your point of view, right?
> 
> This code has been reviewed for many years; I don't know why you think
> it is ambiguous. It is clear to me at least, and I would complain that
> you add a new global variable in this change.
> 
> > > 
> > > b) This patchset breaks the original flow. device_group gives you a
> > > software hook for initialization, and you move the pagetable-allocating
> > > code into it. But that isn't device_group's job.
> > > 
> > 
> > As shown in the diagram above, mtk_iommu_pgtable includes the iommu_group
> > and iommu_domain, so we need to allocate mtk_iommu_pgtable and initialize
> > it in device_group first, and then execute the original flow. It only
> > changes where mtk_iommu_pgtable is created and doesn't break the original
> > device_group flow.
> 
> I understand you have to do this change after you adjust the structure.
> I mean that it may not be proper, since allocating the pagetable should
> logically not be done in device_group. From this, can we agree that the
> change doesn't look good?
> 
gentle ping ...

Dear Matthias and Joerg,
From the mt6779 platform onward, mtk_iommu.c needs to support multiple
domains for different iova regions. About this change there are some
internal disagreements. We hope to get your help and advice:

Based on the current SW architecture, supporting multiple domains looks
as below:

         iommu0           iommu1
            |                |
            ------------------
                     |
      -----------------------------------------
      |                 |                     |
 iommu_group1       iommu_group2        iommu_group3
      |                 |                     |
 mtk_iommu_domain1  mtk_iommu_domain2   mtk_iommu_domain3
      |                 |                     |
 iova region1(normal)  iova region2(CCU)  iova region3(VPU)

PS: the page table information is included in struct mtk_iommu_domain

In my 

Re: [PATCH v4 03/26] iommu: Add a page fault handler

2020-02-25 Thread Xu Zaibo

Hi,
On 2020/2/25 17:25, Jean-Philippe Brucker wrote:

Hi Zaibo,

On Tue, Feb 25, 2020 at 11:30:05AM +0800, Xu Zaibo wrote:

+struct iopf_queue *
+iopf_queue_alloc(const char *name, iopf_queue_flush_t flush, void *cookie)
+{
+   struct iopf_queue *queue;
+
+   queue = kzalloc(sizeof(*queue), GFP_KERNEL);
+   if (!queue)
+   return NULL;
+
+   /*
+* The WQ is unordered because the low-level handler enqueues faults by
+* group. PRI requests within a group have to be ordered, but once
+* that's dealt with, the high-level function can handle groups out of
+* order.
+*/
+   queue->wq = alloc_workqueue("iopf_queue/%s", WQ_UNBOUND, 0, name);

Should this workqueue use 'WQ_HIGHPRI | WQ_UNBOUND' or some flags like
this to decrease the unexpected latency of I/O page faults here? Or
maybe the workqueue will show uncontrolled latency, even in a busy
system.

I'll investigate the effect of these flags. So far I've only run on
completely idle systems but it would be interesting to add some
workqueue-heavy load in my tests.

I'm not sure, just my concern. Hopefully, Tejun Heo can give us some hints. :)


+cc  Tejun Heo 

Cheers,
Zaibo







[PATCH v2] MAINTAINERS: add maintainers for uacce

2020-02-25 Thread Zhangfei Gao
Add Zhangfei Gao and Zhou Wang as maintainers for uacce

Signed-off-by: Zhangfei Gao 
Signed-off-by: Zhou Wang 
---
Add list, suggested by Dave

 MAINTAINERS | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 38fe2f3..b5bdef8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17039,6 +17039,18 @@ W: http://linuxtv.org
 S: Maintained
 F: drivers/media/pci/tw686x/
 
+UACCE ACCELERATOR FRAMEWORK
+M: Zhangfei Gao 
+M: Zhou Wang 
+L: linux-accelerat...@lists.ozlabs.org
+L: linux-ker...@vger.kernel.org
+S: Maintained
+F: Documentation/ABI/testing/sysfs-driver-uacce
+F: Documentation/misc-devices/uacce.rst
+F: drivers/misc/uacce/
+F: include/linux/uacce.h
+F: include/uapi/misc/uacce/
+
 UBI FILE SYSTEM (UBIFS)
 M: Richard Weinberger 
 L: linux-...@lists.infradead.org
-- 
2.7.4



Re: [PATCH] MAINTAINERS: add maintainers for uacce

2020-02-25 Thread Dave Jiang



On 2/25/20 6:11 PM, zhangfei wrote:



On 2020/2/26 12:02 AM, Dave Jiang wrote:



On 2/24/20 11:17 PM, Zhangfei Gao wrote:

Add Zhangfei Gao and Zhou Wang as maintainers for uacce

Signed-off-by: Zhangfei Gao 
Signed-off-by: Zhou Wang 
---
  MAINTAINERS | 10 ++++++++++
  1 file changed, 10 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 38fe2f3..22e647f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17039,6 +17039,16 @@ W:	http://linuxtv.org
 S:	Maintained
 F:	drivers/media/pci/tw686x/
+UACCE ACCELERATOR FRAMEWORK
+M:	Zhangfei Gao 
+M:	Zhou Wang 
+S:	Maintained
+F:	Documentation/ABI/testing/sysfs-driver-uacce
+F:	Documentation/misc-devices/uacce.rst
+F:	drivers/misc/uacce/
+F:	include/linux/uacce.h
+F:	include/uapi/misc/uacce/


Mailing list for patch submission?
+L: linux-accelerat...@lists.ozlabs.org ?


Thanks Dave

How about adding both:
linux-accelerat...@lists.ozlabs.org
linux-ker...@vger.kernel.org
since the patches will go through the misc tree?


That is entirely up to you. But having guidance on where to submit
patches would be good.




Thanks


Re: [PATCH] MAINTAINERS: add maintainers for uacce

2020-02-25 Thread zhangfei



On 2020/2/26 12:02 AM, Dave Jiang wrote:



On 2/24/20 11:17 PM, Zhangfei Gao wrote:

Add Zhangfei Gao and Zhou Wang as maintainers for uacce

Signed-off-by: Zhangfei Gao 
Signed-off-by: Zhou Wang 
---
  MAINTAINERS | 10 ++++++++++
  1 file changed, 10 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 38fe2f3..22e647f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17039,6 +17039,16 @@ W:	http://linuxtv.org
 S:	Maintained
 F:	drivers/media/pci/tw686x/
+UACCE ACCELERATOR FRAMEWORK
+M:	Zhangfei Gao 
+M:	Zhou Wang 
+S:	Maintained
+F:	Documentation/ABI/testing/sysfs-driver-uacce
+F:	Documentation/misc-devices/uacce.rst
+F:	drivers/misc/uacce/
+F:	include/linux/uacce.h
+F:	include/uapi/misc/uacce/


Mailing list for patch submission?
+L: linux-accelerat...@lists.ozlabs.org ?


Thanks Dave

How about adding both:
linux-accelerat...@lists.ozlabs.org
linux-ker...@vger.kernel.org
since the patches will go through the misc tree?

Thanks

Re: [RFC PATCH 06/11] iommu: arm-smmu: Remove Calxeda secure mode quirk

2020-02-25 Thread Rob Herring
On Tue, Feb 18, 2020 at 11:20 AM Will Deacon  wrote:
>
> On Tue, Feb 18, 2020 at 11:13:16AM -0600, Rob Herring wrote:
> > Cc: Will Deacon 
> > Cc: Robin Murphy 
> > Cc: Joerg Roedel 
> > Cc: iommu@lists.linux-foundation.org
> > Signed-off-by: Rob Herring 
> > ---
> > Do not apply yet.
>
> Please? ;)
>
> >  drivers/iommu/arm-smmu-impl.c | 43 ---
> >  1 file changed, 43 deletions(-)
>
> Yes, I'm happy to get rid of this. Sadly, I don't think we can remove
> anything from 'struct arm_smmu_impl' because most implementations fall
> just short of perfect.
>
> Anyway, let me know when I can push the button and I'll queue this in
> the arm-smmu tree.

Seems we're leaving the platform support for now, but I think we never
actually enabled SMMU support. It's not in the dts in mainline, nor in
the version I have, which should be close to what shipped in firmware.
So as long as Andre agrees, this one is good to apply.

Rob


Re: [PATCH 2/2] iommu/vt-d: Replace intel SVM APIs with generic SVA APIs

2020-02-25 Thread Christoph Hellwig
On Mon, Feb 24, 2020 at 03:26:37PM -0800, Jacob Pan wrote:
> This patch is an initial step to replace Intel SVM code with the
> following IOMMU SVA ops:
> intel_svm_bind_mm() => iommu_sva_bind_device()
> intel_svm_unbind_mm() => iommu_sva_unbind_device()
> intel_svm_is_pasid_valid() => iommu_sva_get_pasid()

How did either set of APIs end up being added to the kernel without
users to start with?

Please remove both the old intel and the iommu ones given that neither
has any users.


Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h

2020-02-25 Thread Cornelia Huck
On Mon, 24 Feb 2020 19:49:53 +0100
Halil Pasic  wrote:

> On Mon, 24 Feb 2020 14:33:14 +1100
> David Gibson  wrote:
> 
> > On Fri, Feb 21, 2020 at 07:07:02PM +0100, Halil Pasic wrote:  
> > > On Fri, 21 Feb 2020 10:48:15 -0500
> > > "Michael S. Tsirkin"  wrote:
> > >   
> > > > On Fri, Feb 21, 2020 at 02:06:39PM +0100, Halil Pasic wrote:  
> > > > > On Fri, 21 Feb 2020 14:27:27 +1100
> > > > > David Gibson  wrote:
> > > > >   
> > > > > > On Thu, Feb 20, 2020 at 05:31:35PM +0100, Christoph Hellwig wrote:  
> > > > > > > On Thu, Feb 20, 2020 at 05:23:20PM +0100, Christian Borntraeger 
> > > > > > > wrote:  
> > > > > > > > From a user's perspective it makes absolutely perfect
> > > > > > > > sense to use the bounce buffers when they are NEEDED.
> > > > > > > > Forcing the user to specify iommu_platform just because
> > > > > > > > you need bounce buffers really feels wrong. And obviously
> > > > > > > > we have a severe performance issue because of the
> > > > > > > > indirections.
> > > > > > > 
> > > > > > > The point is that the user should not have to specify 
> > > > > > > iommu_platform.
> > > > > > > We need to make sure any new hypervisor (especially one that 
> > > > > > > might require
> > > > > > > bounce buffering) always sets it,  
> > > > > > 
> > > > > > So, I have draft qemu patches which enable iommu_platform by 
> > > > > > default.
> > > > > > But that's really because of other problems with !iommu_platform, 
> > > > > > not
> > > > > > anything to do with bounce buffering or secure VMs.
> > > > > > 
> > > > > > The thing is that the hypervisor *doesn't* require bounce buffering.
> > > > > > In the POWER (and maybe s390 as well) models for Secure VMs, it's 
> > > > > > the
> > > > > > *guest*'s choice to enter secure mode, so the hypervisor has no 
> > > > > > reason
> > > > > > to know whether the guest needs bounce buffering.  As far as the
> > > > > > hypervisor and qemu are concerned that's a guest internal detail, it
> > > > > > just expects to get addresses it can access whether those are GPAs
> > > > > > (iommu_platform=off) or IOVAs (iommu_platform=on).  
> > > > > 
> > > > > I very much agree!
> > > > >   
> > > > > >   
> > > > > > > as was a rather bogus legacy hack  
> > > > > > 
> > > > > > It was certainly a bad idea, but it was a bad idea that went into a
> > > > > > public spec and has been widely deployed for many years.  We can't
> > > > > > just pretend it didn't happen and move on.
> > > > > > 
> > > > > > Turning iommu_platform=on by default breaks old guests, some of 
> > > > > > which
> > > > > > we still care about.  We can't (automatically) do it only for guests
> > > > > > that need bounce buffering, because the hypervisor doesn't know that
> > > > > > ahead of time.  

We could default to iommu_platform=on on s390 when the host has active
support for protected virtualization... but that's just another kind of
horrible, so let's just pretend I didn't suggest it.

> > > > > 
> > > > > Turning iommu_platform=on for virtio-ccw makes no sense whatsover,
> > > > > because for CCW I/O there is no such thing as IOMMU and the addresses
> > > > > are always physical addresses.  
> > > > 
> > > > Fix the name then. The spec calls it ACCESS_PLATFORM now, which
> > > > makes much more sense.
> > > 
> > > I don't quite get it. Sorry. Maybe I will revisit this later.  
> > 
> > Halil, I think I can clarify this.
> > 
> > The "iommu_platform" flag doesn't necessarily have anything to do with
> > an iommu, although it often will.  Basically it means "access guest
> > memory via the bus's normal DMA mechanism" rather than "access guest
> > memory using GPA, because you're the hypervisor and you can do that".
> >   
> 
> Unfortunately, I don't think this is what is conveyed to the end users.
> Let's see what do we have documented:
> 
> Neither Qemu user documentation
> (https://www.qemu.org/docs/master/qemu-doc.html) nor online help:
> qemu-system-s390x -device virtio-net-ccw,?|grep iommu
>   iommu_platform=  - on/off (default: false)
> has any documentation on it.

Now, that's 'helpful' -- this certainly calls out for a bit of doc...

> 
> But libvirt does have documentation on the knob that controls
> iommu_platform for QEMU (when QEMU is managed by libvirt):
> """
> Virtio-related options 
> 
> QEMU's virtio devices have some attributes related to the virtio
> transport under the driver element: The iommu attribute enables the use
> of emulated IOMMU by the device. The attribute ats controls the Address
> Translation Service support for PCIe devices. This is needed to make use
> of IOTLB support (see IOMMU device). Possible values are on or off.
> Since 3.5.0 
> """
> (https://libvirt.org/formatdomain.html#elementsVirtio)
> 
> Thus it seems the only available documentation says that it "enables the use
> of emulated IOMMU by the device".
> 
> And for vhost-user we have
> """
> When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has not 

Re: [PATCH] MAINTAINERS: add maintainers for uacce

2020-02-25 Thread Dave Jiang




On 2/24/20 11:17 PM, Zhangfei Gao wrote:

Add Zhangfei Gao and Zhou Wang as maintainers for uacce

Signed-off-by: Zhangfei Gao 
Signed-off-by: Zhou Wang 
---
  MAINTAINERS | 10 ++++++++++
  1 file changed, 10 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 38fe2f3..22e647f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17039,6 +17039,16 @@ W: http://linuxtv.org
  S:	Maintained
  F:	drivers/media/pci/tw686x/
  
+UACCE ACCELERATOR FRAMEWORK
+M:	Zhangfei Gao 
+M:	Zhou Wang 
+S:	Maintained
+F:	Documentation/ABI/testing/sysfs-driver-uacce
+F:	Documentation/misc-devices/uacce.rst
+F:	drivers/misc/uacce/
+F:	include/linux/uacce.h
+F:	include/uapi/misc/uacce/


Mailing list for patch submission?
+L: linux-accelerat...@lists.ozlabs.org ?


+
  UBI FILE SYSTEM (UBIFS)
  M:	Richard Weinberger 
  L:	linux-...@lists.infradead.org




Re: [PATCH v4 01/26] mm/mmu_notifiers: pass private data down to alloc_notifier()

2020-02-25 Thread Jason Gunthorpe
On Tue, Feb 25, 2020 at 10:24:39AM +0100, Jean-Philippe Brucker wrote:
> On Mon, Feb 24, 2020 at 03:00:56PM -0400, Jason Gunthorpe wrote:
> > On Mon, Feb 24, 2020 at 07:23:36PM +0100, Jean-Philippe Brucker wrote:
> > > The new allocation scheme introduced by 2c7933f53f6b ("mm/mmu_notifiers:
> > > add a get/put scheme for the registration") provides a convenient way
> > > for users to attach notifier data to an mm. However, it would be even
> > > better to create this notifier data atomically.
> > > 
> > > Since the alloc_notifier() callback only takes an mm argument at the
> > > moment, some users have to perform the allocation in two steps.
> > > alloc_notifier() initially creates an incomplete structure, which is
> > > then finalized using more context once mmu_notifier_get() returns. This
> > > second step requires carrying an initialization lock in the notifier
> > > data and playing dirty tricks to order memory accesses against live
> > > invalidation.
> > 
> > This was the intended pattern. There shouldn't be a real issue, as
> > there shouldn't be any data on which to invalidate, ie the later patch
> > does:
> > 
> > +   list_for_each_entry_rcu(bond, &io_mm->devices, mm_head)
> > 
> > And that list is empty post-allocation, so no 'dirty tricks' required.
> 
> Before introducing this patch I had the following code:
> 
> + list_for_each_entry_rcu(bond, &io_mm->devices, mm_head) {
> + /*
> +  * To ensure that we observe the initialization of io_mm fields
> +  * by io_mm_finalize() before the registration of this bond to
> +  * the list by io_mm_attach(), introduce an address dependency
> +  * between bond and io_mm. It pairs with the smp_store_release()
> +  * from list_add_rcu().
> +  */
> + io_mm = rcu_dereference(bond->io_mm);

An rcu_dereference isn't needed here; a normal dereference is fine.

> + io_mm->ops->invalidate(bond->sva.dev, io_mm->pasid, io_mm->ctx,
> +start, end - start);
> + }
> 
> (1) io_mm_get() would obtain an empty io_mm from iommu_notifier_get().
> (2) then io_mm_finalize() would initialize io_mm->ops, io_mm->ctx, etc.
> (3) finally io_mm_attach() would add the bond to io_mm->devices.
> 
> Since the above code can run before (2) it needs to observe valid
> io_mm->ctx, io_mm->ops initialized by (2) after obtaining the bond
> initialized by (3). Which I believe requires the address dependency from
> the rcu_dereference() above or some stronger barrier to pair with the
> list_add_rcu().

The list_for_each_entry_rcu() is an acquire that already pairs with
the release in list_add_rcu(), all you need is a data dependency chain
starting on bond to be correct on ordering.

But this is super tricky :\
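
To spell out the ordering under discussion, a minimal sketch using the
names from the quoted series (simplified, not the actual patch):

    /* Writer: finalize the io_mm fields before publishing the bond. */
    io_mm->ctx = ctx;
    io_mm->ops = ops;                               /* (2) io_mm_finalize() */
    list_add_rcu(&bond->mm_head, &io_mm->devices);  /* (3) release: publish */

    /* Reader (invalidation path): */
    rcu_read_lock();
    list_for_each_entry_rcu(bond, &io_mm->devices, mm_head) {
            /* The list-pointer load is the consume side; the address
             * dependency bond -> bond->io_mm -> io_mm->ops orders these
             * reads against the writes in (2). */
            struct io_mm *mm = bond->io_mm;

            mm->ops->invalidate(bond->sva.dev, mm->pasid, mm->ctx,
                                start, end - start);
    }
    rcu_read_unlock();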

> If io_mm->ctx and io_mm->ops are already valid before the
> mmu notifier is published, then we don't need that stuff.

So, this trickiness with RCU is not a bad reason to introduce the priv
scheme, maybe explain it in the commit message?

Jason


Re: take the bus_dma_limit into account on arm

2020-02-25 Thread Roger Quadros via iommu

Hi Russell,

On 18/02/2020 20:41, Christoph Hellwig wrote:

Hi Russell,

this series fixes the arm dma coherent allocator to take the bus dma
mask into account, similar to what other architectures do.  Without
this devices that support 64-bit mask, but are limited by the
interconnect won't work properly.



Can we please have this series marked for -stable? Thanks.

--
cheers,
-roger

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


Re: [PATCH 2/2] openrisc: use the generic in-place uncached DMA allocator

2020-02-25 Thread Stafford Horne
On Mon, Feb 24, 2020 at 08:45:28PM +0100, Christoph Hellwig wrote:
> On Sat, Feb 22, 2020 at 07:14:47AM +0900, Stafford Horne wrote:
> > On Thu, Feb 20, 2020 at 09:01:39AM -0800, Christoph Hellwig wrote:
> > > Switch openrisc to use the dma-direct allocator and just provide the
> > > hooks for setting memory uncached or cached.
> > > 
> > > Signed-off-by: Christoph Hellwig 
> > 
> > Reviewed-by: Stafford Horne 
> > 
> > Also, I test booted openrisc with linux 5.5 + these patches.  Thanks for
> > continuing to shrink my code base.
> 
> I just resent a new version that changes how the hooks work based on
> feedback from Robin.  Everything should work as-is, but if you have
> some time to retest that would be great.

No problem.


Re: [PATCH 5/5] openrisc: use the generic in-place uncached DMA allocator

2020-02-25 Thread Stafford Horne
On Mon, Feb 24, 2020 at 11:44:45AM -0800, Christoph Hellwig wrote:
> Switch openrisc to use the dma-direct allocator and just provide the
> hooks for setting memory uncached or cached.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Stafford Horne 

I also test booted this series with linux 5.5 on my OpenRISC target. There are
no issues. Note: I had an issue with patch 3/5 not applying cleanly with
'git am', but it worked fine using plain 'patch'; I didn't look into the
details.

-Stafford


Re: [PATCH v4 03/26] iommu: Add a page fault handler

2020-02-25 Thread Jean-Philippe Brucker
Hi Zaibo,

On Tue, Feb 25, 2020 at 11:30:05AM +0800, Xu Zaibo wrote:
> > +struct iopf_queue *
> > +iopf_queue_alloc(const char *name, iopf_queue_flush_t flush, void *cookie)
> > +{
> > +   struct iopf_queue *queue;
> > +
> > +   queue = kzalloc(sizeof(*queue), GFP_KERNEL);
> > +   if (!queue)
> > +   return NULL;
> > +
> > +   /*
> > +* The WQ is unordered because the low-level handler enqueues faults by
> > +* group. PRI requests within a group have to be ordered, but once
> > +* that's dealt with, the high-level function can handle groups out of
> > +* order.
> > +*/
> > +   queue->wq = alloc_workqueue("iopf_queue/%s", WQ_UNBOUND, 0, name);
> Should this workqueue use 'WQ_HIGHPRI | WQ_UNBOUND' or some flags like
> this to decrease the unexpected latency of I/O page faults here? Or
> maybe the workqueue will show uncontrolled latency, even in a busy
> system.

I'll investigate the effect of these flags. So far I've only run on
completely idle systems but it would be interesting to add some
workqueue-heavy load in my tests.

Thanks,
Jean
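
For reference, the variant Zaibo suggests is a one-flag change to the
allocation quoted above; whether it actually improves fault latency is
exactly what remains to be measured:

    /* queue fault work to the high-priority worker pool */
    queue->wq = alloc_workqueue("iopf_queue/%s",
                                WQ_UNBOUND | WQ_HIGHPRI, 0, name);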


Re: [PATCH v4 01/26] mm/mmu_notifiers: pass private data down to alloc_notifier()

2020-02-25 Thread Jean-Philippe Brucker
On Mon, Feb 24, 2020 at 03:00:56PM -0400, Jason Gunthorpe wrote:
> On Mon, Feb 24, 2020 at 07:23:36PM +0100, Jean-Philippe Brucker wrote:
> > The new allocation scheme introduced by 2c7933f53f6b ("mm/mmu_notifiers:
> > add a get/put scheme for the registration") provides a convenient way
> > for users to attach notifier data to an mm. However, it would be even
> > better to create this notifier data atomically.
> > 
> > Since the alloc_notifier() callback only takes an mm argument at the
> > moment, some users have to perform the allocation in two steps.
> > alloc_notifier() initially creates an incomplete structure, which is
> > then finalized using more context once mmu_notifier_get() returns. This
> > second step requires carrying an initialization lock in the notifier
> > data and playing dirty tricks to order memory accesses against live
> > invalidation.
> 
> This was the intended pattern. There shouldn't be a real issue, as
> there shouldn't be any data on which to invalidate, ie the later patch
> does:
> 
> +   list_for_each_entry_rcu(bond, &io_mm->devices, mm_head)
> 
> And that list is empty post-allocation, so no 'dirty tricks' required.

Before introducing this patch I had the following code:

+   list_for_each_entry_rcu(bond, &io_mm->devices, mm_head) {
+   /*
+* To ensure that we observe the initialization of io_mm fields
+* by io_mm_finalize() before the registration of this bond to
+* the list by io_mm_attach(), introduce an address dependency
+* between bond and io_mm. It pairs with the smp_store_release()
+* from list_add_rcu().
+*/
+   io_mm = rcu_dereference(bond->io_mm);
+   io_mm->ops->invalidate(bond->sva.dev, io_mm->pasid, io_mm->ctx,
+  start, end - start);
+   }

(1) io_mm_get() would obtain an empty io_mm from iommu_notifier_get().
(2) then io_mm_finalize() would initialize io_mm->ops, io_mm->ctx, etc.
(3) finally io_mm_attach() would add the bond to io_mm->devices.

Since the above code can run before (2) it needs to observe valid
io_mm->ctx, io_mm->ops initialized by (2) after obtaining the bond
initialized by (3). Which I believe requires the address dependency from
the rcu_dereference() above or some stronger barrier to pair with the
list_add_rcu(). If io_mm->ctx and io_mm->ops are already valid before the
mmu notifier is published, then we don't need that stuff.

That's the main reason I would have liked moving everything to
alloc_notifier(), the locking below isn't a big deal.

> The other op callback is release, which also cannot be called as the
> caller must hold a mmget to establish the notifier.
> 
> So just use the locking that already exists. There is one function
> that calls io_mm_get() which immediately calls io_mm_attach, which
> immediately grabs the global iommu_sva_lock.
> 
> Thus init the pasid for the first time under that lock and everything
> is fine.

I agree with this, can't remember why I used a separate lock for
initialization rather than reusing iommu_sva_lock.

Thanks,
Jean

> 
> There is nothing inherently wrong with the approach in this patch, but
> it seems unneeded in this case..
> 
> Jason


Re: [PATCH] uacce: unmap remaining mmapping from user space

2020-02-25 Thread Xu Zaibo

Hi,
On 2020/2/25 16:33, zhangfei wrote:

Hi, Zaibo

On 2020/2/24 3:17 PM, Xu Zaibo wrote:

  @@ -585,6 +595,13 @@ void uacce_remove(struct uacce_device *uacce)
  cdev_device_del(uacce->cdev, &uacce->dev);
  xa_erase(&uacce_xa, uacce->dev_id);
  put_device(&uacce->dev);
+
+/*
+ * unmap remaining mappings from user space, preventing users from still
+ * accessing the mmapped area while the parent device is already removed
+ */
+if (uacce->inode)
+unmap_mapping_range(uacce->inode->i_mapping, 0, 0, 1);
Should we unmap them at the start of 'uacce_remove', before
'uacce_put_queue'?



We can do this, though it does not matter, since user space cannot
interrupt the kernel function uacce_remove.



I think it matters :)

Imagine that the process holding the uacce queue is running (reading
and writing the queue) when you do 'uacce_remove'. The process keeps
running (reading and writing the queue) in the window between
'uacce_put_queue' and 'unmap_mapping_range'; however, the queue with
its DMA memory may be grabbed and used by others during that window,
since you have already released them in the kernel. As a result, the
running process will be a disaster.


cheers,
Zaibo




Thanks





Re: [PATCH v12 2/4] uacce: add uacce driver

2020-02-25 Thread zhangfei

Hi, Raj

On 2020/2/25 2:22 AM, Raj, Ashok wrote:

Hi Kenneth,

sorry for waking up late on this patchset.

+
+static int uacce_fops_open(struct inode *inode, struct file *filep)
+{
+   struct uacce_mm *uacce_mm = NULL;
+   struct uacce_device *uacce;
+   struct uacce_queue *q;
+   int ret = 0;
+
+   uacce = xa_load(&uacce_xa, iminor(inode));
+   if (!uacce)
+   return -ENODEV;
+
+   q = kzalloc(sizeof(struct uacce_queue), GFP_KERNEL);
+   if (!q)
+   return -ENOMEM;
+
+   mutex_lock(&uacce->mm_lock);
+   uacce_mm = uacce_mm_get(uacce, q, current->mm);
I think having this at open time is a bit unnatural, since when a process
does fork, we do not inherit the PASID: the child inherits the fd
but cannot use the mmapped address.

If you move this to mmap time, it's more natural. The child could
do an mmap() and get a new PASID + mmio space to work with the hardware.


Thanks for the suggestion.
We will consider fork in the next step, may need some time.

Thanks
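
A rough sketch of the suggested rearrangement, binding at mmap time
instead of open time (hypothetical code against the v12 structures, not
an actual patch):

    static int uacce_fops_mmap(struct file *filep, struct vm_area_struct *vma)
    {
            struct uacce_queue *q = filep->private_data;
            struct uacce_device *uacce = q->uacce;
            struct uacce_mm *uacce_mm;

            mutex_lock(&uacce->mm_lock);
            /* Bind the caller's mm here, so a forked child doing its own
             * mmap() gets its own PASID instead of inheriting an
             * unusable mapping. */
            uacce_mm = uacce_mm_get(uacce, q, current->mm);
            mutex_unlock(&uacce->mm_lock);
            if (!uacce_mm)
                    return -ENOMEM;

            /* ... then map the queue's mmio/share region as before ... */
            return 0;
    }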

Re: [PATCH] uacce: unmap remaining mmapping from user space

2020-02-25 Thread zhangfei

Hi, Zaibo

On 2020/2/24 3:17 PM, Xu Zaibo wrote:

  @@ -585,6 +595,13 @@ void uacce_remove(struct uacce_device *uacce)
  cdev_device_del(uacce->cdev, &uacce->dev);
  xa_erase(&uacce_xa, uacce->dev_id);
  put_device(&uacce->dev);
+
+    /*
+ * unmap remaining mappings from user space, preventing users from still
+ * accessing the mmapped area while the parent device is already removed
+ */
+    if (uacce->inode)
+    unmap_mapping_range(uacce->inode->i_mapping, 0, 0, 1);
Should we unmap them at the start of 'uacce_remove', before
'uacce_put_queue'?



We can do this, though it does not matter, since user space cannot
interrupt the kernel function uacce_remove.


Thanks

Re: [PATCH V9 05/10] iommu/vt-d: Support flushing more translation cache types

2020-02-25 Thread Auger Eric
Hi Jacob,

On 2/15/20 12:27 AM, Jacob Pan wrote:
> Hi Eric,
> 
> On Wed, 12 Feb 2020 13:55:25 +0100
> Auger Eric  wrote:
> 
>> Hi Jacob,
>>
>> On 1/29/20 7:01 AM, Jacob Pan wrote:
>>> When Shared Virtual Memory is exposed to a guest via vIOMMU,
>>> scalable IOTLB invalidation may be passed down from outside IOMMU
>>> subsystems. This patch adds invalidation functions that can be used
>>> for additional translation cache types.
>>>
>>> Signed-off-by: Jacob Pan 
>>> ---
>>>  drivers/iommu/dmar.c        | 33 +++++++++++++++++++++++++++++++++
>>>  drivers/iommu/intel-pasid.c |  3 ++-
>>>  include/linux/intel-iommu.h | 20 ++++++++++++++++----
>>>  3 files changed, 51 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
>>> index 071bb42bbbc5..206733ec8140 100644
>>> --- a/drivers/iommu/dmar.c
>>> +++ b/drivers/iommu/dmar.c
>>> @@ -1411,6 +1411,39 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
>>> 	qi_submit_sync(&desc, iommu);
>>>  }
>>>  
>>> +/* PASID-based device IOTLB Invalidate */
>>> +void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
>>> +		u32 pasid, u16 qdep, u64 addr, unsigned size_order, u64 granu)
>>> +{
>>> +	struct qi_desc desc = {.qw2 = 0, .qw3 = 0};
>>> +
>>> +	desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) |
>>> +		QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
>>> +		QI_DEV_IOTLB_PFSID(pfsid);
>>> +	desc.qw1 = QI_DEV_EIOTLB_GLOB(granu);
>>> +
>>> +	/* If S bit is 0, we only flush a single page. If S bit is set,
>>> +	 * the least significant zero bit indicates the invalidation address
>>> +	 * range. VT-d spec 6.5.2.6.
>>> +	 * e.g. address bit 12[0] indicates 8KB, 13[0] indicates 16KB.
>>> +	 */
>>> +	if (!size_order) {
>>> +		desc.qw0 |= QI_DEV_EIOTLB_ADDR(addr) & ~QI_DEV_EIOTLB_SIZE;
>>> +	} else {
>>> +		unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size_order);
>>> +		desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr & ~mask) | QI_DEV_EIOTLB_SIZE;
>>> +	}
>>> +	qi_submit_sync(&desc, iommu);
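
As an aside, the range encoding referenced in the comment above (VT-d
spec 6.5.2.6) can be made concrete with a small sketch; the helper name
is made up for illustration, and this encodes the spec rule, not the
patch's code (assuming VTD_PAGE_SHIFT == 12):

    /*
     * With the S bit set, the least significant zero bit of the address
     * field selects the range:
     *   addr[12] == 0                 -> 8KB  (2 pages)
     *   addr[12] == 1, addr[13] == 0  -> 16KB (4 pages)
     */
    static u64 qi_encode_range(u64 addr, unsigned int size_order)
    {
            /* size_order = log2(number of pages), must be >= 1 here; a
             * single page (size_order == 0) is sent with S clear. */
            u64 ones = (1ULL << (size_order - 1)) - 1;

            addr &= ~(1ULL << (VTD_PAGE_SHIFT + size_order - 1));
            addr |= ones << VTD_PAGE_SHIFT;
            return addr;
    }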
>> I made some comments in
>> https://lkml.org/lkml/2019/8/14/1311
>> that do not seem to have been taken into account. Or do I miss
>> something?
>>
> I missed adding these changes. At the time Baolu was doing cache flush
> consolidation so I wasn't sure if I could use his code completely. This
> patch is on top of his consolidated flush code with what is still
> needed for vSVA. Then I forgot to address your comments. Sorry about
> that.
no problem
> 
>> More generally having an individual history log would be useful and
>> speed up the review.
>>
> Will add history to each patch, e.g. like this?
> ---
> v8 -> v9
Yes, thanks. You're not obliged to list every little thing, but to me it
helps to see at first sight whether you took major comments into
account; and in case you did not (on purpose), you can also indicate it.

Thanks

Eric
> ---
>> Thanks
>>
>> Eric
>>> +}
>>> +
>>> +void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, int pasid)
>>> +{
>>> +	struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0};
>>> +
>>> +	desc.qw0 = QI_PC_PASID(pasid) | QI_PC_DID(did) | QI_PC_GRAN(granu) | QI_PC_TYPE;
>>> +	qi_submit_sync(&desc, iommu);
>>> +}
>>> +
>>>  /*
>>>   * Disable Queued Invalidation interface.
>>>   */
>>> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
>>> index bd067af4d20b..b100f51407f9 100644
>>> --- a/drivers/iommu/intel-pasid.c
>>> +++ b/drivers/iommu/intel-pasid.c
>>> @@ -435,7 +435,8 @@ pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
>>>  {
>>>  	struct qi_desc desc;
>>> 
>>> -	desc.qw0 = QI_PC_DID(did) | QI_PC_PASID_SEL | QI_PC_PASID(pasid);
>>> +	desc.qw0 = QI_PC_DID(did) | QI_PC_GRAN(QI_PC_PASID_SEL) |
>>> +		QI_PC_PASID(pasid) | QI_PC_TYPE;
>>>  	desc.qw1 = 0;
>>>  	desc.qw2 = 0;
>>>  	desc.qw3 = 0;
>>> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
>>> index b0ffecbc0dfc..dd9fa61689bc 100644
>>> --- a/include/linux/intel-iommu.h
>>> +++ b/include/linux/intel-iommu.h
>>> @@ -332,7 +332,7 @@ enum {
>>>  #define QI_IOTLB_GRAN(gran)	(((u64)gran) >> (DMA_TLB_FLUSH_GRANU_OFFSET-4))
>>>  #define QI_IOTLB_ADDR(addr)	(((u64)addr) & VTD_PAGE_MASK)
>>>  #define QI_IOTLB_IH(ih)	(((u64)ih) << 6)
>>> -#define QI_IOTLB_AM(am)	(((u8)am))
>>> +#define QI_IOTLB_AM(am)	(((u8)am) & 0x3f)
>>> 
>>>  #define QI_CC_FM(fm)	(((u64)fm) << 48)
>>>  #define QI_CC_SID(sid)	(((u64)sid) << 32)
>>> @@ -351,16 +351,21 @@ enum {
>>>  #define QI_PC_DID(did)	(((u64)did) << 16)
>>>  #define QI_PC_GRAN(gran)	(((u64)gran) << 4)
>>> 
>>> -#define QI_PC_ALL_PASIDS	(QI_PC_TYPE | QI_PC_GRAN(0))
>>> -#define QI_PC_PASID_SEL	(QI_PC_TYPE | QI_PC_GRAN(1))
>>> +/* PASID cache invalidation granu */
>>> +#define QI_PC_ALL_PASIDS	0