Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device

2020-03-23 Thread Alexey Kardashevskiy



On 24/03/2020 14:37, Alexey Kardashevskiy wrote:
> 
> 
> On 24/03/2020 04:20, Christoph Hellwig wrote:
>> On Mon, Mar 23, 2020 at 07:58:01PM +1100, Alexey Kardashevskiy wrote:
> 0x100.. .. 0x101..
>
> 2x4G, each is 1TB aligned. And we can map directly only the first 4GB
> (because of the maximum IOMMU table size) but not the other. And 1:1 on
> that "pseries" is done with offset=0x0800....
>
> So we want to check every bus address against dev->bus_dma_limit, not
> dev->coherent_dma_mask. In the example above I'd set bus_dma_limit to
> 0x0800.0001.. and 1:1 mapping for the second 4GB would not be
> tried. Does this sound reasonable? Thanks,

 bus_dma_limit is just another limiting factor applied on top of
 coherent_dma_mask or dma_mask respectively.
>>>
>>> This is not enough for the task: in my example, I'd set bus limit to
>>> 0x0800.0001.. but this would disable bypass for all RAM
>>> addresses - the first and the second 4GB blocks.
>>
>> So what about something like the version here:
>>
>> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-bypass.3
> 
> 
> dma_alloc_direct() and dma_map_direct() do the same thing now, which is
> good; did I miss anything else?
> 
> This lets us disable bypass automatically if this weird memory appears
> in the system, but it does not let us have 1:1 after that even for normal
> RAM. Thanks,

Ah no, it does not help much; simply setting dma_ops_bypass will, though.


But eventually, in this function:

static inline bool dma_map_direct(struct device *dev,
                                  const struct dma_map_ops *ops)
{
        if (likely(!ops))
                return true;
        if (!dev->dma_ops_bypass)
                return false;

        return min_not_zero(*dev->dma_mask, dev->bus_dma_limit) >=
                dma_direct_get_required_mask(dev);
}


we would rather want it to take a DMA handle and a size, and add

if (dev->bus_dma_limit)
return dev->bus_dma_limit > dma_handle + size;


where dma_handle=phys_to_dma(dev, phys) (I am not doing it here as unmap
needs the same test and it does not receive phys as a parameter).
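
To make that concrete, here is a minimal sketch of how dma_map_direct() could
look with the extra parameters (the dma_handle/size parameters and the
inclusive end check are only my illustration of the idea above, not code from
the dma-bypass branch):

static inline bool dma_map_direct(struct device *dev,
                                  const struct dma_map_ops *ops,
                                  dma_addr_t dma_handle, size_t size)
{
        if (likely(!ops))
                return true;
        if (!dev->dma_ops_bypass)
                return false;

        /* Per-mapping check: only bypass if the whole bus address range
         * of this mapping sits below the bus limit. */
        if (dev->bus_dma_limit)
                return dma_handle + size - 1 <= dev->bus_dma_limit;

        return min_not_zero(*dev->dma_mask, dev->bus_dma_limit) >=
                dma_direct_get_required_mask(dev);
}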




-- 
Alexey


Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device

2020-03-23 Thread Alexey Kardashevskiy



On 24/03/2020 04:20, Christoph Hellwig wrote:
> On Mon, Mar 23, 2020 at 07:58:01PM +1100, Alexey Kardashevskiy wrote:
 0x100.. .. 0x101..

 2x4G, each is 1TB aligned. And we can map directly only the first 4GB
 (because of the maximum IOMMU table size) but not the other. And 1:1 on
 that "pseries" is done with offset=0x0800....

 So we want to check every bus address against dev->bus_dma_limit, not
 dev->coherent_dma_mask. In the example above I'd set bus_dma_limit to
 0x0800.0001.. and 1:1 mapping for the second 4GB would not be
 tried. Does this sound reasonable? Thanks,
>>>
>>> bus_dma_limit is just another limiting factor applied on top of
>>> coherent_dma_mask or dma_mask respectively.
>>
>> This is not enough for the task: in my example, I'd set bus limit to
>> 0x0800.0001.. but this would disable bypass for all RAM
>> addresses - the first and the second 4GB blocks.
> 
> So what about something like the version here:
> 
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-bypass.3


dma_alloc_direct() and dma_map_direct() do the same thing now, which is
good; did I miss anything else?

This lets us disable bypass automatically if this weird memory appears
in the system, but it does not let us have 1:1 after that even for normal
RAM. Thanks,


-- 
Alexey


Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device

2020-03-23 Thread Alexey Kardashevskiy



On 24/03/2020 04:22, Christoph Hellwig wrote:
> On Mon, Mar 23, 2020 at 09:07:38PM +0530, Aneesh Kumar K.V wrote:
>>
>> This is what I was trying, but considering I am new to the DMA subsystem, I
>> am not sure I got all the details correct. The idea is to look at the
>> CPU address and see if it can be used in a direct-map fashion (is
>> bus_dma_limit the right restriction here?); if not, fall back to dynamic
>> IOMMU mapping.
> 
> I don't think we can throw all these complications into the dma
> mapping code.  At some point I also wonder what the point is,
> especially for scatterlist mappings, where the iommu can coalesce.

This is for persistent memory which you can DMA to/from, but which does
not appear in the system as normal memory and therefore requires
special handling anyway (O_DIRECT or DAX, I do not know the exact
mechanics). All other devices in the system should just run as usual,
i.e. use the 1:1 mapping if possible.


-- 
Alexey


Re: [PATCH V10 02/11] iommu/uapi: Define a mask for bind data

2020-03-23 Thread Lu Baolu

On 2020/3/24 3:37, Jacob Pan wrote:

On Sun, 22 Mar 2020 09:29:32 +0800, Lu Baolu  wrote:


On 2020/3/21 7:27, Jacob Pan wrote:

Memory type related flags can be grouped together for one simple
check.

---
v9 renamed from EMT to MTS since these are memory type support
flags. ---

Signed-off-by: Jacob Pan
---
   include/uapi/linux/iommu.h | 5 -
   1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 4ad3496e5c43..d7bcbc5f79b0 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -284,7 +284,10 @@ struct iommu_gpasid_bind_data_vtd {
__u32 pat;
__u32 emt;
   };
-
+#define IOMMU_SVA_VTD_GPASID_MTS_MASK  (IOMMU_SVA_VTD_GPASID_CD |  \
+                                        IOMMU_SVA_VTD_GPASID_EMTE | \
+                                        IOMMU_SVA_VTD_GPASID_PCD |  \
+                                        IOMMU_SVA_VTD_GPASID_PWT)

As the name implies, can this move to intel-iommu.h?


I also thought about this, but the masks are in the vendor-specific part of
the UAPI.



I looked through this patch series. It looks good to me. I will do some
code style cleanup and take it to v5.7. I am not the right person to
decide whether include/uapi/linux/iommu.h is the right place for this,
so I will move it to the Intel IOMMU driver for now.

Best regards,
baolu


Re: [PATCH 2/2] iommu/vt-d: Replace intel SVM APIs with generic SVA APIs

2020-03-23 Thread Lu Baolu

On 2020/3/24 7:01, Raj, Ashok wrote:

Hi Jean

On Fri, Mar 20, 2020 at 10:29:55AM +0100, Jean-Philippe Brucker wrote:

+#define to_intel_svm_dev(handle) container_of(handle, struct intel_svm_dev, 
sva)
+struct iommu_sva *
+intel_svm_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+{
+   struct iommu_sva *sva = ERR_PTR(-EINVAL);
+   struct intel_svm_dev *sdev = NULL;
+   int flags = 0;
+   int ret;
+
+   /*
+* TODO: Consolidate with generic iommu-sva bind after it is merged.
+* It will require shared SVM data structures, i.e. combine io_mm
+* and intel_svm etc.
+*/
+   if (drvdata)
+   flags = *(int *)drvdata;


drvdata is more for storing device driver contexts that can be passed to
iommu_sva_ops, but I get that this is temporary.

As usual I'm dreading supervisor mode making it into the common API. What
are your plans regarding SUPERVISOR_MODE and PRIVATE_PASID flags?  The
previous discussion on the subject [1] had me hoping that you could
replace supervisor mode with normal mappings (auxiliary domains?)
I'm less worried about PRIVATE_PASID, it would just add complexity into


We don't seem to have an immediate need for PRIVATE_PASID. There are some talks
about potential usages, but nothing concrete. I think it might be good to
get rid of it now and add it back when we really need it.

For SUPERVISOR_MODE, the idea is to use an aux domain. Baolu is working on
something to replace it. Certainly exposing the entire kernel address space is
opening up the whole kimono, so we are looking at dynamically creating mappings
on demand. It might take away some of the benefits of SVA in general (no need
to create mappings), but for security somebody has to pay the price :-)


My thought is to reuse below aux-domain API.

int iommu_aux_attach_device(struct iommu_domain *domain,
struct device *dev)

Currently, it asks the vendor iommu driver to allocate a PASID and bind
the domain with it. We can change it to allow the caller to pass in an
existing supervisor PASID.

int iommu_aux_attach_device(struct iommu_domain *domain,
struct device *dev,
int *pasid)

In the vendor iommu driver, if (*pasid == INVALID_IOASID), it will
allocate a pasid (the same as current behavior); otherwise, attach
the domain with the passed-in pasid.
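
A rough caller-side sketch of that extended interface (the three-argument
signature and the INVALID_IOASID convention are the proposal above;
example_attach() is a made-up caller):

static int example_attach(struct iommu_domain *domain, struct device *dev,
                          int supervisor_pasid)
{
        /* INVALID_IOASID asks the IOMMU driver to allocate a PASID (current
         * behaviour); any other value is an existing supervisor PASID the
         * domain should be attached with. */
        int pasid = supervisor_pasid;
        int ret;

        ret = iommu_aux_attach_device(domain, dev, &pasid);
        if (ret)
                return ret;

        /* On success, pasid holds the PASID the domain is now bound to. */
        return pasid;
}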

Best regards,
baolu


Re: [PATCH 2/2] iommu/vt-d: Replace intel SVM APIs with generic SVA APIs

2020-03-23 Thread Raj, Ashok
Hi Jean

On Fri, Mar 20, 2020 at 10:29:55AM +0100, Jean-Philippe Brucker wrote:
> > +#define to_intel_svm_dev(handle) container_of(handle, struct 
> > intel_svm_dev, sva)
> > +struct iommu_sva *
> > +intel_svm_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
> > +{
> > +   struct iommu_sva *sva = ERR_PTR(-EINVAL);
> > +   struct intel_svm_dev *sdev = NULL;
> > +   int flags = 0;
> > +   int ret;
> > +
> > +   /*
> > +* TODO: Consolidate with generic iommu-sva bind after it is merged.
> > +* It will require shared SVM data structures, i.e. combine io_mm
> > +* and intel_svm etc.
> > +*/
> > +   if (drvdata)
> > +   flags = *(int *)drvdata;
> 
> drvdata is more for storing device driver contexts that can be passed to
> iommu_sva_ops, but I get that this is temporary.
> 
> As usual I'm dreading supervisor mode making it into the common API. What
> are your plans regarding SUPERVISOR_MODE and PRIVATE_PASID flags?  The
> previous discussion on the subject [1] had me hoping that you could
> replace supervisor mode with normal mappings (auxiliary domains?)
> I'm less worried about PRIVATE_PASID, it would just add complexity into

We don't seem to have an immediate need for PRIVATE_PASID. There are some talks
about potential usages, but nothing concrete. I think it might be good to
get rid of it now and add it back when we really need it.

For SUPERVISOR_MODE, the idea is to use an aux domain. Baolu is working on
something to replace it. Certainly exposing the entire kernel address space is
opening up the whole kimono, so we are looking at dynamically creating mappings
on demand. It might take away some of the benefits of SVA in general (no need
to create mappings), but for security somebody has to pay the price :-)

Cheers,
Ashok


> the API and iommu-sva implementation, but doesn't really have security
> implications.
> 
> [1] 
> https://lore.kernel.org/linux-iommu/20190228220449.ga12...@araj-mobl1.jf.intel.com/


[PATCH] iommu: Lower severity of add/remove device messages

2020-03-23 Thread Ezequiel Garcia
These user-visible messages are not really informational,
but mostly of a debug nature. Lower their severity.

Signed-off-by: Ezequiel Garcia 
---
 drivers/iommu/iommu.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3e3528436e0b..1ebd17033714 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -758,7 +758,7 @@ int iommu_group_add_device(struct iommu_group *group, 
struct device *dev)
 
trace_add_device_to_group(group->id, dev);
 
-   dev_info(dev, "Adding to iommu group %d\n", group->id);
+   dev_dbg(dev, "Adding to iommu group %d\n", group->id);
 
return 0;
 
@@ -792,7 +792,7 @@ void iommu_group_remove_device(struct device *dev)
struct iommu_group *group = dev->iommu_group;
struct group_device *tmp_device, *device = NULL;
 
-   dev_info(dev, "Removing from iommu group %d\n", group->id);
+   dev_dbg(dev, "Removing from iommu group %d\n", group->id);
 
/* Pre-notify listeners that a device is being removed. */
blocking_notifier_call_chain(&group->notifier,
@@ -2337,8 +2337,8 @@ request_default_domain_for_dev(struct device *dev, 
unsigned long type)
 
iommu_group_create_direct_mappings(group, dev);
 
-   dev_info(dev, "Using iommu %s mapping\n",
-type == IOMMU_DOMAIN_DMA ? "dma" : "direct");
+   dev_dbg(dev, "Using iommu %s mapping\n",
+   type == IOMMU_DOMAIN_DMA ? "dma" : "direct");
 
ret = 0;
 out:
-- 
2.26.0.rc2



Re: [PATCH V10 02/11] iommu/uapi: Define a mask for bind data

2020-03-23 Thread Jacob Pan
On Sun, 22 Mar 2020 09:29:32 +0800
Lu Baolu  wrote:

> On 2020/3/21 7:27, Jacob Pan wrote:
> > Memory type related flags can be grouped together for one simple
> > check.
> > 
> > ---
> > v9 renamed from EMT to MTS since these are memory type support
> > flags. ---
> > 
> > Signed-off-by: Jacob Pan 
> > ---
> >   include/uapi/linux/iommu.h | 5 -
> >   1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> > index 4ad3496e5c43..d7bcbc5f79b0 100644
> > --- a/include/uapi/linux/iommu.h
> > +++ b/include/uapi/linux/iommu.h
> > @@ -284,7 +284,10 @@ struct iommu_gpasid_bind_data_vtd {
> > __u32 pat;
> > __u32 emt;
> >   };
> > -
> > +#define IOMMU_SVA_VTD_GPASID_MTS_MASK  (IOMMU_SVA_VTD_GPASID_CD |  \
> > +                                        IOMMU_SVA_VTD_GPASID_EMTE | \
> > +                                        IOMMU_SVA_VTD_GPASID_PCD |  \
> > +                                        IOMMU_SVA_VTD_GPASID_PWT)
> 
> As the name implies, can this move to intel-iommu.h?
> 
I also thought about this, but the masks are in the vendor-specific part of
the UAPI.

> Best regards,
> baolu
> 
> >   /**
> >* struct iommu_gpasid_bind_data - Information about device and
> > guest PASID binding
> >* @version:  Version of this data structure
> >   

[Jacob Pan]


RE: [EXT] Re: [PATCH RFC] iommu/virtio: Use page size bitmap supported by endpoint

2020-03-23 Thread Bharat Bhushan
Hi Jean,

> -----Original Message-----
> From: Jean-Philippe Brucker 
> Sent: Monday, March 23, 2020 3:30 PM
> To: Bharat Bhushan 
> Cc: j...@8bytes.org; m...@redhat.com; jasow...@redhat.com;
> virtualizat...@lists.linux-foundation.org; iommu@lists.linux-foundation.org;
> eric.au...@redhat.com
> Subject: [EXT] Re: [PATCH RFC] iommu/virtio: Use page size bitmap supported by
> endpoint
> 
> Hi Bharat,
> 
> Please add the IOMMU list on your next posting
> 
> On Mon, Mar 23, 2020 at 02:11:08PM +0530, Bharat Bhushan wrote:
> > Different endpoints can support different page sizes; probe the endpoint to
> > check whether it supports a specific page size, otherwise use the global page sizes.
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> >  drivers/iommu/virtio-iommu.c  | 24 
> >  include/uapi/linux/virtio_iommu.h |  6 ++
> >  2 files changed, 26 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/iommu/virtio-iommu.c
> > b/drivers/iommu/virtio-iommu.c index cce329d71fba..e69347ca4ee6 100644
> > --- a/drivers/iommu/virtio-iommu.c
> > +++ b/drivers/iommu/virtio-iommu.c
> > @@ -78,6 +78,7 @@ struct viommu_endpoint {
> > struct viommu_dev   *viommu;
> > struct viommu_domain*vdomain;
> > struct list_headresv_regions;
> > +   u64 pgsize_bitmap;
> >  };
> >
> >  struct viommu_request {
> > @@ -415,6 +416,14 @@ static int viommu_replay_mappings(struct
> viommu_domain *vdomain)
> > return ret;
> >  }
> >
> > +static int viommu_set_pgsize_bitmap(struct viommu_endpoint *vdev,
> > +   struct virtio_iommu_probe_pgsize_mask *mask)
> > +
> > +{
> > +   vdev->pgsize_bitmap = mask->pgsize_bitmap;
> 
> We need to read this through le64_to_cpu(). Also check that the length of the 
> field
> provided by the device is >= sizeof(mask) (like
> viommu_add_resv_mem() does)
> 
> > +   return 0;
> > +}
> > +
> >  static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
> >struct virtio_iommu_probe_resv_mem *mem,
> >size_t len)
> > @@ -494,11 +503,13 @@ static int viommu_probe_endpoint(struct viommu_dev
> *viommu, struct device *dev)
> > while (type != VIRTIO_IOMMU_PROBE_T_NONE &&
> >cur < viommu->probe_size) {
> > len = le16_to_cpu(prop->length) + sizeof(*prop);
> > -
> > switch (type) {
> > case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
> > ret = viommu_add_resv_mem(vdev, (void *)prop, len);
> > break;
> > +   case VIRTIO_IOMMU_PROBE_T_PAGE_SIZE_MASK:
> > +   ret = viommu_set_pgsize_bitmap(vdev, (void *)prop);
> > +   break;
> > default:
> > dev_err(dev, "unknown viommu prop 0x%x\n", type);
> > }
> > @@ -607,16 +618,21 @@ static struct iommu_domain
> *viommu_domain_alloc(unsigned type)
> > return &vdomain->domain;
> >  }
> >
> > -static int viommu_domain_finalise(struct viommu_dev *viommu,
> > +static int viommu_domain_finalise(struct viommu_endpoint *vdev,
> >   struct iommu_domain *domain)
> >  {
> > int ret;
> > struct viommu_domain *vdomain = to_viommu_domain(domain);
> > +   struct viommu_dev *viommu = vdev->viommu;
> >
> > vdomain->viommu = viommu;
> > vdomain->map_flags  = viommu->map_flags;
> >
> > -   domain->pgsize_bitmap   = viommu->pgsize_bitmap;
> > +   if (vdev->pgsize_bitmap)
> > +   domain->pgsize_bitmap = vdev->pgsize_bitmap;
> > +   else
> > +   domain->pgsize_bitmap   = viommu->pgsize_bitmap;
> > +
> 
> nit: it could be nicer to initialize vdev->pgsize_bitmap in add_device(),

To what size should we initialize it in add_device(), PAGE_SIZE?

Thanks
-Bharat

> override it
> in probe_endpoint(), and just copy it here.
> 
> > domain->geometry= viommu->geometry;
> >
> > ret = ida_alloc_range(&viommu->domain_ids, viommu->first_domain,
> > @@ -657,7 +673,7 @@ static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
> >  * Properly initialize the domain now that we know which viommu
> >  * owns it.
> >  */
> > -   ret = viommu_domain_finalise(vdev->viommu, domain);
> > +   ret = viommu_domain_finalise(vdev, domain);
> 
> Attaching additional endpoints with different masks to the domain should 
> return
> an error
> 
> > } else if (vdomain->viommu != vdev->viommu) {
> > dev_err(dev, "cannot attach to foreign vIOMMU\n");
> > ret = -EXDEV;
> > diff --git a/include/uapi/linux/virtio_iommu.h
> > b/include/uapi/linux/virtio_iommu.h
> > index 237e36a280cb..aff3db0ef54b 100644
> > --- a/include/uapi/linux/virtio_iommu.h
> > +++ b/include/uapi/linux/virtio_iommu.h
> > @@ -111,6 +111,7 @@ struct virtio_iommu_req_unmap {
> >
> >  

RE: [EXT] Re: [PATCH RFC] iommu/virtio: Use page size bitmap supported by endpoint

2020-03-23 Thread Bharat Bhushan
Hi Jean,

> -----Original Message-----
> From: Jean-Philippe Brucker 
> Sent: Monday, March 23, 2020 3:30 PM
> To: Bharat Bhushan 
> Cc: j...@8bytes.org; m...@redhat.com; jasow...@redhat.com;
> virtualizat...@lists.linux-foundation.org; iommu@lists.linux-foundation.org;
> eric.au...@redhat.com
> Subject: [EXT] Re: [PATCH RFC] iommu/virtio: Use page size bitmap supported by
> endpoint
> 
> Hi Bharat,
> 
> Please add the IOMMU list on your next posting

iommu@lists.linux-foundation.org is there; is there any other mailing list we need to
add?

> 
> On Mon, Mar 23, 2020 at 02:11:08PM +0530, Bharat Bhushan wrote:
> > Different endpoints can support different page sizes; probe the endpoint to
> > check whether it supports a specific page size, otherwise use the global page sizes.
> >
> > Signed-off-by: Bharat Bhushan 
> > ---
> >  drivers/iommu/virtio-iommu.c  | 24 
> >  include/uapi/linux/virtio_iommu.h |  6 ++
> >  2 files changed, 26 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/iommu/virtio-iommu.c
> > b/drivers/iommu/virtio-iommu.c index cce329d71fba..e69347ca4ee6 100644
> > --- a/drivers/iommu/virtio-iommu.c
> > +++ b/drivers/iommu/virtio-iommu.c
> > @@ -78,6 +78,7 @@ struct viommu_endpoint {
> > struct viommu_dev   *viommu;
> > struct viommu_domain*vdomain;
> > struct list_headresv_regions;
> > +   u64 pgsize_bitmap;
> >  };
> >
> >  struct viommu_request {
> > @@ -415,6 +416,14 @@ static int viommu_replay_mappings(struct
> viommu_domain *vdomain)
> > return ret;
> >  }
> >
> > +static int viommu_set_pgsize_bitmap(struct viommu_endpoint *vdev,
> > +   struct virtio_iommu_probe_pgsize_mask *mask)
> > +
> > +{
> > +   vdev->pgsize_bitmap = mask->pgsize_bitmap;
> 
> We need to read this through le64_to_cpu(). Also check that the length of the 
> field
> provided by the device is >= sizeof(mask) (like
> viommu_add_resv_mem() does)

Will take care of all the comments in the next version.

Thanks,
-Bharat

> 
> > +   return 0;
> > +}
> > +
> >  static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
> >struct virtio_iommu_probe_resv_mem *mem,
> >size_t len)
> > @@ -494,11 +503,13 @@ static int viommu_probe_endpoint(struct viommu_dev
> *viommu, struct device *dev)
> > while (type != VIRTIO_IOMMU_PROBE_T_NONE &&
> >cur < viommu->probe_size) {
> > len = le16_to_cpu(prop->length) + sizeof(*prop);
> > -
> > switch (type) {
> > case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
> > ret = viommu_add_resv_mem(vdev, (void *)prop, len);
> > break;
> > +   case VIRTIO_IOMMU_PROBE_T_PAGE_SIZE_MASK:
> > +   ret = viommu_set_pgsize_bitmap(vdev, (void *)prop);
> > +   break;
> > default:
> > dev_err(dev, "unknown viommu prop 0x%x\n", type);
> > }
> > @@ -607,16 +618,21 @@ static struct iommu_domain
> *viommu_domain_alloc(unsigned type)
> > return &vdomain->domain;
> >  }
> >
> > -static int viommu_domain_finalise(struct viommu_dev *viommu,
> > +static int viommu_domain_finalise(struct viommu_endpoint *vdev,
> >   struct iommu_domain *domain)
> >  {
> > int ret;
> > struct viommu_domain *vdomain = to_viommu_domain(domain);
> > +   struct viommu_dev *viommu = vdev->viommu;
> >
> > vdomain->viommu = viommu;
> > vdomain->map_flags  = viommu->map_flags;
> >
> > -   domain->pgsize_bitmap   = viommu->pgsize_bitmap;
> > +   if (vdev->pgsize_bitmap)
> > +   domain->pgsize_bitmap = vdev->pgsize_bitmap;
> > +   else
> > +   domain->pgsize_bitmap   = viommu->pgsize_bitmap;
> > +
> 
> nit: it could be nicer to initialize vdev->pgsize_bitmap in add_device(), 
> override it
> in probe_endpoint(), and just copy it here.
> 
> > domain->geometry= viommu->geometry;
> >
> > ret = ida_alloc_range(&viommu->domain_ids, viommu->first_domain,
> > @@ -657,7 +673,7 @@ static int viommu_attach_dev(struct iommu_domain *domain, struct device *dev)
> >  * Properly initialize the domain now that we know which viommu
> >  * owns it.
> >  */
> > -   ret = viommu_domain_finalise(vdev->viommu, domain);
> > +   ret = viommu_domain_finalise(vdev, domain);
> 
> Attaching additional endpoints with different masks to the domain should 
> return
> an error
> 
> > } else if (vdomain->viommu != vdev->viommu) {
> > dev_err(dev, "cannot attach to foreign vIOMMU\n");
> > ret = -EXDEV;
> > diff --git a/include/uapi/linux/virtio_iommu.h
> > b/include/uapi/linux/virtio_iommu.h
> > index 237e36a280cb..aff3db0ef54b 100644
> > --- a/include/uapi/linux/virtio_iommu.h
> > +++ 

Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device

2020-03-23 Thread Christoph Hellwig
On Mon, Mar 23, 2020 at 09:07:38PM +0530, Aneesh Kumar K.V wrote:
> 
> This is what I was trying, but considering I am new to the DMA subsystem, I
> am not sure I got all the details correct. The idea is to look at the
> CPU address and see if it can be used in a direct-map fashion (is
> bus_dma_limit the right restriction here?); if not, fall back to dynamic
> IOMMU mapping.

I don't think we can throw all these complications into the dma
mapping code.  At some point I also wonder what the point is,
especially for scatterlist mappings, where the iommu can coalesce.


Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device

2020-03-23 Thread Christoph Hellwig
On Mon, Mar 23, 2020 at 07:58:01PM +1100, Alexey Kardashevskiy wrote:
> >> 0x100.. .. 0x101..
> >>
> >> 2x4G, each is 1TB aligned. And we can map directly only the first 4GB
> >> (because of the maximum IOMMU table size) but not the other. And 1:1 on
> >> that "pseries" is done with offset=0x0800....
> >>
> >> So we want to check every bus address against dev->bus_dma_limit, not
> >> dev->coherent_dma_mask. In the example above I'd set bus_dma_limit to
> >> 0x0800.0001.. and 1:1 mapping for the second 4GB would not be
> >> tried. Does this sound reasonable? Thanks,
> > 
> > bus_dma_limit is just another limiting factor applied on top of
> > coherent_dma_mask or dma_mask respectively.
> 
> This is not enough for the task: in my example, I'd set bus limit to
> 0x0800.0001.. but this would disable bypass for all RAM
> addresses - the first and the second 4GB blocks.

So what about something like the version here:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-bypass.3


Re: [PATCH v3 10/15] iommu/arm-smmu: Use accessor functions for iommu private data

2020-03-23 Thread Robin Murphy

Hi Joerg,

Thanks for tackling this!

On 2020-03-20 9:14 am, Joerg Roedel wrote:

From: Joerg Roedel 

Make use of dev_iommu_priv_set/get() functions and simplify the code
where possible with this change.

Tested-by: Will Deacon  # arm-smmu
Signed-off-by: Joerg Roedel 
---
  drivers/iommu/arm-smmu.c | 57 +---
  1 file changed, 30 insertions(+), 27 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 980aae73b45b..7aa36e6c19c0 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -98,12 +98,15 @@ struct arm_smmu_master_cfg {
s16 smendx[];
  };
  #define INVALID_SMENDX-1
-#define __fwspec_cfg(fw) ((struct arm_smmu_master_cfg *)fw->iommu_priv)
-#define fwspec_smmu(fw)  (__fwspec_cfg(fw)->smmu)
-#define fwspec_smendx(fw, i) \
-   (i >= fw->num_ids ? INVALID_SMENDX : __fwspec_cfg(fw)->smendx[i])
-#define for_each_cfg_sme(fw, i, idx) \
-   for (i = 0; idx = fwspec_smendx(fw, i), i < fw->num_ids; ++i)
+#define __fwspec_cfg(dev) ((struct arm_smmu_master_cfg 
*)dev_iommu_priv_get(dev))
+#define fwspec_smmu(dev)  (__fwspec_cfg(dev)->smmu)
+#define fwspec_smendx(dev, i)  \
+   (i >= dev_iommu_fwspec_get(dev)->num_ids ?\
+   INVALID_SMENDX :\
+   __fwspec_cfg(dev)->smendx[i])
+#define for_each_cfg_sme(dev, i, idx) \
+   for (i = 0; idx = fwspec_smendx(dev, i), \
+i < dev_iommu_fwspec_get(dev)->num_ids; ++i)


Yikes, this ends up pretty ugly, and I'd prefer not to have this much
complexity hidden in macros that were intended just to be convenient
shorthand. Would you mind pulling in the patch below as a precursor?


Other than that, the rest of the series looks OK at a glance. We should 
also move fwspec->ops to dev_iommu, as those are "IOMMU API" data rather 
than "firmware" data, but let's consider that separately as this series 
is already long enough.


Thanks,
Robin.

->8-
Subject: [PATCH] iommu/arm-smmu: Refactor master_cfg/fwspec usage

In preparation for restructuring iommu_fwspec, refactor the way we
access the arm_smmu_master_cfg private data to be less dependent on
the current layout.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/arm-smmu.c | 42 +---
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 16c4b87af42b..b4978f45a7f2 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -98,12 +98,10 @@ struct arm_smmu_master_cfg {
s16 smendx[];
 };
 #define INVALID_SMENDX -1
-#define __fwspec_cfg(fw) ((struct arm_smmu_master_cfg *)fw->iommu_priv)
-#define fwspec_smmu(fw)  (__fwspec_cfg(fw)->smmu)
-#define fwspec_smendx(fw, i) \
-   (i >= fw->num_ids ? INVALID_SMENDX : __fwspec_cfg(fw)->smendx[i])
-#define for_each_cfg_sme(fw, i, idx) \
-   for (i = 0; idx = fwspec_smendx(fw, i), i < fw->num_ids; ++i)
+#define cfg_smendx(cfg, fw, i) \
+   (i >= fw->num_ids ? INVALID_SMENDX : cfg->smendx[i])
+#define for_each_cfg_sme(cfg, fw, i, idx) \
+   for (i = 0; idx = cfg_smendx(cfg, fw, i), i < fw->num_ids; ++i)

 static bool using_legacy_binding, using_generic_binding;

@@ -1069,7 +1067,7 @@ static int arm_smmu_master_alloc_smes(struct 
device *dev)


mutex_lock(&smmu->stream_map_mutex);
/* Figure out a viable stream map entry allocation */
-   for_each_cfg_sme(fwspec, i, idx) {
+   for_each_cfg_sme(cfg, fwspec, i, idx) {
u16 sid = FIELD_GET(ARM_SMMU_SMR_ID, fwspec->ids[i]);
u16 mask = FIELD_GET(ARM_SMMU_SMR_MASK, fwspec->ids[i]);

@@ -1100,7 +1098,7 @@ static int arm_smmu_master_alloc_smes(struct 
device *dev)

iommu_group_put(group);

/* It worked! Now, poke the actual hardware */
-   for_each_cfg_sme(fwspec, i, idx) {
+   for_each_cfg_sme(cfg, fwspec, i, idx) {
arm_smmu_write_sme(smmu, idx);
smmu->s2crs[idx].group = group;
}
@@ -1117,14 +1115,14 @@ static int arm_smmu_master_alloc_smes(struct 
device *dev)

return ret;
 }

-static void arm_smmu_master_free_smes(struct iommu_fwspec *fwspec)
+static void arm_smmu_master_free_smes(struct arm_smmu_master_cfg *cfg,
+ struct iommu_fwspec *fwspec)
 {
-   struct arm_smmu_device *smmu = fwspec_smmu(fwspec);
-   struct arm_smmu_master_cfg *cfg = fwspec->iommu_priv;
+   struct arm_smmu_device *smmu = cfg->smmu;
int i, idx;

mutex_lock(&smmu->stream_map_mutex);
-   for_each_cfg_sme(fwspec, i, idx) {
+   for_each_cfg_sme(cfg, fwspec, i, idx) {
if (arm_smmu_free_sme(smmu, idx))
arm_smmu_write_sme(smmu, idx);
cfg->smendx[i] = INVALID_SMENDX;
@@ -1133,6 +1131,7 @@ static void 

Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device

2020-03-23 Thread Aneesh Kumar K.V
Christoph Hellwig  writes:

> On Mon, Mar 23, 2020 at 09:37:05AM +0100, Christoph Hellwig wrote:
>> > > +/*
>> > > + * Allows IOMMU drivers to bypass dynamic translations if the 
>> > > DMA mask
>> > > + * is large enough.
>> > > + */
>> > > +if (dev->dma_ops_bypass) {
>> > > +if (min_not_zero(dev->coherent_dma_mask, 
>> > > dev->bus_dma_limit) >=
>> > > +dma_direct_get_required_mask(dev))
>> > > +return true;
>> > > +}
>> > 
>> > 
>> > Why not do this in dma_map_direct() as well?
>> 
>> Mostly because it is a relatively expensive operation, including a
>> fls64.
>
> Which I guess isn't too bad compared to a dynamic IOMMU mapping.  Can
> you just send a draft patch for what you'd like to see for ppc?

This is what I was trying, but considering I am new to the DMA subsystem, I
am not sure I got all the details correct. The idea is to look at the
CPU address and see if it can be used in a direct-map fashion (is
bus_dma_limit the right restriction here?); if not, fall back to dynamic
IOMMU mapping.

diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index e486d1d78de2..bc7e6a8b2caa 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -31,6 +31,87 @@ static inline bool dma_iommu_map_bypass(struct device *dev,
(!iommu_fixed_is_weak || (attrs & DMA_ATTR_WEAK_ORDERING));
 }
 
+static inline bool __dma_direct_map_capable(struct device *dev, struct page 
*page,
+   unsigned long offset, size_t size)
+{
+   phys_addr_t phys = page_to_phys(page) + offset;
+   dma_addr_t dma_addr = phys_to_dma(dev, phys);
+   dma_addr_t end = dma_addr + size - 1;
+
+   return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_limit);
+}
+
+static inline bool dma_direct_map_capable(struct device *dev, struct page 
*page,
+ unsigned long offset, size_t size,
+ unsigned long attrs)
+{
+   if (!dma_iommu_map_bypass(dev, attrs))
+   return false;
+
+   if (!dev->dma_mask)
+   return false;
+
+   return __dma_direct_map_capable(dev, page, offset, size);
+}
+
+
+static inline bool dma_direct_unmap_capable(struct device *dev, dma_addr_t 
addr, size_t size,
+   unsigned long attrs)
+{
+   dma_addr_t end = addr + size - 1;
+
+   if (!dma_iommu_map_bypass(dev, attrs))
+   return false;
+
+   if (!dev->dma_mask)
+   return false;
+
+   return end <= min_not_zero(*dev->dma_mask, dev->bus_dma_limit);
+}
+
+static inline bool dma_direct_sg_map_capable(struct device *dev, struct 
scatterlist *sglist,
+int nelems, unsigned long attrs)
+{
+   int i;
+   struct scatterlist *sg;
+
+   if (!dma_iommu_map_bypass(dev, attrs))
+   return false;
+
+   if (!dev->dma_mask)
+   return false;
+
+   for_each_sg(sglist, sg, nelems, i) {
+   if (!__dma_direct_map_capable(dev, sg_page(sg),
+ sg->offset, sg->length))
+   return false;
+   }
+   return true;
+}
+
+static inline bool dma_direct_sg_unmap_capable(struct device *dev, struct 
scatterlist *sglist,
+  int nelems, unsigned long attrs)
+{
+   int i;
+   dma_addr_t end;
+   struct scatterlist *sg;
+
+   if (!dma_iommu_map_bypass(dev, attrs))
+   return false;
+
+   if (!dev->dma_mask)
+   return false;
+
+   for_each_sg(sglist, sg, nelems, i) {
+   end = sg->dma_address + sg_dma_len(sg);
+
+   if (end > min_not_zero(*dev->dma_mask, dev->bus_dma_limit))
+   return false;
+   }
+   return true;
+}
+
+
 /* Allocates a contiguous real buffer and creates mappings over it.
  * Returns the virtual address of the buffer and sets dma_handle
  * to the dma address (mapping) of the first page.
@@ -67,7 +148,7 @@ static dma_addr_t dma_iommu_map_page(struct device *dev, 
struct page *page,
 enum dma_data_direction direction,
 unsigned long attrs)
 {
-   if (dma_iommu_map_bypass(dev, attrs))
+   if (dma_direct_map_capable(dev, page, offset, size, attrs))
return dma_direct_map_page(dev, page, offset, size, direction,
attrs);
return iommu_map_page(dev, get_iommu_table_base(dev), page, offset,
@@ -79,7 +160,7 @@ static void dma_iommu_unmap_page(struct device *dev, 
dma_addr_t dma_handle,
 size_t size, enum dma_data_direction direction,
 unsigned long attrs)
 {
-   if 

Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device

2020-03-23 Thread Christoph Hellwig
On Mon, Mar 23, 2020 at 12:14:08PM +, Robin Murphy wrote:
> On 2020-03-20 2:16 pm, Christoph Hellwig wrote:
>> Several IOMMU drivers have a bypass mode where they can use a direct
>> mapping if the devices DMA mask is large enough.  Add generic support
>> to the core dma-mapping code to do that to switch those drivers to
>> a common solution.
>
> Hmm, this is _almost_, but not quite the same as the case where drivers are 
> managing their own IOMMU mappings, but still need to use streaming DMA for 
> cache maintenance on the underlying pages.

In that case they should simply not call the DMA API at all.  We'll just
need versions of the cache maintenance APIs that tie in with the raw
IOMMU API.

> For that we need the ops bypass 
> to be a "true" bypass and also avoid SWIOTLB regardless of the device's DMA 
> mask. That behaviour should in fact be fine for the opportunistic bypass 
> case here as well, since the mask being "big enough" implies by definition 
> that this should never need to bounce either.

In practice it does.  But that means adding yet another code path
besides the simple direct call to dma_direct_* and the call into the DMA
ops, which I'd rather avoid.


Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device

2020-03-23 Thread Robin Murphy

On 2020-03-20 2:16 pm, Christoph Hellwig wrote:

Several IOMMU drivers have a bypass mode where they can use a direct
mapping if the devices DMA mask is large enough.  Add generic support
to the core dma-mapping code to do that to switch those drivers to
a common solution.


Hmm, this is _almost_, but not quite the same as the case where drivers 
are managing their own IOMMU mappings, but still need to use streaming 
DMA for cache maintenance on the underlying pages. For that we need the 
ops bypass to be a "true" bypass and also avoid SWIOTLB regardless of 
the device's DMA mask. That behaviour should in fact be fine for the 
opportunistic bypass case here as well, since the mask being "big 
enough" implies by definition that this should never need to bounce either.


For the (hopefully less common) third case where, due to groups or user 
overrides, we end up giving an identity DMA domain to a device with 
limited DMA masks which _does_ need SWIOTLB, I'd like to think we can 
solve that by not giving the device IOMMU DMA ops in the first place, 
such that it never needs to engage the bypass mechanism at all.


Thoughts?

Robin.


Signed-off-by: Christoph Hellwig 
---
  include/linux/device.h  |  6 ++
  include/linux/dma-mapping.h | 30 ++
  kernel/dma/mapping.c| 36 +++-
  3 files changed, 51 insertions(+), 21 deletions(-)

diff --git a/include/linux/device.h b/include/linux/device.h
index 0cd7c647c16c..09be8bb2c4a6 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -525,6 +525,11 @@ struct dev_links_info {
   *  sync_state() callback.
   * @dma_coherent: this particular device is dma coherent, even if the
   *architecture supports non-coherent devices.
+ * @dma_ops_bypass: If set to %true then the dma_ops are bypassed for the
+ * streaming DMA operations (->map_* / ->unmap_* / ->sync_*),
+ * and optionally (if the coherent mask is large enough) also
+ * for dma allocations.  This flag is managed by the dma ops
+ * instance from ->dma_supported.
   *
   * At the lowest level, every device in a Linux system is represented by an
   * instance of struct device. The device structure contains the information
@@ -625,6 +630,7 @@ struct device {
  defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
bool dma_coherent:1;
  #endif
+   bool dma_ops_bypass : 1;
  };
  
  static inline struct device *kobj_to_dev(struct kobject *kobj)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 330ad58fbf4d..c3af0cf5e435 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -188,9 +188,15 @@ static inline int dma_mmap_from_global_coherent(struct 
vm_area_struct *vma,
  }
  #endif /* CONFIG_DMA_DECLARE_COHERENT */
  
-static inline bool dma_is_direct(const struct dma_map_ops *ops)

+/*
+ * Check if the device uses a direct mapping for streaming DMA operations.
+ * This allows IOMMU drivers to set a bypass mode if the DMA mask is large
+ * enough.
+ */
+static inline bool dma_map_direct(struct device *dev,
+   const struct dma_map_ops *ops)
  {
-   return likely(!ops);
+   return likely(!ops) || dev->dma_ops_bypass;
  }
  
  /*

@@ -279,7 +285,7 @@ static inline dma_addr_t dma_map_page_attrs(struct device 
*dev,
dma_addr_t addr;
  
  	BUG_ON(!valid_dma_direction(dir));

-   if (dma_is_direct(ops))
+   if (dma_map_direct(dev, ops))
addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
@@ -294,7 +300,7 @@ static inline void dma_unmap_page_attrs(struct device *dev, 
dma_addr_t addr,
const struct dma_map_ops *ops = get_dma_ops(dev);
  
  	BUG_ON(!valid_dma_direction(dir));

-   if (dma_is_direct(ops))
+   if (dma_map_direct(dev, ops))
dma_direct_unmap_page(dev, addr, size, dir, attrs);
else if (ops->unmap_page)
ops->unmap_page(dev, addr, size, dir, attrs);
@@ -313,7 +319,7 @@ static inline int dma_map_sg_attrs(struct device *dev, 
struct scatterlist *sg,
int ents;
  
  	BUG_ON(!valid_dma_direction(dir));

-   if (dma_is_direct(ops))
+   if (dma_map_direct(dev, ops))
ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
else
ents = ops->map_sg(dev, sg, nents, dir, attrs);
@@ -331,7 +337,7 @@ static inline void dma_unmap_sg_attrs(struct device *dev, 
struct scatterlist *sg
  
  	BUG_ON(!valid_dma_direction(dir));

debug_dma_unmap_sg(dev, sg, nents, dir);
-   if (dma_is_direct(ops))
+   if (dma_map_direct(dev, ops))
dma_direct_unmap_sg(dev, sg, nents, dir, attrs);
else if (ops->unmap_sg)
ops->unmap_sg(dev, sg, nents, dir, attrs);
@@ -352,7 +358,7 @@ static inline 

Re: [PATCH RFC] iommu/virtio: Use page size bitmap supported by endpoint

2020-03-23 Thread Jean-Philippe Brucker
Hi Bharat,

Please add the IOMMU list on your next posting

On Mon, Mar 23, 2020 at 02:11:08PM +0530, Bharat Bhushan wrote:
> Different endpoints can support different page sizes; probe the
> endpoint to check whether it supports a specific page size, otherwise
> use the global page sizes.
> 
> Signed-off-by: Bharat Bhushan 
> ---
>  drivers/iommu/virtio-iommu.c  | 24 
>  include/uapi/linux/virtio_iommu.h |  6 ++
>  2 files changed, 26 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index cce329d71fba..e69347ca4ee6 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -78,6 +78,7 @@ struct viommu_endpoint {
>   struct viommu_dev   *viommu;
>   struct viommu_domain*vdomain;
>   struct list_headresv_regions;
> + u64 pgsize_bitmap;
>  };
>  
>  struct viommu_request {
> @@ -415,6 +416,14 @@ static int viommu_replay_mappings(struct viommu_domain 
> *vdomain)
>   return ret;
>  }
>  
> +static int viommu_set_pgsize_bitmap(struct viommu_endpoint *vdev,
> + struct virtio_iommu_probe_pgsize_mask *mask)
> +
> +{
> + vdev->pgsize_bitmap = mask->pgsize_bitmap;

We need to read this through le64_to_cpu(). Also check that the length of
the field provided by the device is >= sizeof(mask) (like
viommu_add_resv_mem() does)
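
Something along these lines, with the property length passed down from the
probe loop (just a sketch of the two checks, not the final patch):

static int viommu_set_pgsize_bitmap(struct viommu_endpoint *vdev,
                                    struct virtio_iommu_probe_pgsize_mask *mask,
                                    size_t len)
{
        /* Reject a truncated property before reading the payload. */
        if (len < sizeof(*mask))
                return -EINVAL;

        /* Device-supplied fields are little-endian. */
        vdev->pgsize_bitmap = le64_to_cpu(mask->pgsize_bitmap);
        return 0;
}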

> + return 0;
> +}
> +
>  static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
>  struct virtio_iommu_probe_resv_mem *mem,
>  size_t len)
> @@ -494,11 +503,13 @@ static int viommu_probe_endpoint(struct viommu_dev 
> *viommu, struct device *dev)
>   while (type != VIRTIO_IOMMU_PROBE_T_NONE &&
>  cur < viommu->probe_size) {
>   len = le16_to_cpu(prop->length) + sizeof(*prop);
> -
>   switch (type) {
>   case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
>   ret = viommu_add_resv_mem(vdev, (void *)prop, len);
>   break;
> + case VIRTIO_IOMMU_PROBE_T_PAGE_SIZE_MASK:
> + ret = viommu_set_pgsize_bitmap(vdev, (void *)prop);
> + break;
>   default:
>   dev_err(dev, "unknown viommu prop 0x%x\n", type);
>   }
> @@ -607,16 +618,21 @@ static struct iommu_domain 
> *viommu_domain_alloc(unsigned type)
>   return &vdomain->domain;
>  }
>  
> -static int viommu_domain_finalise(struct viommu_dev *viommu,
> +static int viommu_domain_finalise(struct viommu_endpoint *vdev,
> struct iommu_domain *domain)
>  {
>   int ret;
>   struct viommu_domain *vdomain = to_viommu_domain(domain);
> + struct viommu_dev *viommu = vdev->viommu;
>  
>   vdomain->viommu = viommu;
>   vdomain->map_flags  = viommu->map_flags;
>  
> - domain->pgsize_bitmap   = viommu->pgsize_bitmap;
> + if (vdev->pgsize_bitmap)
> + domain->pgsize_bitmap = vdev->pgsize_bitmap;
> + else
> + domain->pgsize_bitmap   = viommu->pgsize_bitmap;
> +

nit: it could be nicer to initialize vdev->pgsize_bitmap in add_device(),
override it in probe_endpoint(), and just copy it here.
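
i.e. roughly the following, leaving the exact hook points up to you (sketch
only):

        /* In viommu_add_device(), before sending the probe request:
         * start from the global page size mask. */
        vdev->pgsize_bitmap = viommu->pgsize_bitmap;

        /* viommu_probe_endpoint() overrides vdev->pgsize_bitmap when a
         * VIRTIO_IOMMU_PROBE_T_PAGE_SIZE_MASK property is present, so
         * viommu_domain_finalise() can copy it unconditionally: */
        domain->pgsize_bitmap = vdev->pgsize_bitmap;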

>   domain->geometry= viommu->geometry;
>  
>   ret = ida_alloc_range(&viommu->domain_ids, viommu->first_domain,
> @@ -657,7 +673,7 @@ static int viommu_attach_dev(struct iommu_domain *domain, 
> struct device *dev)
>* Properly initialize the domain now that we know which viommu
>* owns it.
>*/
> - ret = viommu_domain_finalise(vdev->viommu, domain);
> + ret = viommu_domain_finalise(vdev, domain);

Attaching additional endpoints with different masks to the domain should
return an error

>   } else if (vdomain->viommu != vdev->viommu) {
>   dev_err(dev, "cannot attach to foreign vIOMMU\n");
>   ret = -EXDEV;
> diff --git a/include/uapi/linux/virtio_iommu.h 
> b/include/uapi/linux/virtio_iommu.h
> index 237e36a280cb..aff3db0ef54b 100644
> --- a/include/uapi/linux/virtio_iommu.h
> +++ b/include/uapi/linux/virtio_iommu.h
> @@ -111,6 +111,7 @@ struct virtio_iommu_req_unmap {
>  
>  #define VIRTIO_IOMMU_PROBE_T_NONE0
>  #define VIRTIO_IOMMU_PROBE_T_RESV_MEM1
> +#define VIRTIO_IOMMU_PROBE_T_PAGE_SIZE_MASK  2
>  
>  #define VIRTIO_IOMMU_PROBE_T_MASK0xfff
>  
> @@ -119,6 +120,11 @@ struct virtio_iommu_probe_property {
>   __le16  length;
>  };
>  
> +struct virtio_iommu_probe_pgsize_mask {
> + struct virtio_iommu_probe_property  head;

Compilers will introduce 4 bytes of padding here, to align the next field.
We need to make them explicit by adding a 4-byte 'reserved' field.

> + uint64_t  pgsize_bitmap;

__le64
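
Putting these two comments together, the property could end up looking
something like this (a sketch; the 'reserved' field name is illustrative):

struct virtio_iommu_probe_pgsize_mask {
        struct virtio_iommu_probe_property      head;
        __u8                                    reserved[4];
        __le64                                  pgsize_bitmap;
};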

Thanks,
Jean

> +};

Re: [PATCH 0/3] Request direct mapping for modem firmware subdevice

2020-03-23 Thread Sai Prakash Ranjan

Hi Robin,

On 2020-03-12 17:35, Robin Murphy wrote:

On 2020-03-12 6:28 am, Sai Prakash Ranjan wrote:

Hi Robin,


Are you talking about this one for SoC specific change - 
https://lore.kernel.org/patchwork/patch/1183530/


Exactly - this particular wheel needs no reinventing at all.

[ I guess I should go review those patches properly... :) ]



It would be great if you could review the patch - 
https://lore.kernel.org/patchwork/patch/1183530/

Sibi has posted a v2 of this series based on that patch.

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


Re: arm-smmu-v3 high cpu usage for NVMe

2020-03-23 Thread Marc Zyngier

On 2020-03-23 09:03, John Garry wrote:

On 20/03/2020 16:33, Marc Zyngier wrote:

JFYI, I've been playing with "perf annotate" today and it's giving
strange results for my NVMe testing. So "report" looks somewhat sane,
if not a worryingly high % for arm_smmu_cmdq_issue_cmdlist():


    55.39%  irq/342-nvme0q1  [kernel.kallsyms]  [k] 
arm_smmu_cmdq_issue_cmdlist
 9.74%  irq/342-nvme0q1  [kernel.kallsyms]  [k] 
_raw_spin_unlock_irqrestore

 2.02%  irq/342-nvme0q1  [kernel.kallsyms]  [k] nvme_irq
 1.86%  irq/342-nvme0q1  [kernel.kallsyms]  [k] fput_many
 1.73%  irq/342-nvme0q1  [kernel.kallsyms]  [k]
arm_smmu_atc_inv_domain.constprop.42
 1.67%  irq/342-nvme0q1  [kernel.kallsyms]  [k] __arm_lpae_unmap
 1.49%  irq/342-nvme0q1  [kernel.kallsyms]  [k] aio_complete_rw

But "annotate" consistently tells me that a specific instruction
consumes ~99% of the load for the enqueue function:

 :  /* 5. If we are inserting a CMD_SYNC,
we must wait for it to complete */
 :  if (sync) {
    0.00 :   80001071c948:   ldr w0, [x29, #108]
 :  int ret = 0;
    0.00 :   80001071c94c:   mov w24, #0x0  // #0
 :  if (sync) {
    0.00 :   80001071c950:   cbnz    w0, 80001071c990

 :  arch_local_irq_restore():
    0.00 :   80001071c954:   msr daif, x21
 :  arm_smmu_cmdq_issue_cmdlist():
 :  }
 :  }
 :
 :  local_irq_restore(flags);
 :  return ret;
 :  }
   99.51 :   80001071c958:   adrp    x0, 800011909000





Hi Marc,

This is likely the side effect of the re-enabling of interrupts (msr 
daif, x21)
on the previous instruction which causes the perf interrupt to fire 
right after.


ok, makes sense.



Time to enable pseudo-NMIs in the PMUv3 driver...



Do you know if there is any plan for this?


There was. Julien Thierry has a bunch of patches for that [1], but they need
reviving.



In the meantime, maybe I can do some trickery by putting the
local_irq_restore() in a separate function, outside
arm_smmu_cmdq_issue_cmdlist(), to get a fair profile for that same
function.


I don't see how you can improve the profiling without compromising
the locking in this case...

Thanks,

M.

[1] https://patchwork.kernel.org/cover/11047407/
--
Jazz is not dead. It just smells funny...

Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device

2020-03-23 Thread Alexey Kardashevskiy



On 23/03/2020 19:37, Christoph Hellwig wrote:
> On Mon, Mar 23, 2020 at 12:28:34PM +1100, Alexey Kardashevskiy wrote:
> 
> [full quote deleted, please follow proper quoting rules]
> 
>>> +static bool dma_alloc_direct(struct device *dev, const struct dma_map_ops 
>>> *ops)
>>> +{
>>> +   if (!ops)
>>> +   return true;
>>> +
>>> +   /*
>>> +* Allows IOMMU drivers to bypass dynamic translations if the DMA mask
>>> +* is large enough.
>>> +*/
>>> +   if (dev->dma_ops_bypass) {
>>> +   if (min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit) >=
>>> +   dma_direct_get_required_mask(dev))
>>> +   return true;
>>> +   }
>>
>>
>> Why not do this in dma_map_direct() as well?
> 
> Mostly because it is a relatively expensive operation, including a
> fls64.

Ah, ok.

>> Or simply have just one dma_map_direct()?
> 
> What do you mean with that?

I meant using dma_alloc_direct() instead of dma_map_direct() everywhere;
you explained that just above.

> 
>> And one more general question - we need a way to use non-direct IOMMU
>> for RAM above certain limit.
>>
>> Let's say we have a system with:
>> 0 .. 0x1..
>> 0x100.. .. 0x101..
>>
>> 2x4G, each is 1TB aligned. And we can map directly only the first 4GB
>> (because of the maximum IOMMU table size) but not the other. And 1:1 on
>> that "pseries" is done with offset=0x0800....
>>
>> So we want to check every bus address against dev->bus_dma_limit, not
>> dev->coherent_dma_mask. In the example above I'd set bus_dma_limit to
>> 0x0800.0001.. and 1:1 mapping for the second 4GB would not be
>> tried. Does this sound reasonable? Thanks,
> 
> bus_dma_limit is just another limiting factor applied on top of
> coherent_dma_mask or dma_mask respectively.

This is not enough for the task: in my example, I'd set bus limit to
0x0800.0001.. but this would disable bypass for all RAM
addresses - the first and the second 4GB blocks.


-- 
Alexey


Re: arm-smmu-v3 high cpu usage for NVMe

2020-03-23 Thread John Garry

On 20/03/2020 16:33, Marc Zyngier wrote:

JFYI, I've been playing with "perf annotate" today and it's giving
strange results for my NVMe testing. So "report" looks somewhat sane,
if not a worryingly high % for arm_smmu_cmdq_issue_cmdlist():


    55.39%  irq/342-nvme0q1  [kernel.kallsyms]  [k] 
arm_smmu_cmdq_issue_cmdlist
 9.74%  irq/342-nvme0q1  [kernel.kallsyms]  [k] 
_raw_spin_unlock_irqrestore

 2.02%  irq/342-nvme0q1  [kernel.kallsyms]  [k] nvme_irq
 1.86%  irq/342-nvme0q1  [kernel.kallsyms]  [k] fput_many
 1.73%  irq/342-nvme0q1  [kernel.kallsyms]  [k]
arm_smmu_atc_inv_domain.constprop.42
 1.67%  irq/342-nvme0q1  [kernel.kallsyms]  [k] __arm_lpae_unmap
 1.49%  irq/342-nvme0q1  [kernel.kallsyms]  [k] aio_complete_rw

But "annotate" consistently tells me that a specific instruction
consumes ~99% of the load for the enqueue function:

 :  /* 5. If we are inserting a CMD_SYNC,
we must wait for it to complete */
 :  if (sync) {
    0.00 :   80001071c948:   ldr w0, [x29, #108]
 :  int ret = 0;
    0.00 :   80001071c94c:   mov w24, #0x0  // #0
 :  if (sync) {
    0.00 :   80001071c950:   cbnz    w0, 80001071c990

 :  arch_local_irq_restore():
    0.00 :   80001071c954:   msr daif, x21
 :  arm_smmu_cmdq_issue_cmdlist():
 :  }
 :  }
 :
 :  local_irq_restore(flags);
 :  return ret;
 :  }
   99.51 :   80001071c958:   adrp    x0, 800011909000





Hi Marc,

This is likely the side effect of the re-enabling of interrupts (msr 
daif, x21)
on the previous instruction which causes the perf interrupt to fire 
right after.


ok, makes sense.



Time to enable pseudo-NMIs in the PMUv3 driver...



Do you know if there is any plan for this?

In the meantime, maybe I can do some trickery by putting the 
local_irq_restore() in a separate function, outside 
arm_smmu_cmdq_issue_cmdlist(), to get a fair profile for that same function.
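
For example, a noinline helper along these lines (purely a profiling aid, not
a real change) would give the re-enable point its own symbol:

static noinline void arm_smmu_cmdq_restore_irqs(unsigned long flags)
{
        /* Interrupts are re-enabled here, so the pending PMU interrupt
         * fires in this helper and its samples are attributed to this
         * symbol rather than to arm_smmu_cmdq_issue_cmdlist(). */
        local_irq_restore(flags);
}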


Cheers,
John

Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device

2020-03-23 Thread Christoph Hellwig
On Mon, Mar 23, 2020 at 09:37:05AM +0100, Christoph Hellwig wrote:
> > > + /*
> > > +  * Allows IOMMU drivers to bypass dynamic translations if the DMA mask
> > > +  * is large enough.
> > > +  */
> > > + if (dev->dma_ops_bypass) {
> > > + if (min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit) >=
> > > + dma_direct_get_required_mask(dev))
> > > + return true;
> > > + }
> > 
> > 
> > Why not do this in dma_map_direct() as well?
> 
> Mostly because it is a relatively expensive operation, including a
> fls64.

Which I guess isn't too bad compared to a dynamic IOMMU mapping.  Can
you just send a draft patch for what you'd like to see for ppc?


Re: [PATCH 1/2] dma-mapping: add a dma_ops_bypass flag to struct device

2020-03-23 Thread Christoph Hellwig
On Mon, Mar 23, 2020 at 12:28:34PM +1100, Alexey Kardashevskiy wrote:

[full quote deleted, please follow proper quoting rules]

> > +static bool dma_alloc_direct(struct device *dev, const struct dma_map_ops 
> > *ops)
> > +{
> > +   if (!ops)
> > +   return true;
> > +
> > +   /*
> > +* Allows IOMMU drivers to bypass dynamic translations if the DMA mask
> > +* is large enough.
> > +*/
> > +   if (dev->dma_ops_bypass) {
> > +   if (min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit) >=
> > +   dma_direct_get_required_mask(dev))
> > +   return true;
> > +   }
> 
> 
> Why not do this in dma_map_direct() as well?

Mostly because it is a relatively expensive operation, including a
fls64.

> Or simply have just one dma_map_direct()?

What do you mean with that?

> And one more general question - we need a way to use non-direct IOMMU
> for RAM above certain limit.
> 
> Let's say we have a system with:
> 0 .. 0x1..
> 0x100.. .. 0x101..
> 
> 2x4G, each is 1TB aligned. And we can map directly only the first 4GB
> (because of the maximum IOMMU table size) but not the other. And 1:1 on
> that "pseries" is done with offset=0x0800....
> 
> So we want to check every bus address against dev->bus_dma_limit, not
> dev->coherent_dma_mask. In the example above I'd set bus_dma_limit to
> 0x0800.0001.. and 1:1 mapping for the second 4GB would not be
> tried. Does this sound reasonable? Thanks,

bus_dma_limit is just another limiting factor applied on top of
coherent_dma_mask or dma_mask respectively.