Re: [PATCH v2 3/4] dma-iommu: pass SKIP_CPU_SYNC to swiotlb unmap

2021-08-04 Thread David Stevens
On Mon, Aug 2, 2021 at 10:54 PM Will Deacon  wrote:
>
> On Fri, Jul 09, 2021 at 12:35:01PM +0900, David Stevens wrote:
> > From: David Stevens 
> >
> > If SKIP_CPU_SYNC isn't already set, then iommu_dma_unmap_(page|sg) has
> > already called iommu_dma_sync_(single|sg)_for_cpu, so there is no need
> > to copy from the bounce buffer again.
> >
> > Signed-off-by: David Stevens 
> > ---
> >  drivers/iommu/dma-iommu.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index e79e274d2dc5..0a9a9a343e64 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -505,7 +505,8 @@ static void __iommu_dma_unmap_swiotlb(struct device 
> > *dev, dma_addr_t dma_addr,
> >   __iommu_dma_unmap(dev, dma_addr, size);
> >
> >   if (unlikely(is_swiotlb_buffer(phys)))
> > - swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
> > + swiotlb_tbl_unmap_single(dev, phys, size, dir,
> > +  attrs | DMA_ATTR_SKIP_CPU_SYNC);
> >  }
>
> I think it would be cleaner to drop DMA_ATTR_SKIP_CPU_SYNC in the callers
> once they've called iommu_dma_sync_*_for_cpu().

Dropping that flag in iommu_dma_unmap_* would result in always copying
from the swiotlb here, which is the opposite direction of what this
patch is trying to do.

This change is aiming to address the case where DMA_ATTR_SKIP_CPU_SYNC
isn't passed to dma_unmap_*. In that case, there are calls to
swiotlb_sync_single_for_cpu from iommu_dma_sync_*_for_cpu, and calls
to swiotlb_tbl_unmap_single here. That means we copy from the swiotlb
twice. Adding the DMA_ATTR_SKIP_CPU_SYNC flag here skips the second
copy.
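
For reference, the caller currently looks roughly like this (a
simplified sketch of dma-iommu.c, not a verbatim copy):

static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
		size_t size, enum dma_data_direction dir, unsigned long attrs)
{
	/* First copy out of the bounce buffer, unless the caller asked
	 * to skip the CPU sync. */
	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
		iommu_dma_sync_single_for_cpu(dev, dma_handle, size, dir);

	/* Without this patch, swiotlb_tbl_unmap_single() then does a
	 * second copy inside __iommu_dma_unmap_swiotlb(). */
	__iommu_dma_unmap_swiotlb(dev, dma_handle, size, dir, attrs);
}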

-David

> Will


Re: [PATCH v2 1/4] dma-iommu: fix sync_sg with swiotlb

2021-08-04 Thread David Stevens
On Mon, Aug 2, 2021 at 10:30 PM Will Deacon  wrote:
>
> On Fri, Jul 09, 2021 at 12:34:59PM +0900, David Stevens wrote:
> > From: David Stevens 
> >
> > The is_swiotlb_buffer function takes the physical address of the swiotlb
> > buffer, not the physical address of the original buffer. The sglist
> > contains the physical addresses of the original buffer, so for the
> > sync_sg functions to work properly when a bounce buffer might have been
> > used, we need to use iommu_iova_to_phys to look up the physical address.
> > This is what sync_single does, so call that function on each sglist
> > segment.
> >
> > The previous code mostly worked because swiotlb does the transfer on map
> > and unmap. However, any callers which use DMA_ATTR_SKIP_CPU_SYNC with
> > sglists or which call sync_sg would not have had anything copied to the
> > bounce buffer.
> >
> > Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
> > Signed-off-by: David Stevens 
> > ---
> >  drivers/iommu/dma-iommu.c | 26 +-
> >  1 file changed, 13 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > index 7bcdd1205535..eac65302439e 100644
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -811,14 +811,14 @@ static void iommu_dma_sync_sg_for_cpu(struct device 
> > *dev,
> >   if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
> >   return;
> >
> > - for_each_sg(sgl, sg, nelems, i) {
> > - if (!dev_is_dma_coherent(dev))
> > - arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
> > -
> > - if (is_swiotlb_buffer(sg_phys(sg)))
> > + if (dev_is_untrusted(dev))
> > + for_each_sg(sgl, sg, nelems, i)
> > + iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
> > +   sg->length, dir);
> > + else
> > + for_each_sg(sgl, sg, nelems, i)
> >   swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
> >   sg->length, dir);
>
> Doesn't this skip arch_sync_dma_for_cpu() for non-coherent trusted devices?

Whoops, this was supposed to be a call to arch_sync_dma_for_cpu, not
to swiotlb_sync_single_for_cpu. Similar to the sync_sg_for_device
case.
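
i.e. the hunk should end up looking something like this (rough sketch
of the intended fix, subject to the next revision):

	if (dev_is_untrusted(dev))
		for_each_sg(sgl, sg, nelems, i)
			iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
						      sg->length, dir);
	else if (!dev_is_dma_coherent(dev))
		for_each_sg(sgl, sg, nelems, i)
			arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);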

> Why not skip the extra dev_is_untrusted(dev) call here and just call
> iommu_dma_sync_single_for_cpu() for each entry regardless?

iommu_dma_sync_single_for_cpu calls iommu_iova_to_phys to translate
the dma_addr_t to a phys_addr_t. Since the physical address is readily
available, I think it's better to avoid that extra work.

> Will


RE: [RFC v2] /dev/iommu uAPI proposal

2021-08-04 Thread Tian, Kevin
> From: Eric Auger 
> Sent: Wednesday, August 4, 2021 11:59 PM
>
[...] 
> > 1.2. Attach Device to I/O address space
> > +++
> >
> > Device attach/bind is initiated through passthrough framework uAPI.
> >
> > Device attaching is allowed only after a device is successfully bound to
> > the IOMMU fd. User should provide a device cookie when binding the
> > device through VFIO uAPI. This cookie is used when the user queries
> > device capability/format, issues per-device iotlb invalidation and
> > receives per-device I/O page fault data via IOMMU fd.
> >
> > Successful binding puts the device into a security context which isolates
> > its DMA from the rest system. VFIO should not allow user to access the
> s/from the rest system/from the rest of the system
> > device before binding is completed. Similarly, VFIO should prevent the
> > user from unbinding the device before user access is withdrawn.
> With Intel scalable IOV, I understand you could assign an RID/PASID to
> one VM and another one to another VM (which is not the case for ARM). Is
> it a targetted use case?How would it be handled? Is it related to the
> sub-groups evoked hereafter?

Not related to sub-groups. Each mdev is bound to the IOMMU fd separately,
with the defPASID which represents that mdev.

> 
> Actually all devices bound to an IOMMU fd should have the same parent
> I/O address space or root address space, am I correct? If so, maybe add
> this comment explicitly?

in most cases yes but it's not mandatory. multiple roots are allowed
(e.g. with vIOMMU but no nesting).

[...]
> > The device in the /dev/iommu context always refers to a physical one
> > (pdev) which is identifiable via RID. Physically each pdev can support
> > one default I/O address space (routed via RID) and optionally multiple
> > non-default I/O address spaces (via RID+PASID).
> >
> > The device in VFIO context is a logic concept, being either a physical
> > device (pdev) or mediated device (mdev or subdev). Each vfio device
> > is represented by RID+cookie in IOMMU fd. User is allowed to create
> > one default I/O address space (routed by vRID from user p.o.v) per
> > each vfio_device.
> The concept of default address space is not fully clear for me. I
> currently understand this is a
> root address space (not nesting). Is that coorect.This may need
> clarification.

w/o PASID there is only one address space (either GPA or GIOVA)
per device. This one is called the default. Whether it's a root address
space is orthogonal to the device's view of this space (e.g. GIOVA
could also be nested).

w/ PASID, additional address spaces can be targeted by the device.
Those are called non-default.

I could also rename default to RID address space and non-default to 
RID+PASID address space if doing so makes it clearer.

> > VFIO decides the routing information for this default
> > space based on device type:
> >
> > 1)  pdev, routed via RID;
> >
> > 2)  mdev/subdev with IOMMU-enforced DMA isolation, routed via
> > the parent's RID plus the PASID marking this mdev;
> >
> > 3)  a purely sw-mediated device (sw mdev), no routing required i.e. no
> > need to install the I/O page table in the IOMMU. sw mdev just uses
> > the metadata to assist its internal DMA isolation logic on top of
> > the parent's IOMMU page table;
> Maybe you should introduce this concept of SW mediated device earlier
> because it seems to special case the way the attach behaves. I am
> especially refering to
> 
> "Successful attaching activates an I/O address space in the IOMMU, if the
> device is not purely software mediated"

makes sense.

> 
> >
> > In addition, VFIO may allow user to create additional I/O address spaces
> > on a vfio_device based on the hardware capability. In such case the user
> > has its own view of the virtual routing information (vPASID) when marking
> > these non-default address spaces.
> I do not catch what does mean "marking these non default address space".

as explained above, those non-default address spaces are identified/routed
via PASID. 

> >
> > 1.3. Group isolation
> > 
[...]
> >
> > 1)  A successful binding call for the first device in the group creates
> > the security context for the entire group, by:
> >
> > * Verifying group viability in a similar way as VFIO does;
> >
> > * Calling IOMMU-API to move the group into a block-dma state,
> >   which makes all devices in the group attached to an block-dma
> >   domain with an empty I/O page table;
> this block-dma state/domain would deserve to be better defined (I know
> you already evoked it in 1.1 with the dma mapping protocol though)
> activates an empty I/O page table in the IOMMU (if the device is not
> purely SW mediated)?

Sure. Some explanations are scattered in the following paragraphs, but I
can consider further clarifying it.

> How does that relate to the default address space? Is it the same?

different. this block-dma domain doesn't hold any valid 

RE: [RFC v2] /dev/iommu uAPI proposal

2021-08-04 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, August 4, 2021 10:05 PM
> 
> On Mon, Aug 02, 2021 at 02:49:44AM +, Tian, Kevin wrote:
> 
> > Can you elaborate? IMO the user only cares about the label (device cookie
> > plus optional vPASID) which is generated by itself when doing the attaching
> > call, and expects this virtual label being used in various spots 
> > (invalidation,
> > page fault, etc.). How the system labels the traffic (the physical RID or 
> > RID+
> > PASID) should be completely invisible to userspace.
> 
> I don't think that is true if the vIOMMU driver is also emulating
> PASID. Presumably the same is true for other PASID-like schemes.
> 

I'm getting even more confused by this comment. Isn't it the
consensus from day one that the physical PASID should not be exposed
to userspace, as doing so breaks live migration? With PASID emulation
the vIOMMU only cares about the vPASID instead of the pPASID, and the
uAPI only requires the user to register the vPASID instead of reporting
the pPASID back to userspace...

Thanks
Kevin


Re: [PATCH V2 03/14] x86/set_memory: Add x86_set_memory_enc static call support

2021-08-04 Thread Dave Hansen
On 8/4/21 11:44 AM, Tianyu Lan wrote:
> +static int default_set_memory_enc(unsigned long addr, int numpages, bool 
> enc);
> +DEFINE_STATIC_CALL(x86_set_memory_enc, default_set_memory_enc);
> +
>  #define CPA_FLUSHTLB 1
>  #define CPA_ARRAY 2
>  #define CPA_PAGES_ARRAY 4
> @@ -1981,6 +1985,11 @@ int set_memory_global(unsigned long addr, int numpages)
>  }
>  
>  static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> +{
> + return static_call(x86_set_memory_enc)(addr, numpages, enc);
> +}
> +
> +static int default_set_memory_enc(unsigned long addr, int numpages, bool enc)
>  {
>   struct cpa_data cpa;
>   int ret;

It doesn't make a lot of difference to add this infrastructure and then
ignore it for the existing in-tree user:

> static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
> {
> struct cpa_data cpa;
> int ret;
> 
> /* Nothing to do if memory encryption is not active */
> if (!mem_encrypt_active())
> return 0;

Shouldn't the default be to just "return 0"?  Then on
mem_encrypt_active() systems, do the bulk of what is in
__set_memory_enc_dec() today.
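
I.e., something along these lines (rough sketch only; the name of the
mem-encryption-side helper here is hypothetical):

/* Default: no memory encryption, so there is nothing to do. */
static int default_set_memory_enc(unsigned long addr, int numpages, bool enc)
{
	return 0;
}

static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
{
	return static_call(x86_set_memory_enc)(addr, numpages, enc);
}

/* The SME/SEV code then registers the real implementation once, e.g.
 * from its init path when mem_encrypt_active() is true:
 *
 *	static_call_update(x86_set_memory_enc, amd_set_memory_enc);
 *
 * where amd_set_memory_enc() carries the bulk of what
 * __set_memory_enc_dec() does today.
 */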



Re: [PATCH v3 08/25] iommu/rockchip: Drop IOVA cookie management

2021-08-04 Thread Heiko Stübner
Am Mittwoch, 4. August 2021, 19:15:36 CEST schrieb Robin Murphy:
> The core code bakes its own cookies now.
> 
> CC: Heiko Stuebner 
> Signed-off-by: Robin Murphy 


On a Rockchip rk3288 (arm32), rk3399 (arm64) and px30 (arm64)
with the graphics pipeline using the iommu

Tested-by: Heiko Stuebner 
Acked-by: Heiko Stuebner 


Works now nicely on both arm32 and arm64


Thanks
Heiko


> 
> ---
> 
> v3: Also remove unneeded include
> ---
>  drivers/iommu/rockchip-iommu.c | 12 +---
>  1 file changed, 1 insertion(+), 11 deletions(-)
> 
> diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
> index 9febfb7f3025..5cb260820eda 100644
> --- a/drivers/iommu/rockchip-iommu.c
> +++ b/drivers/iommu/rockchip-iommu.c
> @@ -10,7 +10,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -1074,10 +1073,6 @@ static struct iommu_domain 
> *rk_iommu_domain_alloc(unsigned type)
>   if (!rk_domain)
>   return NULL;
>  
> - if (type == IOMMU_DOMAIN_DMA &&
> > - iommu_get_dma_cookie(&rk_domain->domain))
> - goto err_free_domain;
> -
>   /*
>* rk32xx iommus use a 2 level pagetable.
>* Each level1 (dt) and level2 (pt) table has 1024 4-byte entries.
> @@ -1085,7 +1080,7 @@ static struct iommu_domain 
> *rk_iommu_domain_alloc(unsigned type)
>*/
>   rk_domain->dt = (u32 *)get_zeroed_page(GFP_KERNEL | GFP_DMA32);
>   if (!rk_domain->dt)
> - goto err_put_cookie;
> + goto err_free_domain;
>  
>   rk_domain->dt_dma = dma_map_single(dma_dev, rk_domain->dt,
>  SPAGE_SIZE, DMA_TO_DEVICE);
> @@ -1106,9 +1101,6 @@ static struct iommu_domain 
> *rk_iommu_domain_alloc(unsigned type)
>  
>  err_free_dt:
>   free_page((unsigned long)rk_domain->dt);
> -err_put_cookie:
> - if (type == IOMMU_DOMAIN_DMA)
> > - iommu_put_dma_cookie(&rk_domain->domain);
>  err_free_domain:
>   kfree(rk_domain);
>  
> @@ -1137,8 +1129,6 @@ static void rk_iommu_domain_free(struct iommu_domain 
> *domain)
>SPAGE_SIZE, DMA_TO_DEVICE);
>   free_page((unsigned long)rk_domain->dt);
>  
> - if (domain->type == IOMMU_DOMAIN_DMA)
> > - iommu_put_dma_cookie(&rk_domain->domain);
>   kfree(rk_domain);
>  }
>  
> 






Re: [PATCH v3 01/25] iommu: Pull IOVA cookie management into the core

2021-08-04 Thread Heiko Stübner
Am Mittwoch, 4. August 2021, 19:15:29 CEST schrieb Robin Murphy:
> Now that everyone has converged on iommu-dma for IOMMU_DOMAIN_DMA
> support, we can abandon the notion of drivers being responsible for the
> cookie type, and consolidate all the management into the core code.
> 
> CC: Marek Szyprowski 
> CC: Yoshihiro Shimoda 
> CC: Geert Uytterhoeven 
> CC: Yong Wu 
> CC: Heiko Stuebner 
> CC: Chunyan Zhang 
> CC: Maxime Ripard 
> Reviewed-by: Jean-Philippe Brucker 
> Reviewed-by: Lu Baolu 
> Signed-off-by: Robin Murphy 

On a Rockchip rk3288 (arm32), rk3399 (arm64) and px30 (arm64)
with the graphics pipeline using the iommu

Tested-by: Heiko Stuebner 


Heiko

> 
> ---
> 
> v3: Use a simpler temporary check instead of trying to be clever with
> the error code
> ---
>  drivers/iommu/iommu.c | 7 +++
>  include/linux/iommu.h | 3 ++-
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index f2cda9950bd5..b65fcc66ffa4 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -7,6 +7,7 @@
>  #define pr_fmt(fmt)"iommu: " fmt
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1946,6 +1947,11 @@ static struct iommu_domain 
> *__iommu_domain_alloc(struct bus_type *bus,
>   /* Assume all sizes by default; the driver may override this later */
>   domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
>  
> + /* Temporarily avoid -EEXIST while drivers still get their own cookies 
> */
> + if (type == IOMMU_DOMAIN_DMA && !domain->iova_cookie && 
> iommu_get_dma_cookie(domain)) {
> + iommu_domain_free(domain);
> + domain = NULL;
> + }
>   return domain;
>  }
>  
> @@ -1957,6 +1963,7 @@ EXPORT_SYMBOL_GPL(iommu_domain_alloc);
>  
>  void iommu_domain_free(struct iommu_domain *domain)
>  {
> + iommu_put_dma_cookie(domain);
>   domain->ops->domain_free(domain);
>  }
>  EXPORT_SYMBOL_GPL(iommu_domain_free);
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 4997c78e2670..141779d76035 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -40,6 +40,7 @@ struct iommu_domain;
>  struct notifier_block;
>  struct iommu_sva;
>  struct iommu_fault_event;
> +struct iommu_dma_cookie;
>  
>  /* iommu fault flags */
>  #define IOMMU_FAULT_READ 0x0
> @@ -86,7 +87,7 @@ struct iommu_domain {
>   iommu_fault_handler_t handler;
>   void *handler_token;
>   struct iommu_domain_geometry geometry;
> - void *iova_cookie;
> + struct iommu_dma_cookie *iova_cookie;
>  };
>  
>  enum iommu_cap {
> 






[PATCH V2 14/14] HV/Storvsc: Add Isolation VM support for storvsc driver

2021-08-04 Thread Tianyu Lan
From: Tianyu Lan 

In Isolation VM, all memory shared with the host needs to be marked
visible to the host via a hvcall. vmbus_establish_gpadl() has already
done this for the storvsc rx/tx ring buffers. The page buffers used by
vmbus_sendpacket_mpb_desc() still need to be handled. Use the DMA API
to map/unmap this memory when sending/receiving packets; the Hyper-V
DMA ops callback will use swiotlb to allocate a bounce buffer and copy
data from/to it.

Signed-off-by: Tianyu Lan 
---
 drivers/scsi/storvsc_drv.c | 68 +++---
 1 file changed, 63 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 328bb961c281..78320719bdd8 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -427,6 +429,8 @@ struct storvsc_cmd_request {
u32 payload_sz;
 
struct vstor_packet vstor_packet;
+   u32 hvpg_count;
+   struct hv_dma_range *dma_range;
 };
 
 
@@ -509,6 +513,14 @@ struct storvsc_scan_work {
u8 tgt_id;
 };
 
+#define storvsc_dma_map(dev, page, offset, size, dir) \
+   dma_map_page(dev, page, offset, size, dir)
+
+#define storvsc_dma_unmap(dev, dma_range, dir) \
+   dma_unmap_page(dev, dma_range.dma,  \
+  dma_range.mapping_size,  \
+  dir ? DMA_FROM_DEVICE : DMA_TO_DEVICE)
+
 static void storvsc_device_scan(struct work_struct *work)
 {
struct storvsc_scan_work *wrk;
@@ -1260,6 +1272,7 @@ static void storvsc_on_channel_callback(void *context)
struct hv_device *device;
struct storvsc_device *stor_device;
struct Scsi_Host *shost;
+   int i;
 
if (channel->primary_channel != NULL)
device = channel->primary_channel->device_obj;
@@ -1314,6 +1327,15 @@ static void storvsc_on_channel_callback(void *context)
request = (struct storvsc_cmd_request 
*)scsi_cmd_priv(scmnd);
}
 
+   if (request->dma_range) {
+   for (i = 0; i < request->hvpg_count; i++)
+   storvsc_dma_unmap(&device->device,
+   request->dma_range[i],
+   
request->vstor_packet.vm_srb.data_in == READ_TYPE);
+
+   kfree(request->dma_range);
+   }
+
storvsc_on_receive(stor_device, packet, request);
continue;
}
@@ -1810,7 +1832,9 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scmnd)
unsigned int hvpgoff, hvpfns_to_add;
unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
+   dma_addr_t dma;
u64 hvpfn;
+   u32 size;
 
if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
 
@@ -1824,6 +1848,13 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scmnd)
payload->range.len = length;
payload->range.offset = offset_in_hvpg;
 
+   cmd_request->dma_range = kcalloc(hvpg_count,
+sizeof(*cmd_request->dma_range),
+GFP_ATOMIC);
+   if (!cmd_request->dma_range) {
+   ret = -ENOMEM;
+   goto free_payload;
+   }
 
for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
/*
@@ -1847,9 +1878,29 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scmnd)
 * last sgl should be reached at the same time that
 * the PFN array is filled.
 */
-   while (hvpfns_to_add--)
-   payload->range.pfn_array[i++] = hvpfn++;
+   while (hvpfns_to_add--) {
+   size = min(HV_HYP_PAGE_SIZE - offset_in_hvpg,
+  (unsigned long)length);
+   dma = storvsc_dma_map(&dev->device, 
pfn_to_page(hvpfn++),
+ offset_in_hvpg, size,
+ scmnd->sc_data_direction);
+   if (dma_mapping_error(&dev->device, dma)) {
+   ret = -ENOMEM;
+   goto free_dma_range;
+   }
+
+   if (offset_in_hvpg) {
+   payload->range.offset = dma & 
~HV_HYP_PAGE_MASK;
+   

[PATCH V2 13/14] HV/Netvsc: Add Isolation VM support for netvsc driver

2021-08-04 Thread Tianyu Lan
From: Tianyu Lan 

In Isolation VM, all memory shared with the host needs to be marked
visible to the host via a hvcall. vmbus_establish_gpadl() has already
done this for the netvsc rx/tx ring buffers. The page buffers used by
vmbus_sendpacket_pagebuffer() still need to be handled. Use the DMA API
to map/unmap this memory when sending/receiving packets; the Hyper-V
DMA ops callback will use swiotlb to allocate a bounce buffer and copy
data from/to it.

Signed-off-by: Tianyu Lan 
---
 drivers/net/hyperv/hyperv_net.h   |   6 ++
 drivers/net/hyperv/netvsc.c   | 144 +-
 drivers/net/hyperv/rndis_filter.c |   2 +
 include/linux/hyperv.h|   5 ++
 4 files changed, 154 insertions(+), 3 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index bc48855dff10..862419912bfb 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -164,6 +164,7 @@ struct hv_netvsc_packet {
u32 total_bytes;
u32 send_buf_index;
u32 total_data_buflen;
+   struct hv_dma_range *dma_range;
 };
 
 #define NETVSC_HASH_KEYLEN 40
@@ -1074,6 +1075,7 @@ struct netvsc_device {
 
/* Receive buffer allocated by us but manages by NetVSP */
void *recv_buf;
+   void *recv_original_buf;
u32 recv_buf_size; /* allocated bytes */
u32 recv_buf_gpadl_handle;
u32 recv_section_cnt;
@@ -1082,6 +1084,8 @@ struct netvsc_device {
 
/* Send buffer allocated by us */
void *send_buf;
+   void *send_original_buf;
+   u32 send_buf_size;
u32 send_buf_gpadl_handle;
u32 send_section_cnt;
u32 send_section_size;
@@ -1730,4 +1734,6 @@ struct rndis_message {
 #define RETRY_US_HI1
 #define RETRY_MAX  2000/* >10 sec */
 
+void netvsc_dma_unmap(struct hv_device *hv_dev,
+ struct hv_netvsc_packet *packet);
 #endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 7bd935412853..fc312e5db4d5 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head)
int i;
 
kfree(nvdev->extension);
-   vfree(nvdev->recv_buf);
-   vfree(nvdev->send_buf);
+
+   if (nvdev->recv_original_buf) {
+   vunmap(nvdev->recv_buf);
+   vfree(nvdev->recv_original_buf);
+   } else {
+   vfree(nvdev->recv_buf);
+   }
+
+   if (nvdev->send_original_buf) {
+   vunmap(nvdev->send_buf);
+   vfree(nvdev->send_original_buf);
+   } else {
+   vfree(nvdev->send_buf);
+   }
+
kfree(nvdev->send_section_map);
 
for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
@@ -330,6 +343,27 @@ int netvsc_alloc_recv_comp_ring(struct netvsc_device 
*net_device, u32 q_idx)
return nvchan->mrc.slots ? 0 : -ENOMEM;
 }
 
+static void *netvsc_remap_buf(void *buf, unsigned long size)
+{
+   unsigned long *pfns;
+   void *vaddr;
+   int i;
+
+   pfns = kcalloc(size / HV_HYP_PAGE_SIZE, sizeof(unsigned long),
+  GFP_KERNEL);
+   if (!pfns)
+   return NULL;
+
+   for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++)
+   pfns[i] = virt_to_hvpfn(buf + i * HV_HYP_PAGE_SIZE)
+   + (ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
+
+   vaddr = vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE, PAGE_KERNEL_IO);
+   kfree(pfns);
+
+   return vaddr;
+}
+
 static int netvsc_init_buf(struct hv_device *device,
   struct netvsc_device *net_device,
   const struct netvsc_device_info *device_info)
@@ -340,6 +374,7 @@ static int netvsc_init_buf(struct hv_device *device,
unsigned int buf_size;
size_t map_words;
int i, ret = 0;
+   void *vaddr;
 
/* Get receive buffer area. */
buf_size = device_info->recv_sections * device_info->recv_section_size;
@@ -375,6 +410,15 @@ static int netvsc_init_buf(struct hv_device *device,
goto cleanup;
}
 
+   if (hv_isolation_type_snp()) {
+   vaddr = netvsc_remap_buf(net_device->recv_buf, buf_size);
+   if (!vaddr)
+   goto cleanup;
+
+   net_device->recv_original_buf = net_device->recv_buf;
+   net_device->recv_buf = vaddr;
+   }
+
/* Notify the NetVsp of the gpadl handle */
init_packet = &net_device->channel_init_pkt;
memset(init_packet, 0, sizeof(struct nvsp_message));
@@ -477,6 +521,15 @@ static int netvsc_init_buf(struct hv_device *device,
goto cleanup;
}
 
+   if (hv_isolation_type_snp()) {
+   vaddr = netvsc_remap_buf(net_device->send_buf, buf_size);
+   if (!vaddr)
+   goto cleanup;
+
+   net_device->send_original_buf = 

[PATCH V2 12/14] HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM

2021-08-04 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V Isolation VM requires bounce buffer support to copy
data from/to encrypted memory, so enable swiotlb force mode
to use the swiotlb bounce buffer for DMA transactions.

In an Isolation VM with AMD SEV, the bounce buffer needs to be
accessed via an extra address space which is above shared_gpa_boundary
(e.g. the 39-bit address line) reported by the Hyper-V CPUID
ISOLATION_CONFIG leaf. The accessed physical address is the original
physical address + shared_gpa_boundary. In the AMD SEV SNP spec,
shared_gpa_boundary is called the virtual top of memory (vTOM).
Memory addresses below vTOM are automatically treated as private
while memory above vTOM is treated as shared.

The swiotlb bounce buffer code calls dma_map_decrypted() to mark
the bounce buffer visible to the host and map it in the extra
address space. Populate the dma memory decrypted ops with the hv
map/unmap functions.

Hyper-V initializes the swiotlb bounce buffer, and the default
swiotlb needs to be disabled. pci_swiotlb_detect_override() and
pci_swiotlb_detect_4gb() enable the default one. To override the
setting, hyperv_swiotlb_detect() needs to run before these detect
functions, which depend on pci_xen_swiotlb_init(). Make
pci_xen_swiotlb_init() depend on hyperv_swiotlb_detect() to keep
the order.

The map function vmap_pfn() can't work at the early stage where
hyperv_iommu_swiotlb_init() runs, so initialize the swiotlb bounce
buffer in hyperv_iommu_swiotlb_later_init().

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/ivm.c   | 28 ++
 arch/x86/include/asm/mshyperv.h |  2 +
 arch/x86/xen/pci-swiotlb-xen.c  |  3 +-
 drivers/hv/vmbus_drv.c  |  3 ++
 drivers/iommu/hyperv-iommu.c| 65 +
 include/linux/hyperv.h  |  1 +
 6 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 377729279a0b..d2aa3120b005 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -265,3 +265,31 @@ int hv_set_mem_enc(unsigned long addr, int numpages, bool 
enc)
 
return hv_set_mem_host_visibility((void *)addr, numpages, visibility);
 }
+
+/*
+ * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
+ */
+void *hv_map_memory(void *addr, unsigned long size)
+{
+   unsigned long *pfns = kcalloc(size / HV_HYP_PAGE_SIZE,
+ sizeof(unsigned long), GFP_KERNEL);
+   void *vaddr;
+   int i;
+
+   if (!pfns)
+   return NULL;
+
+   for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++)
+   pfns[i] = virt_to_hvpfn(addr + i * HV_HYP_PAGE_SIZE) +
+   (ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
+
+   vaddr = vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE, PAGE_KERNEL_IO);
+   kfree(pfns);
+
+   return vaddr;
+}
+
+void hv_unmap_memory(void *addr)
+{
+   vunmap(addr);
+}
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 8f7e2e3b7227..bd7b3ce1ae59 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -250,6 +250,8 @@ int hv_unmap_ioapic_interrupt(int ioapic_id, struct 
hv_interrupt_entry *entry);
 int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
   enum hv_mem_host_visibility visibility);
 int hv_set_mem_enc(unsigned long addr, int numpages, bool enc);
+void *hv_map_memory(void *addr, unsigned long size);
+void hv_unmap_memory(void *addr);
 void hv_sint_wrmsrl_ghcb(u64 msr, u64 value);
 void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
 void hv_signal_eom_ghcb(void);
diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
index 54f9aa7e8457..43bd031aa332 100644
--- a/arch/x86/xen/pci-swiotlb-xen.c
+++ b/arch/x86/xen/pci-swiotlb-xen.c
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void)
 EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late);
 
 IOMMU_INIT_FINISH(pci_xen_swiotlb_detect,
- NULL,
+ hyperv_swiotlb_detect,
  pci_xen_swiotlb_init,
  NULL);
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 57bbbaa4e8f7..f068e22a5636 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -2081,6 +2082,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
return child_device_obj;
 }
 
+static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
 /*
  * vmbus_device_register - Register the child device
  */
@@ -2121,6 +2123,7 @@ int vmbus_device_register(struct hv_device 
*child_device_obj)
}
hv_debug_add_dev_dir(child_device_obj);
 
child_device_obj->device.dma_mask = &vmbus_dma_mask;
return 0;
 
 err_kset_unregister:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e285a220c913..01e874b3b43a 100644
--- a/drivers/iommu/hyperv-iommu.c
+++ 

[PATCH V2 11/14] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM

2021-08-04 Thread Tianyu Lan
From: Tianyu Lan 

In an Isolation VM with AMD SEV, the bounce buffer needs to be accessed
via an extra address space which is above shared_gpa_boundary
(e.g. the 39-bit address line) reported by the Hyper-V CPUID
ISOLATION_CONFIG leaf. The accessed physical address is the original
physical address + shared_gpa_boundary. In the AMD SEV SNP spec,
shared_gpa_boundary is called the virtual top of memory (vTOM).
Memory addresses below vTOM are automatically treated as private while
memory above vTOM is treated as shared.

Use dma_map_decrypted() in the swiotlb code, store the returned remap
address, and use that remap address to copy data from/to the swiotlb
bounce buffer.

Signed-off-by: Tianyu Lan 
---
Change since v1:
   * Make swiotlb_init_io_tlb_mem() return error code and return
 error when dma_map_decrypted() fails.

Signed-off-by: Tianyu Lan 
---
 include/linux/swiotlb.h |  4 
 kernel/dma/swiotlb.c| 32 
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index f507e3eacbea..584560ecaa8e 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -72,6 +72,9 @@ extern enum swiotlb_force swiotlb_force;
  * @end:   The end address of the swiotlb memory pool. Used to do a quick
  * range check to see if the memory was in fact allocated by this
  * API.
+ * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb
+ * memory pool may be remapped in the memory encrypted case and 
store
+ * virtual address for bounce buffer operation.
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
  * @end. For default swiotlb, this is command line adjustable via
  * setup_io_tlb_npages.
@@ -89,6 +92,7 @@ extern enum swiotlb_force swiotlb_force;
 struct io_tlb_mem {
phys_addr_t start;
phys_addr_t end;
+   void *vaddr;
unsigned long nslabs;
unsigned long used;
unsigned int index;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1fa81c096c1d..29b6d888ef3b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -176,7 +176,7 @@ void __init swiotlb_update_mem_attributes(void)
memset(vaddr, 0, bytes);
 }
 
-static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
+static int swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
unsigned long nslabs, bool late_alloc)
 {
void *vaddr = phys_to_virt(start);
@@ -194,14 +194,21 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem 
*mem, phys_addr_t start,
mem->slots[i].alloc_size = 0;
}
 
-   set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
-   memset(vaddr, 0, bytes);
+   mem->vaddr = dma_map_decrypted(vaddr, bytes);
+   if (!mem->vaddr) {
+   pr_err("Failed to decrypt memory.\n");
+   return -ENOMEM;
+   }
+
+   memset(mem->vaddr, 0, bytes);
+   return 0;
 }
 
 int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 {
struct io_tlb_mem *mem;
size_t alloc_size;
+   int ret;
 
if (swiotlb_force == SWIOTLB_NO_FORCE)
return 0;
@@ -216,7 +223,11 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long 
nslabs, int verbose)
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
  __func__, alloc_size, PAGE_SIZE);
 
-   swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
+   ret = swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
+   if (ret) {
+   memblock_free(__pa(mem), alloc_size);
+   return ret;
+   }
 
io_tlb_default_mem = mem;
if (verbose)
@@ -304,6 +315,8 @@ int
 swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 {
struct io_tlb_mem *mem;
+   int size = get_order(struct_size(mem, slots, nslabs));
+   int ret;
 
if (swiotlb_force == SWIOTLB_NO_FORCE)
return 0;
@@ -312,12 +325,15 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long 
nslabs)
if (WARN_ON_ONCE(io_tlb_default_mem))
return -ENOMEM;
 
-   mem = (void *)__get_free_pages(GFP_KERNEL,
-   get_order(struct_size(mem, slots, nslabs)));
+   mem = (void *)__get_free_pages(GFP_KERNEL, size);
if (!mem)
return -ENOMEM;
 
-   swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
+   ret = swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
+   if (ret) {
+   free_pages((unsigned long)mem, size);
+   return ret;
+   }
 
io_tlb_default_mem = mem;
swiotlb_print_info();
@@ -360,7 +376,7 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t 
tlb_addr, size_t size
phys_addr_t orig_addr = mem->slots[index].orig_addr;

[PATCH V2 10/14] DMA: Add dma_map_decrypted/dma_unmap_encrypted() function

2021-08-04 Thread Tianyu Lan
From: Tianyu Lan 

In a Hyper-V Isolation VM with AMD SEV, the swiotlb bounce buffer
needs to be mapped into the address space above vTOM, so introduce
dma_map_decrypted()/dma_unmap_encrypted() to map/unmap bounce buffer
memory. The platform can populate the map/unmap callbacks in the dma
memory decrypted ops.

Signed-off-by: Tianyu Lan 
---
 include/linux/dma-map-ops.h |  9 +
 kernel/dma/mapping.c| 22 ++
 2 files changed, 31 insertions(+)

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 0d53a96a3d64..01d60a024e45 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -71,6 +71,11 @@ struct dma_map_ops {
unsigned long (*get_merge_boundary)(struct device *dev);
 };
 
+struct dma_memory_decrypted_ops {
+   void *(*map)(void *addr, unsigned long size);
+   void (*unmap)(void *addr);
+};
+
 #ifdef CONFIG_DMA_OPS
 #include 
 
@@ -374,6 +379,10 @@ static inline void debug_dma_dump_mappings(struct device 
*dev)
 }
 #endif /* CONFIG_DMA_API_DEBUG */
 
+void *dma_map_decrypted(void *addr, unsigned long size);
+int dma_unmap_decrypted(void *addr, unsigned long size);
+
 extern const struct dma_map_ops dma_dummy_ops;
+extern struct dma_memory_decrypted_ops dma_memory_generic_decrypted_ops;
 
 #endif /* _LINUX_DMA_MAP_OPS_H */
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 2b06a809d0b9..6fb150dc1750 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -13,11 +13,13 @@
 #include 
 #include 
 #include 
+#include 
 #include "debug.h"
 #include "direct.h"
 
 bool dma_default_coherent;
 
+struct dma_memory_decrypted_ops dma_memory_generic_decrypted_ops;
 /*
  * Managed DMA API
  */
@@ -736,3 +738,23 @@ unsigned long dma_get_merge_boundary(struct device *dev)
return ops->get_merge_boundary(dev);
 }
 EXPORT_SYMBOL_GPL(dma_get_merge_boundary);
+
+void *dma_map_decrypted(void *addr, unsigned long size)
+{
+   if (set_memory_decrypted((unsigned long)addr,
+size / PAGE_SIZE))
+   return NULL;
+
+   if (dma_memory_generic_decrypted_ops.map)
+   return dma_memory_generic_decrypted_ops.map(addr, size);
+   else
+   return addr;
+}
+
+int dma_unmap_encrypted(void *addr, unsigned long size)
+{
+   if (dma_memory_generic_decrypted_ops.unmap)
+   dma_memory_generic_decrypted_ops.unmap(addr);
+
+   return set_memory_encrypted((unsigned long)addr, size / PAGE_SIZE);
+}
-- 
2.25.1



[PATCH V2 09/14] HV/Vmbus: Initialize VMbus ring buffer for Isolation VM

2021-08-04 Thread Tianyu Lan
From: Tianyu Lan 

VMbus ring buffers are shared with the host and need to be
accessed via the extra address space of an Isolation VM with
SNP support. This patch maps the ring buffer address into the
extra address space via ioremap(). The HV host visibility hvcall
smears data in the ring buffer, so reset the ring buffer memory
to zero after calling the visibility hvcall.

Signed-off-by: Tianyu Lan 
---
 drivers/hv/Kconfig|  1 +
 drivers/hv/channel.c  | 10 +
 drivers/hv/hyperv_vmbus.h |  2 +
 drivers/hv/ring_buffer.c  | 84 ++-
 4 files changed, 79 insertions(+), 18 deletions(-)

diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index 66c794d92391..a8386998be40 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -7,6 +7,7 @@ config HYPERV
depends on X86 && ACPI && X86_LOCAL_APIC && HYPERVISOR_GUEST
select PARAVIRT
select X86_HV_CALLBACK_VECTOR
+   select VMAP_PFN
help
  Select this option to run Linux as a Hyper-V client operating
  system.
diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 4c4717c26240..60ef881a700c 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -712,6 +712,16 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
if (err)
goto error_clean_ring;
 
+   err = hv_ringbuffer_post_init(&newchannel->outbound,
+ page, send_pages);
+   if (err)
+   goto error_free_gpadl;
+
+   err = hv_ringbuffer_post_init(&newchannel->inbound,
+ &page[send_pages], recv_pages);
+   if (err)
+   goto error_free_gpadl;
+
/* Create and init the channel open message */
open_info = kzalloc(sizeof(*open_info) +
   sizeof(struct vmbus_channel_open_channel),
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 40bc0eff6665..15cd23a561f3 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -172,6 +172,8 @@ extern int hv_synic_cleanup(unsigned int cpu);
 /* Interface */
 
 void hv_ringbuffer_pre_init(struct vmbus_channel *channel);
+int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
+   struct page *pages, u32 page_cnt);
 
 int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
   struct page *pages, u32 pagecnt, u32 max_pkt_size);
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 2aee356840a2..d4f93fca1108 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -17,6 +17,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "hyperv_vmbus.h"
 
@@ -179,43 +181,89 @@ void hv_ringbuffer_pre_init(struct vmbus_channel *channel)
mutex_init(&channel->outbound.ring_buffer_mutex);
 }
 
-/* Initialize the ring buffer. */
-int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
-  struct page *pages, u32 page_cnt, u32 max_pkt_size)
+int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
+  struct page *pages, u32 page_cnt)
 {
+   u64 physic_addr = page_to_pfn(pages) << PAGE_SHIFT;
+   unsigned long *pfns_wraparound;
+   void *vaddr;
int i;
-   struct page **pages_wraparound;
 
-   BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
+   if (!hv_isolation_type_snp())
+   return 0;
+
+   physic_addr += ms_hyperv.shared_gpa_boundary;
 
/*
 * First page holds struct hv_ring_buffer, do wraparound mapping for
 * the rest.
 */
-   pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
+   pfns_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(unsigned long),
   GFP_KERNEL);
-   if (!pages_wraparound)
+   if (!pfns_wraparound)
return -ENOMEM;
 
-   pages_wraparound[0] = pages;
+   pfns_wraparound[0] = physic_addr >> PAGE_SHIFT;
for (i = 0; i < 2 * (page_cnt - 1); i++)
-   pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];
-
-   ring_info->ring_buffer = (struct hv_ring_buffer *)
-   vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP, PAGE_KERNEL);
-
-   kfree(pages_wraparound);
+   pfns_wraparound[i + 1] = (physic_addr >> PAGE_SHIFT) +
+   i % (page_cnt - 1) + 1;
 
-
-   if (!ring_info->ring_buffer)
+   vaddr = vmap_pfn(pfns_wraparound, page_cnt * 2 - 1, PAGE_KERNEL_IO);
+   kfree(pfns_wraparound);
+   if (!vaddr)
return -ENOMEM;
 
-   ring_info->ring_buffer->read_index =
-   ring_info->ring_buffer->write_index = 0;
+   /* Clean memory after setting host visibility. */
+   memset((void *)vaddr, 0x00, page_cnt * PAGE_SIZE);
+
+   ring_info->ring_buffer = (struct hv_ring_buffer *)vaddr;
+   ring_info->ring_buffer->read_index = 0;
+   

[PATCH V2 08/14] HV/Vmbus: Add SNP support for VMbus channel initiate message

2021-08-04 Thread Tianyu Lan
From: Tianyu Lan 

The monitor pages in the CHANNELMSG_INITIATE_CONTACT msg are shared
with the host in an Isolation VM, so it's necessary to use a hvcall to
make them visible to the host. In an Isolation VM with AMD SEV SNP, the
access address should be in the extra space above the shared gpa
boundary, so remap these pages into the extra address space (pa +
shared_gpa_boundary). Introduce monitor_pages_va to store the
remap addresses and unmap them when disconnecting vmbus.

Signed-off-by: Tianyu Lan 
---
Change since v1:
* Do not remap monitor pages in the non-SNP Isolation VM.
---
 drivers/hv/connection.c   | 65 +++
 drivers/hv/hyperv_vmbus.h |  1 +
 2 files changed, 66 insertions(+)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 6d315c1465e0..bf0ac3167bd2 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "hyperv_vmbus.h"
@@ -104,6 +105,12 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo 
*msginfo, u32 version)
 
msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]);
msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]);
+
+   if (hv_isolation_type_snp()) {
+   msg->monitor_page1 += ms_hyperv.shared_gpa_boundary;
+   msg->monitor_page2 += ms_hyperv.shared_gpa_boundary;
+   }
+
msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU);
 
/*
@@ -148,6 +155,31 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo 
*msginfo, u32 version)
return -ECONNREFUSED;
}
 
+   if (hv_isolation_type_snp()) {
+   vmbus_connection.monitor_pages_va[0]
+   = vmbus_connection.monitor_pages[0];
+   vmbus_connection.monitor_pages[0]
+   = memremap(msg->monitor_page1, HV_HYP_PAGE_SIZE,
+  MEMREMAP_WB);
+   if (!vmbus_connection.monitor_pages[0])
+   return -ENOMEM;
+
+   vmbus_connection.monitor_pages_va[1]
+   = vmbus_connection.monitor_pages[1];
+   vmbus_connection.monitor_pages[1]
+   = memremap(msg->monitor_page2, HV_HYP_PAGE_SIZE,
+  MEMREMAP_WB);
+   if (!vmbus_connection.monitor_pages[1]) {
+   memunmap(vmbus_connection.monitor_pages[0]);
+   return -ENOMEM;
+   }
+
+   memset(vmbus_connection.monitor_pages[0], 0x00,
+  HV_HYP_PAGE_SIZE);
+   memset(vmbus_connection.monitor_pages[1], 0x00,
+  HV_HYP_PAGE_SIZE);
+   }
+
return ret;
 }
 
@@ -159,6 +191,7 @@ int vmbus_connect(void)
struct vmbus_channel_msginfo *msginfo = NULL;
int i, ret = 0;
__u32 version;
+   u64 pfn[2];
 
/* Initialize the vmbus connection */
vmbus_connection.conn_state = CONNECTING;
@@ -216,6 +249,16 @@ int vmbus_connect(void)
goto cleanup;
}
 
+   if (hv_is_isolation_supported()) {
+   pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
+   pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
+   if (hv_mark_gpa_visibility(2, pfn,
+   VMBUS_PAGE_VISIBLE_READ_WRITE)) {
+   ret = -EFAULT;
+   goto cleanup;
+   }
+   }
+
msginfo = kzalloc(sizeof(*msginfo) +
  sizeof(struct vmbus_channel_initiate_contact),
  GFP_KERNEL);
@@ -284,6 +327,8 @@ int vmbus_connect(void)
 
 void vmbus_disconnect(void)
 {
+   u64 pfn[2];
+
/*
 * First send the unload request to the host.
 */
@@ -303,6 +348,26 @@ void vmbus_disconnect(void)
vmbus_connection.int_page = NULL;
}
 
+   if (hv_is_isolation_supported()) {
+   if (vmbus_connection.monitor_pages_va[0]) {
+   memunmap(vmbus_connection.monitor_pages[0]);
+   vmbus_connection.monitor_pages[0]
+   = vmbus_connection.monitor_pages_va[0];
+   vmbus_connection.monitor_pages_va[0] = NULL;
+   }
+
+   if (vmbus_connection.monitor_pages_va[1]) {
+   memunmap(vmbus_connection.monitor_pages[1]);
+   vmbus_connection.monitor_pages[1]
+   = vmbus_connection.monitor_pages_va[1];
+   vmbus_connection.monitor_pages_va[1] = NULL;
+   }
+
+   pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
+   pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
+   hv_mark_gpa_visibility(2, pfn, VMBUS_PAGE_NOT_VISIBLE);
+

[PATCH V2 07/14] HV: Add ghcb hvcall support for SNP VM

2021-08-04 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides a ghcb hvcall to handle the VMBus
HVCALL_SIGNAL_EVENT and HVCALL_POST_MESSAGE
messages in an SNP Isolation VM. Add such support.

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/ivm.c   | 43 +
 arch/x86/include/asm/mshyperv.h |  1 +
 drivers/hv/connection.c |  6 -
 drivers/hv/hv.c |  8 +-
 include/asm-generic/mshyperv.h  | 29 ++
 5 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index a135020002fe..377729279a0b 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -15,6 +15,49 @@
 #include 
 #include 
 
+#define GHCB_USAGE_HYPERV_CALL 1
+
+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return -EFAULT;
+
+   WARN_ON(in_nmi());
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return -EFAULT;
+   }
+
+   hv_ghcb->ghcb.protocol_version = GHCB_PROTOCOL_MAX;
+   hv_ghcb->ghcb.ghcb_usage = GHCB_USAGE_HYPERV_CALL;
+
+   hv_ghcb->hypercall.outputgpa = (u64)output;
+   hv_ghcb->hypercall.hypercallinput.asuint64 = 0;
+   hv_ghcb->hypercall.hypercallinput.callcode = control;
+
+   if (input_size)
+   memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size);
+
+   VMGEXIT();
+
+   hv_ghcb->ghcb.ghcb_usage = 0xffffffff;
+   memset(hv_ghcb->ghcb.save.valid_bitmap, 0,
+  sizeof(hv_ghcb->ghcb.save.valid_bitmap));
+
+   local_irq_restore(flags);
+
+   return hv_ghcb->hypercall.hypercalloutput.callstatus;
+}
+EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
+
 void hv_ghcb_msr_write(u64 msr, u64 value)
 {
union hv_ghcb *hv_ghcb;
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 3e14ff4c..8f7e2e3b7227 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -255,6 +255,7 @@ void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
 void hv_signal_eom_ghcb(void);
 void hv_ghcb_msr_write(u64 msr, u64 value);
 void hv_ghcb_msr_read(u64 msr, u64 *value);
+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
 
 #define hv_get_synint_state_ghcb(int_num, val) \
hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 5e479d54918c..6d315c1465e0 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -447,6 +447,10 @@ void vmbus_set_event(struct vmbus_channel *channel)
 
++channel->sig_events;
 
-   hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
+   if (hv_isolation_type_snp())
+   hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
+   NULL, sizeof(u64));
+   else
+   hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
 }
 EXPORT_SYMBOL_GPL(vmbus_set_event);
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 59f7173c4d9f..e5c9fc467893 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -98,7 +98,13 @@ int hv_post_message(union hv_connection_id connection_id,
aligned_msg->payload_size = payload_size;
memcpy((void *)aligned_msg->payload, payload, payload_size);
 
-   status = hv_do_hypercall(HVCALL_POST_MESSAGE, aligned_msg, NULL);
+   if (hv_isolation_type_snp())
+   status = hv_ghcb_hypercall(HVCALL_POST_MESSAGE,
+   (void *)aligned_msg, NULL,
+   sizeof(struct hv_input_post_message));
+   else
+   status = hv_do_hypercall(HVCALL_POST_MESSAGE,
+   aligned_msg, NULL);
 
/* Preemption must remain disabled until after the hypercall
 * so some other thread can't get scheduled onto this cpu and
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index b0cfc25dffaa..317d2a8d9700 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -31,6 +31,35 @@
 
 union hv_ghcb {
struct ghcb ghcb;
+   struct {
+   u64 hypercalldata[509];
+   u64 outputgpa;
+   union {
+   union {
+   struct {
+   u32 callcode: 16;
+   u32 isfast  : 1;
+   u32 reserved1   : 14;
+   u32 isnested: 1;
+   u32 countofelements : 12;
+   u32 reserved2 

[PATCH V2 06/14] HV: Add Write/Read MSR registers via ghcb page

2021-08-04 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides a GHCB protocol to write the Synthetic Interrupt
Controller MSR registers in an Isolation VM with AMD SEV SNP, and
these registers are emulated by the hypervisor directly.
Hyper-V requires the SINTx MSR registers to be written twice:
first via the GHCB page to communicate with the hypervisor,
and then via the wrmsr instruction to talk with the paravisor
which runs in VMPL0. The Guest OS ID MSR also needs to be set
via the GHCB.

Signed-off-by: Tianyu Lan 
---
Change since v1:
 * Introduce sev_es_ghcb_hv_call_simple() and share code
   between SEV and Hyper-V code.

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/hv_init.c   |  24 +--
 arch/x86/hyperv/ivm.c   | 110 +
 arch/x86/include/asm/mshyperv.h |  78 +++-
 arch/x86/include/asm/sev.h  |   3 +
 arch/x86/kernel/cpu/mshyperv.c  |   3 +
 arch/x86/kernel/sev-shared.c|  63 ++---
 drivers/hv/hv.c | 121 ++--
 include/asm-generic/mshyperv.h  |  12 +++-
 8 files changed, 326 insertions(+), 88 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 57962d407484..1e35979370a4 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -382,7 +382,7 @@ void __init hyperv_init(void)
goto clean_guest_os_id;
 
if (hv_isolation_type_snp()) {
-   ms_hyperv.ghcb_base = alloc_percpu(void *);
+   ms_hyperv.ghcb_base = alloc_percpu(union hv_ghcb __percpu *);
if (!ms_hyperv.ghcb_base)
goto clean_guest_os_id;
 
@@ -479,6 +479,7 @@ void hyperv_cleanup(void)
 
/* Reset our OS id */
wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
+   hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
 
/*
 * Reset hypercall page reference before reset the page,
@@ -552,24 +553,3 @@ bool hv_is_hyperv_initialized(void)
return hypercall_msr.enable;
 }
 EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
-
-enum hv_isolation_type hv_get_isolation_type(void)
-{
-   if (!(ms_hyperv.priv_high & HV_ISOLATION))
-   return HV_ISOLATION_TYPE_NONE;
-   return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
-}
-EXPORT_SYMBOL_GPL(hv_get_isolation_type);
-
-bool hv_is_isolation_supported(void)
-{
-   return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
-}
-
-DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
-
-bool hv_isolation_type_snp(void)
-{
-   return static_branch_unlikely(&isolation_type_snp);
-}
-EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 6d130ba03f41..a135020002fe 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -6,6 +6,8 @@
  *  Tianyu Lan 
  */
 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -13,6 +15,114 @@
 #include 
 #include 
 
+void hv_ghcb_msr_write(u64 msr, u64 value)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return;
+
+   WARN_ON(in_nmi());
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return;
+   }
+
+   ghcb_set_rcx(&hv_ghcb->ghcb, msr);
+   ghcb_set_rax(&hv_ghcb->ghcb, lower_32_bits(value));
+   ghcb_set_rdx(&hv_ghcb->ghcb, value >> 32);
+
+   if (sev_es_ghcb_hv_call_simple(&hv_ghcb->ghcb, SVM_EXIT_MSR, 1, 0))
+   pr_warn("Fail to write msr via ghcb %llx.\n", msr);
+
+   local_irq_restore(flags);
+}
+
+void hv_ghcb_msr_read(u64 msr, u64 *value)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return;
+
+   WARN_ON(in_nmi());
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return;
+   }
+
+   ghcb_set_rcx(&hv_ghcb->ghcb, msr);
+   if (sev_es_ghcb_hv_call_simple(&hv_ghcb->ghcb, SVM_EXIT_MSR, 0, 0))
+   pr_warn("Fail to read msr via ghcb %llx.\n", msr);
+   else
+   *value = (u64)lower_32_bits(hv_ghcb->ghcb.save.rax)
+   | ((u64)lower_32_bits(hv_ghcb->ghcb.save.rdx) << 32);
+   local_irq_restore(flags);
+}
+
+void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value)
+{
+   hv_ghcb_msr_read(msr, value);
+}
+EXPORT_SYMBOL_GPL(hv_sint_rdmsrl_ghcb);
+
+void hv_sint_wrmsrl_ghcb(u64 msr, u64 value)
+{
+   hv_ghcb_msr_write(msr, value);
+
+   /* Write proxy bit vua wrmsrl instruction. */
+   if (msr >= HV_X64_MSR_SINT0 && msr <= HV_X64_MSR_SINT15)
+   wrmsrl(msr, value | 1 << 20);
+}
+EXPORT_SYMBOL_GPL(hv_sint_wrmsrl_ghcb);
+
+void 

[PATCH V2 05/14] HV: Mark vmbus ring buffer visible to host in Isolation VM

2021-08-04 Thread Tianyu Lan
From: Tianyu Lan 

Mark the vmbus ring buffer visible with set_memory_decrypted() when
establishing the gpadl handle.

Signed-off-by: Tianyu Lan 
---
 drivers/hv/channel.c   | 44 --
 include/linux/hyperv.h | 11 +++
 2 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index f3761c73b074..4c4717c26240 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -465,7 +466,14 @@ static int __vmbus_establish_gpadl(struct vmbus_channel 
*channel,
struct list_head *curr;
u32 next_gpadl_handle;
unsigned long flags;
-   int ret = 0;
+   int ret = 0, index;
+
+   index = atomic_inc_return(&channel->gpadl_index) - 1;
+
+   if (index > VMBUS_GPADL_RANGE_COUNT - 1) {
+   pr_err("Gpadl handle position(%d) has been occupied.\n", index);
+   return -ENOSPC;
+   }
 
next_gpadl_handle =
(atomic_inc_return(&vmbus_connection.next_gpadl_handle) - 1);
@@ -474,6 +482,13 @@ static int __vmbus_establish_gpadl(struct vmbus_channel 
*channel,
if (ret)
return ret;
 
+   ret = set_memory_decrypted((unsigned long)kbuffer,
+  HVPFN_UP(size));
+   if (ret) {
+   pr_warn("Failed to set host visibility.\n");
+   return ret;
+   }
+
init_completion(&msginfo->waitevent);
msginfo->waiting_channel = channel;
 
@@ -539,6 +554,10 @@ static int __vmbus_establish_gpadl(struct vmbus_channel 
*channel,
/* At this point, we received the gpadl created msg */
*gpadl_handle = gpadlmsg->gpadl;
 
+   channel->gpadl_array[index].size = size;
+   channel->gpadl_array[index].buffer = kbuffer;
+   channel->gpadl_array[index].gpadlhandle = *gpadl_handle;
+
 cleanup:
spin_lock_irqsave(&vmbus_connection.channelmsg_lock, flags);
list_del(&msginfo->msglistentry);
@@ -549,6 +568,13 @@ static int __vmbus_establish_gpadl(struct vmbus_channel 
*channel,
}
 
kfree(msginfo);
+
+   if (ret) {
+   set_memory_encrypted((unsigned long)kbuffer,
+HVPFN_UP(size));
+   atomic_dec(&channel->gpadl_index);
+   }
+
return ret;
 }
 
@@ -676,6 +702,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
 
/* Establish the gpadl for the ring buffer */
newchannel->ringbuffer_gpadlhandle = 0;
+   atomic_set(&newchannel->gpadl_index, 0);
 
err = __vmbus_establish_gpadl(newchannel, HV_GPADL_RING,
  page_address(newchannel->ringbuffer_page),
@@ -811,7 +838,7 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 
gpadl_handle)
struct vmbus_channel_gpadl_teardown *msg;
struct vmbus_channel_msginfo *info;
unsigned long flags;
-   int ret;
+   int ret, i;
 
info = kzalloc(sizeof(*info) +
   sizeof(struct vmbus_channel_gpadl_teardown), GFP_KERNEL);
@@ -859,6 +886,19 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, 
u32 gpadl_handle)
spin_unlock_irqrestore(&vmbus_connection.channelmsg_lock, flags);
 
kfree(info);
+
+   /* Find gpadl buffer virtual address and size. */
+   for (i = 0; i < VMBUS_GPADL_RANGE_COUNT; i++)
+   if (channel->gpadl_array[i].gpadlhandle == gpadl_handle)
+   break;
+
+   if (set_memory_encrypted((unsigned long)channel->gpadl_array[i].buffer,
+   HVPFN_UP(channel->gpadl_array[i].size)))
+   pr_warn("Fail to set mem host visibility.\n");
+
+   channel->gpadl_array[i].gpadlhandle = 0;
+   atomic_dec(&channel->gpadl_index);
+
return ret;
 }
 EXPORT_SYMBOL_GPL(vmbus_teardown_gpadl);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 2e859d2f9609..cbe376b82de3 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -809,6 +809,14 @@ struct vmbus_device {
 
 #define VMBUS_DEFAULT_MAX_PKT_SIZE 4096
 
+struct vmbus_gpadl {
+   u32 gpadlhandle;
+   u32 size;
+   void *buffer;
+};
+
+#define VMBUS_GPADL_RANGE_COUNT   3
+
 struct vmbus_channel {
struct list_head listentry;
 
@@ -829,6 +837,9 @@ struct vmbus_channel {
struct completion rescind_event;
 
u32 ringbuffer_gpadlhandle;
+   /* GPADL_RING and Send/Receive GPADL_BUFFER. */
+   struct vmbus_gpadl gpadl_array[VMBUS_GPADL_RANGE_COUNT];
+   atomic_t gpadl_index;
 
/* Allocated memory for ring buffer */
struct page *ringbuffer_page;
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH V2 04/14] x86/HV: Add new hvcall guest address host visibility support

2021-08-04 Thread Tianyu Lan
From: Tianyu Lan 

Add support for the new hvcall that sets guest address host visibility,
used to mark memory visible to the host. Override the x86_set_memory_enc
static call with a Hyper-V hook so that set_memory_decrypted() marks
memory visible to the host.
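
The resulting call path (informal sketch; hv_set_mem_enc is the hook wired
up below and presumably routes to hv_set_mem_host_visibility()) is:

	set_memory_decrypted()
	  -> __set_memory_enc_dec()
	       -> static_call(x86_set_memory_enc)      /* patch 03 */
	            -> hv_set_mem_enc()
	                 -> hv_set_mem_host_visibility()
	                      -> hv_mark_gpa_visibility()   /* hvcall */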

Signed-off-by: Tianyu Lan 
---
Change since v1:
   * Use the new static call x86_set_memory_enc to avoid adding a
 Hyper-V-specific check in the set_memory code.

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/Makefile   |   2 +-
 arch/x86/hyperv/hv_init.c  |   6 ++
 arch/x86/hyperv/ivm.c  | 114 +
 arch/x86/include/asm/hyperv-tlfs.h |  20 +
 arch/x86/include/asm/mshyperv.h|   4 +-
 include/asm-generic/hyperv-tlfs.h  |   1 +
 6 files changed, 145 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 48e2c51464e8..5d2de10809ae 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-y  := hv_init.o mmu.o nested.o irqdomain.o
+obj-y  := hv_init.o mmu.o nested.o irqdomain.o ivm.o
 obj-$(CONFIG_X86_64)   += hv_apic.o hv_proc.o
 
 ifdef CONFIG_X86_64
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 247df301491f..57962d407484 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -29,6 +30,7 @@
 #include 
 #include 
 #include 
+#include 
 
 int hyperv_init_cpuhp;
 u64 hv_current_partition_id = ~0ull;
@@ -450,6 +452,10 @@ void __init hyperv_init(void)
 
/* Query the VMs extended capability once, so that it can be cached. */
hv_query_ext_cap(0);
+
+   if (hv_is_isolation_supported())
+   static_call_update(x86_set_memory_enc, hv_set_mem_enc);
+
return;
 
 clean_guest_os_id:
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
new file mode 100644
index ..6d130ba03f41
--- /dev/null
+++ b/arch/x86/hyperv/ivm.c
@@ -0,0 +1,114 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hyper-V Isolation VM interface with paravisor and hypervisor
+ *
+ * Author:
+ *  Tianyu Lan 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * hv_mark_gpa_visibility - Set pages visible to host via hvcall.
+ *
+ * In an Isolation VM, all guest memory is encrypted from the host, and
+ * the guest needs to make memory visible to the host via hvcall before
+ * sharing that memory with the host.
+ */
+int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
+  enum hv_mem_host_visibility visibility)
+{
+   struct hv_gpa_range_for_visibility **input_pcpu, *input;
+   u16 pages_processed;
+   u64 hv_status;
+   unsigned long flags;
+
+   /* no-op if partition isolation is not enabled */
+   if (!hv_is_isolation_supported())
+   return 0;
+
+   if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
+   pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
+   HV_MAX_MODIFY_GPA_REP_COUNT);
+   return -EINVAL;
+   }
+
+   local_irq_save(flags);
+   input_pcpu = (struct hv_gpa_range_for_visibility **)
+   this_cpu_ptr(hyperv_pcpu_input_arg);
+   input = *input_pcpu;
+   if (unlikely(!input)) {
+   local_irq_restore(flags);
+   return -EINVAL;
+   }
+
+   input->partition_id = HV_PARTITION_ID_SELF;
+   input->host_visibility = visibility;
+   input->reserved0 = 0;
+   input->reserved1 = 0;
+   memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
+   hv_status = hv_do_rep_hypercall(
+   HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
+   0, input, &pages_processed);
+   local_irq_restore(flags);
+
+   if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
+   return 0;
+
+   return hv_status & HV_HYPERCALL_RESULT_MASK;
+}
+EXPORT_SYMBOL(hv_mark_gpa_visibility);
+
+/*
+ * hv_set_mem_host_visibility - Set specified memory visible to host.
+ *
+ * In an Isolation VM, all guest memory is encrypted from the host, and
+ * the guest needs to make memory visible to the host via hvcall before
+ * sharing that memory with the host. This function is a wrapper around
+ * hv_mark_gpa_visibility() that takes a memory base and size.
+ */
+static int hv_set_mem_host_visibility(void *kbuffer, int pagecount,
+ enum hv_mem_host_visibility visibility)
+{
+   u64 *pfn_array;
+   int ret = 0;
+   int i, pfn;
+
+   if (!hv_is_isolation_supported() || !ms_hyperv.ghcb_base)
+   return 0;
+
+   pfn_array = kzalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
+   if (!pfn_array)
+   return -ENOMEM;
+
+   for (i = 0, pfn = 0; i < pagecount; i++) {
+   pfn_array[pfn] = virt_to_hvpfn(kbuffer + i * 

[PATCH V2 03/14] x86/set_memory: Add x86_set_memory_enc static call support

2021-08-04 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V and other platforms (e.g. Intel and AMD) want to override
__set_memory_enc_dec(). Add an x86_set_memory_enc static call here
so that platforms can hook in their own implementation.
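
For illustration (hypothetical my_plat_* names; the actual Hyper-V hook is
wired up in the next patch), a platform override of the new static call
would look roughly like:

#include <linux/init.h>
#include <linux/static_call.h>
#include <asm/set_memory.h>

/* Hypothetical platform implementation of the memory enc/dec hook. */
static int my_plat_set_memory_enc(unsigned long addr, int numpages, bool enc)
{
	/* Issue the platform-specific visibility/encryption change here. */
	return 0;
}

static void __init my_plat_init(void)
{
	/* Route __set_memory_enc_dec() to the platform hook. */
	static_call_update(x86_set_memory_enc, my_plat_set_memory_enc);
}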

Signed-off-by: Tianyu Lan 
---
 arch/x86/include/asm/set_memory.h | 4 
 arch/x86/mm/pat/set_memory.c  | 9 +
 2 files changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/set_memory.h 
b/arch/x86/include/asm/set_memory.h
index 43fa081a1adb..490f2cfc00fa 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 
 /*
  * The set_memory_* API can be used to change various attributes of a virtual
@@ -84,6 +85,9 @@ int set_direct_map_invalid_noflush(struct page *page);
 int set_direct_map_default_noflush(struct page *page);
 bool kernel_page_present(struct page *page);
 
+int dummy_set_memory_enc(unsigned long addr, int numpages, bool enc);
+DECLARE_STATIC_CALL(x86_set_memory_enc, dummy_set_memory_enc);
+
 extern int kernel_set_to_readonly;
 
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index ad8a5c586a35..68e9ab522cea 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -66,6 +67,9 @@ static const int cpa_warn_level = CPA_PROTECT;
  */
 static DEFINE_SPINLOCK(cpa_lock);
 
+static int default_set_memory_enc(unsigned long addr, int numpages, bool enc);
+DEFINE_STATIC_CALL(x86_set_memory_enc, default_set_memory_enc);
+
 #define CPA_FLUSHTLB 1
 #define CPA_ARRAY 2
 #define CPA_PAGES_ARRAY 4
@@ -1981,6 +1985,11 @@ int set_memory_global(unsigned long addr, int numpages)
 }
 
 static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
+{
+   return static_call(x86_set_memory_enc)(addr, numpages, enc);
+}
+
+static int default_set_memory_enc(unsigned long addr, int numpages, bool enc)
 {
struct cpa_data cpa;
int ret;
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH V2 02/14] x86/HV: Initialize shared memory boundary in the Isolation VM.

2021-08-04 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V exposes the shared memory boundary via the cpuid leaf
HYPERV_CPUID_ISOLATION_CONFIG; store it in the shared_gpa_boundary
field of the ms_hyperv struct. This prepares for sharing memory with
the host in SNP guests.
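
(As a worked example with a hypothetical CPUID value: a reported
shared_gpa_boundary_bits of 39 gives shared_gpa_boundary = 1ULL << 39,
i.e. a 512 GiB boundary, and GPAs at or above it address the shared view
of guest memory.)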

Signed-off-by: Tianyu Lan 
---
 arch/x86/kernel/cpu/mshyperv.c |  2 ++
 include/asm-generic/mshyperv.h | 12 +++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index dcfbd2770d7f..773e84e134b3 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -327,6 +327,8 @@ static void __init ms_hyperv_init_platform(void)
if (ms_hyperv.priv_high & HV_ISOLATION) {
ms_hyperv.isolation_config_a = 
cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG);
ms_hyperv.isolation_config_b = 
cpuid_ebx(HYPERV_CPUID_ISOLATION_CONFIG);
+   ms_hyperv.shared_gpa_boundary =
+   (u64)1 << ms_hyperv.shared_gpa_boundary_bits;
 
pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 
0x%x\n",
ms_hyperv.isolation_config_a, 
ms_hyperv.isolation_config_b);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 4269f3174e58..aa26d24a5ca9 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -35,8 +35,18 @@ struct ms_hyperv_info {
u32 max_vp_index;
u32 max_lp_index;
u32 isolation_config_a;
-   u32 isolation_config_b;
+   union {
+   u32 isolation_config_b;
+   struct {
+   u32 cvm_type : 4;
+   u32 Reserved11 : 1;
+   u32 shared_gpa_boundary_active : 1;
+   u32 shared_gpa_boundary_bits : 6;
+   u32 Reserved12 : 20;
+   };
+   };
void  __percpu **ghcb_base;
+   u64 shared_gpa_boundary;
 };
 extern struct ms_hyperv_info ms_hyperv;
 
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH V2 01/14] x86/HV: Initialize GHCB page in Isolation VM

2021-08-04 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V exposes the GHCB page via the SEV-ES GHCB MSR so that an SNP
guest can communicate with the hypervisor. Map the GHCB page for all
CPUs in order to read/write MSR registers and submit hvcall requests
via the GHCB.

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/hv_init.c   | 69 +++--
 arch/x86/include/asm/mshyperv.h |  2 +
 include/asm-generic/mshyperv.h  |  2 +
 3 files changed, 69 insertions(+), 4 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 4a643a85d570..247df301491f 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -42,6 +43,31 @@ static void *hv_hypercall_pg_saved;
 struct hv_vp_assist_page **hv_vp_assist_page;
 EXPORT_SYMBOL_GPL(hv_vp_assist_page);
 
+static int hyperv_init_ghcb(void)
+{
+   u64 ghcb_gpa;
+   void *ghcb_va;
+   void **ghcb_base;
+
+   if (!ms_hyperv.ghcb_base)
+   return -EINVAL;
+
+   /*
+* The GHCB page is allocated by the paravisor. The address
+* returned by MSR_AMD64_SEV_ES_GHCB is above the shared
+* GPA boundary, so map it here.
+*/
+   rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
+   ghcb_va = memremap(ghcb_gpa, HV_HYP_PAGE_SIZE, MEMREMAP_WB);
+   if (!ghcb_va)
+   return -ENOMEM;
+
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   *ghcb_base = ghcb_va;
+
+   return 0;
+}
+
 static int hv_cpu_init(unsigned int cpu)
 {
struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
@@ -75,6 +101,8 @@ static int hv_cpu_init(unsigned int cpu)
wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, val);
}
 
+   hyperv_init_ghcb();
+
return 0;
 }
 
@@ -167,6 +195,14 @@ static int hv_cpu_die(unsigned int cpu)
 {
struct hv_reenlightenment_control re_ctrl;
unsigned int new_cpu;
+   void **ghcb_va = NULL;
+
+   if (ms_hyperv.ghcb_base) {
+   ghcb_va = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   if (*ghcb_va)
+   memunmap(*ghcb_va);
+   *ghcb_va = NULL;
+   }
 
hv_common_cpu_die(cpu);
 
@@ -340,9 +376,22 @@ void __init hyperv_init(void)
VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
__builtin_return_address(0));
-   if (hv_hypercall_pg == NULL) {
-   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
-   goto remove_cpuhp_state;
+   if (hv_hypercall_pg == NULL)
+   goto clean_guest_os_id;
+
+   if (hv_isolation_type_snp()) {
+   ms_hyperv.ghcb_base = alloc_percpu(void *);
+   if (!ms_hyperv.ghcb_base)
+   goto clean_guest_os_id;
+
+   if (hyperv_init_ghcb()) {
+   free_percpu(ms_hyperv.ghcb_base);
+   ms_hyperv.ghcb_base = NULL;
+   goto clean_guest_os_id;
+   }
+
+   /* Hyper-V requires the guest OS ID to be written via the GHCB in an SNP IVM. */
+   hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
}
 
rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -403,7 +452,8 @@ void __init hyperv_init(void)
hv_query_ext_cap(0);
return;
 
-remove_cpuhp_state:
+clean_guest_os_id:
+   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
cpuhp_remove_state(cpuhp);
 free_vp_assist_page:
kfree(hv_vp_assist_page);
@@ -431,6 +481,9 @@ void hyperv_cleanup(void)
 */
hv_hypercall_pg = NULL;
 
+   if (ms_hyperv.ghcb_base)
+   free_percpu(ms_hyperv.ghcb_base);
+
/* Reset the hypercall page */
hypercall_msr.as_uint64 = 0;
wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -506,3 +559,11 @@ bool hv_is_isolation_supported(void)
 {
return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
 }
+
+DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
+
+bool hv_isolation_type_snp(void)
+{
+   return static_branch_unlikely(&isolation_type_snp);
+}
+EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index adccbc209169..6627cfd2bfba 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -11,6 +11,8 @@
 #include 
 #include 
 
+DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
+
 typedef int (*hyperv_fill_flush_list_func)(
struct hv_guest_mapping_flush_list *flush,
void *data);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index c1ab6a6e72b5..4269f3174e58 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -36,6 +36,7 @@ struct ms_hyperv_info {
u32 max_lp_index;
u32 isolation_config_a;
u32 isolation_config_b;
+   void  __percpu 

[PATCH V2 00/14] x86/Hyper-V: Add Hyper-V Isolation VM support

2021-08-04 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-Based
Security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host visibility hvcall, and the
guest needs to issue it to mark memory visible to the host before
sharing that memory. For security, network/storage stack memory should
not be shared with the host directly, so bounce buffers are required.

The vmbus channel ring buffer already plays the bounce buffer role,
because all data to/from the host is copied between the ring buffer and
IO stack memory. So mark the vmbus channel ring buffer visible.

There are two exceptions - packets sent by vmbus_sendpacket_pagebuffer()
and vmbus_sendpacket_mpb_desc(). These packets contain IO stack memory
addresses that the host will access directly, so add bounce buffer
allocation support in vmbus for these packets.

For an SNP Isolation VM, the guest needs to access the shared memory via
an extra address space, which is specified by the Hyper-V cpuid leaf
HYPERV_CPUID_ISOLATION_CONFIG. The physical address used to access the
shared memory is the bounce buffer memory GPA plus the
shared_gpa_boundary reported by CPUID.
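
A minimal sketch of that address arithmetic (illustration only; the helper
name is made up, and ms_hyperv.shared_gpa_boundary is populated from CPUID
in patch 02):

#include <linux/types.h>
#include <asm/mshyperv.h>

/* GPA the host must use to reach a bounce-buffer page in an SNP IVM. */
static inline u64 example_shared_gpa(phys_addr_t bounce_pa)
{
	return (u64)bounce_pa + ms_hyperv.shared_gpa_boundary;
}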

Change since V1:
   - Introduce x86_set_memory_enc static call and so platforms can
 override __set_memory_enc_dec() with their implementation
   - Introduce sev_es_ghcb_hv_call_simple() and share code
 between SEV and Hyper-V code.
   - Not remap monitor pages in the non-SNP isolation VM
   - Make swiotlb_init_io_tlb_mem() return error code and return
 error when dma_map_decrypted() fails.

Change since RFC V4:
   - Introduce dma map decrypted function to remap bounce buffer
  and provide dma map decrypted ops for platform to hook callback.  
  
   - Split swiotlb and dma map decrypted change into two patches
   - Replace vstart with vaddr in swiotlb changes.

Change since RFC v3:
   - Add interface set_memory_decrypted_map() to decrypt memory and
 map bounce buffer in extra address space
   - Remove swiotlb remap function and store the remap address
 returned by set_memory_decrypted_map() in swiotlb mem data structure.
   - Introduce hv_set_mem_enc() to make code more readable in the 
__set_memory_enc_dec().

Change since RFC v2:
   - Remove not UIO driver in Isolation VM patch
   - Use vmap_pfn() to replace ioremap_page_range function in
   order to avoid exposing symbol ioremap_page_range() and
   ioremap_page_range()
   - Call hv set mem host visibility hvcall in 
set_memory_encrypted/decrypted()
   - Enable swiotlb force mode instead of adding Hyper-V dma map/unmap hook
   - Fix code style


Tianyu Lan (14):
  x86/HV: Initialize GHCB page in Isolation VM
  x86/HV: Initialize shared memory boundary in the Isolation VM.
  x86/set_memory: Add x86_set_memory_enc static call support
  x86/HV: Add new hvcall guest address host visibility support
  HV: Mark vmbus ring buffer visible to host in Isolation VM
  HV: Add Write/Read MSR registers via ghcb page
  HV: Add ghcb hvcall support for SNP VM
  HV/Vmbus: Add SNP support for VMbus channel initiate message
  HV/Vmbus: Initialize VMbus ring buffer for Isolation VM
  DMA: Add dma_map_decrypted/dma_unmap_encrypted() function
  x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM
  HV/Netvsc: Add Isolation VM support for netvsc driver
  HV/Storvsc: Add Isolation VM support for storvsc driver

 arch/x86/hyperv/Makefile   |   2 +-
 arch/x86/hyperv/hv_init.c  |  81 ++--
 arch/x86/hyperv/ivm.c  | 295 +
 arch/x86/include/asm/hyperv-tlfs.h |  20 ++
 arch/x86/include/asm/mshyperv.h|  87 -
 arch/x86/include/asm/set_memory.h  |   4 +
 arch/x86/include/asm/sev.h |   3 +
 arch/x86/kernel/cpu/mshyperv.c |   5 +
 arch/x86/kernel/sev-shared.c   |  63 +++---
 arch/x86/mm/pat/set_memory.c   |   9 +
 arch/x86/xen/pci-swiotlb-xen.c |   3 +-
 drivers/hv/Kconfig |   1 +
 drivers/hv/channel.c   |  54 +-
 drivers/hv/connection.c|  71 ++-
 drivers/hv/hv.c| 129 +
 drivers/hv/hyperv_vmbus.h  |   3 +
 drivers/hv/ring_buffer.c   |  84 ++--
 drivers/hv/vmbus_drv.c |   3 +
 drivers/iommu/hyperv-iommu.c   |  65 +++
 drivers/net/hyperv/hyperv_net.h|   6 +
 drivers/net/hyperv/netvsc.c| 144 +-
 drivers/net/hyperv/rndis_filter.c  |   2 +
 drivers/scsi/storvsc_drv.c |  68 ++-
 include/asm-generic/hyperv-tlfs.h  |   1 +
 include/asm-generic/mshyperv.h |  53 +-
 include/linux/dma-map-ops.h|   9 +
 include/linux/hyperv.h |  17 ++
 

[PATCH v3 25/25] iommu: Allow enabling non-strict mode dynamically

2021-08-04 Thread Robin Murphy
Allocating and enabling a flush queue is in fact something we can
reasonably do while a DMA domain is active, without having to rebuild it
from scratch. Thus we can allow a strict -> non-strict transition from
sysfs without requiring to unbind the device's driver, which is of
particular interest to users who want to make selective relaxations to
critical devices like the one serving their root filesystem.

Disabling and draining a queue also seems technically possible to
achieve without rebuilding the whole domain, but would certainly be more
involved. Furthermore there's not such a clear use-case for tightening
up security *after* the device may already have done whatever it is that
you don't trust it not to do, so we only consider the relaxation case.

CC: Sai Praneeth Prakhya 
Signed-off-by: Robin Murphy 

---

v3: Actually think about concurrency, rework most of the fq data
accesses to be (hopefully) safe and comment it all
---
 drivers/iommu/dma-iommu.c | 25 ++---
 drivers/iommu/iommu.c | 16 
 drivers/iommu/iova.c  |  9 ++---
 3 files changed, 36 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f51b8dc99ac6..6b04dc765d91 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -310,6 +310,12 @@ static bool dev_is_untrusted(struct device *dev)
return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
 }
 
+/*
+ * Protected from concurrent sysfs updates by the mutex of the group who owns
+ * this domain. At worst it might theoretically be able to allocate two queues
+ * and leak one if you poke sysfs to race just right with iommu_setup_dma_ops()
+ * running for the first device in the group. Don't do that.
+ */
 int iommu_dma_init_fq(struct iommu_domain *domain)
 {
struct iommu_dma_cookie *cookie = domain->iova_cookie;
@@ -325,7 +331,12 @@ int iommu_dma_init_fq(struct iommu_domain *domain)
domain->type = IOMMU_DOMAIN_DMA;
return -ENODEV;
}
-   cookie->fq_domain = domain;
+   /*
+* Prevent incomplete iovad->fq being observable. Pairs with path from
+* __iommu_dma_unmap() through iommu_dma_free_iova() to queue_iova()
+*/
+   smp_wmb();
+   WRITE_ONCE(cookie->fq_domain, domain);
return 0;
 }
 
@@ -456,17 +467,17 @@ static dma_addr_t iommu_dma_alloc_iova(struct 
iommu_domain *domain,
 }
 
 static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
-   dma_addr_t iova, size_t size, struct page *freelist)
+   dma_addr_t iova, size_t size, struct iommu_iotlb_gather *gather)
 {
struct iova_domain *iovad = &cookie->iovad;
 
/* The MSI case is only ever cleaning up its most recent allocation */
if (cookie->type == IOMMU_DMA_MSI_COOKIE)
cookie->msi_iova -= size;
-   else if (cookie->fq_domain) /* non-strict mode */
+   else if (gather && gather->queued)
queue_iova(iovad, iova_pfn(iovad, iova),
size >> iova_shift(iovad),
-   (unsigned long)freelist);
+   (unsigned long)gather->freelist);
else
free_iova_fast(iovad, iova_pfn(iovad, iova),
size >> iova_shift(iovad));
@@ -485,14 +496,14 @@ static void __iommu_dma_unmap(struct device *dev, 
dma_addr_t dma_addr,
dma_addr -= iova_off;
size = iova_align(iovad, size + iova_off);
iommu_iotlb_gather_init(&iotlb_gather);
-   iotlb_gather.queued = cookie->fq_domain;
+   iotlb_gather.queued = READ_ONCE(cookie->fq_domain);
 
unmapped = iommu_unmap_fast(domain, dma_addr, size, &iotlb_gather);
WARN_ON(unmapped != size);
 
-   if (!cookie->fq_domain)
+   if (!iotlb_gather.queued)
iommu_iotlb_sync(domain, &iotlb_gather);
-   iommu_dma_free_iova(cookie, dma_addr, size, iotlb_gather.freelist);
+   iommu_dma_free_iova(cookie, dma_addr, size, &iotlb_gather);
 }
 
 static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 480ad6a538a9..593d4555bc57 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3203,6 +3203,13 @@ static int iommu_change_dev_def_domain(struct 
iommu_group *group,
goto out;
}
 
+   /* We can bring up a flush queue without tearing down the domain */
+   if (type == IOMMU_DOMAIN_DMA_FQ && prev_dom->type == IOMMU_DOMAIN_DMA) {
+   prev_dom->type = IOMMU_DOMAIN_DMA_FQ;
+   ret = iommu_dma_init_fq(prev_dom);
+   goto out;
+   }
+
/* Sets group->default_domain to the newly allocated domain */
ret = iommu_group_alloc_default_domain(dev->bus, group, type);
if (ret)
@@ -3243,9 +3250,9 @@ static int iommu_change_dev_def_domain(struct iommu_group 
*group,
 }
 
 /*
- * Changing the 

[PATCH v3 24/25] iommu/dma: Factor out flush queue init

2021-08-04 Thread Robin Murphy
Factor out flush queue setup from the initial domain init so that we
can potentially trigger it from sysfs later on in a domain's lifetime.

Reviewed-by: Lu Baolu 
Reviewed-by: John Garry 
Signed-off-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c | 30 --
 include/linux/dma-iommu.h |  9 ++---
 2 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 2e19505dddf9..f51b8dc99ac6 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -310,6 +310,25 @@ static bool dev_is_untrusted(struct device *dev)
return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
 }
 
+int iommu_dma_init_fq(struct iommu_domain *domain)
+{
+   struct iommu_dma_cookie *cookie = domain->iova_cookie;
+
+   if (domain->type != IOMMU_DOMAIN_DMA_FQ)
+   return -EINVAL;
+   if (cookie->fq_domain)
+   return 0;
+
+   if (init_iova_flush_queue(&cookie->iovad, iommu_dma_flush_iotlb_all,
+ iommu_dma_entry_dtor)) {
+   pr_warn("iova flush queue initialization failed\n");
+   domain->type = IOMMU_DOMAIN_DMA;
+   return -ENODEV;
+   }
+   cookie->fq_domain = domain;
+   return 0;
+}
+
 /**
  * iommu_dma_init_domain - Initialise a DMA mapping domain
  * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
@@ -362,16 +381,7 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
}
 
init_iova_domain(iovad, 1UL << order, base_pfn);
-
-   if (domain->type == IOMMU_DOMAIN_DMA_FQ && !cookie->fq_domain) {
-   if (init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all,
- iommu_dma_entry_dtor)) {
-   pr_warn("iova flush queue initialization failed\n");
-   domain->type = IOMMU_DOMAIN_DMA;
-   } else {
-   cookie->fq_domain = domain;
-   }
-   }
+   iommu_dma_init_fq(domain);
 
return iova_reserve_iommu_regions(dev, domain);
 }
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 758ca4694257..81ab647f1618 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -20,6 +20,7 @@ void iommu_put_dma_cookie(struct iommu_domain *domain);
 
 /* Setup call for arch DMA mapping code */
 void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit);
+int iommu_dma_init_fq(struct iommu_domain *domain);
 
 /* The DMA API isn't _quite_ the whole story, though... */
 /*
@@ -37,9 +38,6 @@ void iommu_dma_compose_msi_msg(struct msi_desc *desc,
 
 void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list);
 
-void iommu_dma_free_cpu_cached_iovas(unsigned int cpu,
-   struct iommu_domain *domain);
-
 extern bool iommu_dma_forcedac;
 
 #else /* CONFIG_IOMMU_DMA */
@@ -54,6 +52,11 @@ static inline void iommu_setup_dma_ops(struct device *dev, 
u64 dma_base,
 {
 }
 
+static inline int iommu_dma_init_fq(struct iommu_domain *domain)
+{
+   return -EINVAL;
+}
+
 static inline int iommu_get_dma_cookie(struct iommu_domain *domain)
 {
return -ENODEV;
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 23/25] iommu: Merge strictness and domain type configs

2021-08-04 Thread Robin Murphy
To parallel the sysfs behaviour, merge the new build-time option
for DMA domain strictness into the default domain type choice.

Suggested-by: Joerg Roedel 
Reviewed-by: Lu Baolu 
Reviewed-by: Jean-Philippe Brucker 
Signed-off-by: Robin Murphy 

---

v3: Remember to update parameter documentation as well
---
 .../admin-guide/kernel-parameters.txt |  8 +-
 drivers/iommu/Kconfig | 80 +--
 drivers/iommu/iommu.c |  2 +-
 3 files changed, 44 insertions(+), 46 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 90b525cf0ec2..19192b39952a 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2045,11 +2045,9 @@
1 - Strict mode.
  DMA unmap operations invalidate IOMMU hardware TLBs
  synchronously.
-   unset - Use value of CONFIG_IOMMU_DEFAULT_{LAZY,STRICT}.
-   Note: on x86, the default behaviour depends on the
-   equivalent driver-specific parameters, but a strict
-   mode explicitly specified by either method takes
-   precedence.
+   unset - Use value of 
CONFIG_IOMMU_DEFAULT_DMA_{LAZY,STRICT}.
+   Note: on x86, strict mode specified via one of the
+   legacy driver-specific options takes precedence.
 
iommu.passthrough=
[ARM64, X86] Configure DMA to bypass the IOMMU by 
default.
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index c84da8205be7..6e06f876d75a 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -79,55 +79,55 @@ config IOMMU_DEBUGFS
  debug/iommu directory, and then populate a subdirectory with
  entries as required.
 
-config IOMMU_DEFAULT_PASSTHROUGH
-   bool "IOMMU passthrough by default"
-   depends on IOMMU_API
-   help
- Enable passthrough by default, removing the need to pass in
- iommu.passthrough=on or iommu=pt through command line. If this
- is enabled, you can still disable with iommu.passthrough=off
- or iommu=nopt depending on the architecture.
-
- If unsure, say N here.
-
 choice
-   prompt "IOMMU default DMA IOTLB invalidation mode"
-   depends on IOMMU_DMA
-
-   default IOMMU_DEFAULT_LAZY if (AMD_IOMMU || INTEL_IOMMU)
-   default IOMMU_DEFAULT_STRICT
+   prompt "IOMMU default domain type"
+   depends on IOMMU_API
+   default IOMMU_DEFAULT_DMA_LAZY if AMD_IOMMU || INTEL_IOMMU
+   default IOMMU_DEFAULT_DMA_STRICT
help
- This option allows an IOMMU DMA IOTLB invalidation mode to be
- chosen at build time, to override the default mode of each ARCH,
- removing the need to pass in kernel parameters through command line.
- It is still possible to provide common boot params to override this
- config.
+ Choose the type of IOMMU domain used to manage DMA API usage by
+ device drivers. The options here typically represent different
+ levels of tradeoff between robustness/security and performance,
+ depending on the IOMMU driver. Not all IOMMUs support all options.
+ This choice can be overridden at boot via the command line, and for
+ some devices also at runtime via sysfs.
 
  If unsure, keep the default.
 
-config IOMMU_DEFAULT_STRICT
-   bool "strict"
+config IOMMU_DEFAULT_DMA_STRICT
+   bool "Translated - Strict"
help
- For every IOMMU DMA unmap operation, the flush operation of IOTLB and
- the free operation of IOVA are guaranteed to be done in the unmap
- function.
+ Trusted devices use translation to restrict their access to only
+ DMA-mapped pages, with strict TLB invalidation on unmap. Equivalent
+ to passing "iommu.passthrough=0 iommu.strict=1" on the command line.
 
-config IOMMU_DEFAULT_LAZY
-   bool "lazy"
+ Untrusted devices always use this mode, with an additional layer of
+ bounce-buffering such that they cannot gain access to any unrelated
+ data within a mapped page.
+
+config IOMMU_DEFAULT_DMA_LAZY
+   bool "Translated - Lazy"
help
- Support lazy mode, where for every IOMMU DMA unmap operation, the
- flush operation of IOTLB and the free operation of IOVA are deferred.
- They are only guaranteed to be done before the related IOVA will be
- reused.
+ Trusted devices use translation to restrict their access to only
+ DMA-mapped pages, but with "lazy" batched TLB invalidation. This
+ mode allows higher performance with some IOMMUs due to reduced TLB
+ flushing, but at the cost of reduced isolation since devices 

[PATCH v3 22/25] iommu: Only log strictness for DMA domains

2021-08-04 Thread Robin Murphy
When passthrough is enabled, the default strictness policy becomes
irrelevant, since any subsequent runtime override to a DMA domain type
now embodies an explicit choice of strictness as well. Save on noise by
only logging the default policy when it is meaningfully in effect.

Reviewed-by: John Garry 
Reviewed-by: Lu Baolu 
Signed-off-by: Robin Murphy 
---
 drivers/iommu/iommu.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index b141161d5bbc..63face36fc49 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -144,10 +144,11 @@ static int __init iommu_subsys_init(void)
(iommu_cmd_line & IOMMU_CMD_LINE_DMA_API) ?
"(set via kernel command line)" : "");
 
-   pr_info("DMA domain TLB invalidation policy: %s mode %s\n",
-   iommu_dma_strict ? "strict" : "lazy",
-   (iommu_cmd_line & IOMMU_CMD_LINE_STRICT) ?
-   "(set via kernel command line)" : "");
+   if (!iommu_default_passthrough())
+   pr_info("DMA domain TLB invalidation policy: %s mode %s\n",
+   iommu_dma_strict ? "strict" : "lazy",
+   (iommu_cmd_line & IOMMU_CMD_LINE_STRICT) ?
+   "(set via kernel command line)" : "");
 
return 0;
 }
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 21/25] iommu: Expose DMA domain strictness via sysfs

2021-08-04 Thread Robin Murphy
The sysfs interface for default domain types exists primarily so users
can choose the performance/security tradeoff relevant to their own
workload. As such, the choice between the policies for DMA domains fits
perfectly as an additional point on that scale - downgrading a
particular device from a strict default to non-strict may be enough to
let it reach the desired level of performance, while still retaining
more peace of mind than with a wide-open identity domain. Now that we've
abstracted non-strict mode as a distinct type of DMA domain, allow it to
be chosen through the user interface as well.
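
For example, a (hypothetical) userspace helper relaxing group 7 via the
documented sysfs node might look like the sketch below; the devices in the
group must currently be unbound from their drivers for the write to be
accepted:

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Switch IOMMU group 7 (made-up number) to the lazy-invalidation DMA domain. */
static int set_group_dma_fq(void)
{
	const char *val = "DMA-FQ";
	int fd = open("/sys/kernel/iommu_groups/7/type", O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, val, strlen(val));
	close(fd);
	return n == (ssize_t)strlen(val) ? 0 : -1;
}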

CC: Sai Praneeth Prakhya 
Reviewed-by: Lu Baolu 
Reviewed-by: John Garry 
Signed-off-by: Robin Murphy 

---

v3: Summarise the implications in the documentation for completeness
---
 Documentation/ABI/testing/sysfs-kernel-iommu_groups | 6 +-
 drivers/iommu/iommu.c   | 2 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-kernel-iommu_groups 
b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
index eae2f1c1e11e..b15af6a5bc08 100644
--- a/Documentation/ABI/testing/sysfs-kernel-iommu_groups
+++ b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
@@ -42,8 +42,12 @@ Description: /sys/kernel/iommu_groups//type shows 
the type of default
  ==
DMA   All the DMA transactions from the device in this group
  are translated by the iommu.
+   DMA-FQ    As above, but using batched invalidation to lazily
+ remove translations after use. This may offer reduced
+ overhead at the cost of reduced memory protection.
identity  All the DMA transactions from the device in this group
- are not translated by the iommu.
+ are not translated by the iommu. Maximum performance
+ but zero protection.
auto  Change to the type the device was booted with.
  ==
 
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 55ca5bf3cafc..b141161d5bbc 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3267,6 +3267,8 @@ static ssize_t iommu_group_store_type(struct iommu_group 
*group,
req_type = IOMMU_DOMAIN_IDENTITY;
else if (sysfs_streq(buf, "DMA"))
req_type = IOMMU_DOMAIN_DMA;
+   else if (sysfs_streq(buf, "DMA-FQ"))
+   req_type = IOMMU_DOMAIN_DMA_FQ;
else if (sysfs_streq(buf, "auto"))
req_type = 0;
else
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 20/25] iommu: Express DMA strictness via the domain type

2021-08-04 Thread Robin Murphy
Eliminate the iommu_get_dma_strict() indirection and pipe the
information through the domain type from the beginning. Besides
the flow simplification this also has several nice side-effects:

 - Automatically implies strict mode for untrusted devices by
   virtue of their IOMMU_DOMAIN_DMA override.
 - Ensures that we only end up using flush queues for drivers
   which are aware of them and can actually benefit.
 - Allows us to handle flush queue init failure by falling back
   to strict mode instead of leaving it to possibly blow up later.

Reviewed-by: Lu Baolu 
Signed-off-by: Robin Murphy 

---

v3: Remember to update iommu_def_domain_type accordingly from
iommu_set_dma_strict() too
---
 drivers/iommu/dma-iommu.c |  9 +
 drivers/iommu/iommu.c | 14 +-
 include/linux/iommu.h |  1 -
 3 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 207c8febdac9..2e19505dddf9 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -363,13 +363,14 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
 
init_iova_domain(iovad, 1UL << order, base_pfn);
 
-   if (!cookie->fq_domain && !dev_is_untrusted(dev) &&
-   domain->ops->flush_iotlb_all && !iommu_get_dma_strict(domain)) {
+   if (domain->type == IOMMU_DOMAIN_DMA_FQ && !cookie->fq_domain) {
if (init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all,
- iommu_dma_entry_dtor))
+ iommu_dma_entry_dtor)) {
pr_warn("iova flush queue initialization failed\n");
-   else
+   domain->type = IOMMU_DOMAIN_DMA;
+   } else {
cookie->fq_domain = domain;
+   }
}
 
return iova_reserve_iommu_regions(dev, domain);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 982545234cf3..55ca5bf3cafc 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -136,6 +136,9 @@ static int __init iommu_subsys_init(void)
}
}
 
+   if (!iommu_default_passthrough() && !iommu_dma_strict)
+   iommu_def_domain_type = IOMMU_DOMAIN_DMA_FQ;
+
pr_info("Default domain type: %s %s\n",
iommu_domain_type_str(iommu_def_domain_type),
(iommu_cmd_line & IOMMU_CMD_LINE_DMA_API) ?
@@ -355,17 +358,10 @@ early_param("iommu.strict", iommu_dma_setup);
 void iommu_set_dma_strict(void)
 {
iommu_dma_strict = true;
+   if (iommu_def_domain_type == IOMMU_DOMAIN_DMA_FQ)
+   iommu_def_domain_type = IOMMU_DOMAIN_DMA;
 }
 
-bool iommu_get_dma_strict(struct iommu_domain *domain)
-{
-   /* only allow lazy flushing for DMA domains */
-   if (domain->type == IOMMU_DOMAIN_DMA)
-   return iommu_dma_strict;
-   return true;
-}
-EXPORT_SYMBOL_GPL(iommu_get_dma_strict);
-
 static ssize_t iommu_group_attr_show(struct kobject *kobj,
 struct attribute *__attr, char *buf)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5629ae42951f..923a8d1c5e39 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -504,7 +504,6 @@ int iommu_set_pgtable_quirks(struct iommu_domain *domain,
unsigned long quirks);
 
 void iommu_set_dma_strict(void);
-bool iommu_get_dma_strict(struct iommu_domain *domain);
 
 extern int report_iommu_fault(struct iommu_domain *domain, struct device *dev,
  unsigned long iova, int flags);
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 19/25] iommu/vt-d: Prepare for multiple DMA domain types

2021-08-04 Thread Robin Murphy
In preparation for the strict vs. non-strict decision for DMA domains
to be expressed in the domain type, make sure we expose our flush queue
awareness by accepting the new domain type, and test the specific
feature flag where we want to identify DMA domains in general. The DMA
ops reset/setup can simply be made unconditional, since iommu-dma
already knows only to touch DMA domains.

Reviewed-by: Lu Baolu 
Signed-off-by: Robin Murphy 
---
 drivers/iommu/intel/iommu.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 7e168634c433..8fc46c9d6b96 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -582,7 +582,7 @@ struct intel_iommu *domain_get_iommu(struct dmar_domain 
*domain)
int iommu_id;
 
/* si_domain and vm domain should not get here. */
-   if (WARN_ON(domain->domain.type != IOMMU_DOMAIN_DMA))
+   if (WARN_ON(!iommu_is_dma_domain(&domain->domain)))
return NULL;
 
for_each_domain_iommu(iommu_id, domain)
@@ -1034,7 +1034,7 @@ static struct dma_pte *pfn_to_dma_pte(struct dmar_domain 
*domain,
pteval = ((uint64_t)virt_to_dma_pfn(tmp_page) << 
VTD_PAGE_SHIFT) | DMA_PTE_READ | DMA_PTE_WRITE;
if (domain_use_first_level(domain)) {
pteval |= DMA_FL_PTE_XD | DMA_FL_PTE_US;
-   if (domain->domain.type == IOMMU_DOMAIN_DMA)
+   if (iommu_is_dma_domain(&domain->domain))
pteval |= DMA_FL_PTE_ACCESS;
}
if (cmpxchg64(&pte->val, 0ULL, pteval))
@@ -2345,7 +2345,7 @@ __domain_mapping(struct dmar_domain *domain, unsigned 
long iov_pfn,
if (domain_use_first_level(domain)) {
attr |= DMA_FL_PTE_XD | DMA_FL_PTE_US;
 
-   if (domain->domain.type == IOMMU_DOMAIN_DMA) {
+   if (iommu_is_dma_domain(&domain->domain)) {
attr |= DMA_FL_PTE_ACCESS;
if (prot & DMA_PTE_WRITE)
attr |= DMA_FL_PTE_DIRTY;
@@ -4528,6 +4528,7 @@ static struct iommu_domain 
*intel_iommu_domain_alloc(unsigned type)
 
switch (type) {
case IOMMU_DOMAIN_DMA:
+   case IOMMU_DOMAIN_DMA_FQ:
case IOMMU_DOMAIN_UNMANAGED:
dmar_domain = alloc_domain(0);
if (!dmar_domain) {
@@ -5197,12 +5198,8 @@ static void intel_iommu_release_device(struct device 
*dev)
 
 static void intel_iommu_probe_finalize(struct device *dev)
 {
-   struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
-
-   if (domain && domain->type == IOMMU_DOMAIN_DMA)
-   iommu_setup_dma_ops(dev, 0, U64_MAX);
-   else
-   set_dma_ops(dev, NULL);
+   set_dma_ops(dev, NULL);
+   iommu_setup_dma_ops(dev, 0, U64_MAX);
 }
 
 static void intel_iommu_get_resv_regions(struct device *device,
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 18/25] iommu/arm-smmu: Prepare for multiple DMA domain types

2021-08-04 Thread Robin Murphy
In preparation for the strict vs. non-strict decision for DMA domains to
be expressed in the domain type, make sure we expose our flush queue
awareness by accepting the new domain type.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 1 +
 drivers/iommu/arm/arm-smmu/arm-smmu.c   | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index d9c93d8d193d..883a99cb10c1 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1972,6 +1972,7 @@ static struct iommu_domain 
*arm_smmu_domain_alloc(unsigned type)
 
if (type != IOMMU_DOMAIN_UNMANAGED &&
type != IOMMU_DOMAIN_DMA &&
+   type != IOMMU_DOMAIN_DMA_FQ &&
type != IOMMU_DOMAIN_IDENTITY)
return NULL;
 
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index a325d4769c17..1d013b1d7bb2 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -866,7 +866,8 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned 
type)
struct arm_smmu_domain *smmu_domain;
 
if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_IDENTITY) {
-   if (using_legacy_binding || type != IOMMU_DOMAIN_DMA)
+   if (using_legacy_binding ||
+   (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_DMA_FQ))
return NULL;
}
/*
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 17/25] iommu/amd: Prepare for multiple DMA domain types

2021-08-04 Thread Robin Murphy
The DMA ops reset/setup can simply be unconditional, since
iommu-dma already knows only to touch DMA domains.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/amd/iommu.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 92f7cbe3d14a..f8cd945f1776 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1706,14 +1706,9 @@ static struct iommu_device 
*amd_iommu_probe_device(struct device *dev)
 
 static void amd_iommu_probe_finalize(struct device *dev)
 {
-   struct iommu_domain *domain;
-
/* Domains are initialized for this device - have a look what we ended 
up with */
-   domain = iommu_get_domain_for_dev(dev);
-   if (domain->type == IOMMU_DOMAIN_DMA)
-   iommu_setup_dma_ops(dev, 0, U64_MAX);
-   else
-   set_dma_ops(dev, NULL);
+   set_dma_ops(dev, NULL);
+   iommu_setup_dma_ops(dev, 0, U64_MAX);
 }
 
 static void amd_iommu_release_device(struct device *dev)
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 16/25] iommu: Introduce explicit type for non-strict DMA domains

2021-08-04 Thread Robin Murphy
Promote the difference between strict and non-strict DMA domains from an
internal detail to a distinct domain feature and type, to pave the road
for exposing it through the sysfs default domain interface.

Reviewed-by: Lu Baolu 
Reviewed-by: Jean-Philippe Brucker 
Signed-off-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c |  2 +-
 drivers/iommu/iommu.c |  8 ++--
 include/linux/iommu.h | 11 +++
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d63b30a7dc82..207c8febdac9 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1312,7 +1312,7 @@ void iommu_setup_dma_ops(struct device *dev, u64 
dma_base, u64 dma_limit)
 * The IOMMU core code allocates the default DMA domain, which the
 * underlying IOMMU driver needs to support via the dma-iommu layer.
 */
-   if (domain->type == IOMMU_DOMAIN_DMA) {
+   if (iommu_is_dma_domain(domain)) {
if (iommu_dma_init_domain(domain, dma_base, dma_limit, dev))
goto out_err;
dev->dma_ops = _dma_ops;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index fa8109369f74..982545234cf3 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -115,6 +115,7 @@ static const char *iommu_domain_type_str(unsigned int t)
case IOMMU_DOMAIN_UNMANAGED:
return "Unmanaged";
case IOMMU_DOMAIN_DMA:
+   case IOMMU_DOMAIN_DMA_FQ:
return "Translated";
default:
return "Unknown";
@@ -552,6 +553,9 @@ static ssize_t iommu_group_show_type(struct iommu_group 
*group,
case IOMMU_DOMAIN_DMA:
type = "DMA\n";
break;
+   case IOMMU_DOMAIN_DMA_FQ:
+   type = "DMA-FQ\n";
+   break;
}
}
mutex_unlock(&group->mutex);
@@ -765,7 +769,7 @@ static int iommu_create_device_direct_mappings(struct 
iommu_group *group,
unsigned long pg_size;
int ret = 0;
 
-   if (!domain || domain->type != IOMMU_DOMAIN_DMA)
+   if (!domain || !iommu_is_dma_domain(domain))
return 0;
 
BUG_ON(!domain->pgsize_bitmap);
@@ -1947,7 +1951,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct 
bus_type *bus,
/* Assume all sizes by default; the driver may override this later */
domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
 
-   if (type == IOMMU_DOMAIN_DMA && iommu_get_dma_cookie(domain)) {
+   if (iommu_is_dma_domain(domain) && iommu_get_dma_cookie(domain)) {
iommu_domain_free(domain);
domain = NULL;
}
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index f7679f6684b1..5629ae42951f 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -61,6 +61,7 @@ struct iommu_domain_geometry {
 #define __IOMMU_DOMAIN_DMA_API (1U << 1)  /* Domain for use in DMA-API
  implementation  */
 #define __IOMMU_DOMAIN_PT  (1U << 2)  /* Domain is identity mapped   */
+#define __IOMMU_DOMAIN_DMA_FQ  (1U << 3)  /* DMA-API uses flush queue*/
 
 /*
  * This are the possible domain-types
@@ -73,12 +74,17 @@ struct iommu_domain_geometry {
  * IOMMU_DOMAIN_DMA- Internally used for DMA-API implementations.
  *   This flag allows IOMMU drivers to implement
  *   certain optimizations for these domains
+ * IOMMU_DOMAIN_DMA_FQ - As above, but definitely using batched TLB
+ *   invalidation.
  */
 #define IOMMU_DOMAIN_BLOCKED   (0U)
 #define IOMMU_DOMAIN_IDENTITY  (__IOMMU_DOMAIN_PT)
 #define IOMMU_DOMAIN_UNMANAGED (__IOMMU_DOMAIN_PAGING)
 #define IOMMU_DOMAIN_DMA   (__IOMMU_DOMAIN_PAGING |\
 __IOMMU_DOMAIN_DMA_API)
+#define IOMMU_DOMAIN_DMA_FQ   (__IOMMU_DOMAIN_PAGING |\
+__IOMMU_DOMAIN_DMA_API |   \
+__IOMMU_DOMAIN_DMA_FQ)
 
 struct iommu_domain {
unsigned type;
@@ -90,6 +96,11 @@ struct iommu_domain {
struct iommu_dma_cookie *iova_cookie;
 };
 
+static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
+{
+   return domain->type & __IOMMU_DOMAIN_DMA_API;
+}
+
 enum iommu_cap {
IOMMU_CAP_CACHE_COHERENCY,  /* IOMMU can enforce cache coherent DMA
   transactions */
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 15/25] iommu/io-pgtable: Remove non-strict quirk

2021-08-04 Thread Robin Murphy
IO_PGTABLE_QUIRK_NON_STRICT was never a very comfortable fit, since it's
not a quirk of the pagetable format itself. Now that we have a more
appropriate way to convey non-strict unmaps, though, this last of the
non-quirk quirks can also go, and with the flush queue code also now
enforcing its own ordering we can have a lovely cleanup all round.

Signed-off-by: Robin Murphy 

---

v3: New
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  3 ---
 drivers/iommu/arm/arm-smmu/arm-smmu.c   |  3 ---
 drivers/iommu/io-pgtable-arm-v7s.c  | 12 ++--
 drivers/iommu/io-pgtable-arm.c  | 12 ++--
 include/linux/io-pgtable.h  |  5 -
 5 files changed, 4 insertions(+), 31 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 4c648da447bf..d9c93d8d193d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2174,9 +2174,6 @@ static int arm_smmu_domain_finalise(struct iommu_domain 
*domain,
.iommu_dev  = smmu->dev,
};
 
-   if (!iommu_get_dma_strict(domain))
-   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
-
pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
if (!pgtbl_ops)
return -ENOMEM;
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 970d9e4dcd69..a325d4769c17 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -765,9 +765,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain 
*domain,
.iommu_dev  = smmu->dev,
};
 
-   if (!iommu_get_dma_strict(domain))
-   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
-
if (smmu->impl && smmu->impl->init_context) {
ret = smmu->impl->init_context(smmu_domain, &pgtbl_cfg, dev);
if (ret)
diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
b/drivers/iommu/io-pgtable-arm-v7s.c
index 5db90d7ce2ec..e84478d39705 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -700,14 +700,7 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable 
*data,
ARM_V7S_BLOCK_SIZE(lvl + 1));
ptep = iopte_deref(pte[i], lvl, data);
__arm_v7s_free_table(ptep, lvl + 1, data);
-   } else if (iop->cfg.quirks & 
IO_PGTABLE_QUIRK_NON_STRICT) {
-   /*
-* Order the PTE update against queueing the 
IOVA, to
-* guarantee that a flush callback from a 
different CPU
-* has observed it before the TLBIALL can be 
issued.
-*/
-   smp_wmb();
-   } else {
+   } else if (!gather->queued) {
io_pgtable_tlb_add_page(iop, gather, iova, 
blk_size);
}
iova += blk_size;
@@ -791,8 +784,7 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct 
io_pgtable_cfg *cfg,
 
if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
IO_PGTABLE_QUIRK_NO_PERMS |
-   IO_PGTABLE_QUIRK_ARM_MTK_EXT |
-   IO_PGTABLE_QUIRK_NON_STRICT))
+   IO_PGTABLE_QUIRK_ARM_MTK_EXT))
return NULL;
 
/* If ARM_MTK_4GB is enabled, the NO_PERMS is also expected. */
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 053df4048a29..48a5bd8f571d 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -638,14 +638,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable 
*data,
io_pgtable_tlb_flush_walk(iop, iova + i * size, 
size,
  
ARM_LPAE_GRANULE(data));
__arm_lpae_free_pgtable(data, lvl + 1, 
iopte_deref(pte, data));
-   } else if (iop->cfg.quirks & 
IO_PGTABLE_QUIRK_NON_STRICT) {
-   /*
-* Order the PTE update against queueing the 
IOVA, to
-* guarantee that a flush callback from a 
different CPU
-* has observed it before the TLBIALL can be 
issued.
-*/
-   smp_wmb();
-   } else {
+   } else if (!gather->queued) {
io_pgtable_tlb_add_page(iop, gather, iova + i * 
size, size);
}
 
@@ -825,7 +818,6 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, 
void 

[PATCH v3 14/25] iommu: Indicate queued flushes via gather data

2021-08-04 Thread Robin Murphy
Since iommu_iotlb_gather exists to help drivers optimise flushing for a
given unmap request, it is also the logical place to indicate whether
the unmap is strict or not, and thus help them further optimise for
whether to expect a sync or a flush_all subsequently. As part of that,
it also seems fair to make the flush queue code take responsibility for
enforcing the really subtle ordering requirement it brings, so that we
don't need to worry about forgetting that if new drivers want to add
flush queue support, and can consolidate the existing versions.

While we're adding to the kerneldoc, also fill in some info for
@freelist which was overlooked previously.
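
The intended driver-side pattern (sketch only, with hypothetical my_*
helpers) is to consult the new flag in the unmap path and skip per-page
TLB gathering when the flush is queued:

#include <linux/iommu.h>

/* Hypothetical page-table helper: clears the PTEs and returns the size. */
static size_t my_clear_mappings(struct iommu_domain *domain,
				unsigned long iova, size_t size);

static size_t my_iommu_unmap(struct iommu_domain *domain, unsigned long iova,
			     size_t size, struct iommu_iotlb_gather *gather)
{
	size_t unmapped = my_clear_mappings(domain, iova, size);

	/*
	 * When the flush is queued, the eventual ->flush_iotlb_all() covers
	 * this range and queue_iova() now provides the required ordering,
	 * so skip the per-page TLB gathering.
	 */
	if (!gather->queued)
		iommu_iotlb_gather_add_page(domain, gather, iova, size);

	return unmapped;
}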

Signed-off-by: Robin Murphy 

---

v3: New
---
 drivers/iommu/dma-iommu.c | 1 +
 drivers/iommu/iova.c  | 7 +++
 include/linux/iommu.h | 8 +++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index e28396cea6eb..d63b30a7dc82 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -474,6 +474,7 @@ static void __iommu_dma_unmap(struct device *dev, 
dma_addr_t dma_addr,
dma_addr -= iova_off;
size = iova_align(iovad, size + iova_off);
iommu_iotlb_gather_init(&iotlb_gather);
+   iotlb_gather.queued = cookie->fq_domain;
 
unmapped = iommu_unmap_fast(domain, dma_addr, size, &iotlb_gather);
WARN_ON(unmapped != size);
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index b6cf5f16123b..2ad73fb2e94e 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -637,6 +637,13 @@ void queue_iova(struct iova_domain *iovad,
unsigned long flags;
unsigned idx;
 
+   /*
+* Order against the IOMMU driver's pagetable update from unmapping
+* @pte, to guarantee that iova_domain_flush() observes that if called
+* from a different CPU before we release the lock below.
+*/
+   smp_wmb();
+
spin_lock_irqsave(&fq->lock, flags);
 
/*
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 141779d76035..f7679f6684b1 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -161,16 +161,22 @@ enum iommu_dev_features {
  * @start: IOVA representing the start of the range to be flushed
  * @end: IOVA representing the end of the range to be flushed (inclusive)
  * @pgsize: The interval at which to perform the flush
+ * @freelist: Removed pages to free after sync
+ * @queued: Indicates that the flush will be queued
  *
  * This structure is intended to be updated by multiple calls to the
  * ->unmap() function in struct iommu_ops before eventually being passed
- * into ->iotlb_sync().
+ * into ->iotlb_sync(). Drivers can add pages to @freelist to be freed after
+ * ->iotlb_sync() or ->iotlb_flush_all() have cleared all cached references to
+ * them. @queued is set to indicate when ->iotlb_flush_all() will be called
+ * later instead of ->iotlb_sync(), so drivers may optimise accordingly.
  */
 struct iommu_iotlb_gather {
unsigned long   start;
unsigned long   end;
size_t  pgsize;
struct page *freelist;
+   boolqueued;
 };
 
 /**
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 13/25] iommu/dma: Remove redundant "!dev" checks

2021-08-04 Thread Robin Murphy
iommu_dma_init_domain() is now only called from iommu_setup_dma_ops(),
which has already assumed dev to be non-NULL.

Reviewed-by: John Garry 
Reviewed-by: Lu Baolu 
Signed-off-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 10067fbc4309..e28396cea6eb 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -363,7 +363,7 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
 
init_iova_domain(iovad, 1UL << order, base_pfn);
 
-   if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) &&
+   if (!cookie->fq_domain && !dev_is_untrusted(dev) &&
domain->ops->flush_iotlb_all && !iommu_get_dma_strict(domain)) {
if (init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all,
  iommu_dma_entry_dtor))
@@ -372,9 +372,6 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
cookie->fq_domain = domain;
}
 
-   if (!dev)
-   return 0;
-
return iova_reserve_iommu_regions(dev, domain);
 }
 
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 12/25] iommu/dma: Unexport IOVA cookie management

2021-08-04 Thread Robin Murphy
IOVA cookies are now got and put by core code, so we no longer need to
export these to modular drivers. The export for getting MSI cookies
stays, since VFIO can still be a module, but it was already relying on
someone else putting them, so that aspect is unaffected.

Reviewed-by: Lu Baolu 
Reviewed-by: Jean-Philippe Brucker 
Signed-off-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c | 7 ---
 drivers/iommu/iommu.c | 3 +--
 2 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 98ba927aee1a..10067fbc4309 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -98,9 +98,6 @@ static struct iommu_dma_cookie *cookie_alloc(enum 
iommu_dma_cookie_type type)
 /**
  * iommu_get_dma_cookie - Acquire DMA-API resources for a domain
  * @domain: IOMMU domain to prepare for DMA-API usage
- *
- * IOMMU drivers should normally call this from their domain_alloc
- * callback when domain->type == IOMMU_DOMAIN_DMA.
  */
 int iommu_get_dma_cookie(struct iommu_domain *domain)
 {
@@ -113,7 +110,6 @@ int iommu_get_dma_cookie(struct iommu_domain *domain)
 
return 0;
 }
-EXPORT_SYMBOL(iommu_get_dma_cookie);
 
 /**
  * iommu_get_msi_cookie - Acquire just MSI remapping resources
@@ -151,8 +147,6 @@ EXPORT_SYMBOL(iommu_get_msi_cookie);
  * iommu_put_dma_cookie - Release a domain's DMA mapping resources
  * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie() or
  *  iommu_get_msi_cookie()
- *
- * IOMMU drivers should normally call this from their domain_free callback.
  */
 void iommu_put_dma_cookie(struct iommu_domain *domain)
 {
@@ -172,7 +166,6 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
kfree(cookie);
domain->iova_cookie = NULL;
 }
-EXPORT_SYMBOL(iommu_put_dma_cookie);
 
 /**
  * iommu_dma_get_resv_regions - Reserved region driver helper
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index b65fcc66ffa4..fa8109369f74 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1947,8 +1947,7 @@ static struct iommu_domain *__iommu_domain_alloc(struct 
bus_type *bus,
/* Assume all sizes by default; the driver may override this later */
domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
 
-   /* Temporarily avoid -EEXIST while drivers still get their own cookies 
*/
-   if (type == IOMMU_DOMAIN_DMA && !domain->iova_cookie && 
iommu_get_dma_cookie(domain)) {
+   if (type == IOMMU_DOMAIN_DMA && iommu_get_dma_cookie(domain)) {
iommu_domain_free(domain);
domain = NULL;
}
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 11/25] iommu/virtio: Drop IOVA cookie management

2021-08-04 Thread Robin Murphy
The core code bakes its own cookies now.

Reviewed-by: Jean-Philippe Brucker 
Signed-off-by: Robin Murphy 
---
 drivers/iommu/virtio-iommu.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 6abdcab7273b..80930ce04a16 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -598,12 +598,6 @@ static struct iommu_domain *viommu_domain_alloc(unsigned 
type)
spin_lock_init(&vdomain->mappings_lock);
vdomain->mappings = RB_ROOT_CACHED;
 
-   if (type == IOMMU_DOMAIN_DMA &&
-   iommu_get_dma_cookie(&vdomain->domain)) {
-   kfree(vdomain);
-   return NULL;
-   }
-
return &vdomain->domain;
 }
 
@@ -643,8 +637,6 @@ static void viommu_domain_free(struct iommu_domain *domain)
 {
struct viommu_domain *vdomain = to_viommu_domain(domain);
 
-   iommu_put_dma_cookie(domain);
-
/* Free all remaining mappings (size 2^64) */
viommu_del_mappings(vdomain, 0, 0);
 
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 10/25] iommu/sun50i: Drop IOVA cookie management

2021-08-04 Thread Robin Murphy
The core code bakes its own cookies now.

CC: Maxime Ripard 
Signed-off-by: Robin Murphy 

---

v3: Also remove unneeded include
---
 drivers/iommu/sun50i-iommu.c | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/iommu/sun50i-iommu.c b/drivers/iommu/sun50i-iommu.c
index 181bb1c3437c..92997021e188 100644
--- a/drivers/iommu/sun50i-iommu.c
+++ b/drivers/iommu/sun50i-iommu.c
@@ -7,7 +7,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -610,14 +609,10 @@ static struct iommu_domain 
*sun50i_iommu_domain_alloc(unsigned type)
if (!sun50i_domain)
return NULL;
 
-   if (type == IOMMU_DOMAIN_DMA &&
-   iommu_get_dma_cookie(&sun50i_domain->domain))
-   goto err_free_domain;
-
sun50i_domain->dt = (u32 *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
get_order(DT_SIZE));
if (!sun50i_domain->dt)
-   goto err_put_cookie;
+   goto err_free_domain;
 
refcount_set(&sun50i_domain->refcnt, 1);
 
@@ -627,10 +622,6 @@ static struct iommu_domain 
*sun50i_iommu_domain_alloc(unsigned type)
 
return &sun50i_domain->domain;
 
-err_put_cookie:
-   if (type == IOMMU_DOMAIN_DMA)
-   iommu_put_dma_cookie(&sun50i_domain->domain);
-
 err_free_domain:
kfree(sun50i_domain);
 
@@ -644,8 +635,6 @@ static void sun50i_iommu_domain_free(struct iommu_domain 
*domain)
free_pages((unsigned long)sun50i_domain->dt, get_order(DT_SIZE));
sun50i_domain->dt = NULL;
 
-   iommu_put_dma_cookie(domain);
-
kfree(sun50i_domain);
 }
 
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 09/25] iommu/sprd: Drop IOVA cookie management

2021-08-04 Thread Robin Murphy
The core code bakes its own cookies now.

CC: Chunyan Zhang 
Signed-off-by: Robin Murphy 

---

v3: Also remove unneeded include
---
 drivers/iommu/sprd-iommu.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/drivers/iommu/sprd-iommu.c b/drivers/iommu/sprd-iommu.c
index 73dfd9946312..27ac818b0354 100644
--- a/drivers/iommu/sprd-iommu.c
+++ b/drivers/iommu/sprd-iommu.c
@@ -8,7 +8,6 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -144,11 +143,6 @@ static struct iommu_domain 
*sprd_iommu_domain_alloc(unsigned int domain_type)
if (!dom)
return NULL;
 
-   if (iommu_get_dma_cookie(&dom->domain)) {
-   kfree(dom);
-   return NULL;
-   }
-
spin_lock_init(&dom->pgtlock);
 
dom->domain.geometry.aperture_start = 0;
@@ -161,7 +155,6 @@ static void sprd_iommu_domain_free(struct iommu_domain 
*domain)
 {
struct sprd_iommu_domain *dom = to_sprd_domain(domain);
 
-   iommu_put_dma_cookie(domain);
kfree(dom);
 }
 
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 08/25] iommu/rockchip: Drop IOVA cookie management

2021-08-04 Thread Robin Murphy
The core code bakes its own cookies now.

CC: Heiko Stuebner 
Signed-off-by: Robin Murphy 

---

v3: Also remove unneeded include
---
 drivers/iommu/rockchip-iommu.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index 9febfb7f3025..5cb260820eda 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -10,7 +10,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -1074,10 +1073,6 @@ static struct iommu_domain 
*rk_iommu_domain_alloc(unsigned type)
if (!rk_domain)
return NULL;
 
-   if (type == IOMMU_DOMAIN_DMA &&
-   iommu_get_dma_cookie(&rk_domain->domain))
-   goto err_free_domain;
-
/*
 * rk32xx iommus use a 2 level pagetable.
 * Each level1 (dt) and level2 (pt) table has 1024 4-byte entries.
@@ -1085,7 +1080,7 @@ static struct iommu_domain 
*rk_iommu_domain_alloc(unsigned type)
 */
rk_domain->dt = (u32 *)get_zeroed_page(GFP_KERNEL | GFP_DMA32);
if (!rk_domain->dt)
-   goto err_put_cookie;
+   goto err_free_domain;
 
rk_domain->dt_dma = dma_map_single(dma_dev, rk_domain->dt,
   SPAGE_SIZE, DMA_TO_DEVICE);
@@ -1106,9 +1101,6 @@ static struct iommu_domain 
*rk_iommu_domain_alloc(unsigned type)
 
 err_free_dt:
free_page((unsigned long)rk_domain->dt);
-err_put_cookie:
-   if (type == IOMMU_DOMAIN_DMA)
-   iommu_put_dma_cookie(&rk_domain->domain);
 err_free_domain:
kfree(rk_domain);
 
@@ -1137,8 +1129,6 @@ static void rk_iommu_domain_free(struct iommu_domain 
*domain)
 SPAGE_SIZE, DMA_TO_DEVICE);
free_page((unsigned long)rk_domain->dt);
 
-   if (domain->type == IOMMU_DOMAIN_DMA)
-   iommu_put_dma_cookie(&rk_domain->domain);
kfree(rk_domain);
 }
 
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 07/25] iommu/mtk: Drop IOVA cookie management

2021-08-04 Thread Robin Murphy
The core code bakes its own cookies now.

CC: Yong Wu 
Signed-off-by: Robin Murphy 

---

v3: Also remove unneeded includes
---
 drivers/iommu/mtk_iommu.c| 7 ---
 drivers/iommu/mtk_iommu_v1.c | 1 -
 2 files changed, 8 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 6f7c69688ce2..185694eb4456 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -9,7 +9,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -441,17 +440,11 @@ static struct iommu_domain 
*mtk_iommu_domain_alloc(unsigned type)
if (!dom)
return NULL;
 
-   if (iommu_get_dma_cookie(&dom->domain)) {
-   kfree(dom);
-   return NULL;
-   }
-
return &dom->domain;
 }
 
 static void mtk_iommu_domain_free(struct iommu_domain *domain)
 {
-   iommu_put_dma_cookie(domain);
kfree(to_mtk_domain(domain));
 }
 
diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index 778e66f5f1aa..be22fcf988ce 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -13,7 +13,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 06/25] iommu/ipmmu-vmsa: Drop IOVA cookie management

2021-08-04 Thread Robin Murphy
The core code bakes its own cookies now.

CC: Yoshihiro Shimoda 
CC: Geert Uytterhoeven 
Signed-off-by: Robin Murphy 

---

v3: Also remove unneeded include
---
 drivers/iommu/ipmmu-vmsa.c | 28 
 1 file changed, 4 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 51ea6f00db2f..d38ff29a76e8 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -8,7 +8,6 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -564,10 +563,13 @@ static irqreturn_t ipmmu_irq(int irq, void *dev)
  * IOMMU Operations
  */
 
-static struct iommu_domain *__ipmmu_domain_alloc(unsigned type)
+static struct iommu_domain *ipmmu_domain_alloc(unsigned type)
 {
struct ipmmu_vmsa_domain *domain;
 
+   if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
+   return NULL;
+
domain = kzalloc(sizeof(*domain), GFP_KERNEL);
if (!domain)
return NULL;
@@ -577,27 +579,6 @@ static struct iommu_domain *__ipmmu_domain_alloc(unsigned 
type)
return &domain->io_domain;
 }
 
-static struct iommu_domain *ipmmu_domain_alloc(unsigned type)
-{
-   struct iommu_domain *io_domain = NULL;
-
-   switch (type) {
-   case IOMMU_DOMAIN_UNMANAGED:
-   io_domain = __ipmmu_domain_alloc(type);
-   break;
-
-   case IOMMU_DOMAIN_DMA:
-   io_domain = __ipmmu_domain_alloc(type);
-   if (io_domain && iommu_get_dma_cookie(io_domain)) {
-   kfree(io_domain);
-   io_domain = NULL;
-   }
-   break;
-   }
-
-   return io_domain;
-}
-
 static void ipmmu_domain_free(struct iommu_domain *io_domain)
 {
struct ipmmu_vmsa_domain *domain = to_vmsa_domain(io_domain);
@@ -606,7 +587,6 @@ static void ipmmu_domain_free(struct iommu_domain 
*io_domain)
 * Free the domain resources. We assume that all devices have already
 * been detached.
 */
-   iommu_put_dma_cookie(io_domain);
ipmmu_domain_destroy_context(domain);
free_io_pgtable_ops(domain->iop);
kfree(domain);
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 05/25] iommu/exynos: Drop IOVA cookie management

2021-08-04 Thread Robin Murphy
The core code bakes its own cookies now.

CC: Marek Szyprowski 
Signed-off-by: Robin Murphy 

---

v3: Also remove unneeded include
---
 drivers/iommu/exynos-iommu.c | 19 ---
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index d0fbf1d10e18..939ffa768986 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -21,7 +21,6 @@
 #include 
 #include 
 #include 
-#include 
 
 typedef u32 sysmmu_iova_t;
 typedef u32 sysmmu_pte_t;
@@ -735,20 +734,16 @@ static struct iommu_domain 
*exynos_iommu_domain_alloc(unsigned type)
/* Check if correct PTE offsets are initialized */
BUG_ON(PG_ENT_SHIFT < 0 || !dma_dev);
 
+   if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED)
+   return NULL;
+
domain = kzalloc(sizeof(*domain), GFP_KERNEL);
if (!domain)
return NULL;
 
-   if (type == IOMMU_DOMAIN_DMA) {
-   if (iommu_get_dma_cookie(&domain->domain) != 0)
-   goto err_pgtable;
-   } else if (type != IOMMU_DOMAIN_UNMANAGED) {
-   goto err_pgtable;
-   }
-
domain->pgtable = (sysmmu_pte_t *)__get_free_pages(GFP_KERNEL, 2);
if (!domain->pgtable)
-   goto err_dma_cookie;
+   goto err_pgtable;
 
domain->lv2entcnt = (short *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, 
1);
if (!domain->lv2entcnt)
@@ -779,9 +774,6 @@ static struct iommu_domain 
*exynos_iommu_domain_alloc(unsigned type)
free_pages((unsigned long)domain->lv2entcnt, 1);
 err_counter:
free_pages((unsigned long)domain->pgtable, 2);
-err_dma_cookie:
-   if (type == IOMMU_DOMAIN_DMA)
-   iommu_put_dma_cookie(&domain->domain);
 err_pgtable:
kfree(domain);
return NULL;
@@ -809,9 +801,6 @@ static void exynos_iommu_domain_free(struct iommu_domain 
*iommu_domain)
 
spin_unlock_irqrestore(&domain->lock, flags);
 
-   if (iommu_domain->type == IOMMU_DOMAIN_DMA)
-   iommu_put_dma_cookie(iommu_domain);
-
dma_unmap_single(dma_dev, virt_to_phys(domain->pgtable), LV1TABLE_SIZE,
 DMA_TO_DEVICE);
 
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 04/25] iommu/vt-d: Drop IOVA cookie management

2021-08-04 Thread Robin Murphy
The core code bakes its own cookies now.

Reviewed-by: Lu Baolu 
Signed-off-by: Robin Murphy 
---
 drivers/iommu/intel/iommu.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index c12cc955389a..7e168634c433 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1979,10 +1979,6 @@ static void domain_exit(struct dmar_domain *domain)
/* Remove associated devices and clear attached or cached domains */
domain_remove_dev_info(domain);
 
-   /* destroy iovas */
-   if (domain->domain.type == IOMMU_DOMAIN_DMA)
-   iommu_put_dma_cookie(&domain->domain);
-
if (domain->pgd) {
struct page *freelist;
 
@@ -4544,10 +4540,6 @@ static struct iommu_domain 
*intel_iommu_domain_alloc(unsigned type)
return NULL;
}
 
-   if (type == IOMMU_DOMAIN_DMA &&
-   iommu_get_dma_cookie(&dmar_domain->domain))
-   return NULL;
-
domain = &dmar_domain->domain;
domain->geometry.aperture_start = 0;
domain->geometry.aperture_end   =
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 03/25] iommu/arm-smmu: Drop IOVA cookie management

2021-08-04 Thread Robin Murphy
The core code bakes its own cookies now.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  7 ---
 drivers/iommu/arm/arm-smmu/arm-smmu.c   | 15 ---
 drivers/iommu/arm/arm-smmu/qcom_iommu.c |  9 -
 3 files changed, 4 insertions(+), 27 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 6346f21726f4..4c648da447bf 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1984,12 +1984,6 @@ static struct iommu_domain 
*arm_smmu_domain_alloc(unsigned type)
if (!smmu_domain)
return NULL;
 
-   if (type == IOMMU_DOMAIN_DMA &&
-   iommu_get_dma_cookie(&smmu_domain->domain)) {
-   kfree(smmu_domain);
-   return NULL;
-   }
-
mutex_init(&smmu_domain->init_mutex);
INIT_LIST_HEAD(&smmu_domain->devices);
spin_lock_init(&smmu_domain->devices_lock);
@@ -2021,7 +2015,6 @@ static void arm_smmu_domain_free(struct iommu_domain 
*domain)
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
struct arm_smmu_device *smmu = smmu_domain->smmu;
 
-   iommu_put_dma_cookie(domain);
free_io_pgtable_ops(smmu_domain->pgtbl_ops);
 
/* Free the CD and ASID, if we allocated them */
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index ac21170fa208..970d9e4dcd69 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -868,10 +868,10 @@ static struct iommu_domain 
*arm_smmu_domain_alloc(unsigned type)
 {
struct arm_smmu_domain *smmu_domain;
 
-   if (type != IOMMU_DOMAIN_UNMANAGED &&
-   type != IOMMU_DOMAIN_DMA &&
-   type != IOMMU_DOMAIN_IDENTITY)
-   return NULL;
+   if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_IDENTITY) {
+   if (using_legacy_binding || type != IOMMU_DOMAIN_DMA)
+   return NULL;
+   }
/*
 * Allocate the domain and initialise some of its data structures.
 * We can't really do anything meaningful until we've added a
@@ -881,12 +881,6 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned 
type)
if (!smmu_domain)
return NULL;
 
-   if (type == IOMMU_DOMAIN_DMA && (using_legacy_binding ||
-   iommu_get_dma_cookie(&smmu_domain->domain))) {
-   kfree(smmu_domain);
-   return NULL;
-   }
-
mutex_init(&smmu_domain->init_mutex);
spin_lock_init(&smmu_domain->cb_lock);
 
@@ -901,7 +895,6 @@ static void arm_smmu_domain_free(struct iommu_domain 
*domain)
 * Free the domain resources. We assume that all devices have
 * already been detached.
 */
-   iommu_put_dma_cookie(domain);
arm_smmu_destroy_domain_context(domain);
kfree(smmu_domain);
 }
diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c 
b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
index 021cf8f65ffc..b91874cb6cf3 100644
--- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
+++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
@@ -10,7 +10,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -335,12 +334,6 @@ static struct iommu_domain 
*qcom_iommu_domain_alloc(unsigned type)
if (!qcom_domain)
return NULL;
 
-   if (type == IOMMU_DOMAIN_DMA &&
-   iommu_get_dma_cookie(&qcom_domain->domain)) {
-   kfree(qcom_domain);
-   return NULL;
-   }
-
mutex_init(&qcom_domain->init_mutex);
spin_lock_init(&qcom_domain->pgtbl_lock);
 
@@ -351,8 +344,6 @@ static void qcom_iommu_domain_free(struct iommu_domain 
*domain)
 {
struct qcom_iommu_domain *qcom_domain = to_qcom_iommu_domain(domain);
 
-   iommu_put_dma_cookie(domain);
-
if (qcom_domain->iommu) {
/*
 * NOTE: unmap can be called after client device is powered
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 01/25] iommu: Pull IOVA cookie management into the core

2021-08-04 Thread Robin Murphy
Now that everyone has converged on iommu-dma for IOMMU_DOMAIN_DMA
support, we can abandon the notion of drivers being responsible for the
cookie type, and consolidate all the management into the core code.

CC: Marek Szyprowski 
CC: Yoshihiro Shimoda 
CC: Geert Uytterhoeven 
CC: Yong Wu 
CC: Heiko Stuebner 
CC: Chunyan Zhang 
CC: Maxime Ripard 
Reviewed-by: Jean-Philippe Brucker 
Reviewed-by: Lu Baolu 
Signed-off-by: Robin Murphy 

---

v3: Use a simpler temporary check instead of trying to be clever with
the error code
---
 drivers/iommu/iommu.c | 7 +++
 include/linux/iommu.h | 3 ++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f2cda9950bd5..b65fcc66ffa4 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -7,6 +7,7 @@
 #define pr_fmt(fmt)"iommu: " fmt
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1946,6 +1947,11 @@ static struct iommu_domain *__iommu_domain_alloc(struct 
bus_type *bus,
/* Assume all sizes by default; the driver may override this later */
domain->pgsize_bitmap  = bus->iommu_ops->pgsize_bitmap;
 
+   /* Temporarily avoid -EEXIST while drivers still get their own cookies 
*/
+   if (type == IOMMU_DOMAIN_DMA && !domain->iova_cookie && 
iommu_get_dma_cookie(domain)) {
+   iommu_domain_free(domain);
+   domain = NULL;
+   }
return domain;
 }
 
@@ -1957,6 +1963,7 @@ EXPORT_SYMBOL_GPL(iommu_domain_alloc);
 
 void iommu_domain_free(struct iommu_domain *domain)
 {
+   iommu_put_dma_cookie(domain);
domain->ops->domain_free(domain);
 }
 EXPORT_SYMBOL_GPL(iommu_domain_free);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 4997c78e2670..141779d76035 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -40,6 +40,7 @@ struct iommu_domain;
 struct notifier_block;
 struct iommu_sva;
 struct iommu_fault_event;
+struct iommu_dma_cookie;
 
 /* iommu fault flags */
 #define IOMMU_FAULT_READ   0x0
@@ -86,7 +87,7 @@ struct iommu_domain {
iommu_fault_handler_t handler;
void *handler_token;
struct iommu_domain_geometry geometry;
-   void *iova_cookie;
+   struct iommu_dma_cookie *iova_cookie;
 };
 
 enum iommu_cap {
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 02/25] iommu/amd: Drop IOVA cookie management

2021-08-04 Thread Robin Murphy
The core code bakes its own cookies now.

Signed-off-by: Robin Murphy 

---

v3: Also remove unneeded include
---
 drivers/iommu/amd/iommu.c | 13 -
 1 file changed, 13 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 52fe2326042a..92f7cbe3d14a 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -20,7 +20,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -1918,16 +1917,7 @@ static struct iommu_domain 
*amd_iommu_domain_alloc(unsigned type)
domain->domain.geometry.aperture_end   = ~0ULL;
domain->domain.geometry.force_aperture = true;
 
-   if (type == IOMMU_DOMAIN_DMA &&
-   iommu_get_dma_cookie(&domain->domain) == -ENOMEM)
-   goto free_domain;
-
return &domain->domain;
-
-free_domain:
-   protection_domain_free(domain);
-
-   return NULL;
 }
 
 static void amd_iommu_domain_free(struct iommu_domain *dom)
@@ -1944,9 +1934,6 @@ static void amd_iommu_domain_free(struct iommu_domain 
*dom)
if (!dom)
return;
 
-   if (dom->type == IOMMU_DOMAIN_DMA)
-   iommu_put_dma_cookie(&domain->domain);
-
if (domain->flags & PD_IOMMUV2_MASK)
free_gcr3_table(domain);
 
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 00/25] iommu: Refactor DMA domain strictness

2021-08-04 Thread Robin Murphy
v1: 
https://lore.kernel.org/linux-iommu/cover.1626888444.git.robin.mur...@arm.com/
v2: 
https://lore.kernel.org/linux-iommu/cover.1627468308.git.robin.mur...@arm.com/

Hi all,

Round 3, and the patch count has crept up yet again. But the overall
diffstat is even more negative, so that's good, right? :)

Once again, to driver/platform maintainers CC'd on cookie cleanup
patches this is just a heads-up and the rest of the changes should not
affect your platforms. I hope I've now fixed the silly bug which broke
bisection between patches #1 and #12 on 32-bit Arm.

The new patches are in the middle, reworking how the SMMU drivers and
io-pgtable implement non-strict mode such that the later changes fall
into place even more easily. Turns out I didn't need the major
refactoring of io-pgtable that I had in mind, and I'm almost kicking
myself that as soon as I put the option of *not* using the existing
quirk on the table, an even cleaner and more logical solution was
staring right out at me.

Due to that significant change and the consequent redesign of the final
patch to make dynamic switching look viable in the face of concurrency,
I have not applied the tested-by tags from v2. They were very much
appreciated though, thanks!

Proper changelogs on the individual patches this time since otherwise
I'd have lost track...

Cheers,
Robin.


CC: Marek Szyprowski 
CC: Yoshihiro Shimoda 
CC: Geert Uytterhoeven 
CC: Yong Wu 
CC: Heiko Stuebner 
CC: Chunyan Zhang 
CC: Maxime Ripard 
CC: Jean-Philippe Brucker 
CC: Sai Praneeth Prakhya 

Robin Murphy (25):
  iommu: Pull IOVA cookie management into the core
  iommu/amd: Drop IOVA cookie management
  iommu/arm-smmu: Drop IOVA cookie management
  iommu/vt-d: Drop IOVA cookie management
  iommu/exynos: Drop IOVA cookie management
  iommu/ipmmu-vmsa: Drop IOVA cookie management
  iommu/mtk: Drop IOVA cookie management
  iommu/rockchip: Drop IOVA cookie management
  iommu/sprd: Drop IOVA cookie management
  iommu/sun50i: Drop IOVA cookie management
  iommu/virtio: Drop IOVA cookie management
  iommu/dma: Unexport IOVA cookie management
  iommu/dma: Remove redundant "!dev" checks
  iommu: Indicate queued flushes via gather data
  iommu/io-pgtable: Remove non-strict quirk
  iommu: Introduce explicit type for non-strict DMA domains
  iommu/amd: Prepare for multiple DMA domain types
  iommu/arm-smmu: Prepare for multiple DMA domain types
  iommu/vt-d: Prepare for multiple DMA domain types
  iommu: Express DMA strictness via the domain type
  iommu: Expose DMA domain strictness via sysfs
  iommu: Only log strictness for DMA domains
  iommu: Merge strictness and domain type configs
  iommu/dma: Factor out flush queue init
  iommu: Allow enabling non-strict mode dynamically

 .../ABI/testing/sysfs-kernel-iommu_groups |  6 +-
 .../admin-guide/kernel-parameters.txt |  8 +-
 drivers/iommu/Kconfig | 80 +--
 drivers/iommu/amd/iommu.c | 22 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 11 +--
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 19 ++---
 drivers/iommu/arm/arm-smmu/qcom_iommu.c   |  9 ---
 drivers/iommu/dma-iommu.c | 63 +--
 drivers/iommu/exynos-iommu.c  | 19 +
 drivers/iommu/intel/iommu.c   | 23 ++
 drivers/iommu/io-pgtable-arm-v7s.c| 12 +--
 drivers/iommu/io-pgtable-arm.c| 12 +--
 drivers/iommu/iommu.c | 55 -
 drivers/iommu/iova.c  | 12 ++-
 drivers/iommu/ipmmu-vmsa.c| 28 +--
 drivers/iommu/mtk_iommu.c |  7 --
 drivers/iommu/mtk_iommu_v1.c  |  1 -
 drivers/iommu/rockchip-iommu.c| 12 +--
 drivers/iommu/sprd-iommu.c|  7 --
 drivers/iommu/sun50i-iommu.c  | 13 +--
 drivers/iommu/virtio-iommu.c  |  8 --
 include/linux/dma-iommu.h |  9 ++-
 include/linux/io-pgtable.h|  5 --
 include/linux/iommu.h | 23 +-
 24 files changed, 187 insertions(+), 277 deletions(-)

-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC v2] /dev/iommu uAPI proposal

2021-08-04 Thread Eric Auger
Hi Kevin,

Few comments/questions below.

On 7/9/21 9:48 AM, Tian, Kevin wrote:
> /dev/iommu provides an unified interface for managing I/O page tables for 
> devices assigned to userspace. Device passthrough frameworks (VFIO, vDPA, 
> etc.) are expected to use this interface instead of creating their own logic 
> to 
> isolate untrusted device DMAs initiated by userspace. 
>
> This proposal describes the uAPI of /dev/iommu and also sample sequences 
> with VFIO as example in typical usages. The driver-facing kernel API provided 
> by the iommu layer is still TBD, which can be discussed after consensus is 
> made on this uAPI.
>
> It's based on a lengthy discussion starting from here:
>   
> https://lore.kernel.org/linux-iommu/20210330132830.go2356...@nvidia.com/ 
>
> v1 can be found here:
>   
> https://lore.kernel.org/linux-iommu/ph0pr12mb54811863b392c644e5365446dc...@ph0pr12mb5481.namprd12.prod.outlook.com/T/
>
> This doc is also tracked on github, though it's not very useful for v1->v2 
> given dramatic refactoring:
>   https://github.com/luxis1999/dev_iommu_uapi 
>
> Changelog (v1->v2):
> - Rename /dev/ioasid to /dev/iommu (Jason);
> - Add a section for device-centric vs. group-centric design (many);
> - Add a section for handling no-snoop DMA (Jason/Alex/Paolo);
> - Add definition of user/kernel/shared I/O page tables (Baolu/Jason);
> - Allow one device bound to multiple iommu fd's (Jason);
> - No need to track user I/O page tables in kernel on ARM/AMD (Jean/Jason);
> - Add a device cookie for iotlb invalidation and fault handling (Jean/Jason);
> - Add capability/format query interface per device cookie (Jason);
> - Specify format/attribute when creating an IOASID, leading to several v1
>   uAPI commands removed (Jason);
> - Explain the value of software nesting (Jean);
> - Replace IOASID_REGISTER_VIRTUAL_MEMORY with software nesting (David/Jason);
> - Cover software mdev usage (Jason);
> - No restriction on map/unmap vs. bind/invalidate (Jason/David);
> - Report permitted IOVA range instead of reserved range (David);
> - Refine the sample structures and helper functions (Jason);
> - Add definition of default and non-default I/O address spaces;
> - Expand and clarify the design for PASID virtualization;
> - and lots of subtle refinement according to above changes;
>
> TOC
> 
> 1. Terminologies and Concepts
> 1.1. Manage I/O address space
> 1.2. Attach device to I/O address space
> 1.3. Group isolation
> 1.4. PASID virtualization
> 1.4.1. Devices which don't support DMWr
> 1.4.2. Devices which support DMWr
> 1.4.3. Mix different types together
> 1.4.4. User sequence
> 1.5. No-snoop DMA
> 2. uAPI Proposal
> 2.1. /dev/iommu uAPI
> 2.2. /dev/vfio device uAPI
> 2.3. /dev/kvm uAPI
> 3. Sample Structures and Helper Functions
> 4. Use Cases and Flows
> 4.1. A simple example
> 4.2. Multiple IOASIDs (no nesting)
> 4.3. IOASID nesting (software)
> 4.4. IOASID nesting (hardware)
> 4.5. Guest SVA (vSVA)
> 4.6. I/O page fault
> 
>
> 1. Terminologies and Concepts
> -
>
> IOMMU fd is the container holding multiple I/O address spaces. User 
> manages those address spaces through fd operations. Multiple fd's are 
> allowed per process, but with this proposal one fd should be sufficient for 
> all intended usages.
>
> IOASID is the fd-local software handle representing an I/O address space. 
> Each IOASID is associated with a single I/O page table. IOASIDs can be 
> nested together, implying the output address from one I/O page table 
> (represented by child IOASID) must be further translated by another I/O 
> page table (represented by parent IOASID).
>
> An I/O address space takes effect only after it is attached by a device. 
> One device is allowed to attach to multiple I/O address spaces. One I/O 
> address space can be attached by multiple devices.
>
> Device must be bound to an IOMMU fd before attach operation can be
> conducted. Though not necessary, user could bind one device to multiple
> IOMMU FD's. But no cross-FD IOASID nesting is allowed.
>
> The format of an I/O page table must be compatible to the attached 
> devices (or more specifically to the IOMMU which serves the DMA from
> the attached devices). User is responsible for specifying the format
> when allocating an IOASID, according to one or multiple devices which
> will be attached right after. Attaching a device to an IOASID with 
> incompatible format is simply rejected.
>
> Relationship between IOMMU fd, VFIO fd and KVM fd:
>
> -   IOMMU fd provides uAPI for managing IOASIDs and I/O page tables. 
> It also provides an unified capability/format reporting interface for
> each bound device. 
>
> -   VFIO fd provides uAPI for device binding and attaching. In this proposal 
> VFIO is used as the example of device passthrough frameworks. The
> routing information that identifies an I/O 

Re: [PATCH v10 01/17] iova: Export alloc_iova_fast() and free_iova_fast()

2021-08-04 Thread Robin Murphy

On 2021-08-04 06:02, Yongji Xie wrote:

On Tue, Aug 3, 2021 at 6:54 PM Robin Murphy  wrote:


On 2021-08-03 09:54, Yongji Xie wrote:

On Tue, Aug 3, 2021 at 3:41 PM Jason Wang  wrote:



在 2021/7/29 下午3:34, Xie Yongji 写道:

Export alloc_iova_fast() and free_iova_fast() so that
some modules can use it to improve iova allocation efficiency.



It's better to explain why alloc_iova() is not sufficient here.



Fine.


What I fail to understand from the later patches is what the IOVA domain
actually represents. If the "device" is a userspace process then
logically the "IOVA" would be the userspace address, so presumably
somewhere you're having to translate between this arbitrary address
space and actual usable addresses - if you're worried about efficiency
surely it would be even better to not do that?



Yes, userspace daemon needs to translate the "IOVA" in a DMA
descriptor to the VA (from mmap(2)). But this actually doesn't affect
performance since it's an identical mapping in most cases.


I'm not familiar with the vhost_iotlb stuff, but it looks suspiciously 
like you're walking yet another tree to make those translations. Even if 
the buffer can be mapped all at once with a fixed offset such that each 
DMA mapping call doesn't need a lookup for each individual "IOVA" - that 
might be what's happening already, but it's a bit hard to follow just 
reading the patches in my mail client - vhost_iotlb_add_range() doesn't 
look like it's super-cheap to call, and you're serialising on a lock for 
that.


My main point, though, is that if you've already got something else 
keeping track of the actual addresses, then the way you're using an 
iova_domain appears to be something you could do with a trivial bitmap 
allocator. That's why I don't buy the efficiency argument. The main 
design points of the IOVA allocator are to manage large address spaces 
while trying to maximise spatial locality to minimise the underlying 
pagetable usage, and allocating with a flexible limit to support 
multiple devices with different addressing capabilities in the same 
address space. If none of those aspects are relevant to the use-case - 
which AFAICS appears to be true here - then as a general-purpose 
resource allocator it's rubbish and has an unreasonably massive memory 
overhead and there are many, many better choices.
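
(For concreteness, a rough sketch of the sort of "trivial bitmap allocator" being
alluded to here, assuming a fixed bounce region carved into page-sized slots;
struct bounce_pool and both functions are invented for the example and are not
VDUSE or SWIOTLB code:)

#include <linux/bitmap.h>
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/spinlock.h>

struct bounce_pool {
        unsigned long *map;             /* one bit per page-sized slot */
        unsigned long nr_slots;
        unsigned long base;             /* first bounce "IOVA" covered by the pool */
        spinlock_t lock;
};

/* Returns a bounce address, or 0 on failure (0 is reserved in this sketch). */
static unsigned long bounce_alloc(struct bounce_pool *pool, size_t size)
{
        unsigned long nr = DIV_ROUND_UP(size, PAGE_SIZE);
        unsigned long slot, flags;

        spin_lock_irqsave(&pool->lock, flags);
        slot = bitmap_find_next_zero_area(pool->map, pool->nr_slots, 0, nr, 0);
        if (slot < pool->nr_slots)
                bitmap_set(pool->map, slot, nr);
        spin_unlock_irqrestore(&pool->lock, flags);

        return slot < pool->nr_slots ? pool->base + slot * PAGE_SIZE : 0;
}

static void bounce_free(struct bounce_pool *pool, unsigned long addr, size_t size)
{
        unsigned long nr = DIV_ROUND_UP(size, PAGE_SIZE);
        unsigned long slot = (addr - pool->base) / PAGE_SIZE;
        unsigned long flags;

        spin_lock_irqsave(&pool->lock, flags);
        bitmap_clear(pool->map, slot, nr);
        spin_unlock_irqrestore(&pool->lock, flags);
}

(Compared with an iova_domain, something like this carries no rbtree or per-CPU
rcache machinery, which is the memory-overhead point being made above.)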


FWIW I've recently started thinking about moving all the caching stuff 
out of iova_domain and into the iommu-dma layer since it's now a giant 
waste of space for all the other current IOVA users.



Presumably userspace doesn't have any concern about alignment and the
things we have to worry about for the DMA API in general, so it's pretty
much just allocating slots in a buffer, and there are far more effective
ways to do that than a full-blown address space manager.


Considering iova allocation efficiency, I think the iova allocator is
better here. In most cases, we don't even need to hold a spin lock
during iova allocation.


If you're going
to reuse any infrastructure I'd have expected it to be SWIOTLB rather
than the IOVA allocator. Because, y'know, you're *literally implementing
a software I/O TLB* ;)



But actually what we can reuse in SWIOTLB is the IOVA allocator.


Huh? Those are completely unrelated and orthogonal things - SWIOTLB does 
not use an external allocator (see find_slots()). By SWIOTLB I mean 
specifically the library itself, not dma-direct or any of the other 
users built around it. The functionality for managing slots in a buffer 
and bouncing data in and out can absolutely be reused - that's why users 
like the Xen and iommu-dma code *are* reusing it instead of open-coding 
their own versions.



And
the IOVA management in SWIOTLB is not what we want. For example,
SWIOTLB allocates and uses contiguous memory for bouncing, which is
not necessary in VDUSE case.


alloc_iova() allocates a contiguous (in IOVA address) region of space. 
In vduse_domain_map_page() you use it to allocate a contiguous region of 
space from your bounce buffer. Can you clarify how that is fundamentally 
different from allocating a contiguous region of space from a bounce 
buffer? Nobody's saying the underlying implementation details of where 
the buffer itself comes from can't be tweaked.



And VDUSE needs coherent mapping which is
not supported by the SWIOTLB. Besides, the SWIOTLB works in singleton
mode (designed for platform IOMMU), but VDUSE is based on on-chip
IOMMU (supports multiple instances).

That's not entirely true - the IOMMU bounce buffering scheme introduced
in intel-iommu and now moved into the iommu-dma layer was already a step 
towards something conceptually similar. It does still rely on stealing 
the underlying pages from the global SWIOTLB pool at the moment, but the 
bouncing is effectively done in a per-IOMMU-domain context.


The next step is currently queued in linux-next, wherein we can now have 
individual per-device SWIOTLB pools. In fact at that point I think you 
might actually 

Re: [RFC v2] /dev/iommu uAPI proposal

2021-08-04 Thread Jason Gunthorpe via iommu
On Tue, Aug 03, 2021 at 11:58:54AM +1000, David Gibson wrote:
> > I'd rather deduce the endpoint from a collection of devices than the
> > other way around...
> 
> Which I think is confusing, and in any case doesn't cover the case of
> one "device" with multiple endpoints.

Well they are both confusing, and I'd prefer to focus on the common
case without extra mandatory steps. Exposing optional endpoint sharing
information seems more in line with where everything is going than
making endpoint sharing a first class object.

AFAIK a device with multiple endpoints where those endpoints are
shared with other devices doesn't really exist or isn't useful? Eg PASID
has multiple RIDs but they are not shared.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC v2] /dev/iommu uAPI proposal

2021-08-04 Thread Jason Gunthorpe via iommu
On Mon, Aug 02, 2021 at 02:49:44AM +, Tian, Kevin wrote:

> Can you elaborate? IMO the user only cares about the label (device cookie 
> plus optional vPASID) which is generated by itself when doing the attaching
> call, and expects this virtual label being used in various spots 
> (invalidation,
> page fault, etc.). How the system labels the traffic (the physical RID or RID+
> PASID) should be completely invisible to userspace.

I don't think that is true if the vIOMMU driver is also emulating
PASID. Presumably the same is true for other PASID-like schemes.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v10 10/17] virtio: Handle device reset failure in register_virtio_device()

2021-08-04 Thread Yongji Xie
On Wed, Aug 4, 2021 at 4:54 PM Jason Wang  wrote:
>
>
> 在 2021/8/4 下午4:50, Yongji Xie 写道:
> > On Wed, Aug 4, 2021 at 4:32 PM Jason Wang  wrote:
> >>
> >> 在 2021/8/3 下午5:38, Yongji Xie 写道:
> >>> On Tue, Aug 3, 2021 at 4:09 PM Jason Wang  wrote:
>  在 2021/7/29 下午3:34, Xie Yongji 写道:
> > The device reset may fail in virtio-vdpa case now, so add checks to
> > its return value and fail the register_virtio_device().
>  So the reset() would be called by the driver during remove as well, or
>  is it sufficient to deal only with the reset during probe?
> 
> >>> Actually there is no way to handle failure during removal. And it
> >>> should be safe with the protection of software IOTLB even if the
> >>> reset() fails.
> >>>
> >>> Thanks,
> >>> Yongji
> >>
> >> If this is true, does it mean we don't even need to care about reset
> >> failure?
> >>
> > But we need to handle the failure in the vhost-vdpa case, isn't it?
>
>
> Yes, but:
>
> - This patch is for virtio not for vhost, if we don't care virtio, we
> can avoid the changes
> - For vhost, there could be two ways probably:
>
> 1) let the set_status to report error
> 2) require userspace to re-read for status
>
> It looks to me you want to go with 1) and I'm not sure whether or not
> it's too late to go with 2).
>

Looks like 2) can't work if reset failure happens in
vhost_vdpa_release() and vhost_vdpa_open().

Thanks,
Yongji
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v10 10/17] virtio: Handle device reset failure in register_virtio_device()

2021-08-04 Thread Jason Wang


在 2021/8/4 下午4:50, Yongji Xie 写道:

On Wed, Aug 4, 2021 at 4:32 PM Jason Wang  wrote:


在 2021/8/3 下午5:38, Yongji Xie 写道:

On Tue, Aug 3, 2021 at 4:09 PM Jason Wang  wrote:

在 2021/7/29 下午3:34, Xie Yongji 写道:

The device reset may fail in virtio-vdpa case now, so add checks to
its return value and fail the register_virtio_device().

So the reset() would be called by the driver during remove as well, or
is it sufficient to deal only with the reset during probe?


Actually there is no way to handle failure during removal. And it
should be safe with the protection of software IOTLB even if the
reset() fails.

Thanks,
Yongji


If this is true, does it mean we don't even need to care about reset
failure?


But we need to handle the failure in the vhost-vdpa case, isn't it?



Yes, but:

- This patch is for virtio not for vhost, if we don't care virtio, we 
can avoid the changes

- For vhost, there could be two ways probably:

1) let the set_status to report error
2) require userspace to re-read for status

It looks to me you want to go with 1) and I'm not sure whether or not 
it's too late to go with 2).


Thanks




Thanks,
Yongji



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v10 10/17] virtio: Handle device reset failure in register_virtio_device()

2021-08-04 Thread Yongji Xie
On Wed, Aug 4, 2021 at 4:32 PM Jason Wang  wrote:
>
>
> 在 2021/8/3 下午5:38, Yongji Xie 写道:
> > On Tue, Aug 3, 2021 at 4:09 PM Jason Wang  wrote:
> >>
> >> 在 2021/7/29 下午3:34, Xie Yongji 写道:
> >>> The device reset may fail in virtio-vdpa case now, so add checks to
> >>> its return value and fail the register_virtio_device().
> >>
> >> So the reset() would be called by the driver during remove as well, or
> >> is it sufficient to deal only with the reset during probe?
> >>
> > Actually there is no way to handle failure during removal. And it
> > should be safe with the protection of software IOTLB even if the
> > reset() fails.
> >
> > Thanks,
> > Yongji
>
>
> If this is true, does it mean we don't even need to care about reset
> failure?
>

But we need to handle the failure in the vhost-vdpa case, isn't it?

Thanks,
Yongji
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v10 05/17] vhost-vdpa: Fail the vhost_vdpa_set_status() on reset failure

2021-08-04 Thread Jason Wang


在 2021/8/3 下午5:50, Yongji Xie 写道:

On Tue, Aug 3, 2021 at 4:10 PM Jason Wang  wrote:


在 2021/7/29 下午3:34, Xie Yongji 写道:

Re-read the device status to ensure it's set to zero during
resetting. Otherwise, fail the vhost_vdpa_set_status() after timeout.

Signed-off-by: Xie Yongji 
---
   drivers/vhost/vdpa.c | 11 ++-
   1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index b07aa161f7ad..dd05c1e1133c 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -157,7 +157,7 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 
__user *statusp)
   struct vdpa_device *vdpa = v->vdpa;
   const struct vdpa_config_ops *ops = vdpa->config;
   u8 status, status_old;
- int nvqs = v->nvqs;
+ int timeout = 0, nvqs = v->nvqs;
   u16 i;

if (copy_from_user(&status, statusp, sizeof(status)))
@@ -173,6 +173,15 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 
__user *statusp)
   return -EINVAL;

   ops->set_status(vdpa, status);
+ if (status == 0) {
+ while (ops->get_status(vdpa)) {
+ timeout += 20;
+ if (timeout > VDPA_RESET_TIMEOUT_MS)
+ return -EIO;
+
+ msleep(20);
+ }


The spec has introduced reset as one of the basic facilities. And consider
that we handle reset differently here.

This makes me think if it's better to introduce a dedicated vdpa ops for
reset?


Do you mean replace the ops.set_status(vdev, 0) with the ops.reset()?
Then we can remove the timeout processing which is device specific
stuff.



Exactly.

Thanks




Thanks,
Yongji



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v10 10/17] virtio: Handle device reset failure in register_virtio_device()

2021-08-04 Thread Jason Wang


在 2021/8/3 下午5:38, Yongji Xie 写道:

On Tue, Aug 3, 2021 at 4:09 PM Jason Wang  wrote:


在 2021/7/29 下午3:34, Xie Yongji 写道:

The device reset may fail in virtio-vdpa case now, so add checks to
its return value and fail the register_virtio_device().


So the reset() would be called by the driver during remove as well, or
is it sufficient to deal only with the reset during probe?


Actually there is no way to handle failure during removal. And it
should be safe with the protection of software IOTLB even if the
reset() fails.

Thanks,
Yongji



If this is true, does it mean we don't even need to care about reset 
failure?


Thanks


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v10 04/17] vdpa: Fail the vdpa_reset() if fail to set device status to zero

2021-08-04 Thread Jason Wang


在 2021/8/3 下午5:31, Yongji Xie 写道:

On Tue, Aug 3, 2021 at 3:58 PM Jason Wang  wrote:


在 2021/7/29 下午3:34, Xie Yongji 写道:

Re-read the device status to ensure it's set to zero during
resetting. Otherwise, fail the vdpa_reset() after timeout.

Signed-off-by: Xie Yongji 
---
   include/linux/vdpa.h | 15 ++-
   1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 406d53a606ac..d1a80ef05089 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -6,6 +6,7 @@
   #include 
   #include 
   #include 
+#include 

   /**
* struct vdpa_calllback - vDPA callback definition.
@@ -340,12 +341,24 @@ static inline struct device *vdpa_get_dma_dev(struct 
vdpa_device *vdev)
   return vdev->dma_dev;
   }

-static inline void vdpa_reset(struct vdpa_device *vdev)
+#define VDPA_RESET_TIMEOUT_MS 1000
+
+static inline int vdpa_reset(struct vdpa_device *vdev)
   {
   const struct vdpa_config_ops *ops = vdev->config;
+ int timeout = 0;

   vdev->features_valid = false;
   ops->set_status(vdev, 0);
+ while (ops->get_status(vdev)) {
+ timeout += 20;
+ if (timeout > VDPA_RESET_TIMEOUT_MS)
+ return -EIO;
+
+ msleep(20);
+ }


I wonder if it's better to do this in the vDPA parent?

Thanks


Sorry, I didn't get you here. Do you mean vDPA parent driver (e.g.
VDUSE)?



Yes, since how it's expected to behave depends on the specific hardware.

Even for the spec, the behavior is transport specific:

PCI: requires reread until 0
MMIO: doesn't require but it might not work for the hardware so we 
decide to change

CCW: the success of the ccw command means the success of the reset

Thanks



Actually I didn't find any other place where I can do
set_status() and get_status().

Thanks,
Yongji



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
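
To make the suggestion above concrete, a sketch of what vdpa_reset() could look like
if a dedicated op were added; the int (*reset)(struct vdpa_device *) member of struct
vdpa_config_ops is hypothetical here, the rest mirrors the quoted vdpa.h hunk (same
includes assumed), and this is only an illustration of the idea, not a merged API:

static inline int vdpa_reset(struct vdpa_device *vdev)
{
        const struct vdpa_config_ops *ops = vdev->config;

        vdev->features_valid = false;

        /* Parents that can fail or time out implement ->reset() and own
         * whatever device-specific waiting is needed. */
        if (ops->reset)
                return ops->reset(vdev);

        /* Fallback for parents that only provide set_status(). */
        ops->set_status(vdev, 0);
        return ops->get_status(vdev) ? -EIO : 0;
}

The parent driver (e.g. a hardware vDPA driver or VDUSE) would then carry the
transport-specific polling and timeout discussed above, rather than the core helper.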

Re: [PATCH v10 02/17] file: Export receive_fd() to modules

2021-08-04 Thread Jason Wang


在 2021/8/3 下午5:01, Yongji Xie 写道:

On Tue, Aug 3, 2021 at 3:46 PM Jason Wang  wrote:


在 2021/7/29 下午3:34, Xie Yongji 写道:

Export receive_fd() so that some modules can use
it to pass file descriptor between processes without
missing any security stuffs.

Signed-off-by: Xie Yongji 
---
   fs/file.c| 6 ++
   include/linux/file.h | 7 +++
   2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index 86dc9956af32..210e540672aa 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -1134,6 +1134,12 @@ int receive_fd_replace(int new_fd, struct file *file, 
unsigned int o_flags)
   return new_fd;
   }

+int receive_fd(struct file *file, unsigned int o_flags)
+{
+ return __receive_fd(file, NULL, o_flags);


Any reason that receive_fd_user() can live in the file.h?


Since no modules use it.

Thanks,
Yongji



Ok.


Acked-by: Jason Wang 






___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [RFC v1 6/8] mshv: command line option to skip devices in PV-IOMMU

2021-08-04 Thread Praveen Kumar
On 04-08-2021 03:26, Wei Liu wrote:
>>> struct iommu_domain domain;
>>> @@ -774,6 +784,41 @@ static struct iommu_device 
>>> *hv_iommu_probe_device(struct device *dev)
>>> if (!dev_is_pci(dev))
>>> return ERR_PTR(-ENODEV);
>>>  
>>> +   /*
>>> +* Skip the PCI device specified in `pci_devs_to_skip`. This is a
>>> +* temporary solution until we figure out a way to extract information
>>> +* from the hypervisor what devices it is already using.
>>> +*/
>>> +   if (pci_devs_to_skip && *pci_devs_to_skip) {
>>> +   int pos = 0;
>>> +   int parsed;
>>> +   int segment, bus, slot, func;
>>> +   struct pci_dev *pdev = to_pci_dev(dev);
>>> +
>>> +   do {
>>> +   parsed = 0;
>>> +
>>> +   sscanf(pci_devs_to_skip + pos,
>>> +   " (%x:%x:%x.%x) %n",
>>> +   , , , , );
>>> +
>>> +   if (parsed <= 0)
>>> +   break;
>>> +
>>> +   if (pci_domain_nr(pdev->bus) == segment &&
>>> +   pdev->bus->number == bus &&
>>> +   PCI_SLOT(pdev->devfn) == slot &&
>>> +   PCI_FUNC(pdev->devfn) == func)
>>> +   {
>>> +   dev_info(dev, "skipped by MSHV IOMMU\n");
>>> +   return ERR_PTR(-ENODEV);
>>> +   }
>>> +
>>> +   pos += parsed;
>>> +
>>> +   } while (pci_devs_to_skip[pos]);
>>
>> Is there a possibility of pci_devs_to_skip + pos > sizeof(pci_devs_to_skip)
>> and also a valid memory ?
> 
> pos should point to the last parsed position. If parsing fails pos does
> not get updated and the code breaks out of the loop. If parsing is
> success pos should point to either the start of next element of '\0'
> (end of string). To me this is good enough.

The point is that, hypothetically, pci_devs_to_skip + pos could still be a valid
address beyond the terminating '\0', so there is a possibility that parsing does
not fail. There is also a possibility of sscanf faulting on an illegal access if
pci_devs_to_skip[pos] turns out not to be a NUL terminator or valid memory.

> 
>> I would recommend to have a check of size as well before accessing the
>> array content, just to be safer accessing any memory.
>>
> 
> What check do you have in mind?

Something like,
size_t len = strlen(pci_devs_to_skip);
do {

len -= parsed;
} while (len);

OR

do {
...
pos += parsed;
} while (pos < len);

Further, I'm also fine with the existing code, if you think this won't break
and has already been taken care of. Thanks.

Regards,

~Praveen.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
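
For what it's worth, a bounds-checked variant of the quoted parsing loop along the
lines suggested above might look like the sketch below; it reuses pci_devs_to_skip
and the sscanf format from the quoted hunk, while hv_iommu_dev_skipped() is an
invented helper name, so treat it purely as an illustration:

#include <linux/kernel.h>
#include <linux/pci.h>
#include <linux/string.h>

/* pci_devs_to_skip is the command-line string defined in the RFC patch. */
extern char *pci_devs_to_skip;

static bool hv_iommu_dev_skipped(struct pci_dev *pdev)
{
        size_t len = pci_devs_to_skip ? strlen(pci_devs_to_skip) : 0;
        size_t pos = 0;

        while (pos < len) {
                int segment, bus, slot, func, parsed = 0;

                /* Stop on a malformed entry rather than walking past it. */
                if (sscanf(pci_devs_to_skip + pos, " (%x:%x:%x.%x) %n",
                           &segment, &bus, &slot, &func, &parsed) < 4 ||
                    parsed <= 0)
                        break;

                if (pci_domain_nr(pdev->bus) == segment &&
                    pdev->bus->number == bus &&
                    PCI_SLOT(pdev->devfn) == slot &&
                    PCI_FUNC(pdev->devfn) == func)
                        return true;

                pos += parsed;  /* parsed never exceeds the remaining length */
        }

        return false;
}

With the pos < len bound and the parsed <= 0 check, the loop cannot index past the
terminating NUL, which is the concern raised in the thread.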


Re: [RFC v1 5/8] mshv: add paravirtualized IOMMU support

2021-08-04 Thread Praveen Kumar
On 04-08-2021 03:17, Wei Liu wrote:
>>> +static size_t hv_iommu_unmap(struct iommu_domain *d, unsigned long iova,
>>> +  size_t size, struct iommu_iotlb_gather *gather)
>>> +{
>>> +   size_t unmapped;
>>> +   struct hv_iommu_domain *domain = to_hv_iommu_domain(d);
>>> +   unsigned long flags, npages;
>>> +   struct hv_input_unmap_device_gpa_pages *input;
>>> +   u64 status;
>>> +
>>> +   unmapped = hv_iommu_del_mappings(domain, iova, size);
>>> +   if (unmapped < size)
>>> +   return 0;
>> Is there a case where unmapped > 0 && unmapped < size ?
>>
> There could be such a case -- hv_iommu_del_mappings' return value is >= 0.
> Is there a problem with this predicate?

As I understand it, if we are unmapping and return 0, that means nothing was
unmapped, so won't that cause corruption or an illegal access of the unmapped
memory later?
From __iommu_unmap
...
	while (unmapped < size) {
		size_t pgsize = iommu_pgsize(domain, iova, size - unmapped);

		unmapped_page = ops->unmap(domain, iova, pgsize, iotlb_gather);
		if (!unmapped_page)
			break;   <<< we just break here, thinking there is nothing
			unmapped, but actually hv_iommu_del_mappings has removed
			some pages.

		pr_debug("unmapped: iova 0x%lx size 0x%zx\n",
			 iova, unmapped_page);

		iova += unmapped_page;
		unmapped += unmapped_page;
	}

Am I missing something ?

Regards,

~Praveen.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
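
One direction for the concern above, sketched very roughly, is for the driver to
report however much hv_iommu_del_mappings() actually removed instead of collapsing
a partial unmap to 0. to_hv_iommu_domain() and hv_iommu_del_mappings() are from the
RFC patch, hv_iommu_unmap_gpa_pages() is an invented stand-in for the unmap-GPA-pages
hypercall, and whether __iommu_unmap()'s page-size accounting tolerates a return
smaller than the requested pgsize would still need checking, so this is only a
sketch of the alternative being discussed:

static size_t hv_iommu_unmap(struct iommu_domain *d, unsigned long iova,
                             size_t size, struct iommu_iotlb_gather *gather)
{
        struct hv_iommu_domain *domain = to_hv_iommu_domain(d);
        size_t unmapped = hv_iommu_del_mappings(domain, iova, size);

        if (!unmapped)
                return 0;

        /*
         * Invalidate exactly what was removed and report that amount back,
         * so __iommu_unmap() can account for partial progress instead of
         * assuming nothing was unmapped.
         */
        hv_iommu_unmap_gpa_pages(domain, iova, unmapped);

        return unmapped;
}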