Re: [PATCH v3 03/16] iommu: introduce iommu invalidate API function

2018-01-10 Thread Jean-Philippe Brucker
On 28/12/17 19:25, Jacob Pan wrote:
[...]
>>> + * @size:  2^size of 4K pages, 0 for 4k, 9 for 2MB,
>>> etc.  
>>
>> Having only power of two invalidation seems too restrictive for a
>> software interface. You might have the same problem as above, where
>> the guest or userspace needs to send lots of invalidation requests,
>> They could be multiplexed by passing an arbitrary range instead. How
>> about making @size a __u64?
>>
> Sure if you have such need for non power of two. So it will be __u64 of
> 4k pages?

4k granule would work for us right now, but other architectures may plan
to support arbitrary sizes. The map/unmap API does support arbitrary
sizes, so it might be better to have a byte granularity in @size.

Thanks,
Jean

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 03/16] iommu: introduce iommu invalidate API function

2017-12-28 Thread Jacob Pan
On Fri, 24 Nov 2017 12:04:31 +
Jean-Philippe Brucker  wrote:

> Hi,
> 
> On 17/11/17 18:55, Jacob Pan wrote:
> > From: "Liu, Yi L" 
> > 
> > When an SVM capable device is assigned to a guest, the first level
> > page tables are owned by the guest and the guest PASID table
> > pointer is linked to the device context entry of the physical IOMMU.
> > 
> > Host IOMMU driver has no knowledge of caching structure updates
> > unless the guest invalidation activities are passed down to the
> > host. The primary usage is derived from emulated IOMMU in the
> > guest, where QEMU can trap invalidation activities before passing
> > them down to the host/physical IOMMU.
> > Since the invalidation data are obtained from user space and will be
> > written into physical IOMMU, we must allow security check at various
> > layers. Therefore, generic invalidation data format are proposed
> > here, model specific IOMMU drivers need to convert them into their
> > own format.
> > 
> > Signed-off-by: Liu, Yi L 
> > Signed-off-by: Jacob Pan 
> > Signed-off-by: Ashok Raj   
> [...]
> >  #endif /* __LINUX_IOMMU_H */
> > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> > index 651ad5d..039ba36 100644
> > --- a/include/uapi/linux/iommu.h
> > +++ b/include/uapi/linux/iommu.h
> > @@ -36,4 +36,66 @@ struct pasid_table_config {
> > };
> >  };
> >  
> > +enum iommu_inv_granularity {
> > +   IOMMU_INV_GRANU_GLOBAL, /* all TLBs
> > invalidated */
> > +   IOMMU_INV_GRANU_DOMAIN, /* all TLBs
> > associated with a domain */
> > +   IOMMU_INV_GRANU_DEVICE, /* caching
> > structure associated with a
> > +* device ID
> > +*/  
> 
> I thought you were planning on removing these? If we do need global
> invalidation, for example the guest clears the whole PASID table and
> doesn't want to send individual GRANU_ALL_PASID invalidations, maybe
> keep only GRANU_DOMAIN?
> 
yes, we can remove global and keep domain & pasid.
> > +   IOMMU_INV_GRANU_DOMAIN_PAGE,/* address range with
> > a domain */
> > +   IOMMU_INV_GRANU_ALL_PASID,  /* cache of a given
> > PASID */
> > +   IOMMU_INV_GRANU_PASID_SEL,  /* only invalidate
> > specified PASID */  
> 
> GRANU_PASID_SEL seems redundant, don't you already get it by default
> with GRANU_ALL_PASID and GRANU_DOMAIN_PAGE (with
> IOMMU_INVALIDATE_PASID_TAGGED flag)?
> 
yes, you can deduce from certain combinations of flags. My thinking
was for an easy look up from generic flags to model specific
fields. Same as the one below. I will try to consolidate based on your
input in the next version.
> > +
> > +   IOMMU_INV_GRANU_NG_ALL_PASID,   /* non-global within
> > all PASIDs */
> > +   IOMMU_INV_GRANU_NG_PASID,   /* non-global within a
> > PASIDs */  
> 
> Don't you get the "NG" behavior by not passing the
> IOMMU_INVALIDATE_GLOBAL_PAGE flag defined below?
> 
> > +   IOMMU_INV_GRANU_PAGE_PASID, /* page-selective
> > within a PASID */  
> 
> And don't you get this with
> GRANU_DOMAIN_PAGE+IOMMU_INVALIDATE_PASID_TAGGED?
> 
> > +   IOMMU_INV_NR_GRANU,
> > +};
> > +
> > +enum iommu_inv_type {
> > +   IOMMU_INV_TYPE_DTLB,/* device IOTLB */
> > +   IOMMU_INV_TYPE_TLB, /* IOMMU paging structure cache
> > */
> > +   IOMMU_INV_TYPE_PASID,   /* PASID cache */
> > +   IOMMU_INV_TYPE_CONTEXT, /* device context entry
> > cache */
> > +   IOMMU_INV_NR_TYPE
> > +};  
> 
> When the guest removes a PASID entry, it would have to send DTLB, TLB
> and PASID invalidations separately? Could we define this inv_type as
> cumulative, to avoid redundant invalidation requests:
> 
That is a good idea, but it will require some change to VT-d driver.
For emulated IOMMU and current VT-d driver, we do send separate
requests for PASID cache, followed by IOTLB/DTLB invalidation. But we do
have a caching mode capability bit to tell the driver whether it is
running on a real IOMMU or not. So we can combine and reduce
invalidation overhead as you said below. Not sure about AMD though?

> * TYPE_DTLB only invalidates ATC entries.
> * TYPE_TLB invalidates both ATC and IOTLB entries.
> * TYPE_PASID invalidates all ATC and IOTLB entries for a PASID, and
> also the PASID cache entry.
Sounds good to me.

> * TYPE_CONTEXT invalidates all. Although is it needed by userspace or
> just here fore completeness? "CONTEXT" is specific to VT-d (doesn't
> exist on AMD and has a different meaning on SMMU), how about "DEVICE"
> instead?
It is here for completeness. context entry is set during bind/unbind
pasid table call. I can remove it for now.
> 
> This is important because invalidation will probably become the
> bottleneck. The guest shouldn't have to send DTLB and TLB invalidation
> separately after each unmapping.
> 
Agreed, i will change the VT-d driver to accommodate that. i.e. For

Re: [PATCH v3 03/16] iommu: introduce iommu invalidate API function

2017-12-15 Thread Jean-Philippe Brucker
A quick update on invalidations before I leave for holidays, since we're
struggling to define useful semantics. I worked on the virtio-iommu
prototype for vSVA, so I tried to pin down what I think is needed for vSVA
invalidation in the host. I don't know whether the VT-d and AMD emulations
can translate all of this from guest commands.

Scope selects which entries are invalidated, and flags cherry-pick what
caches to invalidate. For example a guest might remove GBs of sparse
mappings, and decide that it would be quicker to invalidate the whole
context instead of one at a time. Then it would set only flags = (TLB |
DEV_TLB) with scope = PASID. If the guest clears one entry in the PASID
table, then it would send scope = PASID and flags = (LEAF | CONFIG | TLB |
DEV_TLB). On an ARM system the guest can invalidate TLBs with CPU
instructions, but can't invalidate ATCs. So it would send an invalidate
with flags = (LEAF | TLB) and scope = VA.

enum iommu_sva_inval_scope {
IOMMU_INVALIDATE_DOMAIN = 1,
IOMMU_INVALIDATE_PASID,
IOMMU_INVALIDATE_VA,
};

/* Only invalidate leaf entry. Applies to PASID table if scope == PASID or
 * page tables if scope == VA. */
#define IOMMU_INVALIDATE_LEAF   (1 << 0)
/* Invalidate cached PASID table configuration */
#define IOMMU_INVALIDATE_CONFIG (1 << 1)
/* Invalidate IOTLBs */
#define IOMMU_INVALIDATE_TLB(1 << 2)
/* Invalidate ATCs */
#define IOMMU_INVALIDATE_DEV_TLB(1 << 3)
/* + Need a global flag? */

struct iommu_sva_invalidate {
enum iommu_sva_inval_scope  scope;
u32 flags;
u32 pasid;
u64 iova;
u64 size;
/* Arch-specific, format is determined at bind time */
union {
struct {
u16 asid;
u8  granule;
} arm;
}
};

ARM needs two more fields. A 16-bit @asid (Address Space ID) targets TLB
entries and may be different from the PASID (up to the guest to decide),
which targets ATC and config entries.

@granule is the TLB granule that we're invalidating. For instance if the
guest just unmapped a few 2M huge pages, it sets @granule to 21 bits, so
we issue less invalidation commands, since we only need to evict huge TLB
entries. I'm not sure about other architecture but I'd be surprised if
this wasn't more common. Should we move it to the common part?


int iommu_sva_invalidate(struct iommu_domain *domain,
 struct iommu_sva_invalidate *inval);

And so the host driver implementation is roughly:
--
bool leaf   = flags & IOMMU_INVALIDATE_LEAF;
bool config = flags & IOMMU_INVALIDATE_CONFIG;
bool tlb= flags & IOMMU_INVALIDATE_TLB;
bool atc= flags & IOMMU_INVALIDATE_DEV_TLB;

if (config) {
switch (scope) {
case IOMMU_INVALIDATE_PASID:
inval_cached_pasid_entry(domain, pasid, leaf);
break;
case IOMMU_INVALIDATE_DOMAIN:
inval_all_cached_pasid_entries(domain);
break;
default:
return -EINVAL;
}

/* Wait for caches to be clean, then invalidate TLBs */
sync_commands();
}

if (tlb) {
switch (scope) {
case IOMMU_INVALIDATE_VA:
inval_tlb_entries(domain, asid, iova, size, granule,
  leaf);
break;
case IOMMU_INVALIDATE_PASID:
inval_all_tlb_entries_for_asid(domain, asid);
break;
case IOMMU_INVALIDATE_DOMAIN:
inval_all_tlb_entries(domain);
break;
default:
return -EINVAL;
}

/* Wait for TLBs to be clean, then invalidate ATCs. */
sync_commands();
}

if (atc) {
/* ATC invalidations are sent to all devices in the domain */
switch (scope) {
case IOMMU_INVALIDATE_VA:
inval_atc_entries(domain, pasid, iova, size);
break;
case IOMMU_INVALIDATE_PASID:
/* Covers the full address space */
inval_all_atc_entries_for_pasid(domain, pasid);
break;
case IOMMU_INVALIDATE_DOMAIN:
/* Set Global Invalidate */
inval_all_atc_entries(domain);
break;
default:
return -EINVAL;
}

sync_commands();
}

/* Then return to guest. */
--

I think this covers what we need and allows userspace or the guest to
gather multiple invalidations into a single request/ioctl.

I don't think per-device ATC invalidation is needed, but might be wrong.
According to ATS it is implicit when the 

Re: [PATCH v3 03/16] iommu: introduce iommu invalidate API function

2017-11-24 Thread Jean-Philippe Brucker
Hi,

On 17/11/17 18:55, Jacob Pan wrote:
> From: "Liu, Yi L" 
> 
> When an SVM capable device is assigned to a guest, the first level page
> tables are owned by the guest and the guest PASID table pointer is
> linked to the device context entry of the physical IOMMU.
> 
> Host IOMMU driver has no knowledge of caching structure updates unless
> the guest invalidation activities are passed down to the host. The
> primary usage is derived from emulated IOMMU in the guest, where QEMU
> can trap invalidation activities before passing them down to the
> host/physical IOMMU.
> Since the invalidation data are obtained from user space and will be
> written into physical IOMMU, we must allow security check at various
> layers. Therefore, generic invalidation data format are proposed here,
> model specific IOMMU drivers need to convert them into their own format.
> 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
[...]
>  #endif /* __LINUX_IOMMU_H */
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 651ad5d..039ba36 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -36,4 +36,66 @@ struct pasid_table_config {
>   };
>  };
>  
> +enum iommu_inv_granularity {
> + IOMMU_INV_GRANU_GLOBAL, /* all TLBs invalidated */
> + IOMMU_INV_GRANU_DOMAIN, /* all TLBs associated with a domain */
> + IOMMU_INV_GRANU_DEVICE, /* caching structure associated with a
> +  * device ID
> +  */

I thought you were planning on removing these? If we do need global
invalidation, for example the guest clears the whole PASID table and
doesn't want to send individual GRANU_ALL_PASID invalidations, maybe keep
only GRANU_DOMAIN?

> + IOMMU_INV_GRANU_DOMAIN_PAGE,/* address range with a domain */
> + IOMMU_INV_GRANU_ALL_PASID,  /* cache of a given PASID */
> + IOMMU_INV_GRANU_PASID_SEL,  /* only invalidate specified PASID */

GRANU_PASID_SEL seems redundant, don't you already get it by default with
GRANU_ALL_PASID and GRANU_DOMAIN_PAGE (with IOMMU_INVALIDATE_PASID_TAGGED
flag)?

> +
> + IOMMU_INV_GRANU_NG_ALL_PASID,   /* non-global within all PASIDs */
> + IOMMU_INV_GRANU_NG_PASID,   /* non-global within a PASIDs */

Don't you get the "NG" behavior by not passing the
IOMMU_INVALIDATE_GLOBAL_PAGE flag defined below?

> + IOMMU_INV_GRANU_PAGE_PASID, /* page-selective within a PASID */

And don't you get this with GRANU_DOMAIN_PAGE+IOMMU_INVALIDATE_PASID_TAGGED?

> + IOMMU_INV_NR_GRANU,
> +};
> +
> +enum iommu_inv_type {
> + IOMMU_INV_TYPE_DTLB,/* device IOTLB */
> + IOMMU_INV_TYPE_TLB, /* IOMMU paging structure cache */
> + IOMMU_INV_TYPE_PASID,   /* PASID cache */
> + IOMMU_INV_TYPE_CONTEXT, /* device context entry cache */
> + IOMMU_INV_NR_TYPE
> +};

When the guest removes a PASID entry, it would have to send DTLB, TLB and
PASID invalidations separately? Could we define this inv_type as
cumulative, to avoid redundant invalidation requests:

* TYPE_DTLB only invalidates ATC entries.
* TYPE_TLB invalidates both ATC and IOTLB entries.
* TYPE_PASID invalidates all ATC and IOTLB entries for a PASID, and also
the PASID cache entry.
* TYPE_CONTEXT invalidates all. Although is it needed by userspace or just
here fore completeness? "CONTEXT" is specific to VT-d (doesn't exist on
AMD and has a different meaning on SMMU), how about "DEVICE" instead?

This is important because invalidation will probably become the
bottleneck. The guest shouldn't have to send DTLB and TLB invalidation
separately after each unmapping.

> +/**
> + * Translation cache invalidation header that contains mandatory meta data.
> + * @version: info format version, expecting future extesions
> + * @type:type of translation cache to be invalidated
> + */
> +struct tlb_invalidate_hdr {
> + __u32 version;
> +#define TLB_INV_HDR_VERSION_1 1
> + enum iommu_inv_type type;
> +};
> +
> +/**
> + * Translation cache invalidation information, contains generic IOMMU
> + * data which can be parsed based on model ID by model specific drivers.
> + *
> + * @granularity: requested invalidation granularity, type dependent
> + * @size:2^size of 4K pages, 0 for 4k, 9 for 2MB, etc.

Having only power of two invalidation seems too restrictive for a software
interface. You might have the same problem as above, where the guest or
userspace needs to send lots of invalidation requests, They could be
multiplexed by passing an arbitrary range instead. How about making @size
a __u64?

> + * @pasid:   processor address space ID value per PCI spec.
> + * @addr:page address to be invalidated
> + * @flagsIOMMU_INVALIDATE_PASID_TAGGED: DMA with PASID tagged,
> + *  

[PATCH v3 03/16] iommu: introduce iommu invalidate API function

2017-11-17 Thread Jacob Pan
From: "Liu, Yi L" 

When an SVM capable device is assigned to a guest, the first level page
tables are owned by the guest and the guest PASID table pointer is
linked to the device context entry of the physical IOMMU.

Host IOMMU driver has no knowledge of caching structure updates unless
the guest invalidation activities are passed down to the host. The
primary usage is derived from emulated IOMMU in the guest, where QEMU
can trap invalidation activities before passing them down to the
host/physical IOMMU.
Since the invalidation data are obtained from user space and will be
written into physical IOMMU, we must allow security check at various
layers. Therefore, generic invalidation data format are proposed here,
model specific IOMMU drivers need to convert them into their own format.

Signed-off-by: Liu, Yi L 
Signed-off-by: Jacob Pan 
Signed-off-by: Ashok Raj 
---
 drivers/iommu/iommu.c  | 14 +++
 include/linux/iommu.h  | 12 +
 include/uapi/linux/iommu.h | 62 ++
 3 files changed, 88 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index c7e0d64..829e9e9 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1341,6 +1341,20 @@ void iommu_unbind_pasid_table(struct iommu_domain 
*domain, struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_unbind_pasid_table);
 
+int iommu_sva_invalidate(struct iommu_domain *domain,
+   struct device *dev, struct tlb_invalidate_info *inv_info)
+{
+   int ret = 0;
+
+   if (unlikely(!domain->ops->sva_invalidate))
+   return -ENODEV;
+
+   ret = domain->ops->sva_invalidate(domain, dev, inv_info);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_invalidate);
+
 static void __iommu_detach_device(struct iommu_domain *domain,
  struct device *dev)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 0f6f6c5..da684a7 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -190,6 +190,7 @@ struct iommu_resv_region {
  * @pgsize_bitmap: bitmap of all possible supported page sizes
  * @bind_pasid_table: bind pasid table pointer for guest SVM
  * @unbind_pasid_table: unbind pasid table pointer and restore defaults
+ * @sva_invalidate: invalidate translation caches of shared virtual address
  */
 struct iommu_ops {
bool (*capable)(enum iommu_cap);
@@ -243,6 +244,8 @@ struct iommu_ops {
struct pasid_table_config *pasidt_binfo);
void (*unbind_pasid_table)(struct iommu_domain *domain,
struct device *dev);
+   int (*sva_invalidate)(struct iommu_domain *domain,
+   struct device *dev, struct tlb_invalidate_info *inv_info);
 
unsigned long pgsize_bitmap;
 };
@@ -309,6 +312,9 @@ extern int iommu_bind_pasid_table(struct iommu_domain 
*domain,
struct device *dev, struct pasid_table_config *pasidt_binfo);
 extern void iommu_unbind_pasid_table(struct iommu_domain *domain,
struct device *dev);
+extern int iommu_sva_invalidate(struct iommu_domain *domain,
+   struct device *dev, struct tlb_invalidate_info *inv_info);
+
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 phys_addr_t paddr, size_t size, int prot);
@@ -720,6 +726,12 @@ void iommu_unbind_pasid_table(struct iommu_domain *domain, 
struct device *dev)
 {
 }
 
+static inline int iommu_sva_invalidate(struct iommu_domain *domain,
+   struct device *dev, struct tlb_invalidate_info *inv_info)
+{
+   return -EINVAL;
+}
+
 #endif /* CONFIG_IOMMU_API */
 
 #endif /* __LINUX_IOMMU_H */
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 651ad5d..039ba36 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -36,4 +36,66 @@ struct pasid_table_config {
};
 };
 
+enum iommu_inv_granularity {
+   IOMMU_INV_GRANU_GLOBAL, /* all TLBs invalidated */
+   IOMMU_INV_GRANU_DOMAIN, /* all TLBs associated with a domain */
+   IOMMU_INV_GRANU_DEVICE, /* caching structure associated with a
+* device ID
+*/
+   IOMMU_INV_GRANU_DOMAIN_PAGE,/* address range with a domain */
+   IOMMU_INV_GRANU_ALL_PASID,  /* cache of a given PASID */
+   IOMMU_INV_GRANU_PASID_SEL,  /* only invalidate specified PASID */
+
+   IOMMU_INV_GRANU_NG_ALL_PASID,   /* non-global within all PASIDs */
+   IOMMU_INV_GRANU_NG_PASID,   /* non-global within a PASIDs */
+   IOMMU_INV_GRANU_PAGE_PASID, /* page-selective within a PASID */
+   IOMMU_INV_NR_GRANU,
+};
+
+enum iommu_inv_type {
+