RE: [RFC v3 14/25] intel_iommu: add virtual command capability support

2020-02-15 Thread Liu, Yi L
> From: Peter Xu < pet...@redhat.com >
> Sent: Thursday, February 13, 2020 11:09 PM
> To: Liu, Yi L 
> Subject: Re: [RFC v3 14/25] intel_iommu: add virtual command capability 
> support
> 
> On Thu, Feb 13, 2020 at 09:31:10AM -0500, Peter Xu wrote:
> 
> [...]
> 
> > > > Apart of this: also I just noticed (when reading the latter part of
> > > > the series) that the time that a pasid table walk can consume will
> > > > depend on this value too.  I'd suggest to make this as small as we
> > > > can, as long as it satisfies the usage.  We can even bump it in the
> > > > future.
> > >
> > > I see. This looks to be an optimization. right? Instead of modify the
> > > value of this macro,  I think we can do this optimization by tracking
> > > the allocated PASIDs in QEMU. Thus, the pasid table walk  would be more
> > > efficient and also no dependency on the VTD_MAX_HPASID. Does it make
> > > sense to you? :-)
> >
> > Yeah sounds good. :)
> 
> Just to make sure it's safe even for when the global allocation is not
> happening (full emulation devices?  Do they need the PASID table walk
> too?). 

I'd say no. For fully emulated devices, we only need to keep the PASID
cache up to date (do what the guest told us). Even if an invalidation
flushes more of the cache than necessary, it only costs performance; there
is no correctness issue. This is different from passthru devices: if we
unbind too much, some passthru devices may encounter DMA faults later.

> Anyway, be careful to not miss some valid PASID entries, or we
> can still use the MIN(PASID_MAX, CONTEXT_ENTRY_SIZE) to be safe as a
> first version.  Thanks,

Agreed. The first version should be 100% safe.

Regards,
Yi Liu


Re: [RFC v3 14/25] intel_iommu: add virtual command capability support

2020-02-13 Thread Peter Xu
On Thu, Feb 13, 2020 at 09:31:10AM -0500, Peter Xu wrote:

[...]

> > > Apart of this: also I just noticed (when reading the latter part of
> > > the series) that the time that a pasid table walk can consume will
> > > depend on this value too.  I'd suggest to make this as small as we
> > > can, as long as it satisfies the usage.  We can even bump it in the
> > > future.
> > 
> > I see. This looks to be an optimization. right? Instead of modify the
> > value of this macro,  I think we can do this optimization by tracking
> > the allocated PASIDs in QEMU. Thus, the pasid table walk  would be more
> > efficient and also no dependency on the VTD_MAX_HPASID. Does it make
> > sense to you? :-)
> 
> Yeah sounds good. :)

Just to make sure it's safe even when the global allocation is not
happening (fully emulated devices?  Do they need the PASID table walk
too?).  Anyway, be careful not to miss any valid PASID entries;
otherwise we can still use MIN(PASID_MAX, CONTEXT_ENTRY_SIZE) to be safe
for a first version.  Thanks,

-- 
Peter Xu




Re: [RFC v3 14/25] intel_iommu: add virtual command capability support

2020-02-13 Thread Peter Xu
On Thu, Feb 13, 2020 at 02:40:45AM +, Liu, Yi L wrote:
> > From: Peter Xu 
> > Sent: Wednesday, February 12, 2020 5:57 AM
> > To: Liu, Yi L 
> > Subject: Re: [RFC v3 14/25] intel_iommu: add virtual command capability 
> > support
> > 
> > On Wed, Jan 29, 2020 at 04:16:45AM -0800, Liu, Yi L wrote:
> > > +/*
> > > + * The basic idea is to let hypervisor to set a range for available
> > > + * PASIDs for VMs. One of the reasons is PASID #0 is reserved by
> > > + * RID_PASID usage. We have no idea how many reserved PASIDs in future,
> > > + * so here just an evaluated value. Honestly, set it as "1" is enough
> > > + * at current stage.
> > > + */
> > > +#define VTD_MIN_HPASID  1
> > > +#define VTD_MAX_HPASID  0xF
> > 
> > One more question: I see that PASID is defined as 20bits long.  It's
> > fine.  However I start to get confused on how the Scalable Mode PASID
> > Directory could service that much of PASID entries.
> > 
> > I'm looking at spec 3.4.3, Figure 3-8.
> > 
> > Firstly, we only have two levels for a PASID table.  The context entry
> > of a device stores a pointer to the "Scalable Mode PASID Directory"
> > page. I see that there're 2^14 entries in "Scalable Mode PASID
> > Directory" page, each is a "Scalable Mode PASID Table".
> > However... how do we fit in the 4K page if each entry is a pointer of
> > x86_64 (8 bytes) while there're 2^14 entries?  A simple math gives me
> > 4K/8 = 512, which means the "Scalable Mode PASID Directory" page can
> > only have 512 entries, then how the 2^14 come from?  Hmm??
> 
> I checked with Kevin. The spec doesn't say the dir table is 4K. It says 4K
> only for pasid table. Also, if you look at 9.4, scalabe-mode context entry
> includes a PDTS field to specify the actual size of the directory table.

Ah, I see.  Then it seems to be missing from this series.  I think
vtd_sm_pasid_table_walk() should also stop walking once it reaches
that size, and you need to fetch the size from the context entry
before the walk starts.
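
To make the suggestion concrete, here is a minimal sketch of a walk that
is clamped to the directory size taken from the context entry. It is not
code from this series: demo_dir_entries() and demo_pasid_walk() are
hypothetical names, and the sketch assumes (from my reading of the VT-d
spec) that a PDTS value X means 2^(X+7) directory entries and that each
PASID table holds 64 entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: one 4K PASID table holds 64 entries. */
#define DEMO_PASIDS_PER_TABLE 64u

typedef void (*demo_pasid_visit_fn)(uint32_t pasid, void *opaque);

/* Assumption: PDTS value X (0..7) encodes 2^(X+7) directory entries. */
static inline uint32_t demo_dir_entries(uint32_t pdts)
{
    assert(pdts <= 7);
    return 1u << (pdts + 7);
}

/*
 * Visit every PASID in [start, end), but never walk past what the
 * directory described by 'pdts' can actually hold.
 * Returns the number of PASIDs visited.
 */
static inline uint32_t demo_pasid_walk(uint32_t pdts,
                                       uint32_t start, uint32_t end,
                                       demo_pasid_visit_fn visit,
                                       void *opaque)
{
    uint64_t limit = (uint64_t)demo_dir_entries(pdts) * DEMO_PASIDS_PER_TABLE;
    uint32_t pasid, visited = 0;

    if (end > limit) {
        end = (uint32_t)limit;   /* clamp to the directory capacity */
    }
    for (pasid = start; pasid < end; pasid++) {
        if (visit) {
            visit(pasid, opaque);
        }
        visited++;
    }
    return visited;
}
```

With pdts = 0 the walk never goes beyond 128 * 64 = 8192 PASIDs, no
matter how large the requested range is.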

> 
> > Apart of this: also I just noticed (when reading the latter part of
> > the series) that the time that a pasid table walk can consume will
> > depend on this value too.  I'd suggest to make this as small as we
> > can, as long as it satisfies the usage.  We can even bump it in the
> > future.
> 
> I see. This looks to be an optimization. right? Instead of modify the
> value of this macro,  I think we can do this optimization by tracking
> the allocated PASIDs in QEMU. Thus, the pasid table walk  would be more
> efficient and also no dependency on the VTD_MAX_HPASID. Does it make
> sense to you? :-)

Yeah sounds good. :)

Thanks,

-- 
Peter Xu




RE: [RFC v3 14/25] intel_iommu: add virtual command capability support

2020-02-12 Thread Liu, Yi L
> From: Peter Xu 
> Sent: Wednesday, February 12, 2020 5:57 AM
> To: Liu, Yi L 
> Subject: Re: [RFC v3 14/25] intel_iommu: add virtual command capability 
> support
> 
> On Wed, Jan 29, 2020 at 04:16:45AM -0800, Liu, Yi L wrote:
> > +/*
> > + * The basic idea is to let hypervisor to set a range for available
> > + * PASIDs for VMs. One of the reasons is PASID #0 is reserved by
> > + * RID_PASID usage. We have no idea how many reserved PASIDs in future,
> > + * so here just an evaluated value. Honestly, set it as "1" is enough
> > + * at current stage.
> > + */
> > +#define VTD_MIN_HPASID  1
> > +#define VTD_MAX_HPASID  0xF
> 
> One more question: I see that PASID is defined as 20bits long.  It's
> fine.  However I start to get confused on how the Scalable Mode PASID
> Directory could service that much of PASID entries.
> 
> I'm looking at spec 3.4.3, Figure 3-8.
> 
> Firstly, we only have two levels for a PASID table.  The context entry
> of a device stores a pointer to the "Scalable Mode PASID Directory"
> page. I see that there're 2^14 entries in "Scalable Mode PASID
> Directory" page, each is a "Scalable Mode PASID Table".
> However... how do we fit in the 4K page if each entry is a pointer of
> x86_64 (8 bytes) while there're 2^14 entries?  A simple math gives me
> 4K/8 = 512, which means the "Scalable Mode PASID Directory" page can
> only have 512 entries, then how the 2^14 come from?  Hmm??

I checked with Kevin. The spec doesn't say the directory table is 4K; it
says 4K only for the PASID table. Also, if you look at 9.4, the
scalable-mode context entry includes a PDTS field that specifies the
actual size of the directory table.
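
The arithmetic behind this can be sketched as follows. This is only an
illustration under my reading of the VT-d scalable-mode layout, not code
from the series: the macro and function names are hypothetical, and the
sizes (8-byte directory entries, 64-byte PASID table entries, PASID bits
19:6 indexing the directory, PDTS value X meaning 2^(X+7) entries) are
assumptions to check against the spec.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PASID_BITS        20u  /* PASID width */
#define DEMO_PASID_TABLE_BITS  6u   /* low bits index one PASID table */
#define DEMO_DIR_ENTRY_BYTES   8u   /* assumed directory entry size */
#define DEMO_TBL_ENTRY_BYTES   64u  /* assumed PASID table entry size */

/* Directory index: the upper 14 bits of the PASID. */
static inline uint32_t demo_pasid_dir_index(uint32_t pasid)
{
    assert(pasid < (1u << DEMO_PASID_BITS));
    return pasid >> DEMO_PASID_TABLE_BITS;
}

/* Table index: the lower 6 bits of the PASID. */
static inline uint32_t demo_pasid_table_index(uint32_t pasid)
{
    return pasid & ((1u << DEMO_PASID_TABLE_BITS) - 1);
}

/* Number of directory entries for a PDTS value X: 2^(X+7). */
static inline uint32_t demo_pasid_dir_entries(uint32_t pdts)
{
    assert(pdts <= 7);
    return 1u << (pdts + 7);
}

/* Directory size in bytes for a given PDTS value. */
static inline uint32_t demo_pasid_dir_bytes(uint32_t pdts)
{
    return demo_pasid_dir_entries(pdts) * DEMO_DIR_ENTRY_BYTES;
}

/* Size in bytes of one PASID table (64 entries of 64 bytes). */
static inline uint32_t demo_pasid_table_bytes(void)
{
    return (1u << DEMO_PASID_TABLE_BITS) * DEMO_TBL_ENTRY_BYTES;
}
```

Under these assumptions the largest directory (PDTS = 7) has 2^14
entries and occupies 128K, while each PASID table is exactly 4K, which
is where the 4K figure applies.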

> Apart of this: also I just noticed (when reading the latter part of
> the series) that the time that a pasid table walk can consume will
> depend on this value too.  I'd suggest to make this as small as we
> can, as long as it satisfies the usage.  We can even bump it in the
> future.

I see. This looks like an optimization, right? Instead of modifying the
value of this macro, I think we can do this optimization by tracking
the allocated PASIDs in QEMU. Then the PASID table walk would be more
efficient and would no longer depend on VTD_MAX_HPASID. Does that make
sense to you? :-)
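
A rough sketch of what such tracking might look like, assuming a simple
bitmap of allocated PASIDs. The DemoPasidSet type, its helpers, and the
1024-PASID limit are all hypothetical and only illustrate the idea; they
are not part of this series or of QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_PASID  1024u   /* illustrative limit only */
#define DEMO_WORD_BITS  64u
#define DEMO_WORDS      (DEMO_MAX_PASID / DEMO_WORD_BITS)

typedef struct {
    uint64_t bits[DEMO_WORDS];  /* one bit per allocated PASID */
} DemoPasidSet;

typedef void (*demo_visit_fn)(uint32_t pasid, void *opaque);

static inline void demo_set_init(DemoPasidSet *s)
{
    memset(s, 0, sizeof(*s));
}

static inline void demo_set_alloc(DemoPasidSet *s, uint32_t pasid)
{
    assert(pasid < DEMO_MAX_PASID);
    s->bits[pasid / DEMO_WORD_BITS] |= 1ull << (pasid % DEMO_WORD_BITS);
}

static inline void demo_set_free(DemoPasidSet *s, uint32_t pasid)
{
    assert(pasid < DEMO_MAX_PASID);
    s->bits[pasid / DEMO_WORD_BITS] &= ~(1ull << (pasid % DEMO_WORD_BITS));
}

/* Visit only the allocated PASIDs; empty words are skipped entirely. */
static inline uint32_t demo_set_walk(const DemoPasidSet *s,
                                     demo_visit_fn visit, void *opaque)
{
    uint32_t w, b, visited = 0;

    for (w = 0; w < DEMO_WORDS; w++) {
        uint64_t word = s->bits[w];

        if (!word) {
            continue;
        }
        for (b = 0; b < DEMO_WORD_BITS; b++) {
            if (word & (1ull << b)) {
                if (visit) {
                    visit(w * DEMO_WORD_BITS + b, opaque);
                }
                visited++;
            }
        }
    }
    return visited;
}
```

The cost of the walk then scales with the number of PASIDs actually
handed out rather than with the configured maximum.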

Regards,
Yi Liu


RE: [RFC v3 14/25] intel_iommu: add virtual command capability support

2020-02-11 Thread Liu, Yi L
> From: Peter Xu 
> Sent: Wednesday, February 12, 2020 4:16 AM
> To: Liu, Yi L 
> Subject: Re: [RFC v3 14/25] intel_iommu: add virtual command capability 
> support
> 
> On Wed, Jan 29, 2020 at 04:16:45AM -0800, Liu, Yi L wrote:
> > From: Liu Yi L 
> >
> > This patch adds virtual command support to Intel vIOMMU per Intel VT-d
> > 3.1 spec. And adds two virtual commands: allocate pasid and free
> > pasid.
> >
> > Cc: Kevin Tian 
> > Cc: Jacob Pan 
> > Cc: Peter Xu 
> > Cc: Yi Sun 
> > Cc: Paolo Bonzini 
> > Cc: Richard Henderson 
> > Cc: Eduardo Habkost 
> > Signed-off-by: Liu Yi L 
> > Signed-off-by: Yi Sun 
> > ---
> >  hw/i386/intel_iommu.c  | 163
> -
> >  hw/i386/intel_iommu_internal.h |  38 ++
> >  hw/i386/trace-events   |   1 +
> >  include/hw/i386/intel_iommu.h  |   6 +-
> >  4 files changed, 206 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 33be40c..43a728f 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -2649,6 +2649,142 @@ static void vtd_handle_iectl_write(IntelIOMMUState *s)
> >  }
> >  }
> >
> > +static int vtd_request_pasid_alloc(IntelIOMMUState *s, uint32_t *pasid)
> > +{
> > +    VTDBus *vtd_bus;
> > +    int bus_n, devfn, ret = -errno;
> > +    VTDIOMMUContext *vtd_icx;
> > +
> > +    for (bus_n = 0; bus_n < PCI_BUS_MAX; bus_n++) {
> > +        vtd_bus = vtd_find_as_from_bus_num(s, bus_n);
> > +        if (!vtd_bus) {
> > +            continue;
> > +        }
> > +        for (devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) {
> > +            vtd_icx = vtd_bus->dev_icx[devfn];
> > +            if (!vtd_icx) {
> > +                continue;
> > +            }
> > +
> > +            /*
> > +             * We'll return the first valid result we got. It's
> > +             * a bit hackish in that we don't have a good global
> > +             * interface yet to talk to modules like vfio to deliver
> > +             * this allocation request, so we're leveraging this
> > +             * per-device iommu object to do the same thing just
> > +             * to make sure the allocation happens only once.
> > +             */
> > +            ret = ds_iommu_pasid_alloc(vtd_icx->dsi_obj,
> > +                         VTD_MIN_HPASID, VTD_MAX_HPASID, pasid);
> 
> Your indents are always strange to me for long funcalls...  Not a complaint
> though, as long as no one else complains. :)

Yeah, I'm not happy with them either... I'll try to improve the
indentation of long function calls.

> 
> > +            if (!ret) {
> > +                break;
> > +            }
> > +        }
> > +    }
> > +    return ret;
> > +}
> > +
> > +static int vtd_request_pasid_free(IntelIOMMUState *s, uint32_t pasid)
> > +{
> > +    VTDBus *vtd_bus;
> > +    int bus_n, devfn, ret = -errno;
> > +    VTDIOMMUContext *vtd_icx;
> > +
> > +    for (bus_n = 0; bus_n < PCI_BUS_MAX; bus_n++) {
> > +        vtd_bus = vtd_find_as_from_bus_num(s, bus_n);
> > +        if (!vtd_bus) {
> > +            continue;
> > +        }
> > +        for (devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) {
> > +            vtd_icx = vtd_bus->dev_icx[devfn];
> > +            if (!vtd_icx) {
> > +                continue;
> > +            }
> > +            /*
> > +             * Similar with pasid allocation. We'll free the pasid
> > +             * on the first successful free operation. It's a bit
> > +             * hackish in that we don't have a good global interface
> > +             * yet to talk to modules like vfio to deliver this pasid
> > +             * free request, so we're leveraging this per-device iommu
> > +             * object to do the same thing just to make sure the
> > +             * free happens only once.
> > +             */
> > +            ret = ds_iommu_pasid_free(vtd_icx->dsi_obj, pasid);
> > +            if (!ret) {
> > +                break;
> > +            }
> > +        }
> > +    }
> > +    return ret;
> > +}
> > +
> > +/*
> > + * If IP is not set, set it and return 0
> > + * If IP is already set, return -1
> 
> Out of date?  Instead can mention that this also resets the reply status
> code to zero implicitly so by default it will return a success.

Ooops, yeah, it's out of date
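
For reference, the behaviour under discussion can be sketched in
isolation. The helpers below mirror the `s->vcrsp = 1` and
`s->vcrsp &= ~1` statements from the patch; the demo_* names are
hypothetical, and treating every bit above bit 0 as status/result bits
is an assumption made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_VCRSP_IP 0x1ull   /* bit 0: command in progress */

/* Non-zero if a previous command is still in progress. */
static inline int demo_vcrsp_busy(uint64_t vcrsp)
{
    return (vcrsp & DEMO_VCRSP_IP) != 0;
}

/*
 * Start a command: the whole register becomes 1, so IP is set and
 * every other bit (including any earlier status code) is cleared.
 * The caller is expected to have rejected the busy case already.
 */
static inline uint64_t demo_vcrsp_set_ip(uint64_t vcrsp)
{
    assert(!demo_vcrsp_busy(vcrsp));
    return DEMO_VCRSP_IP;
}

/* Finish a command: only IP is cleared, the other bits are kept. */
static inline uint64_t demo_vcrsp_clear_ip(uint64_t vcrsp)
{
    return vcrsp & ~DEMO_VCRSP_IP;
}
```

Because setting IP overwrites the whole value, a command that completes
without writing a status code reports zero, i.e. success by default.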

Re: [RFC v3 14/25] intel_iommu: add virtual command capability support

2020-02-11 Thread Peter Xu
On Wed, Jan 29, 2020 at 04:16:45AM -0800, Liu, Yi L wrote:
> +/*
> + * The basic idea is to let hypervisor to set a range for available
> + * PASIDs for VMs. One of the reasons is PASID #0 is reserved by
> + * RID_PASID usage. We have no idea how many reserved PASIDs in future,
> + * so here just an evaluated value. Honestly, set it as "1" is enough
> + * at current stage.
> + */
> +#define VTD_MIN_HPASID  1
> +#define VTD_MAX_HPASID  0xF

One more question: I see that PASID is defined as 20 bits long.  That's
fine.  However, I'm getting confused about how the Scalable Mode PASID
Directory can serve that many PASID entries.

I'm looking at spec 3.4.3, Figure 3-8.

Firstly, we only have two levels for a PASID table.  The context entry
of a device stores a pointer to the "Scalable Mode PASID Directory"
page.  I see that there are 2^14 entries in the "Scalable Mode PASID
Directory" page, each pointing to a "Scalable Mode PASID Table".
However... how do they fit in a 4K page if each entry is an x86_64
pointer (8 bytes) and there are 2^14 entries?  Simple math gives me
4K/8 = 512, which means the "Scalable Mode PASID Directory" page can
only hold 512 entries, so where does the 2^14 come from?  Hmm??

Apart from this: I also just noticed (when reading the latter part of
the series) that the time a PASID table walk can consume depends on
this value too.  I'd suggest making it as small as we can, as long as
it satisfies the usage.  We can always bump it in the future.

-- 
Peter Xu




Re: [RFC v3 14/25] intel_iommu: add virtual command capability support

2020-02-11 Thread Peter Xu
On Wed, Jan 29, 2020 at 04:16:45AM -0800, Liu, Yi L wrote:
> From: Liu Yi L 
> 
> This patch adds virtual command support to Intel vIOMMU per
> Intel VT-d 3.1 spec. And adds two virtual commands: allocate
> pasid and free pasid.
> 
> Cc: Kevin Tian 
> Cc: Jacob Pan 
> Cc: Peter Xu 
> Cc: Yi Sun 
> Cc: Paolo Bonzini 
> Cc: Richard Henderson 
> Cc: Eduardo Habkost 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Yi Sun 
> ---
>  hw/i386/intel_iommu.c  | 163 
> -
>  hw/i386/intel_iommu_internal.h |  38 ++
>  hw/i386/trace-events   |   1 +
>  include/hw/i386/intel_iommu.h  |   6 +-
>  4 files changed, 206 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 33be40c..43a728f 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -2649,6 +2649,142 @@ static void vtd_handle_iectl_write(IntelIOMMUState *s)
>  }
>  }
>  
> +static int vtd_request_pasid_alloc(IntelIOMMUState *s, uint32_t *pasid)
> +{
> +    VTDBus *vtd_bus;
> +    int bus_n, devfn, ret = -errno;
> +    VTDIOMMUContext *vtd_icx;
> +
> +    for (bus_n = 0; bus_n < PCI_BUS_MAX; bus_n++) {
> +        vtd_bus = vtd_find_as_from_bus_num(s, bus_n);
> +        if (!vtd_bus) {
> +            continue;
> +        }
> +        for (devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) {
> +            vtd_icx = vtd_bus->dev_icx[devfn];
> +            if (!vtd_icx) {
> +                continue;
> +            }
> +
> +            /*
> +             * We'll return the first valid result we got. It's
> +             * a bit hackish in that we don't have a good global
> +             * interface yet to talk to modules like vfio to deliver
> +             * this allocation request, so we're leveraging this
> +             * per-device iommu object to do the same thing just
> +             * to make sure the allocation happens only once.
> +             */
> +            ret = ds_iommu_pasid_alloc(vtd_icx->dsi_obj,
> +                         VTD_MIN_HPASID, VTD_MAX_HPASID, pasid);

Your indents always look strange to me for long function calls...  Not
a complaint though, as long as no one else complains. :)

> +            if (!ret) {
> +                break;
> +            }
> +        }
> +    }
> +    return ret;
> +}
> +
> +static int vtd_request_pasid_free(IntelIOMMUState *s, uint32_t pasid)
> +{
> +    VTDBus *vtd_bus;
> +    int bus_n, devfn, ret = -errno;
> +    VTDIOMMUContext *vtd_icx;
> +
> +    for (bus_n = 0; bus_n < PCI_BUS_MAX; bus_n++) {
> +        vtd_bus = vtd_find_as_from_bus_num(s, bus_n);
> +        if (!vtd_bus) {
> +            continue;
> +        }
> +        for (devfn = 0; devfn < PCI_DEVFN_MAX; devfn++) {
> +            vtd_icx = vtd_bus->dev_icx[devfn];
> +            if (!vtd_icx) {
> +                continue;
> +            }
> +            /*
> +             * Similar with pasid allocation. We'll free the pasid
> +             * on the first successful free operation. It's a bit
> +             * hackish in that we don't have a good global interface
> +             * yet to talk to modules like vfio to deliver this pasid
> +             * free request, so we're leveraging this per-device iommu
> +             * object to do the same thing just to make sure the
> +             * free happens only once.
> +             */
> +            ret = ds_iommu_pasid_free(vtd_icx->dsi_obj, pasid);
> +            if (!ret) {
> +                break;
> +            }
> +        }
> +    }
> +    return ret;
> +}
> +
> +/*
> + * If IP is not set, set it and return 0
> + * If IP is already set, return -1

Out of date?  Instead, the comment could mention that this also
implicitly resets the reply status code to zero, so by default it will
report success.

Other than that:

Reviewed-by: Peter Xu 

> + */
> +static void vtd_vcmd_set_ip(IntelIOMMUState *s)
> +{
> +    s->vcrsp = 1;
> +    vtd_set_quad_raw(s, DMAR_VCRSP_REG,
> +                     ((uint64_t) s->vcrsp));
> +}
> +
> +static void vtd_vcmd_clear_ip(IntelIOMMUState *s)
> +{
> +    s->vcrsp &= (~((uint64_t)(0x1)));
> +    vtd_set_quad_raw(s, DMAR_VCRSP_REG,
> +                     ((uint64_t) s->vcrsp));
> +}
> +
> +/* Handle write to Virtual Command Register */
> +static int vtd_handle_vcmd_write(IntelIOMMUState *s, uint64_t val)
> +{
> +    uint32_t pasid;
> +    int ret = -1;
> +
> +    trace_vtd_reg_write_vcmd(s->vcrsp, val);
> +
> +    if (!(s->vccap & VTD_VCCAP_PAS) ||
> +         (s->vcrsp & 1)) {
> +        return -1;
> +    }
> +
> +    /*
> +     * Since vCPU should be blocked when the guest VMCD
> +     * write was trapped to here. Should be no other vCPUs
> +     * try to access VCMD if guest software is well written.
> +     * However, we still emulate the IP bit here in case of
> +     * bad guest software. Also align with the spec.
> +     */
> +    vtd_vcmd_set_ip(s);
> +
> +    switch (val & VTD_VCMD_CMD_MASK) {
> +    case VTD_VCMD_ALLOC_PASID:
> +        ret =