Re: [virtio-dev] [RFC] virtio-iommu version 0.4

2017-10-09 Thread Jean-Philippe Brucker
Hi Eric,

On 03/10/17 14:04, Auger Eric wrote:
> When rebasing the v0.4 driver on master I observe a regression: commands
> are not received properly by QEMU (typically an attach command is
> received with a type of 0). After a bisection of the guest kernel the
> first commit the problem appears is:
> 
> commit e3067861ba6650a566a6273738c23c956ad55c02
> arm64: add basic VMAP_STACK support

Indeed, thanks for the report. I can reproduce the problem with 4.14 and
kvmtool. We should allocate buffers on the heap for any request (see for
example 9472fe704 or e37e2ff35). I rebased onto 4.14 and pushed the
following patch at: git://linux-arm.org/linux-jpb.git virtio-iommu/v0.4.1

It's not very nice, but should fix the issue. I didn't manage to measure
the difference though, my benchmark isn't precise enough. So I don't
know if we should try to use a kmem cache instead of kmalloc.

Thanks,
Jean

--- 8< ---
>From 3fc957560e1e6f070a0468bf75ebc4862d37ff82 Mon Sep 17 00:00:00 2001
From: Jean-Philippe Brucker 
Date: Mon, 9 Oct 2017 20:13:57 +0100
Subject: [PATCH] iommu/virtio-iommu: Allocate all requests on the heap

When CONFIG_VMAP_STACK is enabled, DMA on the stack isn't allowed. Instead
of allocating virtio-iommu requests on the stack, use kmalloc.

Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c | 86 +---
 1 file changed, 58 insertions(+), 28 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index ec1cfaba5997..2d8fd5e99fa7 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -473,13 +473,10 @@ static int viommu_attach_dev(struct iommu_domain *domain, 
struct device *dev)
 {
int i;
int ret = 0;
+   struct virtio_iommu_req_attach *req;
struct iommu_fwspec *fwspec = dev->iommu_fwspec;
struct viommu_endpoint *vdev = fwspec->iommu_priv;
struct viommu_domain *vdomain = to_viommu_domain(domain);
-   struct virtio_iommu_req_attach req = {
-   .head.type  = VIRTIO_IOMMU_T_ATTACH,
-   .address_space  = cpu_to_le32(vdomain->id),
-   };

mutex_lock(&vdomain->mutex);
if (!vdomain->viommu) {
@@ -531,14 +528,25 @@ static int viommu_attach_dev(struct iommu_domain *domain, 
struct device *dev)

dev_dbg(dev, "attach to domain %llu\n", vdomain->id);

+   req = kzalloc(sizeof(*req), GFP_KERNEL);
+   if (!req)
+   return -ENOMEM;
+
+   *req = (struct virtio_iommu_req_attach) {
+   .head.type  = VIRTIO_IOMMU_T_ATTACH,
+   .address_space  = cpu_to_le32(vdomain->id),
+   };
+
for (i = 0; i < fwspec->num_ids; i++) {
-   req.device = cpu_to_le32(fwspec->ids[i]);
+   req->device = cpu_to_le32(fwspec->ids[i]);

-   ret = viommu_send_req_sync(vdomain->viommu, &req);
+   ret = viommu_send_req_sync(vdomain->viommu, req);
if (ret)
break;
}

+   kfree(req);
+
vdomain->attached++;
vdev->vdomain = vdomain;

@@ -550,13 +558,7 @@ static int viommu_map(struct iommu_domain *domain, 
unsigned long iova,
 {
int ret;
struct viommu_domain *vdomain = to_viommu_domain(domain);
-   struct virtio_iommu_req_map req = {
-   .head.type  = VIRTIO_IOMMU_T_MAP,
-   .address_space  = cpu_to_le32(vdomain->id),
-   .virt_addr  = cpu_to_le64(iova),
-   .phys_addr  = cpu_to_le64(paddr),
-   .size   = cpu_to_le64(size),
-   };
+   struct virtio_iommu_req_map *req;

pr_debug("map %llu 0x%lx -> 0x%llx (%zu)\n", vdomain->id, iova,
 paddr, size);
@@ -564,17 +566,30 @@ static int viommu_map(struct iommu_domain *domain, 
unsigned long iova,
if (!vdomain->attached)
return -ENODEV;

-   if (prot & IOMMU_READ)
-   req.flags |= cpu_to_le32(VIRTIO_IOMMU_MAP_F_READ);
-
-   if (prot & IOMMU_WRITE)
-   req.flags |= cpu_to_le32(VIRTIO_IOMMU_MAP_F_WRITE);
-
ret = viommu_tlb_map(vdomain, iova, paddr, size);
if (ret)
return ret;

-   ret = viommu_send_req_sync(vdomain->viommu, &req);
+   req = kzalloc(sizeof(*req), GFP_KERNEL);
+   if (!req)
+   return -ENOMEM;
+
+   *req = (struct virtio_iommu_req_map) {
+   .head.type  = VIRTIO_IOMMU_T_MAP,
+   .address_space  = cpu_to_le32(vdomain->id),
+   .virt_addr  = cpu_to_le64(iova),
+   .phys_addr  = cpu_to_le64(paddr),
+   .size   = cpu_to_le64(size),
+   };
+
+   if (prot & IOMMU_READ)
+   req->flags |= cpu_to_le32(VIRTIO_IOMMU_MAP_F_READ);
+
+   if (prot & IOMMU_WRITE)
+   req->flags |= cpu_to_le32(VIRTIO_IOMMU_MAP_F_WRITE);
+
+   ret = viommu_send_req_sync(vdomain->viommu, req

Re: [virtio-dev] [RFC] virtio-iommu version 0.4

2017-10-03 Thread Auger Eric
Hi Jean,

On 04/08/2017 20:19, Jean-Philippe Brucker wrote:
> This is the continuation of my proposal for virtio-iommu, the para-
> virtualized IOMMU. Here is a summary of the changes since last time [1]:
> 
> * The virtio-iommu document now resembles an actual specification. It is
>   split into a formal description of the virtio device, and implementation
>   notes. Please find sources and binaries at [2].
> 
> * Added a probe request to describe to the guest different properties that
>   do not fit in firmware or in the virtio config space. This is a
>   necessary stepping stone for extending the virtio-iommu.
> 
> * There is a working Qemu prototype [3], thanks to Eric Auger and Bharat
>   Bhushan.
> 
> You can find the Linux driver and kvmtool device at [4] and [5]. I
> plan to rework driver and kvmtool device slightly before sending the
> patches.
> 
> To understand the virtio-iommu, I advise to first read introduction and
> motivation, then skim through implementation notes and finally look at the
> device specification.
> 
> I wasn't sure how to organize the review. For those who prefer to comment
> inline, I attached v0.4 of device-operations.tex and topology.tex+MSI.tex
> to this thread. They are the biggest chunks of the document. But LaTeX
> isn't very pleasant to read, so you can simply send a list of comments in
> relation to section numbers and a few words of context, we'll manage.
> 
> ---
> Version numbers 0.1-0.4 are arbitrary. I'm hoping they allow to compare
> more easily differences since the RFC (see [6]), but haven't been made
> public so far. This is the first public posting since initial proposal
> [1], and the following describes all changes.
> 
> ## v0.1 ##
> 
> Content is the same as the RFC, but formatted to LaTeX. 'make' generates
> one PDF and one HTML document.
> 
> ## v0.2 ##
> 
> Add introductions, improve topology example and firmware description based
> on feedback and a number of useful discussions.
> 
> ## v0.3 ##
> 
> Add normative sections (MUST, SHOULD, etc). Clarify some things, tighten
> the device and driver behaviour. Unmap semantics are consolidated; they
> are now closer to VFIO Type1 v2 semantics.
> 
> ## v0.4 ##
> 
> Introduce PROBE requests. They provide per-endpoint information to the
> driver that couldn't be described otherwise.
> 
> For the moment, they allow to handle MSIs on x86 virtual platforms (see
> 3.2). To do that we communicate reserved IOVA regions, that will also be
> useful for describing regions that cannot be mapped for a given endpoint,
> for instance addresses that correspond to a PCI bridge window.
> 
> Introducing such a large framework for this tiny feature may seem
> overkill, but it is needed for future extensions of the virtio-iommu and I
> believe it really is worth the effort.
> 
> ## Future ##
> 
> Other extensions are in preparation. I won't detail them here because v0.4
> already is a lot to digest, but in short, building on top of PROBE:
> 
> * First, since the IOMMU is paravirtualized, the device can expose some
>   properties of the physical topology to the guest, and let it allocate
>   resources more efficiently. For example, when the virtio-iommu manages
>   both physical and emulated endpoints, with different underlying IOMMUs,
>   we now have a way to describe multiple page and block granularities,
>   instead of forcing the guest to use the most restricted one for all
>   endpoints. This will most likely be in v0.5.
> 
> * Then on top of that, a major improvement will describe hardware
>   acceleration features available to the guest. There is what I call "Page
>   Table Handover" (or simply, from the host POV, "Nested"), the ability
>   for the guest to manipulate its own page tables instead of sending
>   MAP/UNMAP requests to the host. This, along with IO Page Fault
>   reporting, will also permit SVM virtualization on different platforms.
> 
> Thanks,
> Jean
> 
> [1] http://www.spinics.net/lists/kvm/msg147990.html
> [2] git://linux-arm.org/virtio-iommu.git branch viommu/v0.4
> 
> http://www.linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.4/virtio-iommu-v0.4.pdf
> I reiterate the disclaimers: don't use this document as a reference,
> it's a draft. It's also not an OASIS document yet. It may be riddled
> with mistakes. As this is a working draft, it is unstable and I do not
> guarantee backward compatibility of future versions.
> [3] https://lists.gnu.org/archive/html/qemu-arm/2017-08/msg4.html
> [4] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.4
> Warning: UAPI headers have changed! They didn't follow the spec,
> please update. (Use branch v0.1, that has the old headers, for the
> Qemu prototype [3])
When rebasing the v0.4 driver on master I observe a regression: commands
are not received properly by QEMU (typically an attach command is
received with a type of 0). After a bisection of the guest kernel the
first commit the problem appears is:

commit e3067861ba6650a566a

Re: [virtio-dev] [RFC] virtio-iommu version 0.4

2017-09-25 Thread Jean-Philippe Brucker
On 21/09/17 07:41, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
>> Sent: Wednesday, September 6, 2017 7:49 PM
>>
>>
>>> 2.6.8.2.1
>>> Multiple overlapping RESV_MEM properties are merged together. Device
>>> requirement? if same types I assume?
>>
>> Combination rules apply to device and driver. When the driver puts
>> multiple endpoints in the same domain, combination rules apply. They will
>> become important once the guest attempts to do things like combining
>> endpoints with different PASID capabilities in the same domain. I'm
>> considering these endpoints to be behind different physical IOMMUs.
>>
>> For reserved regions, we specify what the driver should do and what the
>> device should enforce with regard to mapping IOVAs of overlapping regions.
>> For example, if a device has some virtual address RESV_T_MSI and an other
>> device has the same virtual address RESV_T_IDENTITY, what should the
>> driver do?
>>
>> I think it should apply the RESV_T_IDENTITY. RESV_T_MSI is just a special
>> case of RESV_T_RESERVED, it's a hint for the IRQ subsystem and doesn't
>> have a meaning within a domain. From DMA mappings point of view, it is
>> effectively the same as RESV_T_RESERVED. When merging
>> RESV_T_RESERVED and
>> RESV_T_IDENTITY, we should make it RESV_T_IDENTITY. Because it is
>> required
>> for one endpoint to work (the one with IDENTITY) and is not accessed by
>> the other (the driver must not allocate this IOVA range for DMA).
>>
>> More generally, I'd like to add the following combination table to the
>> spec, that shows the resulting reserved region type after merging regions
>> from two endpoints. N: no reserved region, R: RESV_T_RESERVED, M:
>> RESV_T_MSI, I: RESV_T_IDENTITY. It is symmetric so I didn't fill the
>> bottom half.
>>
>>   | N | R | M | I
>> --
>> N | N | R | M | ?
>> --
>> R |   | R | R | I
>> --
>> M |   |   | M | I
>> --
>> I |   |   |   | I
>>
>> The 'I' column is problematic. If one endpoint has an IDENTITY region and
>> the other doesn't have anything at that virtual address, then making that
>> region an identity mapping will result in the second endpoint being able
>> to access firmware memory. If using nested translation, stage-2 can help
>> us here. But in general we should allow the device to reject an attach
>> that would result in a N+I combination if the host considers it too dodgy.
>> I think the other combinations are safe enough.
>>
> 
> will overlap happen in real? Can we simplify the spec to have device
> not reporting overlapping regions?

I think it's likely to happen, to have two endpoints with different
overlapping types. Reporting is easy, but handling mismatched reserved
regions is hard. So what I can do is leave these combination rules as
implementation detail, and let the device reject any domain attach that
overlaps mismatched RESV regions.

After all, putting multiple devices in the same domain is a nice feature
but might not be widely used in reality. It's not worth spending too much
time describing all possible behaviors for it (but still worth thinking
about to avoid introducing security holes).

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [virtio-dev] [RFC] virtio-iommu version 0.4

2017-09-20 Thread Tian, Kevin
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Wednesday, September 6, 2017 7:49 PM
> 
> 
> > 2.6.8.2.1
> > Multiple overlapping RESV_MEM properties are merged together. Device
> > requirement? if same types I assume?
> 
> Combination rules apply to device and driver. When the driver puts
> multiple endpoints in the same domain, combination rules apply. They will
> become important once the guest attempts to do things like combining
> endpoints with different PASID capabilities in the same domain. I'm
> considering these endpoints to be behind different physical IOMMUs.
> 
> For reserved regions, we specify what the driver should do and what the
> device should enforce with regard to mapping IOVAs of overlapping regions.
> For example, if a device has some virtual address RESV_T_MSI and an other
> device has the same virtual address RESV_T_IDENTITY, what should the
> driver do?
> 
> I think it should apply the RESV_T_IDENTITY. RESV_T_MSI is just a special
> case of RESV_T_RESERVED, it's a hint for the IRQ subsystem and doesn't
> have a meaning within a domain. From DMA mappings point of view, it is
> effectively the same as RESV_T_RESERVED. When merging
> RESV_T_RESERVED and
> RESV_T_IDENTITY, we should make it RESV_T_IDENTITY. Because it is
> required
> for one endpoint to work (the one with IDENTITY) and is not accessed by
> the other (the driver must not allocate this IOVA range for DMA).
> 
> More generally, I'd like to add the following combination table to the
> spec, that shows the resulting reserved region type after merging regions
> from two endpoints. N: no reserved region, R: RESV_T_RESERVED, M:
> RESV_T_MSI, I: RESV_T_IDENTITY. It is symmetric so I didn't fill the
> bottom half.
> 
>   | N | R | M | I
> --
> N | N | R | M | ?
> --
> R |   | R | R | I
> --
> M |   |   | M | I
> --
> I |   |   |   | I
> 
> The 'I' column is problematic. If one endpoint has an IDENTITY region and
> the other doesn't have anything at that virtual address, then making that
> region an identity mapping will result in the second endpoint being able
> to access firmware memory. If using nested translation, stage-2 can help
> us here. But in general we should allow the device to reject an attach
> that would result in a N+I combination if the host considers it too dodgy.
> I think the other combinations are safe enough.
> 

will overlap happen in real? Can we simplify the spec to have device
not reporting overlapping regions?

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [virtio-dev] [RFC] virtio-iommu version 0.4

2017-09-06 Thread Jean-Philippe Brucker
Hi Eric,

On 23/08/17 14:55, Auger Eric wrote:
> Please find some comments/questions below:

Thanks a lot for this. Sorry for the delay, I was on holiday and it took
me a while to sort out the details.

> 2.6.7:1
> I do not understand the footnode #6 sentence: 'Without a specific
> definition of the ACK requirements for a given property type, it simply
> means that the driver read all fields of that property."

That doesn't serve any purpose, I'll remove the footnote.

> 2.6.7:2
> what if the driver has not provided a big enough buffer and the device
> cannot report all supported properties?

Good point, how about adding the following to 2.6.7.2 - Device
Requirements: PROBE Request

If the properties array is smaller than \field{probe_size}, then the
device SHOULD NOT write any property and SHOULD set the request
\field{status} to VIRTIO_IOMMU_S_INVAL.

> 2.6.8.2:
> Couldn't we use the same terminology as iommu_resv_type in iommu.h?
> the negative form is not the easiest to understand for me.
> Why F_MSI?

It also needs to be understandable by someone not familiar with the Linux
APIs. It's not entirely obvious to me what the iommu.h regions are without
looking at the implementation, maybe using names from the device point of
view is more clear. I don't mind simplifying though, as I don't see a
reason for the driver to know about BYPASS regions other than
HW MSIs.

How about I remove ABORT and BYPASS, make the only subtype RESV_MEM_T_MSI
(1)? We'd use subtype 0 as "don't care, just don't map this". Can call it
RESV_MEM_T_RESERVED. Flags won't be used for these subtypes.

Another subtype will be RESV_MEM_T_IDENTITY (see my reply to Kevin).
IDENTITY regions must be identity-mapped by the guest and, as I understand
it, are virtual addresses used for DMA by firmware. The flags field will
then contain read/write protection flags.

> 2.6.8.2.1
> Multiple overlapping RESV_MEM properties are merged together. Device
> requirement? if same types I assume?

Combination rules apply to device and driver. When the driver puts
multiple endpoints in the same domain, combination rules apply. They will
become important once the guest attempts to do things like combining
endpoints with different PASID capabilities in the same domain. I'm
considering these endpoints to be behind different physical IOMMUs.

For reserved regions, we specify what the driver should do and what the
device should enforce with regard to mapping IOVAs of overlapping regions.
For example, if a device has some virtual address RESV_T_MSI and an other
device has the same virtual address RESV_T_IDENTITY, what should the
driver do?

I think it should apply the RESV_T_IDENTITY. RESV_T_MSI is just a special
case of RESV_T_RESERVED, it's a hint for the IRQ subsystem and doesn't
have a meaning within a domain. From DMA mappings point of view, it is
effectively the same as RESV_T_RESERVED. When merging RESV_T_RESERVED and
RESV_T_IDENTITY, we should make it RESV_T_IDENTITY. Because it is required
for one endpoint to work (the one with IDENTITY) and is not accessed by
the other (the driver must not allocate this IOVA range for DMA).

More generally, I'd like to add the following combination table to the
spec, that shows the resulting reserved region type after merging regions
from two endpoints. N: no reserved region, R: RESV_T_RESERVED, M:
RESV_T_MSI, I: RESV_T_IDENTITY. It is symmetric so I didn't fill the
bottom half.

  | N | R | M | I
--
N | N | R | M | ?
--
R |   | R | R | I
--
M |   |   | M | I
--
I |   |   |   | I

The 'I' column is problematic. If one endpoint has an IDENTITY region and
the other doesn't have anything at that virtual address, then making that
region an identity mapping will result in the second endpoint being able
to access firmware memory. If using nested translation, stage-2 can help
us here. But in general we should allow the device to reject an attach
that would result in a N+I combination if the host considers it too dodgy.
I think the other combinations are safe enough.

> what if the device is a physical assigned device. does the device report
> the host doorbell or the guest doorbell, may be worth to clarify.

Indeed. For a physical assigned device it is the guest doorbell as it
corresponds to the virtual ITS or APIC (though with the APIC it's the same
region).

> 3.1.1 last paragraph
> you talk about device ID but this is a terminology only known by ARM I
> think. Actually it is introduced later in 3.1.2.

Indeed, I'll move the explanation up.

> 3.1.2
> About ACPI integration. Isn't IORT ARM specific?

IORT is currently edited by ARM, as is virtio-iommu. Their contents are
not ARM specific however, they describe software interfaces that solve
mostly generic problems, and aim to be open.

IORT describes the relation between endpoints and their translation
agents. This problem has already been solved by IVRS, DMAR and IORT. From

Re: [virtio-dev] [RFC] virtio-iommu version 0.4

2017-08-23 Thread Auger Eric
Hi Jean-Philippe,

On 04/08/2017 20:19, Jean-Philippe Brucker wrote:
> This is the continuation of my proposal for virtio-iommu, the para-
> virtualized IOMMU. Here is a summary of the changes since last time [1]:
> 
> * The virtio-iommu document now resembles an actual specification. It is
>   split into a formal description of the virtio device, and implementation
>   notes. Please find sources and binaries at [2].
> 
> * Added a probe request to describe to the guest different properties that
>   do not fit in firmware or in the virtio config space. This is a
>   necessary stepping stone for extending the virtio-iommu.
> 
> * There is a working Qemu prototype [3], thanks to Eric Auger and Bharat
>   Bhushan.
> 
> You can find the Linux driver and kvmtool device at [4] and [5]. I
> plan to rework driver and kvmtool device slightly before sending the
> patches.
> 
> To understand the virtio-iommu, I advise to first read introduction and
> motivation, then skim through implementation notes and finally look at the
> device specification.
> 
> I wasn't sure how to organize the review. For those who prefer to comment
> inline, I attached v0.4 of device-operations.tex and topology.tex+MSI.tex
> to this thread. They are the biggest chunks of the document. But LaTeX
> isn't very pleasant to read, so you can simply send a list of comments in
> relation to section numbers and a few words of context, we'll manage.

Please find some comments/questions below:

2.6.7:1
I do not understand the footnode #6 sentence: 'Without a specific
definition of the ACK requirements for a given property type, it simply
means that the driver read all fields of that property."
2.6.7:2
what if the driver has not provided a big enough buffer and the device
cannot report all supported properties?
2.6.8.2:
Couldn't we use the same terminology as iommu_resv_type in iommu.h?
the negative form is not the easiest to understand for me.
Why F_MSI?
2.6.8.2.1
Multiple overlapping RESV_MEM properties are merged together. Device
requirement? if same types I assume?

what if the device is a physical assigned device. does the device report
the host doorbell or the guest doorbell, may be worth to clarify.

3.1.1 last paragraph
you talk about device ID but this is a terminology only known by ARM I
think. Actually it is introduced later in 3.1.2.

3.1.2
About ACPI integration. Isn't IORT ARM specific? Will other
architectures use that table? In IORT we also have Named Component node
which look pretty the same. Could you elaborate of the difference?
Device Object name: isn't it the path to the LNR0005 in the namespace?

Thanks

Eric
> 
> ---
> Version numbers 0.1-0.4 are arbitrary. I'm hoping they allow to compare
> more easily differences since the RFC (see [6]), but haven't been made
> public so far. This is the first public posting since initial proposal
> [1], and the following describes all changes.
> 
> ## v0.1 ##
> 
> Content is the same as the RFC, but formatted to LaTeX. 'make' generates
> one PDF and one HTML document.
> 
> ## v0.2 ##
> 
> Add introductions, improve topology example and firmware description based
> on feedback and a number of useful discussions.
> 
> ## v0.3 ##
> 
> Add normative sections (MUST, SHOULD, etc). Clarify some things, tighten
> the device and driver behaviour. Unmap semantics are consolidated; they
> are now closer to VFIO Type1 v2 semantics.
> 
> ## v0.4 ##
> 
> Introduce PROBE requests. They provide per-endpoint information to the
> driver that couldn't be described otherwise.
> 
> For the moment, they allow to handle MSIs on x86 virtual platforms (see
> 3.2). To do that we communicate reserved IOVA regions, that will also be
> useful for describing regions that cannot be mapped for a given endpoint,
> for instance addresses that correspond to a PCI bridge window.
> 
> Introducing such a large framework for this tiny feature may seem
> overkill, but it is needed for future extensions of the virtio-iommu and I
> believe it really is worth the effort.
> 
> ## Future ##
> 
> Other extensions are in preparation. I won't detail them here because v0.4
> already is a lot to digest, but in short, building on top of PROBE:
> 
> * First, since the IOMMU is paravirtualized, the device can expose some
>   properties of the physical topology to the guest, and let it allocate
>   resources more efficiently. For example, when the virtio-iommu manages
>   both physical and emulated endpoints, with different underlying IOMMUs,
>   we now have a way to describe multiple page and block granularities,
>   instead of forcing the guest to use the most restricted one for all
>   endpoints. This will most likely be in v0.5.
> 
> * Then on top of that, a major improvement will describe hardware
>   acceleration features available to the guest. There is what I call "Page
>   Table Handover" (or simply, from the host POV, "Nested"), the ability
>   for the guest to manipulate its own page tables instead of sending
>   MAP/UNMAP requests to 

RE: [virtio-dev] [RFC] virtio-iommu version 0.4

2017-08-17 Thread Bharat Bhushan
Hi,

> -Original Message-
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Thursday, August 17, 2017 3:42 PM
> To: Adam Tao 
> Cc: iommu@lists.linux-foundation.org; k...@vger.kernel.org;
> virtualizat...@lists.linux-foundation.org; virtio-...@lists.oasis-open.org;
> will.dea...@arm.com; robin.mur...@arm.com; lorenzo.pieral...@arm.com;
> m...@redhat.com; jasow...@redhat.com; marc.zyng...@arm.com;
> eric.au...@redhat.com; eric.auger@gmail.com; Bharat Bhushan
> ; pet...@redhat.com; kevin.t...@intel.com
> Subject: Re: [virtio-dev] [RFC] virtio-iommu version 0.4
> 
> Hi Adam,
> 
> On 16/08/17 05:08, Adam Tao wrote:
> >> * There is a working Qemu prototype [3], thanks to Eric Auger and Bharat
> >>   Bhushan.
> > Hi, Brucker
> > I read the related spec for virtio IOMMU, I am wondering if we support
> > both the virtual and physical devices in the guest to use the virtio
> > IOMMU, how can we config the related address translation for the
> > different type of devices.
> > e.g.
> > 1. for Virtio devs(e.g. virtio-net), we can change the memmap table
> > used by the vhost device.
> > 2. for physical devices we can directly config the IOMMU/SMMU on the
> > host.
> >
> > So do you know are there any RFCs for the back-end implementation for
> > the virtio-IOMMU thanks Adam
> 
> Eric's series [3] adds support for virtual devices (I'v only had time to test
> virtio-net so far, but as vhost-iotlb seems to be supported by Qemu, it should
> work with vhost-net as well). Bharat is working on the VFIO backend for
> physical devices. As far as I'm aware, his latest version is:
> 
> [7] https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg04094.html

I am working on newer version which will be based on Eric-v3.
Actually I wanted to include the fix for the issue reported by Eric, it fails 
if we assign more than one interface.
I reproduced the issues on my side and know the root cause. I am working on 
fixing this.

Thanks
-Bharat

> 
> Thanks,
> Jean
> 
> 
> >> [1] http://www.spinics.net/lists/kvm/msg147990.html
> >> [2] git://linux-arm.org/virtio-iommu.git branch viommu/v0.4
> >> http://www.linux-arm.org/git?p=virtio-
> iommu.git;a=blob;f=dist/v0.4/virtio-iommu-v0.4.pdf
> >> I reiterate the disclaimers: don't use this document as a reference,
> >> it's a draft. It's also not an OASIS document yet. It may be riddled
> >> with mistakes. As this is a working draft, it is unstable and I do not
> >> guarantee backward compatibility of future versions.
> >> [3] https://lists.gnu.org/archive/html/qemu-arm/2017-08/msg4.html
> >> [4] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.4
> >> Warning: UAPI headers have changed! They didn't follow the spec,
> >> please update. (Use branch v0.1, that has the old headers, for the
> >> Qemu prototype [3])
> >> [5] git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.4
> >> Warning: command-line has changed! Use --viommu vfio[,opts] and
> >> --viommu virtio[,opts] to instantiate a device.
> >> [6]
> >> http://www.linux-arm.org/git?p=virtio-iommu.git;a=tree;f=dist/diffs
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [virtio-dev] [RFC] virtio-iommu version 0.4

2017-08-17 Thread Jean-Philippe Brucker
Hi Adam,

On 16/08/17 05:08, Adam Tao wrote:
>> * There is a working Qemu prototype [3], thanks to Eric Auger and Bharat
>>   Bhushan.
> Hi, Brucker
> I read the related spec for virtio IOMMU,
> I am wondering if we support both the virtual and physical devices in
> the guest to use the virtio IOMMU, how can we config the related address
> translation for the different type of devices.
> e.g.
> 1. for Virtio devs(e.g. virtio-net), we can change the memmap table used
> by the vhost device.
> 2. for physical devices we can directly config the IOMMU/SMMU on the
> host.
> 
> So do you know are there any RFCs for the back-end implementation for the 
> virtio-IOMMU
> thanks
> Adam

Eric's series [3] adds support for virtual devices (I'v only had time to
test virtio-net so far, but as vhost-iotlb seems to be supported by Qemu,
it should work with vhost-net as well). Bharat is working on the VFIO
backend for physical devices. As far as I'm aware, his latest version is:

[7] https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg04094.html

Thanks,
Jean


>> [1] http://www.spinics.net/lists/kvm/msg147990.html
>> [2] git://linux-arm.org/virtio-iommu.git branch viommu/v0.4
>> 
>> http://www.linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.4/virtio-iommu-v0.4.pdf
>> I reiterate the disclaimers: don't use this document as a reference,
>> it's a draft. It's also not an OASIS document yet. It may be riddled
>> with mistakes. As this is a working draft, it is unstable and I do not
>> guarantee backward compatibility of future versions.
>> [3] https://lists.gnu.org/archive/html/qemu-arm/2017-08/msg4.html
>> [4] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.4
>> Warning: UAPI headers have changed! They didn't follow the spec,
>> please update. (Use branch v0.1, that has the old headers, for the
>> Qemu prototype [3])
>> [5] git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.4
>> Warning: command-line has changed! Use --viommu vfio[,opts] and
>> --viommu virtio[,opts] to instantiate a device.
>> [6] http://www.linux-arm.org/git?p=virtio-iommu.git;a=tree;f=dist/diffs
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [virtio-dev] [RFC] virtio-iommu version 0.4

2017-08-15 Thread Adam Tao
On Fri, Aug 04, 2017 at 07:19:25PM +0100, Jean-Philippe Brucker wrote:
> This is the continuation of my proposal for virtio-iommu, the para-
> virtualized IOMMU. Here is a summary of the changes since last time [1]:
> 
> * The virtio-iommu document now resembles an actual specification. It is
>   split into a formal description of the virtio device, and implementation
>   notes. Please find sources and binaries at [2].
> 
> * Added a probe request to describe to the guest different properties that
>   do not fit in firmware or in the virtio config space. This is a
>   necessary stepping stone for extending the virtio-iommu.
> 
> * There is a working Qemu prototype [3], thanks to Eric Auger and Bharat
>   Bhushan.
Hi, Brucker
I read the related spec for virtio IOMMU,
I am wondering if we support both the virtual and physical devices in
the guest to use the virtio IOMMU, how can we config the related address
translation for the different type of devices.
e.g.
1. for Virtio devs(e.g. virtio-net), we can change the memmap table used
by the vhost device.
2. for physical devices we can directly config the IOMMU/SMMU on the
host.

So do you know are there any RFCs for the back-end implementation for the 
virtio-IOMMU
thanks
Adam
> 
> You can find the Linux driver and kvmtool device at [4] and [5]. I
> plan to rework driver and kvmtool device slightly before sending the
> patches.
> 
> To understand the virtio-iommu, I advise to first read introduction and
> motivation, then skim through implementation notes and finally look at the
> device specification.
> 
> I wasn't sure how to organize the review. For those who prefer to comment
> inline, I attached v0.4 of device-operations.tex and topology.tex+MSI.tex
> to this thread. They are the biggest chunks of the document. But LaTeX
> isn't very pleasant to read, so you can simply send a list of comments in
> relation to section numbers and a few words of context, we'll manage.
> 
> ---
> Version numbers 0.1-0.4 are arbitrary. I'm hoping they allow to compare
> more easily differences since the RFC (see [6]), but haven't been made
> public so far. This is the first public posting since initial proposal
> [1], and the following describes all changes.
> 
> ## v0.1 ##
> 
> Content is the same as the RFC, but formatted to LaTeX. 'make' generates
> one PDF and one HTML document.
> 
> ## v0.2 ##
> 
> Add introductions, improve topology example and firmware description based
> on feedback and a number of useful discussions.
> 
> ## v0.3 ##
> 
> Add normative sections (MUST, SHOULD, etc). Clarify some things, tighten
> the device and driver behaviour. Unmap semantics are consolidated; they
> are now closer to VFIO Type1 v2 semantics.
> 
> ## v0.4 ##
> 
> Introduce PROBE requests. They provide per-endpoint information to the
> driver that couldn't be described otherwise.
> 
> For the moment, they allow to handle MSIs on x86 virtual platforms (see
> 3.2). To do that we communicate reserved IOVA regions, that will also be
> useful for describing regions that cannot be mapped for a given endpoint,
> for instance addresses that correspond to a PCI bridge window.
> 
> Introducing such a large framework for this tiny feature may seem
> overkill, but it is needed for future extensions of the virtio-iommu and I
> believe it really is worth the effort.
> 
> ## Future ##
> 
> Other extensions are in preparation. I won't detail them here because v0.4
> already is a lot to digest, but in short, building on top of PROBE:
> 
> * First, since the IOMMU is paravirtualized, the device can expose some
>   properties of the physical topology to the guest, and let it allocate
>   resources more efficiently. For example, when the virtio-iommu manages
>   both physical and emulated endpoints, with different underlying IOMMUs,
>   we now have a way to describe multiple page and block granularities,
>   instead of forcing the guest to use the most restricted one for all
>   endpoints. This will most likely be in v0.5.
> 
> * Then on top of that, a major improvement will describe hardware
>   acceleration features available to the guest. There is what I call "Page
>   Table Handover" (or simply, from the host POV, "Nested"), the ability
>   for the guest to manipulate its own page tables instead of sending
>   MAP/UNMAP requests to the host. This, along with IO Page Fault
>   reporting, will also permit SVM virtualization on different platforms.
> 
> Thanks,
> Jean
> 
> [1] http://www.spinics.net/lists/kvm/msg147990.html
> [2] git://linux-arm.org/virtio-iommu.git branch viommu/v0.4
> 
> http://www.linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.4/virtio-iommu-v0.4.pdf
> I reiterate the disclaimers: don't use this document as a reference,
> it's a draft. It's also not an OASIS document yet. It may be riddled
> with mistakes. As this is a working draft, it is unstable and I do not
> guarantee backward compatibility of future versions.
> [3] https://lists.gnu.org/archive/html/qe