> -----Original Message----- > From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com] > Sent: Wednesday, July 12, 2017 4:28 PM > To: Bharat Bhushan <bharat.bhus...@nxp.com>; Auger Eric > <eric.au...@redhat.com>; eric.auger....@gmail.com; > peter.mayd...@linaro.org; alex.william...@redhat.com; m...@redhat.com; > qemu-...@nongnu.org; qemu-devel@nongnu.org > Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; > t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; > robin.mur...@arm.com; christoffer.d...@linaro.org > Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device > > On 12/07/17 11:27, Bharat Bhushan wrote: > > > > > >> -----Original Message----- > >> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com] > >> Sent: Wednesday, July 12, 2017 3:48 PM > >> To: Bharat Bhushan <bharat.bhus...@nxp.com>; Auger Eric > >> <eric.au...@redhat.com>; eric.auger....@gmail.com; > >> peter.mayd...@linaro.org; alex.william...@redhat.com; > m...@redhat.com; > >> qemu-...@nongnu.org; qemu-devel@nongnu.org > >> Cc: w...@redhat.com; kevin.t...@intel.com; marc.zyng...@arm.com; > >> t...@semihalf.com; will.dea...@arm.com; drjo...@redhat.com; > >> robin.mur...@arm.com; christoffer.d...@linaro.org > >> Subject: Re: [Qemu-devel] [RFC v2 0/8] VIRTIO-IOMMU device > >> > >> On 12/07/17 04:50, Bharat Bhushan wrote: > >> [...] > >>>> The size of the virtio_iommu_req_probe structure is variable, and > >> depends > >>>> what fields the device implements. So the device initially computes > >>>> the > >> size it > >>>> needs to fill virtio_iommu_req_probe, describes it in probe_size, > >>>> and the driver allocates that many bytes for > >>>> virtio_iommu_req_probe.content[] > >>>> > >>>>>> * When device offers VIRTIO_IOMMU_F_PROBE, the driver should > >> send > >>>> an > >>>>>> VIRTIO_IOMMU_T_PROBE request for each new endpoint. > >>>>>> * The driver allocates a device-writeable buffer of probe_size > >>>>>> (plus > >>>>>> framing) and sends it as a VIRTIO_IOMMU_T_PROBE request. > >>>>>> * The device fills the buffer with various information. > >>>>>> > >>>>>> struct virtio_iommu_req_probe { > >>>>>> /* device-readable */ > >>>>>> struct virtio_iommu_req_head head; > >>>>>> le32 device; > >>>>>> le32 flags; > >>>>>> > >>>>>> /* maybe also le32 content_size, but it must be equal to > >>>>>> probe_size */ > >>>>> > >>>>> Can you please describe why we need to pass size of "probe_size" > >>>>> in > >> probe > >>>> request? > >>>> > >>>> We don't. I don't think we should add this 'content_size' field > >>>> unless there > >> is > >>>> a compelling reason to do so. > >>>> > >>>>>> > >>>>>> /* device-writeable */ > >>>>>> u8 content[]; > >>>>> > >>>>> I assume content_size above is the size of array "content[]" and > >>>>> max > >> value > >>>> can be equal to probe_size advertised by device? > >>>> > >>>> probe_size is exactly the size of array content[]. The driver must > >>>> allocate a buffer of this size (plus the space needed for head, device, > flags and tail). > >>>> > >>>> Then the device is free to leave parts of content[] empty. Field > >>>> 'type' 0 will > >> be > >>>> reserved and mark the end of the array. > >>>> > >>>>>> struct virtio_iommu_req_tail tail; }; > >>>>>> > >>>>>> I'm still struggling with the content and layout of the probe > >>>>>> request, and would appreciate any feedback. To be easily > >>>>>> extended, I think it should contain a list of fields of variable size: > >>>>>> > >>>>>> |0 15|16 31|32 N| > >>>>>> | type | length | values | > >>>>>> > >>>>>> 'length' might be made optional if it can be deduced from type, > >>>>>> but might make driver-side parsing more robust. > >>>>>> > >>>>>> The probe could either be done for each endpoint, or for each > >>>>>> address space. I much prefer endpoint because it is the smallest > granularity. > >>>>>> The driver can then decide what endpoints to put together in the > >>>>>> same address space based on their individual capabilities. The > >>>>>> specification would described how each endpoint property is > >>>>>> combined when endpoints are put in the same address space. For > >>>>>> example, take the minimum of all PASID size, the maximum of all > >>>>>> page granularities, combine doorbell addresses, etc. > >>>>>> > >>>>>> If we did the probe on address spaces instead, the driver would > >>>>>> have to re-send a probe request each time a new endpoint is > >>>>>> attached to an existing address space, to see if it is still > >>>>>> capable of page table handover or if the driver just combined a > >>>>>> VFIO and an emulated endpoint by accident. > >>>>>> > >>>>>> *** > >>>>>> > >>>>>> Using this framework, the device can declare doorbell regions by > >>>>>> adding one or more RESV fields into the probe buffer: > >>>>>> > >>>>>> /* 'type' */ > >>>>>> #define VIRTIO_IOMMU_PROBE_T_RESV 0x1 > >>>>>> > >>>>>> /* 'values'. 'length' is sizeof(struct virtio_iommu_probe_resv) > >>>>>> */ struct virtio_iommu_probe_resv { > >>>>>> le64 gpa; > >>>>>> le64 size; > >>>>>> > >>>>>> #define VIRTIO_IOMMU_PROBE_RESV_MSI 0x1 > >>>>>> u8 type; > >>> > >>> To be sure I am understanding it correctly, Is this "type" in struct > >> virtio_iommu_req_head? > >> > >> No, virtio_iommu_req_head::type is the request type > >> (ATTACH/DETACH/MAP/UNMAP/PROBE). > >> > >> Then virtio_iommu_probe_property::type is the property type (only > >> RESV for the moment). > >> > >> And this is virtio_iommu_probe_resv::type, which is the type of the > >> resv region (MSI). I renamed it to 'subtype' below, but I think it > >> still is pretty confusing. > >> > >> > >> I did a number of changes to structures and naming when trying to > >> integrate it to the specification: > >> > >> * Added 64 bytes of padding in virtio_iommu_req_probe, so that future > >> extensions can add fields in the device-readable part. > >> * renamed "RESV" to "RESV_MEM". > >> * The resv_mem property now looks like this: > >> struct virtio_iommu_probe_resv_mem { > >> u8 subtype; > >> u8 padding[3]; > >> le32 flags; > >> le64 addr; > >> le64 size; > >> }; > >> * subtype for MSI doorbells is now > >> VIRTIO_IOMMU_PROBE_RESV_MEM_T_BYPASS > >> (because transactions to this region bypass the IOMMU). 'flags' > >> contain a hint VIRTIO_IOMMU_PROBE_RESV_MEM_F_MSI, telling the > driver > >> that this region is used for MSIs. > >> > >> Here is an example of a probe request returning an MSI doorbell > property. > >> > >> 31 7 0 > >> +---------------------------------+ > >> | 0 | type | <- request type = PROBE (5) > >> +---------------------------------+ > >> | device | > >> +---------------------------------+ > >> : : > >> : (64B padding) : > >> : : > >> +---------------------------------+ > >> ^ | length = 24 | type = 1 | <- property type = RESV_MEM (1) > >> | +---------------------------------+ > >> | | 0 |subtype | <- RESV_MEM subtype = BYPASS (1) > >> p| +---------------------------------+ > >> r| | flags = MSI | > >> o| +---------------------------------+ > >> b| | addr = 0xfee00000 | > >> e| | | > >> _| +---------------------------------+ > >> s| | size = 0x00100000 | > >> i| | | > >> z| +---------------------------------+ > >> e| | length | type | <- another property may start > >> | : : here > >> v : ... : > >> +---------------------------------+ > >> | 0 | status | <- request tail > >> +---------------------------------+ > > > > So we want a single probe will return info of all "types" and each "subtype" > of given "type"? I was of impression that based on flags there will be > separate probe request for a type. > > Argh, now I'm lost :) > > The virtio-iommu driver sends a single PROBE request for each endpoint. > The virtio-iommu device fills the 'properties' field with a list of > properties. > > And endpoint may have one or more reserved virtual addresses. Each such > region is described by the virtio-iommu device with a RESV_MEM property in > the properties list. > > There will be other types of properties in the future, for other information > than RESV_MEM. So the PROBE request is a generic channel for different > types of properties, all aggregated into a single list. The virtio-iommu > device > chooses which property it needs to communicate to the driver. > > The list does not have a fixed layout, and the driver knows what properties it > contains by reading the 'type' header of each property. > > The virtio-iommu driver parses the 'properties' list filled by the device. > If it encounters one or more RESV_MEM properties, the driver should take > note of them. Thereafter, the driver should never create a mapping > overlapping RESV_MEM regions for this endpoint. > > If, in addition, the RESV_MEM property is of subtype BYPASS and has flag > MSI, then the driver knows that it is an MSI doorbell and it doesn't need to > create a mapping (using a MAP request) for this MSI doorbell.
Thanks for clarification, it is inline with my understanding. Thanks -Bharat > > Maybe my prototype implementation will be more clear. > > Thanks, > Jean > > > >> > >> > >> I'll try to send the next version of the spec out as soon as possible. > >> > >> Thanks, > >> Jean > >> > >> > >>> Thanks > >>> -Bharat > >>> > >>>>> > >>>>> type is 16 bit above? > >>>> > >>>> Ah, the naming isn't great. This is not the same as above, and > >>>> could be > >> called > >>>> 'subtype' to avoid confusion. The above 16-bit type describes the > >>>> field > >> type, > >>>> e.g. struct virtio_iommu_probe_resv. I proposed 16-bit because it > >>>> seems easy to reach more than 255 kinds of endpoint properties, but > >>>> 65535 should do. > >>>> > >>>> This subtype describes which kind of resv region is described in > >>>> the > >> structure. > >>>> For the moment there only is VIRTIO_IOMMU_PROBE_RESV_MSI, but > we > >>>> could for example add resv regions that the driver should never use > >>>> or > >> that it > >>>> should identity-map (equivalent to IOMMU_RESV_RESERVED/DIRECT > in > >>>> Linux). I think 8 bits should be enough to contain any future > >>>> types, unless > >> we > >>>> make this a bitfield. For identity-map, there may be an additional > >>>> flags > >> field > >>>> describing the protection. > >>>> > >>>>>> }; > >>>>>> > >>>>>> Such a region would be subject to the following rules: > >>>>>> > >>>>>> * Driver should not use any IOVA declared as RESV_MSI in a > mapping. > >>>>>> * Device should leave any transaction matching a RESV_MSI region > >>>>>> pass through untranslated. > >>>>>> * If the device does not advertise any RESV region, then the > >>>>>> driver should assume that MSI doorbells, like any other GPA, must > >>>>>> be > >> mapped > >>>>>> with an arbitrary IOVA in order for the endpoint to access them. > >>>>>> * Given that the driver *should* perform a probe request if > >>>>>> available, and it *should* understand the > >>>> VIRTIO_IOMMU_PROBE_T_RESV > >>>>>> field, then this field tells the guest how it should handle MSI > >>>>>> doorbells, and whether it should map the address via MAP requests > >>>>>> or > >>>> not. > >>>>>> > >>>>>> Does this make sense and did I overlook something? > >>>>> > >>>>> Overall it looks good to me. Do you have plans to implements this > >>>>> in > >> virtio- > >>>> iommu driver and kvmtool? > >>>> > >>>> Yes, if there is no objection I'll try to formalize it and > >>>> implement it right > >> away. > >>>> > >>>> Thanks, > >>>> Jean > >