> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
> Sent: Monday, March 27, 2017 6:14 PM
> To: Liu, Yi L <yi.l....@intel.com>; Alex Williamson 
> <alex.william...@redhat.com>
> Cc: Shanker Donthineni <shank...@qti.qualcomm.com>; k...@vger.kernel.org;
> Catalin Marinas <catalin.mari...@arm.com>; Sinan Kaya
> <ok...@qti.qualcomm.com>; Will Deacon <will.dea...@arm.com>;
> iommu@lists.linux-foundation.org; Harv Abdulhamid <ha...@qti.qualcomm.com>;
> linux-...@vger.kernel.org; Bjorn Helgaas <bhelg...@google.com>; David
> Woodhouse <dw...@infradead.org>; linux-arm-ker...@lists.infradead.org; Nate
> Watterson <nwatt...@qti.qualcomm.com>; Tian, Kevin <kevin.t...@intel.com>;
> Lan, Tianyu <tianyu....@intel.com>; Raj, Ashok <ashok....@intel.com>; Pan, 
> Jacob
> jun <jacob.jun....@intel.com>; Joerg Roedel <j...@8bytes.org>; Robin Murphy
> <robin.mur...@arm.com>
> Subject: Re: [RFC PATCH 29/30] vfio: Add support for Shared Virtual Memory
> 
> On 24/03/17 07:46, Liu, Yi L wrote:
> [...]
> >>>>
> >>>> So we need some kind of high-level classification that the vIOMMU
> >>>> must communicate to the physical one. Each IOMMU flavor would get a
> >>>> unique, global identifier, simply to make sure that vIOMMU and
> >>>> pIOMMU speak
> >> the same language.
> >>>> For example:
> >>>>
> >>>> 0x65776886 "AMDV" AMD IOMMU
> >>>> 0x73788476 "INTL" Intel IOMMU
> >>>> 0x83515748 "S390" s390 IOMMU
> >>>> 0x83777785 "SMMU" ARM SMMU
> >>>> etc.
> >>>>
> >>>> It needs to be a global magic number that everyone can recognize.
> >>>> Could be as simple as 32-bit numbers allocated from 0. Once we have
> >>>> a global magic number, we can use it to differentiate 
> >>>> architecture-specific
> details.
> >
> > I prefer simple numbers to stand for each vendor.
> 
> Sure, I don't have any preference. Simple numbers could be easier to allocate.
> 
> >>> I may need to think more on this part.
> >>>
> >>>> struct pasid_table_info {
> >>>>  __u64 ptr;
> >>>>  __u64 size;             /* Is it number of entry or size in
> >>>>                             bytes? */
> >>>
> >>> For Intel platform, it's encoded. But I can make it in bytes. Here,
> >>> I'd like to check with you if whole guest PASID info is also needed on 
> >>> ARM?
> >>
> >> It will be needed on ARM if someone ever emulates the SMMU with SVM.
> >> Though I'm not planning on doing that myself, it is unavoidable. And
> >> it would be a shame for the next SVM virtualization solution to have
> >> to introduce a new flag "VFIO_SVM_BIND_PASIDPT_2" if they could reuse
> >> most of the BIND_PASIDPT interface but simply needed to add one or
> >> two configuration fields specific to their IOMMU.
> >
> > So you are totally fine with putting PASID table ptr and size in the
> > generic part? Maybe we have different usage for it. For me, it's a
> > guest PASID table ptr. For you, it may be different.
> 
> It's the same for SMMU, with some added format specifiers that would go in
> 'opaque[]'. I think that table pointer and size (in bytes, or number of
> entries) is generic enough for a "bind table" call and can be reused by future
> implementations.
> 
> >>>>
> >>>>  __u32 model;            /* magic number */
> >>>>  __u32 variant;          /* version of the IOMMU architecture,
> >>>>                             maybe? IOMMU-specific. */
> >
> > For variant, it will be combined with model to do sanity check. Am I right?
> > Maybe it could be moved to opaque.
> 
> Yes I guess it could be moved to opaque. It would be a version of the model 
> used, so
> we wouldn't have to allocate a new model number whenever an architecture
> updates the fields of its PASID descriptors, but we can let IOMMU drivers 
> decide if
> they need it and what to put in there.
> 
> >>>>  __u8 opaque[];          /* IOMMU-specific details */
> >>>> };
> >>>>
> [...]
> >>
> >> Yes, that seems sensible. I could add an explicit VFIO_BIND_PASID
> >> flags to make it explicit that data[] is "u32 pasid" and avoid having any 
> >> default.
> >
> > Add it in the comment I suppose. The length is 4 byes, it could be deduced 
> > from
> argsz.
> >
> >>
> >>>>
> >>>>> #define VFIO_SVM_PASSDOWN_INVALIDATE    (1 << 1)
> >>>>
> >>>> Using the vfio_device_svm structure for invalidate operations is a
> >>>> bit odd, it might be nicer to add a new VFIO_SVM_INVALIDATE ioctl,
> >>>> that takes the above iommu_svm_tlb_invalidate_info as argument
> >>>> (with an added argsz.)
> >>>
> >>> Agree, would add a separate IOCTL for invalidation.
> >>>
> >>>>
> >>>>> #define VFIO_SVM_PASID_RELEASE_FLUSHED  (1 << 2)
> >>>>> #define VFIO_SVM_PASID_RELEASE_CLEAN      (1 << 3)
> >>>>>        __u32   flags;
> >>>>>        __u32   length;
> >>>>
> >>>> If length is the size of data[], I guess we can already deduce this info 
> >>>> from argsz.
> >>>
> >>> yes, it is size of data. Maybe remove argsz. How about your opinion?
> >>
> >> Argsz as first argument is standard across all VFIO ioctls, it allows
> >> the kernel to check that the structure passed by the guest is
> >> consistent with the structure it knows (see the comment at the
> >> beginning of
> >> include/uapi/linux/vfio.h) So argsz would be the preferred way to
> >> communicate the size of data[]
> >
> > yes, it makes sense. BTW. I think it is still possible to leave the
> > "length" since there is similar usage. e.g. struct vfio_irq_set, count 
> > tells the size of
> data[] in that structure.
> > Anyhow, it's no big deal here.
> >
> >>>>>        __u8    data[];
> >>>>> };
> >>>>
> >>>> In general, I think that your proposal would work fine along mine.
> >>>> Note that for my next version of this patch, I would like to move
> >>>> the BIND/UNBIND SVM operations onto a VFIO container fd, instead of
> >>>> a VFIO device
> >> fd, if possible.
> >>>
> >>> Attach the BIND/UNBIND operation onto VFIO container fd is practical.
> >>>
> >>> BTW. Before you send out your next version, we'd better have a
> >>> consensus on the vfio_device_svm definition. So that we can continue
> >>> to drive our
> >> own work separately.
> >>>
> >>>> ---
> >>>> As an aside, it also aligns with the one I'm working on at the
> >>>> moment, for virtual SVM at a finer granularity, where the BIND call
> >>>> is for a page table. I would add this flag to vfio_device_svm:
> >>>>
> >>>> #define VFIO_SVM_BIND_PGTABLE            (1 << x)
> >>>
> >>> Sure. I think it may also be a requirement from other vendors. I
> >>> think you've mentioned it in the above statements.
> >>>
> >>>> Which would indicate the following data (just a draft, I'm
> >>>> oscillating between this and a generic PASID table solution, which
> >>>> would instead
> >> reuse your proposal):
> >>>>
> >>>> struct pgtable_info {
> >>>>  __u32 pasid;
> >>>>
> >>>>  __u64 pgd;
> >>>>
> >>>>  __u32 model;
> >>>>  __u32 variant;
> >>>>
> >>>>  __u8 opaque[];
> >>>> };
> >
> > I think you would have this definition as an ARM specific one. Would
> > be filled in the data[] of vfio_device_svm. Is it?
> 
> Yes it would be another kind of data[] for vfio_device_svm.
> 
> >>>>
> >>>> On ARM SMMU we would need to specify an io-pgtable variant and the
> >>>> opaque structure would be bits of io_pgtable_cfg (tcr, mair, etc.)
> >>>>
> >>>> The problem of portability is slightly different here, because
> >>>> while PASID table format is specific to an IOMMU, page table format
> >>>> might be the same across multiple flavors of IOMMUs. For instance,
> >>>> the PASID table format I use in this series can only be found in
> >>>> the ARM
> >>>> SMMUv3 architecture, but the page table format is the same as ARM
> >>>> SMMUv2, and other MMUs. I'd like to implement an IOMMU independent
> >>>> of any page-table formats, that could use whatever the host offers
> >>>> (not necessarily MMU PT). The model numbers described above
> >>>> wouldn't be suitable,
> >> so we'd need another set of magic numbers for page table formats.
> >>>
> >>> Not sure if I totally got your points. We may assume PASID table
> >>> format differs from vendor to vendor. For the page table format, I
> >>> assume you mean the page table of process. Or in other words MMU page 
> >>> table.
> >>>
> >>> I'm a little bit confused about the following statements. Could you
> >>> speak a little bit
> >> more?
> >>> Is it for virtual SVM or user-space driver SVM usage?
> >>> " I'd like to implement an IOMMU independent of any page-table
> >>> formats, that could use whatever the host offers (not necessarily MMU 
> >>> PT)."
> >>
> >> The IOMMU might be using a different page table format than the MMU.
> >> I'm currently toying with the idea of having a portable
> >> paravirtualized IOMMU that can query the page table format used by
> >> pIOMMU, and adopt it if the page table handling code is available in
> >> the guest. This would be for non-SVM nested translation, using VFIO
> >> to bind page tables private to the IOMMU. I'll send a separate RFC to 
> >> discuss this.
> >
> > sounds interesting. Look forward to your further work.
> >
> >>
> >>> I's nice to have such discussion. Let's co-work and have a well-defined 
> >>> API.
> >>
> >> I agree. I don't plan to send a new version immediately since it will
> >> take some time rework, but we can sync-up here or in private to avoid 
> >> conflicting
> proposals.
> >
> > yes, let's keep in touch. For "struct vfio_device_svm", how about define it 
> > in such
> way:
> >
> > struct vfio_device_svm {
> >        __u32   argsz;
> >      /* data length would be sizeof(pasid_table_info) */
> > #define VFIO_SVM_BIND_PASIDTP                            (1 << 0)
> >      /* data length would be 4 byte */
> > #define VFIO_BIND_PASID                                           (1 << 1)
> >      /* data length would be sizeof(x) */
> > #define VFIO_SVM_PASID_RELEASE_FLUSHED       (1 << 2)
> >      /* data length would be sizeof(y) */
> > #define VFIO_SVM_PASID_RELEASE_CLEAN         (1 << 3)
> 
> (Note that I'll probably drop these two flags in next version.)
> 
> >     /* data length would be sizeof(pgtable_info) */
> > #define VFIO_SVM_BIND_PGTABLE                (1 << 4)
> >        __u32   flags;
> >        __u32   length;  /*can remove it if you think it's better */
> >        __u8    data[];   /* payload data for different bind/unbind request*/
> > };
> >
> > struct pasid_table_info {
> >                __u16 sid;             /* It's BDF of device. */
> >     __u64 ptr;            /* PASID table ptr */
> >     __u64 size;     /* PASID table size in bytes */
> >     __u32 model;    /* magic number */
> >     __u32 variant;  /* YiL: maybe move it to opaque? */
> 
> Yes, I think it could be in opaque, not all formats would need that. ptr, 
> size, model
> and opaque seem sufficient.
> 
> >     __u8 opaque[];  /* IOMMU-specific details */
> > };
> >
> > I added a new field as "sid". The reason is as below:
> > Since the IOCTL is going to be on container fd, then needs to tell
> > where does the bind request come from. SID could be used to filter the 
> > target core
> IOMMU.
> > Also, it's needed to find the corresponding context entry with the SID
> > on Intel platform. I assume you also need an equivalent on your work.
> > Pls let me know if you prefer different definition.
> 
> Actually moving the ioctl to the container changes the BIND model: the PASID 
> table
> would be attached to all devices in the container, so there shouldn't be any 
> device
> selector in the ioctl. I think it is more in line with the existing MAP/UNMAP 
> API,
> where a container represents an address space, and all devices in the 
> container
> access the same virtual addresses.
> It adds complexity on the driver side, but seems more clear from the user 
> point of
> view.

Hi Jean,

I do need a device selector here. Bind whole guest PASID table in host thus it 
needs 
to modify a context entry which is per-device. In other words, it is to replace 
the PASID
table in host instead of modifying some entries. Maybe I can move it to opaque. 
Then
it won't bother other user. Since my op is actually per-device, so it may be an 
open
if I need to place IOCTL in vfio-pci. Anyhow, iommu ops usage mainly happen in 
type1,
I don't want to break the rule so far. So I would go on with the consensus we 
had here.

Regards,
Yi L
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Reply via email to