RE: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend

2023-11-12 Thread Duan, Zhenzhong


>-Original Message-
>From: Joao Martins 
>Sent: Friday, November 10, 2023 9:09 PM
>Subject: Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
>
>On 10/11/2023 03:15, Duan, Zhenzhong wrote:
>> Hi Jason, Joao,
>>
>>> -Original Message-
>>> From: Jason Gunthorpe 
>>> Sent: Thursday, November 9, 2023 10:35 PM
>>> Subject: Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd
>backend
>>>
>>> On Thu, Nov 09, 2023 at 01:21:59PM +, Joao Martins wrote:
>>>> On 09/11/2023 13:09, Jason Gunthorpe wrote:
>>>>> On Thu, Nov 09, 2023 at 01:03:02PM +, Joao Martins wrote:
>>>>>
>>>>>>> I am not talking about mdevs; but rather the regular (non mdev) case not
>>> being
>>>>>>> able to use dirty tracking with autodomains hwpt allocation.
>>>>>>
>>>>>> ... without any vIOMMU.
>>>>>
>>>>> Ah, well, that is troublesome isn't it..
>>>>>
>>>>> So do we teach autodomains to be more featured in the kernel or do we
>>>>> teach the generic qemu code to effectively implement autodomains in
>>>>> userspace?
>>>>
>>>> The latter is actually what we have been doing. Well I wouldn't call
>>> autodomains
>>>> in qemu, but rather just allocate a hwpt, instead of attaching the IOAS
>>>> directly. But well mdevs don't have domains and we overlooked that. I would
>>> turn
>>>> the exception into an exception rather than making the norm, doesn't look 
>>>> to
>>> be
>>>> much complexity added?
>>>
>>> Autodomains are complex because of things like mdev and iommu
>>> non-uniformity's. Qemu can't just allocate a single HWPT, it needs to
>>> be annoyingly managed.
>>>
>>>> What I last re-collect is that autodomains represents the 'simple users' 
>>>> that
>>>> don't care much beyond the basics of IOMMU features (I recall the example
>>> was
>>>> DPDK apps and the like). You could say that for current needs IOMMU
>>> autodomains
>>>> suffices for qemu.
>>>
>>> Yes, that was my intention. Aside from that it primarily exists to
>>> support vfio compatibility
>>>
>>>> Connecting autodomains to this enforcing on the hwpt is relatively simple
>btw,
>>>> it just needs to connect the dirty tracking flag with same semantic of
>>>> hwpt-alloc equivalent and pass the hwpt flags into the domain allocation.
>>>
>>> Yes
>>>
>>>> It's more of what of a question should be the expectations to the user when
>>>> using ATTACH_HWPT with an IOAS_ID versus direct manipulation of HWPT. I
>am
>>>> wondering if dirty tracking is alone here or whether there's more features
>that
>>>> start to mud the simplicity of autodomains that would approximate of hwpt-
>>> alloc.
>>>
>>> This is why I had been thinking of a pure HWPT based scheme
>>>
>>> So it seems we cannot have a simple model where the generic qmeu layer
>>> just works in IOAS :( It might as well always work in HWPT and
>>> understand all the auto domains complexity itself.
>>
>> Let me know if there is anything I can do in this series to facilitate
>> future qemu dirty tracking support of iommufd. Not clear if I should
>> restore to the manual HWPT_ALLOC method in v4.
>
>If we want to have the closest support as type1-iommu, from what we have been
>discussing... it sounds like IOAS is the easiest first step to get barebones
>iommufd support. Which sort of makes sense since this is the introduction of
>iommufd and it already requires a lot of churn & refactoring to get there.
Agree.

>
>For the new iommufd-only features (nesting/dirty-tracking) we will need the 
>auto
>domains done by Qemu IIUC -- unless nesting is meant to coexist with
>autodomains
>with its own hwpts somehow (?)

We have a draft nesting implementation which has its own hwpts and coexist
with autodomains.
See https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_cdev_v5_nesting
if you are interested.

>
>Right now I don't have the autodomains QEMU equivalent structure in mind to
>suggest a path in alternative to v5; Looking at the kernel autodomains path,
>aside from mdev I am not sure yet what annoyances the autodomains path in
>qemu
>is going to generate: more worringly whether we have enough information to
>tackle the non-uniformity e.g. if we are talking about features or whether
>different devices are behind different IOMMUs.

OK, looks more thinking and discuss needed except mdev.
I'd like to keep this series as a basic iommufd support with IOAS attaching.
QEMU autodomain may be another series addressing new iommufd-only features.

Thanks
Zhenzhong



Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend

2023-11-10 Thread Joao Martins
On 10/11/2023 03:15, Duan, Zhenzhong wrote:
> Hi Jason, Joao,
> 
>> -Original Message-
>> From: Jason Gunthorpe 
>> Sent: Thursday, November 9, 2023 10:35 PM
>> Subject: Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
>>
>> On Thu, Nov 09, 2023 at 01:21:59PM +, Joao Martins wrote:
>>> On 09/11/2023 13:09, Jason Gunthorpe wrote:
>>>> On Thu, Nov 09, 2023 at 01:03:02PM +, Joao Martins wrote:
>>>>
>>>>>> I am not talking about mdevs; but rather the regular (non mdev) case not
>> being
>>>>>> able to use dirty tracking with autodomains hwpt allocation.
>>>>>
>>>>> ... without any vIOMMU.
>>>>
>>>> Ah, well, that is troublesome isn't it..
>>>>
>>>> So do we teach autodomains to be more featured in the kernel or do we
>>>> teach the generic qemu code to effectively implement autodomains in
>>>> userspace?
>>>
>>> The latter is actually what we have been doing. Well I wouldn't call
>> autodomains
>>> in qemu, but rather just allocate a hwpt, instead of attaching the IOAS
>>> directly. But well mdevs don't have domains and we overlooked that. I would
>> turn
>>> the exception into an exception rather than making the norm, doesn't look to
>> be
>>> much complexity added?
>>
>> Autodomains are complex because of things like mdev and iommu
>> non-uniformity's. Qemu can't just allocate a single HWPT, it needs to
>> be annoyingly managed.
>>
>>> What I last re-collect is that autodomains represents the 'simple users' 
>>> that
>>> don't care much beyond the basics of IOMMU features (I recall the example
>> was
>>> DPDK apps and the like). You could say that for current needs IOMMU
>> autodomains
>>> suffices for qemu.
>>
>> Yes, that was my intention. Aside from that it primarily exists to
>> support vfio compatibility
>>
>>> Connecting autodomains to this enforcing on the hwpt is relatively simple 
>>> btw,
>>> it just needs to connect the dirty tracking flag with same semantic of
>>> hwpt-alloc equivalent and pass the hwpt flags into the domain allocation.
>>
>> Yes
>>
>>> It's more of what of a question should be the expectations to the user when
>>> using ATTACH_HWPT with an IOAS_ID versus direct manipulation of HWPT. I am
>>> wondering if dirty tracking is alone here or whether there's more features 
>>> that
>>> start to mud the simplicity of autodomains that would approximate of hwpt-
>> alloc.
>>
>> This is why I had been thinking of a pure HWPT based scheme
>>
>> So it seems we cannot have a simple model where the generic qmeu layer
>> just works in IOAS :( It might as well always work in HWPT and
>> understand all the auto domains complexity itself.
> 
> Let me know if there is anything I can do in this series to facilitate
> future qemu dirty tracking support of iommufd. Not clear if I should
> restore to the manual HWPT_ALLOC method in v4.

If we want to have the closest support as type1-iommu, from what we have been
discussing... it sounds like IOAS is the easiest first step to get barebones
iommufd support. Which sort of makes sense since this is the introduction of
iommufd and it already requires a lot of churn & refactoring to get there.

For the new iommufd-only features (nesting/dirty-tracking) we will need the auto
domains done by Qemu IIUC -- unless nesting is meant to coexist with autodomains
with its own hwpts somehow (?)

Right now I don't have the autodomains QEMU equivalent structure in mind to
suggest a path in alternative to v5; Looking at the kernel autodomains path,
aside from mdev I am not sure yet what annoyances the autodomains path in qemu
is going to generate: more worringly whether we have enough information to
tackle the non-uniformity e.g. if we are talking about features or whether
different devices are behind different IOMMUs.



RE: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend

2023-11-09 Thread Duan, Zhenzhong
Hi Jason, Joao,

>-Original Message-
>From: Jason Gunthorpe 
>Sent: Thursday, November 9, 2023 10:35 PM
>Subject: Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
>
>On Thu, Nov 09, 2023 at 01:21:59PM +, Joao Martins wrote:
>> On 09/11/2023 13:09, Jason Gunthorpe wrote:
>> > On Thu, Nov 09, 2023 at 01:03:02PM +, Joao Martins wrote:
>> >
>> >>> I am not talking about mdevs; but rather the regular (non mdev) case not
>being
>> >>> able to use dirty tracking with autodomains hwpt allocation.
>> >>
>> >> ... without any vIOMMU.
>> >
>> > Ah, well, that is troublesome isn't it..
>> >
>> > So do we teach autodomains to be more featured in the kernel or do we
>> > teach the generic qemu code to effectively implement autodomains in
>> > userspace?
>>
>> The latter is actually what we have been doing. Well I wouldn't call
>autodomains
>> in qemu, but rather just allocate a hwpt, instead of attaching the IOAS
>> directly. But well mdevs don't have domains and we overlooked that. I would
>turn
>> the exception into an exception rather than making the norm, doesn't look to
>be
>> much complexity added?
>
>Autodomains are complex because of things like mdev and iommu
>non-uniformity's. Qemu can't just allocate a single HWPT, it needs to
>be annoyingly managed.
>
>> What I last re-collect is that autodomains represents the 'simple users' that
>> don't care much beyond the basics of IOMMU features (I recall the example
>was
>> DPDK apps and the like). You could say that for current needs IOMMU
>autodomains
>> suffices for qemu.
>
>Yes, that was my intention. Aside from that it primarily exists to
>support vfio compatibility
>
>> Connecting autodomains to this enforcing on the hwpt is relatively simple 
>> btw,
>> it just needs to connect the dirty tracking flag with same semantic of
>> hwpt-alloc equivalent and pass the hwpt flags into the domain allocation.
>
>Yes
>
>> It's more of what of a question should be the expectations to the user when
>> using ATTACH_HWPT with an IOAS_ID versus direct manipulation of HWPT. I am
>> wondering if dirty tracking is alone here or whether there's more features 
>> that
>> start to mud the simplicity of autodomains that would approximate of hwpt-
>alloc.
>
>This is why I had been thinking of a pure HWPT based scheme
>
>So it seems we cannot have a simple model where the generic qmeu layer
>just works in IOAS :( It might as well always work in HWPT and
>understand all the auto domains complexity itself.

Let me know if there is anything I can do in this series to facilitate
future qemu dirty tracking support of iommufd. Not clear if I should
restore to the manual HWPT_ALLOC method in v4.

Thanks
Zhenzhong



Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend

2023-11-09 Thread Jason Gunthorpe
On Thu, Nov 09, 2023 at 01:21:59PM +, Joao Martins wrote:
> On 09/11/2023 13:09, Jason Gunthorpe wrote:
> > On Thu, Nov 09, 2023 at 01:03:02PM +, Joao Martins wrote:
> > 
> >>> I am not talking about mdevs; but rather the regular (non mdev) case not 
> >>> being
> >>> able to use dirty tracking with autodomains hwpt allocation.
> >>
> >> ... without any vIOMMU.
> > 
> > Ah, well, that is troublesome isn't it..
> > 
> > So do we teach autodomains to be more featured in the kernel or do we
> > teach the generic qemu code to effectively implement autodomains in
> > userspace?
> 
> The latter is actually what we have been doing. Well I wouldn't call 
> autodomains
> in qemu, but rather just allocate a hwpt, instead of attaching the IOAS
> directly. But well mdevs don't have domains and we overlooked that. I would 
> turn
> the exception into an exception rather than making the norm, doesn't look to 
> be
> much complexity added?

Autodomains are complex because of things like mdev and iommu
non-uniformity's. Qemu can't just allocate a single HWPT, it needs to
be annoyingly managed.

> What I last re-collect is that autodomains represents the 'simple users' that
> don't care much beyond the basics of IOMMU features (I recall the example was
> DPDK apps and the like). You could say that for current needs IOMMU 
> autodomains
> suffices for qemu.

Yes, that was my intention. Aside from that it primarily exists to
support vfio compatibility

> Connecting autodomains to this enforcing on the hwpt is relatively simple btw,
> it just needs to connect the dirty tracking flag with same semantic of
> hwpt-alloc equivalent and pass the hwpt flags into the domain allocation.

Yes

> It's more of what of a question should be the expectations to the user when
> using ATTACH_HWPT with an IOAS_ID versus direct manipulation of HWPT. I am
> wondering if dirty tracking is alone here or whether there's more features 
> that
> start to mud the simplicity of autodomains that would approximate of 
> hwpt-alloc.

This is why I had been thinking of a pure HWPT based scheme

So it seems we cannot have a simple model where the generic qmeu layer
just works in IOAS :( It might as well always work in HWPT and
understand all the auto domains complexity itself.

Jason



Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend

2023-11-09 Thread Joao Martins
On 09/11/2023 13:09, Jason Gunthorpe wrote:
> On Thu, Nov 09, 2023 at 01:03:02PM +, Joao Martins wrote:
> 
>>> I am not talking about mdevs; but rather the regular (non mdev) case not 
>>> being
>>> able to use dirty tracking with autodomains hwpt allocation.
>>
>> ... without any vIOMMU.
> 
> Ah, well, that is troublesome isn't it..
> 
> So do we teach autodomains to be more featured in the kernel or do we
> teach the generic qemu code to effectively implement autodomains in
> userspace?

The latter is actually what we have been doing. Well I wouldn't call autodomains
in qemu, but rather just allocate a hwpt, instead of attaching the IOAS
directly. But well mdevs don't have domains and we overlooked that. I would turn
the exception into an exception rather than making the norm, doesn't look to be
much complexity added?

What I last re-collect is that autodomains represents the 'simple users' that
don't care much beyond the basics of IOMMU features (I recall the example was
DPDK apps and the like). You could say that for current needs IOMMU autodomains
suffices for qemu.

For more advanced features we have advocating into our new iommu domain
manipulation i.e the more advanced API or manipulation domain objects. Nesting
is obviously the one that stresses 99% of the hwpt APIs (beyond alloc), and the
other one has been dirty tracking as the domain is where we enforce current
device support and future device attachments.

Connecting autodomains to this enforcing on the hwpt is relatively simple btw,
it just needs to connect the dirty tracking flag with same semantic of
hwpt-alloc equivalent and pass the hwpt flags into the domain allocation.

It's more of what of a question should be the expectations to the user when
using ATTACH_HWPT with an IOAS_ID versus direct manipulation of HWPT. I am
wondering if dirty tracking is alone here or whether there's more features that
start to mud the simplicity of autodomains that would approximate of hwpt-alloc.

Joao



Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend

2023-11-09 Thread Jason Gunthorpe
On Thu, Nov 09, 2023 at 01:03:02PM +, Joao Martins wrote:

> > I am not talking about mdevs; but rather the regular (non mdev) case not 
> > being
> > able to use dirty tracking with autodomains hwpt allocation.
> 
> ... without any vIOMMU.

Ah, well, that is troublesome isn't it..

So do we teach autodomains to be more featured in the kernel or do we
teach the generic qemu code to effectively implement autodomains in
userspace?

Jason



Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend

2023-11-09 Thread Joao Martins
On 09/11/2023 12:59, Joao Martins wrote:
> On 09/11/2023 12:57, Jason Gunthorpe wrote:
>> On Thu, Nov 09, 2023 at 12:17:35PM +, Joao Martins wrote:
>>> On 08/11/2023 12:48, Jason Gunthorpe wrote:
 On Wed, Nov 08, 2023 at 07:16:52AM +, Duan, Zhenzhong wrote:

>>> +ret = iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>>> + container->ioas_id, _id);
>>> +
>>> +if (ret) {
>>> +error_setg_errno(errp, errno, "error alloc shadow hwpt");
>>> +return ret;
>>> +}
>>
>> The above alloc_hwpt fails for mdevs (at least, it fails for me 
>> attempting to use
>> iommufd backend with vfio-ccw and vfio-ap on s390).  The ioctl is 
>> failing in the
>> kernel because it can't find an IOMMUFD_OBJ_DEVICE.
>>
>> AFAIU that's because the mdevs are meant to instead use kernel access via
>> vfio_iommufd_emulated_attach_ioas, not hwpt.  That's how mdevs behave 
>> when
>> looking at the kernel vfio compat container.
>>
>> As a test, I was able to get vfio-ccw and vfio-ap working using the 
>> iommufd
>> backend by just skipping this alloc_hwpt above and instead passing 
>> container-
>>> ioas_id into the iommufd_cdev_attach_hwpt below.  That triggers the
>> vfio_iommufd_emulated_attach_ioas call in the kernel.
>
> Thanks for help test and investigation.
> I was only focusing on real device and missed the mdev particularity, 
> sorry.
> You are right, there is no hwpt support for mdev, not even an emulated 
> hwpt.
> I'll digging into this and see how to distinguish mdev with real device in
> this low level function.

 I was expecting that hwpt manipulation would be done exclusively
 inside the device-specific vIOMMU userspace driver. Generic code paths
 that don't have that knowledge should use the IOAS for everything
>>>
>>> I am probably late into noticing this given Zhenzhong v5; but arent' we
>>> forgetting the enforcing of dirty tracking in HWPT is done /via/
>>> ALLOC_HWPT ?
>>
>> The underlying viommu driver supporting mdev cannot support dirty
>> tracking via the hwpt flag, so it doesn't matter.
>>
>> The entire point is that a mdev doesn't have a hwpt or any of the hwpt
>> linked features including dirty tracking.
> 
> I am not talking about mdevs; but rather the regular (non mdev) case not being
> able to use dirty tracking with autodomains hwpt allocation.

... without any vIOMMU.



Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend

2023-11-09 Thread Jason Gunthorpe
On Thu, Nov 09, 2023 at 12:17:35PM +, Joao Martins wrote:
> 
> 
> On 08/11/2023 12:48, Jason Gunthorpe wrote:
> > On Wed, Nov 08, 2023 at 07:16:52AM +, Duan, Zhenzhong wrote:
> > 
>  +ret = iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>  + container->ioas_id, _id);
>  +
>  +if (ret) {
>  +error_setg_errno(errp, errno, "error alloc shadow hwpt");
>  +return ret;
>  +}
> >>>
> >>> The above alloc_hwpt fails for mdevs (at least, it fails for me 
> >>> attempting to use
> >>> iommufd backend with vfio-ccw and vfio-ap on s390).  The ioctl is failing 
> >>> in the
> >>> kernel because it can't find an IOMMUFD_OBJ_DEVICE.
> >>>
> >>> AFAIU that's because the mdevs are meant to instead use kernel access via
> >>> vfio_iommufd_emulated_attach_ioas, not hwpt.  That's how mdevs behave when
> >>> looking at the kernel vfio compat container.
> >>>
> >>> As a test, I was able to get vfio-ccw and vfio-ap working using the 
> >>> iommufd
> >>> backend by just skipping this alloc_hwpt above and instead passing 
> >>> container-
>  ioas_id into the iommufd_cdev_attach_hwpt below.  That triggers the
> >>> vfio_iommufd_emulated_attach_ioas call in the kernel.
> >>
> >> Thanks for help test and investigation.
> >> I was only focusing on real device and missed the mdev particularity, 
> >> sorry.
> >> You are right, there is no hwpt support for mdev, not even an emulated 
> >> hwpt.
> >> I'll digging into this and see how to distinguish mdev with real device in
> >> this low level function.
> > 
> > I was expecting that hwpt manipulation would be done exclusively
> > inside the device-specific vIOMMU userspace driver. Generic code paths
> > that don't have that knowledge should use the IOAS for everything
> 
> I am probably late into noticing this given Zhenzhong v5; but arent' we
> forgetting the enforcing of dirty tracking in HWPT is done /via/
> ALLOC_HWPT ?

The underlying viommu driver supporting mdev cannot support dirty
tracking via the hwpt flag, so it doesn't matter.

The entire point is that a mdev doesn't have a hwpt or any of the hwpt
linked features including dirty tracking.

Jason



Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend

2023-11-09 Thread Joao Martins
On 09/11/2023 12:57, Jason Gunthorpe wrote:
> On Thu, Nov 09, 2023 at 12:17:35PM +, Joao Martins wrote:
>> On 08/11/2023 12:48, Jason Gunthorpe wrote:
>>> On Wed, Nov 08, 2023 at 07:16:52AM +, Duan, Zhenzhong wrote:
>>>
>> +ret = iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>> + container->ioas_id, _id);
>> +
>> +if (ret) {
>> +error_setg_errno(errp, errno, "error alloc shadow hwpt");
>> +return ret;
>> +}
>
> The above alloc_hwpt fails for mdevs (at least, it fails for me 
> attempting to use
> iommufd backend with vfio-ccw and vfio-ap on s390).  The ioctl is failing 
> in the
> kernel because it can't find an IOMMUFD_OBJ_DEVICE.
>
> AFAIU that's because the mdevs are meant to instead use kernel access via
> vfio_iommufd_emulated_attach_ioas, not hwpt.  That's how mdevs behave when
> looking at the kernel vfio compat container.
>
> As a test, I was able to get vfio-ccw and vfio-ap working using the 
> iommufd
> backend by just skipping this alloc_hwpt above and instead passing 
> container-
>> ioas_id into the iommufd_cdev_attach_hwpt below.  That triggers the
> vfio_iommufd_emulated_attach_ioas call in the kernel.

 Thanks for help test and investigation.
 I was only focusing on real device and missed the mdev particularity, 
 sorry.
 You are right, there is no hwpt support for mdev, not even an emulated 
 hwpt.
 I'll digging into this and see how to distinguish mdev with real device in
 this low level function.
>>>
>>> I was expecting that hwpt manipulation would be done exclusively
>>> inside the device-specific vIOMMU userspace driver. Generic code paths
>>> that don't have that knowledge should use the IOAS for everything
>>
>> I am probably late into noticing this given Zhenzhong v5; but arent' we
>> forgetting the enforcing of dirty tracking in HWPT is done /via/
>> ALLOC_HWPT ?
> 
> The underlying viommu driver supporting mdev cannot support dirty
> tracking via the hwpt flag, so it doesn't matter.
> 
> The entire point is that a mdev doesn't have a hwpt or any of the hwpt
> linked features including dirty tracking.

I am not talking about mdevs; but rather the regular (non mdev) case not being
able to use dirty tracking with autodomains hwpt allocation.

Joao



Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend

2023-11-09 Thread Joao Martins



On 08/11/2023 12:48, Jason Gunthorpe wrote:
> On Wed, Nov 08, 2023 at 07:16:52AM +, Duan, Zhenzhong wrote:
> 
 +ret = iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
 + container->ioas_id, _id);
 +
 +if (ret) {
 +error_setg_errno(errp, errno, "error alloc shadow hwpt");
 +return ret;
 +}
>>>
>>> The above alloc_hwpt fails for mdevs (at least, it fails for me attempting 
>>> to use
>>> iommufd backend with vfio-ccw and vfio-ap on s390).  The ioctl is failing 
>>> in the
>>> kernel because it can't find an IOMMUFD_OBJ_DEVICE.
>>>
>>> AFAIU that's because the mdevs are meant to instead use kernel access via
>>> vfio_iommufd_emulated_attach_ioas, not hwpt.  That's how mdevs behave when
>>> looking at the kernel vfio compat container.
>>>
>>> As a test, I was able to get vfio-ccw and vfio-ap working using the iommufd
>>> backend by just skipping this alloc_hwpt above and instead passing 
>>> container-
 ioas_id into the iommufd_cdev_attach_hwpt below.  That triggers the
>>> vfio_iommufd_emulated_attach_ioas call in the kernel.
>>
>> Thanks for help test and investigation.
>> I was only focusing on real device and missed the mdev particularity, sorry.
>> You are right, there is no hwpt support for mdev, not even an emulated hwpt.
>> I'll digging into this and see how to distinguish mdev with real device in
>> this low level function.
> 
> I was expecting that hwpt manipulation would be done exclusively
> inside the device-specific vIOMMU userspace driver. Generic code paths
> that don't have that knowledge should use the IOAS for everything

I am probably late into noticing this given Zhenzhong v5; but arent' we
forgetting the enforcing of dirty tracking in HWPT is done /via/ ALLOC_HWPT ?

We decided sometime ago that the domain_alloc_user flow (and thus enforcement of
dirty tracking) would go via hwpt manip as opposed to the autodomains flow.

Otherwise if I need to ressurect the autodomains support we will need a
ATTACH_IOAS flag replicating this enforcement to pass into the HWPT auto 
allocation.

Or I can add the hwpt manip on the qemu dirty tracking support of iommufd.

Joao



RE: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend

2023-11-08 Thread Duan, Zhenzhong



>-Original Message-
>From: Jason Gunthorpe 
>Sent: Wednesday, November 8, 2023 10:19 PM
>Subject: Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
>
>On Wed, Nov 08, 2023 at 01:25:34PM +, Duan, Zhenzhong wrote:
>
>> >I was expecting that hwpt manipulation would be done exclusively
>> >inside the device-specific vIOMMU userspace driver. Generic code paths
>> >that don't have that knowledge should use the IOAS for everything
>>
>> Yes, this way we don't need to distinguish between mdev and real device,
>> just attach to IOAS. But lose the benefit that same hwpt could be passed
>> into vIOMMU to be used as S2 hwpt in nesting.
>
>If you have a nesting capable vIOMMU driver then it should be
>creating the HWPTs and managing them in its layer. Maybe the core code
>provides some helpers.

OK, thanks for suggestion.

>
>Obviously you can't link a mdev to a nesting vIOMMU driver in the
>first place. Mdev should be connected to a different IOMMU driver that
>doesn't use HWPT at all.
>
>I think it will make alot of trouble to put the hwpt in the wrong
>layer as there shouldn't really be much generic code touching it.

I'll send v5 with your suggested changes.

Thanks
Zhenzhong



Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend

2023-11-08 Thread Jason Gunthorpe
On Wed, Nov 08, 2023 at 01:25:34PM +, Duan, Zhenzhong wrote:

> >I was expecting that hwpt manipulation would be done exclusively
> >inside the device-specific vIOMMU userspace driver. Generic code paths
> >that don't have that knowledge should use the IOAS for everything
> 
> Yes, this way we don't need to distinguish between mdev and real device,
> just attach to IOAS. But lose the benefit that same hwpt could be passed
> into vIOMMU to be used as S2 hwpt in nesting.

If you have a nesting capable vIOMMU driver then it should be
creating the HWPTs and managing them in its layer. Maybe the core code
provides some helpers.

Obviously you can't link a mdev to a nesting vIOMMU driver in the
first place. Mdev should be connected to a different IOMMU driver that
doesn't use HWPT at all.

I think it will make alot of trouble to put the hwpt in the wrong
layer as there shouldn't really be much generic code touching it.

Jason



RE: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend

2023-11-08 Thread Duan, Zhenzhong



>-Original Message-
>From: Jason Gunthorpe 
>Sent: Wednesday, November 8, 2023 8:48 PM
>Subject: Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
>
>On Wed, Nov 08, 2023 at 07:16:52AM +, Duan, Zhenzhong wrote:
>
>> >> +ret = iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>> >> + container->ioas_id, _id);
>> >> +
>> >> +if (ret) {
>> >> +error_setg_errno(errp, errno, "error alloc shadow hwpt");
>> >> +return ret;
>> >> +}
>> >
>> >The above alloc_hwpt fails for mdevs (at least, it fails for me attempting 
>> >to
>use
>> >iommufd backend with vfio-ccw and vfio-ap on s390).  The ioctl is failing 
>> >in the
>> >kernel because it can't find an IOMMUFD_OBJ_DEVICE.
>> >
>> >AFAIU that's because the mdevs are meant to instead use kernel access via
>> >vfio_iommufd_emulated_attach_ioas, not hwpt.  That's how mdevs behave
>when
>> >looking at the kernel vfio compat container.
>> >
>> >As a test, I was able to get vfio-ccw and vfio-ap working using the iommufd
>> >backend by just skipping this alloc_hwpt above and instead passing 
>> >container-
>> >>ioas_id into the iommufd_cdev_attach_hwpt below.  That triggers the
>> >vfio_iommufd_emulated_attach_ioas call in the kernel.
>>
>> Thanks for help test and investigation.
>> I was only focusing on real device and missed the mdev particularity, sorry.
>> You are right, there is no hwpt support for mdev, not even an emulated hwpt.
>> I'll digging into this and see how to distinguish mdev with real device in
>> this low level function.
>
>I was expecting that hwpt manipulation would be done exclusively
>inside the device-specific vIOMMU userspace driver. Generic code paths
>that don't have that knowledge should use the IOAS for everything

Yes, this way we don't need to distinguish between mdev and real device,
just attach to IOAS. But lose the benefit that same hwpt could be passed
into vIOMMU to be used as S2 hwpt in nesting.

If you don't have a strong opinion to use IOAS for everything, I'm thinking
about adding a bool variable is_mdev in VFIODevice, checking this bool
to decide if attach to manually allocated hwpt or IOAS.
For vfio-ap and vfio-ccw, is_mdev is set to true, for vfio-pci, we check
"/sys/bus/mdev" from vbasedev->sysfsdev to decide if it's true.

Another choice is to add VFIO_DEVICE_FLAGS_MDEV in vfio_device_info.flags
on kernel side, qemu can know if this device is mdev by checking the flag from
kernel, this works even in fd passing case.

Thanks
Zhenzhong



Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend

2023-11-08 Thread Jason Gunthorpe
On Wed, Nov 08, 2023 at 07:16:52AM +, Duan, Zhenzhong wrote:

> >> +ret = iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
> >> + container->ioas_id, _id);
> >> +
> >> +if (ret) {
> >> +error_setg_errno(errp, errno, "error alloc shadow hwpt");
> >> +return ret;
> >> +}
> >
> >The above alloc_hwpt fails for mdevs (at least, it fails for me attempting 
> >to use
> >iommufd backend with vfio-ccw and vfio-ap on s390).  The ioctl is failing in 
> >the
> >kernel because it can't find an IOMMUFD_OBJ_DEVICE.
> >
> >AFAIU that's because the mdevs are meant to instead use kernel access via
> >vfio_iommufd_emulated_attach_ioas, not hwpt.  That's how mdevs behave when
> >looking at the kernel vfio compat container.
> >
> >As a test, I was able to get vfio-ccw and vfio-ap working using the iommufd
> >backend by just skipping this alloc_hwpt above and instead passing container-
> >>ioas_id into the iommufd_cdev_attach_hwpt below.  That triggers the
> >vfio_iommufd_emulated_attach_ioas call in the kernel.
> 
> Thanks for help test and investigation.
> I was only focusing on real device and missed the mdev particularity, sorry.
> You are right, there is no hwpt support for mdev, not even an emulated hwpt.
> I'll digging into this and see how to distinguish mdev with real device in
> this low level function.

I was expecting that hwpt manipulation would be done exclusively
inside the device-specific vIOMMU userspace driver. Generic code paths
that don't have that knowledge should use the IOAS for everything

Jason



RE: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend

2023-11-07 Thread Duan, Zhenzhong
Hi Matthew,

>-Original Message-
>From: Matthew Rosato 
>Sent: Wednesday, November 8, 2023 11:00 AM
>Subject: Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
>
>On 11/2/23 3:12 AM, Zhenzhong Duan wrote:
>> From: Yi Liu 
>>
>> Add the iommufd backend. The IOMMUFD container class is implemented
>> based on the new /dev/iommu user API. This backend obviously depends
>> on CONFIG_IOMMUFD.
>>
>> So far, the iommufd backend doesn't support dirty page sync yet due
>> to missing support in the host kernel.
>>
>> Co-authored-by: Eric Auger 
>> Signed-off-by: Eric Auger 
>> Signed-off-by: Yi Liu 
>> Signed-off-by: Zhenzhong Duan 
>> ---
>
>[...]
>
>> +static int iommufd_cdev_attach_container(VFIODevice *vbasedev,
>> + VFIOIOMMUFDContainer *container,
>> + Error **errp)
>> +{
>> +int ret, iommufd = vbasedev->iommufd->fd;
>> +VFIOIOASHwpt *hwpt;
>> +uint32_t hwpt_id;
>> +Error *err = NULL;
>> +
>> +/* try to attach to an existing hwpt in this container */
>> +QLIST_FOREACH(hwpt, >hwpt_list, next) {
>> +ret = iommufd_cdev_attach_hwpt(vbasedev, hwpt->hwpt_id, );
>> +if (ret) {
>> +const char *msg = error_get_pretty(err);
>> +
>> +trace_iommufd_cdev_fail_attach_existing_hwpt(msg);
>> +error_free(err);
>> +err = NULL;
>> +} else {
>> +goto found_hwpt;
>> +}
>> +}
>> +
>> +ret = iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>> + container->ioas_id, _id);
>> +
>> +if (ret) {
>> +error_setg_errno(errp, errno, "error alloc shadow hwpt");
>> +return ret;
>> +}
>
>The above alloc_hwpt fails for mdevs (at least, it fails for me attempting to 
>use
>iommufd backend with vfio-ccw and vfio-ap on s390).  The ioctl is failing in 
>the
>kernel because it can't find an IOMMUFD_OBJ_DEVICE.
>
>AFAIU that's because the mdevs are meant to instead use kernel access via
>vfio_iommufd_emulated_attach_ioas, not hwpt.  That's how mdevs behave when
>looking at the kernel vfio compat container.
>
>As a test, I was able to get vfio-ccw and vfio-ap working using the iommufd
>backend by just skipping this alloc_hwpt above and instead passing container-
>>ioas_id into the iommufd_cdev_attach_hwpt below.  That triggers the
>vfio_iommufd_emulated_attach_ioas call in the kernel.

Thanks for help test and investigation.
I was only focusing on real device and missed the mdev particularity, sorry.
You are right, there is no hwpt support for mdev, not even an emulated hwpt.
I'll digging into this and see how to distinguish mdev with real device in
this low level function.

BRs.
Zhenzhong



RE: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend

2023-11-07 Thread Duan, Zhenzhong


>-Original Message-
>From: Cédric Le Goater 
>Sent: Tuesday, November 7, 2023 9:41 PM
>Subject: Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
>
>On 11/2/23 08:12, Zhenzhong Duan wrote:
>> From: Yi Liu 
>>
>> Add the iommufd backend. The IOMMUFD container class is implemented
>> based on the new /dev/iommu user API. This backend obviously depends
>> on CONFIG_IOMMUFD.
>>
>> So far, the iommufd backend doesn't support dirty page sync yet due
>> to missing support in the host kernel.
>>
>> Co-authored-by: Eric Auger 
>> Signed-off-by: Eric Auger 
>
>I think one tag for Eric is enough.
>
>> Signed-off-by: Yi Liu 
>> Signed-off-by: Zhenzhong Duan 
>> ---
>> v4: use SPDX identifier, use iommufd_cdev_* prefix, merge with manual alloc
>patch
>>
>>   include/hw/vfio/vfio-common.h |  23 ++
>>   hw/vfio/common.c  |  19 +-
>>   hw/vfio/iommufd.c | 504 ++
>>   hw/vfio/meson.build   |   3 +
>>   hw/vfio/trace-events  |  13 +
>>   5 files changed, 558 insertions(+), 4 deletions(-)
>>   create mode 100644 hw/vfio/iommufd.c
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 24ecc0e7ee..3f1a39a991 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -89,6 +89,23 @@ typedef struct VFIOHostDMAWindow {
>>   QLIST_ENTRY(VFIOHostDMAWindow) hostwin_next;
>>   } VFIOHostDMAWindow;
>>
>> +#ifdef CONFIG_IOMMUFD
>
>Please remove the #ifdef.

Will do.

>
>> +typedef struct VFIOIOASHwpt {
>> +uint32_t hwpt_id;
>> +QLIST_HEAD(, VFIODevice) device_list;
>> +QLIST_ENTRY(VFIOIOASHwpt) next;
>> +} VFIOIOASHwpt;
>> +
>> +typedef struct IOMMUFDBackend IOMMUFDBackend;
>> +
>> +typedef struct VFIOIOMMUFDContainer {
>> +VFIOContainerBase bcontainer;
>> +IOMMUFDBackend *be;
>> +uint32_t ioas_id;
>> +QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
>> +} VFIOIOMMUFDContainer;
>> +#endif
>> +
>>   typedef struct VFIODeviceOps VFIODeviceOps;
>>
>>   typedef struct VFIODevice {
>> @@ -116,6 +133,11 @@ typedef struct VFIODevice {
>>   OnOffAuto pre_copy_dirty_page_tracking;
>>   bool dirty_pages_supported;
>>   bool dirty_tracking;
>> +#ifdef CONFIG_IOMMUFD
>> +int devid;
>> +VFIOIOASHwpt *hwpt;
>> +IOMMUFDBackend *iommufd;
>> +#endif
>>   } VFIODevice;
>>
>>   struct VFIODeviceOps {
>> @@ -201,6 +223,7 @@ typedef QLIST_HEAD(VFIODeviceList, VFIODevice)
>VFIODeviceList;
>>   extern VFIOGroupList vfio_group_list;
>>   extern VFIODeviceList vfio_device_list;
>>   extern const VFIOIOMMUOps vfio_legacy_ops;
>> +extern const VFIOIOMMUOps vfio_iommufd_ops;
>>   extern const MemoryListener vfio_memory_listener;
>>   extern int vfio_kvm_device_fd;
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 572ae7c934..a61dce2845 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -1462,10 +1462,13 @@ VFIOAddressSpace
>*vfio_get_address_space(AddressSpace *as)
>>
>>   void vfio_put_address_space(VFIOAddressSpace *space)
>>   {
>> -if (QLIST_EMPTY(>containers)) {
>> -QLIST_REMOVE(space, list);
>> -g_free(space);
>> +if (!QLIST_EMPTY(>containers)) {
>> +return;
>>   }
>> +
>> +QLIST_REMOVE(space, list);
>> +g_free(space);
>> +
>>   if (QLIST_EMPTY(_address_spaces)) {
>>   qemu_unregister_reset(vfio_reset_handler, NULL);
>>   }
>> @@ -1498,8 +1501,16 @@ retry:
>>   int vfio_attach_device(char *name, VFIODevice *vbasedev,
>>  AddressSpace *as, Error **errp)
>>   {
>> -const VFIOIOMMUOps *ops = _legacy_ops;
>> +const VFIOIOMMUOps *ops;
>>
>> +#ifdef CONFIG_IOMMUFD
>
>You can keep this one though.

Will do.

>
>> +if (vbasedev->iommufd) {
>> +ops = _iommufd_ops;
>> +} else
>> +#endif
>> +{
>> +ops = _legacy_ops;
>> +}
>>   return ops->attach_device(name, vbasedev, as, errp);
>>   }
>>
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> new file mode 100644
>> index 00..1bb55ca2c4
>> --- /dev/null
>> +++ b/hw/vfio/iommufd.c
>> @@ -0,0 +1,504 @@
>> +/*
>> + * iommufd container backend
>> + *
>> + * Copyright (C) 202

Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend

2023-11-07 Thread Matthew Rosato
On 11/2/23 3:12 AM, Zhenzhong Duan wrote:
> From: Yi Liu 
> 
> Add the iommufd backend. The IOMMUFD container class is implemented
> based on the new /dev/iommu user API. This backend obviously depends
> on CONFIG_IOMMUFD.
> 
> So far, the iommufd backend doesn't support dirty page sync yet due
> to missing support in the host kernel.
> 
> Co-authored-by: Eric Auger 
> Signed-off-by: Eric Auger 
> Signed-off-by: Yi Liu 
> Signed-off-by: Zhenzhong Duan 
> ---

[...]

> +static int iommufd_cdev_attach_container(VFIODevice *vbasedev,
> + VFIOIOMMUFDContainer *container,
> + Error **errp)
> +{
> +int ret, iommufd = vbasedev->iommufd->fd;
> +VFIOIOASHwpt *hwpt;
> +uint32_t hwpt_id;
> +Error *err = NULL;
> +
> +/* try to attach to an existing hwpt in this container */
> +QLIST_FOREACH(hwpt, >hwpt_list, next) {
> +ret = iommufd_cdev_attach_hwpt(vbasedev, hwpt->hwpt_id, );
> +if (ret) {
> +const char *msg = error_get_pretty(err);
> +
> +trace_iommufd_cdev_fail_attach_existing_hwpt(msg);
> +error_free(err);
> +err = NULL;
> +} else {
> +goto found_hwpt;
> +}
> +}
> +
> +ret = iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
> + container->ioas_id, _id);
> +
> +if (ret) {
> +error_setg_errno(errp, errno, "error alloc shadow hwpt");
> +return ret;
> +}

The above alloc_hwpt fails for mdevs (at least, it fails for me attempting to 
use iommufd backend with vfio-ccw and vfio-ap on s390).  The ioctl is failing 
in the kernel because it can't find an IOMMUFD_OBJ_DEVICE.

AFAIU that's because the mdevs are meant to instead use kernel access via 
vfio_iommufd_emulated_attach_ioas, not hwpt.  That's how mdevs behave when 
looking at the kernel vfio compat container.

As a test, I was able to get vfio-ccw and vfio-ap working using the iommufd 
backend by just skipping this alloc_hwpt above and instead passing 
container->ioas_id into the iommufd_cdev_attach_hwpt below.  That triggers the 
vfio_iommufd_emulated_attach_ioas call in the kernel.

> +
> +/* Attach cdev to a new allocated hwpt within iommufd */
> +ret = iommufd_cdev_attach_hwpt(vbasedev, hwpt_id, errp);
> +if (ret) {
> +iommufd_backend_free_id(iommufd, hwpt_id);
> +return ret;
> +}
> +
> +hwpt = iommufd_container_get_hwpt(container, hwpt_id);
> +found_hwpt:
> +QLIST_INSERT_HEAD(>device_list, vbasedev, next);
> +vbasedev->hwpt = hwpt;
> +
> +trace_iommufd_cdev_attach_container(iommufd, vbasedev->name, 
> vbasedev->fd,
> +container->ioas_id, hwpt->hwpt_id);
> +return ret;
> +}




Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend

2023-11-07 Thread Cédric Le Goater

On 11/2/23 08:12, Zhenzhong Duan wrote:

From: Yi Liu 

Add the iommufd backend. The IOMMUFD container class is implemented
based on the new /dev/iommu user API. This backend obviously depends
on CONFIG_IOMMUFD.

So far, the iommufd backend doesn't support dirty page sync yet due
to missing support in the host kernel.

Co-authored-by: Eric Auger 
Signed-off-by: Eric Auger 


I think one tag for Eric is enough.


Signed-off-by: Yi Liu 
Signed-off-by: Zhenzhong Duan 
---
v4: use SPDX identifier, use iommufd_cdev_* prefix, merge with manual alloc 
patch

  include/hw/vfio/vfio-common.h |  23 ++
  hw/vfio/common.c  |  19 +-
  hw/vfio/iommufd.c | 504 ++
  hw/vfio/meson.build   |   3 +
  hw/vfio/trace-events  |  13 +
  5 files changed, 558 insertions(+), 4 deletions(-)
  create mode 100644 hw/vfio/iommufd.c

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 24ecc0e7ee..3f1a39a991 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -89,6 +89,23 @@ typedef struct VFIOHostDMAWindow {
  QLIST_ENTRY(VFIOHostDMAWindow) hostwin_next;
  } VFIOHostDMAWindow;
  
+#ifdef CONFIG_IOMMUFD


Please remove the #ifdef.


+typedef struct VFIOIOASHwpt {
+uint32_t hwpt_id;
+QLIST_HEAD(, VFIODevice) device_list;
+QLIST_ENTRY(VFIOIOASHwpt) next;
+} VFIOIOASHwpt;
+
+typedef struct IOMMUFDBackend IOMMUFDBackend;
+
+typedef struct VFIOIOMMUFDContainer {
+VFIOContainerBase bcontainer;
+IOMMUFDBackend *be;
+uint32_t ioas_id;
+QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
+} VFIOIOMMUFDContainer;
+#endif
+
  typedef struct VFIODeviceOps VFIODeviceOps;
  
  typedef struct VFIODevice {

@@ -116,6 +133,11 @@ typedef struct VFIODevice {
  OnOffAuto pre_copy_dirty_page_tracking;
  bool dirty_pages_supported;
  bool dirty_tracking;
+#ifdef CONFIG_IOMMUFD
+int devid;
+VFIOIOASHwpt *hwpt;
+IOMMUFDBackend *iommufd;
+#endif
  } VFIODevice;
  
  struct VFIODeviceOps {

@@ -201,6 +223,7 @@ typedef QLIST_HEAD(VFIODeviceList, VFIODevice) 
VFIODeviceList;
  extern VFIOGroupList vfio_group_list;
  extern VFIODeviceList vfio_device_list;
  extern const VFIOIOMMUOps vfio_legacy_ops;
+extern const VFIOIOMMUOps vfio_iommufd_ops;
  extern const MemoryListener vfio_memory_listener;
  extern int vfio_kvm_device_fd;
  
diff --git a/hw/vfio/common.c b/hw/vfio/common.c

index 572ae7c934..a61dce2845 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1462,10 +1462,13 @@ VFIOAddressSpace *vfio_get_address_space(AddressSpace 
*as)
  
  void vfio_put_address_space(VFIOAddressSpace *space)

  {
-if (QLIST_EMPTY(>containers)) {
-QLIST_REMOVE(space, list);
-g_free(space);
+if (!QLIST_EMPTY(>containers)) {
+return;
  }
+
+QLIST_REMOVE(space, list);
+g_free(space);
+
  if (QLIST_EMPTY(_address_spaces)) {
  qemu_unregister_reset(vfio_reset_handler, NULL);
  }
@@ -1498,8 +1501,16 @@ retry:
  int vfio_attach_device(char *name, VFIODevice *vbasedev,
 AddressSpace *as, Error **errp)
  {
-const VFIOIOMMUOps *ops = _legacy_ops;
+const VFIOIOMMUOps *ops;
  
+#ifdef CONFIG_IOMMUFD


You can keep this one though.


+if (vbasedev->iommufd) {
+ops = _iommufd_ops;
+} else
+#endif
+{
+ops = _legacy_ops;
+}
  return ops->attach_device(name, vbasedev, as, errp);
  }
  
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c

new file mode 100644
index 00..1bb55ca2c4
--- /dev/null
+++ b/hw/vfio/iommufd.c
@@ -0,0 +1,504 @@
+/*
+ * iommufd container backend
+ *
+ * Copyright (C) 2023 Intel Corporation.
+ * Copyright Red Hat, Inc. 2023
+ *
+ * Authors: Yi Liu 
+ *  Eric Auger 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include 
+#include 
+#include 
+
+#include "hw/vfio/vfio-common.h"
+#include "qemu/error-report.h"
+#include "trace.h"
+#include "qapi/error.h"
+#include "sysemu/iommufd.h"
+#include "hw/qdev-core.h"
+#include "sysemu/reset.h"
+#include "qemu/cutils.h"
+#include "qemu/chardev_open.h"
+
+static int iommufd_map(VFIOContainerBase *bcontainer, hwaddr iova,
+   ram_addr_t size, void *vaddr, bool readonly)
+{
+VFIOIOMMUFDContainer *container =
+container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
+
+return iommufd_backend_map_dma(container->be,
+   container->ioas_id,
+   iova, size, vaddr, readonly);
+}
+
+static int iommufd_unmap(VFIOContainerBase *bcontainer,
+ hwaddr iova, ram_addr_t size,
+ IOMMUTLBEntry *iotlb)
+{
+VFIOIOMMUFDContainer *container =
+container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
+
+/* TODO: Handle dma_unmap_bitmap with iotlb args (migration) */
+return iommufd_backend_unmap_dma(container->be,
+

[PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend

2023-11-02 Thread Zhenzhong Duan
From: Yi Liu 

Add the iommufd backend. The IOMMUFD container class is implemented
based on the new /dev/iommu user API. This backend obviously depends
on CONFIG_IOMMUFD.

So far, the iommufd backend doesn't support dirty page sync yet due
to missing support in the host kernel.

Co-authored-by: Eric Auger 
Signed-off-by: Eric Auger 
Signed-off-by: Yi Liu 
Signed-off-by: Zhenzhong Duan 
---
v4: use SPDX identifier, use iommufd_cdev_* prefix, merge with manual alloc 
patch

 include/hw/vfio/vfio-common.h |  23 ++
 hw/vfio/common.c  |  19 +-
 hw/vfio/iommufd.c | 504 ++
 hw/vfio/meson.build   |   3 +
 hw/vfio/trace-events  |  13 +
 5 files changed, 558 insertions(+), 4 deletions(-)
 create mode 100644 hw/vfio/iommufd.c

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 24ecc0e7ee..3f1a39a991 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -89,6 +89,23 @@ typedef struct VFIOHostDMAWindow {
 QLIST_ENTRY(VFIOHostDMAWindow) hostwin_next;
 } VFIOHostDMAWindow;
 
+#ifdef CONFIG_IOMMUFD
+typedef struct VFIOIOASHwpt {
+uint32_t hwpt_id;
+QLIST_HEAD(, VFIODevice) device_list;
+QLIST_ENTRY(VFIOIOASHwpt) next;
+} VFIOIOASHwpt;
+
+typedef struct IOMMUFDBackend IOMMUFDBackend;
+
+typedef struct VFIOIOMMUFDContainer {
+VFIOContainerBase bcontainer;
+IOMMUFDBackend *be;
+uint32_t ioas_id;
+QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
+} VFIOIOMMUFDContainer;
+#endif
+
 typedef struct VFIODeviceOps VFIODeviceOps;
 
 typedef struct VFIODevice {
@@ -116,6 +133,11 @@ typedef struct VFIODevice {
 OnOffAuto pre_copy_dirty_page_tracking;
 bool dirty_pages_supported;
 bool dirty_tracking;
+#ifdef CONFIG_IOMMUFD
+int devid;
+VFIOIOASHwpt *hwpt;
+IOMMUFDBackend *iommufd;
+#endif
 } VFIODevice;
 
 struct VFIODeviceOps {
@@ -201,6 +223,7 @@ typedef QLIST_HEAD(VFIODeviceList, VFIODevice) 
VFIODeviceList;
 extern VFIOGroupList vfio_group_list;
 extern VFIODeviceList vfio_device_list;
 extern const VFIOIOMMUOps vfio_legacy_ops;
+extern const VFIOIOMMUOps vfio_iommufd_ops;
 extern const MemoryListener vfio_memory_listener;
 extern int vfio_kvm_device_fd;
 
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 572ae7c934..a61dce2845 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1462,10 +1462,13 @@ VFIOAddressSpace *vfio_get_address_space(AddressSpace 
*as)
 
 void vfio_put_address_space(VFIOAddressSpace *space)
 {
-if (QLIST_EMPTY(>containers)) {
-QLIST_REMOVE(space, list);
-g_free(space);
+if (!QLIST_EMPTY(>containers)) {
+return;
 }
+
+QLIST_REMOVE(space, list);
+g_free(space);
+
 if (QLIST_EMPTY(_address_spaces)) {
 qemu_unregister_reset(vfio_reset_handler, NULL);
 }
@@ -1498,8 +1501,16 @@ retry:
 int vfio_attach_device(char *name, VFIODevice *vbasedev,
AddressSpace *as, Error **errp)
 {
-const VFIOIOMMUOps *ops = _legacy_ops;
+const VFIOIOMMUOps *ops;
 
+#ifdef CONFIG_IOMMUFD
+if (vbasedev->iommufd) {
+ops = _iommufd_ops;
+} else
+#endif
+{
+ops = _legacy_ops;
+}
 return ops->attach_device(name, vbasedev, as, errp);
 }
 
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
new file mode 100644
index 00..1bb55ca2c4
--- /dev/null
+++ b/hw/vfio/iommufd.c
@@ -0,0 +1,504 @@
+/*
+ * iommufd container backend
+ *
+ * Copyright (C) 2023 Intel Corporation.
+ * Copyright Red Hat, Inc. 2023
+ *
+ * Authors: Yi Liu 
+ *  Eric Auger 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include 
+#include 
+#include 
+
+#include "hw/vfio/vfio-common.h"
+#include "qemu/error-report.h"
+#include "trace.h"
+#include "qapi/error.h"
+#include "sysemu/iommufd.h"
+#include "hw/qdev-core.h"
+#include "sysemu/reset.h"
+#include "qemu/cutils.h"
+#include "qemu/chardev_open.h"
+
+static int iommufd_map(VFIOContainerBase *bcontainer, hwaddr iova,
+   ram_addr_t size, void *vaddr, bool readonly)
+{
+VFIOIOMMUFDContainer *container =
+container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
+
+return iommufd_backend_map_dma(container->be,
+   container->ioas_id,
+   iova, size, vaddr, readonly);
+}
+
+static int iommufd_unmap(VFIOContainerBase *bcontainer,
+ hwaddr iova, ram_addr_t size,
+ IOMMUTLBEntry *iotlb)
+{
+VFIOIOMMUFDContainer *container =
+container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
+
+/* TODO: Handle dma_unmap_bitmap with iotlb args (migration) */
+return iommufd_backend_unmap_dma(container->be,
+ container->ioas_id, iova, size);
+}
+
+static void iommufd_cdev_kvm_device_add(VFIODevice *vbasedev)
+{
+Error *err = NULL;
+
+if (vfio_kvm_device_add_fd(vbasedev->fd,