Re: [PATCH RFC 02/19] iommufd: Dirty tracking for io_pagetable
On 4/30/22 00:51, Baolu Lu wrote:
> On 2022/4/29 05:09, Joao Martins wrote:
>> +int iopt_set_dirty_tracking(struct io_pagetable *iopt,
>> +			    struct iommu_domain *domain, bool enable)
>> +{
>> +	struct iommu_domain *dom;
>> +	unsigned long index;
>> +	int ret = -EOPNOTSUPP;
>> +
>> +	down_write(&iopt->iova_rwsem);
>> +	if (!domain) {
>> +		down_write(&iopt->domains_rwsem);
>> +		xa_for_each(&iopt->domains, index, dom) {
>> +			ret = iommu_set_dirty_tracking(dom, iopt, enable);
>> +			if (ret < 0)
>> +				break;
>
> Do you need to roll back to the original state before returning failure?
> Partial domains have already had dirty bit tracking enabled.

Yeap, will fix the unwinding for the next iteration.

>> +		}
>> +		up_write(&iopt->domains_rwsem);
>> +	} else {
>> +		ret = iommu_set_dirty_tracking(domain, iopt, enable);
>> +	}
>> +
>> +	up_write(&iopt->iova_rwsem);
>> +	return ret;
>> +}
>
> Best regards,
> baolu

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
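[Editor's note] The unwinding Joao agrees to add could look roughly like the following userspace model. This is only a sketch: `struct domain`, `set_tracking()`, and the `fail_next` error-injection hook are invented stand-ins for the kernel objects, and it assumes all domains shared one uniform previous state (so rollback is just `!enable`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct domain {
	bool dirty_tracking;
	bool fail_next;	/* test hook: make the next toggle on this domain fail */
};

static int set_tracking(struct domain *d, bool enable)
{
	if (d->fail_next)
		return -1;
	d->dirty_tracking = enable;
	return 0;
}

/* Toggle tracking on all domains; on error, roll back the ones already set. */
static int set_tracking_all(struct domain *doms, size_t n, bool enable)
{
	size_t i;
	int ret = 0;

	for (i = 0; i < n; i++) {
		ret = set_tracking(&doms[i], enable);
		if (ret < 0)
			break;
	}
	if (ret < 0) {
		/* Unwind: restore the previous state on domains 0..i-1. */
		while (i--)
			set_tracking(&doms[i], !enable);
	}
	return ret;
}
```

The same shape drops into the `xa_for_each()` loop: remember how far the iteration got, and walk the xarray again up to that index on failure.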
Re: [PATCH RFC 02/19] iommufd: Dirty tracking for io_pagetable
On 2022/4/29 05:09, Joao Martins wrote:
> +int iopt_set_dirty_tracking(struct io_pagetable *iopt,
> +			    struct iommu_domain *domain, bool enable)
> +{
> +	struct iommu_domain *dom;
> +	unsigned long index;
> +	int ret = -EOPNOTSUPP;
> +
> +	down_write(&iopt->iova_rwsem);
> +	if (!domain) {
> +		down_write(&iopt->domains_rwsem);
> +		xa_for_each(&iopt->domains, index, dom) {
> +			ret = iommu_set_dirty_tracking(dom, iopt, enable);
> +			if (ret < 0)
> +				break;

Do you need to roll back to the original state before returning failure?
Partial domains have already had dirty bit tracking enabled.

> +		}
> +		up_write(&iopt->domains_rwsem);
> +	} else {
> +		ret = iommu_set_dirty_tracking(domain, iopt, enable);
> +	}
> +
> +	up_write(&iopt->iova_rwsem);
> +	return ret;
> +}

Best regards,
baolu
Re: [PATCH RFC 02/19] iommufd: Dirty tracking for io_pagetable
On 4/29/22 12:56, Jason Gunthorpe wrote:
> On Fri, Apr 29, 2022 at 08:07:14AM +, Tian, Kevin wrote:
>>> From: Joao Martins
>>> Sent: Friday, April 29, 2022 5:09 AM
>>>
>>> +static int __set_dirty_tracking_range_locked(struct iommu_domain
>>> *domain,
>>
>> suppose anything using iommu_domain as the first argument should
>> be put in the iommu layer. Here it's more reasonable to use iopt
>> as the first argument or simply merge with the next function.
>>
>>> +					     struct io_pagetable *iopt,
>>> +					     bool enable)
>>> +{
>>> +	const struct iommu_domain_ops *ops = domain->ops;
>>> +	struct iommu_iotlb_gather gather;
>>> +	struct iopt_area *area;
>>> +	int ret = -EOPNOTSUPP;
>>> +	unsigned long iova;
>>> +	size_t size;
>>> +
>>> +	iommu_iotlb_gather_init(&gather);
>>> +
>>> +	for (area = iopt_area_iter_first(iopt, 0, ULONG_MAX); area;
>>> +	     area = iopt_area_iter_next(area, 0, ULONG_MAX)) {
>>
>> how is this different from leaving the iommu driver to walk the page table
>> and poke the modifier bit for all present PTEs? As commented in the last
>> patch this may allow removing the range op completely.
>
> Yea, I'm not super keen on the two ops either, especially since they
> are so wildly different.

/me ack

> I would expect that set_dirty_tracking turns on tracking for the
> entire iommu domain, for all present and future maps

Yes. I didn't do that correctly on ARM, nor on device attach (for x86,
e.g. on hotplug).

> While set_dirty_tracking_range - I guess it only does the range, so if
> we make a new map then the new range will be untracked? But that is
> now racy, we have to map and then call set_dirty_tracking_range
>
> It seems better for the iommu driver to deal with this and ARM should
> atomically make the new maps dirty tracking..

Next iteration I'll need to fix the way IOMMUs handle dirty-tracking
probing and tracking in their private intermediate structures. But yes,
I was trying to transfer this to the iommu driver (perhaps in a
convoluted way).
>>> +int iopt_set_dirty_tracking(struct io_pagetable *iopt,
>>> +			    struct iommu_domain *domain, bool enable)
>>> +{
>>> +	struct iommu_domain *dom;
>>> +	unsigned long index;
>>> +	int ret = -EOPNOTSUPP;
>
> Returns EOPNOTSUPP if the xarray is empty?

Argh, no. Maybe -EINVAL is better here.
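[Editor's note] The bug Jason spots is that `ret` is only assigned inside the loop body, so an empty xarray returns the initializer. A minimal userspace model of that shape (names like `toggle_all` are invented; the errno values are just the usual Linux numbers):

```c
#include <assert.h>
#include <stddef.h>

#define EOPNOTSUPP 95
#define EINVAL 22

/* Stand-in for iommu_set_dirty_tracking(); always succeeds here. */
static int toggle_one(int *dom)
{
	(void)dom;
	return 0;
}

/*
 * Mirrors the patch's loop: when there are no domains the body never
 * runs and the caller sees the initializer, -EOPNOTSUPP, even though
 * nothing was unsupported.
 */
static int toggle_all(int *doms, size_t n)
{
	int ret = -EOPNOTSUPP;

	for (size_t i = 0; i < n; i++) {
		ret = toggle_one(&doms[i]);
		if (ret < 0)
			break;
	}
	return ret;
}
```

Per Joao's reply, initializing `ret` to `-EINVAL` (or explicitly rejecting an empty set before the loop) would make the empty case report a caller error rather than a capability gap.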
Re: [PATCH RFC 02/19] iommufd: Dirty tracking for io_pagetable
On Fri, Apr 29, 2022 at 08:07:14AM +, Tian, Kevin wrote:
> > From: Joao Martins
> > Sent: Friday, April 29, 2022 5:09 AM
> >
> > +static int __set_dirty_tracking_range_locked(struct iommu_domain
> > *domain,
>
> suppose anything using iommu_domain as the first argument should
> be put in the iommu layer. Here it's more reasonable to use iopt
> as the first argument or simply merge with the next function.
>
> > +					     struct io_pagetable *iopt,
> > +					     bool enable)
> > +{
> > +	const struct iommu_domain_ops *ops = domain->ops;
> > +	struct iommu_iotlb_gather gather;
> > +	struct iopt_area *area;
> > +	int ret = -EOPNOTSUPP;
> > +	unsigned long iova;
> > +	size_t size;
> > +
> > +	iommu_iotlb_gather_init(&gather);
> > +
> > +	for (area = iopt_area_iter_first(iopt, 0, ULONG_MAX); area;
> > +	     area = iopt_area_iter_next(area, 0, ULONG_MAX)) {
>
> how is this different from leaving the iommu driver to walk the page table
> and poke the modifier bit for all present PTEs? As commented in the last
> patch this may allow removing the range op completely.

Yea, I'm not super keen on the two ops either, especially since they
are so wildly different.

I would expect that set_dirty_tracking turns on tracking for the
entire iommu domain, for all present and future maps

While set_dirty_tracking_range - I guess it only does the range, so if
we make a new map then the new range will be untracked? But that is
now racy, we have to map and then call set_dirty_tracking_range

It seems better for the iommu driver to deal with this and ARM should
atomically make the new maps dirty tracking..

> > +int iopt_set_dirty_tracking(struct io_pagetable *iopt,
> > +			    struct iommu_domain *domain, bool enable)
> > +{
> > +	struct iommu_domain *dom;
> > +	unsigned long index;
> > +	int ret = -EOPNOTSUPP;

Returns EOPNOTSUPP if the xarray is empty?

Jason
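[Editor's note] Jason's race argument can be made concrete with a toy model (every name here is invented, not the kernel API): with a domain-wide op, tracking is one piece of domain state consulted when a DMA write lands, so maps created before or after the toggle are covered with no window; with a range op, each map carries its own flag and a map created between `map()` and `set_dirty_tracking_range()` sits in an untracked window.

```c
#include <assert.h>
#include <stdbool.h>

/* Domain-wide semantics: one flag, checked at DMA-write time. */
struct toy_domain {
	bool tracking;
};

static void toy_set_dirty_tracking(struct toy_domain *d, bool enable)
{
	d->tracking = enable;
}

static bool toy_write_is_tracked(const struct toy_domain *d)
{
	/* Covers all present and future maps - no racy window. */
	return d->tracking;
}

/* Range semantics: per-map flag, needing a second call after map(). */
struct toy_map {
	bool tracked;
};

static struct toy_map toy_map_create(void)
{
	/* A fresh map starts untracked until set_dirty_tracking_range(). */
	struct toy_map m = { .tracked = false };
	return m;
}

static void toy_set_range(struct toy_map *m, bool enable)
{
	m->tracked = enable;
}
```

This is the shape of the argument for folding everything into a single domain-wide `set_dirty_tracking` op and letting the driver (e.g. ARM) make new mappings inherit the tracking state atomically at map time.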
Re: [PATCH RFC 02/19] iommufd: Dirty tracking for io_pagetable
On 4/29/22 09:07, Tian, Kevin wrote:
>> From: Joao Martins
>> Sent: Friday, April 29, 2022 5:09 AM
>>
>> +static int __set_dirty_tracking_range_locked(struct iommu_domain
>> *domain,
>
> suppose anything using iommu_domain as the first argument should
> be put in the iommu layer. Here it's more reasonable to use iopt
> as the first argument or simply merge with the next function.

OK

>> +					     struct io_pagetable *iopt,
>> +					     bool enable)
>> +{
>> +	const struct iommu_domain_ops *ops = domain->ops;
>> +	struct iommu_iotlb_gather gather;
>> +	struct iopt_area *area;
>> +	int ret = -EOPNOTSUPP;
>> +	unsigned long iova;
>> +	size_t size;
>> +
>> +	iommu_iotlb_gather_init(&gather);
>> +
>> +	for (area = iopt_area_iter_first(iopt, 0, ULONG_MAX); area;
>> +	     area = iopt_area_iter_next(area, 0, ULONG_MAX)) {
>
> how is this different from leaving the iommu driver to walk the page table
> and poke the modifier bit for all present PTEs?

It isn't. Moving towards a single op makes this simpler for the iommu
core API.

> As commented in the last patch this may allow removing the range op
> completely.

Yes.

>> +		iova = iopt_area_iova(area);
>> +		size = iopt_area_last_iova(area) - iova;
>> +
>> +		if (ops->set_dirty_tracking_range) {
>> +			ret = ops->set_dirty_tracking_range(domain, iova,
>> +							    size, &gather,
>> +							    enable);
>> +			if (ret < 0)
>> +				break;
>> +		}
>> +	}
>> +
>> +	iommu_iotlb_sync(domain, &gather);
>> +
>> +	return ret;
>> +}
>> +
>> +static int iommu_set_dirty_tracking(struct iommu_domain *domain,
>> +				    struct io_pagetable *iopt, bool enable)
>
> similarly rename to __iopt_set_dirty_tracking() and use iopt as the
> leading argument.

/me nods
RE: [PATCH RFC 02/19] iommufd: Dirty tracking for io_pagetable
> From: Joao Martins
> Sent: Friday, April 29, 2022 5:09 AM
>
> +static int __set_dirty_tracking_range_locked(struct iommu_domain
> *domain,

suppose anything using iommu_domain as the first argument should
be put in the iommu layer. Here it's more reasonable to use iopt
as the first argument or simply merge with the next function.

> +					     struct io_pagetable *iopt,
> +					     bool enable)
> +{
> +	const struct iommu_domain_ops *ops = domain->ops;
> +	struct iommu_iotlb_gather gather;
> +	struct iopt_area *area;
> +	int ret = -EOPNOTSUPP;
> +	unsigned long iova;
> +	size_t size;
> +
> +	iommu_iotlb_gather_init(&gather);
> +
> +	for (area = iopt_area_iter_first(iopt, 0, ULONG_MAX); area;
> +	     area = iopt_area_iter_next(area, 0, ULONG_MAX)) {

how is this different from leaving the iommu driver to walk the page table
and poke the modifier bit for all present PTEs? As commented in the last
patch this may allow removing the range op completely.

> +		iova = iopt_area_iova(area);
> +		size = iopt_area_last_iova(area) - iova;
> +
> +		if (ops->set_dirty_tracking_range) {
> +			ret = ops->set_dirty_tracking_range(domain, iova,
> +							    size, &gather,
> +							    enable);
> +			if (ret < 0)
> +				break;
> +		}
> +	}
> +
> +	iommu_iotlb_sync(domain, &gather);
> +
> +	return ret;
> +}
> +
> +static int iommu_set_dirty_tracking(struct iommu_domain *domain,
> +				    struct io_pagetable *iopt, bool enable)

similarly rename to __iopt_set_dirty_tracking() and use iopt as the
leading argument.

> +{
> +	const struct iommu_domain_ops *ops = domain->ops;
> +	int ret = -EOPNOTSUPP;
> +
> +	if (ops->set_dirty_tracking)
> +		ret = ops->set_dirty_tracking(domain, enable);
> +	else if (ops->set_dirty_tracking_range)
> +		ret = __set_dirty_tracking_range_locked(domain, iopt,
> +							enable);
> +
> +	return ret;
> +}
> +
> +int iopt_set_dirty_tracking(struct io_pagetable *iopt,
> +			    struct iommu_domain *domain, bool enable)
> +{
> +	struct iommu_domain *dom;
> +	unsigned long index;
> +	int ret = -EOPNOTSUPP;
> +
> +	down_write(&iopt->iova_rwsem);
> +	if (!domain) {
> +		down_write(&iopt->domains_rwsem);
> +		xa_for_each(&iopt->domains, index, dom) {
> +			ret = iommu_set_dirty_tracking(dom, iopt, enable);
> +			if (ret < 0)
> +				break;
> +		}
> +		up_write(&iopt->domains_rwsem);
> +	} else {
> +		ret = iommu_set_dirty_tracking(domain, iopt, enable);
> +	}
> +
> +	up_write(&iopt->iova_rwsem);
> +	return ret;
> +}
> +
>  struct iopt_pages *iopt_get_pages(struct io_pagetable *iopt, unsigned long iova,
>  				  unsigned long *start_byte,
>  				  unsigned long length)
> diff --git a/drivers/iommu/iommufd/iommufd_private.h
> b/drivers/iommu/iommufd/iommufd_private.h
> index f55654278ac4..d00ef3b785c5 100644
> --- a/drivers/iommu/iommufd/iommufd_private.h
> +++ b/drivers/iommu/iommufd/iommufd_private.h
> @@ -49,6 +49,9 @@ int iopt_unmap_iova(struct io_pagetable *iopt,
> unsigned long iova,
>  		    unsigned long length);
>  int iopt_unmap_all(struct io_pagetable *iopt);
>
> +int iopt_set_dirty_tracking(struct io_pagetable *iopt,
> +			    struct iommu_domain *domain, bool enable);
> +
>  int iopt_access_pages(struct io_pagetable *iopt, unsigned long iova,
>  		      unsigned long npages, struct page **out_pages, bool
> write);
>  void iopt_unaccess_pages(struct io_pagetable *iopt, unsigned long iova,
> --
> 2.17.2
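[Editor's note] Kevin's suggestion, a single iopt-first entry point that prefers the whole-domain op and falls back to the range op, can be sketched as a userspace mock (all types and names here are invented stand-ins, not the kernel structures; the real fallback would also take the iopt lock and walk the areas):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EOPNOTSUPP 95

struct toy_iopt;	/* stand-in for struct io_pagetable */

struct toy_ops {
	int (*set_dirty_tracking)(bool enable);
	int (*set_dirty_tracking_range)(bool enable);
};

struct toy_domain {
	const struct toy_ops *ops;
};

/* Test instrumentation: record which op path was taken. */
static int called;
static int op_whole(bool enable) { (void)enable; called = 1; return 0; }
static int op_range(bool enable) { (void)enable; called = 2; return 0; }

/*
 * One helper, iopt as the leading argument: prefer the domain-wide op,
 * fall back to the per-range op, else report no support.
 */
static int __iopt_set_dirty_tracking(struct toy_iopt *iopt,
				     struct toy_domain *domain, bool enable)
{
	const struct toy_ops *ops = domain->ops;

	(void)iopt;	/* the fallback path would iterate iopt's areas */
	if (ops->set_dirty_tracking)
		return ops->set_dirty_tracking(enable);
	if (ops->set_dirty_tracking_range)
		return ops->set_dirty_tracking_range(enable);
	return -EOPNOTSUPP;
}
```

The later discussion in this thread (Jason's reply) pushes further still: drop the range op entirely and keep only the domain-wide one.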
[PATCH RFC 02/19] iommufd: Dirty tracking for io_pagetable
Add an io_pagetable kernel API to toggle dirty tracking:

* iopt_set_dirty_tracking(iopt, [domain], state)

It receives either NULL (which means all domains) or an iommu_domain.
The intended caller of this is via the hw_pagetable object that is
created on device attach, which passes an iommu_domain. For now, the
all-domains case is left for vfio-compat.

The HW protection-domain dirty control is favored over the IOVA-range
alternative. For the latter, it iterates over all IOVA areas and calls
the iommu domain op to enable/disable for the range.

Signed-off-by: Joao Martins
---
 drivers/iommu/iommufd/io_pagetable.c    | 71 +++++++++++++++++++++++++
 drivers/iommu/iommufd/iommufd_private.h |  3 ++
 2 files changed, 74 insertions(+)

diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index f9f3b06946bf..f4609ef369e0 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -276,6 +276,77 @@ int iopt_map_user_pages(struct io_pagetable *iopt, unsigned long *iova,
 	return 0;
 }
 
+static int __set_dirty_tracking_range_locked(struct iommu_domain *domain,
+					     struct io_pagetable *iopt,
+					     bool enable)
+{
+	const struct iommu_domain_ops *ops = domain->ops;
+	struct iommu_iotlb_gather gather;
+	struct iopt_area *area;
+	int ret = -EOPNOTSUPP;
+	unsigned long iova;
+	size_t size;
+
+	iommu_iotlb_gather_init(&gather);
+
+	for (area = iopt_area_iter_first(iopt, 0, ULONG_MAX); area;
+	     area = iopt_area_iter_next(area, 0, ULONG_MAX)) {
+		iova = iopt_area_iova(area);
+		size = iopt_area_last_iova(area) - iova;
+
+		if (ops->set_dirty_tracking_range) {
+			ret = ops->set_dirty_tracking_range(domain, iova,
+							    size, &gather,
+							    enable);
+			if (ret < 0)
+				break;
+		}
+	}
+
+	iommu_iotlb_sync(domain, &gather);
+
+	return ret;
+}
+
+static int iommu_set_dirty_tracking(struct iommu_domain *domain,
+				    struct io_pagetable *iopt, bool enable)
+{
+	const struct iommu_domain_ops *ops = domain->ops;
+	int ret = -EOPNOTSUPP;
+
+	if (ops->set_dirty_tracking)
+		ret = ops->set_dirty_tracking(domain, enable);
+	else if (ops->set_dirty_tracking_range)
+		ret = __set_dirty_tracking_range_locked(domain, iopt,
+							enable);
+
+	return ret;
+}
+
+int iopt_set_dirty_tracking(struct io_pagetable *iopt,
+			    struct iommu_domain *domain, bool enable)
+{
+	struct iommu_domain *dom;
+	unsigned long index;
+	int ret = -EOPNOTSUPP;
+
+	down_write(&iopt->iova_rwsem);
+	if (!domain) {
+		down_write(&iopt->domains_rwsem);
+		xa_for_each(&iopt->domains, index, dom) {
+			ret = iommu_set_dirty_tracking(dom, iopt, enable);
+			if (ret < 0)
+				break;
+		}
+		up_write(&iopt->domains_rwsem);
+	} else {
+		ret = iommu_set_dirty_tracking(domain, iopt, enable);
+	}
+
+	up_write(&iopt->iova_rwsem);
+	return ret;
+}
+
 struct iopt_pages *iopt_get_pages(struct io_pagetable *iopt, unsigned long iova,
 				  unsigned long *start_byte,
 				  unsigned long length)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index f55654278ac4..d00ef3b785c5 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -49,6 +49,9 @@ int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
 		    unsigned long length);
 int iopt_unmap_all(struct io_pagetable *iopt);
 
+int iopt_set_dirty_tracking(struct io_pagetable *iopt,
+			    struct iommu_domain *domain, bool enable);
+
 int iopt_access_pages(struct io_pagetable *iopt, unsigned long iova,
 		      unsigned long npages, struct page **out_pages, bool write);
 void iopt_unaccess_pages(struct io_pagetable *iopt, unsigned long iova,
-- 
2.17.2