Re: [PATCH RFC 02/19] iommufd: Dirty tracking for io_pagetable

2022-05-02 Thread Joao Martins
On 4/30/22 00:51, Baolu Lu wrote:
> On 2022/4/29 05:09, Joao Martins wrote:
>> +int iopt_set_dirty_tracking(struct io_pagetable *iopt,
>> +			    struct iommu_domain *domain, bool enable)
>> +{
>> +	struct iommu_domain *dom;
>> +	unsigned long index;
>> +	int ret = -EOPNOTSUPP;
>> +
>> +	down_write(&iopt->iova_rwsem);
>> +	if (!domain) {
>> +		down_write(&iopt->domains_rwsem);
>> +		xa_for_each(&iopt->domains, index, dom) {
>> +			ret = iommu_set_dirty_tracking(dom, iopt, enable);
>> +			if (ret < 0)
>> +				break;
> 
> Do you need to roll back to the original state before returning failure?
> Some of the domains will already have had dirty bit tracking enabled.
> 
Yeap, will fix the unwinding for next iteration.

>> +		}
>> +		up_write(&iopt->domains_rwsem);
>> +	} else {
>> +		ret = iommu_set_dirty_tracking(domain, iopt, enable);
>> +	}
>> +
>> +	up_write(&iopt->iova_rwsem);
>> +	return ret;
>> +}
> 
> Best regards,
> baolu

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RFC 02/19] iommufd: Dirty tracking for io_pagetable

2022-04-29 Thread Baolu Lu

On 2022/4/29 05:09, Joao Martins wrote:

+int iopt_set_dirty_tracking(struct io_pagetable *iopt,
+			    struct iommu_domain *domain, bool enable)
+{
+	struct iommu_domain *dom;
+	unsigned long index;
+	int ret = -EOPNOTSUPP;
+
+	down_write(&iopt->iova_rwsem);
+	if (!domain) {
+		down_write(&iopt->domains_rwsem);
+		xa_for_each(&iopt->domains, index, dom) {
+			ret = iommu_set_dirty_tracking(dom, iopt, enable);
+			if (ret < 0)
+				break;


Do you need to roll back to the original state before returning failure?
Some of the domains will already have had dirty bit tracking enabled.


+		}
+		up_write(&iopt->domains_rwsem);
+	} else {
+		ret = iommu_set_dirty_tracking(domain, iopt, enable);
+	}
+
+	up_write(&iopt->iova_rwsem);
+	return ret;
+}


Best regards,
baolu


Re: [PATCH RFC 02/19] iommufd: Dirty tracking for io_pagetable

2022-04-29 Thread Joao Martins
On 4/29/22 12:56, Jason Gunthorpe wrote:
> On Fri, Apr 29, 2022 at 08:07:14AM +, Tian, Kevin wrote:
>>> From: Joao Martins 
>>> Sent: Friday, April 29, 2022 5:09 AM
>>>
>>> +static int __set_dirty_tracking_range_locked(struct iommu_domain *domain,
>>
>> suppose anything using iommu_domain as the first argument should
>> be put in the iommu layer. Here it's more reasonable to use iopt
>> as the first argument or simply merge with the next function.
>>
>>> +struct io_pagetable *iopt,
>>> +bool enable)
>>> +{
>>> +   const struct iommu_domain_ops *ops = domain->ops;
>>> +   struct iommu_iotlb_gather gather;
>>> +   struct iopt_area *area;
>>> +   int ret = -EOPNOTSUPP;
>>> +   unsigned long iova;
>>> +   size_t size;
>>> +
>>> +	iommu_iotlb_gather_init(&gather);
>>> +
>>> +   for (area = iopt_area_iter_first(iopt, 0, ULONG_MAX); area;
>>> +area = iopt_area_iter_next(area, 0, ULONG_MAX)) {
>>
>> how is this different from leaving the iommu driver to walk the page table
>> and then poke the modifier bit for all present PTEs? As commented in the
>> last patch this may allow removing the range op completely.
> 
> Yea, I'm not super keen on the two ops either, especially since they
> are so wildly different.
> 
/me ack

> I would expect that set_dirty_tracking turns on tracking for the
> entire iommu domain, for all present and future maps
> 
Yes.

I didn't do that correctly on ARM, nor on device attach
(for x86, e.g. on hotplug).

> While set_dirty_tracking_range - I guess it only does the range, so if
> we make a new map then the new range will be untracked? But that is
> now racy, we have to map and then call set_dirty_tracking_range
> 
> It seems better for the iommu driver to deal with this and ARM should
> atomically make the new maps dirty tracking..
> 

Next iteration I'll need to fix the way IOMMUs handle dirty-tracking
probing and tracking in their private intermediate structures.

But yes, I was trying to transfer this to the iommu driver (perhaps in a
convoluted way).

>>> +int iopt_set_dirty_tracking(struct io_pagetable *iopt,
>>> +   struct iommu_domain *domain, bool enable)
>>> +{
>>> +   struct iommu_domain *dom;
>>> +   unsigned long index;
>>> +   int ret = -EOPNOTSUPP;
> 
> Returns EOPNOTSUPP if the xarray is empty?
> 
Argh no. Maybe -EINVAL is better here.


Re: [PATCH RFC 02/19] iommufd: Dirty tracking for io_pagetable

2022-04-29 Thread Jason Gunthorpe via iommu
On Fri, Apr 29, 2022 at 08:07:14AM +, Tian, Kevin wrote:
> > From: Joao Martins 
> > Sent: Friday, April 29, 2022 5:09 AM
> > 
> > +static int __set_dirty_tracking_range_locked(struct iommu_domain *domain,
> 
> suppose anything using iommu_domain as the first argument should
> be put in the iommu layer. Here it's more reasonable to use iopt
> as the first argument or simply merge with the next function.
> 
> > +struct io_pagetable *iopt,
> > +bool enable)
> > +{
> > +   const struct iommu_domain_ops *ops = domain->ops;
> > +   struct iommu_iotlb_gather gather;
> > +   struct iopt_area *area;
> > +   int ret = -EOPNOTSUPP;
> > +   unsigned long iova;
> > +   size_t size;
> > +
> > +	iommu_iotlb_gather_init(&gather);
> > +
> > +   for (area = iopt_area_iter_first(iopt, 0, ULONG_MAX); area;
> > +area = iopt_area_iter_next(area, 0, ULONG_MAX)) {
> 
> how is this different from leaving the iommu driver to walk the page table
> and then poke the modifier bit for all present PTEs? As commented in the
> last patch this may allow removing the range op completely.

Yea, I'm not super keen on the two ops either, especially since they
are so wildly different.

I would expect that set_dirty_tracking turns on tracking for the
entire iommu domain, for all present and future maps

While set_dirty_tracking_range - I guess it only does the range, so if
we make a new map then the new range will be untracked? But that is
now racy, we have to map and then call set_dirty_tracking_range

It seems better for the iommu driver to deal with this and ARM should
atomically make the new maps dirty tracking..

> > +int iopt_set_dirty_tracking(struct io_pagetable *iopt,
> > +   struct iommu_domain *domain, bool enable)
> > +{
> > +   struct iommu_domain *dom;
> > +   unsigned long index;
> > +   int ret = -EOPNOTSUPP;

Returns EOPNOTSUPP if the xarray is empty?

Jason


Re: [PATCH RFC 02/19] iommufd: Dirty tracking for io_pagetable

2022-04-29 Thread Joao Martins
On 4/29/22 09:07, Tian, Kevin wrote:
>> From: Joao Martins 
>> Sent: Friday, April 29, 2022 5:09 AM
>>
>> +static int __set_dirty_tracking_range_locked(struct iommu_domain *domain,
> 
> suppose anything using iommu_domain as the first argument should
> be put in the iommu layer. Here it's more reasonable to use iopt
> as the first argument or simply merge with the next function.
> 
OK

>> + struct io_pagetable *iopt,
>> + bool enable)
>> +{
>> +const struct iommu_domain_ops *ops = domain->ops;
>> +struct iommu_iotlb_gather gather;
>> +struct iopt_area *area;
>> +int ret = -EOPNOTSUPP;
>> +unsigned long iova;
>> +size_t size;
>> +
>> +	iommu_iotlb_gather_init(&gather);
>> +
>> +for (area = iopt_area_iter_first(iopt, 0, ULONG_MAX); area;
>> + area = iopt_area_iter_next(area, 0, ULONG_MAX)) {
> 
> how is this different from leaving the iommu driver to walk the page table
> and then poke the modifier bit for all present PTEs?

It isn't. Moving towards a single op makes this simpler for iommu core API.

> As commented in last
> patch this may allow removing the range op completely.
> 
Yes.

>> +		iova = iopt_area_iova(area);
>> +		size = iopt_area_last_iova(area) - iova;
>> +
>> +		if (ops->set_dirty_tracking_range) {
>> +			ret = ops->set_dirty_tracking_range(domain, iova,
>> +							    size, &gather,
>> +							    enable);
>> +			if (ret < 0)
>> +				break;
>> +		}
>> +	}
>> +
>> +	iommu_iotlb_sync(domain, &gather);
>> +
>> +	return ret;
>> +}
>> +
>> +static int iommu_set_dirty_tracking(struct iommu_domain *domain,
>> +struct io_pagetable *iopt, bool enable)
> 
> similarly rename to __iopt_set_dirty_tracking() and use iopt as the
> leading argument.
> 
/me nods


RE: [PATCH RFC 02/19] iommufd: Dirty tracking for io_pagetable

2022-04-29 Thread Tian, Kevin
> From: Joao Martins 
> Sent: Friday, April 29, 2022 5:09 AM
> 
> +static int __set_dirty_tracking_range_locked(struct iommu_domain *domain,

suppose anything using iommu_domain as the first argument should
be put in the iommu layer. Here it's more reasonable to use iopt
as the first argument or simply merge with the next function.

> +  struct io_pagetable *iopt,
> +  bool enable)
> +{
> + const struct iommu_domain_ops *ops = domain->ops;
> + struct iommu_iotlb_gather gather;
> + struct iopt_area *area;
> + int ret = -EOPNOTSUPP;
> + unsigned long iova;
> + size_t size;
> +
> +	iommu_iotlb_gather_init(&gather);
> +
> + for (area = iopt_area_iter_first(iopt, 0, ULONG_MAX); area;
> +  area = iopt_area_iter_next(area, 0, ULONG_MAX)) {

how is this different from leaving the iommu driver to walk the page table
and then poke the modifier bit for all present PTEs? As commented in the
last patch this may allow removing the range op completely.

> +		iova = iopt_area_iova(area);
> +		size = iopt_area_last_iova(area) - iova;
> +
> +		if (ops->set_dirty_tracking_range) {
> +			ret = ops->set_dirty_tracking_range(domain, iova,
> +							    size, &gather,
> +							    enable);
> +			if (ret < 0)
> +				break;
> +		}
> +	}
> +
> +	iommu_iotlb_sync(domain, &gather);
> +
> +	return ret;
> +}
> +
> +static int iommu_set_dirty_tracking(struct iommu_domain *domain,
> + struct io_pagetable *iopt, bool enable)

similarly rename to __iopt_set_dirty_tracking() and use iopt as the
leading argument.

> +{
> + const struct iommu_domain_ops *ops = domain->ops;
> + int ret = -EOPNOTSUPP;
> +
> + if (ops->set_dirty_tracking)
> + ret = ops->set_dirty_tracking(domain, enable);
> + else if (ops->set_dirty_tracking_range)
> + ret = __set_dirty_tracking_range_locked(domain, iopt,
> + enable);
> +
> + return ret;
> +}
> +
> +int iopt_set_dirty_tracking(struct io_pagetable *iopt,
> + struct iommu_domain *domain, bool enable)
> +{
> + struct iommu_domain *dom;
> + unsigned long index;
> + int ret = -EOPNOTSUPP;
> +
> +	down_write(&iopt->iova_rwsem);
> +	if (!domain) {
> +		down_write(&iopt->domains_rwsem);
> +		xa_for_each(&iopt->domains, index, dom) {
> +			ret = iommu_set_dirty_tracking(dom, iopt, enable);
> +			if (ret < 0)
> +				break;
> +		}
> +		up_write(&iopt->domains_rwsem);
> +	} else {
> +		ret = iommu_set_dirty_tracking(domain, iopt, enable);
> +	}
> +
> +	up_write(&iopt->iova_rwsem);
> +	return ret;
> +}
> +
>  struct iopt_pages *iopt_get_pages(struct io_pagetable *iopt, unsigned long iova,
> 				  unsigned long *start_byte,
> 				  unsigned long length)
> diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
> index f55654278ac4..d00ef3b785c5 100644
> --- a/drivers/iommu/iommufd/iommufd_private.h
> +++ b/drivers/iommu/iommufd/iommufd_private.h
> @@ -49,6 +49,9 @@ int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
> 		    unsigned long length);
>  int iopt_unmap_all(struct io_pagetable *iopt);
> 
> +int iopt_set_dirty_tracking(struct io_pagetable *iopt,
> + struct iommu_domain *domain, bool enable);
> +
>  int iopt_access_pages(struct io_pagetable *iopt, unsigned long iova,
> 		      unsigned long npages, struct page **out_pages, bool write);
>  void iopt_unaccess_pages(struct io_pagetable *iopt, unsigned long iova,
> --
> 2.17.2



[PATCH RFC 02/19] iommufd: Dirty tracking for io_pagetable

2022-04-28 Thread Joao Martins
Add an io_pagetable kernel API to toggle dirty tracking:

* iopt_set_dirty_tracking(iopt, [domain], state)

It receives either NULL (which means all domains) or an
iommu_domain. The intended caller of this is via the hw_pagetable
object that is created on device attach, which passes an
iommu_domain. For now, the all-domains case is left for vfio-compat.

The hw protection domain dirty control is favored over the IOVA-range
alternative. For the latter, it iterates over all IOVA areas and calls
the iommu domain op to enable/disable it for each range.

Signed-off-by: Joao Martins 
---
 drivers/iommu/iommufd/io_pagetable.c| 71 +
 drivers/iommu/iommufd/iommufd_private.h |  3 ++
 2 files changed, 74 insertions(+)

diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index f9f3b06946bf..f4609ef369e0 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -276,6 +276,77 @@ int iopt_map_user_pages(struct io_pagetable *iopt, unsigned long *iova,
return 0;
 }
 
+static int __set_dirty_tracking_range_locked(struct iommu_domain *domain,
+struct io_pagetable *iopt,
+bool enable)
+{
+	const struct iommu_domain_ops *ops = domain->ops;
+	struct iommu_iotlb_gather gather;
+	struct iopt_area *area;
+	int ret = -EOPNOTSUPP;
+	unsigned long iova;
+	size_t size;
+
+	iommu_iotlb_gather_init(&gather);
+
+	for (area = iopt_area_iter_first(iopt, 0, ULONG_MAX); area;
+	     area = iopt_area_iter_next(area, 0, ULONG_MAX)) {
+		iova = iopt_area_iova(area);
+		size = iopt_area_last_iova(area) - iova;
+
+		if (ops->set_dirty_tracking_range) {
+			ret = ops->set_dirty_tracking_range(domain, iova,
+							    size, &gather,
+							    enable);
+			if (ret < 0)
+				break;
+		}
+	}
+
+	iommu_iotlb_sync(domain, &gather);
+
+	return ret;
+}
+
+static int iommu_set_dirty_tracking(struct iommu_domain *domain,
+   struct io_pagetable *iopt, bool enable)
+{
+   const struct iommu_domain_ops *ops = domain->ops;
+   int ret = -EOPNOTSUPP;
+
+   if (ops->set_dirty_tracking)
+   ret = ops->set_dirty_tracking(domain, enable);
+   else if (ops->set_dirty_tracking_range)
+   ret = __set_dirty_tracking_range_locked(domain, iopt,
+   enable);
+
+   return ret;
+}
+
+int iopt_set_dirty_tracking(struct io_pagetable *iopt,
+   struct iommu_domain *domain, bool enable)
+{
+   struct iommu_domain *dom;
+   unsigned long index;
+   int ret = -EOPNOTSUPP;
+
+	down_write(&iopt->iova_rwsem);
+	if (!domain) {
+		down_write(&iopt->domains_rwsem);
+		xa_for_each(&iopt->domains, index, dom) {
+			ret = iommu_set_dirty_tracking(dom, iopt, enable);
+			if (ret < 0)
+				break;
+		}
+		up_write(&iopt->domains_rwsem);
+	} else {
+		ret = iommu_set_dirty_tracking(domain, iopt, enable);
+	}
+
+	up_write(&iopt->iova_rwsem);
+	return ret;
+}
+
 struct iopt_pages *iopt_get_pages(struct io_pagetable *iopt, unsigned long iova,
				  unsigned long *start_byte,
				  unsigned long length)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index f55654278ac4..d00ef3b785c5 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -49,6 +49,9 @@ int iopt_unmap_iova(struct io_pagetable *iopt, unsigned long iova,
		    unsigned long length);
 int iopt_unmap_all(struct io_pagetable *iopt);
 
+int iopt_set_dirty_tracking(struct io_pagetable *iopt,
+   struct iommu_domain *domain, bool enable);
+
 int iopt_access_pages(struct io_pagetable *iopt, unsigned long iova,
		      unsigned long npages, struct page **out_pages, bool write);
 void iopt_unaccess_pages(struct io_pagetable *iopt, unsigned long iova,
-- 
2.17.2
