Re: [PATCH] iommu/vt-d: Fix unmap_pages support

2021-11-11 Thread Lu Baolu

Hi Alex,

On 11/11/21 8:32 AM, Alex Williamson wrote:

When supporting only the .map and .unmap callbacks of iommu_ops,
the IOMMU driver can make assumptions about the size and alignment
used for mappings based on the driver provided pgsize_bitmap.  VT-d
previously used essentially PAGE_MASK for this bitmap as any power
of two mapping was acceptably filled by native page sizes.

However, with the .map_pages and .unmap_pages interface we're now
getting page-size and count arguments.  If we simply combine these
as (page-size * count) and make use of the previous map/unmap
functions internally, any size and alignment assumptions are very
different.

As an example, a given vfio device assignment VM will often create
a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff].  On a system that
does not support IOMMU super pages, the unmap_pages interface will
ask to unmap 1024 4KB pages at the base IOVA.  dma_pte_clear_level()
will recurse down to level 2 of the page table where the first half
of the pfn range exactly matches the entire pte level.  We clear the
pte, increment the pfn by the level size, but (oops) the next pte is
on a new page, so we exit the loop and pop back up a level.  When we
then update the pfn based on that higher level, we seem to assume
that the previous pfn value was at the start of the level.  In this
case the level size is 256K pfns, which we add to the base pfn and
get a result of 0x7fe00, which is clearly greater than 0x401ff,
so we're done.  Meanwhile we never cleared the ptes for the remainder
of the range.  When the VM remaps this range, we're overwriting valid
ptes and the VT-d driver complains loudly, as reported by the user
report linked below.

The fix for this seems relatively simple: if each iteration of the
loop in dma_pte_clear_level() is assumed to clear to the end of the
level pte page, then our next pfn should be calculated from level_pfn
rather than our working pfn.

Fixes: 3f34f1259776 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback")
Reported-by: Ajay Garg 
Link: https://lore.kernel.org/all/20211002124012.18186-1-ajaygargn...@gmail.com/
Signed-off-by: Alex Williamson 


Thank you for fixing this! I will queue it for v5.16.

Best regards,
baolu


---
  drivers/iommu/intel/iommu.c |2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d75f59ae28e6..f6395f5425f0 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1249,7 +1249,7 @@ static struct page *dma_pte_clear_level(struct dmar_domain *domain, int level,
 						       freelist);
 		}
 next:
-		pfn += level_size(level);
+		pfn = level_pfn + level_size(level);
 	} while (!first_pte_in_page(++pte) && pfn <= last_pfn);
 
 	if (first_pte)




___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2] memory: mtk-smi: Fix a null dereference for the ostd

2021-11-11 Thread Yong Wu
Hi AngeloGioacchino,

Thanks very much for your review.

On Thu, 2021-11-11 at 14:41 +0100, AngeloGioacchino Del Regno wrote:
> Il 08/11/21 09:24, Yong Wu ha scritto:
> > We add the ostd setting for mt8195. It introduces a KE (kernel
> > exception) on previous SoCs which don't have the ostd setting. This is the log:
> > 
> > Unable to handle kernel NULL pointer dereference at virtual address
> > 0000000000000080
> > ...
> > pc : mtk_smi_larb_config_port_gen2_general+0x64/0x130
> > lr : mtk_smi_larb_resume+0x54/0x98
> > ...
> > Call trace:
> >   mtk_smi_larb_config_port_gen2_general+0x64/0x130
> >   pm_generic_runtime_resume+0x2c/0x48
> >   __genpd_runtime_resume+0x30/0xa8
> >   genpd_runtime_resume+0x94/0x2c8
> >   __rpm_callback+0x44/0x150
> >   rpm_callback+0x6c/0x78
> >   rpm_resume+0x310/0x558
> >   __pm_runtime_resume+0x3c/0x88
> > 
> > In the code: larbostd = larb->larb_gen->ostd[larb->larbid],
> > if "larb->larb_gen->ostd" is NULL, then "larbostd" is the offset (e.g.
> > 0x80 above), which is still a non-NULL value, so accessing "larbostd[i]"
> > in the "for" loop causes the KE above. To avoid this issue, initialize
> > "larbostd" to NULL when the SoC doesn't have an ostd setting.
> > 
> > Signed-off-by: Yong Wu 
> > ---
> > change note: Reword the commit message to explain why it causes a KE,
> > and update the solution by explicitly initializing "larbostd" to NULL
> > in the non-ostd case.
> > ---
> >   drivers/memory/mtk-smi.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> 
> This commit needs a Fixes tag. Please add the proper one.

This should fix this one:

fe6dd2a4017d ("memory: mtk-smi: mt8195: Add initial setting for smi-
larb")

But the commit id comes from linux-next. I'm not sure whether the sha-id
will change again when it enters mainline, so I didn't add it.




Re: [PATCH v2] memory: mtk-smi: Fix a null dereference for the ostd

2021-11-11 Thread AngeloGioacchino Del Regno

Il 08/11/21 09:24, Yong Wu ha scritto:

We add the ostd setting for mt8195. It introduces a KE (kernel exception)
on previous SoCs which don't have the ostd setting. This is the log:

Unable to handle kernel NULL pointer dereference at virtual address
0000000000000080
...
pc : mtk_smi_larb_config_port_gen2_general+0x64/0x130
lr : mtk_smi_larb_resume+0x54/0x98
...
Call trace:
  mtk_smi_larb_config_port_gen2_general+0x64/0x130
  pm_generic_runtime_resume+0x2c/0x48
  __genpd_runtime_resume+0x30/0xa8
  genpd_runtime_resume+0x94/0x2c8
  __rpm_callback+0x44/0x150
  rpm_callback+0x6c/0x78
  rpm_resume+0x310/0x558
  __pm_runtime_resume+0x3c/0x88

In the code: larbostd = larb->larb_gen->ostd[larb->larbid],
if "larb->larb_gen->ostd" is NULL, then "larbostd" is the offset (e.g.
0x80 above), which is still a non-NULL value, so accessing "larbostd[i]" in the
"for" loop causes the KE above. To avoid this issue, initialize
"larbostd" to NULL when the SoC doesn't have an ostd setting.

Signed-off-by: Yong Wu 
---
change note: Reword the commit message to explain why it causes a KE, and
update the solution by explicitly initializing "larbostd" to NULL in the
non-ostd case.
---
  drivers/memory/mtk-smi.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)



This commit needs a Fixes tag. Please add the proper one.


RE: [RFC 02/20] vfio: Add device class for /dev/vfio/devices

2021-11-11 Thread Liu, Yi L
Hi Jason,

> From: Jason Gunthorpe 
> Sent: Wednesday, November 3, 2021 9:26 PM
> 
> On Tue, Nov 02, 2021 at 09:53:29AM +, Liu, Yi L wrote:
> 
> > >   vfio_uninit_group_dev(&mdev_state->vdev);
> > >   kfree(mdev_state->pages);
> > >   kfree(mdev_state->vconfig);
> > >   kfree(mdev_state);
> > >
> > > pages/vconfig would logically be in a release function
> >
> > I see. So the criterion is: pointer fields pointing to a memory buffer
> > allocated by the device driver should logically be freed in a release
> > function, right?
> 
> Often yes, that is usually a good idea
> 
> > I can see there are such fields in struct vfio_pci_core_device
> > and mdev_state (both mbochs and mdpy). So we may go with your option #2,
> > right? Otherwise, we'd need to add a release callback for all the related
> > drivers.
> 
> Yes, that is roughly the trade-off
> 
> > > On the other hand, ccw needs to RCU-free the vfio_device, so that would
> > > have to be global overhead with this API design.
> >
> > I don't quite get it. Why is ccw special here? Could you elaborate?
> 
> I added a rcu usage to it in order to fix a race
> 
> +static inline struct vfio_ccw_private *vfio_ccw_get_priv(struct subchannel *sch)
> +{
> +	struct vfio_ccw_private *private;
> +
> +	rcu_read_lock();
> +	private = dev_get_drvdata(&sch->dev);
> +	if (private && !vfio_device_try_get(&private->vdev))
> +		private = NULL;
> +	rcu_read_unlock();
> +	return private;
> +}

You are right. After checking your ccw patch, the private free triggered
by vfio_ccw_free_private() should use kfree_rcu(). So it is not quite the
same as other vfio_device users, which only need kfree() to free the
vfio_device. How can I address this difference when moving the vfio_device
alloc/free into the vfio core? Any suggestion?

@@ -164,14 +173,14 @@ static void vfio_ccw_free_private(struct vfio_ccw_private *private)
 	kmem_cache_free(vfio_ccw_io_region, private->io_region);
 	kfree(private->cp.guest_cp);
 	mutex_destroy(&private->io_mutex);
-	kfree(private);
+	vfio_uninit_group_dev(&private->vdev);
+	kfree_rcu(private, rcu);
 }

https://lore.kernel.org/kvm/10-v3-57c1502c62fd+2190-ccw_mdev_...@nvidia.com/

Regards,
Yi Liu


Re: [PATCH] iommu/vt-d: Fix unmap_pages support

2021-11-11 Thread Giovanni Cabiddu
I was able to hit the issue that this patch fixes using the memory
allocator in QATlib (https://github.com/intel/qatlib/), which uses vfio
to map and unmap memory.

This patch fixes the problem.

On Wed, Nov 10, 2021 at 05:32:50PM -0700, Alex Williamson wrote:
> When supporting only the .map and .unmap callbacks of iommu_ops,
> the IOMMU driver can make assumptions about the size and alignment
> used for mappings based on the driver provided pgsize_bitmap.  VT-d
> previously used essentially PAGE_MASK for this bitmap as any power
> of two mapping was acceptably filled by native page sizes.
> 
> However, with the .map_pages and .unmap_pages interface we're now
> getting page-size and count arguments.  If we simply combine these
> as (page-size * count) and make use of the previous map/unmap
> functions internally, any size and alignment assumptions are very
> different.
> 
> As an example, a given vfio device assignment VM will often create
> a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff].  On a system that
> does not support IOMMU super pages, the unmap_pages interface will
> ask to unmap 1024 4KB pages at the base IOVA.  dma_pte_clear_level()
> will recurse down to level 2 of the page table where the first half
> of the pfn range exactly matches the entire pte level.  We clear the
> pte, increment the pfn by the level size, but (oops) the next pte is
> on a new page, so we exit the loop and pop back up a level.  When we
> then update the pfn based on that higher level, we seem to assume
> that the previous pfn value was at the start of the level.  In this
> case the level size is 256K pfns, which we add to the base pfn and
> get a result of 0x7fe00, which is clearly greater than 0x401ff,
> so we're done.  Meanwhile we never cleared the ptes for the remainder
> of the range.  When the VM remaps this range, we're overwriting valid
> ptes and the VT-d driver complains loudly, as reported by the user
> report linked below.
> 
> The fix for this seems relatively simple: if each iteration of the
> loop in dma_pte_clear_level() is assumed to clear to the end of the
> level pte page, then our next pfn should be calculated from level_pfn
> rather than our working pfn.
> 
> Fixes: 3f34f1259776 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback")
> Reported-by: Ajay Garg 
> Link: 
> https://lore.kernel.org/all/20211002124012.18186-1-ajaygargn...@gmail.com/
> Signed-off-by: Alex Williamson 
Tested-by: Giovanni Cabiddu 


Re: [PATCH] iommu/vt-d: Fix unmap_pages support

2021-11-11 Thread Ajay Garg
Thanks Alex, Baolu.
The patch fixes things at my end. No kernel-log flooding is seen now
(tested starting/stopping the VM more than 10 times).


Thanks and Regards,
Ajay

On Thu, Nov 11, 2021 at 6:03 AM Alex Williamson
 wrote:
>
> When supporting only the .map and .unmap callbacks of iommu_ops,
> the IOMMU driver can make assumptions about the size and alignment
> used for mappings based on the driver provided pgsize_bitmap.  VT-d
> previously used essentially PAGE_MASK for this bitmap as any power
> of two mapping was acceptably filled by native page sizes.
>
> However, with the .map_pages and .unmap_pages interface we're now
> getting page-size and count arguments.  If we simply combine these
> as (page-size * count) and make use of the previous map/unmap
> functions internally, any size and alignment assumptions are very
> different.
>
> As an example, a given vfio device assignment VM will often create
> a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff].  On a system that
> does not support IOMMU super pages, the unmap_pages interface will
> ask to unmap 1024 4KB pages at the base IOVA.  dma_pte_clear_level()
> will recurse down to level 2 of the page table where the first half
> of the pfn range exactly matches the entire pte level.  We clear the
> pte, increment the pfn by the level size, but (oops) the next pte is
> on a new page, so we exit the loop and pop back up a level.  When we
> then update the pfn based on that higher level, we seem to assume
> that the previous pfn value was at the start of the level.  In this
> case the level size is 256K pfns, which we add to the base pfn and
> get a result of 0x7fe00, which is clearly greater than 0x401ff,
> so we're done.  Meanwhile we never cleared the ptes for the remainder
> of the range.  When the VM remaps this range, we're overwriting valid
> ptes and the VT-d driver complains loudly, as reported by the user
> report linked below.
>
> The fix for this seems relatively simple: if each iteration of the
> loop in dma_pte_clear_level() is assumed to clear to the end of the
> level pte page, then our next pfn should be calculated from level_pfn
> rather than our working pfn.
>
> Fixes: 3f34f1259776 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback")
> Reported-by: Ajay Garg 
> Link: 
> https://lore.kernel.org/all/20211002124012.18186-1-ajaygargn...@gmail.com/
> Signed-off-by: Alex Williamson 
> ---
>  drivers/iommu/intel/iommu.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index d75f59ae28e6..f6395f5425f0 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1249,7 +1249,7 @@ static struct page *dma_pte_clear_level(struct dmar_domain *domain, int level,
> 						       freelist);
> 		}
>  next:
> -		pfn += level_size(level);
> +		pfn = level_pfn + level_size(level);
> 	} while (!first_pte_in_page(++pte) && pfn <= last_pfn);
>
> 	if (first_pte)
>
>