Re: [RFC PATCH v2 0/6] powerpc: pSeries: vfio: iommu: Re-enable support for SPAPR TCE VFIO
On 5/6/24 12:43 PM, Jason Gunthorpe wrote:
> On Sat, May 04, 2024 at 12:33:53AM +0530, Shivaprasad G Bhat wrote:
>> We have legacy workloads using VFIO in userspace/kvm guests running
>> on downstream distro kernels. We want these workloads to be able to
>> continue running on our arch.
>
> It has been broken since 2018, I don't find this reasoning entirely
> reasonable :\
>

Raptor is currently working on an automated test runner setup to
exercise the VFIO subsystem on PowerNV and (to a lesser extent) pSeries,
so breakages like this going forward will hopefully be caught much more
quickly.

>> I firmly believe the refactoring in this patch series is a step in
>> that direction.
>
> But fine, as long as we are going to fix it. PPC really needs this to
> be resolved to keep working.
>

Agreed. Modernizing/de-cluttering PPC's IOMMU code in general is another
task that we're working towards. As mentioned previously on the list,
we're working towards a more standard IOMMU driver for PPC that can be
used with dma_iommu, which I think will be a good step towards cleaning
this up.

Initially PowerNV is going to be our target, but to the extent that it
is possible and useful, pSeries support could be brought in later.

> Jason

Thanks,
Shawn
Re: [RFC PATCH v2 0/6] powerpc: pSeries: vfio: iommu: Re-enable support for SPAPR TCE VFIO
Hi Jason,

On 5/6/24 23:13, Jason Gunthorpe wrote:
> On Sat, May 04, 2024 at 12:33:53AM +0530, Shivaprasad G Bhat wrote:
>> We have legacy workloads using VFIO in userspace/kvm guests running
>> on downstream distro kernels. We want these workloads to be able to
>> continue running on our arch.
>
> It has been broken since 2018, I don't find this reasoning entirely
> reasonable :\

Though upstream has been broken since 2018 for pSeries, the breaking
patches trickled into downstream distro kernels only in the last few
years. The legacy workloads that were running on PowerNV with these
downstream distros are now broken on the pSeries logical partitions
without the fixes in this series.

>> I firmly believe the refactoring in this patch series is a step in
>> that direction.
>
> But fine, as long as we are going to fix it. PPC really needs this to
> be resolved to keep working.

Thanks, we are working on it.

Regards,
Shivaprasad

> Jason
Re: [RFC PATCH v2 0/6] powerpc: pSeries: vfio: iommu: Re-enable support for SPAPR TCE VFIO
On Sat, May 04, 2024 at 12:33:53AM +0530, Shivaprasad G Bhat wrote:
> We have legacy workloads using VFIO in userspace/kvm guests running
> on downstream distro kernels. We want these workloads to be able to
> continue running on our arch.

It has been broken since 2018, I don't find this reasoning entirely
reasonable :\

> I firmly believe the refactoring in this patch series is a step in
> that direction.

But fine, as long as we are going to fix it. PPC really needs this to
be resolved to keep working.

Jason
Re: [RFC PATCH v2 0/6] powerpc: pSeries: vfio: iommu: Re-enable support for SPAPR TCE VFIO
On 5/2/24 06:59, Alexey Kardashevskiy wrote:
> On 2/5/24 00:09, Jason Gunthorpe wrote:
>> On Tue, Apr 30, 2024 at 03:05:34PM -0500, Shivaprasad G Bhat wrote:
>>> RFC v1 was posted here [1]. As I was testing more and fixing the
>>> issues, I realized its clean to have the table_group_ops implemented
>>> the way it is done on PowerNV and stop 'borrowing' the DMA windows
>>> for pSeries.
>>>
>>> This patch-set implements the iommu table_group_ops for pSeries for
>>> VFIO SPAPR TCE sub-driver thereby enabling the VFIO support on POWER
>>> pSeries machines.
>>
>> Wait, did they previously not have any support?
>>
>> Again, this TCE stuff needs to go away, not grow. I can grudgingly
>> accept fixing it where it used to work, but not enabling more HW that
>> never worked before! :(
>
> This used to work when I tried last time 2+ years ago, not a new stuff.
> Thanks,

Thanks Alexey for pitching in.

Hi Jason,

As Alexey implied, this used to work in the past. The support for
pSeries VFIO has existed for a long time, and the support for
VFIO_SPAPR_TCE_v2_IOMMU was also added with 9d67c9433509
("powerpc/iommu: Add "borrowing" iommu_table_group_ops").

The commit 090bad39b237a ("powerpc/powernv: Add indirect levels to
it_userspace") broke the userspace view for pSeries, which Patch 6 here
tries to bring back.

We found more issues with 9d67c9433509, and I felt it is better to stop
"borrowing" the DMA windows as that would be cleaner, which is what is
done in Patch 6.

In this process we discovered a few bugs in upstream as well, which we
have been trying to fix, and we have posted a few fixes earlier, such as:

d2d00e15808 powerpc: iommu: Bring back table group release_ownership() call
83b3836bf83 iommu: Allow ops->default_domain to work when !CONFIG_IOMMU_DMA

So, this patch series tries to fix some more issues (patches 2, 4, 6)
coupled with some code refactoring (patches 1, 3, 5 & 6) to stop
"borrowing" DMA windows.

We have legacy workloads using VFIO in userspace/kvm guests running on
downstream distro kernels. We want these workloads to be able to
continue running on our arch.

Going forward we are planning to have IOMMUFD support for PPC64. I
firmly believe the refactoring in this patch series is a step in that
direction.

Thanks,
Shivaprasad
Re: [RFC PATCH v2 0/6] powerpc: pSeries: vfio: iommu: Re-enable support for SPAPR TCE VFIO
On 2/5/24 00:09, Jason Gunthorpe wrote:
> On Tue, Apr 30, 2024 at 03:05:34PM -0500, Shivaprasad G Bhat wrote:
>> RFC v1 was posted here [1]. As I was testing more and fixing the
>> issues, I realized its clean to have the table_group_ops implemented
>> the way it is done on PowerNV and stop 'borrowing' the DMA windows
>> for pSeries.
>>
>> This patch-set implements the iommu table_group_ops for pSeries for
>> VFIO SPAPR TCE sub-driver thereby enabling the VFIO support on POWER
>> pSeries machines.
>
> Wait, did they previously not have any support?
>
> Again, this TCE stuff needs to go away, not grow. I can grudgingly
> accept fixing it where it used to work, but not enabling more HW that
> never worked before! :(

This used to work when I tried last time 2+ years ago, not a new stuff.
Thanks,

--
Alexey
Re: [RFC PATCH v2 0/6] powerpc: pSeries: vfio: iommu: Re-enable support for SPAPR TCE VFIO
On Tue, Apr 30, 2024 at 03:05:34PM -0500, Shivaprasad G Bhat wrote:
> RFC v1 was posted here [1]. As I was testing more and fixing the
> issues, I realized its clean to have the table_group_ops implemented
> the way it is done on PowerNV and stop 'borrowing' the DMA windows
> for pSeries.
>
> This patch-set implements the iommu table_group_ops for pSeries for
> VFIO SPAPR TCE sub-driver thereby enabling the VFIO support on POWER
> pSeries machines.

Wait, did they previously not have any support?

Again, this TCE stuff needs to go away, not grow. I can grudgingly
accept fixing it where it used to work, but not enabling more HW that
never worked before! :(

Jason
[RFC PATCH v2 0/6] powerpc: pSeries: vfio: iommu: Re-enable support for SPAPR TCE VFIO
RFC v1 was posted here [1]. As I was testing more and fixing the
issues, I realized it is cleaner to have the table_group_ops implemented
the way it is done on PowerNV and stop 'borrowing' the DMA windows
for pSeries.

This patch-set implements the iommu table_group_ops for pSeries for
the VFIO SPAPR TCE sub-driver, thereby enabling VFIO support on POWER
pSeries machines.

So, this patchset is a rewrite and not close to the V1 except for a
few changes.

Structure of the patchset:
-------------------------
The first and fifth patches are just code movements.

The second patch takes care of collecting the TCE and DDW information
for the vfio_iommu_spapr_tce_ddw_info during probe.

The third patch fixes the convention of using table[1] for VFs on
pSeries when used by the host driver.

The fourth patch fixes VFIO to call TCE clear before unsetting the
window.

The last patch has the API implementations; please find the details in
its commit description.

Testing:
-------
Tested with an NVMe card and a Mellanox multi-function card by
attaching them to a nested KVM guest running on a pSeries LPAR. Also,
vfio-test [2] by Alex Williamson was forked and updated to add support
for pSeries guests and used to test these patches [3].

Limitations/Known Issues:
* The DMA window restriction in SRIOV VF scenarios (a maximum of 1 DMA
  window) is taken care of in the current patches. However, the changes
  required in vfio_iommu_spapr_tce_ddw_info to expose the default
  window being a 64-bit one, and the QEMU changes to handle the same,
  will be taken care of in the next versions.
* KVM guest boot throws a warning at remap_pfn_range_notrack() on the
  host. I will post the fix in the next versions.
* A DLPAR hotplugged device has no FDT entry until the next reboot, so
  the default DMA window property has to be preserved differently for
  this case.

References:
----------
[1] https://lore.kernel.org/linuxppc-dev/171026724548.8367.8321359354119254395.st...@linux.ibm.com/
[2] https://github.com/awilliam/tests
[3] https://github.com/nnmwebmin/vfio-ppc-tests/tree/vfio-ppc-ex

---
Changelog:
v1: https://lore.kernel.org/linuxppc-dev/171026724548.8367.8321359354119254395.st...@linux.ibm.com/
 - Rewritten so as to stop borrowing the DMA windows and implement the
   table_group_ops for pSeries.
 - Cover letter and Patch 6 have more details as this was a rewrite.

Shivaprasad G Bhat (6):
      powerpc/iommu: Move pSeries specific functions to pseries/iommu.c
      powerpc/pseries/iommu: Fix the VFIO_IOMMU_SPAPR_TCE_GET_INFO ioctl output
      powerpc/pseries/iommu: Use the iommu table[0] for IOV VF's DDW
      vfio/spapr: Always clear TCEs before unsetting the window
      powerpc/iommu: Move dev_has_iommu_table() to iommu.c
      powerpc/iommu: Implement the iommu_table_group_ops for pSeries

 arch/powerpc/include/asm/iommu.h          |   9 +-
 arch/powerpc/kernel/eeh.c                 |  16 -
 arch/powerpc/kernel/iommu.c               | 170 +
 arch/powerpc/platforms/powernv/pci-ioda.c |   6 +-
 arch/powerpc/platforms/pseries/iommu.c    | 720 +-
 drivers/vfio/vfio_iommu_spapr_tce.c       |  13 +-
 6 files changed, 729 insertions(+), 205 deletions(-)

--
Signature