Re: [RFC PATCH 0/1] iommu: Detach device from domain when removed from group
On Tue, Jul 28, 2015 at 07:55:55PM +0200, Gerald Schaefer wrote:
> On s390, this eventually leads to a kernel panic when binding the device
> again to its non-vfio PCI driver, because of the missing arch-specific
> cleanup in detach_dev. On x86, the detach_dev callback will also not be
> called directly, but there is a notifier that will catch
> BUS_NOTIFY_REMOVED_DEVICE and eventually do the cleanup. Other
> architectures without the notifier probably have at least some kind of
> memory leak in this scenario, so a general fix would be nice.

This notifier is not arch-specific; it is registered against the bus the
iommu-ops are set for. Why does it not run on s390?

	Joerg

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH 0/1] iommu: Detach device from domain when removed from group
On Mon, 3 Aug 2015 17:48:55 +0200
Joerg Roedel <j...@8bytes.org> wrote:

> On Tue, Jul 28, 2015 at 07:55:55PM +0200, Gerald Schaefer wrote:
> > On s390, this eventually leads to a kernel panic when binding the device
> > again to its non-vfio PCI driver, because of the missing arch-specific
> > cleanup in detach_dev. On x86, the detach_dev callback will also not be
> > called directly, but there is a notifier that will catch
> > BUS_NOTIFY_REMOVED_DEVICE and eventually do the cleanup. Other
> > architectures without the notifier probably have at least some kind of
> > memory leak in this scenario, so a general fix would be nice.
>
> This notifier is not arch-specific; it is registered against the bus the
> iommu-ops are set for. Why does it not run on s390?

Adding the notifier would of course also work on s390 (and on all other
affected architectures). However, it seems that the missing detach_dev
call in this scenario is not fundamentally fixed by the notifier; the
notifier just happens to hide the symptom. Adding an otherwise unneeded
notifier only to work around this issue doesn't seem right, especially
given that x86 is so far its only user. I thought it would be cleaner to
fix it in common code, for all architectures.

I'm not sure what's wrong with fixing the asymmetry as suggested in my
patch, but I guess there are good reasons for having it. For now, I'll
just add the notifier to my s390 implementation and post it soon.

> 	Joerg

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 0/1] iommu: Detach device from domain when removed from group
On Tue, 28 Jul 2015 19:55:55 +0200
Gerald Schaefer <gerald.schae...@de.ibm.com> wrote:

> Hi,
>
> During IOMMU API function testing on s390 I hit the following scenario:
>
> After binding a device to vfio-pci, the user completes the VFIO_SET_IOMMU
> ioctl and stops; see the sample C program below. Now the device is
> manually removed via "echo 1 > /sys/bus/pci/devices/.../remove".
>
> Although the SET_IOMMU ioctl triggered the attach_dev callback in the
> underlying IOMMU API, removing the device this way won't trigger the
> detach_dev callback, either during the removal or when the user program
> continues by closing the group/container.
>
> On s390, this eventually leads to a kernel panic when binding the device
> again to its non-vfio PCI driver, because of the missing arch-specific
> cleanup in detach_dev. On x86, the detach_dev callback will also not be
> called directly, but there is a notifier that will catch
> BUS_NOTIFY_REMOVED_DEVICE and eventually do the cleanup. Other
> architectures without the notifier probably have at least some kind of
> memory leak in this scenario, so a general fix would be nice.
>
> My first approach was to try to fix this in the VFIO code, but Alex
> Williamson pointed me to an asymmetry in the IOMMU code:
> iommu_group_add_device() will invoke the attach_dev callback, but
> iommu_group_remove_device() won't trigger detach_dev. Fixing this
> asymmetry would fix the issue for me, but is this the correct fix?
> Any thoughts?

Ping. The suggested fix may be completely wrong, but not having detach_dev
called seems like a serious issue; any feedback would be greatly
appreciated.
[RFC PATCH 0/1] iommu: Detach device from domain when removed from group
Hi,

During IOMMU API function testing on s390 I hit the following scenario:

After binding a device to vfio-pci, the user completes the VFIO_SET_IOMMU
ioctl and stops; see the sample C program below. Now the device is
manually removed via "echo 1 > /sys/bus/pci/devices/.../remove".

Although the SET_IOMMU ioctl triggered the attach_dev callback in the
underlying IOMMU API, removing the device this way won't trigger the
detach_dev callback, either during the removal or when the user program
continues by closing the group/container.

On s390, this eventually leads to a kernel panic when binding the device
again to its non-vfio PCI driver, because of the missing arch-specific
cleanup in detach_dev. On x86, the detach_dev callback will also not be
called directly, but there is a notifier that will catch
BUS_NOTIFY_REMOVED_DEVICE and eventually do the cleanup. Other
architectures without the notifier probably have at least some kind of
memory leak in this scenario, so a general fix would be nice.

My first approach was to try to fix this in the VFIO code, but Alex
Williamson pointed me to an asymmetry in the IOMMU code:
iommu_group_add_device() will invoke the attach_dev callback, but
iommu_group_remove_device() won't trigger detach_dev. Fixing this
asymmetry would fix the issue for me, but is this the correct fix?
Any thoughts?
Regards,
Gerald

Here is the sample C program to trigger the ioctl:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

int main(void)
{
	int container, group, rc;

	container = open("/dev/vfio/vfio", O_RDWR);
	if (container < 0) {
		perror("open /dev/vfio/vfio");
		return -1;
	}

	group = open("/dev/vfio/0", O_RDWR);
	if (group < 0) {
		perror("open /dev/vfio/0");
		return -1;
	}

	/* VFIO_GROUP_SET_CONTAINER takes a pointer to the container fd */
	rc = ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
	if (rc) {
		perror("ioctl VFIO_GROUP_SET_CONTAINER");
		return -1;
	}

	/* this triggers the attach_dev callback in the IOMMU API */
	rc = ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);
	if (rc) {
		perror("ioctl VFIO_SET_IOMMU");
		return -1;
	}

	/* remove the device via sysfs now, then press Enter */
	printf("Try device remove...\n");
	getchar();

	close(group);
	close(container);
	return 0;
}

Gerald Schaefer (1):
  iommu: Detach device from domain when removed from group

 drivers/iommu/iommu.c | 5 +++++
 1 file changed, 5 insertions(+)

--
2.3.8