Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
On Wed, Aug 21, 2019 at 09:50:43AM +0200, Paolo Bonzini wrote:
> On 21/08/19 07:03, Peter Xu wrote:
> > On Tue, Aug 20, 2019 at 08:24:49AM +0200, Paolo Bonzini wrote:
> >> On 20/08/19 07:22, Peter Xu wrote:
> >>> On Mon, Aug 12, 2019 at 09:45:27AM +0200, Peter Xu wrote:
> >>>> This is a RFC series.
> >>>> [...]
> >>>> Please have a look to see whether this would be acceptable, thanks.
> >>>
> >>> Any more comment on this?
> >>
> >> No problem from me, but I wouldn't mind if someone else merged it. :)
> >
> > Can I read this as an "acked-by"? :)
>
> Yes, it shouldn't even need Acked-by since there are other maintainers
> that handle this part of the tree:
>
>   Paolo Bonzini (maintainer:X86 TCG CPUs)
>   Richard Henderson (maintainer:X86 TCG CPUs)
>   Eduardo Habkost (maintainer:X86 TCG CPUs)
>   "Michael S. Tsirkin" (supporter:PC)
>   Marcel Apfelbaum (supporter:PC)

Michael (or any of the maintainers listed above): do you have any
further comments on this series?  Would any of you like to merge it?

Thanks,

-- 
Peter Xu
Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
On Thu, Aug 29, 2019 at 10:46:42AM +0200, Auger Eric wrote:
> If I understand correctly PT mode is a bypass mode. With the ARM SMMUv3
> the IOMMU MR translate() function gets called but implements a direct
> mapping. I understand that on your side, you destroy the IOMMU MR, right?
>
> At the moment since SMMUv3/VFIO integration is not ready I plan to
> forbid any usage of VFIO along with SMMUv3, whatever the enable state.
>
> When HW nested paging gets ready, the stage1 bypass state will be
> propagated to the HW config structure.
>
> Hope I answer your question.

Yes, nested page tables will be fine. :)

Thanks,

-- 
Peter Xu
Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
Hi Peter,

On 8/29/19 10:21 AM, Peter Xu wrote:
> On Thu, Aug 29, 2019 at 10:05:27AM +0200, Auger Eric wrote:
>> Hi Peter,
>
> Hi, Eric,
>
>> On 8/29/19 3:18 AM, Peter Xu wrote:
>>> On Wed, Aug 28, 2019 at 02:59:45PM +0200, Auger Eric wrote:
>>>> Hi Peter,
>>>
>>> Hi, Eric,
>>>
>>> [...]
>>>
>>>> In
>>>> [PATCH v4 2/5] memory: Add IOMMU_ATTR_HW_NESTED_PAGING IOMMU memory
>>>> region attribute (https://patchwork.kernel.org/patch/11109701/)
>>>
>>> [1]
>>>
>>>> [PATCH v4 3/5] hw/vfio/common: Fail on VFIO/HW nested paging detection
>>>> (https://patchwork.kernel.org/patch/11109697/)
>>>>
>>>> I proposed to introduce a new IOMMU MR attribute to retrieve whether
>>>> the vIOMMU uses HW nested paging to integrate with VFIO. [...]
>>>
>>> IMHO it'll suffer from the similar issue we have now with
>>> flag_changed, because at the very beginning of x86 system boots DMAR
>>> is not yet enabled, the intel-iommu device is using the same mode as
>>> its passthrough mode so there's no IOMMU memory region at all in the
>>> DMA address spaces of the devices.
>>
>> Ah OK I did not get this initially. We don't have this issue with SMMUv3
>> as the IOMMU MR exists from the very beginning and does not depend on
>> its enablement by the guest. Also it stays there. So the detection can
>> be made immediately.
>
> True.  With that, I'm a bit curious on whether ARM should implement
> something like PT mode of Intel's.  For example, have you tried to run
> an ARM guest with both a vSMMU and a vfio-pci inside, however keep
> DMAR disabled?  IIUC in that case there will be no mapping at all for
> the assigned device, then would that work?  Or is there any magic for
> ARM?

If I understand correctly PT mode is a bypass mode.  With the ARM SMMUv3
the IOMMU MR translate() function gets called but implements a direct
mapping.  I understand that on your side, you destroy the IOMMU MR,
right?

At the moment, since SMMUv3/VFIO integration is not ready, I plan to
forbid any usage of VFIO along with SMMUv3, whatever the enable state.

When HW nested paging gets ready, the stage-1 bypass state will be
propagated to the HW config structure.

Hope I answered your question.

Thanks

Eric

> Regards,
Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
On Thu, Aug 29, 2019 at 10:05:27AM +0200, Auger Eric wrote:
> Hi Peter,

Hi, Eric,

> On 8/29/19 3:18 AM, Peter Xu wrote:
> > On Wed, Aug 28, 2019 at 02:59:45PM +0200, Auger Eric wrote:
> >> Hi Peter,
> >
> > Hi, Eric,
> >
> > [...]
> >
> >> In
> >> [PATCH v4 2/5] memory: Add IOMMU_ATTR_HW_NESTED_PAGING IOMMU memory
> >> region attribute (https://patchwork.kernel.org/patch/11109701/)
> >
> > [1]
> >
> >> [PATCH v4 3/5] hw/vfio/common: Fail on VFIO/HW nested paging detection
> >> (https://patchwork.kernel.org/patch/11109697/)
> >>
> >> I proposed to introduce a new IOMMU MR attribute to retrieve whether
> >> the vIOMMU uses HW nested paging to integrate with VFIO. [...]
> >
> > IMHO it'll suffer from the similar issue we have now with
> > flag_changed, because at the very beginning of x86 system boots DMAR
> > is not yet enabled, the intel-iommu device is using the same mode as
> > its passthrough mode so there's no IOMMU memory region at all in the
> > DMA address spaces of the devices.
>
> Ah OK I did not get this initially. We don't have this issue with SMMUv3
> as the IOMMU MR exists from the very beginning and does not depend on
> its enablement by the guest. Also it stays there. So the detection can
> be made immediately.

True.  With that, I'm a bit curious whether ARM should implement
something like Intel's PT mode.  For example, have you tried to run an
ARM guest with both a vSMMU and a vfio-pci device inside, but with
DMAR (translation) kept disabled?  IIUC in that case there will be no
mapping at all for the assigned device, then would that work?  Or is
there any magic for ARM?

Regards,

-- 
Peter Xu
Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
Hi Peter,

On 8/29/19 3:18 AM, Peter Xu wrote:
> On Wed, Aug 28, 2019 at 02:59:45PM +0200, Auger Eric wrote:
>> Hi Peter,
>
> Hi, Eric,
>
> [...]
>
>> In
>> [PATCH v4 2/5] memory: Add IOMMU_ATTR_HW_NESTED_PAGING IOMMU memory
>> region attribute (https://patchwork.kernel.org/patch/11109701/)
>
> [1]
>
>> [PATCH v4 3/5] hw/vfio/common: Fail on VFIO/HW nested paging detection
>> (https://patchwork.kernel.org/patch/11109697/)
>>
>> I proposed to introduce a new IOMMU MR attribute to retrieve whether
>> the vIOMMU uses HW nested paging to integrate with VFIO. [...]
>
> IMHO it'll suffer from the similar issue we have now with
> flag_changed, because at the very beginning of x86 system boots DMAR
> is not yet enabled, the intel-iommu device is using the same mode as
> its passthrough mode so there's no IOMMU memory region at all in the
> DMA address spaces of the devices.

Ah OK, I did not get this initially.  We don't have this issue with
SMMUv3, as the IOMMU MR exists from the very beginning and does not
depend on its enablement by the guest; it also stays there.  So the
detection can be made immediately.

> Hence even with patch [1] above we
> still can't really reach the get_attr() check until DMAR enabled?
>
> Maybe we can figure out a good way to expose IOMMU attributes rather
> than the IOMMU memory region attributes then we let vfio to pick that
> up, but I'm not very sure whether that's clean enough.
>
> Thanks,

Thanks

Eric
Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
On Wed, Aug 28, 2019 at 02:59:45PM +0200, Auger Eric wrote:
> Hi Peter,

Hi, Eric,

[...]

> In
> [PATCH v4 2/5] memory: Add IOMMU_ATTR_HW_NESTED_PAGING IOMMU memory
> region attribute (https://patchwork.kernel.org/patch/11109701/)

[1]

> [PATCH v4 3/5] hw/vfio/common: Fail on VFIO/HW nested paging detection
> (https://patchwork.kernel.org/patch/11109697/)
>
> I proposed to introduce a new IOMMU MR attribute to retrieve whether the
> vIOMMU uses HW nested paging to integrate with VFIO. I wonder whether
> this kind of solution would fit your need too.
>
> Assuming we would rename the attribute (whose name is challenged by
> Peter anyway) into something like IOMMU_ATTR_PHYS_MAP_MODE
> taking the possible values: NONE, CM, HW_NESTED_PAGING. SMMUv3 would
> return HW_NESTED_PAGING, Intel IOMMU would return CM if CM is enabled or
> NONE in the negative. Then we could implement the check directly in VFIO
> common.c. That way I don't think you would need the new notifiers and
> this would satisfy both requirements?

IMHO it'll suffer from an issue similar to what we have now with
flag_changed: at the very beginning of an x86 system boot DMAR is not
yet enabled, so the intel-iommu device is in the same mode as its
passthrough mode and there is no IOMMU memory region at all in the DMA
address spaces of the devices.  Hence even with patch [1] above we
still can't really reach the get_attr() check until DMAR is enabled?

Maybe we can figure out a good way to expose IOMMU attributes rather
than IOMMU memory region attributes and then let vfio pick that up,
but I'm not very sure whether that's clean enough.

Thanks,

-- 
Peter Xu
Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
Hi Peter,

On 8/13/19 10:41 AM, Jason Wang wrote:
>
> On 2019/8/12 3:45 PM, Peter Xu wrote:
>> This is a RFC series.
>> [...]
>> Please have a look to see whether this would be acceptable, thanks.
>>
>> Peter Xu (4):
>>   intel_iommu: Sanity check vfio-pci config on machine init done
>>   qdev/machine: Introduce hotplug_allowed hook
>>   pc/q35: Disallow vfio-pci hotplug without VT-d caching mode
>>   intel_iommu: Remove the caching-mode check during flag change
>>
>>  hw/core/qdev.c         | 17 +
>>  hw/i386/intel_iommu.c  | 40 ++--
>>  hw/i386/pc.c           | 21 +
>>  include/hw/boards.h    |  9 +
>>  include/hw/qdev-core.h |  1 +
>>  qdev-monitor.c         |  7 +++
>>  6 files changed, 89 insertions(+), 6 deletions(-)
>
> Do we need a generic solution other than an Intel specific one?

In
[PATCH v4 2/5] memory: Add IOMMU_ATTR_HW_NESTED_PAGING IOMMU memory
region attribute (https://patchwork.kernel.org/patch/11109701/)

[PATCH v4 3/5] hw/vfio/common: Fail on VFIO/HW nested paging detection
(https://patchwork.kernel.org/patch/11109697/)

I proposed to introduce a new IOMMU MR attribute to retrieve whether the
vIOMMU uses HW nested paging to integrate with VFIO.  I wonder whether
this kind of solution would fit your need too.

Assuming we would rename the attribute (whose name is challenged by
Peter anyway) into something like IOMMU_ATTR_PHYS_MAP_MODE, taking the
possible values NONE, CM, and HW_NESTED_PAGING: SMMUv3 would return
HW_NESTED_PAGING, and Intel IOMMU would return CM if CM is enabled, or
NONE otherwise.  Then we could implement the check directly in VFIO
common.c.  That way I don't think you would need the new notifiers,
and this would satisfy both requirements?

Thoughts?

Thanks

Eric

> Thanks
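Leaving aside Peter's point elsewhere in the thread that on VT-d the IOMMU memory region only appears once DMAR is enabled, the proposed attribute could look roughly like the following self-contained model. The enum values come from the mail; the types and function names are illustrative stand-ins, not the actual QEMU API:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the proposed IOMMU_ATTR_PHYS_MAP_MODE attribute values. */
typedef enum {
    PHYS_MAP_MODE_NONE,              /* no way to mirror guest mappings */
    PHYS_MAP_MODE_CM,                /* VT-d caching mode */
    PHYS_MAP_MODE_HW_NESTED_PAGING,  /* e.g. SMMUv3 two-stage translation */
} PhysMapMode;

/* Hypothetical per-vIOMMU state; the real code would answer via the
 * IOMMU MR get_attr() hook instead. */
typedef struct {
    bool is_smmuv3;
    bool vtd_caching_mode;
} VIOMMUModel;

/* SMMUv3 always reports HW nested paging; Intel IOMMU reports CM when
 * caching-mode=on and NONE otherwise, per the proposal above. */
static PhysMapMode viommu_phys_map_mode(const VIOMMUModel *v)
{
    if (v->is_smmuv3) {
        return PHYS_MAP_MODE_HW_NESTED_PAGING;
    }
    return v->vtd_caching_mode ? PHYS_MAP_MODE_CM : PHYS_MAP_MODE_NONE;
}

/* The check hw/vfio/common.c could then apply when a device sits
 * behind a vIOMMU-translated region: accept CM and HW nested paging,
 * reject NONE. */
static bool vfio_viommu_compatible(const VIOMMUModel *v)
{
    return viommu_phys_map_mode(v) != PHYS_MAP_MODE_NONE;
}
```

With a check like this in one place, each vIOMMU only has to report its mode and VFIO no longer needs per-architecture notifiers.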
Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
On 21/08/19 07:03, Peter Xu wrote:
> On Tue, Aug 20, 2019 at 08:24:49AM +0200, Paolo Bonzini wrote:
>> On 20/08/19 07:22, Peter Xu wrote:
>>> On Mon, Aug 12, 2019 at 09:45:27AM +0200, Peter Xu wrote:
>>>> This is a RFC series.
>>>> [...]
>>>> Please have a look to see whether this would be acceptable, thanks.
>>>
>>> Any more comment on this?
>>
>> No problem from me, but I wouldn't mind if someone else merged it. :)
>
> Can I read this as an "acked-by"? :)

Yes, it shouldn't even need Acked-by since there are other maintainers
that handle this part of the tree:

  Paolo Bonzini (maintainer:X86 TCG CPUs)
  Richard Henderson (maintainer:X86 TCG CPUs)
  Eduardo Habkost (maintainer:X86 TCG CPUs)
  "Michael S. Tsirkin" (supporter:PC)
  Marcel Apfelbaum (supporter:PC)

> Michael, should this be for your tree?  What do you think about the
> series?  Please let me know what I need to do to move this forward.  I
> can repost a non-rfc series if needed, but it'll be exactly the same
> content.
>
> Regards,
Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
On Tue, Aug 20, 2019 at 08:24:49AM +0200, Paolo Bonzini wrote:
> On 20/08/19 07:22, Peter Xu wrote:
> > On Mon, Aug 12, 2019 at 09:45:27AM +0200, Peter Xu wrote:
> >> This is a RFC series.
> >> [...]
> >> Please have a look to see whether this would be acceptable, thanks.
> >
> > Any more comment on this?
>
> No problem from me, but I wouldn't mind if someone else merged it. :)

Can I read this as an "acked-by"? :)

Michael, should this be for your tree?  What do you think about the
series?  Please let me know what I need to do to move this forward.  I
can repost a non-rfc series if needed, but it'll be exactly the same
content.

Regards,

-- 
Peter Xu
Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
On 20/08/19 07:22, Peter Xu wrote:
> On Mon, Aug 12, 2019 at 09:45:27AM +0200, Peter Xu wrote:
>> This is a RFC series.
>> [...]
>> Please have a look to see whether this would be acceptable, thanks.
>
> Any more comment on this?

No problem from me, but I wouldn't mind if someone else merged it. :)

Paolo
Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
On Mon, Aug 12, 2019 at 09:45:27AM +0200, Peter Xu wrote:
> This is a RFC series.
> [...]
> Please have a look to see whether this would be acceptable, thanks.

Any more comment on this?

Thanks,

-- 
Peter Xu
Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
On Tue, Aug 13, 2019 at 04:41:49PM +0800, Jason Wang wrote:
> Do we need a generic solution other than an Intel specific one?

I assume you're asking about ARM, not AMD, right? :)

Yes, I think we should have a generic solution.  Though I'd like to
see whether this idea can be accepted first; then we can expand to ARM
or other IOMMUs.  After all, we already have some duplication between
at least x86 and ARM on the vIOMMU side, so we can do more things like
that when the time comes.

Thanks,

-- 
Peter Xu
Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
On 2019/8/12 3:45 PM, Peter Xu wrote:
> This is a RFC series.
> [...]
> Please have a look to see whether this would be acceptable, thanks.
>
> Peter Xu (4):
>   intel_iommu: Sanity check vfio-pci config on machine init done
>   qdev/machine: Introduce hotplug_allowed hook
>   pc/q35: Disallow vfio-pci hotplug without VT-d caching mode
>   intel_iommu: Remove the caching-mode check during flag change
>
>  hw/core/qdev.c         | 17 +
>  hw/i386/intel_iommu.c  | 40 ++--
>  hw/i386/pc.c           | 21 +
>  include/hw/boards.h    |  9 +
>  include/hw/qdev-core.h |  1 +
>  qdev-monitor.c         |  7 +++
>  6 files changed, 89 insertions(+), 6 deletions(-)

Do we need a generic solution other than an Intel specific one?

Thanks
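The hotplug_allowed hook named in the diffstat above can be sketched as a small self-contained model. The callback signature and names here are illustrative, not the actual QEMU API; the real series wires a MachineClass hook into the device_add path in qdev-monitor.c and reports errors through Error objects:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Modeled hook: the machine class gets a hotplug_allowed() callback
 * that the device_add path consults before creating the device, so an
 * invalid hot-add is refused instead of killing QEMU later. */
typedef struct MachineClass {
    /* Return false (and fill errp) to veto the hotplug. */
    bool (*hotplug_allowed)(const char *driver, bool vtd_caching_mode,
                            char *errp, size_t errp_len);
} MachineClass;

/* A pc/q35-style implementation: refuse vfio-pci when the vIOMMU was
 * configured with caching-mode=off. */
static bool pc_hotplug_allowed(const char *driver, bool vtd_caching_mode,
                               char *errp, size_t errp_len)
{
    if (!strcmp(driver, "vfio-pci") && !vtd_caching_mode) {
        snprintf(errp, errp_len,
                 "Device assignment is not allowed without enabling "
                 "caching-mode=on for Intel IOMMU");
        return false;
    }
    return true;
}

/* What the device_add path would do before instantiating the device. */
static bool device_add(const MachineClass *mc, const char *driver,
                       bool vtd_caching_mode)
{
    char err[128] = "";

    if (mc->hotplug_allowed &&
        !mc->hotplug_allowed(driver, vtd_caching_mode, err, sizeof(err))) {
        fprintf(stderr, "device_add failed: %s\n", err);
        return false;   /* refuse the hotplug, keep the VM running */
    }
    return true;        /* proceed to create the device */
}
```

With a veto point like this, the hotplug example from the cover letter fails cleanly at the monitor instead of terminating the guest.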
Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
On Mon, Aug 12, 2019 at 10:24:53AM -0600, Alex Williamson wrote:
> On Mon, 12 Aug 2019 09:45:27 +0200
> Peter Xu wrote:
>
> > This is a RFC series.
> > [...]
> > For example, logically this is not allowed:
> >
> >   -device intel-iommu,caching-mode=off \
> >   -device vfio-pci,host=05:00.0
>
> Do we require intel-iommu with intremap=on in order to get x2apic for
> large vCPU count guests?  If so, wouldn't it be a valid configuration
> for the user to specify:
>
>   -device intel-iommu,caching-mode=off,intremap=on \
>   -device vfio-pci,host=05:00.0
>
> so long as they never have any intention of the guest enabling DMA
> translation?  Would there be any advantage to this config versus
> caching-mode=on?  I suspect the overhead of CM=1 when only using
> interrupt remapping is small to non-existent, but are there other
> reasons for running with CM=0, perhaps guest drivers not supporting it?

AFAIU the major users of the vIOMMU should be guest DPDK apps and
nested device assignment.  For these users I would make a bold guess
that they are mostly using Linux, so the majority should be safe.

For the minority, I do agree that the above question is valid.  IMHO
the hard part is to find those users and let them join the discussion,
so that we know how many will be affected and how.  I think one way to
achieve that could be to merge a patchset like this one; then people
will start to complain if there is anyone affected. :)  I'm not sure
whether that's the best way to go, but I think it could still be a
serious option, considering that it could potentially fix a more
severe issue (unexpected QEMU quits), and also that reverting a
patchset like this one would be easy when really necessary (e.g., the
patchset does not bring machine state changes that might cause
migration issues, and so on).

> I like the idea of being able to nak an incompatible hot-add rather
> than kill the VM, we could narrow that even further to look at not only
> whether caching-mode support is enabled, but also whether translation
> is enabled on the vIOMMU.  Ideally we might disallow the guest from
> enabling translation in such a configuration, but the Linux code is not
> written with the expectation that the hardware can refuse to enable
> translation and there are no capability bits to remove the DMA
> translation capability of the IOMMU.

This is an interesting view, at least to me, though I'm not sure we
should allow that even for emulation.  I'm imagining such a patch for
the Linux kernel to allow failures on enabling DMAR - it would exist
only for QEMU emulation, and I'm not sure whether upstream would like
such a patch.  After all, we are emulating the hardware, and real
hardware will always succeed in enabling DMAR, AFAICT.  For Windows
and other OSes it could be even harder.  Without support in all of
these, we would simply run other risks, such as hanging guests when
the driver busy-waits for the DMAR status bit to be set.

> Still, we might want to think
> about which is the better user experience, to have the guest panic when
> DMA_GSTS_TES never becomes set (as it seems Linux would do) or to have
> QEMU exit, or as proposed here, prevent all configurations where this
> might occur.  Thanks,

Agreed.  So far, a stricter rule seems a bit better to me than a
hanging guest, though that could be very subjective.

Thanks!

-- 
Peter Xu
Re: [Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
On Mon, 12 Aug 2019 09:45:27 +0200 Peter Xu wrote: > This is a RFC series. > > The VT-d code has some defects, one of them is that we cannot detect > the misuse of vIOMMU and vfio-pci early enough. > > For example, logically this is not allowed: > > -device intel-iommu,caching-mode=off \ > -device vfio-pci,host=05:00.0 Do we require intel-iommu with intremap=on in order to get x2apic for large vCPU count guests? If so, wouldn't it be a valid configuration for the user to specify: -device intel-iommu,caching-mode=off,intremap=on \ -device vfio-pci,host=05:00.0 so long as they never have any intention of the guest enabling DMA translation? Would there be any advantage to this config versus caching-mode=on? I suspect the overhead of CM=1 when only using interrupt remapping is small to non-existent, but are there other reasons for running with CM=0, perhaps guest drivers not supporting it? I like the idea of being able to nak an incompatible hot-add rather than kill the VM, we could narrow that even further to look at not only whether caching-mode support is enabled, but also whether translation is enabled on the vIOMMU. Ideally we might disallow the guest from enabling translation in such a configuration, but the Linux code is not written with the expectation that the hardware can refuse to enable translation and there are no capability bits to remove the DMA translation capability of the IOMMU. Still, we might want to think about which is the better user experience, to have the guest panic when DMA_GSTS_TES never becomes set (as it seems Linux would do) or to have QEMU exit, or as proposed here, prevent all configurations where this might occur. Thanks, Alex > Because the caching mode is required to make vfio-pci devices > functional. > > Previously we did this sanity check in vtd_iommu_notify_flag_changed() > as when the memory regions change their attributes. However that's > too late in most cases! 
> Because the memory region layouts will only change after the IOMMU
> is enabled, and in most cases that is during guest OS boot.  So when
> the configuration is wrong, we will only bail out during guest boot
> rather than simply telling the user before QEMU starts.
>
> The same problem happens on device hotplug.  Say we have this:
>
>   -device intel-iommu,caching-mode=off
>
> Then we do something like:
>
>   (HMP) device_add vfio-pci,host=05:00.0,bus=pcie.1
>
> If at that time the vIOMMU is enabled in the guest, then the QEMU
> process will simply quit due to this hotplug event.  This is a bit
> insane...
>
> This series tries to solve the above two problems by introducing two
> sanity checks, at these places respectively:
>
>   - machine init done
>   - device hotplug
>
> This is a bit awkward, but I hope it is better than before.  There
> are of course other solutions, like hard-coding the check into
> vfio-pci, but I feel that is even less pretty.  I couldn't think of
> a better way to do this; if there is one, please kindly shout out.
>
> Please have a look to see whether this would be acceptable, thanks.
>
> Peter Xu (4):
>   intel_iommu: Sanity check vfio-pci config on machine init done
>   qdev/machine: Introduce hotplug_allowed hook
>   pc/q35: Disallow vfio-pci hotplug without VT-d caching mode
>   intel_iommu: Remove the caching-mode check during flag change
>
>  hw/core/qdev.c         | 17 +
>  hw/i386/intel_iommu.c  | 40 ++--
>  hw/i386/pc.c           | 21 +
>  include/hw/boards.h    |  9 +
>  include/hw/qdev-core.h |  1 +
>  qdev-monitor.c         |  7 +++
>  6 files changed, 89 insertions(+), 6 deletions(-)
[Qemu-devel] [PATCH RFC 0/4] intel_iommu: Do sanity check of vfio-pci earlier
This is an RFC series.

The VT-d code has some defects, one of them being that we cannot
detect the misuse of vIOMMU and vfio-pci early enough.

For example, logically this is not allowed:

  -device intel-iommu,caching-mode=off \
  -device vfio-pci,host=05:00.0

Because the caching mode is required to make vfio-pci devices
functional.

Previously we did this sanity check in vtd_iommu_notify_flag_changed(),
i.e. when the memory regions change their attributes.  However, that's
too late in most cases!  Because the memory region layouts will only
change after the IOMMU is enabled, and in most cases that is during
guest OS boot.  So when the configuration is wrong, we will only bail
out during guest boot rather than simply telling the user before QEMU
starts.

The same problem happens on device hotplug.  Say we have this:

  -device intel-iommu,caching-mode=off

Then we do something like:

  (HMP) device_add vfio-pci,host=05:00.0,bus=pcie.1

If at that time the vIOMMU is enabled in the guest, then the QEMU
process will simply quit due to this hotplug event.  This is a bit
insane...

This series tries to solve the above two problems by introducing two
sanity checks, at these places respectively:

  - machine init done
  - device hotplug

This is a bit awkward, but I hope it is better than before.  There are
of course other solutions, like hard-coding the check into vfio-pci,
but I feel that is even less pretty.  I couldn't think of a better way
to do this; if there is one, please kindly shout out.

Please have a look to see whether this would be acceptable, thanks.
Peter Xu (4):
  intel_iommu: Sanity check vfio-pci config on machine init done
  qdev/machine: Introduce hotplug_allowed hook
  pc/q35: Disallow vfio-pci hotplug without VT-d caching mode
  intel_iommu: Remove the caching-mode check during flag change

 hw/core/qdev.c         | 17 +
 hw/i386/intel_iommu.c  | 40 ++--
 hw/i386/pc.c           | 21 +
 include/hw/boards.h    |  9 +
 include/hw/qdev-core.h |  1 +
 qdev-monitor.c         |  7 +++
 6 files changed, 89 insertions(+), 6 deletions(-)

-- 
2.21.0