Re: [Qemu-devel] Supporting emulation of IOMMUs
On Tue, May 10, 2011 at 11:44:26AM +1000, David Gibson wrote: On Thu, Apr 21, 2011 at 09:47:31PM +0300, Eduard - Gabriel Munteanu wrote: On Thu, Apr 21, 2011 at 05:03:47PM +1000, David Gibson wrote: A few months ago, Eduard - Gabriel Munteanu posted a series of patches implementing support for emulating the AMD PCI IOMMU (http://lists.nongnu.org/archive/html/qemu-devel/2011-01/msg03196.html). In fact, this series implemented a general DMA/IOMMU layer which can be used by any device model, and one translation backend for this implementing the AMD specific PCI IOMMU. These patches don't seem to have gone anywhere for the last few months, however, and so far I've been unable to contact the author (trying again with this mail). I have an interest in this code, because the pSeries machine will also need IOMMU emulation support. At present we only support virtual devices, through the PAPR interface, and we have support for the hypervisor-controller IOMMU translation in the PAPR VIO code. However, we want to add PCI device support and this will also need IOMMU translation. The series seems to have the right basic approach, so if the author is indeed MIA, I was planning to pick up the patches and resubmit them (with support for the pSeries IOMMU added). Hi, Not really MIA, but I've been a bit busy lately, so I'm sorry if I couldn't answer your mail in a timely fashion. I'll try making another merge attempt tonight/tomorrow. Ok. Did this happen? Sorry, I've been away the last couple of weeks - I had a google at the qemu-devel archives but couldn't spot a new merge, but did I just not look hard enough? [snip] No, I've made some progress but I've still got a few concerns to address, mainly how to handle unaligned accesses and some things related to the IOMMU behavior like target aborts. I've added some macro magic to declare bus-specific interfaces and converted PCI devices to use pci_memory_*(). I'll try not to hold this back too much, but I can't make promises wrt timing. Eduard
Re: [Qemu-devel] Supporting emulation of IOMMUs
On Sat, 2011-05-14 at 18:27 +0300, Eduard - Gabriel Munteanu wrote: No, I've made some progress but I've still got a few concerns to address, mainly how to handle unaligned accesses and some things related to the IOMMU behavior like target aborts. I've added some macro magic to declare bus-specific interfaces and converted PCI devices to use pci_memory_*(). I'll try not to hold this back too much, but I can't make promises wrt timing. If you have newer patches, even if they don't address all the above issues, please send them. We cannot afford to be blocked waiting for you, we have to make progress on implementing the ppc side. Cheers, Ben.
Re: [Qemu-devel] Supporting emulation of IOMMUs
On Thu, Apr 21, 2011 at 09:47:31PM +0300, Eduard - Gabriel Munteanu wrote: On Thu, Apr 21, 2011 at 05:03:47PM +1000, David Gibson wrote: A few months ago, Eduard - Gabriel Munteanu posted a series of patches implementing support for emulating the AMD PCI IOMMU (http://lists.nongnu.org/archive/html/qemu-devel/2011-01/msg03196.html). In fact, this series implemented a general DMA/IOMMU layer which can be used by any device model, and one translation backend for this implementing the AMD specific PCI IOMMU. These patches don't seem to have gone anywhere for the last few months, however, and so far I've been unable to contact the author (trying again with this mail). I have an interest in this code, because the pSeries machine will also need IOMMU emulation support. At present we only support virtual devices, through the PAPR interface, and we have support for the hypervisor-controller IOMMU translation in the PAPR VIO code. However, we want to add PCI device support and this will also need IOMMU translation. The series seems to have the right basic approach, so if the author is indeed MIA, I was planning to pick up the patches and resubmit them (with support for the pSeries IOMMU added). Hi, Not really MIA, but I've been a bit busy lately, so I'm sorry if I couldn't answer your mail in a timely fashion. I'll try making another merge attempt tonight/tomorrow. Ok. Did this happen? Sorry, I've been away the last couple of weeks - I had a google at the qemu-devel archives but couldn't spot a new merge, but did I just not look hard enough? Before I do that, I was hoping to get some consensus that this is the right way to go. For reference, I have an updated version of the first patch (which adds the core IOMMU layer) below. I think the base DMA layer is the correct approach. There are some problems with the handling in PCI - as someone else pointed out the fact that it assumes the IOMMU is itself a PCI device is problematic for non-x86 platforms. Some developers expressed a few concerns during my last merge attempt, I'm going to go through them and see if they have been solved. Ok. [snip] * the dma_memory_map() tracking was storing the guest physical address of each mapping, but not the qemu user virtual address. However in unmap() it was then attempting to lookup by virtual using a completely bogus cast. Thanks. Map invalidation didn't get much testing, maybe figuring out a way to trigger it from a guest would be nice, say a testcase. Well, I didn't catch a logic problem - for me this bug caused compile failure. * The dma_memory_rw() function is moved from dma_rw.h to dma_rw.c, it was a bit too much code for an inline. * IOMMU support is now available on all target platforms, not just i386, but is configurable (--enable-iommu/--disable-iommu). Stubs are used so that individual drivers can use the new dma interface and it will turn into old-style cpu physical accesses at no cost on IOMMU-less builds. My impression was people were in favor of having the IOMMU code always built in (and go through the direct cpu_physical_* when not configured by the guest). And perhaps poison the old interfaces once everything goes through the new DMA layer. I'm okay any way, though. Oh, I had the opposite impression. I don't care either way, personally. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] Supporting emulation of IOMMUs
On Thu, Apr 28, 2011 at 02:57:14PM -0700, Richard Henderson wrote: On 04/21/2011 02:39 AM, Alexander Graf wrote: How exactly is this going to be used? Also, in the end I think that most devices should just go through a PCI specific interface that then calls the DMA helpers: pci_memory_rw(PCIDevice *d, ...) even if it's only as simple as calling dma_memory_rw(d-iommu, ...) I've had a read through the patches posted in January. It all does seem relatively sane. At least, I can readily see how I would apply these interfaces to my Alpha port without trouble. I'll agree with Alex though that the raw dma_memory_rw functions should not be exposed to the drivers for any given bus. They should always go through {pci,isa}_memory_rw. And these should almost certainly be inline functions that just pass on device-bus.mmu. I don't really see the point of having per-bus helpers. This is going to be widespread enough that we can have an iommu pointer in the core qdev, then everything can use the same helpers. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] Supporting emulation of IOMMUs
On Thu, Apr 21, 2011 at 11:39:22AM +0200, Alexander Graf wrote: On 21.04.2011, at 09:03, David Gibson wrote: [snip] @@ -934,6 +939,8 @@ echo --enable-docsenable documentation build echo --disable-docs disable documentation build echo --disable-vhost-net disable vhost-net acceleration support echo --enable-vhost-net enable vhost-net acceleration support +echo --disable-iommu disable IOMMU emulation support +echo --enable-vhost-net enable IOMMU emulation support eeh? Oops, copy and paste error. Fix in my tree now. [snip] +#else +struct DMAMmu { +DeviceState *iommu; +DMATranslateFunc *translate; +QLIST_HEAD(memory_maps, DMAMemoryMap) memory_maps; +}; + +struct DMADevice { How exactly is this going to be used? Well, the guts of the dma layer is not my work, so I can't really answer that. Also, in the end I think that most devices should just go through a PCI specific interface that then calls the DMA helpers: pci_memory_rw(PCIDevice *d, ...) even if it's only as simple as calling dma_memory_rw(d-iommu, ...) So, I was actually thinking it would make most sense to have an iommu pointer in *every* qdev, PCI or otherwise, so that we just use dma_memory_rw(qdev, ...) everywhere. Obviously a NULL iommu pointer would fall back to no translation. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] Supporting emulation of IOMMUs
On Fri, Apr 29, 2011 at 07:27:38AM -0700, Richard Henderson wrote: On 04/28/2011 02:57 PM, Richard Henderson wrote: I've had a read through the patches posted in January. It all does seem relatively sane. At least, I can readily see how I would apply these interfaces to my Alpha port without trouble. I take that back, I see one rather annoying problem: the assumption that the translate function operates on some sort of standalone device. This assumption is present in two places: Yeah, I noticed this problem too, though I think it's in the AMD IOMMU support patch rather than the DMA translation core. void pci_register_iommu(PCIDevice *dev, DMATranslateFunc *translate); Here, you're assuming that the IOMMU is a device on the PCI bus itself. While this may indeed be how the AMD-IOMMU presents itself for the convenience of the pc-minded operating system, that's certainly not how the hardware is implemented. In practice it's a function of the PCI host controller. And indeed, that's how it is presented to the system for the Sparc and Alpha controllers with which I am familiar. I assume it's similar for PowerPC, but I've never looked. Yes, that's right. On pSeries platforms IOMMU translations are configured using hcalls. struct DMAMmu { DeviceState *iommu; DMATranslateFunc *translate; QLIST_HEAD(memory_maps, DMAMemoryMap) memory_maps; }; Here, you're assuming that the iommu state is a standalone qdev. This is probably true most of the time, given that FROM_SYSBUS(PCIHostState, sysbus_from_qdev(dev)) is true. However, the Alpha Typhoon chipset has two pci host controllers that are tied together in ways that would be difficult, or at least irritating, to represent as two separate qdev entities. I suggest that, like many other places we have callbacks, this should be an opaque value private to the translate function. In your AMD-IOMMU case you can then do pci_register_iommu(dev-bus, amd_iommu_translate, st); and avoid PCIDevice *pci_dev = container_of(dev, PCIDevice, dma); PCIDevice *iommu_dev = DO_UPCAST(PCIDevice, qdev, dev-mmu-iommu); AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, iommu_dev); at the beginning of your translate function. You currently have three levels of casting and pointer chasing; surely you can agree that having a single cast from void* is much easier to follow. In my Alpha case I can then do pci_register_iommu(pci_bus0, typhoon_iommu_translate, state-pchip0); pci_register_iommu(pci_bus1, typhoon_iommu_translate, state-pchip1); and be off to the races. Yes, I think that makes sense. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] Supporting emulation of IOMMUs
On 04/28/2011 02:57 PM, Richard Henderson wrote: I've had a read through the patches posted in January. It all does seem relatively sane. At least, I can readily see how I would apply these interfaces to my Alpha port without trouble. I take that back, I see one rather annoying problem: the assumption that the translate function operates on some sort of standalone device. This assumption is present in two places: void pci_register_iommu(PCIDevice *dev, DMATranslateFunc *translate); Here, you're assuming that the IOMMU is a device on the PCI bus itself. While this may indeed be how the AMD-IOMMU presents itself for the convenience of the pc-minded operating system, that's certainly not how the hardware is implemented. In practice it's a function of the PCI host controller. And indeed, that's how it is presented to the system for the Sparc and Alpha controllers with which I am familiar. I assume it's similar for PowerPC, but I've never looked. struct DMAMmu { DeviceState *iommu; DMATranslateFunc *translate; QLIST_HEAD(memory_maps, DMAMemoryMap) memory_maps; }; Here, you're assuming that the iommu state is a standalone qdev. This is probably true most of the time, given that FROM_SYSBUS(PCIHostState, sysbus_from_qdev(dev)) is true. However, the Alpha Typhoon chipset has two pci host controllers that are tied together in ways that would be difficult, or at least irritating, to represent as two separate qdev entities. I suggest that, like many other places we have callbacks, this should be an opaque value private to the translate function. In your AMD-IOMMU case you can then do pci_register_iommu(dev-bus, amd_iommu_translate, st); and avoid PCIDevice *pci_dev = container_of(dev, PCIDevice, dma); PCIDevice *iommu_dev = DO_UPCAST(PCIDevice, qdev, dev-mmu-iommu); AMDIOMMUState *st = DO_UPCAST(AMDIOMMUState, dev, iommu_dev); at the beginning of your translate function. You currently have three levels of casting and pointer chasing; surely you can agree that having a single cast from void* is much easier to follow. In my Alpha case I can then do pci_register_iommu(pci_bus0, typhoon_iommu_translate, state-pchip0); pci_register_iommu(pci_bus1, typhoon_iommu_translate, state-pchip1); and be off to the races. r~
Re: [Qemu-devel] Supporting emulation of IOMMUs
On 04/21/2011 02:39 AM, Alexander Graf wrote: How exactly is this going to be used? Also, in the end I think that most devices should just go through a PCI specific interface that then calls the DMA helpers: pci_memory_rw(PCIDevice *d, ...) even if it's only as simple as calling dma_memory_rw(d-iommu, ...) I've had a read through the patches posted in January. It all does seem relatively sane. At least, I can readily see how I would apply these interfaces to my Alpha port without trouble. I'll agree with Alex though that the raw dma_memory_rw functions should not be exposed to the drivers for any given bus. They should always go through {pci,isa}_memory_rw. And these should almost certainly be inline functions that just pass on device-bus.mmu. r~
Re: [Qemu-devel] Supporting emulation of IOMMUs
Hello David, On Thu, Apr 21, 2011 at 03:03:47AM -0400, David Gibson wrote: A few months ago, Eduard - Gabriel Munteanu posted a series of patches implementing support for emulating the AMD PCI IOMMU (http://lists.nongnu.org/archive/html/qemu-devel/2011-01/msg03196.html). In fact, this series implemented a general DMA/IOMMU layer which can be used by any device model, and one translation backend for this implementing the AMD specific PCI IOMMU. the patches for AMD IOMMU emulation are not yet upstream has you have already noticed. Eduard is busy with his studies at the moment so any help is greatly appreciated :) Feel free to pick up hist patches and re-submit them (keeping the author correct of course). If you submit your code, it would be cool if you could put me on Cc so I can easily follow the discussion and jump in if required. Regards, Joerg -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632
Re: [Qemu-devel] Supporting emulation of IOMMUs
On 21.04.2011, at 09:03, David Gibson wrote: A few months ago, Eduard - Gabriel Munteanu posted a series of patches implementing support for emulating the AMD PCI IOMMU (http://lists.nongnu.org/archive/html/qemu-devel/2011-01/msg03196.html). In fact, this series implemented a general DMA/IOMMU layer which can be used by any device model, and one translation backend for this implementing the AMD specific PCI IOMMU. These patches don't seem to have gone anywhere for the last few months, however, and so far I've been unable to contact the author (trying again with this mail). I have an interest in this code, because the pSeries machine will also need IOMMU emulation support. At present we only support virtual devices, through the PAPR interface, and we have support for the hypervisor-controller IOMMU translation in the PAPR VIO code. However, we want to add PCI device support and this will also need IOMMU translation. The series seems to have the right basic approach, so if the author is indeed MIA, I was planning to pick up the patches and resubmit them (with support for the pSeries IOMMU added). Before I do that, I was hoping to get some consensus that this is the right way to go. For reference, I have an updated version of the first patch (which adds the core IOMMU layer) below. From: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro Date: Fri, 4 Feb 2011 01:32:55 +0200 Subject: [PATCH] Generic DMA memory access interface This introduces replacements for memory access functions like cpu_physical_memory_read(). The new interface can handle address translation and access checking through an IOMMU. David Gibson: I have made several bugfixes and cleanups to Eduard's original patch. * dma_memory_rw() was incorrectly using (uninitialized) plen instead of len in the fallback to no-IOMMU case. * the dma_memory_map() tracking was storing the guest physical address of each mapping, but not the qemu user virtual address. However in unmap() it was then attempting to lookup by virtual using a completely bogus cast. * The dma_memory_rw() function is moved from dma_rw.h to dma_rw.c, it was a bit too much code for an inline. * IOMMU support is now available on all target platforms, not just i386, but is configurable (--enable-iommu/--disable-iommu). Stubs are used so that individual drivers can use the new dma interface and it will turn into old-style cpu physical accesses at no cost on IOMMU-less builds. Signed-off-by: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro Signed-off-by: David Gibson da...@gibson.dropbear.id.au --- Makefile.target |1 + configure | 12 hw/dma_rw.c | 147 +++ hw/dma_rw.h | 156 +++ 4 files changed, 316 insertions(+), 0 deletions(-) create mode 100644 hw/dma_rw.c create mode 100644 hw/dma_rw.h diff --git a/Makefile.target b/Makefile.target index 95f5eda..c3d36c6 100644 --- a/Makefile.target +++ b/Makefile.target @@ -198,6 +198,7 @@ obj-$(CONFIG_REALLY_VIRTFS) += virtio-9p.o obj-y += rwhandler.o obj-$(CONFIG_KVM) += kvm.o kvm-all.o obj-$(CONFIG_NO_KVM) += kvm-stub.o +obj-$(CONFIG_IOMMU) += dma_rw.o LIBS+=-lz QEMU_CFLAGS += $(VNC_TLS_CFLAGS) diff --git a/configure b/configure index da2da04..fa6f4d5 100755 --- a/configure +++ b/configure @@ -130,6 +130,7 @@ xen= linux_aio= attr= vhost_net= +iommu=no xfs= gprof=no @@ -723,6 +724,10 @@ for opt do ;; --enable-vhost-net) vhost_net=yes ;; + --enable-iommu) iommu=yes + ;; + --disable-iommu) iommu=no + ;; --disable-opengl) opengl=no ;; --enable-opengl) opengl=yes @@ -934,6 +939,8 @@ echo --enable-docsenable documentation build echo --disable-docs disable documentation build echo --disable-vhost-net disable vhost-net acceleration support echo --enable-vhost-net enable vhost-net acceleration support +echo --disable-iommu disable IOMMU emulation support +echo --enable-vhost-net enable IOMMU emulation support eeh? echo --enable-trace-backend=B Set trace backend echoAvailable backends: $($source_path/scripts/tracetool --list-backends) echo --with-trace-file=NAME Full PATH,NAME of file to store traces @@ -2608,6 +2615,7 @@ echo madvise $madvise echo posix_madvise $posix_madvise echo uuid support $uuid echo vhost-net support $vhost_net +echo IOMMU support $iommu echo Trace backend $trace_backend echo Trace output file $trace_file-pid echo spice support $spice @@ -3412,6 +3420,10 @@ if test $target_softmmu = yes -a \( \ echo CONFIG_NEED_MMU=y $config_target_mak fi +if test $iommu = yes ; then + echo CONFIG_IOMMU=y $config_target_mak +fi + if test $gprof = yes ; then echo TARGET_GPROF=yes $config_target_mak if
Re: [Qemu-devel] Supporting emulation of IOMMUs
On Thu, Apr 21, 2011 at 05:03:47PM +1000, David Gibson wrote: A few months ago, Eduard - Gabriel Munteanu posted a series of patches implementing support for emulating the AMD PCI IOMMU (http://lists.nongnu.org/archive/html/qemu-devel/2011-01/msg03196.html). In fact, this series implemented a general DMA/IOMMU layer which can be used by any device model, and one translation backend for this implementing the AMD specific PCI IOMMU. These patches don't seem to have gone anywhere for the last few months, however, and so far I've been unable to contact the author (trying again with this mail). I have an interest in this code, because the pSeries machine will also need IOMMU emulation support. At present we only support virtual devices, through the PAPR interface, and we have support for the hypervisor-controller IOMMU translation in the PAPR VIO code. However, we want to add PCI device support and this will also need IOMMU translation. The series seems to have the right basic approach, so if the author is indeed MIA, I was planning to pick up the patches and resubmit them (with support for the pSeries IOMMU added). Hi, Not really MIA, but I've been a bit busy lately, so I'm sorry if I couldn't answer your mail in a timely fashion. I'll try making another merge attempt tonight/tomorrow. Before I do that, I was hoping to get some consensus that this is the right way to go. For reference, I have an updated version of the first patch (which adds the core IOMMU layer) below. Some developers expressed a few concerns during my last merge attempt, I'm going to go through them and see if they have been solved. From: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro Date: Fri, 4 Feb 2011 01:32:55 +0200 Subject: [PATCH] Generic DMA memory access interface This introduces replacements for memory access functions like cpu_physical_memory_read(). The new interface can handle address translation and access checking through an IOMMU. David Gibson: I have made several bugfixes and cleanups to Eduard's original patch. * dma_memory_rw() was incorrectly using (uninitialized) plen instead of len in the fallback to no-IOMMU case. * the dma_memory_map() tracking was storing the guest physical address of each mapping, but not the qemu user virtual address. However in unmap() it was then attempting to lookup by virtual using a completely bogus cast. Thanks. Map invalidation didn't get much testing, maybe figuring out a way to trigger it from a guest would be nice, say a testcase. * The dma_memory_rw() function is moved from dma_rw.h to dma_rw.c, it was a bit too much code for an inline. * IOMMU support is now available on all target platforms, not just i386, but is configurable (--enable-iommu/--disable-iommu). Stubs are used so that individual drivers can use the new dma interface and it will turn into old-style cpu physical accesses at no cost on IOMMU-less builds. My impression was people were in favor of having the IOMMU code always built in (and go through the direct cpu_physical_* when not configured by the guest). And perhaps poison the old interfaces once everything goes through the new DMA layer. I'm okay any way, though. Signed-off-by: Eduard - Gabriel Munteanu eduard.munte...@linux360.ro Signed-off-by: David Gibson da...@gibson.dropbear.id.au --- Makefile.target |1 + configure | 12 hw/dma_rw.c | 147 +++ hw/dma_rw.h | 156 +++ 4 files changed, 316 insertions(+), 0 deletions(-) create mode 100644 hw/dma_rw.c create mode 100644 hw/dma_rw.h diff --git a/Makefile.target b/Makefile.target index 95f5eda..c3d36c6 100644 --- a/Makefile.target +++ b/Makefile.target @@ -198,6 +198,7 @@ obj-$(CONFIG_REALLY_VIRTFS) += virtio-9p.o obj-y += rwhandler.o obj-$(CONFIG_KVM) += kvm.o kvm-all.o obj-$(CONFIG_NO_KVM) += kvm-stub.o +obj-$(CONFIG_IOMMU) += dma_rw.o LIBS+=-lz QEMU_CFLAGS += $(VNC_TLS_CFLAGS) diff --git a/configure b/configure index da2da04..fa6f4d5 100755 --- a/configure +++ b/configure @@ -130,6 +130,7 @@ xen= linux_aio= attr= vhost_net= +iommu=no xfs= gprof=no @@ -723,6 +724,10 @@ for opt do ;; --enable-vhost-net) vhost_net=yes ;; + --enable-iommu) iommu=yes + ;; + --disable-iommu) iommu=no + ;; --disable-opengl) opengl=no ;; --enable-opengl) opengl=yes @@ -934,6 +939,8 @@ echo --enable-docsenable documentation build echo --disable-docs disable documentation build echo --disable-vhost-net disable vhost-net acceleration support echo --enable-vhost-net enable vhost-net acceleration support +echo --disable-iommu disable IOMMU emulation support +echo --enable-vhost-net enable IOMMU emulation support echo --enable-trace-backend=B Set trace