Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390
On Mon, 2014-09-08 at 11:20 +0200, Paolo Bonzini wrote:
> On 06/09/2014 01:19, Alexander Graf wrote:
> >> > 1) interpretive execution of pci load/store instructions. If we use
> >> > this function, pci access does not get intercepted (no SIE exit) but
> >> > is handled via microcode. To enable this we have to disable the zpci
> >> > device and enable it again with information from the SIE control
> >> > block.
> >
> > Hrm. So how about you create a special vm ioctl for KVM that allows you
> > to attach a VFIO device fd into the KVM VM context? Then the default
> > would stay "accessible by mmap traps", but we could accelerate it with
> > KVM.
>
> There is already KVM_DEV_VFIO_GROUP_ADD and KVM_DEV_VFIO_GROUP_DEL.
>
> Right now, they result in a call to kvm_arch_register_noncoherent_dma or
> kvm_arch_unregister_noncoherent_dma, but you can add more hooks.

Eric Auger is also working on a patch series to do IRQ forward control on
ARM via the kvm-vfio pseudo device, extending the interface to register
VFIO device fds. Sounds like that may be a good path to follow here too.
Thanks,

Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390
On 06/09/2014 01:19, Alexander Graf wrote:
>> > 1) interpretive execution of pci load/store instructions. If we use
>> > this function, pci access does not get intercepted (no SIE exit) but
>> > is handled via microcode. To enable this we have to disable the zpci
>> > device and enable it again with information from the SIE control
>> > block.
>
> Hrm. So how about you create a special vm ioctl for KVM that allows you
> to attach a VFIO device fd into the KVM VM context? Then the default
> would stay "accessible by mmap traps", but we could accelerate it with
> KVM.

There is already KVM_DEV_VFIO_GROUP_ADD and KVM_DEV_VFIO_GROUP_DEL.

Right now, they result in a call to kvm_arch_register_noncoherent_dma or
kvm_arch_unregister_noncoherent_dma, but you can add more hooks.

Paolo
Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390
On 05.09.14 13:39, Frank Blaschka wrote:
> On Fri, Sep 05, 2014 at 10:21:27AM +0200, Alexander Graf wrote:
>>
>> On 04.09.14 12:52, frank.blasc...@de.ibm.com wrote:
>>> This set of patches implements pci pass-through support for qemu/KVM
>>> on s390. PCI support on s390 is very different from other platforms.
>>> Major differences are:
>>>
>>> 1) all PCI operations are driven by special s390 instructions
>>> 2) all s390 PCI instructions are privileged
>>> 3) PCI config and memory spaces can not be mmap'ed
>>
>> That's ok, vfio abstracts config space anyway.
>>
>>> 4) no classic interrupts (INTX, MSI). The pci hw understands the concept
>>>    of requesting MSIX irqs but irqs are delivered as s390 adapter irqs.
>>
>> This is in line with other implementations. Interrupts go from
>>
>>   device -> PHB -> PIC -> CPU
>>
>> (sometimes you can have another converter device in between)
>>
>> In your case, the PHB converts INTX and MSI interrupts to adapter
>> interrupts to go to the floating interrupt controller. Same thing as
>> everyone else really.
>
> Yes, I think this can be done, but we need s390 specific changes in vfio.
>
>>> 5) For DMA access there is always an IOMMU required. The s390 pci
>>>    implementation does not support a complete memory to iommu mapping;
>>>    dma mappings are created on request.
>>
>> Sounds great :). So I suppose we should implement a guest facing IOMMU?
>>
>>> 6) The OS does not get any information about the physical layout
>>>    of the PCI bus.
>>
>> So how does it know whether different devices are behind the same IOMMU
>> context? Or can we assume that every device has its own context?
>
> Actually yes

That greatly simplifies things. Awesome :).

>
>>> 7) To take advantage of system z specific virtualization features
>>>    we need to access the SIE control block residing in the kernel KVM
>>
>> Please elaborate.
>>
>>> 8) To enable system z specific virtualization features we have to
>>>    manipulate the zpci device in kernel.
>>
>> Why?
>
> We have the following s390 specific virtualization features:
>
> 1) interpretive execution of pci load/store instructions. If we use this
>    function, pci access does not get intercepted (no SIE exit) but is
>    handled via microcode. To enable this we have to disable the zpci
>    device and enable it again with information from the SIE control
>    block.

Hrm. So how about you create a special vm ioctl for KVM that allows you
to attach a VFIO device fd into the KVM VM context? Then the default
would stay "accessible by mmap traps", but we could accelerate it with
KVM.

>    A further problem in qemu is: vfio traps access to the MSIX table, so
>    we have to find another way of programming msix if we do not get
>    intercepts for memory space access.

We trap access to the MSIX table because it's a shared resource. If it's
not shared for you, there's no need to trap it.

> 2) Adapter event forwarding (with alerting). This is a mechanism where
>    the adapter event (irq) is directly forwarded to the guest. To set
>    this up we also need to manipulate the zpci device (in kernel) with
>    information from the SIE block. Exploiting GISA is only one part of
>    this mechanism.

How does this work when the VM is not running (because it's idle)?

Either way, we have a very similar thing on x86. It's called "posted
interrupts" there. I'm not sure everything's in place for VFIO and posted
interrupts to work properly, but whatever we do it sounds like the
interfaces and configuration flow should be identical.

> Both might be possible with some more or less nice looking vfio
> extensions. As I said before, we have to dig into it more. Also, these
> can be further optimization steps later once we have a running vfio
> implementation on the platform.

Yup :). That's the nice part about it.

>
>>> For this reason I decided to implement a kernel based approach similar
>>> to x86 device assignment. There is a new qemu device (s390-pci)
>>> representing a
>>
>> I fail to see the rationale and I definitely don't want to see anything
>> even remotely similar to the legacy x86 device assignment on s390 ;).
>>
>> Can't we just enhance VFIO?
>
> Probably yes, but we need some vfio changes (kernel and qemu)

We need changes either way ;). So let's better do the right ones.

>
>> Also, I think we'll get the cleanest model if we start off with an
>> implementation that allows us to add emulated PCI devices to an s390x
>> machine and only then follow on with physical ones.
>
> I can already do this. With some more s390 intercepts a device can be
> detected and the guest is able to access config/memory space.
> Unfortunately the s390 platform does not support I/O bars, so none of
> the emulated devices will work on the platform ...

Oh? How about "nec-usb-xhci" or "intel-hda"?

Alex
Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390
On 05.09.14 13:55, Frank Blaschka wrote:
> On Fri, Sep 05, 2014 at 10:35:59AM +0200, Alexander Graf wrote:
>>
>> On 05.09.14 09:46, Frank Blaschka wrote:
>>> On Thu, Sep 04, 2014 at 07:16:24AM -0600, Alex Williamson wrote:
>>>> On Thu, 2014-09-04 at 12:52 +0200, frank.blasc...@de.ibm.com wrote:
>>>>> This set of patches implements pci pass-through support for qemu/KVM
>>>>> on s390. PCI support on s390 is very different from other platforms.
>>>>> Major differences are:
>>>>>
>>>>> 1) all PCI operations are driven by special s390 instructions
>>>>
>>>> Generating config cycles is always arch specific.
>>>>
>>>>> 2) all s390 PCI instructions are privileged
>>>>
>>>> While the operations to generate config cycles on x86 are not
>>>> privileged, they must be arbitrated between accesses, so in a sense
>>>> they're privileged.
>>>>
>>>>> 3) PCI config and memory spaces can not be mmap'ed
>>>>
>>>> VFIO has mapping flags that allow any region to specify mmap support.
>>>
>>> Hi Alex,
>>>
>>> thx for your reply.
>>>
>>> Let me elaborate a little bit more on 1 - 3. Config and memory space
>>> can not be accessed via memory operations. You have to use special
>>> s390 instructions. These instructions can not be executed in user
>>> space. So there is no other way than executing these instructions in
>>> kernel. Yes, vfio does support a slow path via ioctl we could use, but
>>> this seems suboptimal from a performance point of view.
>>
>> Ah, I missed the "memory spaces" part ;). I agree that it's "suboptimal"
>> to call into the kernel for every PCI access, but I still think that
>> VFIO provides the correct abstraction layer for us to use. If nothing
>> else, it would at least give us identical configuration to x86 and nice
>> debuggability on par with the other platforms.
>>
>>>>> 4) no classic interrupts (INTX, MSI). The pci hw understands the
>>>>>    concept of requesting MSIX irqs but irqs are delivered as s390
>>>>>    adapter irqs.
>>>>
>>>> VFIO delivers interrupts as eventfds regardless of the underlying
>>>> platform mechanism.
>>>
>>> yes that's right, but then we have to do platform specific stuff to
>>> present the irq to the guest. I do not say this is impossible but we
>>> have to add s390 specific code to vfio.
>>
>> Not at all - interrupt delivery is completely transparent to VFIO.
>
> interrupt yes, but MSIX no
>
>>>>> 5) For DMA access there is always an IOMMU required.
>>>>
>>>> x86 requires the same.
>>>>
>>>>> s390 pci implementation
>>>>>    does not support a complete memory to iommu mapping, dma mappings
>>>>>    are created on request.
>>>>
>>>> Sounds like POWER.
>>>
>>> Don't know the details from power, maybe it is similar but not the
>>> same. We might be able to extend vfio to have a new interface allowing
>>> us to do DMA mappings on request.
>>
>> We already have that.
>
> Great, can you give me some pointers how to use it? Thx!

Sure! :)

So on POWER (sPAPR) you get a list of page entries that describe the
device -> ram mapping. Every time you want to modify any of these
entries, you need to invoke a hypercall (H_PUT_TCE).

So every time the guest wants to runtime add a DMA window, we trap into
put_tce_emu() in hw/ppc/spapr_iommu.c. Here we call
memory_region_notify_iommu(). This call goes either to an emulated IOMMU
context for emulated devices or to the special VFIO IOMMU context for
VFIO devices.

In the VFIO case, we end up in vfio_iommu_map_notify() at hw/misc/vfio.c
which calls ioctl(VFIO_IOMMU_MAP_DMA) at the end of the day. The
in-kernel implementation of the host IOMMU provider uses this map to
create the virtual DMA window map.

Basically, VFIO *only* supports "DMA mappings on request" as you call
them. Prepopulated DMA windows are just a coincidence that may or may not
happen.

I hope that makes it slightly more clear what the path looks like :). If
you have more questions on this, don't hesitate to ask.

Alex
Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390
On Fri, Sep 05, 2014 at 10:35:59AM +0200, Alexander Graf wrote:
>
> On 05.09.14 09:46, Frank Blaschka wrote:
> > On Thu, Sep 04, 2014 at 07:16:24AM -0600, Alex Williamson wrote:
> >> On Thu, 2014-09-04 at 12:52 +0200, frank.blasc...@de.ibm.com wrote:
> >>> This set of patches implements pci pass-through support for qemu/KVM
> >>> on s390. PCI support on s390 is very different from other platforms.
> >>> Major differences are:
> >>>
> >>> 1) all PCI operations are driven by special s390 instructions
> >>
> >> Generating config cycles is always arch specific.
> >>
> >>> 2) all s390 PCI instructions are privileged
> >>
> >> While the operations to generate config cycles on x86 are not
> >> privileged, they must be arbitrated between accesses, so in a sense
> >> they're privileged.
> >>
> >>> 3) PCI config and memory spaces can not be mmap'ed
> >>
> >> VFIO has mapping flags that allow any region to specify mmap support.
> >
> > Hi Alex,
> >
> > thx for your reply.
> >
> > Let me elaborate a little bit more on 1 - 3. Config and memory space
> > can not be accessed via memory operations. You have to use special
> > s390 instructions. These instructions can not be executed in user
> > space. So there is no other way than executing these instructions in
> > kernel. Yes, vfio does support a slow path via ioctl we could use,
> > but this seems suboptimal from a performance point of view.
>
> Ah, I missed the "memory spaces" part ;). I agree that it's "suboptimal"
> to call into the kernel for every PCI access, but I still think that
> VFIO provides the correct abstraction layer for us to use. If nothing
> else, it would at least give us identical configuration to x86 and nice
> debuggability on par with the other platforms.
>
> >>> 4) no classic interrupts (INTX, MSI). The pci hw understands the
> >>>    concept of requesting MSIX irqs but irqs are delivered as s390
> >>>    adapter irqs.
> >>
> >> VFIO delivers interrupts as eventfds regardless of the underlying
> >> platform mechanism.
> >
> > yes that's right, but then we have to do platform specific stuff to
> > present the irq to the guest. I do not say this is impossible but we
> > have to add s390 specific code to vfio.
>
> Not at all - interrupt delivery is completely transparent to VFIO.

interrupt yes, but MSIX no

> >>> 5) For DMA access there is always an IOMMU required.
> >>
> >> x86 requires the same.
> >>
> >>> s390 pci implementation
> >>>    does not support a complete memory to iommu mapping, dma mappings
> >>>    are created on request.
> >>
> >> Sounds like POWER.
> >
> > Don't know the details from power, maybe it is similar but not the
> > same. We might be able to extend vfio to have a new interface allowing
> > us to do DMA mappings on request.
>
> We already have that.

Great, can you give me some pointers how to use it? Thx!

> >>> 6) The OS does not get any information about the physical layout
> >>>    of the PCI bus.
> >>
> >> If that means that every device is isolated (seems unlikely for
> >> multifunction devices) then that makes IOMMU group support really
> >> easy.
> >
> > OK
>
> >>> 7) To take advantage of system z specific virtualization features
> >>>    we need to access the SIE control block residing in the kernel KVM
> >>
> >> The KVM-VFIO device allows interaction between VFIO devices and KVM.
> >>
> >>> 8) To enable system z specific virtualization features we have to
> >>>    manipulate the zpci device in kernel.
> >>
> >> VFIO supports different device backends, currently pci_dev and
> >> working towards platform devices. zpci might just be an extension to
> >> standard pci.
> >
> > 7 - 8 At least this is not as straightforward as the pure kernel
> > approach, but I have to dig into that in more detail if we could only
> > agree on a vfio solution.
>
> Please do so, yes :).
>
> >>> For this reason I decided to implement a kernel based approach
> >>> similar to x86 device assignment. There is a new qemu device
> >>> (s390-pci) representing a pass through device on the host. Here is a
> >>> sample qemu device configuration:
> >>>
> >>>   -device s390-pci,host=:00:00.0
> >>>
> >>> The device executes the KVM_ASSIGN_PCI_DEVICE ioctl to create a
> >>> proxy instance in the kernel KVM and connect this instance to the
> >>> host pci device.
> >>>
> >>> kernel patches apply to linux-kvm
> >>>
> >>>   s390: cio: chsc function to register GIB
> >>>   s390: pci: export pci functions for pass-through usage
> >>>   KVM: s390: Add GISA support
> >>>   KVM: s390: Add PCI pass-through support
> >>>
> >>> qemu patches apply to qemu-master
> >>>
> >>>   s390: Add PCI bus support
> >>>   s390: Add PCI pass-through device support
> >>>
> >>> Feedback and discussion is highly welcome ...
> >>
> >> KVM-based device assignment needs to go away. It's a horrible model
> >> for devices, it offers very little protection to the kernel, assumes
> >> every device is full
Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390
On Fri, Sep 05, 2014 at 10:21:27AM +0200, Alexander Graf wrote:
>
> On 04.09.14 12:52, frank.blasc...@de.ibm.com wrote:
> > This set of patches implements pci pass-through support for qemu/KVM
> > on s390. PCI support on s390 is very different from other platforms.
> > Major differences are:
> >
> > 1) all PCI operations are driven by special s390 instructions
> > 2) all s390 PCI instructions are privileged
> > 3) PCI config and memory spaces can not be mmap'ed
>
> That's ok, vfio abstracts config space anyway.
>
> > 4) no classic interrupts (INTX, MSI). The pci hw understands the concept
> >    of requesting MSIX irqs but irqs are delivered as s390 adapter irqs.
>
> This is in line with other implementations. Interrupts go from
>
>   device -> PHB -> PIC -> CPU
>
> (sometimes you can have another converter device in between)
>
> In your case, the PHB converts INTX and MSI interrupts to adapter
> interrupts to go to the floating interrupt controller. Same thing as
> everyone else really.

Yes, I think this can be done, but we need s390 specific changes in vfio.

> > 5) For DMA access there is always an IOMMU required. The s390 pci
> >    implementation does not support a complete memory to iommu mapping;
> >    dma mappings are created on request.
>
> Sounds great :). So I suppose we should implement a guest facing IOMMU?
>
> > 6) The OS does not get any information about the physical layout
> >    of the PCI bus.
>
> So how does it know whether different devices are behind the same IOMMU
> context? Or can we assume that every device has its own context?

Actually yes

> > 7) To take advantage of system z specific virtualization features
> >    we need to access the SIE control block residing in the kernel KVM
>
> Please elaborate.
>
> > 8) To enable system z specific virtualization features we have to
> >    manipulate the zpci device in kernel.
>
> Why?

We have the following s390 specific virtualization features:

1) interpretive execution of pci load/store instructions. If we use this
   function, pci access does not get intercepted (no SIE exit) but is
   handled via microcode. To enable this we have to disable the zpci
   device and enable it again with information from the SIE control
   block. A further problem in qemu is: vfio traps access to the MSIX
   table, so we have to find another way of programming msix if we do
   not get intercepts for memory space access.

2) Adapter event forwarding (with alerting). This is a mechanism where
   the adapter event (irq) is directly forwarded to the guest. To set
   this up we also need to manipulate the zpci device (in kernel) with
   information from the SIE block. Exploiting GISA is only one part of
   this mechanism.

Both might be possible with some more or less nice looking vfio
extensions. As I said before, we have to dig into it more. Also, these
can be further optimization steps later once we have a running vfio
implementation on the platform.

> > For this reason I decided to implement a kernel based approach similar
> > to x86 device assignment. There is a new qemu device (s390-pci)
> > representing a
>
> I fail to see the rationale and I definitely don't want to see anything
> even remotely similar to the legacy x86 device assignment on s390 ;).
>
> Can't we just enhance VFIO?

Probably yes, but we need some vfio changes (kernel and qemu)

> Also, I think we'll get the cleanest model if we start off with an
> implementation that allows us to add emulated PCI devices to an s390x
> machine and only then follow on with physical ones.

I can already do this. With some more s390 intercepts a device can be
detected and the guest is able to access config/memory space.
Unfortunately the s390 platform does not support I/O bars, so none of
the emulated devices will work on the platform ...
Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390
On 05.09.14 09:46, Frank Blaschka wrote:
> On Thu, Sep 04, 2014 at 07:16:24AM -0600, Alex Williamson wrote:
>> On Thu, 2014-09-04 at 12:52 +0200, frank.blasc...@de.ibm.com wrote:
>>> This set of patches implements pci pass-through support for qemu/KVM
>>> on s390. PCI support on s390 is very different from other platforms.
>>> Major differences are:
>>>
>>> 1) all PCI operations are driven by special s390 instructions
>>
>> Generating config cycles is always arch specific.
>>
>>> 2) all s390 PCI instructions are privileged
>>
>> While the operations to generate config cycles on x86 are not
>> privileged, they must be arbitrated between accesses, so in a sense
>> they're privileged.
>>
>>> 3) PCI config and memory spaces can not be mmap'ed
>>
>> VFIO has mapping flags that allow any region to specify mmap support.
>
> Hi Alex,
>
> thx for your reply.
>
> Let me elaborate a little bit more on 1 - 3. Config and memory space
> can not be accessed via memory operations. You have to use special s390
> instructions. These instructions can not be executed in user space. So
> there is no other way than executing these instructions in kernel. Yes,
> vfio does support a slow path via ioctl we could use, but this seems
> suboptimal from a performance point of view.

Ah, I missed the "memory spaces" part ;). I agree that it's "suboptimal"
to call into the kernel for every PCI access, but I still think that
VFIO provides the correct abstraction layer for us to use. If nothing
else, it would at least give us identical configuration to x86 and nice
debuggability on par with the other platforms.

>>> 4) no classic interrupts (INTX, MSI). The pci hw understands the concept
>>>    of requesting MSIX irqs but irqs are delivered as s390 adapter irqs.
>>
>> VFIO delivers interrupts as eventfds regardless of the underlying
>> platform mechanism.
>
> yes that's right, but then we have to do platform specific stuff to
> present the irq to the guest. I do not say this is impossible but we
> have to add s390 specific code to vfio.

Not at all - interrupt delivery is completely transparent to VFIO.

>>> 5) For DMA access there is always an IOMMU required.
>>
>> x86 requires the same.
>>
>>> s390 pci implementation
>>>    does not support a complete memory to iommu mapping, dma mappings
>>>    are created on request.
>>
>> Sounds like POWER.
>
> Don't know the details from power, maybe it is similar but not the same.
> We might be able to extend vfio to have a new interface allowing
> us to do DMA mappings on request.

We already have that.

>
>>> 6) The OS does not get any information about the physical layout
>>>    of the PCI bus.
>>
>> If that means that every device is isolated (seems unlikely for
>> multifunction devices) then that makes IOMMU group support really easy.
>
> OK
>
>>> 7) To take advantage of system z specific virtualization features
>>>    we need to access the SIE control block residing in the kernel KVM
>>
>> The KVM-VFIO device allows interaction between VFIO devices and KVM.
>>
>>> 8) To enable system z specific virtualization features we have to
>>>    manipulate the zpci device in kernel.
>>
>> VFIO supports different device backends, currently pci_dev and working
>> towards platform devices. zpci might just be an extension to standard
>> pci.
>
> 7 - 8 At least this is not as straightforward as the pure kernel
> approach, but I have to dig into that in more detail if we could only
> agree on a vfio solution.

Please do so, yes :).

>>> For this reason I decided to implement a kernel based approach similar
>>> to x86 device assignment. There is a new qemu device (s390-pci)
>>> representing a pass through device on the host. Here is a sample qemu
>>> device configuration:
>>>
>>>   -device s390-pci,host=:00:00.0
>>>
>>> The device executes the KVM_ASSIGN_PCI_DEVICE ioctl to create a proxy
>>> instance in the kernel KVM and connect this instance to the host pci
>>> device.
>>>
>>> kernel patches apply to linux-kvm
>>>
>>>   s390: cio: chsc function to register GIB
>>>   s390: pci: export pci functions for pass-through usage
>>>   KVM: s390: Add GISA support
>>>   KVM: s390: Add PCI pass-through support
>>>
>>> qemu patches apply to qemu-master
>>>
>>>   s390: Add PCI bus support
>>>   s390: Add PCI pass-through device support
>>>
>>> Feedback and discussion is highly welcome ...
>>
>> KVM-based device assignment needs to go away. It's a horrible model
>> for devices, it offers very little protection to the kernel, assumes
>> every device is fully isolated and visible to the IOMMU, relies on a
>> smattering of sysfs files to operate, etc. x86, POWER, and ARM are all
>> moving to VFIO-based device assignment. Why is s390 special enough to
>> repeat all the mistakes that x86 did? Thanks,
>
> Is this your personal opinion or was this a strategic decision of the
> QEMU/KVM community? Can anybody give us direction about this?
>
> Actually I can understand your point. In the last
Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390
On 04.09.14 12:52, frank.blasc...@de.ibm.com wrote:
> This set of patches implements pci pass-through support for qemu/KVM on
> s390. PCI support on s390 is very different from other platforms.
> Major differences are:
>
> 1) all PCI operations are driven by special s390 instructions
> 2) all s390 PCI instructions are privileged
> 3) PCI config and memory spaces can not be mmap'ed

That's ok, vfio abstracts config space anyway.

> 4) no classic interrupts (INTX, MSI). The pci hw understands the concept
>    of requesting MSIX irqs but irqs are delivered as s390 adapter irqs.

This is in line with other implementations. Interrupts go from

  device -> PHB -> PIC -> CPU

(sometimes you can have another converter device in between)

In your case, the PHB converts INTX and MSI interrupts to adapter
interrupts to go to the floating interrupt controller. Same thing as
everyone else really.

> 5) For DMA access there is always an IOMMU required. The s390 pci
>    implementation does not support a complete memory to iommu mapping;
>    dma mappings are created on request.

Sounds great :). So I suppose we should implement a guest facing IOMMU?

> 6) The OS does not get any information about the physical layout
>    of the PCI bus.

So how does it know whether different devices are behind the same IOMMU
context? Or can we assume that every device has its own context?

> 7) To take advantage of system z specific virtualization features
>    we need to access the SIE control block residing in the kernel KVM

Please elaborate.

> 8) To enable system z specific virtualization features we have to
>    manipulate the zpci device in kernel.

Why?

> For this reason I decided to implement a kernel based approach similar
> to x86 device assignment. There is a new qemu device (s390-pci)
> representing a

I fail to see the rationale and I definitely don't want to see anything
even remotely similar to the legacy x86 device assignment on s390 ;).

Can't we just enhance VFIO?

Also, I think we'll get the cleanest model if we start off with an
implementation that allows us to add emulated PCI devices to an s390x
machine and only then follow on with physical ones.

Alex
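The emulated-devices-first model Alex proposes would, once an s390x machine type grows a PCI bus, look like any other platform from the command line. A hypothetical invocation (such s390x PCI machine support did not exist yet at the time of this thread), using MMIO-only devices since s390 has no I/O BARs:

```shell
# Hypothetical: attach MMIO-only emulated PCI devices to an s390x guest.
qemu-system-s390x \
    -machine s390-ccw-virtio \
    -device nec-usb-xhci,id=xhci0 \
    -device intel-hda,id=hda0
```

Devices that require I/O BARs (e.g. legacy IDE controllers) would stay unusable on the platform either way.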
Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390
On Thu, Sep 04, 2014 at 07:16:24AM -0600, Alex Williamson wrote:
> On Thu, 2014-09-04 at 12:52 +0200, frank.blasc...@de.ibm.com wrote:
> > This set of patches implements pci pass-through support for qemu/KVM
> > on s390. PCI support on s390 is very different from other platforms.
> > Major differences are:
> >
> > 1) all PCI operations are driven by special s390 instructions
>
> Generating config cycles is always arch specific.
>
> > 2) all s390 PCI instructions are privileged
>
> While the operations to generate config cycles on x86 are not
> privileged, they must be arbitrated between accesses, so in a sense
> they're privileged.
>
> > 3) PCI config and memory spaces can not be mmap'ed
>
> VFIO has mapping flags that allow any region to specify mmap support.

Hi Alex,

thx for your reply.

Let me elaborate a little bit more on 1 - 3. Config and memory space can
not be accessed via memory operations. You have to use special s390
instructions. These instructions can not be executed in user space. So
there is no other way than executing these instructions in kernel. Yes,
vfio does support a slow path via ioctl we could use, but this seems
suboptimal from a performance point of view.

> > 4) no classic interrupts (INTX, MSI). The pci hw understands the concept
> >    of requesting MSIX irqs but irqs are delivered as s390 adapter irqs.
>
> VFIO delivers interrupts as eventfds regardless of the underlying
> platform mechanism.

yes that's right, but then we have to do platform specific stuff to
present the irq to the guest. I do not say this is impossible but we
have to add s390 specific code to vfio.

> > 5) For DMA access there is always an IOMMU required.
>
> x86 requires the same.
>
> > s390 pci implementation
> >    does not support a complete memory to iommu mapping, dma mappings
> >    are created on request.
>
> Sounds like POWER.

Don't know the details from power, maybe it is similar but not the same.
We might be able to extend vfio to have a new interface allowing
us to do DMA mappings on request.

> > 6) The OS does not get any information about the physical layout
> >    of the PCI bus.
>
> If that means that every device is isolated (seems unlikely for
> multifunction devices) then that makes IOMMU group support really easy.

OK

> > 7) To take advantage of system z specific virtualization features
> >    we need to access the SIE control block residing in the kernel KVM
>
> The KVM-VFIO device allows interaction between VFIO devices and KVM.
>
> > 8) To enable system z specific virtualization features we have to
> >    manipulate the zpci device in kernel.
>
> VFIO supports different device backends, currently pci_dev and working
> towards platform devices. zpci might just be an extension to standard
> pci.

7 - 8 At least this is not as straightforward as the pure kernel
approach, but I have to dig into that in more detail if we could only
agree on a vfio solution.

> > For this reason I decided to implement a kernel based approach similar
> > to x86 device assignment. There is a new qemu device (s390-pci)
> > representing a pass through device on the host. Here is a sample qemu
> > device configuration:
> >
> >   -device s390-pci,host=:00:00.0
> >
> > The device executes the KVM_ASSIGN_PCI_DEVICE ioctl to create a proxy
> > instance in the kernel KVM and connect this instance to the host pci
> > device.
> >
> > kernel patches apply to linux-kvm
> >
> >   s390: cio: chsc function to register GIB
> >   s390: pci: export pci functions for pass-through usage
> >   KVM: s390: Add GISA support
> >   KVM: s390: Add PCI pass-through support
> >
> > qemu patches apply to qemu-master
> >
> >   s390: Add PCI bus support
> >   s390: Add PCI pass-through device support
> >
> > Feedback and discussion is highly welcome ...
>
> KVM-based device assignment needs to go away. It's a horrible model for
> devices, it offers very little protection to the kernel, assumes every
> device is fully isolated and visible to the IOMMU, relies on a
> smattering of sysfs files to operate, etc. x86, POWER, and ARM are all
> moving to VFIO-based device assignment. Why is s390 special enough to
> repeat all the mistakes that x86 did? Thanks,

Is this your personal opinion or was this a strategic decision of the
QEMU/KVM community? Can anybody give us direction about this?

Actually I can understand your point. In the last weeks I did some
development and testing regarding the use of vfio too. But the in kernel
solution seems to offer the best performance and the most straightforward
implementation for our platform.

Greetings,

Frank
Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390
On Thu, 2014-09-04 at 12:52 +0200, frank.blasc...@de.ibm.com wrote:
> This set of patches implements pci pass-through support for qemu/KVM
> on s390. PCI support on s390 is very different from other platforms.
> Major differences are:
>
> 1) all PCI operations are driven by special s390 instructions

Generating config cycles is always arch specific.

> 2) all s390 PCI instructions are privileged

While the operations to generate config cycles on x86 are not
privileged, they must be arbitrated between accesses, so in a sense
they're privileged.

> 3) PCI config and memory spaces can not be mmap'ed

VFIO has mapping flags that allow any region to specify mmap support.

> 4) no classic interrupts (INTX, MSI). The pci hw understands the
>    concept of requesting MSIX irqs but irqs are delivered as s390
>    adapter irqs.

VFIO delivers interrupts as eventfds regardless of the underlying
platform mechanism.

> 5) For DMA access there is always an IOMMU required.

x86 requires the same.

> s390 pci implementation does not support a complete memory to iommu
> mapping, dma mappings are created on request.

Sounds like POWER.

> 6) The OS does not get any information about the physical layout
>    of the PCI bus.

If that means that every device is isolated (seems unlikely for
multifunction devices) then that makes IOMMU group support really easy.

> 7) To take advantage of system z specific virtualization features
>    we need to access the SIE control block residing in the kernel KVM

The KVM-VFIO device allows interaction between VFIO devices and KVM.

> 8) To enable system z specific virtualization features we have to
>    manipulate the zpci device in kernel.

VFIO supports different device backends, currently pci_dev and working
towards platform devices. zpci might just be an extension to standard
pci.

> For these reasons I decided to implement a kernel based approach
> similar to x86 device assignment. There is a new qemu device
> (s390-pci) representing a pass-through device on the host.
> Here is a sample qemu device configuration:
>
>   -device s390-pci,host=:00:00.0
>
> The device executes the KVM_ASSIGN_PCI_DEVICE ioctl to create a proxy
> instance in the kernel KVM and connect this instance to the host pci
> device.
>
> kernel patches apply to linux-kvm
>
>   s390: cio: chsc function to register GIB
>   s390: pci: export pci functions for pass-through usage
>   KVM: s390: Add GISA support
>   KVM: s390: Add PCI pass-through support
>
> qemu patches apply to qemu-master
>
>   s390: Add PCI bus support
>   s390: Add PCI pass-through device support
>
> Feedback and discussion is highly welcome ...

KVM-based device assignment needs to go away. It's a horrible model
for devices, it offers very little protection to the kernel, assumes
every device is fully isolated and visible to the IOMMU, relies on a
smattering of sysfs files to operate, etc. x86, POWER, and ARM are all
moving to VFIO-based device assignment. Why is s390 special enough to
repeat all the mistakes that x86 did?

Thanks,
Alex
[RFC][patch 0/6] pci pass-through support for qemu/KVM on s390
This set of patches implements pci pass-through support for qemu/KVM on
s390. PCI support on s390 is very different from other platforms. Major
differences are:

1) all PCI operations are driven by special s390 instructions

2) all s390 PCI instructions are privileged

3) PCI config and memory spaces can not be mmap'ed

4) no classic interrupts (INTX, MSI). The pci hw understands the
   concept of requesting MSIX irqs but irqs are delivered as s390
   adapter irqs.

5) For DMA access there is always an IOMMU required. The s390 pci
   implementation does not support a complete memory to iommu mapping;
   dma mappings are created on request.

6) The OS does not get any information about the physical layout of
   the PCI bus.

7) To take advantage of system z specific virtualization features we
   need to access the SIE control block residing in the kernel KVM.

8) To enable system z specific virtualization features we have to
   manipulate the zpci device in kernel.

For these reasons I decided to implement a kernel based approach
similar to x86 device assignment. There is a new qemu device (s390-pci)
representing a pass-through device on the host. Here is a sample qemu
device configuration:

  -device s390-pci,host=:00:00.0

The device executes the KVM_ASSIGN_PCI_DEVICE ioctl to create a proxy
instance in the kernel KVM and connect this instance to the host pci
device.

kernel patches apply to linux-kvm

  s390: cio: chsc function to register GIB
  s390: pci: export pci functions for pass-through usage
  KVM: s390: Add GISA support
  KVM: s390: Add PCI pass-through support

qemu patches apply to qemu-master

  s390: Add PCI bus support
  s390: Add PCI pass-through device support

Feedback and discussion is highly welcome ...

Thx!
Frank