Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390

2014-09-08 Thread Alex Williamson
On Mon, 2014-09-08 at 11:20 +0200, Paolo Bonzini wrote:
> On 06/09/2014 01:19, Alexander Graf wrote:
> >> > 1) interpretive execution of pci load/store instructions. If we use this
> >> >    function, pci access does not get intercepted (no SIE exit) but is
> >> >    handled by microcode. To enable this we have to disable the zpci
> >> >    device and enable it again with information from the SIE control block.
> > Hrm. So how about you create a special vm ioctl for KVM that allows you
> > to attach a VFIO device fd into the KVM VM context? Then the default
> > would stay "accessible by mmap traps", but we could accelerate it with KVM.
> 
> There is already KVM_DEV_VFIO_GROUP_ADD and KVM_DEV_VFIO_GROUP_DEL.
> 
> Right now, they result in a call to kvm_arch_register_noncoherent_dma or
> kvm_arch_unregister_noncoherent_dma, but you can add more hooks.

Eric Auger is also working on a patch series to do IRQ forward control
on ARM via the kvm-vfio pseudo device, extending the interface to
register VFIO device fds.  Sounds like that may be a good path to follow
here too.  Thanks,

Alex




Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390

2014-09-08 Thread Paolo Bonzini
On 06/09/2014 01:19, Alexander Graf wrote:
>> > 1) interpretive execution of pci load/store instructions. If we use this
>> >    function, pci access does not get intercepted (no SIE exit) but is
>> >    handled by microcode. To enable this we have to disable the zpci
>> >    device and enable it again with information from the SIE control block.
> Hrm. So how about you create a special vm ioctl for KVM that allows you
> to attach a VFIO device fd into the KVM VM context? Then the default
> would stay "accessible by mmap traps", but we could accelerate it with KVM.

There is already KVM_DEV_VFIO_GROUP_ADD and KVM_DEV_VFIO_GROUP_DEL.

Right now, they result in a call to kvm_arch_register_noncoherent_dma or
kvm_arch_unregister_noncoherent_dma, but you can add more hooks.
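
For reference, here is roughly how userspace drives that interface (a
minimal sketch against the documented kvm-vfio device API, not code from
this patch set; error handling trimmed):

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    /* Create the kvm-vfio pseudo device on the VM and register a VFIO
     * group fd with it; today this is what ends up calling
     * kvm_arch_register_noncoherent_dma(). */
    static int kvm_vfio_add_group(int vmfd, int group_fd)
    {
            struct kvm_create_device cd = { .type = KVM_DEV_TYPE_VFIO };
            struct kvm_device_attr attr = {
                    .group = KVM_DEV_VFIO_GROUP,
                    .attr  = KVM_DEV_VFIO_GROUP_ADD,
                    .addr  = (__u64)(unsigned long)&group_fd, /* ptr to the fd */
            };

            if (ioctl(vmfd, KVM_CREATE_DEVICE, &cd) < 0)
                    return -1;
            return ioctl(cd.fd, KVM_SET_DEVICE_ATTR, &attr);
    }

An s390-specific hook could be added next to the noncoherent-DMA callbacks
without changing this userspace flow.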

Paolo



Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390

2014-09-05 Thread Alexander Graf


On 05.09.14 13:39, Frank Blaschka wrote:
> On Fri, Sep 05, 2014 at 10:21:27AM +0200, Alexander Graf wrote:
>>
>>
>> On 04.09.14 12:52, frank.blasc...@de.ibm.com wrote:
>>> This set of patches implements pci pass-through support for qemu/KVM on 
>>> s390.
>>> PCI support on s390 is very different from other platforms.
>>> Major differences are:
>>>
>>> 1) all PCI operations are driven by special s390 instructions
>>> 2) all s390 PCI instructions are privileged
>>> 3) PCI config and memory spaces can not be mmap'ed
>>
>> That's ok, vfio abstracts config space anyway.
>>
>>> 4) no classic interrupts (INTX, MSI). The pci hw understands the concept
>>>    of requesting MSIX irqs but irqs are delivered as s390 adapter irqs.
>>
>> This is in line with other implementations. Interrupts go from
>>
>>   device -> PHB -> PIC -> CPU
>>
>> (sometimes you can have another converter device in between)
>>
>> In your case, the PHB converts INTX and MSI interrupts to Adapter
>> interrupts to go to the floating interrupt controller. Same thing as
>> everyone else really.
>>
> 
> Yes, I think this can be done, but we need s390-specific changes in vfio.
> 
>>> 5) For DMA access there is always an IOMMU required. The s390 pci
>>>    implementation does not support a complete memory-to-iommu mapping;
>>>    dma mappings are created on request.
>>
>> Sounds great :). So I suppose we should implement a guest-facing IOMMU?
>>
>>> 6) The OS does not get any information about the physical layout
>>>    of the PCI bus.
>>
>> So how does it know whether different devices are behind the same IOMMU
>> context? Or can we assume that every device has its own context?
> 
> Actually yes

That greatly simplifies things. Awesome :).

> 
>>
>>> 7) To take advantage of system z specific virtualization features
>>>    we need to access the SIE control block residing in the kernel KVM
>>
>> Please elaborate.
>>
>>> 8) To enable system z specific virtualization features we have to
>>>    manipulate the zpci device in the kernel.
>>
>> Why?
>>
> 
> We have the following s390-specific virtualization features:
> 
> 1) interpretive execution of pci load/store instructions. If we use this
>    function, pci access does not get intercepted (no SIE exit) but is
>    handled by microcode. To enable this we have to disable the zpci
>    device and enable it again with information from the SIE control block.

Hrm. So how about you create a special vm ioctl for KVM that allows you
to attach a VFIO device fd into the KVM VM context? Then the default
would stay "accessible by mmap traps", but we could accelerate it with KVM.

>    A further problem in qemu is: vfio traps access to the MSIX table, so
>    we have to find another way of programming msix if we do not get
>    intercepts for memory space access.

We trap access to the MSIX table because it's a shared resource. If it's
not shared for you, there's no need to trap it.

> 2) Adapter event forwarding (with alerting). This is a mechanism by which
>    the adapter event (irq) is directly forwarded to the guest. To set this
>    up we also need to manipulate the zpci device (in the kernel) with
>    information from the SIE block. Exploiting GISA is only one part of
>    this mechanism.

How does this work when the VM is not running (because it's idle)?

Either way, we have a very similar thing on x86. It's called "posted
interrupts" there. I'm not sure everything's in place for VFIO and
posted interrupts to work properly, but whatever we do it sounds like
the interfaces and configuration flow should be identical.

> Both might be possible with some more or less nice looking vfio
> extensions. As I said before, we have to dig into this more. Also, these
> can be further optimization steps later once we have a running vfio
> implementation on the platform.

Yup :). That's the nice part about it.

>  
>>>
>>> For these reasons I decided to implement a kernel-based approach similar
>>> to x86 device assignment. There is a new qemu device (s390-pci)
>>> representing a
>>
>> I fail to see the rationale and I definitely don't want to see anything
>> even remotely similar to the legacy x86 device assignment on s390 ;).
>>
>> Can't we just enhance VFIO?
>>
> 
> Probably yes, but we need some vfio changes (kernel and qemu)

We need changes either way ;). So we'd better do the right ones.

> 
>> Also, I think we'll get the cleanest model if we start off with an
>> implementation that allows us to add emulated PCI devices to an s390x
>> machine and only then follow on with physical ones.
>>
> 
> I can already do this. With some more s390 intercepts a device can be
> detected and the guest is able to access config/memory space.
> Unfortunately the s390 platform does not support I/O bars, so none of the
> emulated devices will work on the platform ...

Oh? How about "nec-usb-xhci" or "intel-hda"?


Alex

Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390

2014-09-05 Thread Alexander Graf


On 05.09.14 13:55, Frank Blaschka wrote:
> On Fri, Sep 05, 2014 at 10:35:59AM +0200, Alexander Graf wrote:
>>
>>
>> On 05.09.14 09:46, Frank Blaschka wrote:
>>> On Thu, Sep 04, 2014 at 07:16:24AM -0600, Alex Williamson wrote:
>>>> On Thu, 2014-09-04 at 12:52 +0200, frank.blasc...@de.ibm.com wrote:
>>>>> This set of patches implements pci pass-through support for qemu/KVM
>>>>> on s390.
>>>>> PCI support on s390 is very different from other platforms.
>>>>> Major differences are:
>>>>>
>>>>> 1) all PCI operations are driven by special s390 instructions

>>>> Generating config cycles is always arch specific.

>>>>> 2) all s390 PCI instructions are privileged

>>>> While the operations to generate config cycles on x86 are not
>>>> privileged, they must be arbitrated between accesses, so in a sense
>>>> they're privileged.

>>>>> 3) PCI config and memory spaces can not be mmap'ed

>>>> VFIO has mapping flags that allow any region to specify mmap support.

>>>
>>> Hi Alex,
>>>
>>> thx for your reply.
>>>
>>> Let me elaborate a little bit more on 1 - 3. Config and memory space
>>> cannot be accessed via memory operations. You have to use special s390
>>> instructions. These instructions cannot be executed in user space, so
>>> there is no other way than executing them in the kernel. Yes, vfio does
>>> support a slow path via ioctl we could use, but this seems suboptimal
>>> from a performance point of view.
>>
>> Ah, I missed the "memory spaces" part ;). I agree that it's "suboptimal"
>> to call into the kernel for every PCI access, but I still think that
>> VFIO provides the correct abstraction layer for us to use. If nothing
>> else, it would at least give us identical configuration to x86 and nice
>> debuggability on par with the other platforms.
>>
>>>  
>>>>> 4) no classic interrupts (INTX, MSI). The pci hw understands the concept
>>>>>    of requesting MSIX irqs but irqs are delivered as s390 adapter irqs.

>>>> VFIO delivers interrupts as eventfds regardless of the underlying
>>>> platform mechanism.

>>>
>>> yes that's right, but then we have to do platform-specific stuff to
>>> present the irq to the guest. I do not say this is impossible, but we
>>> would have to add s390-specific code to vfio.
>>
>> Not at all - interrupt delivery is completely transparent to VFIO.
>>
> 
> interrupt yes, but MSIX no
>  
>>>
>>>>> 5) For DMA access there is always an IOMMU required.

>>>> x86 requires the same.

>>>>>    The s390 pci implementation does not support a complete
>>>>>    memory-to-iommu mapping; dma mappings are created on request.

>>>> Sounds like POWER.
>>>
>>> I don't know the details of POWER; maybe it is similar but not the same.
>>> We might be able to extend vfio to have a new interface allowing
>>> us to do DMA mappings on request.
>>
>> We already have that.
>>
> 
> Great, can you give me some pointers on how to use it? Thx!

Sure! :)

So on POWER (sPAPR) you get a list of page entries that describe the
device -> ram mapping. Every time you want to modify any of these
entries, you need to invoke a hypercall (H_PUT_TCE).

So every time the guest wants to add a DMA window at runtime, we trap into
put_tce_emu() in hw/ppc/spapr_iommu.c. Here we call
memory_region_notify_iommu().

This call goes either to an emulated IOMMU context for emulated devices
or to the special VFIO IOMMU context for VFIO devices.

In the VFIO case, we end up in vfio_iommu_map_notify() at hw/misc/vfio.c
which calls ioctl(VFIO_IOMMU_MAP_DMA) at the end of the day. The
in-kernel implementation of the host IOMMU provider uses this map to
create the virtual DMA window map.

Basically, VFIO *only* supports "DMA mappings on request" as you call
them. Prepopulated DMA windows are just a coincidence that may or may
not happen.
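
For completeness, the userspace end of that path is a single ioctl on the
container (a rough sketch against the type1 flavor of the API; the sPAPR
backend accepts the same call, error handling omitted):

    #include <linux/vfio.h>
    #include <sys/ioctl.h>

    /* Map one 4k page for device DMA: iova is the address the device
     * will use, vaddr the process virtual address backing it. */
    static int map_one_page(int container_fd, __u64 iova, __u64 vaddr)
    {
            struct vfio_iommu_type1_dma_map map = {
                    .argsz = sizeof(map),
                    .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
                    .vaddr = vaddr,
                    .iova  = iova,
                    .size  = 4096,
            };
            return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
    }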

I hope that makes it slightly more clear what the path looks like :). If
you have more questions on this, don't hesitate to ask.


Alex


Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390

2014-09-05 Thread Frank Blaschka
On Fri, Sep 05, 2014 at 10:35:59AM +0200, Alexander Graf wrote:
> 
> 
> On 05.09.14 09:46, Frank Blaschka wrote:
> > On Thu, Sep 04, 2014 at 07:16:24AM -0600, Alex Williamson wrote:
> >> On Thu, 2014-09-04 at 12:52 +0200, frank.blasc...@de.ibm.com wrote:
> >>> This set of patches implements pci pass-through support for qemu/KVM on 
> >>> s390.
> >>> PCI support on s390 is very different from other platforms.
> >>> Major differences are:
> >>>
> >>> 1) all PCI operations are driven by special s390 instructions
> >>
> >> Generating config cycles is always arch specific.
> >>
> >>> 2) all s390 PCI instructions are privileged
> >>
> >> While the operations to generate config cycles on x86 are not
> >> privileged, they must be arbitrated between accesses, so in a sense
> >> they're privileged.
> >>
> >>> 3) PCI config and memory spaces can not be mmap'ed
> >>
> >> VFIO has mapping flags that allow any region to specify mmap support.
> >>
> > 
> > Hi Alex,
> > 
> > thx for your reply.
> > 
> > Let me elaborate a little bit more on 1 - 3. Config and memory space
> > cannot be accessed via memory operations. You have to use special s390
> > instructions. These instructions cannot be executed in user space, so
> > there is no other way than executing them in the kernel. Yes, vfio does
> > support a slow path via ioctl we could use, but this seems suboptimal
> > from a performance point of view.
> 
> Ah, I missed the "memory spaces" part ;). I agree that it's "suboptimal"
> to call into the kernel for every PCI access, but I still think that
> VFIO provides the correct abstraction layer for us to use. If nothing
> else, it would at least give us identical configuration to x86 and nice
> debuggability on par with the other platforms.
> 
> >  
> >>> 4) no classic interrupts (INTX, MSI). The pci hw understands the concept
> >>>    of requesting MSIX irqs but irqs are delivered as s390 adapter irqs.
> >>
> >> VFIO delivers interrupts as eventfds regardless of the underlying
> >> platform mechanism.
> >>
> > 
> > yes that's right, but then we have to do platform-specific stuff to
> > present the irq to the guest. I do not say this is impossible, but we
> > would have to add s390-specific code to vfio.
> 
> Not at all - interrupt delivery is completely transparent to VFIO.
>

interrupt yes, but MSIX no
 
> > 
> >>> 5) For DMA access there is always an IOMMU required.
> >>
> >> x86 requires the same.
> >>
> >>>  s390 pci implementation does not support a complete memory-to-iommu
> >>>  mapping; dma mappings are created on request.
> >>
> >> Sounds like POWER.
> > 
> > I don't know the details of POWER; maybe it is similar but not the same.
> > We might be able to extend vfio to have a new interface allowing
> > us to do DMA mappings on request.
> 
> We already have that.
>

Great, can you give me some pointers on how to use it? Thx!
 
> > 
> >>
> >>> 6) The OS does not get any information about the physical layout
> >>>    of the PCI bus.
> >>
> >> If that means that every device is isolated (seems unlikely for
> >> multifunction devices) then that makes IOMMU group support really easy.
> >>
> > 
> > OK
> >  
> >>> 7) To take advantage of system z specific virtualization features
> >>>    we need to access the SIE control block residing in the kernel KVM
> >>
> >> The KVM-VFIO device allows interaction between VFIO devices and KVM.
> >>
> >>> 8) To enable system z specific virtualization features we have to
> >>>    manipulate the zpci device in the kernel.
> >>
> >> VFIO supports different device backends, currently pci_dev and working
> >> towards platform devices.  zpci might just be an extension to standard
> >> pci.
> >>
> > 
> > 7 - 8 At least this is not as straightforward as the pure kernel
> > approach, but I will have to dig into that in more detail once we agree
> > on a vfio solution.
> 
> Please do so, yes :).
> 
> > 
> >>> For these reasons I decided to implement a kernel-based approach similar
> >>> to x86 device assignment. There is a new qemu device (s390-pci)
> >>> representing a pass-through device on the host. Here is a sample qemu
> >>> device configuration:
> >>>
> >>> -device s390-pci,host=:00:00.0
> >>>
> >>> The device executes the KVM_ASSIGN_PCI_DEVICE ioctl to create a proxy 
> >>> instance
> >>> in the kernel KVM and connect this instance to the host pci device.
> >>>
> >>> kernel patches apply to linux-kvm
> >>>
> >>> s390: cio: chsc function to register GIB
> >>> s390: pci: export pci functions for pass-through usage
> >>> KVM: s390: Add GISA support
> >>> KVM: s390: Add PCI pass-through support
> >>>
> >>> qemu patches apply to qemu-master
> >>>
> >>> s390: Add PCI bus support
> >>> s390: Add PCI pass-through device support
> >>>
> >>> Feedback and discussion is highly welcome ...
> >>
> >> KVM-based device assignment needs to go away.  It's a horrible model for
> >> devices, it offers very little protection to the kernel, assumes every
> >> device is fully isolated and visible to the IOMMU, relies on a smattering
> >> of sysfs files to operate, etc.  x86, POWER, and ARM are all moving to
> >> VFIO-based device assignment.  Why is s390 special enough to repeat all
> >> the mistakes that x86 did?  Thanks,

Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390

2014-09-05 Thread Frank Blaschka
On Fri, Sep 05, 2014 at 10:21:27AM +0200, Alexander Graf wrote:
> 
> 
> On 04.09.14 12:52, frank.blasc...@de.ibm.com wrote:
> > This set of patches implements pci pass-through support for qemu/KVM on 
> > s390.
> > PCI support on s390 is very different from other platforms.
> > Major differences are:
> > 
> > 1) all PCI operations are driven by special s390 instructions
> > 2) all s390 PCI instructions are privileged
> > 3) PCI config and memory spaces can not be mmap'ed
> 
> That's ok, vfio abstracts config space anyway.
> 
> > 4) no classic interrupts (INTX, MSI). The pci hw understands the concept
> >    of requesting MSIX irqs but irqs are delivered as s390 adapter irqs.
> 
> This is in line with other implementations. Interrupts go from
> 
>   device -> PHB -> PIC -> CPU
> 
> (sometimes you can have another converter device in between)
> 
> In your case, the PHB converts INTX and MSI interrupts to Adapter
> interrupts to go to the floating interrupt controller. Same thing as
> everyone else really.
> 

Yes, I think this can be done, but we need s390-specific changes in vfio.

> > 5) For DMA access there is always an IOMMU required. The s390 pci
> >    implementation does not support a complete memory-to-iommu mapping;
> >    dma mappings are created on request.
> 
> Sounds great :). So I suppose we should implement a guest-facing IOMMU?
> 
> > 6) The OS does not get any information about the physical layout
> >    of the PCI bus.
> 
> So how does it know whether different devices are behind the same IOMMU
> context? Or can we assume that every device has its own context?

Actually yes

> 
> > 7) To take advantage of system z specific virtualization features
> >    we need to access the SIE control block residing in the kernel KVM
> 
> Please elaborate.
> 
> > 8) To enable system z specific virtualization features we have to
> >    manipulate the zpci device in the kernel.
> 
> Why?
>

We have the following s390-specific virtualization features:

1) interpretive execution of pci load/store instructions. If we use this
   function, pci access does not get intercepted (no SIE exit) but is
   handled by microcode. To enable this we have to disable the zpci device
   and enable it again with information from the SIE control block. A
   further problem in qemu is: vfio traps access to the MSIX table, so we
   have to find another way of programming msix if we do not get
   intercepts for memory space access.

2) Adapter event forwarding (with alerting). This is a mechanism by which
   the adapter event (irq) is directly forwarded to the guest. To set this
   up we also need to manipulate the zpci device (in the kernel) with
   information from the SIE block. Exploiting GISA is only one part of this
   mechanism.

Both might be possible with some more or less nice looking vfio extensions.
As I said before, we have to dig into this more. Also, these can be further
optimization steps later once we have a running vfio implementation on the
platform.
 
> > 
> > For these reasons I decided to implement a kernel-based approach similar
> > to x86 device assignment. There is a new qemu device (s390-pci)
> > representing a
> 
> I fail to see the rationale and I definitely don't want to see anything
> even remotely similar to the legacy x86 device assignment on s390 ;).
> 
> Can't we just enhance VFIO?
> 

Probably yes, but we need some vfio changes (kernel and qemu)

> Also, I think we'll get the cleanest model if we start off with an
> implementation that allows us to add emulated PCI devices to an s390x
> machine and only then follow on with physical ones.
> 

I can already do this. With some more s390 intercepts a device can be
detected and the guest is able to access config/memory space. Unfortunately
the s390 platform does not support I/O bars, so none of the emulated devices
will work on the platform ...

> 
> Alex


Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390

2014-09-05 Thread Alexander Graf


On 05.09.14 09:46, Frank Blaschka wrote:
> On Thu, Sep 04, 2014 at 07:16:24AM -0600, Alex Williamson wrote:
>> On Thu, 2014-09-04 at 12:52 +0200, frank.blasc...@de.ibm.com wrote:
>>> This set of patches implements pci pass-through support for qemu/KVM on 
>>> s390.
>>> PCI support on s390 is very different from other platforms.
>>> Major differences are:
>>>
>>> 1) all PCI operations are driven by special s390 instructions
>>
>> Generating config cycles is always arch specific.
>>
>>> 2) all s390 PCI instructions are privileged
>>
>> While the operations to generate config cycles on x86 are not
>> privileged, they must be arbitrated between accesses, so in a sense
>> they're privileged.
>>
>>> 3) PCI config and memory spaces can not be mmap'ed
>>
>> VFIO has mapping flags that allow any region to specify mmap support.
>>
> 
> Hi Alex,
> 
> thx for your reply.
> 
> Let me elaborate a little bit more on 1 - 3. Config and memory space cannot
> be accessed via memory operations. You have to use special s390 instructions.
> These instructions cannot be executed in user space, so there is no other
> way than executing them in the kernel. Yes, vfio does support a slow path
> via ioctl we could use, but this seems suboptimal from a performance point
> of view.

Ah, I missed the "memory spaces" part ;). I agree that it's "suboptimal"
to call into the kernel for every PCI access, but I still think that
VFIO provides the correct abstraction layer for us to use. If nothing
else, it would at least give us identical configuration to x86 and nice
debuggability on par with the other platforms.

>  
>>> 4) no classic interrupts (INTX, MSI). The pci hw understands the concept
>>>    of requesting MSIX irqs but irqs are delivered as s390 adapter irqs.
>>
>> VFIO delivers interrupts as eventfds regardless of the underlying
>> platform mechanism.
>>
> 
> yes that's right, but then we have to do platform-specific stuff to present
> the irq to the guest. I do not say this is impossible, but we would have to
> add s390-specific code to vfio.

Not at all - interrupt delivery is completely transparent to VFIO.

> 
>>> 5) For DMA access there is always an IOMMU required.
>>
>> x86 requires the same.
>>
>>>  s390 pci implementation does not support a complete memory-to-iommu
>>>  mapping; dma mappings are created on request.
>>
>> Sounds like POWER.
> 
> I don't know the details of POWER; maybe it is similar but not the same.
> We might be able to extend vfio to have a new interface allowing
> us to do DMA mappings on request.

We already have that.

> 
>>
>>> 6) The OS does not get any information about the physical layout
>>>    of the PCI bus.
>>
>> If that means that every device is isolated (seems unlikely for
>> multifunction devices) then that makes IOMMU group support really easy.
>>
> 
> OK
>  
>>> 7) To take advantage of system z specific virtualization features
>>>    we need to access the SIE control block residing in the kernel KVM
>>
>> The KVM-VFIO device allows interaction between VFIO devices and KVM.
>>
>>> 8) To enable system z specific virtualization features we have to
>>>    manipulate the zpci device in the kernel.
>>
>> VFIO supports different device backends, currently pci_dev and working
>> towards platform devices.  zpci might just be an extension to standard
>> pci.
>>
> 
> 7 - 8 At least this is not as straightforward as the pure kernel approach,
> but I will have to dig into that in more detail once we agree on a vfio
> solution.

Please do so, yes :).

> 
>>> For these reasons I decided to implement a kernel-based approach similar
>>> to x86 device assignment. There is a new qemu device (s390-pci)
>>> representing a pass-through device on the host. Here is a sample qemu
>>> device configuration:
>>>
>>> -device s390-pci,host=:00:00.0
>>>
>>> The device executes the KVM_ASSIGN_PCI_DEVICE ioctl to create a proxy 
>>> instance
>>> in the kernel KVM and connect this instance to the host pci device.
>>>
>>> kernel patches apply to linux-kvm
>>>
>>> s390: cio: chsc function to register GIB
>>> s390: pci: export pci functions for pass-through usage
>>> KVM: s390: Add GISA support
>>> KVM: s390: Add PCI pass-through support
>>>
>>> qemu patches apply to qemu-master
>>>
>>> s390: Add PCI bus support
>>> s390: Add PCI pass-through device support
>>>
>>> Feedback and discussion is highly welcome ...
>>
>> KVM-based device assignment needs to go away.  It's a horrible model for
>> devices, it offers very little protection to the kernel, assumes every
>> device is fully isolated and visible to the IOMMU, relies on a smattering
>> of sysfs files to operate, etc.  x86, POWER, and ARM are all moving to
>> VFIO-based device assignment.  Why is s390 special enough to repeat all
>> the mistakes that x86 did?  Thanks,
>>
> 
> Is this your personal opinion or was this a strategic decision of the
> QEMU/KVM community? Can anybody give us direction about this?
> 
> Actually I can understand your point. In the last weeks I did some
> development and testing regarding the use of vfio too. But the in-kernel
> solution seems to offer the best performance and most straightforward
> implementation for our platform.

Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390

2014-09-05 Thread Alexander Graf


On 04.09.14 12:52, frank.blasc...@de.ibm.com wrote:
> This set of patches implements pci pass-through support for qemu/KVM on s390.
> PCI support on s390 is very different from other platforms.
> Major differences are:
> 
> 1) all PCI operations are driven by special s390 instructions
> 2) all s390 PCI instructions are privileged
> 3) PCI config and memory spaces can not be mmap'ed

That's ok, vfio abstracts config space anyway.

> 4) no classic interrupts (INTX, MSI). The pci hw understands the concept
>    of requesting MSIX irqs but irqs are delivered as s390 adapter irqs.

This is in line with other implementations. Interrupts go from

  device -> PHB -> PIC -> CPU

(sometimes you can have another converter device in between)

In your case, the PHB converts INTX and MSI interrupts to Adapter
interrupts to go to the floating interrupt controller. Same thing as
everyone else really.

> 5) For DMA access there is always an IOMMU required. The s390 pci
>    implementation does not support a complete memory-to-iommu mapping;
>    dma mappings are created on request.

Sounds great :). So I suppose we should implement a guest-facing IOMMU?

> 6) The OS does not get any information about the physical layout
>    of the PCI bus.

So how does it know whether different devices are behind the same IOMMU
context? Or can we assume that every device has its own context?

> 7) To take advantage of system z specific virtualization features
>    we need to access the SIE control block residing in the kernel KVM

Please elaborate.

> 8) To enable system z specific virtualization features we have to
>    manipulate the zpci device in the kernel.

Why?

> 
> For these reasons I decided to implement a kernel-based approach similar
> to x86 device assignment. There is a new qemu device (s390-pci) representing a

I fail to see the rationale and I definitely don't want to see anything
even remotely similar to the legacy x86 device assignment on s390 ;).

Can't we just enhance VFIO?

Also, I think we'll get the cleanest model if we start off with an
implementation that allows us to add emulated PCI devices to an s390x
machine and only then follow on with physical ones.


Alex


Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390

2014-09-05 Thread Frank Blaschka
On Thu, Sep 04, 2014 at 07:16:24AM -0600, Alex Williamson wrote:
> On Thu, 2014-09-04 at 12:52 +0200, frank.blasc...@de.ibm.com wrote:
> > This set of patches implements pci pass-through support for qemu/KVM on 
> > s390.
> > PCI support on s390 is very different from other platforms.
> > Major differences are:
> > 
> > 1) all PCI operations are driven by special s390 instructions
> 
> Generating config cycles is always arch specific.
> 
> > 2) all s390 PCI instructions are privileged
> 
> While the operations to generate config cycles on x86 are not
> privileged, they must be arbitrated between accesses, so in a sense
> they're privileged.
> 
> > 3) PCI config and memory spaces can not be mmap'ed
> 
> VFIO has mapping flags that allow any region to specify mmap support.
>

Hi Alex,

thx for your reply.

Let me elaborate a little bit more on 1 - 3. Config and memory space cannot
be accessed via memory operations. You have to use special s390 instructions.
These instructions cannot be executed in user space, so there is no other
way than executing them in the kernel. Yes, vfio does support a slow path
via ioctl we could use, but this seems suboptimal from a performance point
of view.
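
Just to make concrete what I mean by the slow path: every access goes
through read()/write() on the vfio device fd rather than mmap (a
hypothetical helper, error checks trimmed):

    #include <linux/vfio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    /* Read config space through the device fd instead of mmap, using the
     * file offset reported for the config region; each call traps into
     * the kernel vfio driver, which would issue the s390 instructions. */
    static ssize_t read_config(int device_fd, __u64 off, void *buf, size_t len)
    {
            struct vfio_region_info reg = {
                    .argsz = sizeof(reg),
                    .index = VFIO_PCI_CONFIG_REGION_INDEX,
            };

            if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, &reg) < 0)
                    return -1;
            return pread(device_fd, buf, len, reg.offset + off);
    }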
 
> > 4) no classic interrupts (INTX, MSI). The pci hw understands the concept
> >of requesting MSIX irqs but irqs are delivered as s390 adapter irqs.
> 
> VFIO delivers interrupts as eventfds regardless of the underlying
> platform mechanism.
> 

yes that's right, but then we have to do platform-specific stuff to present
the irq to the guest. I do not say this is impossible, but we would have to
add s390-specific code to vfio.

> > 5) For DMA access there is always an IOMMU required.
> 
> x86 requires the same.
> 
> >  s390 pci implementation does not support a complete memory-to-iommu
> >  mapping; dma mappings are created on request.
> 
> Sounds like POWER.

I don't know the details of POWER; maybe it is similar but not the same.
We might be able to extend vfio to have a new interface allowing
us to do DMA mappings on request.

> 
> > 6) The OS does not get any information about the physical layout
> >    of the PCI bus.
> 
> If that means that every device is isolated (seems unlikely for
> multifunction devices) then that makes IOMMU group support really easy.
>

OK
 
> > 7) To take advantage of system z specific virtualization features
> >    we need to access the SIE control block residing in the kernel KVM
> 
> The KVM-VFIO device allows interaction between VFIO devices and KVM.
> 
> > 8) To enable system z specific virtualization features we have to
> >    manipulate the zpci device in the kernel.
> 
> VFIO supports different device backends, currently pci_dev and working
> towards platform devices.  zpci might just be an extension to standard
> pci.
> 

7 - 8 At least this is not as straightforward as the pure kernel approach,
but I will have to dig into that in more detail once we agree on a vfio
solution.

> > For these reasons I decided to implement a kernel-based approach similar
> > to x86 device assignment. There is a new qemu device (s390-pci)
> > representing a pass-through device on the host. Here is a sample qemu
> > device configuration:
> > 
> > -device s390-pci,host=:00:00.0
> > 
> > The device executes the KVM_ASSIGN_PCI_DEVICE ioctl to create a proxy 
> > instance
> > in the kernel KVM and connect this instance to the host pci device.
> > 
> > kernel patches apply to linux-kvm
> > 
> > s390: cio: chsc function to register GIB
> > s390: pci: export pci functions for pass-through usage
> > KVM: s390: Add GISA support
> > KVM: s390: Add PCI pass-through support
> > 
> > qemu patches apply to qemu-master
> > 
> > s390: Add PCI bus support
> > s390: Add PCI pass-through device support
> > 
> > Feedback and discussion is highly welcome ...
> 
> KVM-based device assignment needs to go away.  It's a horrible model for
> devices, it offers very little protection to the kernel, assumes every
> device is fully isolated and visible to the IOMMU, relies on a smattering
> of sysfs files to operate, etc.  x86, POWER, and ARM are all moving to
> VFIO-based device assignment.  Why is s390 special enough to repeat all
> the mistakes that x86 did?  Thanks,
> 

Is this your personal opinion or was this a strategic decision of the
QEMU/KVM community? Can anybody give us direction about this?

Actually I can understand your point. In the last weeks I did some development
and testing regarding the use of vfio too. But the in-kernel solution seems to
offer the best performance and most straightforward implementation for our
platform.

Greetings,

Frank

> Alex
> 

Re: [RFC][patch 0/6] pci pass-through support for qemu/KVM on s390

2014-09-04 Thread Alex Williamson
On Thu, 2014-09-04 at 12:52 +0200, frank.blasc...@de.ibm.com wrote:
> This set of patches implements pci pass-through support for qemu/KVM on s390.
> PCI support on s390 is very different from other platforms.
> Major differences are:
> 
> 1) all PCI operations are driven by special s390 instructions

Generating config cycles is always arch specific.

> 2) all s390 PCI instructions are privileged

While the operations to generate config cycles on x86 are not
privileged, they must be arbitrated between accesses, so in a sense
they're privileged.

> 3) PCI config and memory spaces can not be mmap'ed

VFIO has mapping flags that allow any region to specify mmap support.
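
Concretely, userspace probes each region and only mmaps it when the flag
is set (a sketch, not code from any patch set; error handling trimmed):

    #include <stddef.h>
    #include <linux/vfio.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>

    /* Return a direct mapping of BAR0, or NULL when the region does not
     * advertise mmap (the s390 case above) and the caller must fall back
     * to read()/write() at the region offset. */
    static void *map_bar0(int device_fd)
    {
            struct vfio_region_info reg = {
                    .argsz = sizeof(reg),
                    .index = VFIO_PCI_BAR0_REGION_INDEX,
            };
            void *p;

            if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, &reg) < 0)
                    return NULL;
            if (!(reg.flags & VFIO_REGION_INFO_FLAG_MMAP))
                    return NULL;
            p = mmap(NULL, reg.size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, device_fd, reg.offset);
            return p == MAP_FAILED ? NULL : p;
    }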

> 4) no classic interrupts (INTX, MSI). The pci hw understands the concept
>    of requesting MSIX irqs but irqs are delivered as s390 adapter irqs.

VFIO delivers interrupts as eventfds regardless of the underlying
platform mechanism.
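
In other words, the consumer side is uniform. A sketch of wiring one
vector of an irq index to an eventfd (hypothetical helper, error handling
trimmed):

    #include <stdlib.h>
    #include <string.h>
    #include <linux/vfio.h>
    #include <sys/eventfd.h>
    #include <sys/ioctl.h>

    /* Wire vector 0 of the given irq index (e.g. VFIO_PCI_MSIX_IRQ_INDEX)
     * to a fresh eventfd; the caller can poll the fd itself or hand it to
     * KVM as an irqfd. */
    static int wire_irq(int device_fd, unsigned int index)
    {
            int efd = eventfd(0, 0);
            struct vfio_irq_set *set = malloc(sizeof(*set) + sizeof(int));
            int ret;

            set->argsz = sizeof(*set) + sizeof(int);
            set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
            set->index = index;
            set->start = 0;
            set->count = 1;
            memcpy(set->data, &efd, sizeof(int));

            ret = ioctl(device_fd, VFIO_DEVICE_SET_IRQS, set);
            free(set);
            return ret < 0 ? -1 : efd;
    }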

> 5) For DMA access there is always an IOMMU required.

x86 requires the same.

>  s390 pci implementation does not support a complete memory-to-iommu
>  mapping; dma mappings are created on request.

Sounds like POWER.

> 6) The OS does not get any information about the physical layout
>    of the PCI bus.

If that means that every device is isolated (seems unlikely for
multifunction devices) then that makes IOMMU group support really easy.

> 7) To take advantage of system z specific virtualization features
>    we need to access the SIE control block residing in the kernel KVM

The KVM-VFIO device allows interaction between VFIO devices and KVM.

> 8) To enable system z specific virtualization features we have to
>    manipulate the zpci device in the kernel.

VFIO supports different device backends, currently pci_dev and working
towards platform devices.  zpci might just be an extension to standard
pci.

> For these reasons I decided to implement a kernel-based approach similar
> to x86 device assignment. There is a new qemu device (s390-pci) representing a
> pass-through device on the host. Here is a sample qemu device configuration:
> 
> -device s390-pci,host=:00:00.0
> 
> The device executes the KVM_ASSIGN_PCI_DEVICE ioctl to create a proxy instance
> in the kernel KVM and connect this instance to the host pci device.
> 
> kernel patches apply to linux-kvm
> 
> s390: cio: chsc function to register GIB
> s390: pci: export pci functions for pass-through usage
> KVM: s390: Add GISA support
> KVM: s390: Add PCI pass-through support
> 
> qemu patches apply to qemu-master
> 
> s390: Add PCI bus support
> s390: Add PCI pass-through device support
> 
> Feedback and discussion is highly welcome ...

KVM-based device assignment needs to go away.  It's a horrible model for
devices, it offers very little protection to the kernel, assumes every
device is fully isolated and visible to the IOMMU, relies on a smattering
of sysfs files to operate, etc.  x86, POWER, and ARM are all moving to
VFIO-based device assignment.  Why is s390 special enough to repeat all
the mistakes that x86 did?  Thanks,

Alex



[RFC][patch 0/6] pci pass-through support for qemu/KVM on s390

2014-09-04 Thread frank . blaschka
This set of patches implements pci pass-through support for qemu/KVM on s390.
PCI support on s390 is very different from other platforms.
Major differences are:

1) all PCI operations are driven by special s390 instructions
2) all s390 PCI instructions are privileged
3) PCI config and memory spaces can not be mmap'ed
4) no classic interrupts (INTX, MSI). The pci hw understands the concept
   of requesting MSIX irqs but irqs are delivered as s390 adapter irqs.
5) For DMA access there is always an IOMMU required. The s390 pci
   implementation does not support a complete memory-to-iommu mapping;
   dma mappings are created on request.
6) The OS does not get any information about the physical layout
   of the PCI bus.
7) To take advantage of system z specific virtualization features
   we need to access the SIE control block residing in the kernel KVM
8) To enable system z specific virtualization features we have to manipulate
   the zpci device in the kernel.

For these reasons I decided to implement a kernel-based approach similar
to x86 device assignment. There is a new qemu device (s390-pci) representing a
pass-through device on the host. Here is a sample qemu device configuration:

-device s390-pci,host=:00:00.0

The device executes the KVM_ASSIGN_PCI_DEVICE ioctl to create a proxy instance
in the kernel KVM and connect this instance to the host pci device.
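
For a rough idea of what that call looks like, here is a sketch assuming
the legacy x86-style assignment ABI from include/uapi/linux/kvm.h (the
actual proxy setup in these patches may carry additional s390 state):

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    /* Create the in-kernel proxy for host device seg:bus:dev.fn and bind
     * it to the VM; issued on the KVM VM fd. */
    static int assign_host_device(int vmfd, int seg, int bus, int dev, int fn)
    {
            struct kvm_assigned_pci_dev adev = {
                    .assigned_dev_id = (bus << 8) | (dev << 3) | fn,
                    .segnr = seg,
                    .busnr = bus,
                    .devfn = (dev << 3) | fn,
                    .flags = KVM_DEV_ASSIGN_ENABLE_IOMMU,
            };
            return ioctl(vmfd, KVM_ASSIGN_PCI_DEVICE, &adev);
    }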

kernel patches apply to linux-kvm

s390: cio: chsc function to register GIB
s390: pci: export pci functions for pass-through usage
KVM: s390: Add GISA support
KVM: s390: Add PCI pass-through support

qemu patches apply to qemu-master

s390: Add PCI bus support
s390: Add PCI pass-through device support

Feedback and discussion is highly welcome ...
Thx!

Frank
