On 12/21/2017 10:16 AM, Cédric Le Goater wrote:
> On 12/21/2017 01:12 AM, Benjamin Herrenschmidt wrote:
>> On Wed, 2017-12-20 at 16:09 +1100, David Gibson wrote:
>>>
>>> As you've suggested yourself, I think we might need to more
>>> explicitly model the different components of the XIVE system. As part
>>> of that, I think you need to be clearer in this base skeleton about
>>> exactly what component your XIVE object represents.
>>>
>>> If the answer is "the overall thing" I suspect that's not what you
>>> want - I had one of those for XICs which proved to be a mistake
>>> (eventually replaced by the XICSFabric interface).
>>>
>>> Changing the model later isn't impossible, but doing so without
>>> breaking migration can be a real pain, so I think it's worth a
>>> reasonable effort to try and get it right initially.
>>
>> Note: we do need to speed things up a bit, as having exploitation mode
>> in KVM will significantly help with IPI performance among other things.
>>
>> I'm about ready to do the KVM bits. The one thing we need to discuss
>> and figure out a good design for is how we map all those interrupt
>> control pages into qemu.
>>
>> Each interrupt (either PCIe pass-through or the "generic XIVE IPIs"
>> which are used for guest IPIs and for vio/virtio/emulated interrupts)
>> comes with a "control page" (ESB page) which needs to be mapped into
>> the guest, and the generic IPIs also come with a trigger page which
>> needs to be mapped into the guest for guest IPIs or OpenCAPI
>> interrupts, or just qemu for emulated devices.
>
> What about the OS TIMA page? Do we trap the accesses in QEMU and
> forward them to KVM, or do we use a similar mechanism?
>
>> Now that can be thousands of these critters. I certainly don't want to
>> create thousands of VMAs in qemu and even less thousands of memory
>> regions in KVM.
>
> We could provision one mapping per kvmppc_xive_src_block, maybe?
>
>> So we need some kind of mechanism by which a single large VMA gets
>> mmap'ed into qemu (or maybe a couple of these, but not too many) and
>> the interrupt pages can be assigned to slots in there and demand
>> faulted.
>
> Frederic has started to put in place a similar mechanism for OpenCAPI.
>
>> For the generic interrupts, this can probably be covered by KVM, adding
>> some arch ioctls for allocating IPIs and mmap'ing that region etc...
>
> The KVM device has an ioctl handler:
>
> struct kvm_device_ops {
>
>         long (*ioctl)(struct kvm_device *dev, unsigned int ioctl,
>                       unsigned long arg);
> };
>
> So a KVM device for the XIVE interrupt controller can implement a couple
> of extra calls for its needs, like getting the VMA addresses, etc
or use the set_attr/get_attr handlers.

I wonder if it would be possible to add an 'mmap' op to kvm_device_fops
for the KVM_DEV_TYPE_XIVE device.

C.