Hello,

On a POWER9 sPAPR machine, the Client Architecture Support (CAS)
negotiation process determines whether the guest operates with an
interrupt controller using the XICS legacy model, as found on POWER8,
or in XIVE exploitation mode, the newer POWER9 interrupt model.

XIVE is a complex interrupt controller introducing a large number of
new features, for virtualization in particular.
It is composed of three sub-engines:

 - Interrupt Virtualization Source Engine (IVSE). These are in PHBs,
   in the main controller for the IPIs and in the PSI host bridge.
   They are configured to feed the IVRE with events.

 - Interrupt Virtualization Routing Engine (IVRE). Its job is to
   match an event source with a Notification Virtualization Target
   (NVT), a priority and an Event Queue (EQ) to determine if a
   Virtual Processor can handle the event.

 - Interrupt Virtualization Presentation Engine (IVPE). It maintains
   the interrupt state of each hardware thread and presents the
   notification as an external exception.

Each of the engines uses a set of internal tables to redirect
exceptions from event sources to CPU threads. Interrupt sources have a
2-bit state machine, the Event State Buffer (ESB), that allows events
to be triggered. If the event is let through, the IVRE looks up the
Event Queue Descriptor configured for the source in the Interrupt
Virtualization Entry (IVE) table. Each Event Queue Descriptor defines
a notification path to a CPU and an in-memory queue in which an event
identifier is recorded for the OS to pull.

The high level ideas of the current design are:

 - introduce a persistent XIVE object under the sPAPR machine for
   newer machines and let the CAS negotiation process decide whether
   it should be used or not. Use the 'ov5_cas' attribute for this
   purpose.

 - introduce a persistent XIVE interrupt presenter under the sPAPR
   core and switch ICP after CAS. Each core now has two ICPs, one
   active through the 'intc' pointer and another one among its
   children ready to be used if the guest requires it.

 - move the XIVE EQs under the cores to simplify the XIVE model.

 - allocate the CPU IPIs at the beginning of the IRQ number space to
   be compatible with XICS (which starts at 4096) and also to simplify
   the model. This means that the XIVE model covers the whole IRQ
   number space. There is no offset splitting the IRQ number space,
   as there is in XICS.
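As a rough illustration of the notification path described above (ESB
state machine first, then the in-memory event queue), here is a minimal
standalone sketch. It is not code from the patchset. The state names,
the transition table, the DemoEQ type, the queue size and the use of
the top bit of each entry as a generation bit are all assumptions made
for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the four values of the 2-bit ESB. */
enum {
    ESB_RESET   = 0x0, /* idle, next trigger is forwarded */
    ESB_OFF     = 0x1, /* source masked, triggers are dropped */
    ESB_PENDING = 0x2, /* one event forwarded, waiting for EOI */
    ESB_QUEUED  = 0x3, /* another event arrived while pending */
};

/* Returns true if the event is let through to the routing engine. */
static bool esb_trigger(uint8_t *pq)
{
    switch (*pq) {
    case ESB_RESET:
        *pq = ESB_PENDING;
        return true;
    case ESB_PENDING:
    case ESB_QUEUED:
        *pq = ESB_QUEUED;   /* coalesce, do not notify again */
        return false;
    default:                /* ESB_OFF */
        return false;
    }
}

/* Returns true if a coalesced event must be replayed after the EOI. */
static bool esb_eoi(uint8_t *pq)
{
    switch (*pq) {
    case ESB_RESET:
    case ESB_PENDING:
        *pq = ESB_RESET;
        return false;
    case ESB_QUEUED:
        *pq = ESB_PENDING;
        return true;
    default:                /* ESB_OFF */
        return false;
    }
}

#define DEMO_EQ_ENTRIES 4   /* assumed size, for the example only */

/* Hypothetical in-memory event queue shared with the OS. */
typedef struct DemoEQ {
    uint32_t ring[DEMO_EQ_ENTRIES];
    uint32_t index;         /* next slot to write */
    uint32_t gen;           /* generation bit, flipped on each wrap */
} DemoEQ;

/* Record an event identifier, tagged with the current generation. */
static void eq_push(DemoEQ *eq, uint32_t data)
{
    assert(eq->index < DEMO_EQ_ENTRIES);
    eq->ring[eq->index] = (eq->gen << 31) | (data & 0x7fffffffu);
    eq->index = (eq->index + 1) % DEMO_EQ_ENTRIES;
    if (eq->index == 0) {
        eq->gen ^= 1;       /* wrapped: new entries get the other tag */
    }
}
```

In this sketch, a caller would invoke eq_push() only when esb_trigger()
returns true. The generation tag lets the reader of the queue tell new
entries from stale ones without a shared producer index.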

The patchset first introduces new models for XIVE:

 - sPAPRXive holding the internal tables and the MMIO regions used by
   the XIVE controller.
 - sPAPRXiveNVT object storing the interrupt state of the CPU and
   acting as the XIVE interrupt presenter.

It then describes the notification process and the interrupt delivery
to the CPU. It finishes with the integration of the sPAPRXive object
under the sPAPR machine, the introduction of the new XIVE hcalls, the
device tree layout, and the necessary adjustments to support the CAS
negotiation.

Migration, CPU hotplug, and support for older machines and QEMU
versions are also addressed.

KVM support is not addressed yet and the guest needs to be run with
kernel_irqchip=off on a POWER9 system.

Code is here:

  https://github.com/legoater/qemu/commits/xive

Thanks,

C.

Changes since v1:

 - used g_new0 instead of g_malloc0
 - removed VMSTATE_STRUCT_VARRAY_UINT32_ALLOC
 - introduced a device reset handler. The object needs to be parented
   to sysbus when created.
 - renamed spapr_xive_irq_set() to spapr_xive_irq_enable()
 - renamed spapr_xive_irq_unset() to spapr_xive_irq_disable()
 - moved the PPC_BIT macros under target/ppc/cpu.h
 - shrunk the file copyright header
 - reworked the event notification logic of the qemu_irq handlers.
 - introduced XIVE_ESB_STORE_EOI support
 - removed 'esb_shift' field
 - removed a useless check on the validity of the IVE in the memory
   region handlers.
 - removed the overall ESB memory region. We now have only one region
   for the provisioned sources.
 - improved 'info pic' output
 - improved LSI support
 - renamed 'sPAPRXiveICP' to 'sPAPRXiveNVT'
 - renamed 'tima' field to 'regs'
 - renamed 'tima_os' field to 'ring_os'
 - removed 'tm_shift' field
 - introduced a memory region to model the User TIMA and another one
   for the OS TIMA. One page size for each.
 - removed useless checks in the memory region handlers
 - removed support for 970 ...
 - removed spapr_xive_eq_for_server() which did the EQ indexing.
 - changed spapr_xive_get_eq() to use a server and a priority
   parameter
 - introduced a couple of macros for the EQ indexing.
 - replaced dma_memory_write() by stl_be_dma()
 - set initial TM_PIPR to 0xFF in sPAPRXiveNVT
 - conditioned the creation of the sPAPRXive object to the
   xive_exploitation bool, which is false on older pseries machines.
 - parented the sPAPRXive object to sysbus.
 - simplified priority_is_valid() routine (to its minimum)
 - used PPC_BIT() macros to define the hcall flags
 - removed useless casts
 - defined the default characteristic of the single XIVE interrupt
   source to be: *XIVE_SRC_TRIGGER | XIVE_SRC_STORE_EOI*
 - removed EQ_W0_UCOND_NOTIFY when the EQ is reset
 - fixed XIVE_EQ_DEBUG support. Offset for the generation bit was
   wrong
 - added a unit id to the nodename
 - added properties for the LSIs
 - simplified the array for the "ibm,plat-res-int-priorities" property
 - renamed spapr_xive_populate() to spapr_dt_xive()
 - moved the mapping of the XIVE memory region and the setting of the
   ICP under the machine reset handler.
 - introduced a spapr_xive_qirq() helper
 - introduced a spapr_xive_nvt_create() helper
 - handled more errors in spapr_post_load() to return EINVAL

Cédric Le Goater (19):
  dma-helpers: add a return value to store helpers
  spapr: introduce a skeleton for the XIVE interrupt controller
  spapr: introduce the XIVE interrupt sources
  spapr: add support for the LSI interrupt sources
  spapr: introduce a XIVE interrupt presenter model
  spapr: introduce the XIVE Event Queues
  spapr: push the XIVE EQ data in OS event queue
  spapr: notify the CPU when the XIVE interrupt priority is more
    privileged
  spapr: add support for the SET_OS_PENDING command (XIVE)
  spapr: introduce a 'xive_exploitation' boolean to enable XIVE
  spapr: add a sPAPRXive object to the machine
  spapr: add hcalls support for the XIVE exploitation interrupt mode
  spapr: add device tree support for the XIVE interrupt mode
  spapr: introduce a helper to map the XIVE memory regions
  spapr: add XIVE support to spapr_qirq()
  spapr: introduce a spapr_icp_create() helper
  spapr: toggle the ICP depending on the selected interrupt mode
  spapr: add support to dump XIVE information
  spapr: advertise XIVE exploitation mode in CAS

 default-configs/ppc64-softmmu.mak |    1 +
 hw/intc/Makefile.objs             |    1 +
 hw/intc/spapr_xive.c              | 1013 +++++++++++++++++++++++++++++++++++++
 hw/intc/spapr_xive_hcall.c        |  923 +++++++++++++++++++++++++++++++++
 hw/intc/xive-internal.h           |  196 +++++++
 hw/ppc/spapr.c                    |  188 ++++++-
 hw/ppc/spapr_cpu_core.c           |   37 +-
 hw/ppc/spapr_hcall.c              |    6 +
 include/hw/ppc/spapr.h            |   20 +-
 include/hw/ppc/spapr_cpu_core.h   |    1 +
 include/hw/ppc/spapr_xive.h       |   72 +++
 include/sysemu/dma.h              |    4 +-
 12 files changed, 2449 insertions(+), 13 deletions(-)
 create mode 100644 hw/intc/spapr_xive.c
 create mode 100644 hw/intc/spapr_xive_hcall.c
 create mode 100644 hw/intc/xive-internal.h
 create mode 100644 include/hw/ppc/spapr_xive.h

--
2.13.6