On 06/21/2013 12:34 PM, Alex Williamson wrote:
> On Fri, 2013-06-21 at 11:56 +1000, Alexey Kardashevskiy wrote:
>> On 06/21/2013 02:51 AM, Alex Williamson wrote:
>>> On Fri, 2013-06-21 at 00:08 +1000, Alexey Kardashevskiy wrote:
>>>> At the moment QEMU creates a route for every MSI IRQ.
>>>>
>>>> Now we are about to add IRQFD support on PPC64-pseries platform.
>>>> pSeries already has in-kernel emulated interrupt controller with
>>>> 8192 IRQs. Also, pSeries PHB already supports MSIMessage to IRQ
>>>> mapping as a part of PAPR requirements for MSI/MSIX guests.
>>>> Specifically, the pSeries guest does not touch MSIMessage's at
>>>> all, instead it uses rtas_ibm_change_msi and 
>>>> rtas_ibm_query_interrupt_source
>>>> rtas calls to do the mapping.
>>>>
>>>> Therefore we do not really need more routing than we got already.
>>>> The patch introduces the infrastructure to enable direct IRQ mapping.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru>
>>>>
>>>> ---
>>>>
>>>> The patch is raw and ugly indeed, I made it only to demonstrate
>>>> the idea and see if it has right to live or not.
>>>>
>>>> For some reason which I do not really understand (limited GSI numbers?)
>>>> the existing code always adds routing and I do not see why we would need 
>>>> it.
>>>
>>> It's an IOAPIC, a pin gets toggled from the device and an MSI message
>>> gets written to the CPU.  So the route allocates and programs the
>>> pin->MSI, then we tell it what notifier triggers that pin.
>>
>>> On x86 the MSI vector doesn't encode any information about the device
>>> sending the MSI, here you seem to be able to figure out the device and
>>> vector space number from the address.  Then your pin to MSI is
>>> effectively fixed.  So why isn't this just your
>>> kvm_irqchip_add_msi_route function?  On pSeries it's a lookup, on x86
>>> it's a allocate and program.
>>>  What does kvm_irqchip_add_msi_route do on
>>> pSeries today?  Thanks,
>>
>>
>> As we just started implementing this thing, I commented it out for the
>> starter. Once called, it destroys direct mapping in the host kernel and
>> everything stops working as routing is not implemented (yet? ever?).
> 
> Yay, it's broken, you can rewrite it ;)


There is nothing to rewrite, my understanding is that it is just not
written yet and Paul would like not do that :)


>> My point here is that MSIMessage to irq translation is made on a PCI domain
>> as PAPR (ppc64 server) spec says. The guest never uses MSIMessage, it is
>> all in QEMU, the guest dynamically allocates MSI IRQs and it is up to a
>> hypeviser (QEMU) to take care of actual MSIMessage for the device.
> 
> MSIMessage is what the guest has programmed for the address/data fields,
> it's not just a QEMU invention.  From the guest perspective, the device
> writes msg.data to msg.address to signal the CPU for the interrupt.


Our guests do never program MSIMessage. Hypercalls are used instead.


>> And the only reason to use MSIMessage in QEMU for us is to support
>> msi_notify()/msix_notify() in places like vfio_msi_interrupt(), I have
>> added a MSI window for that long time ago which we do not need as much as
>> we already have an irq number in vfio_msi_interrupt(), etc.
> 
> It seems like you just have another layer of indirection via your
> msi_table.  For x86 there's a layer of indirection via the virq virtual
> IOAPIC pin.  Seems similar.  Thanks,


Do not follow you, sorry. For x86, is it that MSI routing table which is
updated via KVM_SET_GSI_ROUTING in KVM? When there is no KVM, what piece of
code responds on msi_notify() in qemu-x86 and does qemu_irq_pulse()?



> 
> Alex
> 
>>>> ---
>>>>  hw/misc/vfio.c           |   11 +++++++++--
>>>>  hw/pci/pci.c             |   13 +++++++++++++
>>>>  hw/ppc/spapr_pci.c       |   13 +++++++++++++
>>>>  hw/virtio/virtio-pci.c   |   26 ++++++++++++++++++++------
>>>>  include/hw/pci/pci.h     |    4 ++++
>>>>  include/hw/pci/pci_bus.h |    1 +
>>>>  6 files changed, 60 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/hw/misc/vfio.c b/hw/misc/vfio.c
>>>> index 14aac04..2d9eef7 100644
>>>> --- a/hw/misc/vfio.c
>>>> +++ b/hw/misc/vfio.c
>>>> @@ -639,7 +639,11 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, 
>>>> unsigned int nr,
>>>>       * Attempt to enable route through KVM irqchip,
>>>>       * default to userspace handling if unavailable.
>>>>       */
>>>> -    vector->virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : -1;
>>>> +
>>>> +    vector->virq = msg ? pci_bus_map_msi(vdev->pdev.bus, *msg) : -1;
>>>> +    if (vector->virq < 0) {
>>>> +        vector->virq = msg ? kvm_irqchip_add_msi_route(kvm_state, *msg) : 
>>>> -1;
>>>> +    }
>>>>      if (vector->virq < 0 ||
>>>>          kvm_irqchip_add_irqfd_notifier(kvm_state, &vector->interrupt,
>>>>                                         vector->virq) < 0) {
>>>> @@ -807,7 +811,10 @@ retry:
>>>>           * Attempt to enable route through KVM irqchip,
>>>>           * default to userspace handling if unavailable.
>>>>           */
>>>> -        vector->virq = kvm_irqchip_add_msi_route(kvm_state, msg);
>>>> +        vector->virq = pci_bus_map_msi(vdev->pdev.bus, msg);
>>>> +        if (vector->virq < 0) {
>>>> +            vector->virq = kvm_irqchip_add_msi_route(kvm_state, msg);
>>>> +        }
>>>>          if (vector->virq < 0 ||
>>>>              kvm_irqchip_add_irqfd_notifier(kvm_state, &vector->interrupt,
>>>>                                             vector->virq) < 0) {
>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>> index a976e46..a9875e9 100644
>>>> --- a/hw/pci/pci.c
>>>> +++ b/hw/pci/pci.c
>>>> @@ -1254,6 +1254,19 @@ void pci_device_set_intx_routing_notifier(PCIDevice 
>>>> *dev,
>>>>      dev->intx_routing_notifier = notifier;
>>>>  }
>>>>  
>>>> +void pci_bus_set_map_msi_fn(PCIBus *bus, pci_map_msi_fn map_msi_fn)
>>>> +{
>>>> +    bus->map_msi = map_msi_fn;
>>>> +}
>>>> +
>>>> +int pci_bus_map_msi(PCIBus *bus, MSIMessage msg)
>>>> +{
>>>> +    if (bus->map_msi) {
>>>> +        return bus->map_msi(bus, msg);
>>>> +    }
>>>> +    return -1;
>>>> +}
>>>> +
>>>>  /*
>>>>   * PCI-to-PCI bridge specification
>>>>   * 9.1: Interrupt routing. Table 9-1
>>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>>>> index 80408c9..9ef9a29 100644
>>>> --- a/hw/ppc/spapr_pci.c
>>>> +++ b/hw/ppc/spapr_pci.c
>>>> @@ -500,6 +500,18 @@ static void spapr_msi_write(void *opaque, hwaddr addr,
>>>>      qemu_irq_pulse(xics_get_qirq(spapr->icp, irq));
>>>>  }
>>>>  
>>>> +static int spapr_msi_get_irq(PCIBus *bus, MSIMessage msg)
>>>> +{
>>>> +    DeviceState *par = bus->qbus.parent;
>>>> +    sPAPRPHBState *sphb = (sPAPRPHBState *) par;
>>>> +    unsigned long addr = msg.address - sphb->msi_win_addr;
>>>> +    int ndev = addr >> 16;
>>>> +    int vec = ((addr & 0xFFFF) >> 2) | msg.data;
>>>> +    uint32_t irq = sphb->msi_table[ndev].irq + vec;
>>>> +
>>>> +    return (int)irq;
>>>> +}
>>>> +
>>>>  static const MemoryRegionOps spapr_msi_ops = {
>>>>      /* There is no .read as the read result is undefined by PCI spec */
>>>>      .read = NULL,
>>>> @@ -664,6 +676,7 @@ static int _spapr_phb_init(SysBusDevice *s)
>>>>  
>>>>          sphb->lsi_table[i].irq = irq;
>>>>      }
>>>> +    pci_bus_set_map_msi_fn(bus, spapr_msi_get_irq);
>>>>  
>>>>      return 0;
>>>>  }
>>>> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
>>>> index d309416..587f53e 100644
>>>> --- a/hw/virtio/virtio-pci.c
>>>> +++ b/hw/virtio/virtio-pci.c
>>>> @@ -472,6 +472,8 @@ static unsigned virtio_pci_get_features(DeviceState *d)
>>>>      return proxy->host_features;
>>>>  }
>>>>  
>>>> +extern int spapr_msi_get_irq(PCIBus *bus, MSIMessage *msg);
>>>> +
>>>>  static int kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy *proxy,
>>>>                                          unsigned int queue_no,
>>>>                                          unsigned int vector,
>>>> @@ -481,7 +483,10 @@ static int 
>>>> kvm_virtio_pci_vq_vector_use(VirtIOPCIProxy *proxy,
>>>>      int ret;
>>>>  
>>>>      if (irqfd->users == 0) {
>>>> -        ret = kvm_irqchip_add_msi_route(kvm_state, msg);
>>>> +        ret = pci_bus_map_msi(proxy->pci_dev.bus, msg);
>>>> +        if (ret < 0) {
>>>> +            ret = kvm_irqchip_add_msi_route(kvm_state, msg);
>>>> +        }
>>>>          if (ret < 0) {
>>>>              return ret;
>>>>          }
>>>> @@ -609,14 +614,23 @@ static int 
>>>> virtio_pci_vq_vector_unmask(VirtIOPCIProxy *proxy,
>>>>      VirtQueue *vq = virtio_get_queue(proxy->vdev, queue_no);
>>>>      EventNotifier *n = virtio_queue_get_guest_notifier(vq);
>>>>      VirtIOIRQFD *irqfd;
>>>> -    int ret = 0;
>>>> +    int ret = 0, tmp;
>>>>  
>>>>      if (proxy->vector_irqfd) {
>>>>          irqfd = &proxy->vector_irqfd[vector];
>>>> -        if (irqfd->msg.data != msg.data || irqfd->msg.address != 
>>>> msg.address) {
>>>> -            ret = kvm_irqchip_update_msi_route(kvm_state, irqfd->virq, 
>>>> msg);
>>>> -            if (ret < 0) {
>>>> -                return ret;
>>>> +
>>>> +        tmp = pci_bus_map_msi(proxy->pci_dev.bus, msg);
>>>> +        if (tmp >= 0) {
>>>> +            if (irqfd->virq != tmp) {
>>>> +                fprintf(stderr, "FIXME: MSI(-X) vector has changed from 
>>>> %X to %x\n",
>>>> +                        irqfd->virq, tmp);
>>>> +            }
>>>> +        } else {
>>>> +            if (irqfd->msg.data != msg.data || irqfd->msg.address != 
>>>> msg.address) {
>>>> +                ret = kvm_irqchip_update_msi_route(kvm_state, 
>>>> irqfd->virq, msg);
>>>> +                if (ret < 0) {
>>>> +                    return ret;
>>>> +                }
>>>>              }
>>>>          }
>>>>      }
>>>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
>>>> index 8797802..632739a 100644
>>>> --- a/include/hw/pci/pci.h
>>>> +++ b/include/hw/pci/pci.h
>>>> @@ -332,6 +332,7 @@ MemoryRegion *pci_address_space_io(PCIDevice *dev);
>>>>  typedef void (*pci_set_irq_fn)(void *opaque, int irq_num, int level);
>>>>  typedef int (*pci_map_irq_fn)(PCIDevice *pci_dev, int irq_num);
>>>>  typedef PCIINTxRoute (*pci_route_irq_fn)(void *opaque, int pin);
>>>> +typedef int (*pci_map_msi_fn)(PCIBus *bus, MSIMessage msg);
>>>>  
>>>>  typedef enum {
>>>>      PCI_HOTPLUG_DISABLED,
>>>> @@ -375,6 +376,9 @@ bool pci_intx_route_changed(PCIINTxRoute *old, 
>>>> PCIINTxRoute *new);
>>>>  void pci_bus_fire_intx_routing_notifier(PCIBus *bus);
>>>>  void pci_device_set_intx_routing_notifier(PCIDevice *dev,
>>>>                                            PCIINTxRoutingNotifier 
>>>> notifier);
>>>> +void pci_bus_set_map_msi_fn(PCIBus *bus, pci_map_msi_fn map_msi_fn);
>>>> +int pci_bus_map_msi(PCIBus *bus, MSIMessage msg);
>>>> +
>>>>  void pci_device_reset(PCIDevice *dev);
>>>>  void pci_bus_reset(PCIBus *bus);
>>>>  
>>>> diff --git a/include/hw/pci/pci_bus.h b/include/hw/pci/pci_bus.h
>>>> index 66762f6..81efd2b 100644
>>>> --- a/include/hw/pci/pci_bus.h
>>>> +++ b/include/hw/pci/pci_bus.h
>>>> @@ -16,6 +16,7 @@ struct PCIBus {
>>>>      pci_set_irq_fn set_irq;
>>>>      pci_map_irq_fn map_irq;
>>>>      pci_route_irq_fn route_intx_to_irq;
>>>> +    pci_map_msi_fn map_msi;
>>>>      pci_hotplug_fn hotplug;
>>>>      DeviceState *hotplug_qdev;
>>>>      void *irq_opaque;
>>>
>>>
>>>
>>
>>
> 
> 
> 


-- 
Alexey

Reply via email to