On Fri, Mar 04, 2016 at 01:57:05PM +0100, Markus Armbruster wrote: > "Michael S. Tsirkin" <m...@redhat.com> writes: > > > On Fri, Mar 04, 2016 at 09:42:02AM +0100, Markus Armbruster wrote: > >> Plugging an MSI-capable device into an MSI-incapable board works just > >> fine, both for physical and for virtual hardware. What doesn't work is > >> plugging an MSI-capable device into an MSI-capable board with *broken* > >> MSI support. > >> > >> As a convenience feature, we summarily and silently remove a device's > >> MSI capability when we detect such a broken board. At least that's what > >> the commit message you quoted claims. > > > > And this makes sense, right? > > Yes, except for the part where we ignore the user's explicit orders, > and, to a lesser degree, for the part where we create variants of > physical devices that don't exist physically. > > >> In reality, we remove it not just for broken boards, but even for > >> MSI-incapable boards. > >> > >> I take issue with "summarily and silently", and "even for MSI-incapable > >> boards". > >> > >> When multiple variants of a device exist, and the user didn't ask for a > >> specific one, then picking the one that works best with the board is > >> just fine. > >> > >> It's absolutely not fine when the user did ask for a specific one. When > >> I ask for msi=on, I mean it. If it can't work with this board, tell me. > >> But silently ignoring my orders is a bug. > > > > I agree. msi is not the only case either. We really need - for any boolean > > flag - a way to figure out whether it was set by user. > > With that in place we could fix it. > > QMP provides that directly as "optional bool", but qdev properties do > "optional" diffently, and when you see the default value, you don't know > whether it comes from the user or not. > > Another solution is an on/off/auto type. We got it already in the QAPI > schema, as OnOffAuto, and my recent "[PATCH 32/38] qdev: New > DEFINE_PROP_ON_OFF_AUTO" makes it available as qdev property. With > default set to auto, we should be set.
Should we somehow change all on/off properties to on/off/auto? > > However, almost no one uses the msi/msi-x flag - we introduced > > them only for one reason - for backwards compatibility. > > The fact that each time we need a compatibility flag > > we also expose a new user interface is very unfortunate. > > IMO it was a design mistake made a long time ago and it won't > > be easy to fix now. > > I agree there's no easy fix, but we can try to find a non-easy one. > > For backward compatibility, we need to configure some device models for > certain machine types. We use the only object configuration mechanism > we have: properties. The properties we use for compatibility are all > exposed to the user. > > We could easily provide a flag to mark properties private, and only > accept non-private properties at external interfaces. This should help > stopping growth of the problem, but it's not an easy fix for reducing > it, because making a property private retroactively is problematic. We > could have a flag to mark them deprecated instead, warn on use, remove > them from documentation, and perhaps drop them a couple of releases > later. Sounds good. > > And for the above reason I personally do not intend to > > spend time designing a specific hack just for the msi > > property. > > > >> It's fine to have emulations of MSI-capable boards where MSI doesn't yet > >> work. Even if that means we have to reject MSI-capable devices. > > > > I don't know what does reject mean here. Removing msi capability? > > In that case I agree. > > By "reject" I mean "reject the user's order to plug in an MSI-capable > device." I don't believe anyone tweaks these properties in practice. However, I have to wonder. Assume that you have supplied a device with 10 properties. QEMU can not support them. At your suggesion, device is rejected. How does user know which property to tweak? Try all values for them all? > >> It's absolutely not fine to reject them for MSI-incapable boards, where > >> they'd work just fine. > > > > I think that as long as users did not ask for msi explicitly, > > and board is msi incapable, it does not matter much whether > > device has msi capability or not - guest will not try > > to use it anyway. > > If the device comes in MSI-capable and MSI-incapable variants, and the > user didn't specify a variant, then picking one that fits the board is > fine. > > If the device does not come in variants (and many physical devices > don't), then rejecting it just because it's MSI-capable and the board > isn't is not fine. > > To fix, we'd have to limit what is now !msi_supported to the boards that > are nominally MSI-capable, except our emulation doesn't work, i.e. do > exactly what the commit message promised. I rather think it's academic. MSI is a performance optimization, and is optional for guests anyway. It's hard to see when may users absolutely require it. > The PCI core encourages creating both variants, and most (but not all) > device models do, but: > > >> Furthermore, I doubt the wisdom of creating virtual devices that don't > >> exist physically just to have something that works in our broken boards. > >> If the physical device exists in MSI-capable and MSI-incapable variants, > >> emulating both is fine. But if it only ever existed MSI-capable, having > >> the PCI core(!) drop the MSI capability is a questionable idea. I > >> suspect that the need for this dubious hack would be much smaller if we > >> didn't foolishly treat every MSI-incapable board as broken MSI-capable > >> board. > >> > >> If you conclude that cleaning up this carelessly made mess is not worth > >> the bother now, then at least explain the mess in the code, please. > >> It's not obvious, and figuring out what's going on and why it is the way > >> it is has taken me several hours, and Marcel's help. > > > > I think it's worth cleaning up, or at least documenting. > > Fixing it will take much more than the patch proposed here, > > and we can not start with this patch as it will cause > > regressions. > > Adding a comment won't be too much work. > > How about the below? > > > > --> > > > > msi_supported -> msi_nonbroken > > > > Rename controller flag to make it clearer what it means. > > Add some documentation as well. > > > > Signed-off-by: Michael S. Tsirkin <m...@redhat.com> > > > > --- > > > > diff --git a/include/hw/pci/msi.h b/include/hw/pci/msi.h > > index 50e452b..8124908 100644 > > --- a/include/hw/pci/msi.h > > +++ b/include/hw/pci/msi.h > > @@ -29,7 +29,7 @@ struct MSIMessage { > > uint32_t data; > > }; > > > > -extern bool msi_supported; > > +extern bool msi_nonbroken; > > > > void msi_set_message(PCIDevice *dev, MSIMessage msg); > > MSIMessage msi_get_message(PCIDevice *dev, unsigned int vector); > > diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c > > index 694d398..3c7c8fa 100644 > > --- a/hw/i386/kvm/apic.c > > +++ b/hw/i386/kvm/apic.c > > @@ -186,7 +186,7 @@ static void kvm_apic_realize(DeviceState *dev, Error > > **errp) > > APIC_SPACE_SIZE); > > > > if (kvm_has_gsi_routing()) { > > - msi_supported = true; > > + msi_nonbroken = true; > > } > > } > > > > diff --git a/hw/i386/xen/xen_apic.c b/hw/i386/xen/xen_apic.c > > index 2b8d709..21d68ee 100644 > > --- a/hw/i386/xen/xen_apic.c > > +++ b/hw/i386/xen/xen_apic.c > > @@ -44,7 +44,7 @@ static void xen_apic_realize(DeviceState *dev, Error > > **errp) > > s->vapic_control = 0; > > memory_region_init_io(&s->io_memory, OBJECT(s), &xen_apic_io_ops, s, > > "xen-apic-msi", APIC_SPACE_SIZE); > > - msi_supported = true; > > + msi_nonbroken = true; > > } > > > > static void xen_apic_set_base(APICCommonState *s, uint64_t val) > > diff --git a/hw/intc/apic.c b/hw/intc/apic.c > > index a299462..28c2ea5 100644 > > --- a/hw/intc/apic.c > > +++ b/hw/intc/apic.c > > @@ -874,7 +874,7 @@ static void apic_realize(DeviceState *dev, Error **errp) > > s->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, apic_timer, s); > > local_apics[s->idx] = s; > > > > - msi_supported = true; > > + msi_nonbroken = true; > > } > > > > static void apic_class_init(ObjectClass *klass, void *data) > > diff --git a/hw/intc/arm_gicv2m.c b/hw/intc/arm_gicv2m.c > > index 70c0b97..ebd368b 100644 > > --- a/hw/intc/arm_gicv2m.c > > +++ b/hw/intc/arm_gicv2m.c > > @@ -148,7 +148,7 @@ static void gicv2m_realize(DeviceState *dev, Error > > **errp) > > sysbus_init_irq(SYS_BUS_DEVICE(dev), &s->spi[i]); > > } > > > > - msi_supported = true; > > + msi_nonbroken = true; > > kvm_gsi_direct_mapping = true; > > kvm_msi_via_irqfd_allowed = kvm_irqfds_enabled(); > > } > > diff --git a/hw/intc/openpic.c b/hw/intc/openpic.c > > index 903888c..7685250 100644 > > --- a/hw/intc/openpic.c > > +++ b/hw/intc/openpic.c > > @@ -1375,7 +1375,7 @@ static void fsl_common_init(OpenPICState *opp) > > > > opp->irq_msi = 224; > > > > - msi_supported = true; > > + msi_nonbroken = true; > > for (i = 0; i < opp->fsl->max_ext; i++) { > > opp->src[i].level = false; > > } > > diff --git a/hw/intc/openpic_kvm.c b/hw/intc/openpic_kvm.c > > index 4dcdb61..778af4a 100644 > > --- a/hw/intc/openpic_kvm.c > > +++ b/hw/intc/openpic_kvm.c > > @@ -239,7 +239,7 @@ static void kvm_openpic_realize(DeviceState *dev, Error > > **errp) > > memory_listener_register(&opp->mem_listener, &address_space_memory); > > > > /* indicate pic capabilities */ > > - msi_supported = true; > > + msi_nonbroken = true; > > kvm_kernel_irqchip = true; > > kvm_async_interrupts_allowed = true; > > > > diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c > > index 100bb5e..862a2366 100644 > > --- a/hw/pci-bridge/pci_bridge_dev.c > > +++ b/hw/pci-bridge/pci_bridge_dev.c > > @@ -72,7 +72,7 @@ static int pci_bridge_dev_initfn(PCIDevice *dev) > > goto slotid_error; > > } > > if ((bridge_dev->flags & (1 << PCI_BRIDGE_DEV_F_MSI_REQ)) && > > - msi_supported) { > > + msi_nonbroken) { > > err = msi_init(dev, 0, 1, true, true); > > if (err < 0) { > > goto msi_error; > > diff --git a/hw/pci/msi.c b/hw/pci/msi.c > > index 85f21b8..e0e64c2 100644 > > --- a/hw/pci/msi.c > > +++ b/hw/pci/msi.c > > @@ -34,8 +34,21 @@ > > > > #define PCI_MSI_VECTORS_MAX 32 > > > > -/* Flag for interrupt controller to declare MSI/MSI-X support */ > > -bool msi_supported; > > +/* > > + * Flag for interrupt controllers to declare broken MSI/MSI-X support. > > + * values: false - broken; true - non-broken. > > + * > > + * Setting this flag to false will remove MSI/MSI-X capability from all > > devices. > > + * > > + * It is preferrable for controllers to set this to true (non-broken) even > > if > > + * they do not actually support MSI/MSI-X: guests normally probe the > > controller > > + * type and do not attempt to enable MSI/MSI-X with interrupt controllers > > not > > + * supporting such, so removing the capability is not required, and > > + * it seems cleaner to have a given device look the same for all boards. > > + * > > + * TODO: some existing controllers violate the above rule. Identify and > > fix them. > > + */ > > +bool msi_nonbroken; > > Good description. I'd use FIXME rather than TODO, but that's detail. > > > > > /* If we get rid of cap allocator, we won't need this. */ > > static inline uint8_t msi_cap_sizeof(uint16_t flags) > > @@ -160,7 +173,7 @@ int msi_init(struct PCIDevice *dev, uint8_t offset, > > uint8_t cap_size; > > int config_offset; > > > > - if (!msi_supported) { > > + if (!msi_nonbroken) { > > return -ENOTSUP; > > } > > > > diff --git a/hw/pci/msix.c b/hw/pci/msix.c > > index 537fdba..b75f0e9 100644 > > --- a/hw/pci/msix.c > > +++ b/hw/pci/msix.c > > @@ -249,7 +249,7 @@ int msix_init(struct PCIDevice *dev, unsigned short > > nentries, > > uint8_t *config; > > > > /* Nothing to do if MSI is not supported by interrupt controller */ > > - if (!msi_supported) { > > + if (!msi_nonbroken) { > > return -ENOTSUP; > > } > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > > index e9d4abf..c4fb206 100644 > > --- a/hw/ppc/spapr.c > > +++ b/hw/ppc/spapr.c > > @@ -439,7 +439,7 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base, > > _FDT((fdt_property_cell(fdt, "rtas-event-scan-rate", > > RTAS_EVENT_SCAN_RATE))); > > > > - if (msi_supported) { > > + if (msi_nonbroken) { > > _FDT((fdt_property(fdt, "ibm,change-msix-capable", NULL, 0))); > > } > > > > @@ -1743,7 +1743,7 @@ static void ppc_spapr_init(MachineState *machine) > > bool kernel_le = false; > > char *filename; > > > > - msi_supported = true; > > + msi_nonbroken = true; > > > > QLIST_INIT(&spapr->phbs); > > > > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > > index e8edad3..3fc7895 100644 > > --- a/hw/ppc/spapr_pci.c > > +++ b/hw/ppc/spapr_pci.c > > @@ -1790,7 +1790,7 @@ void spapr_pci_rtas_init(void) > > rtas_ibm_read_pci_config); > > spapr_rtas_register(RTAS_IBM_WRITE_PCI_CONFIG, "ibm,write-pci-config", > > rtas_ibm_write_pci_config); > > - if (msi_supported) { > > + if (msi_nonbroken) { > > spapr_rtas_register(RTAS_IBM_QUERY_INTERRUPT_SOURCE_NUMBER, > > "ibm,query-interrupt-source-number", > > rtas_ibm_query_interrupt_source_number); > > diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c > > index dba0202..f5f679f 100644 > > --- a/hw/s390x/s390-pci-bus.c > > +++ b/hw/s390x/s390-pci-bus.c > > @@ -597,7 +597,7 @@ static void s390_pcihost_class_init(ObjectClass *klass, > > void *data) > > k->init = s390_pcihost_init; > > hc->plug = s390_pcihost_hot_plug; > > hc->unplug = s390_pcihost_hot_unplug; > > - msi_supported = true; > > + msi_nonbroken = true; > > } > > > > static const TypeInfo s390_pcihost_info = { > > Much appreciated! > > Reviewed-by: Markus Armbruster <arm...@redhat.com>