On Fri, Nov 25, 2016 at 08:47:26AM +0200, Michael S. Tsirkin wrote: > On Fri, Nov 25, 2016 at 01:28:36PM +0800, Peter Xu wrote: > > On Fri, Nov 25, 2016 at 06:11:01AM +0200, Michael S. Tsirkin wrote: > > > On Tue, Nov 22, 2016 at 04:08:50PM +0800, Peter Xu wrote: > > > > We are very strict in the past getting MSIs from commit > > > > d1f6af6a1 ("kvm-irqchip: simplify kvm_irqchip_add_msi_route"), assuming > > > > that MSI should be configured before hand when fetching. When we have > > > > unrecognized configurations, we panic the system. However looks like > > > > this is too strict to be working on some platform, and issues occured. > > > > Firstly it's found on a ppc case and fixed by David in: > > > > > > > > 6d17a01 vfio/pci: Fix regression in MSI routing configuration > > > > > > > > However we encountered another case now with windows virtio driver and > > > > reported (and possibly more): > > > > > > > > http://bugs.debian.org/844361 > > > > > > > > To make every driver/hardware happy, let's loosen the rule and go back > > > > to the original behavior - instead of panic the system, when we try to > > > > fetch MSI without configured MSI/MSI-X system, we just provide an empty > > > > message to make drivers happy. > > > > > > > > Reported-by: Maciej KotliĆski <makotlin...@gmail.com> > > > > Signed-off-by: Peter Xu <pet...@redhat.com> > > > > --- > > > > hw/pci/pci.c | 8 +++++--- > > > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c > > > > index 24fae16..0cd2bb0 100644 > > > > --- a/hw/pci/pci.c > > > > +++ b/hw/pci/pci.c > > > > @@ -2606,9 +2606,11 @@ MSIMessage pci_get_msi_message(PCIDevice *dev, > > > > int vector) > > > > } else if (msi_enabled(dev)) { > > > > msg = msi_get_message(dev, vector); > > > > } else { > > > > - /* Should never happen */ > > > > - error_report("%s: unknown interrupt type", __func__); > > > > - abort(); > > > > + /* > > > > + * Device is not configured with MSI/MSI-X yet, let's provide > > > > + * an empty message to make all device drivers happy. > > > > + */ > > > > + memset(&msg, 0, sizeof(msg)); > > > > } > > > > return msg; > > > > } > > > > > > Looks like a hack to me. Even if callers happen to treat > > > 0 message as a nop, there's no guarantee that's true for > > > all platforms. Pls change pci_get_msi_message to > > > return an error code, and fix callers to check that. > > > > Actually I'd say I still cannot understand why some guests will > > encounter this issue - the driver just tries to fetch and enable > > MSI/MSI-X messages without fully activate MSI/MSI-X in general (that's > > why msi_enabled() and msix_enabled() both returned false). > > > > But indeed some drivers won't work properly after commit d1f6af6a. > > David fixed some case for ppc and vfio (6d17a01), but problem still > > exist, like Debian's report (http://bugs.debian.org/844361). > > > > To avoid going into every details of guest drivers, I was thinking > > maybe we can just "recover" the behavior to the past: old QEMU (before > > d1f6af6a) allows to fetch an meaningless message (e.g., when > > msi_get_message() is called, it's possible that MSI is still not > > setup, but IMHO that message is meaningless). So here we emulate that > > case, and return a meaningless message. From the Debian bug report, we > > can see that this does solve the issue (yeah I know this is not even > > an excuse to agree on this patch, but again I just think this is > > currently the best way to solve the problem...). > > > > If we change this into return error, then kvm_irqchip_add_msi_route() > > might fail, and that's a different behavior again, I don't know > > whether it'll bring more troubles (only if I can test every guest > > driver with that code, but that's merely impossible). So IMHO the best > > way to solve the problem is use the current patch and try to keep the > > old behavior. > > > > I would really appreciate if you (or anyone) has better suggestion. > > > > Thanks, > > > > -- peterx > > You need to figure out the call stack. What does driver do? > I'm fine with working around it for now but > 1. maybe we should fix the driver anyway > 2. we might want to emit an error message > > -- > MST
Limin posted another comment here: http://bugs.debian.org/844361 We can see whether that bug is finally caused by pci-assign too, and whether that pci-assign fix can solve the problem. If that works, it'll be nice, then we won't need this patch at least for now. Thanks, -- peterx