On Wed, 2018-01-24 at 19:13 +0000, Ghannam, Yazen wrote:
> > -----Original Message-----
> > From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
> > ow...@vger.kernel.org] On Behalf Of Lyude Paul
> > Sent: Wednesday, January 24, 2018 12:49 PM
> > To: Thomas Gleixner <t...@linutronix.de>
> > Cc: h...@zytor.com; keith.bu...@intel.com; mi...@kernel.org; linux-
> > ker...@vger.kernel.org
> > Subject: Re: "irq/matrix: Spread interrupts on allocation" breaks nouveau
> > in mainline kernel
> >
> > Hi, please ignore the warning: it happens before and after the regressing
> > commit (I didn't actually mean to include it on the log I gave here,
> > whoops).
> > As for how I determined nouveau is getting assigned the same IRQ vector as
> > another device, I checked using /sys/kernel/debug/irq. Additionally, when
> > nouveau does initialize properly after resume (e.g. after reverting this
> > patch) I see it get assigned a separate vector from the other devices.
> >
>
> +Boris. This thread seems to have split.
>
> Lyude,
> Does the warning show on mainline or does it only show when bisecting?
>
> Sorry, I'm not sure what you mean by "it happens before and after the
> regressing commit".

Sorry about that! Let me clarify a little bit: this is a problem that shows up
on mainline. Normally when we suspend the GPU in nouveau, we free the IRQs it's
using before going into suspend
(drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c:88), then reserve IRQs again on
resume (drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c:134). Since this patch
got pushed to mainline, the IRQ we get from request_irq() ends up having the
same MSI vector as another device on the system:

Before suspend, nouveau's IRQ allocation:

    handler:  handle_edge_irq
    device:   0000:22:00.0
    status:   0x00000000
    istate:   0x00000000
    ddepth:   0
    wdepth:   0
    dstate:   0x01400200
                IRQD_ACTIVATED
                IRQD_IRQ_STARTED
                IRQD_SINGLE_TARGET
    node:     0
    affinity: 0-7
    effectiv: 1
    pending:
    domain:  PCI-MSI-2
     hwirq:   0x1100000
     chip:    PCI-MSI
      flags:   0x10
                 IRQCHIP_SKIP_SET_WAKE
     parent:
        domain:  VECTOR
         hwirq:   0x2f
         chip:    APIC
          flags:   0x0
         Vector:    35
         Target:     1

After resume and allocating the interrupt for nouveau again, we get a message
from the kernel saying:

    [  217.150787] do_IRQ: 1.35 No irq handler for vector

nouveau also ends up getting no interrupts from the card and, as a result,
fails to come back up:

    [  219.153049] nouveau 0000:22:00.0: DRM: EVO timeout
    [  220.226254] r8169 0000:1e:00.0 enp30s0: link up
    [  221.153054] nouveau 0000:22:00.0: DRM: base-0: timeout
    [  223.153528] nouveau 0000:22:00.0: DRM: base-0: timeout

If we look through all of the other IRQ allocations, we find that two devices
now have MSI vector 35:

nouveau:

    handler:  handle_edge_irq
    device:   0000:22:00.0
    status:   0x00000000
    istate:   0x00000000
    ddepth:   0
    wdepth:   0
    dstate:   0x01400200
                IRQD_ACTIVATED
                IRQD_IRQ_STARTED
                IRQD_SINGLE_TARGET
    node:     0
    affinity: 0-7
    effectiv: 1
    pending:
    domain:  PCI-MSI-2
     hwirq:   0x1100000
     chip:    PCI-MSI
      flags:   0x10
                 IRQCHIP_SKIP_SET_WAKE
     parent:
        domain:  VECTOR
         hwirq:   0x2f
         chip:    APIC
          flags:   0x0
         Vector:    35
         Target:     1

and the PCI bridge (00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD]
Family 17h (Models 00h-0fh) PCIe GPP Bridge):

    handler:  handle_edge_irq
    device:   0000:00:01.3
    status:   0x00000000
    istate:   0x00000000
    ddepth:   0
    wdepth:   0
    dstate:   0x03400200
                IRQD_ACTIVATED
                IRQD_IRQ_STARTED
                IRQD_SINGLE_TARGET
    node:     0
    affinity: 0-7
    effectiv: 0
    pending:
    domain:  PCI-MSI-2
     hwirq:   0x5800
     chip:    PCI-MSI
      flags:   0x10
                 IRQCHIP_SKIP_SET_WAKE
     parent:
        domain:  VECTOR
         hwirq:   0x19
         chip:    APIC
          flags:   0x0
         Vector:    35
         Target:     0

Hope this helps clarify. I will keep looking at this from my end as well.

>
>
> Boris,
> In any case, I like your idea on saving the block addresses. I can look into
> this.
>
> Thanks,
> Yazen
--
Cheers,
	Lyude Paul
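
A rough sketch of how the "look through all of the other IRQ allocations" step
above could be scripted, rather than done by eye. It assumes each IRQ's debugfs
dump has already been read into a string in the format shown in this mail; the
function names and the trimmed sample dumps are made up for illustration and
are not part of any kernel tooling. The target CPU is reported next to each
device so the pairs can be compared.

```python
import re
from collections import defaultdict

# Matches the "device:", "Vector:" and "Target:" lines of a debugfs IRQ dump.
FIELD_RE = re.compile(
    r"^[ \t]*(device|Vector|Target):[ \t]*(\S+)[ \t]*$", re.MULTILINE
)


def parse_irq_dump(text):
    """Extract device, vector and target from one dump (None if absent)."""
    fields = dict(FIELD_RE.findall(text))
    return {
        "device": fields.get("device"),
        "vector": int(fields["Vector"]) if "Vector" in fields else None,
        "target": int(fields["Target"]) if "Target" in fields else None,
    }


def shared_vectors(dumps):
    """Map each vector number used by more than one dump to its users."""
    by_vector = defaultdict(list)
    for text in dumps:
        info = parse_irq_dump(text)
        if info["vector"] is not None:
            by_vector[info["vector"]].append((info["device"], info["target"]))
    return {vec: users for vec, users in by_vector.items() if len(users) > 1}


# Trimmed copies of the two dumps quoted in this mail (illustrative only).
NOUVEAU_DUMP = """\
handler:  handle_edge_irq
device:   0000:22:00.0
     Vector:    35
     Target:     1
"""

BRIDGE_DUMP = """\
handler:  handle_edge_irq
device:   0000:00:01.3
     Vector:    35
     Target:     0
"""

if __name__ == "__main__":
    for vector, users in shared_vectors([NOUVEAU_DUMP, BRIDGE_DUMP]).items():
        print(f"vector {vector}:")
        for device, target in users:
            print(f"  {device} (target {target})")
```

On a real system the strings would come from the files under
/sys/kernel/debug/irq (assumed here to hold one dump per IRQ, readable as
root) instead of the two samples.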