Re: [Intel-gfx] [PATCH] pci: Add a few new IDs for Intel GPU "spurious interrupt" quirk

2018-10-11 Thread Bin Meng
Hi Bjorn,

On Wed, Oct 10, 2018 at 1:02 AM Bjorn Helgaas  wrote:
>
> On Mon, Oct 08, 2018 at 05:44:08PM +0800, Bin Meng wrote:
> > On Thu, Oct 4, 2018 at 4:12 AM Bjorn Helgaas  wrote:
> > > On Thu, Sep 27, 2018 at 10:10:07AM +0800, Bin Meng wrote:
> > > > On Thu, Sep 27, 2018 at 12:57 AM Bjorn Helgaas  wrote:
> > > > > On Wed, Sep 26, 2018 at 08:14:01AM -0700, Bin Meng wrote:
> > > > > > Add more PCI IDs to the Intel GPU "spurious interrupt" quirk table,
> > > > > > which are known to break.
> > > > >
> > > > > Do you have a reference for this?  Any public bug reports, bugzilla,
> > > > > Intel spec reference or errata?  "Which are known to break" is pretty
> > > > > vague.
> > > >
> > > > Sorry I used wrong words and should have been clearer. These devices
> > > > are validated to be broken. The test I used is very simple, just
> > > > unplug the VGA cable and plug it again, and "spurious interrupt" will
> > > > be seen on the interrupt line of the IGD device. I was not aware of
> > > > any public bugs filed to Intel, nor seen any errata from Intel.
> > >
> > > The original commit, f67fd55fa96f ("PCI: Add quirk for still enabled
> > > interrupts on Intel Sandy Bridge GPUs"), says some systems "crash"
> > > (not sure if that means an oops or an actual crash that requires a
> > > reboot) and on other systems, Linux disables the shared interrupt
> > > line.  I assume disabling the interrupt line keeps devices using that
> > > line from working, but does not directly cause a crash.
> > >
> >
> > Correct, disable the shared interrupt line keeps all devices using
> > that line from working, which is current kernel's behavior w/o this
> > quirk handling: it disables the (shared) interrupt line after 100.000+
> > generated interrupts. But the side effect is that other devices become
> > unusable after that (eg: USB devices which share the same interrupt
> > line with the Intel GPU). That's why the original commit, f67fd55fa96f
> > ("PCI: Add quirk for still enabled interrupts on Intel Sandy Bridge
> > GPUs") disables the GPU's interrupt directly, which should really be
> > done by the VGA BIOS itself (a buggy VBIOS!).
> >
> > > What specific symptom do you see here?  I think it might be useful to
> > > collect details, e.g., dmesg logs, /proc/interrupts contents, output
> > > of "sudo lspci -vv", etc., for the systems you're quirking here.  I'm
> > > hoping we can eventually figure out a solution that doesn't require a
> > > quirk for every new GPU, and maybe that info will help find it.
> >
> > The symptom was described briefly in the original commit f67fd55fa96f
> > too, that disables the (shared) interrupt line after 100.000+
> > generated interrupts (can be observed via /proc/interrupts).
> >
> > > > > > See commit f67fd55fa96f ("PCI: Add quirk for still enabled interrupts
> > > > > > on Intel Sandy Bridge GPUs"), and commit 7c82126a94e6 ("PCI: Add new
> > > > > > ID for Intel GPU "spurious interrupt" quirk") for some history.
> > > > > >
> > > > > > Based on current findings, it is highly possible that all Intel
> > > > > > 1st/2nd/3rd generation Core processors' IGD has such quirk.
> > > > >
> > > > > Can you include a reference to these "current findings"?  I assume you
> > > > > have bug reports that include the device IDs you're adding?  If not,
> > > > > how did you build this list of new IDs?
> > > >
> > > > By "current findings" I mean given the IDs we have here, plus previous
> > > > one added by Thomas, it's highly possible this VGA BIOS bug exists in
> > > > every 1st/2nd/3rd generation Core processors.
> > > >
> > > > > The function comment added by f67fd55fa96f ("PCI: Add quirk for still
> > > > > enabled interrupts on Intel Sandy Bridge GPUs") suggests that this is
> > > > > actually a BIOS issue, not a hardware erratum, i.e., I don't see
> > > > > anything there that suggests a hardware defect.
> > > > >
> > > > > But there must be a hole somewhere -- the kernel can't be expected to
> > > > > disable interrupts

Re: [Intel-gfx] [PATCH] pci: Add a few new IDs for Intel GPU "spurious interrupt" quirk

2018-10-08 Thread Bin Meng
Hi Bjorn,

On Thu, Oct 4, 2018 at 4:12 AM Bjorn Helgaas  wrote:
>
> On Thu, Sep 27, 2018 at 10:10:07AM +0800, Bin Meng wrote:
> > On Thu, Sep 27, 2018 at 12:57 AM Bjorn Helgaas  wrote:
> > > On Wed, Sep 26, 2018 at 08:14:01AM -0700, Bin Meng wrote:
> > > > Add more PCI IDs to the Intel GPU "spurious interrupt" quirk table,
> > > > which are known to break.
> > >
> > > Do you have a reference for this?  Any public bug reports, bugzilla,
> > > Intel spec reference or errata?  "Which are known to break" is pretty
> > > vague.
> >
> > Sorry I used wrong words and should have been clearer. These devices
> > are validated to be broken. The test I used is very simple, just
> > unplug the VGA cable and plug it again, and "spurious interrupt" will
> > be seen on the interrupt line of the IGD device. I was not aware of
> > any public bugs filed to Intel, nor seen any errata from Intel.
>
> The original commit, f67fd55fa96f ("PCI: Add quirk for still enabled
> interrupts on Intel Sandy Bridge GPUs"), says some systems "crash"
> (not sure if that means an oops or an actual crash that requires a
> reboot) and on other systems, Linux disables the shared interrupt
> line.  I assume disabling the interrupt line keeps devices using that
> line from working, but does not directly cause a crash.
>

Correct, disabling the shared interrupt line keeps all devices using
that line from working, which is the current kernel's behavior without
this quirk handling: it disables the (shared) interrupt line after
100,000+ generated interrupts. The side effect is that other devices
become unusable after that (e.g., USB devices which share the same
interrupt line with the Intel GPU). That's why the original commit,
f67fd55fa96f ("PCI: Add quirk for still enabled interrupts on Intel
Sandy Bridge GPUs"), disables the GPU's interrupt directly, which should
really be done by the VGA BIOS itself (a buggy VBIOS!).
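
To make the "disabled after 100,000+ interrupts" behavior concrete, here is a
minimal userspace model of that accounting. It is a sketch, not the kernel's
code: the struct, the function name, and the two thresholds (a window of
100,000 interrupts, of which more than 99,900 went unhandled) are assumptions
for illustration, loosely following the description above.

```c
#include <stdbool.h>

/* Assumed thresholds for this model; not taken from the thread. */
#define IRQ_WINDOW       100000u  /* interrupts counted per decision window */
#define UNHANDLED_LIMIT   99900u  /* more unhandled than this disables the line */

/* Hypothetical model of one (possibly shared) interrupt line. */
struct irq_line_model {
    unsigned int irq_count;       /* interrupts seen in the current window */
    unsigned int irqs_unhandled;  /* of those, how many no handler claimed */
    bool disabled;                /* true once the line has been shut off */
};

/*
 * Account for one interrupt on the line. 'handled' says whether any
 * driver sharing the line claimed it. Returns true if the line is
 * disabled after this call.
 */
static bool model_note_interrupt(struct irq_line_model *line, bool handled)
{
    if (line->disabled)
        return true;

    if (!handled)
        line->irqs_unhandled++;

    if (++line->irq_count < IRQ_WINDOW)
        return false;

    /* Window complete: decide, then start a fresh window. */
    if (line->irqs_unhandled > UNHANDLED_LIMIT)
        line->disabled = true;
    line->irq_count = 0;
    line->irqs_unhandled = 0;
    return line->disabled;
}
```

In this model, a GPU that keeps asserting an interrupt nobody handles pushes
the whole line over the limit, and every other device on that line (such as a
USB controller) loses its interrupt along with it.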

> What specific symptom do you see here?  I think it might be useful to
> collect details, e.g., dmesg logs, /proc/interrupts contents, output
> of "sudo lspci -vv", etc., for the systems you're quirking here.  I'm
> hoping we can eventually figure out a solution that doesn't require a
> quirk for every new GPU, and maybe that info will help find it.
>

The symptom was described briefly in the original commit f67fd55fa96f
too: the kernel disables the (shared) interrupt line after 100,000+
generated interrupts (this can be observed via /proc/interrupts).
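
As a rough illustration of watching the count in /proc/interrupts, the sketch
below sums the per-CPU counters on one line of that file. The helper name is
hypothetical, and the line layout assumed here (an IRQ label, a colon, one
counter per CPU, then the controller and device names) should be checked
against the actual system.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: sum the per-CPU counters on one /proc/interrupts
 * line. Parsing stops at the first token that is not a plain decimal
 * number followed by whitespace, which is where the interrupt controller
 * name begins. Returns 0 if the line has no ':' separator.
 */
static unsigned long sum_irq_counts(const char *line)
{
    const char *p = strchr(line, ':');
    unsigned long total = 0;

    if (p == NULL)
        return 0;
    p++;  /* skip the ':' after the IRQ label */

    for (;;) {
        char *end;
        unsigned long n = strtoul(p, &end, 10);

        /* No digits consumed, or digits glued to other text: stop. */
        if (end == p || (*end != '\0' && !isspace((unsigned char)*end)))
            break;
        total += n;
        p = end;
    }
    return total;
}
```

Feeding it the line for the GPU's IRQ before and after replugging the VGA
cable would show how quickly the counter climbs.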

> > > > See commit f67fd55fa96f ("PCI: Add quirk for still enabled interrupts
> > > > on Intel Sandy Bridge GPUs"), and commit 7c82126a94e6 ("PCI: Add new
> > > > ID for Intel GPU "spurious interrupt" quirk") for some history.
> > > >
> > > > Based on current findings, it is highly possible that all Intel
> > > > 1st/2nd/3rd generation Core processors' IGD has such quirk.
> > >
> > > Can you include a reference to these "current findings"?  I assume you
> > > have bug reports that include the device IDs you're adding?  If not,
> > > how did you build this list of new IDs?
> >
> > By "current findings" I mean given the IDs we have here, plus previous
> > one added by Thomas, it's highly possible this VGA BIOS bug exists in
> > every 1st/2nd/3rd generation Core processors.
> >
> > > The function comment added by f67fd55fa96f ("PCI: Add quirk for still
> > > enabled interrupts on Intel Sandy Bridge GPUs") suggests that this is
> > > actually a BIOS issue, not a hardware erratum, i.e., I don't see
> > > anything there that suggests a hardware defect.
> > >
> > > But there must be a hole somewhere -- the kernel can't be expected to
> > > disable interrupts in device-specific ways when there's no driver
> > > loaded.  Maybe it's simply a BIOS defect or maybe there's some
> > > interrupt or _PRT-related setup we're missing.
> >
> > It's a pure VGA BIOS bug, not the BIOS bug or _PRT etc. The VGA BIOS
> > forgot to turn off the interrupt on these devices.
>
> If this is a VGA BIOS defect, it's not very likely that it will
> magically be fixed for all new Intel GPUs, so in effect it sounds like
> we need to update this list of quirks in Linux every time a new Intel
> GPU comes out.  That prospect is a little daunting.
>

I don't have a newer Intel board at hand for testing right
now. I can try to locate one. But as I said, it's highly possible at
least all 1st/2nd/3rd ge

Re: [Intel-gfx] [PATCH] pci: Add a few new IDs for Intel GPU "spurious interrupt" quirk

2018-10-08 Thread Bin Meng
Hi David,

On Mon, Oct 8, 2018 at 6:06 PM David Laight  wrote:
>
> From: Bin Meng
> > Sent: 08 October 2018 10:44
> ...
> > Correct, disable the shared interrupt line keeps all devices using
> > that line from working, which is current kernel's behavior w/o this
> > quirk handling: it disables the (shared) interrupt line after 100.000+
> > generated interrupts. But the side effect is that other devices become
> > unusable after that (eg: USB devices which share the same interrupt
> > line with the Intel GPU). That's why the original commit, f67fd55fa96f
> > ("PCI: Add quirk for still enabled interrupts on Intel Sandy Bridge
> > GPUs") disables the GPU's interrupt directly, which should really be
> > done by the VGA BIOS itself (a buggy VBIOS!).
>
> Shouldn't the kernel just disable all PCI(e) interrupts by writing
> 1 to the config space control register bit during grope?
> Can it ever be right for this to be set?
>

Do you mean the PCI_COMMAND_INTX_DISABLE bit of the command register in
the configuration space? Setting this bit can indeed disable the INTx
interrupt, but it does not work for all PCI devices, as this bit was
only introduced in PCI spec v2.3.
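
For reference, a small sketch of what setting that bit amounts to. The bit
value 0x400 matches the Linux definition of PCI_COMMAND_INTX_DISABLE; the two
helper functions are hypothetical and operate on a plain 16-bit value rather
than on real configuration space, so this runs in userspace. The read-back
check reflects the caveat above: a device that does not implement the bit
would be expected to return it as zero.

```c
#include <stdint.h>

/* Bit 10 of the 16-bit PCI command register (same value as in Linux). */
#define PCI_COMMAND_INTX_DISABLE 0x0400

/* Hypothetical helper: the command word to write to mask INTx. */
static uint16_t command_with_intx_disabled(uint16_t cmd)
{
    return (uint16_t)(cmd | PCI_COMMAND_INTX_DISABLE);
}

/*
 * Hypothetical helper: after writing the new command word and reading it
 * back, report whether the device actually kept the bit set.
 */
static int intx_disable_took_effect(uint16_t readback)
{
    return (readback & PCI_COMMAND_INTX_DISABLE) != 0;
}
```

In the kernel itself this write is done through pci_intx(); the sketch only
shows the bit manipulation and the read-back test.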

> Apart from VGA the 'bus master' bit also needs to be clear.
>
> ISTR some very early PCI systems which failed to reset the PCI
> bus during reboot - at least the 'bus master' bit remained
> set for an ethernet card.
> On a private LAN the OS got reinstalled and rebooted without
> using all the ethernet receive buffers and then died because
> a receive frame got written into 'random' memory.

Regards,
Bin
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] pci: Add a few new IDs for Intel GPU "spurious interrupt" quirk

2018-09-27 Thread Bin Meng
Hi Bjorn,

On Thu, Sep 27, 2018 at 12:57 AM Bjorn Helgaas  wrote:
>
> [+cc Intel DRM maintainers, etc]
>
> On Wed, Sep 26, 2018 at 08:14:01AM -0700, Bin Meng wrote:
> > Add more PCI IDs to the Intel GPU "spurious interrupt" quirk table,
> > which are known to break.
>
> Do you have a reference for this?  Any public bug reports, bugzilla,
> Intel spec reference or errata?  "Which are known to break" is pretty
> vague.
>

Sorry, I used the wrong words and should have been clearer. These devices
have been validated to be broken. The test I used is very simple: just
unplug the VGA cable and plug it in again, and "spurious interrupt" will
be seen on the interrupt line of the IGD device. I am not aware of
any public bugs filed with Intel, nor have I seen any errata from Intel.

> > See commit f67fd55fa96f ("PCI: Add quirk for still enabled interrupts
> > on Intel Sandy Bridge GPUs"), and commit 7c82126a94e6 ("PCI: Add new
> > ID for Intel GPU "spurious interrupt" quirk") for some history.
> >
> > Based on current findings, it is highly possible that all Intel
> > 1st/2nd/3rd generation Core processors' IGD has such quirk.
>
> Can you include a reference to these "current findings"?  I assume you
> have bug reports that include the device IDs you're adding?  If not,
> how did you build this list of new IDs?
>

By "current findings" I mean that, given the IDs we have here plus the
previous one added by Thomas, it's highly possible this VGA BIOS bug
exists in every 1st/2nd/3rd generation Core processor.

> The function comment added by f67fd55fa96f ("PCI: Add quirk for still
> enabled interrupts on Intel Sandy Bridge GPUs") suggests that this is
> actually a BIOS issue, not a hardware erratum, i.e., I don't see
> anything there that suggests a hardware defect.
>
> But there must be a hole somewhere -- the kernel can't be expected to
> disable interrupts in device-specific ways when there's no driver
> loaded.  Maybe it's simply a BIOS defect or maybe there's some
> interrupt or _PRT-related setup we're missing.
>

It's purely a VGA BIOS bug, not a system BIOS bug or a _PRT issue. The
VGA BIOS forgot to turn off the interrupt on these devices.

> > Signed-off-by: Bin Meng 
> > Cc:  # v3.4+
> > ---
> >
> >  drivers/pci/quirks.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 6bc27b7..c0673a7 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -3190,7 +3190,11 @@ static void disable_igfx_irq(struct pci_dev *dev)
> >
> >   pci_iounmap(dev, regs);
> >  }
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x0042, disable_igfx_irq);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x0046, disable_igfx_irq);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x004a, disable_igfx_irq);
> >  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x0102, disable_igfx_irq);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x0106, disable_igfx_irq);
> >  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x010a, disable_igfx_irq);
> >  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x0152, disable_igfx_irq);
> >
> > --

Regards,
Bin