> -----Original Message-----
> From: Bjorn Helgaas <helg...@kernel.org>
> Sent: Tuesday, April 9, 2019 5:59 PM
> To: Nikolai Kostrigin <nic...@altlinux.org>
> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org;
> jroe...@suse.de; Deucher, Alexander <alexander.deuc...@amd.com>
> Subject: Re: [PATCH RESEND 1/1] PCI: Add ATS-disable quirk for AMD Radeon
> R7 GPUs
> 
> [+cc Alex]
> 
> This claims to be a resend, but I don't see a previous posting.
> 
> There *was* discussion when the quirk was added two years ago for a
> different device.  As part of that, Alex thought only that device would be
> affected and ATS was validated on other GPUs:
> 
> 
> https://lore.kernel.org/lkml/BN6PR12MB165278346BE8A76B1E4412AFF7EA0@BN6PR12MB1652.namprd12.prod.outlook.com/
> 
> On Mon, Apr 08, 2019 at 01:37:25PM +0300, Nikolai Kostrigin wrote:
> > ATS is broken on this hardware (at least for Stoney Ridge based
> > laptop) and causes IOMMU stalls and system failure. Disable ATS on
> > these devices to make them usable again with IOMMU enabled. Thanks to
> > Joerg Roedel <jroe...@suse.de> for help.
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=194521
> >

+ a few AMD people

Seeing this bug report makes things clearer.  I don't think this is a problem
with the GPU; I think it's a problem with either the sbios or the IOMMU.  I
think the original quirk added for Stoney (0x98e4) is probably wrong as well.
I suspect we need a quirk for a particular laptop or for particular sbios
versions.  We validated ATS extensively with Carrizo based systems (the system
in the bug report above is Carrizo based) since it is the basis of our ROCm
support on APUs.  We have also been involved in tons of Linux OEM preloads with
both Carrizo and Stoney based APUs in combination with TOPAZ dGPUs (0x6900) and
haven't seen this issue in those programs.  We also have TOPAZ dGPUs used in
OEM programs with Intel chipsets and haven't seen the issue there either.  I
suspect that since Windows does not use the IOMMU by default, the sbios
settings may not be well validated on certain Windows-only SKUs.  I'd rather
make these DMI matches or something like that for the platform, or at the very
least match the SSIDs as well.
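
To illustrate the SSID idea, here is a rough userspace sketch, not kernel code
and not a proposed patch.  It models matching on the subsystem vendor/device
pair in addition to the vendor/device pair.  The struct, the table, the
function name and especially the subsystem IDs are hypothetical placeholders;
the real values would have to come from the affected laptops.  In the kernel,
the equivalent check would presumably compare pdev->subsystem_vendor and
pdev->subsystem_device inside quirk_no_ats() before disabling ATS.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VENDOR_ATI 0x1002 /* same value as PCI_VENDOR_ID_ATI in the kernel */

/* Userspace stand-in for the ID fields a PCI quirk can match on. */
struct gpu_id {
	uint16_t vendor;
	uint16_t device;
	uint16_t subsystem_vendor;
	uint16_t subsystem_device;
};

/*
 * HYPOTHETICAL list of boards where ATS misbehaves.  The subsystem IDs
 * below are made up for illustration and do not identify real hardware.
 */
static const struct gpu_id no_ats_boards[] = {
	{ VENDOR_ATI, 0x6900, 0x1234, 0x5678 },
	{ VENDOR_ATI, 0x98e4, 0x1234, 0x9abc },
};

/* Return true only when all four IDs match an entry in the table. */
static bool board_needs_no_ats(const struct gpu_id *dev)
{
	size_t n = sizeof(no_ats_boards) / sizeof(no_ats_boards[0]);

	for (size_t i = 0; i < n; i++) {
		const struct gpu_id *e = &no_ats_boards[i];

		if (dev->vendor == e->vendor &&
		    dev->device == e->device &&
		    dev->subsystem_vendor == e->subsystem_vendor &&
		    dev->subsystem_device == e->subsystem_device)
			return true;
	}
	return false;
}
```

With a table like this, a 0x6900 part on any other board keeps ATS enabled,
which is the point of narrowing the match beyond the device ID.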

Alex

> > Signed-off-by: Nikolai Kostrigin <nic...@altlinux.org>
> 
> Joerg, I'm happy to merge this if you would review or ack it.  I don't know
> enough to conclude that this is the root cause.  It'd be nice to have an
> actual AMD erratum.  Maybe it would even have a list of affected devices so we
> could get them all at once so people wouldn't have to trip over them one by
> one.
> 
> > ---
> >  drivers/pci/quirks.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 4700d24e5d55..abb2532e16bf 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -4876,6 +4876,7 @@ static void quirk_no_ats(struct pci_dev *pdev)
> >
> >  /* AMD Stoney platform GPU */
> >  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_no_ats);
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x6900, quirk_no_ats);
> >  #endif /* CONFIG_PCI_ATS */
> >
> >  /* Freescale PCIe doesn't support MSI in RC mode */
> > --
> > 2.21.0
> >
