Re: [Nouveau] [PATCH] PCI: add prefetch quirk to work around Asus/Nvidia suspend issues

2018-09-06 Thread Thomas Martitz

On 31.08.2018 at 09:30, Daniel Drake wrote:

On over 40 Intel-based Asus products, the nvidia GPU becomes unusable
after S3 suspend/resume. The affected products include multiple
generations of nvidia GPUs and Intel SoCs. After resume, nouveau logs
many errors such as:

 fifo: fault 00 [READ] at 00555000 engine 00 [GR] client 04 
[HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
 DRM: failed to idle channel 0 [DRM]

Similarly, the nvidia proprietary driver also fails after resume
(black screen, 100% CPU usage in Xorg process). We shipped a sample
to Nvidia for diagnosis, and their response indicated that it's a
problem with the parent PCI bridge (on the Intel SoC), not the GPU.

We found a workaround: on resume, rewrite the Intel PCI bridge
'Prefetchable Base Upper 32 Bits' register. In the cases that I checked,
this register has value 0 and we just have to rewrite that value.

It's very strange that rewriting the exact same register value
makes a difference, but it definitely makes the issue go away.
It's not just acting as some kind of memory barrier, because rewriting
other bridge registers does not work around the issue. There's something
magic in this particular register.
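
A minimal user-space sketch of the "rewrite the same value" operation described above, assuming the bridge's configuration space is reachable as a file (on Linux, the sysfs `config` attribute of the bridge device; writing it needs root). The function name is hypothetical and this is not the code from the patch, only an illustration of reading offset 0x28 and writing the identical value back.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Offset of "Prefetchable Base Upper 32 Bits" in a PCI-to-PCI bridge
 * (type 1) configuration header. */
#define PREF_BASE_UPPER32 0x28

/*
 * Read the 32-bit register and write the very same value straight back.
 * config_path is a file exposing the bridge's config space (assumption:
 * the sysfs "config" attribute of the bridge).
 * Returns 0 on success, -1 on any I/O error.
 */
int rewrite_pref_base_upper32(const char *config_path)
{
	uint32_t val;
	int ok;
	int fd = open(config_path, O_RDWR);

	if (fd < 0)
		return -1;

	ok = lseek(fd, PREF_BASE_UPPER32, SEEK_SET) == PREF_BASE_UPPER32 &&
	     read(fd, &val, sizeof(val)) == (ssize_t)sizeof(val) &&
	     lseek(fd, PREF_BASE_UPPER32, SEEK_SET) == PREF_BASE_UPPER32 &&
	     write(fd, &val, sizeof(val)) == (ssize_t)sizeof(val);

	close(fd);
	return ok ? 0 : -1;
}
```

The value is never modified, which mirrors the observation that the register already holds 0 and only the act of writing it seems to matter.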

We examined our database of Asus hardware and identified 43 products
that we believe are affected. Checking the nvidia GPU parent PCI bridge
on each one, in total 5 Intel PCI bridges need quirking as below.
The quirk will run on bridges even where no nvidia GPU is connected,
but it should be harmless, and we at least limit it to only running
on Asus products.

This fix was tested on all the affected models that we have on hand
(X542UQ, UX533FD, X530UN, V272UN).


Hello,

this patch helps on my HP Zbook 14u G5 which otherwise fails to resume 
the dGPU after suspend. In this case it's a radeon gpu (polaris 10). Of 
course I had to remove the check for ASUS, but made no other changes.


With this patch I can successfully run "DRI_PRIME=1 glxinfo | grep -i 
renderer" and see the radeon, as well as "DRI_PRIME=1 glxgears", after 
resuming from suspend. Attempting that without the patch makes the system 
hang for a few seconds followed by lots of powerplay errors in dmesg. 
glxinfo/gears sometimes use the Intel graphics or show a blank window.


FWIW, this problem was discussed at length in bug 
https://bugs.freedesktop.org/show_bug.cgi?id=105760 (it is closed only 
because the original crash is fixed; the root problem is still 
unresolved). I have therefore CC'd Peter Wu and Alex Deucher, who have 
already tried to help me with it.


I think this supports your other mail where you suggest it should be 
done unconditionally.


Thanks for the patch!

Best regards
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH] PCI: add prefetch quirk to work around Asus/Nvidia suspend issues

2018-09-06 Thread Daniel Drake
On Sat, Sep 1, 2018 at 3:12 AM, Bjorn Helgaas wrote:
> Can we tell whether Windows rewrites this register unconditionally at
> resume-time?  If so, it may be more robust for Linux to do the same.
> The whole thing is black magic, which I hate, but if it's our only
> choice, it may be better to have this applied everywhere so we don't
> keep stubbing our toes on new systems that require the quirk.

I checked this with QEMU, adding a PCI-to-PCI bridge (ioh3420) to a
Windows guest:

$ qemu-system-x86_64 -enable-kvm -M q35,accel=kvm -m 2G -vga qxl \
    -cpu host -hda testimg.img \
    -device ioh3420,id=rp1,bus=pcie.0,addr=1c.0,port=1 \
    -trace events=events.txt

events.txt has:
pci_cfg_read
pci_cfg_write

Logged cfg space accesses during boot:
https://gist.github.com/dsd/135fb255cb2b237567d8ea2d6bfc6917#file-boot-txt

Suspend:
https://gist.github.com/dsd/135fb255cb2b237567d8ea2d6bfc6917#file-suspend-txt

Resume:
https://gist.github.com/dsd/135fb255cb2b237567d8ea2d6bfc6917#file-resume-txt

Notably during resume, the prefetch-related registers get rewritten:
  pci_cfg_write ioh3420 28:0 @0x24 <- 0xfeb0fea0
  pci_cfg_write ioh3420 28:0 @0x28 <- 0x0
  pci_cfg_write ioh3420 28:0 @0x2c <- 0x0

This happened even though there was nothing behind the bridge.
Windows failed to resume in this test (black screen) but the traced
register writes seem indicative enough.
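
For readers decoding the three traced writes: a small sketch, assuming the standard type 1 (bridge) header layout, that turns the values written at 0x24, 0x28 and 0x2c into a prefetchable window. The offset names follow Linux's pci_regs.h; the struct and function names are hypothetical.

```c
#include <stdint.h>

#define PCI_PREF_MEMORY_BASE   0x24 /* 16-bit base; 16-bit limit follows at 0x26 */
#define PCI_PREF_BASE_UPPER32  0x28 /* upper 32 bits of the base */
#define PCI_PREF_LIMIT_UPPER32 0x2c /* upper 32 bits of the limit */

struct pref_window {
	uint64_t base;  /* first address forwarded by the bridge */
	uint64_t limit; /* last address forwarded by the bridge */
};

/*
 * base_limit is the 32-bit value written at 0x24 (base in the low half,
 * limit in the high half). Bits 15:4 of each half are address bits 31:20;
 * the limit's low 20 address bits are implied to be all ones.
 */
struct pref_window decode_pref_window(uint32_t base_limit,
				      uint32_t base_upper,
				      uint32_t limit_upper)
{
	struct pref_window w;
	uint32_t base_lo = (base_limit & 0xfff0u) << 16;
	uint32_t limit_lo = (base_limit & 0xfff00000u) | 0x000fffffu;

	w.base = ((uint64_t)base_upper << 32) | base_lo;
	w.limit = ((uint64_t)limit_upper << 32) | limit_lo;
	return w;
}
```

With the traced values (0xfeb0fea0, 0x0, 0x0) this gives a window from 0xfea00000 to 0xfebfffff.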

Peter Wu confirms the same results in a similar experiment:
https://marc.info/?l=linux-pci&m=153616336225386&w=2

I'll look into creating a new patch that unconditionally reprograms
the PCI bridge prefetch registers on resume.

Thanks
Daniel


Re: [Nouveau] [PATCH] PCI: add prefetch quirk to work around Asus/Nvidia suspend issues

2018-09-04 Thread kbuild test robot
Hi Daniel,

I love your patch! Perhaps something to improve:

[auto build test WARNING on pci/next]
[also build test WARNING on v4.19-rc2 next-20180831]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Daniel-Drake/PCI-add-prefetch-quirk-to-work-around-Asus-Nvidia-suspend-issues/20180901-043245
base:   https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
config: x86_64-randconfig-s5-09031857 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 
:: branch date: 3 days ago
:: commit date: 3 days ago

All warnings (new ones prefixed by >>):

   In file included from include/linux/export.h:45:0,
from include/linux/linkage.h:7,
from include/linux/kernel.h:7,
from drivers/pci/quirks.c:16:
   drivers/pci/quirks.c: In function 'quirk_asus_pci_prefetch':
   drivers/pci/quirks.c:5134:6: warning: argument 1 null where non-null 
expected [-Wnonnull]
 if (strcmp(sys_vendor, "ASUSTeK COMPUTER INC.") != 0)
 ^~~
   include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^~~~
>> drivers/pci/quirks.c:5134:2: note: in expansion of macro 'if'
 if (strcmp(sys_vendor, "ASUSTeK COMPUTER INC.") != 0)
 ^~
   In file included from include/linux/uuid.h:20:0,
from include/linux/mod_devicetable.h:13,
from include/linux/pci.h:21,
from drivers/pci/quirks.c:18:
   include/linux/string.h:44:12: note: in a call to function 'strcmp' declared 
here
extern int strcmp(const char *,const char *);
   ^~

# 
https://github.com/0day-ci/linux/commit/eccd2a8c40e1a705a666e6fe1c52aca3f2130984
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout eccd2a8c40e1a705a666e6fe1c52aca3f2130984
vim +/if +5134 drivers/pci/quirks.c

e7aaf90f9 Bjorn Helgaas 2018-08-15  4983  
e7aaf90f9 Bjorn Helgaas 2018-08-15  4984  /*
ad281ecf1 Doug Meyer    2018-05-23  4985   * Microsemi Switchtec NTB uses devfn proxy IDs to move TLPs between
ad281ecf1 Doug Meyer    2018-05-23  4986   * NT endpoints via the internal switch fabric. These IDs replace the
ad281ecf1 Doug Meyer    2018-05-23  4987   * originating requestor ID TLPs which access host memory on peer NTB
ad281ecf1 Doug Meyer    2018-05-23  4988   * ports. Therefore, all proxy IDs must be aliased to the NTB device
ad281ecf1 Doug Meyer    2018-05-23  4989   * to permit access when the IOMMU is turned on.
ad281ecf1 Doug Meyer    2018-05-23  4990   */
ad281ecf1 Doug Meyer    2018-05-23  4991  static void quirk_switchtec_ntb_dma_alias(struct pci_dev *pdev)
ad281ecf1 Doug Meyer    2018-05-23  4992  {
ad281ecf1 Doug Meyer    2018-05-23  4993  	void __iomem *mmio;
ad281ecf1 Doug Meyer    2018-05-23  4994  	struct ntb_info_regs __iomem *mmio_ntb;
ad281ecf1 Doug Meyer    2018-05-23  4995  	struct ntb_ctrl_regs __iomem *mmio_ctrl;
ad281ecf1 Doug Meyer    2018-05-23  4996  	struct sys_info_regs __iomem *mmio_sys_info;
ad281ecf1 Doug Meyer    2018-05-23  4997  	u64 partition_map;
ad281ecf1 Doug Meyer    2018-05-23  4998  	u8 partition;
ad281ecf1 Doug Meyer    2018-05-23  4999  	int pp;
ad281ecf1 Doug Meyer    2018-05-23  5000  
ad281ecf1 Doug Meyer    2018-05-23  5001  	if (pci_enable_device(pdev)) {
ad281ecf1 Doug Meyer    2018-05-23  5002  		pci_err(pdev, "Cannot enable Switchtec device\n");
ad281ecf1 Doug Meyer    2018-05-23  5003  		return;
ad281ecf1 Doug Meyer    2018-05-23  5004  	}
ad281ecf1 Doug Meyer    2018-05-23  5005  
ad281ecf1 Doug Meyer    2018-05-23  5006  	mmio = pci_iomap(pdev, 0, 0);
ad281ecf1 Doug Meyer    2018-05-23  5007  	if (mmio == NULL) {
ad281ecf1 Doug Meyer    2018-05-23  5008  		pci_disable_device(pdev);
ad281ecf1 Doug Meyer    2018-05-23  5009  		pci_err(pdev, "Cannot iomap Switchtec device\n");
ad281ecf1 Doug Meyer    2018-05-23  5010  		return;
ad281ecf1 Doug Meyer    2018-05-23  5011  	}
ad281ecf1 Doug Meyer    2018-05-23  5012  
ad281ecf1 Doug Meyer    2018-05-23  5013  	pci_info(pdev, "Setting Switchtec proxy ID aliases\n");
ad281ecf1 Doug Meyer    2018-05-23  5014  
ad281ecf1 Doug Meyer    2018-05-23  5015  	mmio_ntb = mmio + SWITCHTEC_GAS_NTB_OFFSET;
ad281ecf1 Doug Meyer    2018-05-23  5016  	mmio_ctrl = (void __iomem *) mmio_ntb + SWITCHTEC_NTB_REG_CTRL_OFFSET;
ad281ecf1 Doug Meyer    2018-05-23  5017  	mmio_sys_info = mmio + SWITCHTEC_GAS_SYS_INFO_OFFSET;
ad281ecf1 Doug Meyer    2018-05-23  5018  
ad281ecf1 Doug Meyer    2018-05-23  5019  	partition = ioread8(

Re: [Nouveau] [PATCH] PCI: add prefetch quirk to work around Asus/Nvidia suspend issues

2018-09-04 Thread Mika Westerberg
On Tue, Sep 04, 2018 at 03:07:52PM +0800, Daniel Drake wrote:
> On Tue, Sep 4, 2018 at 2:43 PM, Mika Westerberg wrote:
> > Yes, can you check if the failing device BAR is included in any of the
> > above entries? If not then it is probably not related.
> 
> mtrr again for reference:
> reg00: base=0x0c0000000 ( 3072MB), size= 1024MB, count=1: uncachable
> reg01: base=0x0a0000000 ( 2560MB), size=  512MB, count=1: uncachable
> reg02: base=0x090000000 ( 2304MB), size=  256MB, count=1: uncachable
> reg03: base=0x08c000000 ( 2240MB), size=   64MB, count=1: uncachable
> reg04: base=0x08b800000 ( 2232MB), size=    8MB, count=1: uncachable
> 
> 
> The PCI bridge is:
> 00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express
> Root Port (rev f1) (prog-if 00 [Normal decode])
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR- FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> SERR-  Latency: 0, Cache Line Size: 64 bytes
> Interrupt: pin A routed to IRQ 122
> Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> I/O behind bridge: e000-efff
> Memory behind bridge: ee000000-ef0fffff
> Prefetchable memory behind bridge: d0000000-e1ffffff
> 
> The memory behind bridge at ee000000 is included in the mtrr region
> reg00 which is 0xc0000000 to 0xffffffff.
> Same for the prefetchable memory behind bridge.

Yeah and it is uncachable so it should be fine.

> The nvidia GPU which becomes unresponsive is:
> 
> 01:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940MX] (rev a2)
> Subsystem: ASUSTeK Computer Inc. GM108M [GeForce 940MX]
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
> ParErr- Stepping- SERR- FastB2B- DisINTx+
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
> SERR-  Latency: 0, Cache Line Size: 64 bytes
> Interrupt: pin A routed to IRQ 133
> Region 0: Memory at ee000000 (32-bit, non-prefetchable) [size=16M]
> Region 1: Memory at d0000000 (64-bit, prefetchable) [size=256M]
> Region 3: Memory at e0000000 (64-bit, prefetchable) [size=32M]
> Region 5: I/O ports at e000 [size=128]
> Expansion ROM at ef000000 [disabled] [size=512K]
> 
> Region 0, 1, 3 and the expansion ROM are all included in the mtrr region 
> reg00.
> 
> 
> The magic register that we write to workaround the issue is in PCI
> bridge config space - not in a BAR.

OK, I just wanted to rule out MTRR misconfiguration but I guess it is
not the case here.
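
The containment check discussed in this exchange can be sketched as follows. This is an illustrative helper only (the function name is hypothetical); the full-width addresses in the usage note are my reading of the lspci and /proc/mtrr output, with the MTRR base taken from the size column (3072MB is 0xc0000000).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the inclusive address range [start, end] lies entirely
 * inside the region that begins at base and spans size bytes.
 */
bool range_within(uint64_t start, uint64_t end, uint64_t base, uint64_t size)
{
	/* end >= start >= base is established before end - base is used,
	 * so the subtraction cannot wrap. */
	return start <= end && start >= base && end - base < size;
}
```

For example, the bridge's memory window ee000000-ef0fffff falls inside reg00 (base 3072MB, size 1024MB) but not inside reg01 (base 2560MB, size 512MB).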


Re: [Nouveau] [PATCH] PCI: add prefetch quirk to work around Asus/Nvidia suspend issues

2018-09-04 Thread Daniel Drake
On Tue, Sep 4, 2018 at 2:43 PM, Mika Westerberg wrote:
> Yes, can you check if the failing device BAR is included in any of the
> above entries? If not then it is probably not related.

mtrr again for reference:
reg00: base=0x0c0000000 ( 3072MB), size= 1024MB, count=1: uncachable
reg01: base=0x0a0000000 ( 2560MB), size=  512MB, count=1: uncachable
reg02: base=0x090000000 ( 2304MB), size=  256MB, count=1: uncachable
reg03: base=0x08c000000 ( 2240MB), size=   64MB, count=1: uncachable
reg04: base=0x08b800000 ( 2232MB), size=    8MB, count=1: uncachable


The PCI bridge is:
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express
Root Port (rev f1) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
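Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
SERR-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 122
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: e000-efff
Memory behind bridge: ee000000-ef0fffff
Prefetchable memory behind bridge: d0000000-e1ffffff

The memory behind bridge at ee000000 is included in the mtrr region
reg00 which is 0xc0000000 to 0xffffffff.
Same for the prefetchable memory behind bridge.

The nvidia GPU which becomes unresponsive is:

01:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940MX] (rev a2)
Subsystem: ASUSTeK Computer Inc. GM108M [GeForce 940MX]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
SERR-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 133
Region 0: Memory at ee000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at d0000000 (64-bit, prefetchable) [size=256M]
Region 3: Memory at e0000000 (64-bit, prefetchable) [size=32M]
Region 5: I/O ports at e000 [size=128]
Expansion ROM at ef000000 [disabled] [size=512K]

Region 0, 1, 3 and the expansion ROM are all included in the mtrr region
reg00.

The magic register that we write to workaround the issue is in PCI
bridge config space - not in a BAR.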


Re: [Nouveau] [PATCH] PCI: add prefetch quirk to work around Asus/Nvidia suspend issues

2018-09-03 Thread Mika Westerberg
On Tue, Sep 04, 2018 at 09:52:02AM +0800, Daniel Drake wrote:
> # cat /proc/mtrr
> reg00: base=0x0c0000000 ( 3072MB), size= 1024MB, count=1: uncachable
> reg01: base=0x0a0000000 ( 2560MB), size=  512MB, count=1: uncachable
> reg02: base=0x090000000 ( 2304MB), size=  256MB, count=1: uncachable
> reg03: base=0x08c000000 ( 2240MB), size=   64MB, count=1: uncachable
> reg04: base=0x08b800000 ( 2232MB), size=    8MB, count=1: uncachable
> 
> # cat /sys/kernel/debug/x86/pat_memtype_list
> PAT memtype list:
> write-back @ 0x84a23000-0x84a24000
> write-back @ 0x8ad34000-0x8ad60000
> write-back @ 0x8ad5f000-0x8ad66000
> write-back @ 0x8ad5f000-0x8ad60000
> write-back @ 0x8ad65000-0x8ad6a000
> write-back @ 0x8ad69000-0x8ad6b000
> write-back @ 0x8ad6a000-0x8ad6c000
> write-back @ 0x8ad6b000-0x8ad6e000
> write-back @ 0x8ad9c000-0x8ad9d000
> write-back @ 0x8adce000-0x8adcf000
> write-back @ 0x8adcf000-0x8add0000
> write-back @ 0x8adcf000-0x8add2000
> write-back @ 0x8add3000-0x8add4000
> write-back @ 0x8ae04000-0x8ae05000
> write-back @ 0x8b208000-0x8b209000
> write-combining @ 0xc0000000-0xd0000000
> write-combining @ 0xd0000000-0xe0000000
> write-combining @ 0xe0000000-0xe0040000
> write-combining @ 0xe0040000-0xe0050000
> write-combining @ 0xe0050000-0xe0051000
> write-combining @ 0xe0051000-0xe0151000
> write-combining @ 0xe0151000-0xe0191000
> write-combining @ 0xe0191000-0xe01a1000
> write-combining @ 0xe01a1000-0xe01b1000
> write-combining @ 0xe01b1000-0xe01c1000
> write-combining @ 0xe01c1000-0xe01c3000
> write-combining @ 0xe01c3000-0xe01c5000
> write-combining @ 0xe01c5000-0xe01cd000
> write-combining @ 0xe01cd000-0xe01d5000
> write-combining @ 0xe01d5000-0xe01dd000
> write-combining @ 0xe01dd000-0xe01e5000
> write-combining @ 0xe01e5000-0xe01ed000
> write-combining @ 0xe01ed000-0xe01f5000
> write-combining @ 0xe01f5000-0xe01fd000
> write-combining @ 0xe01fd000-0xe0205000
> write-combining @ 0xe0205000-0xe020d000
> write-combining @ 0xe020d000-0xe0215000
> uncached-minus @ 0xed000000-0xed200000
> write-combining @ 0xed800000-0xee000000
> uncached-minus @ 0xee000000-0xef000000
> uncached-minus @ 0xef200000-0xef400000
> uncached-minus @ 0xef400000-0xef401000
> uncached-minus @ 0xef404000-0xef405000
> uncached-minus @ 0xef510000-0xef520000
> uncached-minus @ 0xef528000-0xef52c000
> uncached-minus @ 0xef533000-0xef534000
> uncached-minus @ 0xef533000-0xef534000
> uncached-minus @ 0xef533000-0xef534000
> uncached-minus @ 0xef534000-0xef535000
> uncached-minus @ 0xef534000-0xef535000
> uncached-minus @ 0xef534000-0xef535000
> uncached-minus @ 0xef535000-0xef536000
> uncached-minus @ 0xef537000-0xef538000
> uncached-minus @ 0xef538000-0xef539000
> uncached-minus @ 0xef538000-0xef539000
> uncached-minus @ 0xef538000-0xef539000
> uncached-minus @ 0xef539000-0xef53a000
> uncached-minus @ 0xef539000-0xef53a000
> uncached-minus @ 0xef539000-0xef53a000
> uncached-minus @ 0xef53a000-0xef53b000
> uncached-minus @ 0xf0000000-0xf8000000
> uncached-minus @ 0xf00e0000-0xf00e1000
> uncached-minus @ 0xf0100000-0xf0101000
> uncached-minus @ 0xf0101000-0xf0102000
> uncached-minus @ 0xfdac0000-0xfdad0000
> uncached-minus @ 0xfdae0000-0xfdaf0000
> uncached-minus @ 0xfdaf0000-0xfdb00000
> uncached-minus @ 0xfdc43000-0xfdc44000
> uncached-minus @ 0xfe000000-0xfe001000
> uncached-minus @ 0xfe000000-0xfe001000
> uncached-minus @ 0xfed00000-0xfed01000
> uncached-minus @ 0xfed15000-0xfed16000
> uncached-minus @ 0xfed40000-0xfed41000
> uncached-minus @ 0xfed90000-0xfed91000
> uncached-minus @ 0xfed91000-0xfed92000
> 
> Is that the info you were looking for?

Yes, can you check if the failing device BAR is included in any of the
above entries? If not then it is probably not related.


Re: [Nouveau] [PATCH] PCI: add prefetch quirk to work around Asus/Nvidia suspend issues

2018-09-03 Thread Daniel Drake
On Mon, Sep 3, 2018 at 8:12 PM, Mika Westerberg wrote:
> We have seen one similar issue with LPSS devices when BIOS assigns
> device BARs above 4G (which is not the case here) and it turned out to
> be misconfigured MTRR register or something like that. It may not be
> related at all but it could be worth a try to dump out MTRR registers of
> one of the affected systems and see if the memory areas are listed there
> (and if the attributes are somehow wrong if found).

From Asus X542UQ:

# cat /proc/mtrr
reg00: base=0x0c0000000 ( 3072MB), size= 1024MB, count=1: uncachable
reg01: base=0x0a0000000 ( 2560MB), size=  512MB, count=1: uncachable
reg02: base=0x090000000 ( 2304MB), size=  256MB, count=1: uncachable
reg03: base=0x08c000000 ( 2240MB), size=   64MB, count=1: uncachable
reg04: base=0x08b800000 ( 2232MB), size=    8MB, count=1: uncachable

# cat /sys/kernel/debug/x86/pat_memtype_list
PAT memtype list:
write-back @ 0x84a23000-0x84a24000
write-back @ 0x8ad34000-0x8ad60000
write-back @ 0x8ad5f000-0x8ad66000
write-back @ 0x8ad5f000-0x8ad60000
write-back @ 0x8ad65000-0x8ad6a000
write-back @ 0x8ad69000-0x8ad6b000
write-back @ 0x8ad6a000-0x8ad6c000
write-back @ 0x8ad6b000-0x8ad6e000
write-back @ 0x8ad9c000-0x8ad9d000
write-back @ 0x8adce000-0x8adcf000
write-back @ 0x8adcf000-0x8add0000
write-back @ 0x8adcf000-0x8add2000
write-back @ 0x8add3000-0x8add4000
write-back @ 0x8ae04000-0x8ae05000
write-back @ 0x8b208000-0x8b209000
write-combining @ 0xc0000000-0xd0000000
write-combining @ 0xd0000000-0xe0000000
write-combining @ 0xe0000000-0xe0040000
write-combining @ 0xe0040000-0xe0050000
write-combining @ 0xe0050000-0xe0051000
write-combining @ 0xe0051000-0xe0151000
write-combining @ 0xe0151000-0xe0191000
write-combining @ 0xe0191000-0xe01a1000
write-combining @ 0xe01a1000-0xe01b1000
write-combining @ 0xe01b1000-0xe01c1000
write-combining @ 0xe01c1000-0xe01c3000
write-combining @ 0xe01c3000-0xe01c5000
write-combining @ 0xe01c5000-0xe01cd000
write-combining @ 0xe01cd000-0xe01d5000
write-combining @ 0xe01d5000-0xe01dd000
write-combining @ 0xe01dd000-0xe01e5000
write-combining @ 0xe01e5000-0xe01ed000
write-combining @ 0xe01ed000-0xe01f5000
write-combining @ 0xe01f5000-0xe01fd000
write-combining @ 0xe01fd000-0xe0205000
write-combining @ 0xe0205000-0xe020d000
write-combining @ 0xe020d000-0xe0215000
uncached-minus @ 0xed000000-0xed200000
write-combining @ 0xed800000-0xee000000
uncached-minus @ 0xee000000-0xef000000
uncached-minus @ 0xef200000-0xef400000
uncached-minus @ 0xef400000-0xef401000
uncached-minus @ 0xef404000-0xef405000
uncached-minus @ 0xef510000-0xef520000
uncached-minus @ 0xef528000-0xef52c000
uncached-minus @ 0xef533000-0xef534000
uncached-minus @ 0xef533000-0xef534000
uncached-minus @ 0xef533000-0xef534000
uncached-minus @ 0xef534000-0xef535000
uncached-minus @ 0xef534000-0xef535000
uncached-minus @ 0xef534000-0xef535000
uncached-minus @ 0xef535000-0xef536000
uncached-minus @ 0xef537000-0xef538000
uncached-minus @ 0xef538000-0xef539000
uncached-minus @ 0xef538000-0xef539000
uncached-minus @ 0xef538000-0xef539000
uncached-minus @ 0xef539000-0xef53a000
uncached-minus @ 0xef539000-0xef53a000
uncached-minus @ 0xef539000-0xef53a000
uncached-minus @ 0xef53a000-0xef53b000
uncached-minus @ 0xf0000000-0xf8000000
uncached-minus @ 0xf00e0000-0xf00e1000
uncached-minus @ 0xf0100000-0xf0101000
uncached-minus @ 0xf0101000-0xf0102000
uncached-minus @ 0xfdac0000-0xfdad0000
uncached-minus @ 0xfdae0000-0xfdaf0000
uncached-minus @ 0xfdaf0000-0xfdb00000
uncached-minus @ 0xfdc43000-0xfdc44000
uncached-minus @ 0xfe000000-0xfe001000
uncached-minus @ 0xfe000000-0xfe001000
uncached-minus @ 0xfed00000-0xfed01000
uncached-minus @ 0xfed15000-0xfed16000
uncached-minus @ 0xfed40000-0xfed41000
uncached-minus @ 0xfed90000-0xfed91000
uncached-minus @ 0xfed91000-0xfed92000

Is that the info you were looking for?

Thanks
Daniel


Re: [Nouveau] [PATCH] PCI: add prefetch quirk to work around Asus/Nvidia suspend issues

2018-09-03 Thread Mika Westerberg
On Mon, Sep 03, 2018 at 04:56:32PM +0800, Daniel Drake wrote:
> On Sat, Sep 1, 2018 at 3:12 AM, Bjorn Helgaas wrote:
> > If true, this sounds like some sort of erratum, so it would be good to
> > get some input from Intel, and I cc'd a few Intel folks.
> 
> Yes, it would be great to get their input.

We have seen one similar issue with LPSS devices when BIOS assigns
device BARs above 4G (which is not the case here) and it turned out to
be misconfigured MTRR register or something like that. It may not be
related at all but it could be worth a try to dump out MTRR registers of
one of the affected systems and see if the memory areas are listed there
(and if the attributes are somehow wrong if found).


Re: [Nouveau] [PATCH] PCI: add prefetch quirk to work around Asus/Nvidia suspend issues

2018-09-03 Thread Daniel Drake
On Sat, Sep 1, 2018 at 3:12 AM, Bjorn Helgaas wrote:
> If true, this sounds like some sort of erratum, so it would be good to
> get some input from Intel, and I cc'd a few Intel folks.

Yes, it would be great to get their input.

> It's interesting that all the systems below are from Asus.  That makes
> me think there's some BIOS or SMM connection, e.g., SMM traps the
> register write and does something magic.

Is there a way I can check if there is a SMM trap active for this address?

> Does this problem happen after a full system suspend/resume, or does
> it happen after runtime suspend of only the GPU?  Or runtime suspend
> of only the GPU and the upstream bridge?

runtime suspend/resume works fine. It only happens after S3 suspend.

> Can we tell whether Windows rewrites this register unconditionally at
> resume-time?  If so, it may be more robust for Linux to do the same.
> The whole thing is black magic, which I hate, but if it's our only
> choice, it may be better to have this applied everywhere so we don't
> keep stubbing our toes on new systems that require the quirk.

Any suggestions for how to check this? I booted Windows in
virt-manager, hoping to spy on PCI config space register accesses,
but I don't see an option for S3 suspend there. I'll keep looking
into this.

Thanks
Daniel


Re: [Nouveau] [PATCH] PCI: add prefetch quirk to work around Asus/Nvidia suspend issues

2018-08-31 Thread kbuild test robot
Hi Daniel,

I love your patch! Perhaps something to improve:

[auto build test WARNING on pci/next]
[also build test WARNING on v4.19-rc1 next-20180831]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Daniel-Drake/PCI-add-prefetch-quirk-to-work-around-Asus-Nvidia-suspend-issues/20180901-043245
base:   https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git next
config: x86_64-randconfig-x000-201834 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   drivers/pci/quirks.c: In function 'quirk_asus_pci_prefetch':
>> drivers/pci/quirks.c:5134:6: warning: argument 1 null where non-null 
>> expected [-Wnonnull]
 if (strcmp(sys_vendor, "ASUSTeK COMPUTER INC.") != 0)
 ^~~
   In file included from include/linux/uuid.h:20:0,
from include/linux/mod_devicetable.h:13,
from include/linux/pci.h:21,
from drivers/pci/quirks.c:18:
   include/linux/string.h:44:12: note: in a call to function 'strcmp' declared 
here
extern int strcmp(const char *,const char *);
   ^~
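
The warning says the compiler can see a NULL reaching strcmp() in this randconfig. The patch itself is not shown here, so the following is only a sketch of a NULL-tolerant comparison, written as plain user-space C with a hypothetical helper name; how sys_vendor is obtained in the patch (and why it can be NULL) is an assumption left outside the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true only when a vendor string is present and matches exactly.
 * A NULL sys_vendor is treated as "not Asus" instead of being passed
 * to strcmp(), whose arguments must be non-NULL.
 */
bool vendor_is_asus(const char *sys_vendor)
{
	return sys_vendor != NULL &&
	       strcmp(sys_vendor, "ASUSTeK COMPUTER INC.") == 0;
}
```

A guard of this shape keeps the early-return logic of the quoted `if` while giving the compiler no path on which strcmp() sees NULL.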

vim +5134 drivers/pci/quirks.c

  4983  
  4984  /*
  4985   * Microsemi Switchtec NTB uses devfn proxy IDs to move TLPs between
  4986   * NT endpoints via the internal switch fabric. These IDs replace the
  4987   * originating requestor ID TLPs which access host memory on peer NTB
  4988   * ports. Therefore, all proxy IDs must be aliased to the NTB device
  4989   * to permit access when the IOMMU is turned on.
  4990   */
  4991  static void quirk_switchtec_ntb_dma_alias(struct pci_dev *pdev)
  4992  {
  4993  	void __iomem *mmio;
  4994  	struct ntb_info_regs __iomem *mmio_ntb;
  4995  	struct ntb_ctrl_regs __iomem *mmio_ctrl;
  4996  	struct sys_info_regs __iomem *mmio_sys_info;
  4997  	u64 partition_map;
  4998  	u8 partition;
  4999  	int pp;
  5000  
  5001  	if (pci_enable_device(pdev)) {
  5002  		pci_err(pdev, "Cannot enable Switchtec device\n");
  5003  		return;
  5004  	}
  5005  
  5006  	mmio = pci_iomap(pdev, 0, 0);
  5007  	if (mmio == NULL) {
  5008  		pci_disable_device(pdev);
  5009  		pci_err(pdev, "Cannot iomap Switchtec device\n");
  5010  		return;
  5011  	}
  5012  
  5013  	pci_info(pdev, "Setting Switchtec proxy ID aliases\n");
  5014  
  5015  	mmio_ntb = mmio + SWITCHTEC_GAS_NTB_OFFSET;
  5016  	mmio_ctrl = (void __iomem *) mmio_ntb + SWITCHTEC_NTB_REG_CTRL_OFFSET;
  5017  	mmio_sys_info = mmio + SWITCHTEC_GAS_SYS_INFO_OFFSET;
  5018  
  5019  	partition = ioread8(&mmio_ntb->partition_id);
  5020  
  5021  	partition_map = ioread32(&mmio_ntb->ep_map);
  5022  	partition_map |= ((u64) ioread32(&mmio_ntb->ep_map + 4)) << 32;
  5023  	partition_map &= ~(1ULL << partition);
  5024  
  5025  	for (pp = 0; pp < (sizeof(partition_map) * 8); pp++) {
  5026  		struct ntb_ctrl_regs __iomem *mmio_peer_ctrl;
  5027  		u32 table_sz = 0;
  5028  		int te;
  5029  
  5030  		if (!(partition_map & (1ULL << pp)))
  5031  			continue;
  5032  
  5033  		pci_dbg(pdev, "Processing partition %d\n", pp);
  5034  
  5035  		mmio_peer_ctrl = &mmio_ctrl[pp];
  5036  
  5037  		table_sz = ioread16(&mmio_peer_ctrl->req_id_table_size);
  5038  		if (!table_sz) {
  5039  			pci_warn(pdev, "Partition %d table_sz 0\n", pp);
  5040  			continue;
  5041  		}
  5042  
  5043  		if (table_sz > 512) {
  5044  			pci_warn(pdev,
  5045  				 "Invalid Switchtec partition %d table_sz %d\n",
  5046  				 pp, table_sz);
  5047  			continue;
  5048  		}
  5049  
  5050  		for (te = 0; te < table_sz; te++) {
  5051  			u32 rid_entry;
  5052  			u8 devfn;
  5053  
  5054  			rid_entry = ioread32(&mmio_peer_ctrl->req_id_table[te]);
  5055  			devfn = (rid_entry >> 1) & 0xFF;
  5056  			pci_dbg(pdev,
  5057  				"Aliasing Partition %d Proxy ID %02x.%d\n",
  5058  				pp, PCI_SLOT(devfn), PCI_FUNC(devfn));
  5059  			pci_add_dma_alias(pdev, devfn);
  5060  		}
  5061  	}
  5062  
  5063  	pci_iounmap(pdev, mmio);
  5064  	pci_disable_device(

Re: [Nouveau] [PATCH] PCI: add prefetch quirk to work around Asus/Nvidia suspend issues

2018-08-31 Thread Bjorn Helgaas
[+cc Intel folks]

On Fri, Aug 31, 2018 at 03:30:57PM +0800, Daniel Drake wrote:
> On over 40 Intel-based Asus products, the nvidia GPU becomes unusable
> after S3 suspend/resume. The affected products include multiple
> generations of nvidia GPUs and Intel SoCs. After resume, nouveau logs
> many errors such as:
> 
> fifo: fault 00 [READ] at 00555000 engine 00 [GR] client 04 
> [HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
> DRM: failed to idle channel 0 [DRM]
> 
> Similarly, the nvidia proprietary driver also fails after resume
> (black screen, 100% CPU usage in Xorg process). We shipped a sample
> to Nvidia for diagnosis, and their response indicated that it's a
> problem with the parent PCI bridge (on the Intel SoC), not the GPU.
> 
> We found a workaround: on resume, rewrite the Intel PCI bridge
> 'Prefetchable Base Upper 32 Bits' register. In the cases that I checked,
> this register has value 0 and we just have to rewrite that value.
> 
> It's very strange that rewriting the exact same register value
> makes a difference, but it definitely makes the issue go away.
> It's not just acting as some kind of memory barrier, because rewriting
> other bridge registers does not work around the issue. There's something
> magic in this particular register.

If true, this sounds like some sort of erratum, so it would be good to
get some input from Intel, and I cc'd a few Intel folks.

It's interesting that all the systems below are from Asus.  That makes
me think there's some BIOS or SMM connection, e.g., SMM traps the
register write and does something magic.

Does this problem happen after a full system suspend/resume, or does
it happen after runtime suspend of only the GPU?  Or runtime suspend
of only the GPU and the upstream bridge?

Can we tell whether Windows rewrites this register unconditionally at
resume-time?  If so, it may be more robust for Linux to do the same.
The whole thing is black magic, which I hate, but if it's our only
choice, it may be better to have this applied everywhere so we don't
keep stubbing our toes on new systems that require the quirk.

> We examined our database of Asus hardware and identified 43 products
> that we believe are affected. Checking the nvidia GPU parent PCI bridge
> on each one, in total 5 Intel PCI bridges need quirking as below.
> The quirk will run on bridges even where no nvidia GPU is connected,
> but it should be harmless, and we at least limit it to only running
> on Asus products.
> 
> This fix was tested on all the affected models that we have on hand
> (X542UQ, UX533FD, X530UN, V272UN).
> 
> Signed-off-by: Daniel Drake 
> ---
> 
> Notes:
> If anyone has ideas for why writing this register makes a difference, or
> suggestions for other approaches then I'm all ears...
> 
> Here is some basic info of the 43 products believed to be affected:
> basic DMI data, nvidia GPU PCI info, parent PCI bridge info.

Can you attach the list below to a kernel.org bugzilla and include the
URL in your changelog?

> sys_vendor: ASUSTeK COMPUTER INC.
> board_name: FX502VD
> product_name: FX502VD
> 01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev 
> ff) (prog-if ff)
>   !!! Unknown header type 7f
> 00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) 
> (prog-if 00 [Normal decode])
> 
> sys_vendor: ASUSTeK COMPUTER INC.
> board_name: FX570UD
> product_name: ASUS Gaming FX570UD
> 01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev 
> a1)
>   Subsystem: ASUSTeK Computer Inc. Device [1043:1f40]
> 00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:9d10] (rev f1) 
> (prog-if 00 [Normal decode])
> 
> sys_vendor: ASUSTeK COMPUTER INC.
> board_name: GL553VD
> product_name: GL553VD
> 01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev 
> a1)
>   Subsystem: ASUSTeK Computer Inc. Device [1043:15e0]
> 00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) 
> (prog-if 00 [Normal decode])
> 
> sys_vendor: ASUSTeK COMPUTER INC.
> board_name: GL553VD
> product_name: GL553VD
> 01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev 
> a1)
>   Subsystem: ASUSTeK Computer Inc. Device [1043:15e0]
> 00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) 
> (prog-if 00 [Normal decode])
> 
> sys_vendor: ASUSTeK COMPUTER INC.
> board_name: GL753VD
> product_name: GL753VD
> 01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1c8d] (rev 
> a1)
>   Subsystem: ASUSTeK Computer Inc. Device [1043:1590]
> 00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:1901] (rev 05) 
> (prog-if 00 [Normal decode])
> 
> sys_vendor: ASUSTeK COMPUTER INC.
> board_name: GL753VD
> product_name: GL753VD
> 01:00.0 3D controller [0302]: NVIDIA Corporation Device [10de:1