On 06.11.20 18:32, Pavel Procopiuc wrote:
On 05.11.2020 at 21:23, David Hildenbrand wrote:
So just to make sure I understand you correctly, you'd like to see if the
problem with the ath11k driver on my hardware persists when I boot a pristine
5.10-rc2 kernel (without reverting commit
7fef431be9c9ac255838a9578331567b9dba4477) and with page_alloc.shuffle=1, right?


Right, but because the free lists are randomized, it might take a couple of
tries to reproduce. I'll have a look at the driver code / failing path on
Monday, when I'm back at work.
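Before counting a boot as a shuffle=1 run, it may be worth confirming that the parameter really reached the kernel. A minimal sketch, assuming the command line is available as a string (on a running system, the contents of /proc/cmdline); the helper name boot_param and the sample command line are made up for illustration:

```python
def boot_param(cmdline, name):
    """Return the value of `name` from a kernel command line, or None if absent."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            # A bare flag without '=' is reported as an empty string.
            # If the parameter is repeated, the last occurrence is returned.
            value = val if sep else ""
    return value


# Hypothetical command line, for illustration only.
sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda2 ro page_alloc.shuffle=1"
print(boot_param(sample, "page_alloc.shuffle"))
```

On the test machine the same function could be fed open("/proc/cmdline").read() instead of the sample string.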

I have done 5 boots of pristine 5.10-rc2 with page_alloc.shuffle=1. Of those,
the 1st, 2nd, 4th and 5th resulted in a working ath11k driver; the logs were
the same as with commit 7fef431be9c9ac255838a9578331567b9dba4477 reverted.
The 3rd one failed, but in a different way: I just had no output from the
driver after the initialization lines:

Nov 06 18:19:41 razor kernel: Linux version 5.10.0-rc2 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34 p6) 2.34.0) #8 SMP Fri Nov 6 18:14:36 CET 2020
Nov 06 18:19:41 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
Nov 06 18:19:41 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
Nov 06 18:19:41 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
Nov 06 18:19:41 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
Nov 06 18:19:41 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
Nov 06 18:19:42 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
Nov 06 18:19:42 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
Nov 06 18:19:42 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
Nov 06 18:19:42 razor kernel: mhi 0000:05:00.0: Requested to power ON
Nov 06 18:19:42 razor kernel: mhi 0000:05:00.0: Power on setup success

I have seen this before, and usually it was fixed by rebooting into Windows
and back. This time I simply rebooted into Linux again, and the driver was
working on that boot (the 4th).

I'm sorry, but "WARNING: ath11k PCI support is experimental!" and such occasional issues don't give me the best feeling that everything is operating as it should :)


After that I removed page_alloc.shuffle=1 and did 2 additional boots. Both of
them resulted in a non-working driver, with the same error messages about not
being able to talk to the firmware that I had before on clean 5.10-rc2:

Nov 06 18:24:07 razor kernel: Linux version 5.10.0-rc2 (root@razor) (gcc (Gentoo 9.3.0-r1 p3) 9.3.0, GNU ld (Gentoo 2.34 p6) 2.34.0) #9 SMP Fri Nov 6 18:22:43 CET 2020
Nov 06 18:24:07 razor kernel: pci 0000:05:00.0: [17cb:1101] type 00 class 0x028000
Nov 06 18:24:07 razor kernel: pci 0000:05:00.0: reg 0x10: [mem 0xd2100000-0xd21fffff 64bit]
Nov 06 18:24:07 razor kernel: pci 0000:05:00.0: PME# supported from D0 D3hot D3cold
Nov 06 18:24:07 razor kernel: pci 0000:05:00.0: 4.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x1 link at 0000:00:1c.1 (capable of 7.876 Gb/s with 8.0 GT/s PCIe x1 link)
Nov 06 18:24:07 razor kernel: pci 0000:05:00.0: Adding to iommu group 21
Nov 06 18:24:08 razor kernel: ath11k_pci 0000:05:00.0: WARNING: ath11k PCI support is experimental!
Nov 06 18:24:08 razor kernel: ath11k_pci 0000:05:00.0: BAR 0: assigned [mem 0xd2100000-0xd21fffff 64bit]
Nov 06 18:24:08 razor kernel: ath11k_pci 0000:05:00.0: enabling device (0000 -> 0002)
Nov 06 18:24:08 razor kernel: mhi 0000:05:00.0: Requested to power ON
Nov 06 18:24:08 razor kernel: mhi 0000:05:00.0: Power on setup success
Nov 06 18:24:08 razor kernel: ath11k_pci 0000:05:00.0: Respond mem req failed, result: 1, err: 0
Nov 06 18:24:08 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-22
Nov 06 18:24:13 razor kernel: ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110
Nov 06 18:24:13 razor kernel: ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-110
Nov 06 18:25:39 razor kernel: mhi 0000:05:00.0: Device failed to exit MHI Reset state
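
For readers who do not have the errno numbers memorized: the negative values in the failing log are standard Linux error codes, -22 being -EINVAL and -110 being -ETIMEDOUT. A small sketch that pulls such codes out of log text; the helper name decode_errors and the two-entry table are illustrative only, and the table deliberately covers just the codes seen here:

```python
import re

# Linux errno values as defined in include/uapi/asm-generic/errno-base.h
# (EINVAL) and include/uapi/asm-generic/errno.h (ETIMEDOUT).
# Only the two codes seen in the log are listed; this is not a full table.
LINUX_ERRNO = {
    22: ("EINVAL", "Invalid argument"),
    110: ("ETIMEDOUT", "Connection timed out"),
}

# Matches a negative integer that follows ':' or '=' (optionally separated
# by spaces), e.g. "mem req:-22" or "err = -110".
NEG_CODE = re.compile(r"[:=]\s*-(\d+)\b")


def decode_errors(log_text):
    """Return (code, name, description) for each negative code, in log order."""
    found = []
    for match in NEG_CODE.finditer(log_text):
        code = int(match.group(1))
        name, text = LINUX_ERRNO.get(code, ("UNKNOWN", "not in the table"))
        found.append((-code, name, text))
    return found


sample = (
    "ath11k_pci 0000:05:00.0: qmi failed to respond fw mem req:-22\n"
    "ath11k_pci 0000:05:00.0: qmi failed memory request, err = -110\n"
)
for code, name, text in decode_errors(sample):
    print(code, name, text)
```

Read this way, the log shows the memory-request response being rejected first (-EINVAL), followed about five seconds later by a timeout (-ETIMEDOUT).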


Okay, that means you should also be able to reproduce the failure on a kernel from before commit 7fef431be9c9ac255838a9578331567b9dba4477 with page_alloc.shuffle=1 ... it just might take a lot of tries to get a problematic page.
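
To put "a lot of tries" into numbers: if each boot independently hits a problematic page with probability p, the chance of seeing at least one failure in n boots is 1 - (1 - p)^n. The real p is unknown and the boots reported so far do not pin it down, so the values below are purely hypothetical; the sketch only shows how quickly the required number of boots grows as p shrinks.

```python
import math


def boots_needed(p_fail, confidence=0.95):
    """Smallest n with 1 - (1 - p_fail)**n >= confidence, assuming independent boots."""
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


# Hypothetical per-boot failure probabilities, not measured values.
for p in (0.5, 0.2, 0.05):
    print(p, boots_needed(p))
```

With these made-up probabilities, a 95% chance of at least one reproduction needs 5, 14 and 59 boots respectively.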

I could also imagine that loading the driver later (deferred), after quite some system/mm activity, could result in the same issue.

Looks like something either cannot handle a specific address we received via dma_alloc_coherent(), or something is reading out of bounds, and the content after our allocated page doesn't have the expected value anymore (e.g., used to be zero, now no longer zero).
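
The second possibility can be shown with a toy model. This is not ath11k or firmware code; consumer_accepts is a made-up stand-in for "something" that reads a few bytes past the end of its buffer and only works if they happen to be zero. The page size and overread length are arbitrary assumptions.

```python
PAGE = 4096  # assumed page size for the illustration


def consumer_accepts(memory, start, length, overread=16):
    """Hypothetical consumer: reads `overread` bytes past the buffer and expects zeros."""
    tail = memory[start + length : start + length + overread]
    return all(byte == 0 for byte in tail)


# Case 1: the page after the buffer has never been used and is still zero.
fresh = bytearray(2 * PAGE)

# Case 2: the page after the buffer was used earlier and holds stale data.
reused = bytearray(2 * PAGE)
reused[PAGE:] = b"\xaa" * PAGE

print(consumer_accepts(fresh, 0, PAGE))
print(consumer_accepts(reused, 0, PAGE))
```

In this model the buffer itself is identical in both cases; only its neighbour differs. A change in which pages the allocator hands out could therefore expose such a bug without the allocation itself being wrong.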

What puzzles me is that "err: 0". That should have been properly set by HW, no?

--
Thanks,

David / dhildenb
