Bug#1029968: And some patches
As well as the fixes in 6.6, we also need this patch-up series: https://lore.kernel.org/linux-media/ZWibhE350L3BTRK8@gallifrey/T/#t
These seem to make it work pretty nicely.
-- -Open up your eyes, open up your mind, open up your code --- / Dr. David Alan Gilbert| Running GNU/Linux | Happy \ \dave @ treblig.org | | In Hex / \ _|_ http://www.treblig.org |___/
Bug#1029968: fixed in 6.6
This looks like it's fixed in 6.6; I think there was a major rewrite in there. It's a conversion to vb2, in the series starting with commit d1846d72587e9241e73a18da14a325b43700013b. There are a couple of minor oddities with that (they list the sequence cost the bttv had), but that's relatively minor.

Dave
Bug#1029968: Info received (Bug#1029968: Info received (Bug#1029968: Info received (Bug#1029968: Acknowledgement (bttv/v4l: WARNING: CPU: 6 PID: 6164 at mm/vmalloc.c:487 __vmap_pages_range_noflush+0x3
It's a bit messy...

a) The patch I bisected to is not the root cause of the bug; it just triggers a ~9-year-old bug in the v4l code, so this patch isn't going to get changed.
b) The ~9-year-old bug is in a particularly hairy piece of memory-management code in v4l that I doubt anyone is going to fix.
c) The plan is for all the drivers using that API to be either retired or rewritten using a new API; that's already been done for some of the drivers, and the bttv one is a few months out. I'm not sure that's any use to this version of Debian, though.
d) The workarounds are:
   1) Disable the IOMMU.
   2) Some v4l tools can use an mmap interface rather than the read(2) interface; that seems to be OK.

Dave
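Workaround 1 comes down to booting with the intel_iommu=off kernel parameter. On a Debian system using GRUB that is typically done by adding it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and then running update-grub. The Python below is a minimal sketch of just the string edit, assuming a simple one-line GRUB_CMDLINE_LINUX_DEFAULT="..." entry with no quoted values containing spaces; add_kernel_param is a hypothetical helper, not part of any Debian tool.

```python
def add_kernel_param(line: str, param: str) -> str:
    """Return a GRUB_CMDLINE_LINUX_DEFAULT="..." line with `param` added.

    Hypothetical helper: it edits the string only; writing the file back and
    running update-grub is left to the admin. Assumes no parameter value
    contains spaces inside quotes.
    """
    key, sep, value = line.partition("=")
    if not sep or key.strip() != "GRUB_CMDLINE_LINUX_DEFAULT":
        raise ValueError("expected a GRUB_CMDLINE_LINUX_DEFAULT=... line")
    # Strip the surrounding double quotes, then split on whitespace.
    params = value.strip().strip('"').split()
    # Drop any existing setting for the same parameter name
    # (e.g. an earlier intel_iommu=on) so only the new value remains.
    name = param.split("=", 1)[0]
    params = [p for p in params if p.split("=", 1)[0] != name]
    params.append(param)
    return f'{key.strip()}="{" ".join(params)}"'
```

For example, `GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"` becomes `GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=off"`; removing the old entry avoids leaving two conflicting settings on the command line.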
Bug#1029968: Info received (Bug#1029968: Info received (Bug#1029968: Info received (Bug#1029968: Acknowledgement (bttv/v4l: WARNING: CPU: 6 PID: 6164 at mm/vmalloc.c:487 __vmap_pages_range_noflush+0x3
I sent this upstream report: https://lore.kernel.org/linux-iommu/Y9qSwkLxeMpffZK%2F@gallifrey/T/#u
Bug#1029968: Info received (Bug#1029968: Info received (Bug#1029968: Acknowledgement (bttv/v4l: WARNING: CPU: 6 PID: 6164 at mm/vmalloc.c:487 __vmap_pages_range_noflush+0x3e0/0x4d0)))
Confirmed: it still happens on upstream 6.2.0-rc6.
Bug#1029968: Acknowledgement (bttv/v4l: WARNING: CPU: 6 PID: 6164 at mm/vmalloc.c:487 __vmap_pages_range_noflush+0x3e0/0x4d0)
* Diederik de Haas (didi.deb...@cknow.org) wrote:
> Thanks for that thorough analyses!

Thanks for the reply,

> If you're 'penguin42' on IRC,

Yep, that's me.

> then I'd suggest to present your findings to
> io...@lists.linux.dev as both the author and the reviewer are highly likely
> subscribed to that list.
>
> scripts/get_maintainer.pl drivers/iommu/dma-iommu.c
> scripts/get_maintainer.pl kernel/dma/Makefile
>
> list them both and both results have also that ML in their result.

Yep, will do; I'm just going to try a 6.2-rc as well, in case it's been fixed very recently, and have a poke about in case I can see any obvious cause now that I know which change triggered it. I'll include the linux-media list as well, since it's just as likely to be a fault in the v4l/bttv driver.

Dave

> HTH
Bug#1029968: Info received (Bug#1029968: Acknowledgement (bttv/v4l: WARNING: CPU: 6 PID: 6164 at mm/vmalloc.c:487 __vmap_pages_range_noflush+0x3e0/0x4d0))
0() GS:9679efdc() knlGS:
[ 78.988343] CS: 0010 DS: ES: CR0: 80050033
[ 78.988346] CR2: bd7fc110 CR3: 00022ce02000 CR4: 06e0
Bug#1029968: Acknowledgement (bttv/v4l: WARNING: CPU: 6 PID: 6164 at mm/vmalloc.c:487 __vmap_pages_range_noflush+0x3e0/0x4d0)
bisected:
GOOD [37fcacb50be7071d146144a6c5c5bf0194b9a1cf] phy: PHY_FSL_LYNX_28G should depend on ARCH_LAYERSCAPE
BAD [f5ff79fddf0efecca538046b5cc20fb3ded2ec4f] dma-mapping: remove CONFIG_DMA_REMAP
GOOD [e62c17f0455a74b182ce6373e2777817256afaa1] MAINTAINERS: update maintainer list of DMA MAPPING BENCHMARK
GOOD [0fb3436b4b36cf69f4544385aa2bb8c5a4913509] sparc: Remove usage of the deprecated "pci-dma-compat.h" API
GOOD [fba09099c6e506608e05e08ac717bf34501f821b] media: v4l2-pci-skeleton: Remove usage of the deprecated "pci-dma-compat.h" API

dg@major:~/kernel/kernel-clone$ git bisect good
f5ff79fddf0efecca538046b5cc20fb3ded2ec4f is the first bad commit
commit f5ff79fddf0efecca538046b5cc20fb3ded2ec4f
Author: Christoph Hellwig
Date:   Sat Feb 26 16:40:21 2022 +0100

    dma-mapping: remove CONFIG_DMA_REMAP

That sounds like a believable cause, given that it's both IOMMU-related and device-related.
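The GOOD/BAD lines above are git bisect halving the range between a known-good and a known-bad kernel until one commit is left. The sketch below models that search in Python, assuming a linear history (real git bisect works over the merge graph) and a bug that stays present once introduced; first_bad and is_bad are hypothetical names, with is_bad standing in for building, booting and running the reproducer.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_bad: Callable[[T], bool]) -> Tuple[T, int]:
    """Binary-search for the first bad commit in a linear history.

    Assumes commits[0] is known good, commits[-1] is known bad, and that the
    bug, once introduced, is present in every later commit.
    Returns (first bad commit, number of commits that had to be tested).
    """
    if len(commits) < 2:
        raise ValueError("need at least a known-good and a known-bad commit")
    good, bad = 0, len(commits) - 1
    tested = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tested += 1  # in real life: build, boot, run the reproducer
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad], tested
```

Each test halves the remaining range, so the number of kernels to build grows only with the logarithm of the number of commits between the two known points.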
Bug#1029968: Acknowledgement (bttv/v4l: WARNING: CPU: 6 PID: 6164 at mm/vmalloc.c:487 __vmap_pages_range_noflush+0x3e0/0x4d0)
Upstream 5.17 works.
Upstream 5.18 fails (with intel_iommu=on).
Let the bisect begin.
Bug#1029968: Acknowledgement (bttv/v4l: WARNING: CPU: 6 PID: 6164 at mm/vmalloc.c:487 __vmap_pages_range_noflush+0x3e0/0x4d0)
This is IOMMU-related. Upstream 6.1 and 5.18 *do* exhibit the bug, but only with intel_iommu=on, whereas Debian seems to default it to on.
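Since the result depends on whether the Intel IOMMU is enabled, it helps to check what the running kernel was booted with. A sketch, assuming the command line is read from /proc/cmdline; intel_iommu_setting is a hypothetical helper, and it reports only the on/off flag from the comma-separated intel_iommu= option list. A result of None means the parameter was not given, so the kernel's built-in default applies, which, per the message above, seems to differ between Debian's kernels and plain upstream builds.

```python
from typing import Optional


def intel_iommu_setting(cmdline: str) -> Optional[str]:
    """Return 'on', 'off', or None if no intel_iommu= on/off flag is given.

    intel_iommu= takes a comma-separated option list (e.g. "on,igfx_off");
    only the on/off flag is reported. Later occurrences override earlier ones.
    """
    setting = None
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if name == "intel_iommu" and sep:
            for opt in value.split(","):
                if opt in ("on", "off"):
                    setting = opt
    return setting
```

On a live system, pass it the contents of /proc/cmdline.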
Bug#1029968: Acknowledgement (bttv/v4l: WARNING: CPU: 6 PID: 6164 at mm/vmalloc.c:487 __vmap_pages_range_noflush+0x3e0/0x4d0)
I built upstream kernels 5.18.0 and 6.1.0, and both of them work for me, which makes this much more painful to track down.
Bug#1029968: bisecting we go...
Note: an easier way to trigger the bug is:

  cat /dev/vbi0 > /dev/null

WORKS https://snapshot.debian.org/archive/debian/20220101T024315Z/pool/main/l/linux-signed-amd64/linux-image-5.10.0-9-amd64_5.10.70-1_amd64.deb
  linux-image-5.10.0-9-amd64_5.10.70-1_amd64.deb (kexec'ing into it works)
WORKS https://snapshot.debian.org/archive/debian/20220101T024315Z/pool/main/l/linux-signed-amd64/linux-image-5.15.0-2-amd64_5.15.5-2_amd64.deb
  5.15.5-2
FAILS https://snapshot.debian.org/archive/debian/20220701T034227Z/pool/main/l/linux-signed-amd64/linux-image-amd64_5.18.5-1_amd64.deb
  [ 98.154835] BUG: unable to handle page fault for address: bc2480f4
  [ 98.154848] #PF: supervisor write access in kernel mode
  [ 98.154853] #PF: error_code(0x000b) - reserved bit violation

That 5.18 failure is a bit different, but the backtrace is similar; somewhere after 5.18 it changed from a BUG to a WARN, but back in 5.15 it just worked.

Dave
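`cat /dev/vbi0 > /dev/null` exercises the driver's plain read(2) path, the same path that the mmap-based workaround avoids. A rough Python equivalent is sketched below as an illustration under assumptions: read_bytes is a hypothetical helper, the 4096-byte chunk size is arbitrary (a capture driver may return short reads or expect larger buffers), and a byte limit is added because a capture device, unlike a file, never reaches end-of-file.

```python
import os


def read_bytes(path: str, limit: int, chunk_size: int = 4096) -> int:
    """Read up to `limit` bytes from `path` using plain read(2) calls.

    Returns the number of bytes actually read (less than `limit` only if
    the source hits end-of-file first).
    """
    fd = os.open(path, os.O_RDONLY)
    total = 0
    try:
        while total < limit:
            buf = os.read(fd, min(chunk_size, limit - total))
            if not buf:  # end-of-file (a capture device normally never gets here)
                break
            total += len(buf)
    finally:
        os.close(fd)
    return total
```

On the affected hardware, `read_bytes("/dev/vbi0", 1 << 20)` should go through the same read(2) path as the cat command; /dev/zero also produces endless data and makes a safe stand-in for trying the helper out.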
Bug#1029968: Acknowledgement (bttv/v4l: WARNING: CPU: 6 PID: 6164 at mm/vmalloc.c:487 __vmap_pages_range_noflush+0x3e0/0x4d0)
WORKS https://snapshot.debian.org/archive/debian/20220601T031637Z/pool/main/l/linux-signed-amd64/linux-image-5.17.0-1-amd64_5.17.3-1_amd64.deb
  5.17.0-1-amd64 #1 SMP PREEMPT Debian 5.17.3-1

So I think it's time to move upstream and bisect between 5.17 and 5.18.

Dave
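The WORKS/FAILS results narrow the regression to the window between the last working and the first failing kernel, which is where the bisect starts. A minimal Python sketch of that reasoning, using the Debian versions reported in this thread; regression_range and version_key are hypothetical helpers, and the approach assumes the regression is monotonic and that every kernel is tested under the same conditions.

```python
from typing import Iterable, Optional, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """'5.17.3-1' -> (5, 17, 3): drop the Debian revision, compare numerically."""
    upstream = version.split("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))


def regression_range(
    results: Iterable[Tuple[str, bool]],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (last working version, first failing version).

    Assumes that once a version fails, every newer version tested under the
    same conditions fails too.
    """
    last_good: Optional[str] = None
    for version, works in sorted(results, key=lambda r: version_key(r[0])):
        if works:
            last_good = version
        else:
            return last_good, version
    return last_good, None
```

With the thread's results (5.10.70-1, 5.15.5-2 and 5.17.3-1 working, 5.18.5-1 failing) it returns ("5.17.3-1", "5.18.5-1"), matching the decision to bisect upstream between 5.17 and 5.18. The same-conditions caveat matters here: plain upstream 5.18 only failed once intel_iommu=on was added.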