Bug#1029968: And some patches

2023-12-03 Thread Dr. David Alan Gilbert
As well as the fixes in 6.6, we also need this follow-up patch series:

https://lore.kernel.org/linux-media/ZWibhE350L3BTRK8@gallifrey/T/#t

With these applied, it seems to work pretty nicely.
-- 
 -Open up your eyes, open up your mind, open up your code ---   
/ Dr. David Alan Gilbert|   Running GNU/Linux   | Happy  \ 
\dave @ treblig.org |   | In Hex /
 \ _|_ http://www.treblig.org   |___/



Bug#1029968: fixed in 6.6

2023-11-12 Thread Dr. David Alan Gilbert
This looks like it's fixed in 6.6; I think there was a major rewrite
in there.
It's the conversion of bttv to vb2 (videobuf2), in the series starting with
d1846d72587e9241e73a18da14a325b43700013b

There are a couple of oddities with that
(they lost the sequence count the bttv had), but they're relatively minor.

Dave




Bug#1029968: Info received (Bug#1029968: Info received (Bug#1029968: Info received (Bug#1029968: Acknowledgement (bttv/v4l: WARNING: CPU: 6 PID: 6164 at mm/vmalloc.c:487 __vmap_pages_range_noflush+0x3

2023-02-03 Thread Dr. David Alan Gilbert
It's a bit messy...

a) The patch I bisected to is not the root cause of the bug; it just
triggers a ~9-year-old bug in the v4l code, so this patch isn't going
to be changed.

b) The ~9-year-old bug is in a particularly hairy piece of memory-management
code in v4l that I doubt anyone is going to fix.

c) The plan is for all the drivers using that API to be either retired
or rewritten using a new API; that's already been done for some of the
drivers, and the bttv one is a few months out. I'm not sure that's
any use to this version of Debian, though.

d) The workarounds are:
  1) Disable the IOMMU (e.g. boot with intel_iommu=off).
  2) Some v4l tools can use an mmap interface rather than the read(2)
interface; that seems to be OK.

Dave




Bug#1029968: Info received (Bug#1029968: Info received (Bug#1029968: Info received (Bug#1029968: Acknowledgement (bttv/v4l: WARNING: CPU: 6 PID: 6164 at mm/vmalloc.c:487 __vmap_pages_range_noflush+0x3

2023-02-01 Thread Dr. David Alan Gilbert
I sent this upstream report:
https://lore.kernel.org/linux-iommu/Y9qSwkLxeMpffZK%2F@gallifrey/T/#u



Bug#1029968: Info received (Bug#1029968: Info received (Bug#1029968: Acknowledgement (bttv/v4l: WARNING: CPU: 6 PID: 6164 at mm/vmalloc.c:487 __vmap_pages_range_noflush+0x3e0/0x4d0)))

2023-02-01 Thread Dr. David Alan Gilbert
Confirmed: it still happens on upstream 6.2.0-rc6.



Bug#1029968: Acknowledgement (bttv/v4l: WARNING: CPU: 6 PID: 6164 at mm/vmalloc.c:487 __vmap_pages_range_noflush+0x3e0/0x4d0)

2023-02-01 Thread Dr. David Alan Gilbert
* Diederik de Haas (didi.deb...@cknow.org) wrote:

> Thanks for that thorough analysis!

Thanks for the reply,

> If you're 'penguin42' on IRC,

Yep, that's me.

> then I'd suggest presenting your findings to
> io...@lists.linux.dev, as both the author and the reviewer are highly likely
> to be subscribed to that list.
> 
> scripts/get_maintainer.pl drivers/iommu/dma-iommu.c
> scripts/get_maintainer.pl kernel/dma/Makefile
> 
> list them both, and both results also include that ML.

Yep, will do; I'm just going to try a 6.2-rc as well, just in case it's
been fixed very recently, and have a poke about in case I can see
any obvious cause now that I know the change that triggered it.
I'll include the linux-media list as well, since it's just as likely
to be a fault in the v4l/bttv driver.

Dave

> HTH





Bug#1029968: Info received (Bug#1029968: Acknowledgement (bttv/v4l: WARNING: CPU: 6 PID: 6164 at mm/vmalloc.c:487 __vmap_pages_range_noflush+0x3e0/0x4d0))

2023-01-31 Thread Dr. David Alan Gilbert
0() GS:9679efdc() 
knlGS:
[   78.988343] CS:  0010 DS:  ES:  CR0: 80050033
[   78.988346] CR2: bd7fc110 CR3: 00022ce02000 CR4: 06e0




Bug#1029968: Acknowledgement (bttv/v4l: WARNING: CPU: 6 PID: 6164 at mm/vmalloc.c:487 __vmap_pages_range_noflush+0x3e0/0x4d0)

2023-01-31 Thread Dr. David Alan Gilbert
bisected:
GOOD [37fcacb50be7071d146144a6c5c5bf0194b9a1cf] phy: PHY_FSL_LYNX_28G should depend on ARCH_LAYERSCAPE
BAD [f5ff79fddf0efecca538046b5cc20fb3ded2ec4f] dma-mapping: remove CONFIG_DMA_REMAP
GOOD [e62c17f0455a74b182ce6373e2777817256afaa1] MAINTAINERS: update maintainer list of DMA MAPPING BENCHMARK
GOOD [0fb3436b4b36cf69f4544385aa2bb8c5a4913509] sparc: Remove usage of the deprecated "pci-dma-compat.h" API
GOOD [fba09099c6e506608e05e08ac717bf34501f821b] media: v4l2-pci-skeleton: Remove usage of the deprecated "pci-dma-compat.h" API

dg@major:~/kernel/kernel-clone$ git bisect good
f5ff79fddf0efecca538046b5cc20fb3ded2ec4f is the first bad commit
commit f5ff79fddf0efecca538046b5cc20fb3ded2ec4f
Author: Christoph Hellwig 
Date:   Sat Feb 26 16:40:21 2022 +0100

dma-mapping: remove CONFIG_DMA_REMAP

That sounds like a believable cause, given that it's both IOMMU-related
and device-related.




Bug#1029968: Acknowledgement (bttv/v4l: WARNING: CPU: 6 PID: 6164 at mm/vmalloc.c:487 __vmap_pages_range_noflush+0x3e0/0x4d0)

2023-01-30 Thread Dr. David Alan Gilbert
Upstream 5.17 works
Upstream 5.18 fails

(with intel_iommu=on)

Let the bisect begin.
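Bisecting is a binary search over an ordered range of commits, where each probe means building, booting and testing one kernel. Below is a simplified sketch assuming a linear history and a single good-to-bad transition (real git bisect walks the commit graph, so this is not its implementation); first_bad and is_bad are hypothetical names.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the first commit for which is_bad() is True.

    Assumes (hypothetically) that commits is ordered oldest to newest,
    commits[0] is known good, commits[-1] is known bad, and that every
    commit after the first bad one is also bad.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    good, bad = 0, len(commits) - 1  # invariant: good is good, bad is bad
    while bad - good > 1:
        mid = (good + bad) // 2
        # In the real hunt, one probe = build, boot with intel_iommu=on,
        # then run the reproducer and check for the WARN/BUG.
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]
```

Since every probe is a kernel build and a boot, a range of N commits costs roughly log2(N) test boots, which is why narrowing it first with prebuilt Debian kernels (5.15, 5.17, 5.18) pays off.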




Bug#1029968: Acknowledgement (bttv/v4l: WARNING: CPU: 6 PID: 6164 at mm/vmalloc.c:487 __vmap_pages_range_noflush+0x3e0/0x4d0)

2023-01-30 Thread Dr. David Alan Gilbert
This is IOMMU-related.

Upstream 6.1 and 5.18 *do* exhibit the bug, but only with intel_iommu=on,
whereas Debian seems to default it to on.
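A rough way to tell which side of that line a given boot is on is to look at its kernel command line. The sketch below is a simplified model: it only considers the on/off options of intel_iommu=, and it assumes the fallback comes from the kernel's build-time default (CONFIG_INTEL_IOMMU_DEFAULT_ON), which is presumably where the Debian/upstream difference lies; intel_iommu_enabled is a hypothetical helper.

```python
def intel_iommu_enabled(cmdline: str, default_on: bool) -> bool:
    """Guess whether the Intel IOMMU is enabled for a given command line.

    default_on stands in for the (assumed) build-time default,
    CONFIG_INTEL_IOMMU_DEFAULT_ON. Only "on" and "off" are modelled;
    other intel_iommu= options (e.g. sm_on, igfx_off) are ignored.
    """
    enabled = default_on
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key != "intel_iommu" or not sep:
            continue
        options = value.split(",")  # intel_iommu= takes a comma-separated list
        if "off" in options:
            enabled = False
        elif "on" in options:
            enabled = True
    return enabled
```

For example, pass it the contents of /proc/cmdline together with the default you believe the running kernel was built with.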




Bug#1029968: Acknowledgement (bttv/v4l: WARNING: CPU: 6 PID: 6164 at mm/vmalloc.c:487 __vmap_pages_range_noflush+0x3e0/0x4d0)

2023-01-29 Thread Dr. David Alan Gilbert
I built upstream kernels 5.18.0 and 6.1.0, and both of them work for me,
which makes this much more painful to track down.




Bug#1029968: bisecting we go...

2023-01-29 Thread Dr. David Alan Gilbert
Note: an easier way to trigger the bug is: cat /dev/vbi0 > /dev/null
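That reproducer boils down to a loop of plain read(2) calls on the device node, as opposed to the mmap streaming path mentioned as a workaround. A minimal sketch of the same access pattern, with a byte cap so it terminates on a capture device that never reaches EOF (drain is a hypothetical helper, not part of any v4l tool):

```python
import os


def drain(path: str, chunk: int = 4096, max_bytes: int = 1 << 20) -> int:
    """Read path with plain read(2) calls, like `cat path > /dev/null`.

    Stops at EOF or after max_bytes; returns the number of bytes read.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while total < max_bytes:
            data = os.read(fd, min(chunk, max_bytes - total))  # one read(2)
            if not data:  # EOF; a capture device normally never gets here
                break
            total += len(data)
    finally:
        os.close(fd)
    return total
```

On an affected kernel, something like drain("/dev/vbi0") would be expected to go through the same read(2) path as cat.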

   WORKS 
https://snapshot.debian.org/archive/debian/20220101T024315Z/pool/main/l/linux-signed-amd64/linux-image-5.10.0-9-amd64_5.10.70-1_amd64.deb
   linux-image-5.10.0-9-amd64_5.10.70-1_amd64.deb (kexec'ing into it works)

   WORKS 
https://snapshot.debian.org/archive/debian/20220101T024315Z/pool/main/l/linux-signed-amd64/linux-image-5.15.0-2-amd64_5.15.5-2_amd64.deb
   5.15.5-2

   FAILS 
https://snapshot.debian.org/archive/debian/20220701T034227Z/pool/main/l/linux-signed-amd64/linux-image-amd64_5.18.5-1_amd64.deb
 [   98.154835] BUG: unable to handle page fault for address: bc2480f4
[   98.154848] #PF: supervisor write access in kernel mode
[   98.154853] #PF: error_code(0x000b) - reserved bit violation

That 5.18 failure is a bit different, but the backtrace is similar;
somewhere after 5.18 it changed from a BUG to a WARN, but back in 5.15
it just worked.

Dave




Bug#1029968: Acknowledgement (bttv/v4l: WARNING: CPU: 6 PID: 6164 at mm/vmalloc.c:487 __vmap_pages_range_noflush+0x3e0/0x4d0)

2023-01-29 Thread Dr. David Alan Gilbert
   WORKS 
https://snapshot.debian.org/archive/debian/20220601T031637Z/pool/main/l/linux-signed-amd64/linux-image-5.17.0-1-amd64_5.17.3-1_amd64.deb
   5.17.0-1-amd64 #1 SMP PREEMPT Debian 5.17.3-1


So I think it's time to move upstream and bisect between 5.17 and 5.18.

Dave
