Bug#939633: linux: rpi3b+ hangs with kernels 5.2, 5.3 (device tree issue?)

2019-11-20 Thread Thorsten Glaser
On Wed, 20 Nov 2019, Thorsten Glaser wrote:

> Please advise as to a fix ☻

On the advice of the original poster of this bugreport, we tried to
use the DTB from the buster kernel instead:

+ cp /usr/lib/linux-image-4.19.0-6-arm64/broadcom/bcm2837-rpi-3-b.dtb /boot/firmware/bcm2710-rpi-3-b.dtb
+ cp /usr/lib/linux-image-4.19.0-6-arm64/broadcom/bcm2837-rpi-3-b-plus.dtb /boot/firmware/bcm2710-rpi-3-b-plus.dtb
+ cp /usr/lib/linux-image-4.19.0-6-arm64/broadcom/bcm2837-rpi-cm3-io3.dtb /boot/firmware/bcm2710-rpi-cm3.dtb

I’m attaching full serial console output of a boot with the stable (4.x)
DTB and one with the unstable (5.x) DTB. In both cases, we tried to get
the system to crash; with the 5.x DTB it died quickly, with the 4.x DTB
we did not manage to crash it.

Notable differences:

• with the 4.x DTB, 'reboot' works (cf. #941597)
  (like with the stable kernel)
• with the 4.x DTB, WLAN firmware is reported missing, whereas WLAN works with 5.x
• with the 4.x DTB, audio is not working (like with the stable kernel)
• with the 4.x DTB, 3D graphics still works (unlike the stable kernel)
• with the 4.x DTB (like the stable kernel), we get some lines à la…
[  570.800140] alloc_contig_range: [33550, 33552) PFNs busy
  … which are almost (but not completely) unheard of with the 5.x DTB
  (and which are, incidentally, also mentioned in #925334)
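To compare the two attached logs on these points, something like the following could count the relevant kernel messages. This is only a minimal sketch: the message fragments are the ones quoted in this mail, while the labels (cma_busy etc.) and the function name are made up for illustration.

```python
import re
import sys

# Message fragments quoted in this report; the dictionary keys are
# hypothetical labels chosen for this sketch.
PATTERNS = {
    "cma_busy": re.compile(r"alloc_contig_range: \[[0-9a-f]+, [0-9a-f]+\) PFNs busy"),
    "asb_fail": re.compile(r"Failed to disable ASB master for v3d"),
    "tile_binning_oom": re.compile(r"Failed to allocate memory for tile binning"),
}


def count_messages(lines):
    """Count how many lines match each pattern; returns {label: count}."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Usage: python3 count_msgs.py serial-4.x.log serial-5.x.log
    # (the script and log file names are placeholders)
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            print(path, count_messages(log))
```

Running it over both serial console captures gives a per-log count, which makes the "almost unheard of with the 5.x DTB" observation easier to check.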

The 5.x DTB log ends with the CMA having been filled up (by a program
written in python3-pygame, at that) before the crashing line. This is
the same as #925334, except it afterwards crashes with…

[  307.942205] bcm2835-power bcm2835-power: Failed to disable ASB master for v3d

… whereas, in #925334, it continued with…

[  739.334049] vc4_v3d 3fc0.v3d: Failed to allocate memory for tile binning: -12. You may need to enable CMA or give it more memory.

… and ceased updating the screen until reboot (but did not crash the
system). *HOWEVER*, we also managed to crash the 5.x system without
overrunning the CMA, earlier, but we forgot to save serial console
output for that run (it was directly before I wrote the previous mail
to this bugreport).

The memory overrunning issue is effectively a local DoS, given it can
be triggered from unprivileged userspace. (But, as I said above, the
crash can also be triggered without overrunning memory; mostly using
OpenGL.)
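To tell the two crash paths apart (CMA exhausted versus CMA still free), the CMA counters could be logged while reproducing. The following is a sketch under the assumption that the kernel exposes CmaTotal and CmaFree in /proc/meminfo, as kernels built with CMA support normally do; the function names are made up.

```python
import re


def parse_cma(meminfo_text):
    """Return (CmaTotal, CmaFree) in kB, or None if either field is absent."""
    fields = dict(
        re.findall(r"^(CmaTotal|CmaFree):\s+(\d+) kB$", meminfo_text, re.M)
    )
    if "CmaTotal" not in fields or "CmaFree" not in fields:
        return None
    return int(fields["CmaTotal"]), int(fields["CmaFree"])


def read_cma(path="/proc/meminfo"):
    """Read the given meminfo file and extract the CMA counters from it."""
    with open(path) as meminfo:
        return parse_cma(meminfo.read())
```

Calling read_cma() every few seconds and printing the result to the serial console would leave the last reading before a hang in the captured log.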

It might be useful to diff and bisect the DTB sources (with includes
expanded, probably), but I’ve not got any experience with DTBs or ARM
devices.
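One way to approach the diff without the DTB sources is to decompile the two compiled DTBs and compare the result, which already has all includes expanded. The sketch below assumes dtc (from the device-tree-compiler package) is installed; the function names are made up, and the 5.3 path in the usage comment is a guess based on the kernel version in the attached log.

```python
import difflib
import subprocess
import sys


def decompile_dtb(path):
    """Decompile a .dtb to DTS text using dtc; -s sorts nodes and properties."""
    result = subprocess.run(
        ["dtc", "-I", "dtb", "-O", "dts", "-s", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def diff_dts(old_text, new_text, old_name="old.dts", new_name="new.dts"):
    """Return a unified diff of two DTS texts as a list of lines."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
        )
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (second path is an assumption, not verified):
    #   python3 dtbdiff.py \
    #     /usr/lib/linux-image-4.19.0-6-arm64/broadcom/bcm2837-rpi-3-b-plus.dtb \
    #     /usr/lib/linux-image-5.3.0-2-arm64/broadcom/bcm2837-rpi-3-b-plus.dtb
    old_path, new_path = sys.argv[1], sys.argv[2]
    for line in diff_dts(
        decompile_dtb(old_path), decompile_dtb(new_path), old_path, new_path
    ):
        print(line)
```

Sorting with -s keeps pure reordering out of the diff, so what remains should mostly be real node and property changes. Bisecting would then mean applying those changes to the 4.x DTS in groups and recompiling.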

bye,
//mirabilos
-- 
tarent solutions GmbH
Rochusstraße 2-4, D-53123 Bonn • http://www.tarent.de/
Tel: +49 228 54881-393 • Fax: +49 228 54881-235
HRB 5168 (AG Bonn) • USt-ID (VAT): DE122264941
Managing directors: Dr. Stefan Barth, Kai Ebenrett, Boris Esser, Alexander Steeg

**

With the tarent Academy, we also offer training courses in the areas of
software development, agile working and future technologies.

Visit us at www.tarent.de/academy. We look forward to hearing from you.

**

[0.00] Booting Linux on physical CPU 0x00 [0x410fd034]
[0.00] Linux version 5.3.0-2-arm64 (debian-ker...@lists.debian.org) (gcc version 9.2.1 20191109 (Debian 9.2.1-19)) #1 SMP Debian 5.3.9-3 (2019-11-19)
[0.00] Machine model: Raspberry Pi 3 Model B Plus Rev 1.3
[0.00] efi: Getting EFI parameters from FDT:
[0.00] efi: UEFI not found.
[0.00] cma: Reserved 128 MiB at 0x3340
[0.00] NUMA: No NUMA configuration found
[0.00] NUMA: Faking a node at [mem 0x-0x3b3f]
[0.00] NUMA: NODE_DATA [mem 0x3320d840-0x3320efff]
[0.00] Zone ranges:
[0.00]   DMA32    [mem 0x-0x3b3f]
[0.00]   Normal   empty
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node   0: [mem 0x-0x3b3f]
[0.00] Initmem setup node 0 [mem 0x-0x3b3f]
[0.00] percpu: Embedded 32 pages/cpu s93016 r8192 d29864 u131072
[0.00] Detected VIPT I-cache on CPU0
[0.00] CPU features: detected: ARM erratum 845719
[0.00] CPU features: kernel page table isolation forced ON by KASLR
[0.00] CPU features: detected: Kernel page table isolation (KPTI)
[0.00] CPU features: detected: ARM erratum 843419
[0.00] Built 1 zonelists, mobility grouping on.  Total pages: 238896
[0.00] Policy zone: DMA32
[0.00] Kernel command line: video=HDMI-A-1:1920x1080@60 dma.dmachans=0x7f35 bcm2709.boardrev=0xa020d3 bcm2709.serial=0xa5e576b6 bcm2709.uart_clock=4800 bcm2709.disk_led_gpio=29 bcm2709.disk_led_active_low=0 smsc95xx.macaddr=B8:27:EB:E5:76:B6 vc_mem.mem_base=0x3ec0 vc_mem.mem_size=0x4000  console=ttyS1,115200 console=tty0 root=/dev/mmcblk0p2 rw elevator=deadline fsck.repair=yes net.ifnames=0 cma=128M rootwait
[0.00] Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)

Bug#939633: linux: rpi3b+ hangs with kernels 5.2, 5.3 (device tree issue?)

2019-11-20 Thread Thorsten Glaser
retitle 939633 linux: rpi3b+ hangs with kernels 5.2, 5.3 (device tree issue?)
notfound 939633 4.19.67-2+deb10u1
notfound 939633 4.19.67-2+deb10u2
found 939633 5.2.9-2~bpo10+1
found 939633 5.2.17-1~bpo10+1
found 939633 5.3.9-3
thanks

We’re currently suffering from a similar situation.

Debian buster/arm64 installation, straight from this script:
https://evolvis.org/plugins/scmgit/cgi-bin/gitweb.cgi?p=shellsnippets/shellsnippets.git;a=blob;f=posix/mkrpi3b%2Bimg.sh;hb=HEAD

With the buster kernel, the analog audio doesn’t work, and
3D graphics using llvmpipe are really slow, but the system
is, overall, stable.

With the backports kernel and either buster’s raspi3-firmware
or sid’s raspi-firmware (tested both), 3D graphics are accelerated,
poweroff/reboot don’t work, and the system crashes after some
uptime (confirmed it’s not a thermal issue, and the official
PSU is used, so power is enough).

With the sid kernel and sid’s raspi-firmware, the same issue
happens. We managed to get a serial console running, and the
crash message is what led us to this bugreport:

[timestamp] bcm2835-power bcm2835-power: Failed to disable ASB master for v3d

This is a fairly stock setup, with CMA raised to 128M and
consoles fixed (see the outstanding bugreports in the
raspi3-firmware/raspi-firmware package), and video size fixed:

tglase@tglase:/mnt $ grep '^[^#]' etc/default/raspi-firmware
CMA=128M
CONSOLES='ttyS1,115200 tty0'
tglase@tglase:/mnt $ cat etc/default/raspi-firmware-custom 
disable_overscan=1


Incidentally, shortly before, we see this in syslog:

Jan  1 01:00:26 rpi3bplus kernel: [   26.255342] broken atomic modeset userspace detected, disabling atomic

Unsure whether it’s related.


There are also reports in Fedora of instabilities relating
to the RPi 3B+ and power management…

Please advise as to a fix ☻

bye,
//mirabilos