Bug#925334: vc4: CMA fills up and screen not updated anymore on raspi3
Hi, * Fabian Pietsch (Wed, 03 Apr 2019 12:30:01 +0200): > As suggested, I tested more compact cmdline.txt, though: > > | root=/dev/mmcblk0p2 > > | console=tty0 root=/dev/mmcblk0p2 > > | console=tty0 root=/dev/mmcblk0p2 cma=128M > > With the first two, cma defaulted to 64M and already the lightdm login > screen stops updating after a few seconds to minutes. With the 3rd, > the bug initially didn't happen so I used X a bit, then logged out and > let the system fall idle; the bug then seems to have happened > 9868 seconds after boot (according to dmesg --follow). Another round (system mostly idle except for manually renaming enx[...] to eth0 after boot and restarting wicd, which came with lxde meta-package to manage the networking, to get a correct system time via NTP) -> 3960 seconds after boot (in dmesg). That seems to suggest to me that the cmdline.txt built by raspi3-firmware was not the issue, here. Still don't know whether it's possible or sensible to disable the "weird" initial cmdline passed by the firmware, though. In any case, the tile binning error ... | kernel: vc4_v3d 3fc0.v3d: Failed to allocate memory for tile binning: -12. You may need to enable CMA or give it more memory. ... together with a preceding, e.g., ... | kernel: [drm:vc4_bo_create [vc4]] *ERROR* Failed to allocate from CMA: | kernel: [drm]V3D: 23468kb BOs (10) | kernel: [drm] V3D shader:144kb BOs (35) | kernel: [drm] dumb: 8148kb BOs (4) ... is usually surrounded (in journal) by nothing but many: | kernel: alloc_contig_range: [36400, 37500) PFNs busy What I'm trying to say is that there seems to be no log-noticeable concurrent activity going on. Watching the CMA use via /proc/meminfo suggests that it's much more than half free, most of the time. The tile binning error seems to be entirely random, at this point. Looking at the source[1], vc4_allocate_bin_bo() seems to use a strategy of successively allocating memory until it randomly finds a block that fits certain requirements. Maybe randomly sometimes there is no such block available, leading to it failing with -ENOMEM. It's not clear to me, though, when and why the function is called, seemingly randomly on an idle system. It reads to me as an initialization, not something that is randomly repeated. (?) [1] https://sources.debian.org/src/linux/4.19.28-2/drivers/gpu/drm/vc4/vc4_v3d.c/#L244 Again, it would be nice if the resulting device error state could somehow be reset / the function retried with more/different CMA free at a later point during the same boot. Perhaps that's already possible (maybe even in a general way?) but I don't know how. (Trying to unbind the driver (vc4_v3d) via sysfs led to a kernel oops (IIRC paging fault?), and an attempt to bind it again was rejected without any noticeable action.) Regards, Fabian -- Fabian "canvon" Pietsch - https://www.canvon.de/
Bug#925334: vc4: CMA fills up and screen not updated anymore on raspi3
Hi, * Romain Perier (Wed, 27 Mar 2019 21:33:19 +0100): > Return-Path: > Message-ID: <20190327203319.ga15...@debby.home> > From: Romain Perier > Subject: Re: Bug#925334: vc4: CMA fills up and screen not updated anymore > on raspi3 > To: Fabian Pietsch , 925...@bugs.debian.org > User-Agent: Mutt/1.10.1 (2018-07-13) > > On Wed, Mar 27, 2019 at 03:29:21PM +0100, Fabian Pietsch wrote: [...] > > the bug is still present in the current version. It took a freshly > > booted, idle system 3304 seconds for the bug to occur, though, > > so this time rather an hour (than 10-30min as stated before). [...] > > So vc4_v3d which reported the error seems to be in status=error > > and counts as suspended. Manual attempts to get it to resume again, > > now that more cma is free again, failed, but there are probably ways > > I don't know about. [...] > Could you test, two stuffs for me ? : > > 1. The VC4 driver seems to use runtime pm operations, could you try to > disable runtime suspend completly (there are kernel parameters for this > if I remember correctly) ? Didn't find anything useful to that regard in linux-doc Documentation/ and by googling. Could you please tell me how? > 2. Your kernel cmdline are... weird, could you try with minimalistic > kernel cmdline ? Only keep console= for logging to uart and/or to your > tty0 and keep your rootfs. The initial part until the two spaces seems to be auto-generated by the firmware at boot, so I guess I can't change or suppress it; if it should be possible, again, please tell me how. My kernel cmdline as passed *to* the firmware is: | fabian@rpi3:~$ cat /boot/firmware/cmdline.txt | console=tty0 console=ttyS1,115200 root=/dev/mmcblk0p2 rw elevator=deadline fsck.repair=yes net.ifnames=0 cma=128M rootwait This is just auto-generated by the raspi3-firmware package to which you (Romain Perier) seem to be contributing. As suggested, I tested more compact cmdline.txt, though: | root=/dev/mmcblk0p2 | console=tty0 root=/dev/mmcblk0p2 | console=tty0 root=/dev/mmcblk0p2 cma=128M With the first two, cma defaulted to 64M and already the lightdm login screen stops updating after a few seconds to minutes. With the 3rd, the bug initially didn't happen so I used X a bit, then logged out and let the system fall idle; the bug then seems to have happened 9868 seconds after boot (according to dmesg --follow). Regards, Fabian -- Fabian "canvon" Pietsch - https://www.canvon.de/
Bug#925334: vc4: CMA fills up and screen not updated anymore on raspi3
On Wed, Mar 27, 2019 at 03:29:21PM +0100, Fabian Pietsch wrote: > Package: src:linux > Version: 4.19.28-2 > Followup-For: Bug #925334 > > Dear Maintainer, > > the bug is still present in the current version. It took a freshly > booted, idle system 3304 seconds for the bug to occur, though, > so this time rather an hour (than 10-30min as stated before). > > After the bug has happened, I get the following information from sysfs: > > | root@rpi3:/sys/bus/platform/drivers/vc4_v3d# for I in enabled status > active_time suspended_time; do echo -n "$I=$(cat > 3fc0.v3d/power/runtime_$I) "; done; echo > | enabled=enabled status=error active_time=16800 suspended_time=55464216 > > And again: > > | enabled=enabled status=error active_time=16800 suspended_time=55479160 > > So vc4_v3d which reported the error seems to be in status=error > and counts as suspended. Manual attempts to get it to resume again, > now that more cma is free again, failed, but there are probably ways > I don't know about. > > "echo on > .../power/control" changed runtime_enabled=forbidden > (and echo auto > control changed it back to runtime_enabled=enabled), > but runtime_status remained at "error". > > Regards, Fabian > > > -- System Information: > Debian Release: buster/sid > APT prefers testing > APT policy: (500, 'testing') > Architecture: arm64 (aarch64) > > Kernel: Linux 4.19.0-4-arm64 (SMP w/4 CPU cores) > Kernel taint flags: TAINT_CRAP > Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), > LANGUAGE=en_GB.UTF-8 (charmap=UTF-8) > Shell: /bin/sh linked to /usr/bin/dash > Init: systemd (via /run/systemd/system) > LSM: AppArmor: enabled Hi, As discussed on IRC, thanks for your bug report and for the refresh. Could you test, two stuffs for me ? : 1. The VC4 driver seems to use runtime pm operations, could you try to disable runtime suspend completly (there are kernel parameters for this if I remember correctly) ? 2. Your kernel cmdline are... weird, could you try with minimalistic kernel cmdline ? Only keep console= for logging to uart and/or to your tty0 and keep your rootfs. Thanks, Romain signature.asc Description: PGP signature
Bug#925334: vc4: CMA fills up and screen not updated anymore on raspi3
Package: src:linux Version: 4.19.28-2 Followup-For: Bug #925334 Dear Maintainer, the bug is still present in the current version. It took a freshly booted, idle system 3304 seconds for the bug to occur, though, so this time rather an hour (than 10-30min as stated before). After the bug has happened, I get the following information from sysfs: | root@rpi3:/sys/bus/platform/drivers/vc4_v3d# for I in enabled status active_time suspended_time; do echo -n "$I=$(cat 3fc0.v3d/power/runtime_$I) "; done; echo | enabled=enabled status=error active_time=16800 suspended_time=55464216 And again: | enabled=enabled status=error active_time=16800 suspended_time=55479160 So vc4_v3d which reported the error seems to be in status=error and counts as suspended. Manual attempts to get it to resume again, now that more cma is free again, failed, but there are probably ways I don't know about. "echo on > .../power/control" changed runtime_enabled=forbidden (and echo auto > control changed it back to runtime_enabled=enabled), but runtime_status remained at "error". Regards, Fabian -- System Information: Debian Release: buster/sid APT prefers testing APT policy: (500, 'testing') Architecture: arm64 (aarch64) Kernel: Linux 4.19.0-4-arm64 (SMP w/4 CPU cores) Kernel taint flags: TAINT_CRAP Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), LANGUAGE=en_GB.UTF-8 (charmap=UTF-8) Shell: /bin/sh linked to /usr/bin/dash Init: systemd (via /run/systemd/system) LSM: AppArmor: enabled
Bug#925334: vc4: CMA fills up and screen not updated anymore on raspi3
Package: src:linux Version: 4.19.16-1 Severity: important Tags: upstream Dear Maintainer, (please also see the additional information in the footnotes below.) When using the vc4 video driver with Xorg on the Raspberry Pi 3 [raspi3], it seems to sooner or later fill up the CMA, leading to, e.g., this: [ 739.307920] [drm:vc4_bo_create [vc4]] *ERROR* Failed to allocate from CMA: [ 739.314932] [drm]V3D: 23468kb BOs (10) [ 739.321333] [drm] V3D shader:144kb BOs (35) [ 739.327734] [drm] dumb: 8148kb BOs (4) When additionally the following happens, the screen stops updating until reboot [stops_updating]: [ 739.334049] vc4_v3d 3fc0.v3d: Failed to allocate memory for tile binning: -12. You may need to enable CMA or give it more memory. This happens with cma=64M, cma=128M and cma=256M, though 128M seems to be the recommended size for vc4 use, according to comments in package raspi3-firmware's /etc/default/raspi3-firmware. ii raspi3-firmware 1.20190215-1 arm64Raspberry Pi 2 and 3 GPU firmware and bootloaders With cma=128M, the error happens on a freshly booted, idle system running lightdm, after perhaps 10m-30m, or can be provoked by running a [screensaver]. A work-around is to force Xorg to use driver fbdev, which seems to work even after the tile binning error which otherwise would have required a reboot. I also get a lot of, e.g.: [ 739.135918] alloc_contig_range: [37400, 38400) PFNs busy I don't know if this is related or a different problem. Please somehow make it possible (or even automatic) for the screen update to continue without the need for a reboot. Fixes to CMA handling so that the tile binning error doesn't even happen anymore would be even better! Regards, Fabian (Please also see the footnotes below.) [raspi3] This machine was set up based on a Raspberry Pi 3 Debian buster preview image, from https://wiki.debian.org/RaspberryPi3 ; it says unofficial/unsupported, there, I hope this bug report still is ok as, as far as I understand, the unmodified Debian buster arm64 kernel seems to be used[1]. [1] Pulled in via apt from official mirror in sources.list: https://github.com/Debian/raspi3-image-spec/blob/master/raspi3.yaml#L72 [stops_updating] "Stops updating the screen" here means that the Xorg tty graphical contents don't change anymore, except for the mouse cursor. The mouse cursor can still be moved around, and even changes form on, e.g., window borders, and I can resize the terminal window up one line, and the cursor change on mouseover reflects the change; just nothing changes to the non-mousecursor visual picture. Also, tasking into console works fine, and tasking back into the Xorg tty restores the previous visual picture, and tasking back into the console again is also still possible. So no real "freeze" or hard crash, here, except that, after the tile binning error in dmesg, the Xorg tty graphical contents don't change (except for mouse cursor). [screensaver] The error can be provoked by running the following for some time: /usr/lib/xscreensaver/tessellimage -root -fill-screen This leads to a slightly different pattern of dmesg messages, though. The thing in common is that the screen stops updating, there, too, when the tile binning error quoted above happens. When provoked in this way, "grep ^CmaFree: /proc/meminfo" in a loop said 25564 kB directly before a tile binning error message, 7160 kB directly after, with values between 8000-1 kB after, with recovery to 81740 kB after exiting tessellimage. But the screen update stays stopped. Additional dmesg output: [0.00] Booting Linux on physical CPU 0x00 [0x410fd034] [0.00] Linux version 4.19.0-2-arm64 (debian-ker...@lists.debian.org) (gcc version 8.2.0 (Debian 8.2.0-14)) #1 SMP Debian 4.19.16-1 (2019-01-17) [0.00] Machine model: Raspberry Pi 3 Model B Rev 1.2 [...] [0.00] cma: Reserved 128 MiB at 0x3340 [...] [0.00] Memory: 780176K/970752K available (8700K kernel code, 1600K rwdata, 2788K rodata, 4864K init, 530K bss, 59504K reserved, 131072K cma-reserved) [...] [0.000193] Console: colour dummy device 80x25 [0.000682] console [tty0] enabled [...] [0.141070] simple-framebuffer 3e887000.framebuffer: framebuffer at 0x3e887000, 0x373800 bytes, mapped to 0x(ptrval) [0.141106] simple-framebuffer 3e887000.framebuffer: format=r5g6b5, mode=1824x984x16, linelength=3648 [0.162242] Console: switching to colour frame buffer device 228x61 [0.182688] simple-framebuffer 3e887000.framebuffer: fb0: simplefb registered! [...] [4.768075] raspberrypi-firmware soc:firmware: Attached to firmware from 2019-02-12 19:42 [...] [ 11.863850] vc4_hdmi 3f902000.hdmi: ASoC: Failed to create component debugfs directory [ 11.921037] vc4_hdmi 3f902000.hdmi: vc4-hdmi-hifi <-> 3f902000.hdmi mapping ok [ 11.945385] vc4_hdmi