Bug#925334: vc4: CMA fills up and screen not updated anymore on raspi3

2019-04-03 Thread Fabian Pietsch
Hi,


* Fabian Pietsch (Wed, 03 Apr 2019 12:30:01 +0200):
> As suggested, I tested more compact cmdline.txt, though:
> 
> | root=/dev/mmcblk0p2
> 
> | console=tty0 root=/dev/mmcblk0p2
> 
> | console=tty0 root=/dev/mmcblk0p2 cma=128M
> 
> With the first two, cma defaulted to 64M and already the lightdm login
> screen stops updating after a few seconds to minutes. With the 3rd,
> the bug initially didn't happen so I used X a bit, then logged out and
> let the system fall idle; the bug then seems to have happened
> 9868 seconds after boot (according to dmesg --follow).

Another round (system mostly idle except for manually renaming enx[...]
to eth0 after boot and restarting wicd, which came with lxde meta-package
to manage the networking, to get a correct system time via NTP)  ->
3960 seconds after boot (in dmesg). That seems to suggest to me that
the cmdline.txt built by raspi3-firmware was not the issue, here.
Still don't know whether it's possible or sensible to disable the
"weird" initial cmdline passed by the firmware, though.

In any case, the tile binning error ...

| kernel: vc4_v3d 3fc0.v3d: Failed to allocate memory for tile binning: 
-12. You may need to enable CMA or give it more memory.

... together with a preceding, e.g., ...

| kernel: [drm:vc4_bo_create [vc4]] *ERROR* Failed to allocate from CMA:
| kernel: [drm]V3D:  23468kb BOs (10)
| kernel: [drm] V3D shader:144kb BOs (35)
| kernel: [drm]   dumb:   8148kb BOs (4)

... is usually surrounded (in journal) by nothing but many:

| kernel: alloc_contig_range: [36400, 37500) PFNs busy

What I'm trying to say is that there seems to be no log-noticeable
concurrent activity going on.  Watching the CMA use via /proc/meminfo
suggests that it's much more than half free, most of the time.
The tile binning error seems to be entirely random, at this point.


Looking at the source[1], vc4_allocate_bin_bo() seems to use a strategy
of successively allocating memory until it randomly finds a block that
fits certain requirements. Maybe randomly sometimes there is no such
block available, leading to it failing with -ENOMEM. It's not clear
to me, though, when and why the function is called, seemingly randomly
on an idle system. It reads to me as an initialization, not something
that is randomly repeated. (?)

[1] 
https://sources.debian.org/src/linux/4.19.28-2/drivers/gpu/drm/vc4/vc4_v3d.c/#L244


Again, it would be nice if the resulting device error state could
somehow be reset / the function retried with more/different CMA free
at a later point during the same boot. Perhaps that's already possible
(maybe even in a general way?) but I don't know how.

(Trying to unbind the driver (vc4_v3d) via sysfs led to a kernel oops
(IIRC paging fault?), and an attempt to bind it again was rejected
without any noticeable action.)


Regards, Fabian

-- 
Fabian "canvon" Pietsch - https://www.canvon.de/



Bug#925334: vc4: CMA fills up and screen not updated anymore on raspi3

2019-04-03 Thread Fabian Pietsch
Hi,

* Romain Perier (Wed, 27 Mar 2019 21:33:19 +0100):
> Return-Path: 
> Message-ID: <20190327203319.ga15...@debby.home>
> From: Romain Perier 
> Subject: Re: Bug#925334: vc4: CMA fills up and screen not updated anymore
>  on raspi3
> To: Fabian Pietsch , 925...@bugs.debian.org
> User-Agent: Mutt/1.10.1 (2018-07-13)
> 
> On Wed, Mar 27, 2019 at 03:29:21PM +0100, Fabian Pietsch wrote:

[...]

> > the bug is still present in the current version. It took a freshly
> > booted, idle system 3304 seconds for the bug to occur, though,
> > so this time rather an hour (than 10-30min as stated before).

[...]

> > So vc4_v3d which reported the error seems to be in status=error
> > and counts as suspended. Manual attempts to get it to resume again,
> > now that more cma is free again, failed, but there are probably ways
> > I don't know about.

[...]

> Could you test, two stuffs for me ? :
> 
> 1. The VC4 driver seems to use runtime pm operations, could you try to
> disable runtime suspend completly (there are kernel parameters for this
> if I remember correctly) ?

Didn't find anything useful to that regard in linux-doc Documentation/
and by googling. Could you please tell me how?

> 2. Your kernel cmdline are... weird, could you try with minimalistic
> kernel cmdline ? Only keep console= for logging to uart and/or to your
> tty0 and keep your rootfs.

The initial part until the two spaces seems to be auto-generated by the
firmware at boot, so I guess I can't change or suppress it; if it should
be possible, again, please tell me how.

My kernel cmdline as passed *to* the firmware is:

| fabian@rpi3:~$ cat /boot/firmware/cmdline.txt
| console=tty0 console=ttyS1,115200 root=/dev/mmcblk0p2 rw elevator=deadline 
fsck.repair=yes net.ifnames=0 cma=128M rootwait

This is just auto-generated by the raspi3-firmware package to which
you (Romain Perier) seem to be contributing.

As suggested, I tested more compact cmdline.txt, though:

| root=/dev/mmcblk0p2

| console=tty0 root=/dev/mmcblk0p2

| console=tty0 root=/dev/mmcblk0p2 cma=128M

With the first two, cma defaulted to 64M and already the lightdm login
screen stops updating after a few seconds to minutes. With the 3rd,
the bug initially didn't happen so I used X a bit, then logged out and
let the system fall idle; the bug then seems to have happened
9868 seconds after boot (according to dmesg --follow).

Regards, Fabian

-- 
Fabian "canvon" Pietsch - https://www.canvon.de/



Bug#925334: vc4: CMA fills up and screen not updated anymore on raspi3

2019-03-27 Thread Romain Perier
On Wed, Mar 27, 2019 at 03:29:21PM +0100, Fabian Pietsch wrote:
> Package: src:linux
> Version: 4.19.28-2
> Followup-For: Bug #925334
> 
> Dear Maintainer,
> 
> the bug is still present in the current version. It took a freshly
> booted, idle system 3304 seconds for the bug to occur, though,
> so this time rather an hour (than 10-30min as stated before).
> 
> After the bug has happened, I get the following information from sysfs:
> 
> | root@rpi3:/sys/bus/platform/drivers/vc4_v3d# for I in enabled status 
> active_time suspended_time; do echo -n "$I=$(cat 
> 3fc0.v3d/power/runtime_$I) "; done; echo
> | enabled=enabled status=error active_time=16800 suspended_time=55464216
> 
> And again:
> 
> | enabled=enabled status=error active_time=16800 suspended_time=55479160
> 
> So vc4_v3d which reported the error seems to be in status=error
> and counts as suspended. Manual attempts to get it to resume again,
> now that more cma is free again, failed, but there are probably ways
> I don't know about.
> 
> "echo on > .../power/control" changed runtime_enabled=forbidden
> (and echo auto > control changed it back to runtime_enabled=enabled),
> but runtime_status remained at "error".
> 
> Regards, Fabian
> 
> 
> -- System Information:
> Debian Release: buster/sid
>   APT prefers testing
>   APT policy: (500, 'testing')
> Architecture: arm64 (aarch64)
> 
> Kernel: Linux 4.19.0-4-arm64 (SMP w/4 CPU cores)
> Kernel taint flags: TAINT_CRAP
> Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), 
> LANGUAGE=en_GB.UTF-8 (charmap=UTF-8)
> Shell: /bin/sh linked to /usr/bin/dash
> Init: systemd (via /run/systemd/system)
> LSM: AppArmor: enabled

Hi,

As discussed on IRC, thanks for your bug report and for the refresh.

Could you test, two stuffs for me ? :

1. The VC4 driver seems to use runtime pm operations, could you try to
disable runtime suspend completly (there are kernel parameters for this
if I remember correctly) ?

2. Your kernel cmdline are... weird, could you try with minimalistic
kernel cmdline ? Only keep console= for logging to uart and/or to your
tty0 and keep your rootfs.


Thanks,
Romain


signature.asc
Description: PGP signature


Bug#925334: vc4: CMA fills up and screen not updated anymore on raspi3

2019-03-27 Thread Fabian Pietsch
Package: src:linux
Version: 4.19.28-2
Followup-For: Bug #925334

Dear Maintainer,

the bug is still present in the current version. It took a freshly
booted, idle system 3304 seconds for the bug to occur, though,
so this time rather an hour (than 10-30min as stated before).

After the bug has happened, I get the following information from sysfs:

| root@rpi3:/sys/bus/platform/drivers/vc4_v3d# for I in enabled status 
active_time suspended_time; do echo -n "$I=$(cat 3fc0.v3d/power/runtime_$I) 
"; done; echo
| enabled=enabled status=error active_time=16800 suspended_time=55464216

And again:

| enabled=enabled status=error active_time=16800 suspended_time=55479160

So vc4_v3d which reported the error seems to be in status=error
and counts as suspended. Manual attempts to get it to resume again,
now that more cma is free again, failed, but there are probably ways
I don't know about.

"echo on > .../power/control" changed runtime_enabled=forbidden
(and echo auto > control changed it back to runtime_enabled=enabled),
but runtime_status remained at "error".

Regards, Fabian


-- System Information:
Debian Release: buster/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: arm64 (aarch64)

Kernel: Linux 4.19.0-4-arm64 (SMP w/4 CPU cores)
Kernel taint flags: TAINT_CRAP
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), 
LANGUAGE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled



Bug#925334: vc4: CMA fills up and screen not updated anymore on raspi3

2019-03-23 Thread Fabian Pietsch
Package: src:linux
Version: 4.19.16-1
Severity: important
Tags: upstream

Dear Maintainer,

(please also see the additional information in the footnotes below.)

When using the vc4 video driver with Xorg on the Raspberry Pi 3 [raspi3],
it seems to sooner or later fill up the CMA, leading to, e.g., this:

[  739.307920] [drm:vc4_bo_create [vc4]] *ERROR* Failed to allocate from CMA:
[  739.314932] [drm]V3D:  23468kb BOs (10)
[  739.321333] [drm] V3D shader:144kb BOs (35)
[  739.327734] [drm]   dumb:   8148kb BOs (4)

When additionally the following happens, the screen stops updating
until reboot [stops_updating]:

[  739.334049] vc4_v3d 3fc0.v3d: Failed to allocate memory for tile 
binning: -12. You may need to enable CMA or give it more memory.

This happens with cma=64M, cma=128M and cma=256M, though 128M seems to
be the recommended size for vc4 use, according to comments in
package raspi3-firmware's /etc/default/raspi3-firmware.

ii  raspi3-firmware 1.20190215-1 arm64Raspberry Pi 2 and 3 GPU firmware 
and bootloaders

With cma=128M, the error happens on a freshly booted, idle system
running lightdm, after perhaps 10m-30m, or can be provoked by
running a [screensaver].

A work-around is to force Xorg to use driver fbdev, which seems to work
even after the tile binning error which otherwise would have required
a reboot.


I also get a lot of, e.g.:

[  739.135918] alloc_contig_range: [37400, 38400) PFNs busy

I don't know if this is related or a different problem.


Please somehow make it possible (or even automatic) for the screen
update to continue without the need for a reboot. Fixes to CMA handling
so that the tile binning error doesn't even happen anymore would be
even better!

Regards, Fabian

(Please also see the footnotes below.)


[raspi3]
This machine was set up based on a Raspberry Pi 3 Debian buster preview
image, from https://wiki.debian.org/RaspberryPi3 ; it says
unofficial/unsupported, there, I hope this bug report still is ok
as, as far as I understand, the unmodified Debian buster arm64 kernel
seems to be used[1].

[1] Pulled in via apt from official mirror in sources.list:
https://github.com/Debian/raspi3-image-spec/blob/master/raspi3.yaml#L72


[stops_updating] "Stops updating the screen" here means that the
Xorg tty graphical contents don't change anymore, except for the
mouse cursor. The mouse cursor can still be moved around, and even
changes form on, e.g., window borders, and I can resize the terminal
window up one line, and the cursor change on mouseover reflects
the change; just nothing changes to the non-mousecursor visual picture.

Also, tasking into console works fine, and tasking back into the Xorg
tty restores the previous visual picture, and tasking back into the
console again is also still possible. So no real "freeze" or hard crash,
here, except that, after the tile binning error in dmesg, the Xorg
tty graphical contents don't change (except for mouse cursor).


[screensaver] The error can be provoked by running the following
for some time:

  /usr/lib/xscreensaver/tessellimage -root -fill-screen

This leads to a slightly different pattern of dmesg messages, though.
The thing in common is that the screen stops updating, there, too,
when the tile binning error quoted above happens.

When provoked in this way, "grep ^CmaFree: /proc/meminfo" in a loop
said 25564 kB directly before a tile binning error message, 7160 kB
directly after, with values between 8000-1 kB after, with
recovery to 81740 kB after exiting tessellimage. But the screen update
stays stopped.


Additional dmesg output:

[0.00] Booting Linux on physical CPU 0x00 [0x410fd034]
[0.00] Linux version 4.19.0-2-arm64 (debian-ker...@lists.debian.org) 
(gcc version 8.2.0 (Debian 8.2.0-14)) #1 SMP Debian 4.19.16-1 (2019-01-17)
[0.00] Machine model: Raspberry Pi 3 Model B Rev 1.2
[...]
[0.00] cma: Reserved 128 MiB at 0x3340
[...]
[0.00] Memory: 780176K/970752K available (8700K kernel code, 1600K 
rwdata, 2788K rodata, 4864K init, 530K bss, 59504K reserved, 131072K 
cma-reserved)
[...]
[0.000193] Console: colour dummy device 80x25
[0.000682] console [tty0] enabled
[...]
[0.141070] simple-framebuffer 3e887000.framebuffer: framebuffer at 
0x3e887000, 0x373800 bytes, mapped to 0x(ptrval)
[0.141106] simple-framebuffer 3e887000.framebuffer: format=r5g6b5, 
mode=1824x984x16, linelength=3648
[0.162242] Console: switching to colour frame buffer device 228x61
[0.182688] simple-framebuffer 3e887000.framebuffer: fb0: simplefb 
registered!
[...]
[4.768075] raspberrypi-firmware soc:firmware: Attached to firmware from 
2019-02-12 19:42
[...]
[   11.863850] vc4_hdmi 3f902000.hdmi: ASoC: Failed to create component debugfs 
directory
[   11.921037] vc4_hdmi 3f902000.hdmi: vc4-hdmi-hifi <-> 3f902000.hdmi mapping 
ok
[   11.945385] vc4_hdmi