On Sat, 2023-10-07 at 23:05 +0100, Richard Purdie via
lists.openembedded.org wrote:
> I thought I'd summarise where things are at with the 6.5 kernel.
> 
> We've fixed:
> * the ARM LTP OOM lockup (kernel patch)
> * the locale ARM selftest failure which was OOM due to silly buffer 
>   allocations in 6.5 (kernel commandline)
> * the ARM jitterentropy errors (kernel patch)
> * the cryptodev build failures (recipe updated)
> 
> We've also:
> * disabled the strace tests that fail with 6.5.
> * made sure the serial ports and getty counts match
> * added ttyrun which wraps serial consoles and avoids hacks
> * made the qemurunner logging save all the port logs
> * made the qemurunner write the binary data it is sent verbatim
> * made sure to use nodelay on qemu's tcpserial
> 
> This leaves an annoying serial console problem where ttyS1 never has
> the getty login prompt appear.
> 
> What we know:
> 
> * More recently (yesterday/today) we've only seen this on x86, but we
> did see it on ARM in the days before that.
> 
> * It affects both sysvinit and systemd images.
> 
> * Systemd does print that it started a getty on ttyS0 and ttyS1 when
> the failure occurs
> 
> * There is a getty running according to "ps" when the failure occurs
> 
> * There are only ever one or three characters received on ttyS1 in the
> failure case (0x0d and 0x0a chars, i.e. CR and LF)
> 
> * It can't be any kind of utf-8 conversion issue since the login prompt
> isn't visible in the binary log
> 
> * the kernel boot logs do show the serial port created with the same
> ioport and irq on x86.
> 
> Previously we did see some logs with timing issues on the ttyS0 port
> but the nodelay parameter may have helped with that.
> 
> There are debug patches in master-next against qemurunner which try to
> poke around via ttyS0 to gather more debug information when things fail.
> 
> The best failure log we have is now this one:
> 
> https://autobuilder.yoctoproject.org/typhoon/#/builders/79/builds/5874/steps/14/logs/stdio
> 
> where I've saved the logs:
> 
> https://autobuilder.yocto.io/pub/failed-builds-data/6.5%20kernel/j/qemu_boot_log.20231007084853
> and
> https://autobuilder.yocto.io/pub/failed-builds-data/6.5%20kernel/j/qemu_boot_log.20231007084853.2
> 
> You can see ttyS1 times out after 1000 seconds and the port only has a
> single byte (in the .2 file). The other log shows ps output showing the
> getty running for ttyS1.
> 
> Ideas welcome on where to go from here.
> 
> I've tweaked master-next to keep reading the ttyS1 port after we poke
> it from ttyS0 to see if that reveals anything next time it fails (build
> running).

Testing overnight with the new debug code yielded:

https://autobuilder.yoctoproject.org/typhoon/#/builders/87/builds/5895/steps/14/logs/stdio

The interesting bit being:

"""
WARNING: core-image-full-cmdline-1.0-r0 do_testimage: Extra read data: 
Poky (Yocto Project Reference Distro) 4.2+snapshot-7cb4ffbd8380b0509d7fac9191095379af321686 qemux86-64 ttyS1

qemux86-64 login: helloA

Poky (Yocto Project Reference Distro) 4.2+snapshot-7cb4ffbd8380b0509d7fac9191095379af321686 qemux86-64 ttyS1
qemux86-64 login: 

"""

i.e. the getty prompt didn't appear within the 1000s timeout, but at
some point during shutdown the original prompt, the "helloA" and a new
getty prompt all did.
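
To make the distinction between "only CR/LF arrived" and "the prompt
arrived late" easy to spot in a saved binary port log, something like
the following could be used. This is only a rough sketch: the function
name, the category strings and the default prompt marker are made up
for illustration and are not what qemurunner does.

```python
def summarise_port_log(data: bytes, prompt: bytes = b"login:") -> str:
    """Classify raw bytes captured from a serial port (hypothetical helper)."""
    if not data:
        return "empty"
    if prompt in data:
        # The getty prompt made it to the host side at some point.
        return "prompt-seen"
    if all(byte in (0x0D, 0x0A) for byte in data):
        # Matches the failure case: nothing but CR and LF characters.
        return "only-cr-lf"
    return "other-data"
```

Running this over the bytes read before the timeout and again over the
"extra read data" would show the first as only-cr-lf and the second as
prompt-seen in a failure like the one above.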

So the data *is* there but stuck in a buffer somehow. Kernel or qemu
side, I don't know.
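
For anyone wanting to reproduce the "keep reading after the timeout"
behaviour outside of qemurunner, a minimal sketch is below. It assumes
the serial port is reachable as an already connected TCP socket; the
function name and the timeout parameters are hypothetical and the real
debug patches in master-next may look quite different.

```python
import select
import socket
import time


def drain_port(sock: socket.socket, idle_timeout: float = 1.0,
               max_time: float = 10.0) -> bytes:
    """Read until the port is idle for idle_timeout seconds, the peer
    closes the connection, or max_time seconds have passed."""
    chunks = []
    deadline = time.monotonic() + max_time
    while time.monotonic() < deadline:
        ready, _, _ = select.select([sock], [], [], idle_timeout)
        if not ready:
            break  # nothing arrived within idle_timeout
        chunk = sock.recv(4096)
        if not chunk:
            break  # peer closed the connection
        chunks.append(chunk)
    return b"".join(chunks)
```

Anything this returns after the login timeout has already expired is
data that was sitting in a buffer somewhere between the getty and the
host.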

Cheers,

Richard



