On Sat, 2023-10-07 at 23:05 +0100, Richard Purdie via lists.openembedded.org wrote:
> I thought I'd summarise where things are at with the 6.5 kernel.
>
> We've fixed:
> * the ARM LTP OOM lockup (kernel patch)
> * the locale ARM selftest failure which was OOM due to silly buffer
> allocations in 6.5 (kernel commandline)
> * the ARM jitterentropy errors (kernel patch)
> * the cryptodev build failures (recipe updated)
>
> We've also:
> * disabled the strace tests that fail with 6.5.
> * made sure the serial ports and getty counts match
> * added ttyrun which wraps serial consoles and avoids hacks
> * made the qemurunner logging save all the port logs
> * made the qemurunner write the binary data it is sent verbatim
> * made sure to use nodelay on qemu's tcpserial
>
> This leaves an annoying serial console problem where ttyS1 never has
> the getty login prompt appear.
>
> What we know:
>
> * We've only seen this on x86 more recently (yesterday/today) but have
> seen it on ARM in the days before that.
>
> * It affects both sysvinit and systemd images.
>
> * Systemd does print that it started a getty on ttyS0 and ttyS1 when
> the failure occurs
>
> * There is a getty running according to "ps" when the failure occurs
>
> * There are only ever one or three characters received to ttyS1 in the
> failure case (0x0d and 0x0a chars, i.e. CR and LF)
>
> * It can't be any kind of utf-8 conversion issue since the login prompt
> isn't visible in the binary log
>
> * the kernel boot logs do show the serial port created with the same
> ioport and irq on x86.
>
> Previously we did see some logs with timing issues on the ttyS0 port
> but the nodelay parameter may have helped with that.
>
> There are debug patches in master-next against qemurunner which try and
> poke around to gather more debug when things fail using ttyS0.
>
> The best failure log we have is now this one:
>
> https://autobuilder.yoctoproject.org/typhoon/#/builders/79/builds/5874/steps/14/logs/stdio
>
> where I've saved the logs:
>
> https://autobuilder.yocto.io/pub/failed-builds-data/6.5%20kernel/j/qemu_boot_log.20231007084853
> and
> https://autobuilder.yocto.io/pub/failed-builds-data/6.5%20kernel/j/qemu_boot_log.20231007084853.2
>
> You can see ttyS1 times out after 1000 seconds and the port only has a
> single byte (in the .2 file). The other log shows ps output showing the
> getty running for ttyS1.
>
> Ideas welcome on where to go from here.
>
> I've tweaked master-next to keep reading the ttyS1 port after we poke
> it from ttyS0 to see if that reveals anything next time it fails (build
> running).
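One way to picture the "keep reading ttyS1 after poking it" debugging, together with the CR/LF-only failure signature described above, is the minimal Python sketch below. It is not the qemurunner code in master-next: `read_until()` and `only_line_endings()` are hypothetical helpers, and the sketch assumes each serial port is reached as a connected TCP socket, as with qemu's TCP serial backend.

```python
import select
import socket
import time


def read_until(sock: socket.socket, marker: bytes, timeout: float) -> bytes:
    """Read raw bytes from `sock` until `marker` appears or `timeout` expires.

    Whatever was read is returned either way, undecoded, so a caller can log
    partial data verbatim instead of losing it on a timeout.
    """
    deadline = time.monotonic() + timeout
    data = b""
    while marker not in data:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        # Wait for the socket to become readable, but never past the deadline.
        readable, _, _ = select.select([sock], [], [], remaining)
        if not readable:
            break  # deadline reached with nothing more to read
        chunk = sock.recv(4096)
        if not chunk:
            break  # the other end closed the connection
        data += chunk
    return data


def only_line_endings(data: bytes) -> bool:
    """Failure signature: some bytes arrived, but only CR (0x0d) and LF (0x0a)."""
    return len(data) > 0 and set(data) <= {0x0D, 0x0A}


if __name__ == "__main__":
    # A socketpair stands in for the connection to qemu's TCP serial port.
    guest, host = socket.socketpair()
    guest.sendall(b"\r\n")  # what ttyS1 looked like in the failure case
    first = read_until(host, b"login:", timeout=0.5)
    print(first, only_line_endings(first))
    # "Poke" and keep reading the same port rather than giving up on it.
    guest.sendall(b"qemux86-64 login: ")
    print(read_until(host, b"login:", timeout=0.5))
    guest.close()
    host.close()
```

The design point is that a timeout returns the partial data instead of discarding it, so a prompt that shows up late is still captured as long as the port keeps being read.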

Testing overnight with the new debug yielded:

https://autobuilder.yoctoproject.org/typhoon/#/builders/87/builds/5895/steps/14/logs/stdio

The interesting bit being:

"""
WARNING: core-image-full-cmdline-1.0-r0 do_testimage: Extra read data:
Poky (Yocto Project Reference Distro) 4.2+snapshot-7cb4ffbd8380b0509d7fac9191095379af321686 qemux86-64 ttyS1
qemux86-64 login: helloA
Poky (Yocto Project Reference Distro) 4.2+snapshot-7cb4ffbd8380b0509d7fac9191095379af321686 qemux86-64 ttyS1
qemux86-64 login:
"""

i.e. the getty prompt didn't appear within 1000s, but at some point during shutdown the original prompt, the "helloA" and the new getty prompt all did. So the data *is* there but stuck in a buffer somehow. Whether that's on the kernel or qemu side, I don't know.

Cheers,

Richard

View/Reply Online (#188821): https://lists.openembedded.org/g/openembedded-core/message/188821