Re: [OE-core] Summary of the remaining 6.5 kernel serial issue (and 6.5 summary)

On 10/10/2023 (Tue 21:53) Richard Purdie via lists.openembedded.org wrote:

> On Tue, 2023-10-10 at 21:37 +0100, Richard Purdie via
> lists.openembedded.org wrote:
> > I've been asked how to reproduce this setup more easily outside of
> > OEQA. I hacked this up:
> > 
> > https://www.rpsys.net/wp/rp/simpleqemu.tgz
> > 
> > extract it, hack the paths in commands.py to point at a qemux86-64
> > build of core-image-minimal and then "./commands" should boot a qemu
> > image using the system's qemu binary as it would during the QA tests,
> > setting up the two serial ports as it would during tests.
> > 
> > Obviously I've just brutally hacked this out of the QA code so there
> > are plenty of sharp edges. It should allow the serial setup to be
> > replicated and experimented with more easily though without bitbake
> > involved. It could be cut down a lot more too.
> 
> and set runqemuparams = 'nographic' in commands.py if you can't cope
> with graphics :).

So I think we are onto something here.  Let me rewind a bit to provide
some context for those who didn't follow the IRC discussion.

OEQA runs qemu with the "-serial tcp:<IP>:<port>" option and binds to
that port directly through Python.  In theory, one should also be able
to use netcat/socat (or even telnet) to connect to the qemu TCP port for
manual command-line testing, without the autobuilder/OEQA Python involved.
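
For reference, my manual attempts looked roughly like the following; the
port number is arbitrary and the raw/echo stdio options are just the usual
tweaks, not the exact OEQA setup, so treat it as a shape rather than a recipe:

   # terminal 1: listen for the guest's serial console on a TCP port
   socat -,raw,echo=0 TCP-LISTEN:4501,reuseaddr

   # terminal 2: point one of the guest serial ports at that listener
   # (the other qemu arguments are elided here; runqemu supplies them)
   qemu-system-x86_64 ... -serial stdio -serial tcp:127.0.0.1:4501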

But I couldn't get that to work without "broken" echo behaviour and other
odd tty/terminal characters being displayed.  As that was also observed
on v6.1 and mickledore, I didn't want to go off on a wild goose chase
after something that probably has nothing to do with the issue at hand
here.

Hence what Richard created above.  The basic premise is that, similar to
the AB output, we are looking for log files ending in ".2" (i.e. from
ttyS1) that are smaller than normal, indicating that getty wasn't able to
get the login banner out successfully.  Now the results.
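
For anyone following along at home, the screening check amounts to
something like this (a sketch only; the 477-byte "normal" size and the
file naming come from the listings below, and "dot-two" is just the
directory I collected the logs into):

   # flag any ttyS1 (".2") boot logs smaller than the usual 477 bytes
   find dot-two/ -name 'qemu_boot_log.*.2' -size -477c -ls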

After 300 runs on mickledore with v6.1, all the .2 files were the same
size - 477 bytes.

After 155 runs on today's master with v6.5 kernel, I got two runs where
instead of 477 bytes, they were 345 bytes.

dot-two$ls -l1|wc -l
155
dot-two$ls -l1| grep -v 477
total 616
-rw-r--r-- 1 pgortmak users 345 Oct 10 17:54 qemu_boot_log.20231010175351.2
-rw-r--r-- 1 pgortmak users 345 Oct 10 18:43 qemu_boot_log.20231010184307.2

The difference?  The "short" files ended with "Starting crond: OK" and
were missing the login banner:
   --------
   Poky (Yocto Project Reference Distro) 4.2+snapshot-0fb1ff0ffb527a6364cef7159c0d9dcb879e8dd9 qemux86-64 ttyS1

   qemux86-64 login:
   --------

Similarly, for the ttyS0 logs, if we strip off the printk.time prefixes
so that the "outlier" case can be diffed against any one of the 153
"normal" cases, we see that this line is missing:

Last login: Tue Oct 10 22:40:31 +0000 2023 on /dev/ttyS1.

...which makes sense, since the login prompt never appeared on ttyS1 for
OEQA to interact with and log in to.

(Interesting that otherwise ttyS1 always "wins" the race against ttyS0
to update wtmp/utmp.)
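
For completeness, the prefix-stripping diff amounts to something like the
following; the log file names here are placeholders for the outlier and
one of the normal ttyS0 logs:

   # drop the "[   12.345678] " printk.time prefixes, then diff the two
   strip_ts() { sed -E 's/^\[ *[0-9]+\.[0-9]+\] //' "$1"; }
   diff <(strip_ts OUTLIER_ttyS0.log) <(strip_ts NORMAL_ttyS0.log)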

Anyway, by my math that works out to a dismal reproduction rate of
approximately 1.5%.  Ugh.  But at least it isn't coupled to a full build!

Finally, using an identical build on master but with a PREFERRED_VERSION
of the v6.4 kernel, I ran 300+ instances and got zero short ttyS1 ".2"
log files missing the getty output.
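
For anyone repeating that comparison, the kernel switch is just the usual
local.conf preference (the exact version string depends on what your
layers provide):

   # local.conf: pin linux-yocto back to the 6.4 series
   PREFERRED_VERSION_linux-yocto = "6.4%"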

Note that these runs were using the default qemu v4.x that ships with
ubuntu-20.04; I've not (yet?) run similar tests with our own more
up-to-date qemu build from the sysroot.  Maybe I don't need to?

Possible conclusions:

The reproducibility seems pretty low, which matches what I've heard with
respect to the AB results.  Based on these limited runs, it also seems
confined to the v6.5.x kernel -- again matching the AB.

So maybe we have a reproducer that is independent of OEQA and doesn't
need the backing of a full build to get there?  I'd like to think so,
but I'm going to run more v6.5+master tests overnight to try and get
better stats -- a +/- of just one hit changes my current reproduction
rate by a large percentage!


Next steps:

Each run takes approximately 20s, and with a roughly 1.5% reproduction
rate, a minimum of 150 clean runs is needed before one can say "OK,
*maybe* this kernel under test is OK."  That translates to roughly an
hour to get a data point you have some faith in as being "good".  [Still
a heck of a lot better than doing 150 *builds* though!]

On the other hand, just one instance of a missing login on ttyS1 is
concretely "bad", and hence a test run can be stopped as soon as that
is seen.

This is the "good is really only maybe good, but bad is for sure bad"
type of bisect that I love to hate.
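
In practice the run loop amounts to something like the sketch below,
assuming the wrapper drops its ttyS1 logs where the find can see them;
the 477-byte threshold is from the earlier listing:

   # run up to 150 boots, stopping at the first short ttyS1 log
   # (log location/naming assumed -- adjust to wherever your harness writes them)
   for i in $(seq 1 150); do
       ./commands
       if find . -name 'qemu_boot_log.*.2' -size -477c | grep -q .; then
           echo "BAD: short ttyS1 log after $i runs"
           break
       fi
   done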

Since the data points are "expensive", I probably won't just attempt a
mindless brute force bisect.  I'd probably do something more like...

--ensure modules (or the lack thereof) don't impact the results, so I
  don't have to re-bundle rootfs images.

--maybe see if I can build with host gcc vs. using devshell/sysroot gcc
  and still see the issue.  Although really, kernel build time optimization
  won't be the bottleneck in this particular case, so not worthwhile?

--check vanilla v6.5 (i.e. v6.5.0 -- no linux-stable content) so I know
  whether to dig around in v6.5.0 --> v6.5.5, or in the much, much larger
  v6.4 --> v6.5 additions.  I expect it to be the latter, i.e. digging
  through mainline, but who knows.

--assuming mainline introduced it, instead of a brute-force bisect, I'd
  look for merges by Linus of Greg KH's pull requests for the tty and
  serial subsystems in the merge window (v6.4 --> v6.5-rc1) and test
  both sides of that/those merge(s) -- see the sketch after this list.

--if things haven't completely fallen apart by that point, I'd be left
  with a relatively small set of subsystem specific commits from a single
  pull request which then could be bisected.
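
Roughly, finding those candidate merge points would look like this
(nothing OEQA-specific, just plain git against a mainline tree; "M" is a
placeholder for a candidate merge SHA):

   # list merge commits touching the serial/tty code in the v6.5 merge window
   git log --merges --oneline v6.4..v6.5-rc1 -- drivers/tty

   # for a candidate merge M, build/test the kernel on both sides of it:
   git checkout M^1    # first (mainline) parent, i.e. before the pull
   git checkout M      # the merge itself, i.e. after the pull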

Sometimes in the past, when I've got down to a small set of commits like
that, the culprit has almost jumped out at me just from scanning the
shortlogs.

Anyway, that is where my thoughts are at.  But of course we don't yet
know whether this "test" even accurately reflects the OEQA/AB issue!
So nobody should get their hopes up too early.

As a side note (something briefly discussed on IRC), it seems to make
sense to someday informally "support" the idea of developers being able
to use a chunk of OEQA like this in stand-alone mode, since we always
seem to be chasing issues that only show up in AB output, with people
struggling to reproduce them elsewhere.

Paul.