Re: [OE-core] [swat] ltp failures on autobuilder

2021-06-16 Thread Richard Purdie
On Wed, 2021-06-16 at 08:56 -0400, Paul Gortmaker wrote:
> [Re: [swat] ltp failures on autobuilder] On 11/06/2021 (Fri 14:19) Richard 
> Purdie wrote:
> 
> > On Fri, 2021-06-11 at 12:36 +0100, Richard Purdie via 
> > lists.yoctoproject.org wrote:
> > > as a .cfg to the kernel and that still reproduced the crash. However:
> > > 
> > > CONFIG_DEBUG_KERNEL=y
> > > CONFIG_CGROUP_DEBUG=y
> > > CONFIG_SCHED_DEBUG=y
> > > CONFIG_DEBUG_PREEMPT=y
> > > # CONFIG_RCU_TRACE is not set
> > > # CONFIG_X86_DEBUG_FPU is not set
> > > # CONFIG_CONSOLE_POLL is not set
> > > # CONFIG_DEBUG_INFO is not set
> > > # CONFIG_KGDB is not set
> > > # CONFIG_KGDB_HONOUR_BLOCKLIST is not set
> > > # CONFIG_KGDB_SERIAL_CONSOLE is not set
> > > # CONFIG_KGDB_LOW_LEVEL_TRAP is not set
> > > # CONFIG_KGDB_KDB is not set
> > > # CONFIG_KDB_KEYBOARD is not set
> > > # CONFIG_DEBUG_MISC is not set
> > > 
> > 
> > Isolated down to CONFIG_SCHED_DEBUG=y being the line which somehow "fixes" 
> > the crash. I can enable all the above apart from that and we can reproduce
> > it.
> > 
> > Also, I changed gatesgarth to use qemu 5.2.0 copied in from hardknott and
> > that breaks it. Dropping the 27 CVE patches "fixes" it again. It is possible
> > it is one of the CVE fixes. Continuing to try and isolate.
> 
> For the mail archive trail, and for those not following the ongoing
> research on IRC, we are hopeful that this fixes it.
> 
> https://lore.kernel.org/lkml/20210616125157.438837-1-paul.gortma...@windriver.com/

Awesome work in tracking that down, much appreciated, thanks!

Curious what upstream will make of it now...

Cheers,

Richard


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#153017): 
https://lists.openembedded.org/g/openembedded-core/message/153017
Mute This Topic: https://lists.openembedded.org/mt/83466238/21656
Group Owner: openembedded-core+ow...@lists.openembedded.org
Unsubscribe: https://lists.openembedded.org/g/openembedded-core/unsub 
[arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [OE-core] [swat] ltp failures on autobuilder

2021-06-16 Thread Paul Gortmaker
[Re: [swat] ltp failures on autobuilder] On 11/06/2021 (Fri 14:19) Richard 
Purdie wrote:

> On Fri, 2021-06-11 at 12:36 +0100, Richard Purdie via lists.yoctoproject.org 
> wrote:
> > as a .cfg to the kernel and that still reproduced the crash. However:
> > 
> > CONFIG_DEBUG_KERNEL=y
> > CONFIG_CGROUP_DEBUG=y
> > CONFIG_SCHED_DEBUG=y
> > CONFIG_DEBUG_PREEMPT=y
> > # CONFIG_RCU_TRACE is not set
> > # CONFIG_X86_DEBUG_FPU is not set
> > # CONFIG_CONSOLE_POLL is not set
> > # CONFIG_DEBUG_INFO is not set
> > # CONFIG_KGDB is not set
> > # CONFIG_KGDB_HONOUR_BLOCKLIST is not set
> > # CONFIG_KGDB_SERIAL_CONSOLE is not set
> > # CONFIG_KGDB_LOW_LEVEL_TRAP is not set
> > # CONFIG_KGDB_KDB is not set
> > # CONFIG_KDB_KEYBOARD is not set
> > # CONFIG_DEBUG_MISC is not set
> > 
> 
> Isolated down to CONFIG_SCHED_DEBUG=y being the line which somehow "fixes" 
> the crash. I can enable all the above apart from that and we can reproduce
> it.
> 
> Also, I changed gatesgarth to use qemu 5.2.0 copied in from hardknott and that
> breaks it. Dropping the 27 CVE patches "fixes" it again. It is possible it
> is one of the CVE fixes. Continuing to try and isolate.

For the mail archive trail, and for those not following the ongoing
research on IRC, we are hopeful that this fixes it.

https://lore.kernel.org/lkml/20210616125157.438837-1-paul.gortma...@windriver.com/

Paul.
--

> 
> Cheers,
> 
> Richard
> 

-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#153007): 
https://lists.openembedded.org/g/openembedded-core/message/153007
-=-=-=-=-=-=-=-=-=-=-=-



Re: [OE-core] [swat] ltp failures on autobuilder

2021-06-11 Thread Richard Purdie
On Fri, 2021-06-11 at 12:36 +0100, Richard Purdie via lists.yoctoproject.org 
wrote:
> as a .cfg to the kernel and that still reproduced the crash. However:
> 
> CONFIG_DEBUG_KERNEL=y
> CONFIG_CGROUP_DEBUG=y
> CONFIG_SCHED_DEBUG=y
> CONFIG_DEBUG_PREEMPT=y
> # CONFIG_RCU_TRACE is not set
> # CONFIG_X86_DEBUG_FPU is not set
> # CONFIG_CONSOLE_POLL is not set
> # CONFIG_DEBUG_INFO is not set
> # CONFIG_KGDB is not set
> # CONFIG_KGDB_HONOUR_BLOCKLIST is not set
> # CONFIG_KGDB_SERIAL_CONSOLE is not set
> # CONFIG_KGDB_LOW_LEVEL_TRAP is not set
> # CONFIG_KGDB_KDB is not set
> # CONFIG_KDB_KEYBOARD is not set
> # CONFIG_DEBUG_MISC is not set
> 

Isolated down to CONFIG_SCHED_DEBUG=y being the line which somehow "fixes" 
the crash. I can enable all the above apart from that and we can reproduce
it.
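
The "enable everything except one option" step above can be scripted. This is
only a minimal sketch and not the tooling actually used here: the helper names
make_fragment and leave_one_out are hypothetical, and only the three options
that differed between the crashing and non-crashing fragments are listed (the
remaining "is not set" lines are left out for brevity).

```python
# Hypothetical helper, not part of OE-core or the kernel tooling.
# Candidate options taken from the fragments quoted in this thread.
DEBUG_OPTIONS = [
    "CONFIG_CGROUP_DEBUG",
    "CONFIG_SCHED_DEBUG",
    "CONFIG_DEBUG_PREEMPT",
]


def make_fragment(enabled, candidates=DEBUG_OPTIONS):
    """Build the text of a kernel .cfg fragment.

    CONFIG_DEBUG_KERNEL is always enabled; each candidate is either set
    to 'y' or written in the '# ... is not set' form.
    """
    lines = ["CONFIG_DEBUG_KERNEL=y"]
    for option in candidates:
        if option in enabled:
            lines.append(f"{option}=y")
        else:
            lines.append(f"# {option} is not set")
    return "\n".join(lines) + "\n"


def leave_one_out(candidates=DEBUG_OPTIONS):
    """Yield (dropped_option, fragment_text), one fragment per candidate.

    Each fragment enables every candidate except the dropped one, so a
    crash that only comes back in one variant points at that option.
    """
    for dropped in candidates:
        enabled = [option for option in candidates if option != dropped]
        yield dropped, make_fragment(enabled, candidates)


if __name__ == "__main__":
    for dropped, fragment in leave_one_out():
        print(f"--- without {dropped} ---")
        print(fragment)
```

Each fragment would still need to be added to the kernel build and the ltp run
repeated by hand; the script only saves writing the variants out.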

Also, I changed gatesgarth to use qemu 5.2.0 copied in from hardknott and that
breaks it. Dropping the 27 CVE patches "fixes" it again. It is possible it
is one of the CVE fixes. Continuing to try and isolate.

Cheers,

Richard


-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#152868): 
https://lists.openembedded.org/g/openembedded-core/message/152868
-=-=-=-=-=-=-=-=-=-=-=-



Re: [OE-core] [swat] ltp failures on autobuilder

2021-06-11 Thread Richard Purdie
On Thu, 2021-06-10 at 18:02 +0100, Richard Purdie via lists.yoctoproject.org 
wrote:
> Noting down what we know about the ltp issue:
> 
> We've seen intermittent issues on the autobuilder where some ltp tests fail
> or hang. I've been trying to figure out how to reproduce the issue and narrow
> down the cause.
> 
> I was able to isolate a patch which reproduces the issue for me:
> 
> http://git.yoctoproject.org/cgit.cgi/poky-contrib/commit/?h=rpurdie/t222&id=d7d65aae104caa03afc28837b0abe0b486d5a8b8
> 
> with master-next, setting:
> 
> IMAGE_INSTALL_append = ' ltp' 
> TEST_SUITES = 'ping ssh ltp' 

also:

IMAGE_CLASSES += "testimage"
QEMU_USE_KVM_qemux86-64 = "True"


> then 
> 
> bitbake core-image-sato; bitbake core-image-sato -c testimage
> 
> where the issue shows up as a kernel "BUG:" in the logs in 
> WORKDIR/testimage/qemu_*
> 
> The above patch runs the minimum of ltp tests I could find which replicate 
> the issue.
> 
> I've reproduced this on 5.10.1 -> 5.10.42, 5.4.123 and 5.13-rc5.
> (and we've ruled out linux-yocto with plain kernels)
> Also reproduced on both qemu 6.0.0 and 5.2.0.
> 
> My build machine is an Ubuntu 20.04.2 LTS with:
> Linux version 5.4.0-74-generic (buildd@lgw01-amd64-038) (gcc version 9.3.0 
> (Ubuntu 9.3.0-17ubuntu1~20.04)) #83-Ubuntu SMP Sat May 8 02:35:39 UTC 2021

Good news (for me) is that Randy and Paul can now reproduce this with the above 
additional key pieces of config.
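
For anyone else trying to reproduce, checking the logs for the failure can be
automated. A minimal sketch, assuming the logs are plain text files matching
qemu_* in the testimage directory as described above; find_kernel_bugs is a
hypothetical helper name, and the directory has to be passed in since WORKDIR
depends on the build.

```python
import glob
import os
import sys


def find_kernel_bugs(log_dir):
    """Return (path, line_number, text) for every log line containing 'BUG:'.

    Only files in log_dir whose names start with 'qemu_' are scanned.
    """
    hits = []
    for path in sorted(glob.glob(os.path.join(log_dir, "qemu_*"))):
        # errors="replace" keeps going if the console log has stray bytes.
        with open(path, errors="replace") as log:
            for number, line in enumerate(log, start=1):
                if "BUG:" in line:
                    hits.append((path, number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Usage: python find_bugs.py <path to WORKDIR/testimage>
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, number, text in find_kernel_bugs(directory):
        print(f"{path}:{number}: {text}")
```

An empty result only means no "BUG:" line was logged in that run; since the
failure is intermittent, several runs may be needed.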

We have confirmed that the issue is present:

* with gcc 11.1.1 and 10.3
* in hardknott
* if QB_SMP is disabled (i.e. in a single processor qemu)
* on 18.04, 20.04 and 21.04 Ubuntu host distros which have varying 5.4 and 5.11 
  host kernels

I have not yet been able to make the bug appear in gatesgarth
(gcc 10.2, 5.8 kernel, qemu 5.1.0); I had to hack -b /dev/null into the ltp
command line.

I did backport the qemu platform, smp and qemu commandline changes back to
gatesgarth and it still doesn't crash.

I also found that setting CONFIG_DEBUG_KERNEL makes the issue 'go away'. 
Since that is a large hammer, I tried:

CONFIG_DEBUG_KERNEL=y
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_SCHED_DEBUG is not set
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_RCU_TRACE is not set
# CONFIG_X86_DEBUG_FPU is not set
# CONFIG_CONSOLE_POLL is not set
# CONFIG_DEBUG_INFO is not set
# CONFIG_KGDB is not set
# CONFIG_KGDB_HONOUR_BLOCKLIST is not set
# CONFIG_KGDB_SERIAL_CONSOLE is not set
# CONFIG_KGDB_LOW_LEVEL_TRAP is not set
# CONFIG_KGDB_KDB is not set
# CONFIG_KDB_KEYBOARD is not set
# CONFIG_DEBUG_MISC is not set

as a .cfg to the kernel and that still reproduced the crash. However:

CONFIG_DEBUG_KERNEL=y
CONFIG_CGROUP_DEBUG=y
CONFIG_SCHED_DEBUG=y
CONFIG_DEBUG_PREEMPT=y
# CONFIG_RCU_TRACE is not set
# CONFIG_X86_DEBUG_FPU is not set
# CONFIG_CONSOLE_POLL is not set
# CONFIG_DEBUG_INFO is not set
# CONFIG_KGDB is not set
# CONFIG_KGDB_HONOUR_BLOCKLIST is not set
# CONFIG_KGDB_SERIAL_CONSOLE is not set
# CONFIG_KGDB_LOW_LEVEL_TRAP is not set
# CONFIG_KGDB_KDB is not set
# CONFIG_KDB_KEYBOARD is not set
# CONFIG_DEBUG_MISC is not set

doesn't seem to want to reproduce the crash so something about
those three options seems to make things 'work'.

What does that all mean? No idea.

Cheers,

Richard





-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#152866): 
https://lists.openembedded.org/g/openembedded-core/message/152866
-=-=-=-=-=-=-=-=-=-=-=-