On Tue, 2023-08-22 at 22:08 +0100, Richard Purdie via
lists.openembedded.org wrote:
> On Tue, 2023-08-22 at 15:49 +0100, Richard Purdie via
> lists.openembedded.org wrote:
> > 03b2c470a136a83a9961a2a855cde59498361598 shows as broken
> > 
> > deda0761dc6161f03278da4679d96d4727992e91 doesn't seem to break
> > 
> > but this doesn't seem to make any sense as the changes are:
> > 
> > kernel-source$ git diff 03b2c470a136a83a9961a2a855cde59498361598 deda0761dc6161f03278da4679d96d4727992e91 | diffstat
> >  arch/arm/boot/dts/iwg20d-q7-common.dtsi               |    2 +-
> >  arch/arm/boot/dts/meson8.dtsi                         |    4 ++--
> >  arch/arm/boot/dts/qcom-apq8074-dragonboard.dts        |    4 ----
> >  arch/arm/boot/dts/stm32mp15xx-dhcor-avenger96.dtsi    |    2 +-
> >  arch/arm/mach-ep93xx/timer-ep93xx.c                   |    3 +--
> >  arch/arm/mach-omap2/board-generic.c                   |    1 -
> >  arch/arm64/boot/dts/mediatek/mt8183-kukui.dtsi        |    4 ----
> >  arch/arm64/boot/dts/qcom/apq8096-ifc6640.dts          |    4 ++--
> >  arch/arm64/boot/dts/qcom/pm7250b.dtsi                 |    1 -
> >  arch/arm64/boot/dts/renesas/ulcb-kf.dtsi              |    3 ++-
> >  arch/arm64/boot/dts/ti/k3-j7200-common-proc-board.dts |   28 ++++++++++++++--------------
> >  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c            |    2 +-
> >  drivers/infiniband/hw/hfi1/ipoib_tx.c                 |    4 ++--
> >  drivers/infiniband/hw/hfi1/mmu_rb.c                   |  101 ++++++++++++++++++++++++++++++++++++++---------------------------------------------------------------
> >  drivers/infiniband/hw/hfi1/mmu_rb.h                   |    3 ---
> >  drivers/infiniband/hw/hfi1/sdma.c                     |   23 +++++------------------
> >  drivers/infiniband/hw/hfi1/sdma.h                     |   47 +++++++++++++++--------------------------------
> >  drivers/infiniband/hw/hfi1/sdma_txreq.h               |    2 --
> >  drivers/infiniband/hw/hfi1/user_sdma.c                |  137 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------------------------------
> >  drivers/infiniband/hw/hfi1/user_sdma.h                |    1 +
> >  drivers/infiniband/hw/hfi1/vnic_sdma.c                |    4 ++--
> >  drivers/infiniband/hw/hns/hns_roce_hem.c              |    7 +++----
> >  drivers/infiniband/hw/irdma/uk.c                      |   10 ++++------
> >  drivers/input/misc/pm8941-pwrkey.c                    |   19 ++++---------------
> >  drivers/memory/brcmstb_dpfe.c                         |    4 +---
> >  drivers/soc/fsl/qe/Kconfig                            |    1 -
> >  drivers/video/fbdev/omap/lcd_mipid.c                  |    6 +-----
> >  sound/soc/codecs/es8316.c                             |   23 +++++++++--------------
> >  28 files changed, 191 insertions(+), 259 deletions(-)
> > 
> > 03b2c470a136a83a9961a2a855cde59498361598 Input: pm8941-powerkey - fix debounce on gen2+ PMICs
> > 421ce97657a84b81ce2cb915e75037e1a356736a arm64: dts: ti: k3-j7200: Fix physical address of pin
> > 3b4c21804076e461a6453ee4d09872172336aa1d fbdev: omapfb: lcd_mipid: Fix an error handling path in mipid_spi_probe()
> > 52b04ac85f5f4b485bf658101e464143225e68f9 drm/msm/dpu: set DSC flush bit correctly at MDP CTL flush register
> > 6878bdd7571827babc1c4c1ff66ea1affe951020 arm64: dts: renesas: ulcb-kf: Remove flow control for SCIF1
> > 5d14292dba9554881a137039c048a67ddf321395 ARM: dts: iwg20d-q7-common: Fix backlight pwm specifier
> > 766e0b6f4c9649f126e59c06f100b8581d0773b8 RDMA/hns: Fix hns_roce_table_get return value
> > b99395ab605fb0570d1e62c9459425ac6fc58d46 IB/hfi1: Fix wrong mmu_node used for user SDMA packet after invalidate
> > ebec507398e11b1c25ce9fb05fb509878233051c RDMA/irdma: avoid fortify-string warning in irdma_clr_wqes
> > 750f0a302a10dc2327a6656860a22b7da7251cea soc/fsl/qe: fix usb.c build errors
> > b2194d7dfc95a404990da73ebd394f6f6946a4a0 ARM: dts: meson8: correct uart_B and uart_C clock references
> > 
> > 863054be8d4d2c9b38985371166a37c0e14111e1 ASoC: es8316: Do not set rate constraints for unsupported MCLKs
> > 3b575d93020f20f6a711efd8c69bfed26837d694 ASoC: es8316: Increment max value for ALC Capture Target Volume control
> > c02f27c2950abfed58bd8aa6bf50e79d9cc1fc77 ARM: dts: qcom: apq8074-dragonboard: Set DMA as remotely controlled
> > 9f79e638d45100dad43c56ad9b47eaff1b98fe9a memory: brcmstb_dpfe: fix testing array offset after use
> > 09722ac9f1e557ea65098202d0b18336c3f04420 ARM: dts: stm32: Shorten the AV96 HDMI sound card name
> > 666be7fef4d39d67c041460e29ddf2b875101133 arm64: dts: mediatek: mt8183: Add mediatek,broken-save-restore-fw to kukui
> > 1bdb9751b4c64f5709cb3608fd93a709c4e9b2e1 arm64: dts: qcom: apq8096: fix fixed regulator name property
> > 75c019119ebcc919e717fbce5552c2a0908405cf arm64: dts: qcom: pm7250b: add missing spmi-vadc include
> > c63997426da6f24f33ae6caf2423170d5bb80ebb ARM: omap2: fix missing tick_broadcast() prototype
> > e91ffbd6553348cdd3d04b263f8207d919681fac ARM: ep93xx: fix missing-prototype warnings
> > 
> > and I can't see how any of that is compiled into qemuppc. Am I missing 
> > something?
> 
> After banging my head against this for hours, I'm not really any
> further forward. With commits prior to
> deda0761dc6161f03278da4679d96d4727992e91 I can't seem to trigger rcu
> stalls. I have seen them on 863054be8d4d2c9b38985371166a37c0e14111e1
> which "isolates" it to the 10 commits above. They're arm or sound or
> memory devices we don't build afaict.
> 
> I did cut kernel-devsrc, lttng-tools, perf and similar from the image
> to reduce rebuild times a bit and the rcu stalls appear with them
> missing. I also cut the systemd, dnf and dnf_runtime tests.
> 
> I have a suspicion that the rcu stalls are "always" there and the
> emulation speed is marginal so some code patterns trigger it, some
> don't. On a loaded autobuilder, it tips the balance to more stalls. The
> more the rcu stalls trigger, the more likely an OOM situation is and
> perhaps we just get unlucky on some loads?
> 
> Whilst I can see the rcu stalls locally, I've not had the patience/time
> to see any hung QA test. I have let a few run through to completion but
> not all; I've been assuming that if configure passes, we wouldn't see
> anything interesting later.

I've gone back to the logs of recent failures and it is always a 255
exit code from ssh, not a timeout, e.g.:

core-image-sato/log.do_testimage.20329.20230822154435:DEBUG: [Command returned '255' after 235.73 seconds]
core-image-sato/log.do_testimage.20329.20230822154435-DEBUG: Command: dnf --repofrompath=oe-testimage-repo-qemuppc,http://192.168.7.1:39265/qemuppc --repofrompath=oe-testimage-repo-noarch,http://192.168.7.1:39265/noarch --repofrompath=oe-testimage-repo-ppc7400,http://192.168.7.1:39265/ppc7400 --nogpgcheck install --installroot=/home/root/chroot/test -v -y --rpmverbosity=debug busybox
core-image-sato/log.do_testimage.20329.20230822154435-Status: 255 Output:  DNF version: 4.16.1
core-image-sato/log.do_testimage.20329.20230822154435-cachedir: /home/root/chroot/test/var/cache/dnf
core-image-sato/log.do_testimage.20329.20230822154435-Added oe-testimage-repo-qemuppc repo from http://192.168.7.1:39265/qemuppc
core-image-sato/log.do_testimage.20329.20230822154435-Added oe-testimage-repo-noarch repo from http://192.168.7.1:39265/noarch



core-image-sato-sdk/log.do_testimage.20325.20230822154435-checking for unistd.h... yes
core-image-sato-sdk/log.do_testimage.20325.20230822154435-checking minix/config.h usability... no
core-image-sato-sdk/log.do_testimage.20325.20230822154435-checking minix/config.h presence... no
core-image-sato-sdk/log.do_testimage.20325.20230822154435-checking for minix/config.h... no
core-image-sato-sdk/log.do_testimage.20325.20230822154435-checking whether it is safe to define __EXTENSIONS__...
core-image-sato-sdk/log.do_testimage.20325.20230822154435:DEBUG: [Command returned '255' after 312.18 seconds]
core-image-sato-sdk/log.do_testimage.20325.20230822154435-DEBUG: Command: cd ~/buildtest/cpio-2.13; gnu-configize;  ./configure --disable-maintainer-mode
core-image-sato-sdk/log.do_testimage.20325.20230822154435-Status: 255 Output:  aclocal.m4:17: warning: this file was generated for autoconf 2.69.

So either the commands are stopping mid-flow for unknown reasons or the
ssh connection is failing. I can't tell whether this coincides with an
rcu stall or not, although both logs do contain rcu stalls.
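
For anyone wanting to check other testimage logs for the same pattern,
here is a rough sketch that pulls out every "Command returned" entry and
keeps the ones with status 255. The regular expression is based only on
the two log lines quoted above; the function names are made up for
illustration and are not part of oeqa. Note that ssh itself exits with
255 when the connection fails, but a remote command could in principle
return 255 too, so a hit here is a hint rather than proof.

```python
import re

# Matches entries like:
#   DEBUG: [Command returned '255' after 235.73 seconds]
# Whitespace is matched loosely so that entries wrapped across lines
# (as in mail archives) are still found.
RETURN_RE = re.compile(
    r"\[Command\s+returned\s+'(?P<status>-?\d+)'"
    r"\s+after\s+(?P<secs>[\d.]+)\s+seconds\]"
)


def find_command_results(log_text):
    """Return a list of (status, seconds) for each matching log entry."""
    return [
        (int(m.group("status")), float(m.group("secs")))
        for m in RETURN_RE.finditer(log_text)
    ]


def ssh_failures(log_text):
    """Return only the entries whose status is 255."""
    return [r for r in find_command_results(log_text) if r[0] == 255]


if __name__ == "__main__":
    # Sample text taken from the two failures quoted in this mail.
    sample = (
        "DEBUG: [Command returned '255' after 235.73 seconds]\n"
        "DEBUG: [Command returned '255' after 312.18 seconds]\n"
    )
    print(ssh_failures(sample))  # prints [(255, 235.73), (255, 312.18)]
```

Comparing the elapsed times against the timestamps of any rcu stall
messages in the same log might show whether the two coincide.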

After these failures the system does continue to otherwise work
normally and subsequent tests pass.

I wonder if the slow emulation might be causing the networking to
glitch and break the ssh connection.

I'm at a bit of a loss as to where to go from here.

Cheers,

Richard


