Re: [Avocado-devel] acceptance-system-fedora and acceptance-system-centos failing sporadically with timeout?
Hi all,

I will try to give my two cents:

On Fri, Feb 05, 2021 at 03:31:40PM -0500, John Snow wrote:
> On 2/5/21 11:43 AM, Philippe Mathieu-Daudé wrote:
> > Cc'ing Avocado team & John (Python inferior exit delay?).
> >
> > On 1/28/21 11:10 AM, Thomas Huth wrote:
> > > On 28/01/2021 10.45, Claudio Fontana wrote:
> > > >
> > > > is it just me, or is the CI sometimes failing with timeout?
> > > >
> > > > Fedora:
> > > > https://gitlab.com/hw-claudio/qemu/-/jobs/986936506
> > >
> > > I've sent a patch for that issue just yesterday:
> > >
> > > https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg06852.html
> > >
> > > > CentOS:
> > > > https://gitlab.com/hw-claudio/qemu/-/jobs/980769080
> > >
> > > Never seen that one before - if you hit it again, could you please save
> > > the artifacts and have a look at the log file in there to see what's
> > > exactly the problem?
> >
> > https://gitlab.com/philmd/qemu/-/jobs/1008007125
> >
> > (28/36)
> > tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_ppc_mac99:
> > ERROR: Test reported status but did not finish (90.09 s)
> >
> > Attached debug.log.
> >
>
> ¯\_(ツ)_/¯
>
> I don't know what "reported status but did not finish" means.
>
> The debug log looks like it passes, too, so... I don't know that this has
> much do with code I maintain yet. I'm sure the Avocado team will find me if
> I am wrong :)

Afaict, this happens when the test process exceeds the deadline to finish.
Sometimes the test itself has finished but the "post test" stage is stuck
for some reason. Maybe setting 'runner.timeout.process_alive' to a higher
number could help here:

$ avocado config reference | grep process_alive -A 6
runner.timeout.process_alive
    The amount of time to wait after a test has reported status but the
    test process has not finished
    * Default: 60
    * Type:

But I might be wrong. I know that Cleber was working on this, so he could
probably help here too.

--
Beraldo
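For anyone who wants to try the suggestion above, here is a minimal sketch that renders a config fragment raising that timeout. It assumes that the 'runner.timeout.process_alive' namespace corresponds to a [runner.timeout] section with a process_alive key in an Avocado config file; that mapping, the helper name render_avocado_conf and the value 120 are illustrative assumptions, not something stated in this thread. Please check the result against 'avocado config reference' before relying on it.

```python
import configparser
import io


def render_avocado_conf(process_alive_seconds):
    """Render an INI fragment that sets the process_alive timeout.

    Assumption: the "runner.timeout.process_alive" namespace is stored as
    key "process_alive" in section "[runner.timeout]".
    """
    config = configparser.ConfigParser()
    config["runner.timeout"] = {"process_alive": str(process_alive_seconds)}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # 120 is an arbitrary example value, twice the default of 60 quoted above.
    print(render_avocado_conf(120), end="")
```

The fragment would then go into whichever Avocado config file the CI job actually reads; where that file lives in the QEMU CI containers is not covered in this thread.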
Re: acceptance-system-fedora and acceptance-system-centos failing sporadically with timeout?
On 2/5/21 11:43 AM, Philippe Mathieu-Daudé wrote:
> Cc'ing Avocado team & John (Python inferior exit delay?).
>
> On 1/28/21 11:10 AM, Thomas Huth wrote:
> > On 28/01/2021 10.45, Claudio Fontana wrote:
> > >
> > > is it just me, or is the CI sometimes failing with timeout?
> > >
> > > Fedora:
> > > https://gitlab.com/hw-claudio/qemu/-/jobs/986936506
> >
> > I've sent a patch for that issue just yesterday:
> >
> > https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg06852.html
> >
> > > CentOS:
> > > https://gitlab.com/hw-claudio/qemu/-/jobs/980769080
> >
> > Never seen that one before - if you hit it again, could you please save
> > the artifacts and have a look at the log file in there to see what's
> > exactly the problem?
>
> https://gitlab.com/philmd/qemu/-/jobs/1008007125
>
> (28/36)
> tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_ppc_mac99:
> ERROR: Test reported status but did not finish (90.09 s)
>
> Attached debug.log.
>

¯\_(ツ)_/¯

I don't know what "reported status but did not finish" means.

The debug log looks like it passes, too, so... I don't know that this has
much to do with code I maintain yet. I'm sure the Avocado team will find me
if I am wrong :)

--js
Re: acceptance-system-fedora and acceptance-system-centos failing sporadically with timeout?
On 2/5/21 5:49 PM, Thomas Huth wrote:
> On 05/02/2021 17.43, Philippe Mathieu-Daudé wrote:
>> Cc'ing Avocado team & John (Python inferior exit delay?).
>>
>> On 1/28/21 11:10 AM, Thomas Huth wrote:
>>> On 28/01/2021 10.45, Claudio Fontana wrote:
>>>>
>>>> is it just me, or is the CI sometimes failing with timeout?
>>>>
>>>> Fedora:
>>>> https://gitlab.com/hw-claudio/qemu/-/jobs/986936506
>>>
>>> I've sent a patch for that issue just yesterday:
>>>
>>> https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg06852.html
>>>
>>>> CentOS:
>>>> https://gitlab.com/hw-claudio/qemu/-/jobs/980769080
>>>
>>> Never seen that one before - if you hit it again, could you please save
>>> the artifacts and have a look at the log file in there to see what's
>>> exactly the problem?
>>
>> https://gitlab.com/philmd/qemu/-/jobs/1008007125
>>
>> (28/36)
>> tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_ppc_mac99:
>> ERROR: Test reported status but did not finish (90.09 s)
>>
>> Attached debug.log.
>
> That's again the failing test on the mac99 machine where I've already
> sent a patch for. I'm looking for a log of the failing or1k machine that
> Claudio has experienced in the CentOS pipeline.

Oh sorry, I should have started a new thread instead :/

There is still a problem: the job ends with "Test reported status but did
not finish", which makes the CI red, while the debug.log shows that the
test succeeded.

This is where I'd like feedback from the Avocado guys and John.
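One way a test can succeed in debug.log and still turn the job red is sketched below: the test reports its status, but its process does not exit within a grace period, so the runner flags it. This is only a simulation of that idea, based on the wording of the error message and of the 'runner.timeout.process_alive' description quoted elsewhere in this thread. It is not Avocado's implementation, and the CHILD script, the function name and the timings are all made up for illustration.

```python
import subprocess
import sys

# Hypothetical stand-in for a test process: it reports PASS, then lingers
# for one second in a slow "post test" stage before exiting.
CHILD = "import time; print('PASS', flush=True); time.sleep(1)"


def run_with_grace_period(grace_seconds):
    """Return (reported_status, final_result) for one simulated test."""
    with subprocess.Popen([sys.executable, "-c", CHILD],
                          stdout=subprocess.PIPE, text=True) as proc:
        # The test has reported its status at this point.
        status = proc.stdout.readline().strip()
        try:
            # Grace period for the process to exit after reporting.
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            # Still alive after the grace period: kill it and flag the run,
            # even though the reported status was PASS.
            proc.kill()
            return status, "ERROR: Test reported status but did not finish"
        return status, status


if __name__ == "__main__":
    print(run_with_grace_period(0.2))
    print(run_with_grace_period(5))
```

With the one-second linger in the child, a 0.2 s grace period gives the ERROR result while a 5 s grace period gives PASS, although the reported status is PASS in both runs.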
Re: acceptance-system-fedora and acceptance-system-centos failing sporadically with timeout?
On 05/02/2021 17.43, Philippe Mathieu-Daudé wrote:
> Cc'ing Avocado team & John (Python inferior exit delay?).
>
> On 1/28/21 11:10 AM, Thomas Huth wrote:
>> On 28/01/2021 10.45, Claudio Fontana wrote:
>>>
>>> is it just me, or is the CI sometimes failing with timeout?
>>>
>>> Fedora:
>>> https://gitlab.com/hw-claudio/qemu/-/jobs/986936506
>>
>> I've sent a patch for that issue just yesterday:
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg06852.html
>>
>>> CentOS:
>>> https://gitlab.com/hw-claudio/qemu/-/jobs/980769080
>>
>> Never seen that one before - if you hit it again, could you please save
>> the artifacts and have a look at the log file in there to see what's
>> exactly the problem?
>
> https://gitlab.com/philmd/qemu/-/jobs/1008007125
>
> (28/36)
> tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_ppc_mac99:
> ERROR: Test reported status but did not finish (90.09 s)
>
> Attached debug.log.

That's again the failing test on the mac99 machine, for which I've already
sent a patch. I'm looking for a log of the failing or1k machine that
Claudio has experienced in the CentOS pipeline.

 Thomas
Re: acceptance-system-fedora and acceptance-system-centos failing sporadically with timeout?
Cc'ing Avocado team & John (Python inferior exit delay?).

On 1/28/21 11:10 AM, Thomas Huth wrote:
> On 28/01/2021 10.45, Claudio Fontana wrote:
>>
>> is it just me, or is the CI sometimes failing with timeout?
>>
>> Fedora:
>> https://gitlab.com/hw-claudio/qemu/-/jobs/986936506
>
> I've sent a patch for that issue just yesterday:
>
> https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg06852.html
>
>> CentOS:
>> https://gitlab.com/hw-claudio/qemu/-/jobs/980769080
>
> Never seen that one before - if you hit it again, could you please save
> the artifacts and have a look at the log file in there to see what's
> exactly the problem?

https://gitlab.com/philmd/qemu/-/jobs/1008007125

(28/36)
tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_ppc_mac99:
ERROR: Test reported status but did not finish (90.09 s)

Attached debug.log.

14:44:28 DEBUG| PARAMS (key=arch, path=*, default=ppc) => 'ppc'
14:44:28 DEBUG| PARAMS (key=machine, path=*, default=mac99) => 'mac99'
14:44:28 DEBUG| PARAMS (key=qemu_bin, path=*, default=./qemu-system-ppc) => './qemu-system-ppc'
14:44:28 INFO | recording the execution...
14:44:28 DEBUG| VM launch command: './qemu-system-ppc -display none -vga none -chardev socket,id=mon,path=/var/tmp/avo_qemu_sock_ettyrtlx/qemu-725-monitor.sock -mon chardev=mon,mode=control -machine mac99 -chardev socket,id=console,path=/var/tmp/avo_qemu_sock_ettyrtlx/qemu-725-console.sock,server=on,wait=off -serial chardev:console -icount shift=7,rr=record,rrfile=/var/tmp/avocado_tovfx415/avocado_job_i58i8d2a/28-tests_acceptance_replay_kernel.py_ReplayKernelNormal.test_ppc_mac99/replay.bin -kernel /var/tmp/avocado_tovfx415/avocado_job_i58i8d2a/28-tests_acceptance_replay_kernel.py_ReplayKernelNormal.test_ppc_mac99/day15/invaders.elf -append -net none -no-reboot -M graphics=off'
14:44:28 DEBUG| >>> {'execute': 'qmp_capabilities'}
14:44:28 DEBUG| <<< {'return': {}}
14:44:29 DEBUG| >> =
14:44:29 DEBUG| >> OpenBIOS 1.1 [Jul 27 2020 08:14]
14:44:29 DEBUG| >> Configuration device id QEMU version 1 machine id 1
14:44:29 DEBUG| >> CPUs: 1
14:44:29 DEBUG| >> Memory: 128M
14:44:29 DEBUG| >> UUID: ----
14:44:29 DEBUG| >> CPU type PowerPC,G4
14:44:29 DEBUG| milliseconds isn't unique.
14:44:30 DEBUG| Welcome to OpenBIOS v1.1 built on Jul 27 2020 08:14
14:44:30 DEBUG| >> [ppc] Kernel already loaded (0x0100 + 0x004ed2e4) (initrd 0x + 0x)
14:44:30 DEBUG| >> [ppc] Kernel command line:
14:44:30 DEBUG| >> switching to new context:
14:44:30 DEBUG| OF stdout device is: /pci@f200/mac-io@c/escc@13000/ch-a@13020
14:44:30 DEBUG| Preparing to boot Linux version 4.11.3 (th...@thuth.remote.csb) (gcc version 6.4.0 (Buildroot 2018.05.2) ) #8 Mon Dec 10 12:05:13 CET 2018
14:44:30 DEBUG| Detected machine type: 0400
14:44:30 DEBUG| command line:
14:44:31 DEBUG| memory layout at init:
14:44:31 DEBUG| memory_limit : (16 MB aligned)
14:44:31 DEBUG| alloc_bottom : 014f2000
14:44:31 DEBUG| alloc_top: 0800
14:44:31 DEBUG| alloc_top_hi : 0800
14:44:31 DEBUG| rmo_top : 0800
14:44:31 DEBUG| ram_top : 0800
14:44:31 DEBUG| copying OF device tree...
14:44:31 DEBUG| Building dt strings...
14:44:32 DEBUG| Building dt structure...
14:44:35 DEBUG| Device tree strings 0x014f3000 -> 0x014f34f5
14:44:35 DEBUG| Device tree struct 0x014f4000 -> 0x014f6000
14:44:35 DEBUG| Quiescing Open Firmware ...
14:44:36 DEBUG| Booting Linux via __start() @ 0x0100 ...
14:44:37 DEBUG| Hello World !
14:44:37 DEBUG| Total memory = 128MB; using 256kB for hash table (at c7fc)
14:44:37 DEBUG| Linux version 4.11.3 (th...@thuth.remote.csb) (gcc version 6.4.0 (Buildroot 2018.05.2) ) #8 Mon Dec 10 12:05:13 CET 2018
14:44:37 DEBUG| Found UniNorth memory controller & host bridge @ 0xf800 revision: 0x07
14:44:37 DEBUG| Mapped at 0xff7c
14:44:37 DEBUG| Found a Keylargo mac-io controller, rev: 0, mapped at 0xff74
14:44:37 DEBUG| PowerMac motherboard: PowerMac G4 AGP Graphics
14:44:37 DEBUG| boot stdout isn't a display !
14:44:37 DEBUG| Using PowerMac machine description
14:44:37 DEBUG| bootconsole [udbg0] enabled
14:44:37 DEBUG| -
14:44:37 DEBUG| Hash_size = 0x4
14:44:37 DEBUG| phys_mem_size = 0x800
14:44:37 DEBUG| dcache_bsize = 0x20
14:44:37 DEBUG| icache_bsize = 0x20
14:44:37 DEBUG| cpu_features = 0x0020047a
14:44:37 DEBUG| possible= 0x05a6fd7f
14:44:37 DEBUG| always = 0x
14:44:37 DEBUG| cpu_user_features = 0x9c01 0x
14:44:37 DEBUG| mmu_features = 0x0001
14:44:37 DEBUG| Hash = 0xc7fc
14:44:37 DEBUG| Hash_mask = 0xfff
14:44:37 DEBUG| -
14:44:37 DEBUG| Found UniNorth PCI host bridge at 0xf200. Firmware bus number: 0->0
14:44:37 DEBUG| PCI host bridge /pci@f2000
Re: acceptance-system-fedora and acceptance-system-centos failing sporadically with timeout?
On 1/28/21 11:10 AM, Thomas Huth wrote:
> On 28/01/2021 10.45, Claudio Fontana wrote:
>> Hi,
>>
>> is it just me, or is the CI sometimes failing with timeout?
>>
>> One nice feature that cirrus and travis have is the ability to relaunch one
>> specific test,
>> do you know if there is some way to do it in gitlab too?
>>
>> I could not find it..
>>
>> Fedora:
>> https://gitlab.com/hw-claudio/qemu/-/jobs/986936506
>
> I've sent a patch for that issue just yesterday:
>
> https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg06852.html
>
>> CentOS:
>> https://gitlab.com/hw-claudio/qemu/-/jobs/980769080
>
> Never seen that one before - if you hit it again, could you please save the
> artifacts and have a look at the log file in there to see what's exactly the
> problem?
>
> Thanks
>  Thomas
>

Hello Thomas,

will do!

Ciao,

Claudio
Re: acceptance-system-fedora and acceptance-system-centos failing sporadically with timeout?
On 28/01/2021 10.45, Claudio Fontana wrote:
> Hi,
>
> is it just me, or is the CI sometimes failing with timeout?
>
> One nice feature that cirrus and travis have is the ability to relaunch one
> specific test,
> do you know if there is some way to do it in gitlab too?
>
> I could not find it..
>
> Fedora:
> https://gitlab.com/hw-claudio/qemu/-/jobs/986936506

I've sent a patch for that issue just yesterday:

https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg06852.html

> CentOS:
> https://gitlab.com/hw-claudio/qemu/-/jobs/980769080

Never seen that one before - if you hit it again, could you please save the
artifacts and have a look at the log file in there to see what's exactly the
problem?

 Thanks
  Thomas
Re: acceptance-system-fedora and acceptance-system-centos failing sporadically with timeout?
On 1/28/21 10:50 AM, Paolo Bonzini wrote:
> On 28/01/21 10:45, Claudio Fontana wrote:
>> Hi,
>>
>> is it just me, or is the CI sometimes failing with timeout?
>>
>> One nice feature that cirrus and travis have is the ability to relaunch one
>> specific test,
>> do you know if there is some way to do it in gitlab too?
>>
>> I could not find it..
>>
>> Fedora:
>> https://gitlab.com/hw-claudio/qemu/-/jobs/986936506
>>
>> CentOS:
>> https://gitlab.com/hw-claudio/qemu/-/jobs/980769080
>>
>>
>
> There's a retry button in the top right corner.
>
> Paolo
>

Doh! I was not logged in properly, so I didn't see it.

Thanks!

Ciao,

Claudio
acceptance-system-fedora and acceptance-system-centos failing sporadically with timeout?
Hi,

is it just me, or is the CI sometimes failing with timeout?

One nice feature that Cirrus and Travis have is the ability to relaunch one
specific test. Do you know if there is some way to do that in GitLab too?

I could not find it.

Fedora:
https://gitlab.com/hw-claudio/qemu/-/jobs/986936506

CentOS:
https://gitlab.com/hw-claudio/qemu/-/jobs/980769080

--
Claudio Fontana
Engineering Manager Virtualization, SUSE Labs Core

SUSE Software Solutions Italy Srl
Re: acceptance-system-fedora and acceptance-system-centos failing sporadically with timeout?
On 28/01/21 10:45, Claudio Fontana wrote:
> Hi,
>
> is it just me, or is the CI sometimes failing with timeout?
>
> One nice feature that cirrus and travis have is the ability to relaunch one
> specific test,
> do you know if there is some way to do it in gitlab too?
>
> I could not find it..
>
> Fedora:
> https://gitlab.com/hw-claudio/qemu/-/jobs/986936506
>
> CentOS:
> https://gitlab.com/hw-claudio/qemu/-/jobs/980769080

There's a retry button in the top right corner.

Paolo