Re: [Avocado-devel] acceptance-system-fedora and acceptance-system-centos failing sporadically with timeout?

2021-02-05 Thread Beraldo Leal
Hi all,

I will try to give my two cents:

On Fri, Feb 05, 2021 at 03:31:40PM -0500, John Snow wrote:
> On 2/5/21 11:43 AM, Philippe Mathieu-Daudé wrote:
> > Cc'ing Avocado team & John (Python inferior exit delay?).
> > 
> > On 1/28/21 11:10 AM, Thomas Huth wrote:
> > > On 28/01/2021 10.45, Claudio Fontana wrote:
> > > > 
> > > > is it just me, or is the CI sometimes failing with timeout?
> > > > 
> > > > Fedora:
> > > > https://gitlab.com/hw-claudio/qemu/-/jobs/986936506
> > > 
> > > I've sent a patch for that issue just yesterday:
> > > 
> > >   https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg06852.html
> > > 
> > > > CentOS:
> > > > https://gitlab.com/hw-claudio/qemu/-/jobs/980769080
> > > 
> > > Never seen that one before - if you hit it again, could you please save
> > > the artifacts and have a look at the log file in there to see what's
> > > exactly the problem?
> > 
> > https://gitlab.com/philmd/qemu/-/jobs/1008007125
> > 
> >   (28/36)
> > tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_ppc_mac99:
> > ERROR: Test reported status but did not finish (90.09 s)
> > 
> > Attached debug.log.
> > 
> 
> ¯\_(ツ)_/¯
> 
> I don't know what "reported status but did not finish" means.
> 
> The debug log looks like it passes, too, so... I don't know that this has
> much do with code I maintain yet. I'm sure the Avocado team will find me if
> I am wrong :)

AFAICT, this happens when the test process exceeds the deadline to finish.
Sometimes the test itself has finished, but the "post test" stage is stuck
for some reason.

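To make that failure mode concrete, here is a toy shell model of the behaviour described above. It is not Avocado's code. The status file, the `toy_test` name and the 2-second grace period are made up for illustration (Avocado's real default, shown in the reference output below, is 60 s). The "test" reports PASS right away but its process stays alive, so the waiting side gives up and reports an error even though the test itself passed.

```shell
#!/bin/sh
# Toy model only: a "test" that reports its status, then lingers.
set -u

PROCESS_ALIVE=2                 # grace period in seconds (Avocado default: 60)
STATUS_FILE=$(mktemp)           # where the toy test writes its status

# The toy "test": report PASS, then hang like a stuck post-test stage.
# `exec` keeps the same PID for the sleep, so killing $PID really stops it.
sh -c 'echo PASS > "$1"; exec sleep 5' toy_test "$STATUS_FILE" &
PID=$!

# Wait until the test has reported a status.
while [ ! -s "$STATUS_FILE" ]; do
    sleep 1
done

# Give the process PROCESS_ALIVE seconds to exit on its own.
waited=0
while kill -0 "$PID" 2>/dev/null && [ "$waited" -lt "$PROCESS_ALIVE" ]; do
    sleep 1
    waited=$((waited + 1))
done

if kill -0 "$PID" 2>/dev/null; then
    # Status said PASS, but the process is still around: report an error.
    RESULT="ERROR: Test reported status but did not finish"
    kill "$PID" 2>/dev/null || true
    wait "$PID" 2>/dev/null || true
else
    RESULT=$(cat "$STATUS_FILE")
fi
rm -f "$STATUS_FILE"
echo "$RESULT"
```

Raising PROCESS_ALIVE comfortably above the time the lingering process needs (here, above about 5 s) turns the result back into PASS, which is the idea behind the suggestion below.
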
Maybe setting 'runner.timeout.process_alive' to a higher number could help
here:


$ avocado config reference | grep process_alive -A 6

runner.timeout.process_alive

The amount of time to wait after a test has reported status but the
test process has not finished

* Default: 60
* Type: <class 'int'>

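One way to try this is a small config file passed through Avocado's global `--config` option. This is a sketch under assumptions: that the dotted option name maps to an INI section `[runner.timeout]` with key `process_alive`, that the runner used by QEMU's acceptance jobs actually honours this option, and that 120 s is enough. The value and the temporary path are arbitrary.

```shell
#!/bin/sh
set -eu

PROCESS_ALIVE=120   # arbitrary example value; the default is 60

CONF_DIR=$(mktemp -d)
CONF="$CONF_DIR/avocado.conf"

# Assumed mapping: option "runner.timeout.process_alive" ->
# section [runner.timeout], key process_alive.
cat > "$CONF" <<EOF
[runner.timeout]
process_alive = $PROCESS_ALIVE
EOF

cat "$CONF"

# If Avocado is installed, check that the value is picked up; the commented
# line shows how the test from the failing job could be rerun with it.
if command -v avocado >/dev/null 2>&1; then
    avocado --config "$CONF" config | grep process_alive || true
    # avocado --config "$CONF" run tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_ppc_mac99
fi
```
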

But I might be wrong. I know that Cleber was working on this, so he
could probably help here too.

--
Beraldo




Re: acceptance-system-fedora and acceptance-system-centos failing sporadically with timeout?

2021-02-05 Thread John Snow

On 2/5/21 11:43 AM, Philippe Mathieu-Daudé wrote:
> Cc'ing Avocado team & John (Python inferior exit delay?).
>
> On 1/28/21 11:10 AM, Thomas Huth wrote:
>> On 28/01/2021 10.45, Claudio Fontana wrote:
>>>
>>> is it just me, or is the CI sometimes failing with timeout?
>>>
>>> Fedora:
>>> https://gitlab.com/hw-claudio/qemu/-/jobs/986936506
>>
>> I've sent a patch for that issue just yesterday:
>>
>>   https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg06852.html
>>
>>> CentOS:
>>> https://gitlab.com/hw-claudio/qemu/-/jobs/980769080
>>
>> Never seen that one before - if you hit it again, could you please save
>> the artifacts and have a look at the log file in there to see what's
>> exactly the problem?
>
> https://gitlab.com/philmd/qemu/-/jobs/1008007125
>
>   (28/36)
> tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_ppc_mac99:
> ERROR: Test reported status but did not finish (90.09 s)
>
> Attached debug.log.
>

¯\_(ツ)_/¯

I don't know what "reported status but did not finish" means.

The debug log looks like it passes, too, so... I don't know that this
has much to do with code I maintain yet. I'm sure the Avocado team will
find me if I am wrong :)


--js




Re: acceptance-system-fedora and acceptance-system-centos failing sporadically with timeout?

2021-02-05 Thread Philippe Mathieu-Daudé
On 2/5/21 5:49 PM, Thomas Huth wrote:
> On 05/02/2021 17.43, Philippe Mathieu-Daudé wrote:
>> Cc'ing Avocado team & John (Python inferior exit delay?).
>>
>> On 1/28/21 11:10 AM, Thomas Huth wrote:
>>> On 28/01/2021 10.45, Claudio Fontana wrote:
>>>>
>>>> is it just me, or is the CI sometimes failing with timeout?
>>>>
>>>> Fedora:
>>>> https://gitlab.com/hw-claudio/qemu/-/jobs/986936506
>>>
>>> I've sent a patch for that issue just yesterday:
>>>
>>>   https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg06852.html
>>>
>>>> CentOS:
>>>> https://gitlab.com/hw-claudio/qemu/-/jobs/980769080
>>>
>>> Never seen that one before - if you hit it again, could you please save
>>> the artifacts and have a look at the log file in there to see what's
>>> exactly the problem?
>>
>> https://gitlab.com/philmd/qemu/-/jobs/1008007125
>>
>>   (28/36)
>> tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_ppc_mac99:
>> ERROR: Test reported status but did not finish (90.09 s)
>>
>> Attached debug.log.
> 
> That's again the failing test on the mac99 machine where I've already
> sent a patch for. I'm looking for a log of the failing or1k machine that
> Claudio has experienced in the CentOS pipeline.

Oh, sorry, I should have started a new thread instead :/

There is still a problem: "Test reported status but did not finish"
makes the CI red, while the debug.log shows the test succeeded. This
is where I'd like feedback from the Avocado folks and John.




Re: acceptance-system-fedora and acceptance-system-centos failing sporadically with timeout?

2021-02-05 Thread Thomas Huth

On 05/02/2021 17.43, Philippe Mathieu-Daudé wrote:
> Cc'ing Avocado team & John (Python inferior exit delay?).
>
> On 1/28/21 11:10 AM, Thomas Huth wrote:
>> On 28/01/2021 10.45, Claudio Fontana wrote:
>>>
>>> is it just me, or is the CI sometimes failing with timeout?
>>>
>>> Fedora:
>>> https://gitlab.com/hw-claudio/qemu/-/jobs/986936506
>>
>> I've sent a patch for that issue just yesterday:
>>
>>   https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg06852.html
>>
>>> CentOS:
>>> https://gitlab.com/hw-claudio/qemu/-/jobs/980769080
>>
>> Never seen that one before - if you hit it again, could you please save
>> the artifacts and have a look at the log file in there to see what's
>> exactly the problem?
>
> https://gitlab.com/philmd/qemu/-/jobs/1008007125
>
>   (28/36)
> tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_ppc_mac99:
> ERROR: Test reported status but did not finish (90.09 s)
>
> Attached debug.log.

That's again the failing test on the mac99 machine, for which I've already
sent a patch. I'm looking for a log of the failing or1k machine that Claudio
ran into in the CentOS pipeline.


 Thomas




Re: acceptance-system-fedora and acceptance-system-centos failing sporadically with timeout?

2021-02-05 Thread Philippe Mathieu-Daudé
Cc'ing Avocado team & John (Python inferior exit delay?).

On 1/28/21 11:10 AM, Thomas Huth wrote:
> On 28/01/2021 10.45, Claudio Fontana wrote:
>>
>> is it just me, or is the CI sometimes failing with timeout?
>>
>> Fedora:
>> https://gitlab.com/hw-claudio/qemu/-/jobs/986936506
> 
> I've sent a patch for that issue just yesterday:
> 
>  https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg06852.html
> 
>> CentOS:
>> https://gitlab.com/hw-claudio/qemu/-/jobs/980769080
> 
> Never seen that one before - if you hit it again, could you please save
> the artifacts and have a look at the log file in there to see what's
> exactly the problem?

https://gitlab.com/philmd/qemu/-/jobs/1008007125

 (28/36)
tests/acceptance/replay_kernel.py:ReplayKernelNormal.test_ppc_mac99:
ERROR: Test reported status but did not finish (90.09 s)

Attached debug.log.
14:44:28 DEBUG| PARAMS (key=arch, path=*, default=ppc) => 'ppc'
14:44:28 DEBUG| PARAMS (key=machine, path=*, default=mac99) => 'mac99'
14:44:28 DEBUG| PARAMS (key=qemu_bin, path=*, default=./qemu-system-ppc) => './qemu-system-ppc'
14:44:28 INFO | recording the execution...
14:44:28 DEBUG| VM launch command: './qemu-system-ppc -display none -vga none -chardev socket,id=mon,path=/var/tmp/avo_qemu_sock_ettyrtlx/qemu-725-monitor.sock -mon chardev=mon,mode=control -machine mac99 -chardev socket,id=console,path=/var/tmp/avo_qemu_sock_ettyrtlx/qemu-725-console.sock,server=on,wait=off -serial chardev:console -icount shift=7,rr=record,rrfile=/var/tmp/avocado_tovfx415/avocado_job_i58i8d2a/28-tests_acceptance_replay_kernel.py_ReplayKernelNormal.test_ppc_mac99/replay.bin -kernel /var/tmp/avocado_tovfx415/avocado_job_i58i8d2a/28-tests_acceptance_replay_kernel.py_ReplayKernelNormal.test_ppc_mac99/day15/invaders.elf -append  -net none -no-reboot -M graphics=off'
14:44:28 DEBUG| >>> {'execute': 'qmp_capabilities'}
14:44:28 DEBUG| <<< {'return': {}}
14:44:29 DEBUG| >> =
14:44:29 DEBUG| >> OpenBIOS 1.1 [Jul 27 2020 08:14]
14:44:29 DEBUG| >> Configuration device id QEMU version 1 machine id 1
14:44:29 DEBUG| >> CPUs: 1
14:44:29 DEBUG| >> Memory: 128M
14:44:29 DEBUG| >> UUID: ----
14:44:29 DEBUG| >> CPU type PowerPC,G4
14:44:29 DEBUG| milliseconds isn't unique.
14:44:30 DEBUG| Welcome to OpenBIOS v1.1 built on Jul 27 2020 08:14
14:44:30 DEBUG| >> [ppc] Kernel already loaded (0x0100 + 0x004ed2e4) (initrd 0x + 0x)
14:44:30 DEBUG| >> [ppc] Kernel command line:
14:44:30 DEBUG| >> switching to new context:
14:44:30 DEBUG| OF stdout device is: /pci@f200/mac-io@c/escc@13000/ch-a@13020
14:44:30 DEBUG| Preparing to boot Linux version 4.11.3 (th...@thuth.remote.csb) (gcc version 6.4.0 (Buildroot 2018.05.2) ) #8 Mon Dec 10 12:05:13 CET 2018
14:44:30 DEBUG| Detected machine type: 0400
14:44:30 DEBUG| command line:
14:44:31 DEBUG| memory layout at init:
14:44:31 DEBUG| memory_limit :  (16 MB aligned)
14:44:31 DEBUG| alloc_bottom : 014f2000
14:44:31 DEBUG| alloc_top: 0800
14:44:31 DEBUG| alloc_top_hi : 0800
14:44:31 DEBUG| rmo_top  : 0800
14:44:31 DEBUG| ram_top  : 0800
14:44:31 DEBUG| copying OF device tree...
14:44:31 DEBUG| Building dt strings...
14:44:32 DEBUG| Building dt structure...
14:44:35 DEBUG| Device tree strings 0x014f3000 -> 0x014f34f5
14:44:35 DEBUG| Device tree struct  0x014f4000 -> 0x014f6000
14:44:35 DEBUG| Quiescing Open Firmware ...
14:44:36 DEBUG| Booting Linux via __start() @ 0x0100 ...
14:44:37 DEBUG| Hello World !
14:44:37 DEBUG| Total memory = 128MB; using 256kB for hash table (at c7fc)
14:44:37 DEBUG| Linux version 4.11.3 (th...@thuth.remote.csb) (gcc version 6.4.0 (Buildroot 2018.05.2) ) #8 Mon Dec 10 12:05:13 CET 2018
14:44:37 DEBUG| Found UniNorth memory controller & host bridge @ 0xf800 revision: 0x07
14:44:37 DEBUG| Mapped at 0xff7c
14:44:37 DEBUG| Found a Keylargo mac-io controller, rev: 0, mapped at 0xff74
14:44:37 DEBUG| PowerMac motherboard: PowerMac G4 AGP Graphics
14:44:37 DEBUG| boot stdout isn't a display !
14:44:37 DEBUG| Using PowerMac machine description
14:44:37 DEBUG| bootconsole [udbg0] enabled
14:44:37 DEBUG| -
14:44:37 DEBUG| Hash_size = 0x4
14:44:37 DEBUG| phys_mem_size = 0x800
14:44:37 DEBUG| dcache_bsize  = 0x20
14:44:37 DEBUG| icache_bsize  = 0x20
14:44:37 DEBUG| cpu_features  = 0x0020047a
14:44:37 DEBUG| possible= 0x05a6fd7f
14:44:37 DEBUG| always  = 0x
14:44:37 DEBUG| cpu_user_features = 0x9c01 0x
14:44:37 DEBUG| mmu_features  = 0x0001
14:44:37 DEBUG| Hash  = 0xc7fc
14:44:37 DEBUG| Hash_mask = 0xfff
14:44:37 DEBUG| -
14:44:37 DEBUG| Found UniNorth PCI host bridge at 0xf200. Firmware bus number: 0->0
14:44:37 DEBUG| PCI host bridge /pci@f2000

Re: acceptance-system-fedora and acceptance-system-centos failing sporadically with timeout?

2021-01-28 Thread Claudio Fontana
On 1/28/21 11:10 AM, Thomas Huth wrote:
> On 28/01/2021 10.45, Claudio Fontana wrote:
>> Hi,
>>
>> is it just me, or is the CI sometimes failing with timeout?
>>
>> One nice feature that cirrus and travis have is the ability to relaunch one 
>> specific test,
>> do you know if there is some way to do it in gitlab too?
>>
>> I could not find it..
>>
>> Fedora:
>> https://gitlab.com/hw-claudio/qemu/-/jobs/986936506
> 
> I've sent a patch for that issue just yesterday:
> 
>   https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg06852.html
> 
>> CentOS:
>> https://gitlab.com/hw-claudio/qemu/-/jobs/980769080
> 
> Never seen that one before - if you hit it again, could you please save the 
> artifacts and have a look at the log file in there to see what's exactly the 
> problem?
> 
>   Thanks
>Thomas
> 

Hello Thomas,

will do!

Ciao,

Claudio



Re: acceptance-system-fedora and acceptance-system-centos failing sporadically with timeout?

2021-01-28 Thread Thomas Huth

On 28/01/2021 10.45, Claudio Fontana wrote:
> Hi,
>
> is it just me, or is the CI sometimes failing with timeout?
>
> One nice feature that cirrus and travis have is the ability to relaunch one
> specific test,
> do you know if there is some way to do it in gitlab too?
>
> I could not find it..
>
> Fedora:
> https://gitlab.com/hw-claudio/qemu/-/jobs/986936506

I've sent a patch for that issue just yesterday:

  https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg06852.html

> CentOS:
> https://gitlab.com/hw-claudio/qemu/-/jobs/980769080


Never seen that one before - if you hit it again, could you please save the 
artifacts and have a look at the log file in there to see what's exactly the 
problem?


 Thanks
  Thomas
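
Job artifacts can also be fetched from the command line through GitLab's REST API (`GET /projects/:id/jobs/:job_id/artifacts`). A minimal sketch, assuming the project is public or a personal access token is exported as `GITLAB_TOKEN`, that the artifacts have not expired, and that the job saved Avocado's results directory as artifacts. `FETCH` is a hypothetical opt-in switch for this sketch, so by default it only prints the URL. The job id is the CentOS one from this thread.

```shell
#!/bin/sh
set -eu

PROJECT="hw-claudio/qemu"   # namespace/project from the job URL
JOB_ID=980769080            # the CentOS job from this thread

# The API accepts a URL-encoded "namespace/project" path as the project id.
PROJECT_ENC=$(printf '%s' "$PROJECT" | sed 's|/|%2F|g')
URL="https://gitlab.com/api/v4/projects/$PROJECT_ENC/jobs/$JOB_ID/artifacts"
echo "$URL"

# FETCH is a hypothetical switch: set FETCH=1 to actually download.
if [ "${FETCH:-0}" = 1 ]; then
    if [ -n "${GITLAB_TOKEN:-}" ]; then
        curl --fail --location --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
             --output artifacts.zip "$URL"
    else
        curl --fail --location --output artifacts.zip "$URL"
    fi
    unzip -o -q artifacts.zip -d artifacts
    # Avocado keeps a debug.log per test; list them to find the failing one.
    find artifacts -name debug.log
fi
```
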




Re: acceptance-system-fedora and acceptance-system-centos failing sporadically with timeout?

2021-01-28 Thread Claudio Fontana
On 1/28/21 10:50 AM, Paolo Bonzini wrote:
> On 28/01/21 10:45, Claudio Fontana wrote:
>> Hi,
>>
>> is it just me, or is the CI sometimes failing with timeout?
>>
>> One nice feature that cirrus and travis have is the ability to relaunch one 
>> specific test,
>> do you know if there is some way to do it in gitlab too?
>>
>> I could not find it..
>>
>> Fedora:
>> https://gitlab.com/hw-claudio/qemu/-/jobs/986936506
>>
>> CentOS:
>> https://gitlab.com/hw-claudio/qemu/-/jobs/980769080
>>
>>
> 
> There's a retry button in the top right corner.
> 
> Paolo
> 

Doh! I was not logged in properly, so I didn't see it.

Thanks!

Ciao,

Claudio




acceptance-system-fedora and acceptance-system-centos failing sporadically with timeout?

2021-01-28 Thread Claudio Fontana
Hi,

is it just me, or is the CI sometimes failing with timeout?

One nice feature that Cirrus and Travis have is the ability to relaunch one
specific test. Do you know if there is some way to do it in GitLab too?

I could not find it.

Fedora:
https://gitlab.com/hw-claudio/qemu/-/jobs/986936506

CentOS:
https://gitlab.com/hw-claudio/qemu/-/jobs/980769080


-- 
Claudio Fontana
Engineering Manager Virtualization, SUSE Labs Core

SUSE Software Solutions Italy Srl



Re: acceptance-system-fedora and acceptance-system-centos failing sporadically with timeout?

2021-01-28 Thread Paolo Bonzini

On 28/01/21 10:45, Claudio Fontana wrote:
> Hi,
>
> is it just me, or is the CI sometimes failing with timeout?
>
> One nice feature that cirrus and travis have is the ability to relaunch one
> specific test,
> do you know if there is some way to do it in gitlab too?
>
> I could not find it..
>
> Fedora:
> https://gitlab.com/hw-claudio/qemu/-/jobs/986936506
>
> CentOS:
> https://gitlab.com/hw-claudio/qemu/-/jobs/980769080
>
>

There's a retry button in the top right corner.

Paolo
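
Besides the button, a job can be retried from the command line with GitLab's `POST /projects/:id/jobs/:job_id/retry` endpoint. Note that this reruns the whole CI job, not a single Avocado test. A minimal sketch, assuming a personal access token with API scope is exported as `GITLAB_TOKEN`; without it the script only prints the URL it would use. The job id is the Fedora one from this thread.

```shell
#!/bin/sh
set -eu

PROJECT="hw-claudio/qemu"   # namespace/project from the job URL
JOB_ID=986936506            # the Fedora job from this thread

# The API accepts a URL-encoded "namespace/project" path as the project id.
PROJECT_ENC=$(printf '%s' "$PROJECT" | sed 's|/|%2F|g')
RETRY_URL="https://gitlab.com/api/v4/projects/$PROJECT_ENC/jobs/$JOB_ID/retry"

if [ -n "${GITLAB_TOKEN:-}" ]; then
    # GitLab answers with a JSON description of the retried job.
    curl --fail --request POST --header "PRIVATE-TOKEN: $GITLAB_TOKEN" "$RETRY_URL"
else
    echo "GITLAB_TOKEN not set; would POST to $RETRY_URL"
fi
```
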