Re: arm64 panic: reaper-related?

2020-07-14 Thread Glen Barber
On Tue, Jul 14, 2020 at 08:12:41PM +0100, Andrew Turner wrote:
> How reproducible is this? The backtrace and panic messages don’t
> line up, but that may be related to __stack_chk_fail being in the
> trace. This is called when a stack overflow is detected.
> 
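
For context, the check that ends in __stack_chk_fail works roughly like this: the compiler places a guard value (a "canary") next to a function's local buffers on entry and checks it again before returning, and a mismatch means something wrote past a buffer. The Python sketch below only simulates that idea with a byte array. The frame layout, BUF_SIZE, GUARD, and copy_into_frame are all made up for illustration and are not the FreeBSD implementation.

```python
# Simulation only: a "frame" is BUF_SIZE bytes of local buffer
# followed by a guard value. All names and values here are hypothetical.
BUF_SIZE = 8
GUARD = bytes.fromhex("deadbeef00ff0a0d")  # real guards are randomized


def copy_into_frame(data: bytes) -> bool:
    """Copy data into the simulated frame; return True if the guard survived."""
    # "Prologue": lay out the buffer and place the guard right after it.
    frame = bytearray(BUF_SIZE) + bytearray(GUARD)
    # Unchecked copy: it ignores BUF_SIZE and may run into the guard.
    n = min(len(data), len(frame))
    frame[:n] = data[:n]
    # "Epilogue": compare the guard; a real kernel would call
    # __stack_chk_fail (and panic) on a mismatch.
    return bytes(frame[BUF_SIZE:]) == GUARD


if __name__ == "__main__":
    print(copy_into_frame(b"abc"))      # fits in the buffer
    print(copy_into_frame(b"A" * 12))   # runs 4 bytes into the guard
```

The first call leaves the guard intact and the second does not, which is the condition a stack protector reports.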

I do not yet know how reproducible it is, as this is the first time
I have observed this particular panic.

> I added more diagnostics to the kernel in r363191. Is it possible
> to try upgrading the kernel to that?
> 

Yes, I should be able to upgrade the system at some point this week.
Thank you for taking a look.

Glen





Re: arm64 panic: reaper-related?

2020-07-14 Thread Andrew Turner

> On 13 Jul 2020, at 15:05, Glen Barber wrote:
> 
> On Mon, Jul 13, 2020 at 01:58:21PM +, Glen Barber wrote:
>> Hi,
>> 
>> This morning, one of our arm64 build machines panicked.  It looks like
>> it is somehow reaper-related, but I am not entirely sure.  Backtrace
>> follows.  Any thoughts?  I'm not quite sure where to go from here...
>> Thanks in advance for any input.
>> 
>> db> set $lines 0
>> db> bt
>> Tracing pid 11 tid 13 td 0xfd0001634000
>> db_trace_self() at db_stack_trace+0xf8
>> pc = 0x0075fdac  lr = 0x00103e78
>> sp = 0x00011eca89b0  fp = 0x00011eca89e0
>> 
>> db_stack_trace() at db_command+0x228
>> pc = 0x00103e78  lr = 0x00103af0
>> sp = 0x00011eca89f0  fp = 0x00011eca8ad0
>> 
>> db_command() at db_command_loop+0x58
>> pc = 0x00103af0  lr = 0x00103898
>> sp = 0x00011eca8ae0  fp = 0x00011eca8b00
>> 
>> db_command_loop() at db_trap+0xf4
>> pc = 0x00103898  lr = 0x00106c0c
>> sp = 0x00011eca8b10  fp = 0x00011eca8d30
>> 
>> db_trap() at kdb
>> pc = 0x00106c0c  lr = 0x00463b0c
>> sp = 0x00011eca8d40  fp = 0x00011eca8df0
>> 
>> kdb_trap() at do_el1h_sync+0xf4
>> pc = 0x00463b0c  lr = 0x0077b448
>> sp = 0x00011eca8e00  fp = 0x00011eca8e30
>> 
>> do_el1h_sync() at handle_el1h_sync+0x78
>> pc = 0x0077b448  lr = 0x00762878
>> sp = 0x00011eca8e40  fp = 0x00011eca8f50
>> 
>> handle_el1h_sync() at kdb_enter+0x34
>> pc = 0x00762878  lr = 0x00463168
>> sp = 0x00011eca8f60  fp = 0x00011eca8ff0
>> 
>> kdb_enter() at vpanic+0x1b0
>> pc = 0x00463168  lr = 0x00417a74
>> sp = 0x00011eca9000  fp = 0x00011eca90b0
>> 
>> vpanic() at panic+0x44
>> pc = 0x00417a74  lr = 0x004178c0
>> sp = 0x00011eca90c0  fp = 0x00011eca9140
>> 
>> panic() at __stack_chk_fail+0x10
>> pc = 0x004178c0  lr = 0x0044ab6c
>> sp = 0x00011eca9150  fp = 0x00011eca9150
>> 
>> __stack_chk_fail() at putchar+0x2bc
>> pc = 0x0044ab6c  lr = 0x00469ce8
>> sp = 0x00011eca9160  fp = 0x00011eca91e0
>> 
>> putchar() at 0x106
>> pc = 0x00469ce8  lr = 0x0106
>> sp = 0x00011eca91f0  fp = 0x
>> 
>> db> show proc 11
>> Process 11 (idle) at 0xfd000163:
>> state: NORMAL
>> uid: 0  gids: 0
>> parent: pid 0 at 0x010fae40
>> ABI: null
>> reaper: 0x010fae40 reapsubtree: 11
>> sigparent: 20
>> vmspace: 0x01109200
>>   (map 0x01109200)
>>   (map.pmap 0x011092c0)
>>   (pmap 0x01109320)
>> threads: 48
>> 13   Run CPU -1  [idle: cpu0]
>> 14   Run CPU 1   [idle: cpu1]
>> 15   Run CPU 2   [idle: cpu2]
>> 16   Run CPU 3   [idle: cpu3]
>> 17   Run CPU 4   [idle: cpu4]
>> 18   Run CPU 5   [idle: cpu5]
>> 19   Run CPU 6   [idle: cpu6]
>> 100010   Run CPU 7   [idle: cpu7]
>> 100011   Run CPU 8   [idle: cpu8]
>> 100012   CanRun  [idle: cpu9]
>> 100013   Run CPU 10  [idle: cpu10]
>> 100014   Run CPU 11  [idle: cpu11]
>> 100015   Run CPU 12  [idle: cpu12]
>> 100016   Run CPU 13  [idle: cpu13]
>> 100017   Run CPU 14  [idle: cpu14]
>> 100018   Run CPU 15  [idle: cpu15]
>> 100019   Run CPU 16  [idle: cpu16]
>> 100020   Run CPU 17  [idle: cpu17]
>> 100021   Run CPU 18  [idle: cpu18]
>> 100022   Run CPU 19  [idle: cpu19]
>> 100023   Run CPU 20  [idle: cpu20]
>> 100024   Run CPU 21  [idle: cpu21]
>> 100025   Run CPU 22  [idle: cpu22]
>> 100026   Run CPU 23  [idle: cpu23]
>> 100027   Run CPU 24  [idle: cpu24]
>> 100028   Run CPU 25  [idle: cpu25]
>> 100029   Run CPU 26  [idle: cpu26]
>> 100030

Re: arm64 panic: reaper-related?

2020-07-13 Thread Glen Barber
On Mon, Jul 13, 2020 at 01:58:21PM +, Glen Barber wrote:
> Hi,
> 
> This morning, one of our arm64 build machines panicked.  It looks like
> it is somehow reaper-related, but I am not entirely sure.  Backtrace
> follows.  Any thoughts?  I'm not quite sure where to go from here...
> Thanks in advance for any input.
> 
> db> set $lines 0
> db> bt
> Tracing pid 11 tid 13 td 0xfd0001634000
> db_trace_self() at db_stack_trace+0xf8
>  pc = 0x0075fdac  lr = 0x00103e78
>  sp = 0x00011eca89b0  fp = 0x00011eca89e0
> 
> db_stack_trace() at db_command+0x228
>  pc = 0x00103e78  lr = 0x00103af0
>  sp = 0x00011eca89f0  fp = 0x00011eca8ad0
> 
> db_command() at db_command_loop+0x58
>  pc = 0x00103af0  lr = 0x00103898
>  sp = 0x00011eca8ae0  fp = 0x00011eca8b00
> 
> db_command_loop() at db_trap+0xf4
>  pc = 0x00103898  lr = 0x00106c0c
>  sp = 0x00011eca8b10  fp = 0x00011eca8d30
> 
> db_trap() at kdb
>  pc = 0x00106c0c  lr = 0x00463b0c
>  sp = 0x00011eca8d40  fp = 0x00011eca8df0
> 
> kdb_trap() at do_el1h_sync+0xf4
>  pc = 0x00463b0c  lr = 0x0077b448
>  sp = 0x00011eca8e00  fp = 0x00011eca8e30
> 
> do_el1h_sync() at handle_el1h_sync+0x78
>  pc = 0x0077b448  lr = 0x00762878
>  sp = 0x00011eca8e40  fp = 0x00011eca8f50
> 
> handle_el1h_sync() at kdb_enter+0x34
>  pc = 0x00762878  lr = 0x00463168
>  sp = 0x00011eca8f60  fp = 0x00011eca8ff0
> 
> kdb_enter() at vpanic+0x1b0
>  pc = 0x00463168  lr = 0x00417a74
>  sp = 0x00011eca9000  fp = 0x00011eca90b0
> 
> vpanic() at panic+0x44
>  pc = 0x00417a74  lr = 0x004178c0
>  sp = 0x00011eca90c0  fp = 0x00011eca9140
> 
> panic() at __stack_chk_fail+0x10
>  pc = 0x004178c0  lr = 0x0044ab6c
>  sp = 0x00011eca9150  fp = 0x00011eca9150
> 
> __stack_chk_fail() at putchar+0x2bc
>  pc = 0x0044ab6c  lr = 0x00469ce8
>  sp = 0x00011eca9160  fp = 0x00011eca91e0
> 
> putchar() at 0x106
>  pc = 0x00469ce8  lr = 0x0106
>  sp = 0x00011eca91f0  fp = 0x
> 
> db> show proc 11
> Process 11 (idle) at 0xfd000163:
>  state: NORMAL
>  uid: 0  gids: 0
>  parent: pid 0 at 0x010fae40
>  ABI: null
>  reaper: 0x010fae40 reapsubtree: 11
>  sigparent: 20
>  vmspace: 0x01109200
>    (map 0x01109200)
>    (map.pmap 0x011092c0)
>    (pmap 0x01109320)
>  threads: 48
> 13   Run CPU -1  [idle: cpu0]
> 14   Run CPU 1   [idle: cpu1]
> 15   Run CPU 2   [idle: cpu2]
> 16   Run CPU 3   [idle: cpu3]
> 17   Run CPU 4   [idle: cpu4]
> 18   Run CPU 5   [idle: cpu5]
> 19   Run CPU 6   [idle: cpu6]
> 100010   Run CPU 7   [idle: cpu7]
> 100011   Run CPU 8   [idle: cpu8]
> 100012   CanRun  [idle: cpu9]
> 100013   Run CPU 10  [idle: cpu10]
> 100014   Run CPU 11  [idle: cpu11]
> 100015   Run CPU 12  [idle: cpu12]
> 100016   Run CPU 13  [idle: cpu13]
> 100017   Run CPU 14  [idle: cpu14]
> 100018   Run CPU 15  [idle: cpu15]
> 100019   Run CPU 16  [idle: cpu16]
> 100020   Run CPU 17  [idle: cpu17]
> 100021   Run CPU 18  [idle: cpu18]
> 100022   Run CPU 19  [idle: cpu19]
> 100023   Run CPU 20  [idle: cpu20]
> 100024   Run CPU 21  [idle: cpu21]
> 100025   Run CPU 22  [idle: cpu22]
> 100026   Run CPU 23  [idle: cpu23]
> 100027   Run CPU 24  [idle: cpu24]
> 100028   Run CPU 25  [idle: cpu25]
> 100029   Run CPU 26  [idle: cpu26]
> 100030   CanRun  [idle: cpu27]
> 100031   Run CPU 28  

arm64 panic: reaper-related?

2020-07-13 Thread Glen Barber
Hi,

This morning, one of our arm64 build machines panicked.  It looks like
it is somehow reaper-related, but I am not entirely sure.  Backtrace
follows.  Any thoughts?  I'm not quite sure where to go from here...
Thanks in advance for any input.

db> set $lines 0
db> bt
Tracing pid 11 tid 13 td 0xfd0001634000
db_trace_self() at db_stack_trace+0xf8
 pc = 0x0075fdac  lr = 0x00103e78
 sp = 0x00011eca89b0  fp = 0x00011eca89e0

db_stack_trace() at db_command+0x228
 pc = 0x00103e78  lr = 0x00103af0
 sp = 0x00011eca89f0  fp = 0x00011eca8ad0

db_command() at db_command_loop+0x58
 pc = 0x00103af0  lr = 0x00103898
 sp = 0x00011eca8ae0  fp = 0x00011eca8b00

db_command_loop() at db_trap+0xf4
 pc = 0x00103898  lr = 0x00106c0c
 sp = 0x00011eca8b10  fp = 0x00011eca8d30

db_trap() at kdb
 pc = 0x00106c0c  lr = 0x00463b0c
 sp = 0x00011eca8d40  fp = 0x00011eca8df0

kdb_trap() at do_el1h_sync+0xf4
 pc = 0x00463b0c  lr = 0x0077b448
 sp = 0x00011eca8e00  fp = 0x00011eca8e30

do_el1h_sync() at handle_el1h_sync+0x78
 pc = 0x0077b448  lr = 0x00762878
 sp = 0x00011eca8e40  fp = 0x00011eca8f50

handle_el1h_sync() at kdb_enter+0x34
 pc = 0x00762878  lr = 0x00463168
 sp = 0x00011eca8f60  fp = 0x00011eca8ff0

kdb_enter() at vpanic+0x1b0
 pc = 0x00463168  lr = 0x00417a74
 sp = 0x00011eca9000  fp = 0x00011eca90b0

vpanic() at panic+0x44
 pc = 0x00417a74  lr = 0x004178c0
 sp = 0x00011eca90c0  fp = 0x00011eca9140

panic() at __stack_chk_fail+0x10
 pc = 0x004178c0  lr = 0x0044ab6c
 sp = 0x00011eca9150  fp = 0x00011eca9150

__stack_chk_fail() at putchar+0x2bc
 pc = 0x0044ab6c  lr = 0x00469ce8
 sp = 0x00011eca9160  fp = 0x00011eca91e0

putchar() at 0x106
 pc = 0x00469ce8  lr = 0x0106
 sp = 0x00011eca91f0  fp = 0x

db> show proc 11
Process 11 (idle) at 0xfd000163:
 state: NORMAL
 uid: 0  gids: 0
 parent: pid 0 at 0x010fae40
 ABI: null
 reaper: 0x010fae40 reapsubtree: 11
 sigparent: 20
 vmspace: 0x01109200
   (map 0x01109200)
   (map.pmap 0x011092c0)
   (pmap 0x01109320)
 threads: 48
13   Run CPU -1  [idle: cpu0]
14   Run CPU 1   [idle: cpu1]
15   Run CPU 2   [idle: cpu2]
16   Run CPU 3   [idle: cpu3]
17   Run CPU 4   [idle: cpu4]
18   Run CPU 5   [idle: cpu5]
19   Run CPU 6   [idle: cpu6]
100010   Run CPU 7   [idle: cpu7]
100011   Run CPU 8   [idle: cpu8]
100012   CanRun  [idle: cpu9]
100013   Run CPU 10  [idle: cpu10]
100014   Run CPU 11  [idle: cpu11]
100015   Run CPU 12  [idle: cpu12]
100016   Run CPU 13  [idle: cpu13]
100017   Run CPU 14  [idle: cpu14]
100018   Run CPU 15  [idle: cpu15]
100019   Run CPU 16  [idle: cpu16]
100020   Run CPU 17  [idle: cpu17]
100021   Run CPU 18  [idle: cpu18]
100022   Run CPU 19  [idle: cpu19]
100023   Run CPU 20  [idle: cpu20]
100024   Run CPU 21  [idle: cpu21]
100025   Run CPU 22  [idle: cpu22]
100026   Run CPU 23  [idle: cpu23]
100027   Run CPU 24  [idle: cpu24]
100028   Run CPU 25  [idle: cpu25]
100029   Run CPU 26  [idle: cpu26]
100030   CanRun  [idle: cpu27]
100031   Run CPU 28  [idle: cpu28]
100032   Run CPU 29  [idle: cpu29]
100033   Run CPU 30  [idle: cpu30]
100034   Run CPU 31  [idle: cpu31]
100035
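
To compare traces across panics, it can help to reduce the `bt` output to a list of frames. This is a rough Python sketch, assuming each frame starts with a line of the form `func() at caller+0xoffset` as in the output above, optionally behind `>` quote markers. parse_frames and FRAME_RE are names invented here, and the script is not part of ddb.

```python
import re

# Matches the first line of a frame, e.g. "panic() at __stack_chk_fail+0x10".
# Leading ">" quote markers and whitespace are skipped.
FRAME_RE = re.compile(r"^[>\s]*(?P<func>\w+)\(\) at (?P<ret>\S+)")


def parse_frames(text):
    """Return a list of (function, caller, offset) tuples from ddb bt output."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if not m:
            continue  # register lines, prompts, and blank lines
        # "caller+0xoff" splits at "+"; a bare address has no offset.
        caller, _, offset = m.group("ret").partition("+")
        frames.append((m.group("func"), caller, offset or None))
    return frames


SAMPLE = """\
vpanic() at panic+0x44
 pc = 0x00417a74  lr = 0x004178c0
 sp = 0x00011eca90c0  fp = 0x00011eca9140

panic() at __stack_chk_fail+0x10
 pc = 0x004178c0  lr = 0x0044ab6c

__stack_chk_fail() at putchar+0x2bc
 pc = 0x0044ab6c  lr = 0x00469ce8

putchar() at 0x106
"""

if __name__ == "__main__":
    for func, caller, offset in parse_frames(SAMPLE):
        print(func, "<-", caller, offset)
```

Each tuple pairs a frame's function with the symbol and offset printed after "at". A frame that ends in a bare address, such as the final putchar() line, gets an offset of None.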