Re: [BISECTED] power8: watchdog: CPU 3 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x154/0x2d0

2021-12-29 Thread Stijn Tintel


On 25/12/2021 12:31, Nicholas Piggin wrote:

Excerpts from Stijn Tintel's message of December 22, 2021 11:20 am:

Hi,

After upgrading my Power8 server from 5.10 LTS to 5.15 LTS, I started
experiencing CPU hard lockups, usually rather quickly after boot:


watchdog: CPU 3 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x154/0x2d0
watchdog: CPU 3 TB:265651929071, last heartbeat TB:259344820187 (12318ms
ago)

snip

Bisecting lead to the following commit:

deb9b13eb2571fbde164ae012c77985fd14f2f02 is the first bad commit
commit deb9b13eb2571fbde164ae012c77985fd14f2f02
Author: Davidlohr Bueso
Date:   Mon Mar 8 17:59:50 2021 -0800

    powerpc/qspinlock: Use generic smp_cond_load_relaxed

Thanks for bisecting and reporting this.

Thanks for your response, much appreciated.

As far as I can see, the code should be functionally identical,
the difference is slightly in loop structure and priority nops
but that shouldn't cause complete lock ups.

I suspect possibly something is getting miscompiled. What distro
do you use, what gcc version? And would you be able to send the
output of objdump --disassemble=queued_spin_lock_slowpath vmlinux
for your bad kernel?


Gentoo hardened musl, both gcc 10.3.0 and 11.2.0 exhibit the lockups.

/boot/disable/vmlinuz-5.12.0-rc3-ppc64le-00024-gdeb9b13eb257: file 
format elf64-powerpcle



Disassembly of section .head.text:

Disassembly of section .text:

c010d0d4 :
c010d0d4:   e9 00 4c 3c addis   r2,r12,233
c010d0d8:   2c f3 42 38 addi    r2,r2,-3284
c010d0dc:   00 01 04 28 cmplwi  r4,256
c010d0e0:   2c 00 82 40 bne c010d10c 


c010d0e4:   01 02 20 39 li  r9,513
c010d0e8:   a6 03 29 7d mtctr   r9
c010d0ec:   02 00 83 e8 lwa r4,0(r3)
c010d0f0:   00 01 04 2c cmpwi   r4,256
c010d0f4:   14 00 82 40 bne c010d108 

c010d0f8:   10 00 40 42 bdz c010d108 


c010d0fc:   78 0b 21 7c mr  r1,r1
c010d100:   78 13 42 7c mr  r2,r2
c010d104:   e8 ff ff 4b b   c010d0ec 


c010d108:   20 00 84 78 clrldi  r4,r4,32
c010d10c:   2e 00 84 54 rlwinm  r4,r4,0,0,23
c010d110:   00 00 04 2c cmpwi   r4,0
c010d114:   38 00 82 40 bne c010d14c 


c010d118:   00 01 40 39 li  r10,256
c010d11c:   28 18 20 7d lwarx   r9,0,r3
c010d120:   78 4b 48 7d or  r8,r10,r9
c010d124:   2d 19 00 7d stwcx.  r8,0,r3
c010d128:   f4 ff c2 40 bne-    c010d11c 


c010d12c:   2c 01 00 4c isync
c010d130:   2e 00 28 55 rlwinm  r8,r9,0,0,23
c010d134:   20 00 2a 79 clrldi  r10,r9,32
c010d138:   00 00 08 2c cmpwi   r8,0
c010d13c:   e4 00 82 41 beq c010d220 


c010d140:   00 ff 29 71 andi.   r9,r9,65280
c010d144:   08 00 82 40 bne c010d14c 


c010d148:   01 00 23 99 stb r9,1(r3)
c010d14c:   28 00 2d e9 ld  r9,40(r13)
c010d150:   cf ff 42 3d addis   r10,r2,-49
c010d154:   01 00 00 39 li  r8,1
c010d158:   80 15 4a 39 addi    r10,r10,5504
c010d15c:   78 53 46 7d mr  r6,r10
c010d160:   14 4a c6 7c add r6,r6,r9
c010d164:   0e 00 26 e9 lwa r9,12(r6)
c010d168:   01 00 e9 38 addi    r7,r9,1
c010d16c:   03 00 09 2c cmpwi   r9,3
c010d170:   0c 00 e6 90 stw r7,12(r6)
c010d174:   00 00 ed a0 lhz r7,0(r13)
c010d178:   e4 00 81 41 bgt c010d25c 


c010d17c:   e4 26 20 79 rldicr  r0,r9,4,59
c010d180:   14 02 66 7d add r11,r6,r0
c010d184:   00 00 00 39 li  r8,0
c010d188:   08 00 0b 91 stw r8,8(r11)
c010d18c:   00 00 00 39 li  r8,0
c010d190:   2a 01 06 7d stdx    r8,r6,r0
c010d194:   00 00 03 81 lwz r8,0(r3)
c010d198:   b5 07 08 7d extsw.  r8,r8
c010d19c:   04 01 82 41 beq c010d2a0 


c010d1a0:   01 00 e7 38 addi    r7,r7,1
c010d1a4:   1e 80 29 55 rlwinm  r9,r9,16,0,15
c010d1a8:   f8 ff e1 fb std r31,-8(r1)
c010d1ac:   1a 90 e7 54 rlwinm  r7,r7,18,0,13
c010d1b0:   78 3b 27 7d or  r7,r9,r7
c010d1b4:   ac 04 20 7c lwsync
c010d1b8:   00 00 80 38 li  r4,0
c010d1bc:   02 00 03 39 addi    r8,r3,2
c010d1c0:   f8 1e 0c 55 rlwinm  r12,r8,3,27,28
c010d1c4:   3e 84 e5 54 rlwinm  r5,r7,16,16,31

Re: [BISECTED] power8: watchdog: CPU 3 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x154/0x2d0

2021-12-25 Thread Nicholas Piggin
Excerpts from Stijn Tintel's message of December 22, 2021 11:20 am:
> Hi,
> 
> After upgrading my Power8 server from 5.10 LTS to 5.15 LTS, I started
> experiencing CPU hard lockups, usually rather quickly after boot:
> 
> 
> watchdog: CPU 3 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x154/0x2d0
> watchdog: CPU 3 TB:265651929071, last heartbeat TB:259344820187 (12318ms
> ago)
> watchdog: CPU 4 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x22c/0x2d0
> watchdog: CPU 4 TB:265651929059, last heartbeat TB:259344820045 (12318ms
> ago)
> watchdog: CPU 5 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 5 TB:265651929037, last heartbeat TB:259349940303 (12308ms
> ago)
> watchdog: CPU 6 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x144/0x2d0
> watchdog: CPU 6 TB:265651929056, last heartbeat TB:259349940294 (12308ms
> ago)
> watchdog: CPU 12 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x280/0x2d0
> watchdog: CPU 12 TB:242479050267, last heartbeat TB:236822174350
> (11048ms ago)
> watchdog: CPU 26 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x22c/0x2d0
> watchdog: CPU 26 TB:265657049348, last heartbeat TB:259355060595
> (12308ms ago)
> watchdog: CPU 40 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 40 TB:265657049289, last heartbeat TB:259360180427
> (12298ms ago)
> watchdog: CPU 47 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x21c/0x2d0
> watchdog: CPU 47 TB:265657049213, last heartbeat TB:259365300321
> (12288ms ago)
> watchdog: CPU 60 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 60 TB:265651929348, last heartbeat TB:259370420527
> (12268ms ago)
> watchdog: CPU 72 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 72 TB:265718488733, last heartbeat TB:259375540545
> (12388ms ago)
> watchdog: CPU 13 detected hard LOCKUP on other CPUs 0-2,7,10,44
> watchdog: CPU 13 TB:267541867921, last SMP heartbeat TB:259380660378
> (15939ms ago)
> watchdog: CPU 34 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 34 TB:269913954376, last heartbeat TB:263456144470
> (12612ms ago)
> watchdog: CPU 41 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 41 TB:267865972392, last heartbeat TB:261408162383
> (12612ms ago)
> watchdog: CPU 74 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 74 TB:267766470637, last heartbeat TB:261423522630
> (12388ms ago)
> watchdog: CPU 8 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 8 TB:274978264599, last heartbeat TB:269237436681 (11212ms
> ago)
> watchdog: CPU 9 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 9 TB:268029810836, last heartbeat TB:261397922093 (12952ms
> ago)
> watchdog: CPU 11 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 11 TB:279685725759, last heartbeat TB:273685814104
> (11718ms ago)
> watchdog: CPU 16 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 16 TB:267865972449, last heartbeat TB:261397922458
> (12632ms ago)
> watchdog: CPU 18 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 18 TB:269913954314, last heartbeat TB:263445904285
> (12632ms ago)
> watchdog: CPU 24 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 24 TB:267865972338, last heartbeat TB:261403042311
> (12622ms ago)
> watchdog: CPU 31 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x22c/0x2d0
> watchdog: CPU 31 TB:268029811095, last heartbeat TB:261403042673
> (12942ms ago)
> watchdog: CPU 32 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 32 TB:267865972528, last heartbeat TB:261403042589
> (12622ms ago)
> watchdog: CPU 33 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 33 TB:268029811013, last heartbeat TB:261408162474
> (12932ms ago)
> watchdog: CPU 35 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 35 TB:280174344471, last heartbeat TB:273696054625
> (12652ms ago)
> watchdog: CPU 37 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 37 TB:269913954356, last heartbeat TB:263456144501
> (12612ms ago)
> watchdog: CPU 38 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x228/0x2d0
> watchdog: CPU 38 TB:290393774681, last heartbeat TB:283946212510
> (12592ms ago)
> 
> Bisecting lead to the following commit:
> 
> deb9b13eb2571fbde164ae012c77985fd14f2f02 is the first bad commit
> commit deb9b13eb2571fbde164ae012c77985fd14f2f02
> Author: Davidlohr Bueso 
> Date:   Mon Mar 8 17:59:50 2021 -0800
> 
>    powerpc/qspinlock: Use generic smp_cond_load_relaxed

Thanks for bisecting and reporting this.

As far as I can see, the code should be 

[BISECTED] power8: watchdog: CPU 3 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x154/0x2d0

2021-12-22 Thread Stijn Tintel
Hi,

After upgrading my Power8 server from 5.10 LTS to 5.15 LTS, I started
experiencing CPU hard lockups, usually rather quickly after boot:


watchdog: CPU 3 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x154/0x2d0
watchdog: CPU 3 TB:265651929071, last heartbeat TB:259344820187 (12318ms
ago)
watchdog: CPU 4 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x22c/0x2d0
watchdog: CPU 4 TB:265651929059, last heartbeat TB:259344820045 (12318ms
ago)
watchdog: CPU 5 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 5 TB:265651929037, last heartbeat TB:259349940303 (12308ms
ago)
watchdog: CPU 6 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x144/0x2d0
watchdog: CPU 6 TB:265651929056, last heartbeat TB:259349940294 (12308ms
ago)
watchdog: CPU 12 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x280/0x2d0
watchdog: CPU 12 TB:242479050267, last heartbeat TB:236822174350
(11048ms ago)
watchdog: CPU 26 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x22c/0x2d0
watchdog: CPU 26 TB:265657049348, last heartbeat TB:259355060595
(12308ms ago)
watchdog: CPU 40 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 40 TB:265657049289, last heartbeat TB:259360180427
(12298ms ago)
watchdog: CPU 47 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x21c/0x2d0
watchdog: CPU 47 TB:265657049213, last heartbeat TB:259365300321
(12288ms ago)
watchdog: CPU 60 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 60 TB:265651929348, last heartbeat TB:259370420527
(12268ms ago)
watchdog: CPU 72 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 72 TB:265718488733, last heartbeat TB:259375540545
(12388ms ago)
watchdog: CPU 13 detected hard LOCKUP on other CPUs 0-2,7,10,44
watchdog: CPU 13 TB:267541867921, last SMP heartbeat TB:259380660378
(15939ms ago)
watchdog: CPU 34 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 34 TB:269913954376, last heartbeat TB:263456144470
(12612ms ago)
watchdog: CPU 41 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 41 TB:267865972392, last heartbeat TB:261408162383
(12612ms ago)
watchdog: CPU 74 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 74 TB:267766470637, last heartbeat TB:261423522630
(12388ms ago)
watchdog: CPU 8 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 8 TB:274978264599, last heartbeat TB:269237436681 (11212ms
ago)
watchdog: CPU 9 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 9 TB:268029810836, last heartbeat TB:261397922093 (12952ms
ago)
watchdog: CPU 11 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 11 TB:279685725759, last heartbeat TB:273685814104
(11718ms ago)
watchdog: CPU 16 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 16 TB:267865972449, last heartbeat TB:261397922458
(12632ms ago)
watchdog: CPU 18 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 18 TB:269913954314, last heartbeat TB:263445904285
(12632ms ago)
watchdog: CPU 24 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 24 TB:267865972338, last heartbeat TB:261403042311
(12622ms ago)
watchdog: CPU 31 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x22c/0x2d0
watchdog: CPU 31 TB:268029811095, last heartbeat TB:261403042673
(12942ms ago)
watchdog: CPU 32 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 32 TB:267865972528, last heartbeat TB:261403042589
(12622ms ago)
watchdog: CPU 33 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 33 TB:268029811013, last heartbeat TB:261408162474
(12932ms ago)
watchdog: CPU 35 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 35 TB:280174344471, last heartbeat TB:273696054625
(12652ms ago)
watchdog: CPU 37 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x230/0x2d0
watchdog: CPU 37 TB:269913954356, last heartbeat TB:263456144501
(12612ms ago)
watchdog: CPU 38 self-detected hard LOCKUP @
queued_spin_lock_slowpath+0x228/0x2d0
watchdog: CPU 38 TB:290393774681, last heartbeat TB:283946212510
(12592ms ago)

Bisecting lead to the following commit:

deb9b13eb2571fbde164ae012c77985fd14f2f02 is the first bad commit
commit deb9b13eb2571fbde164ae012c77985fd14f2f02
Author: Davidlohr Bueso 
Date:   Mon Mar 8 17:59:50 2021 -0800

   powerpc/qspinlock: Use generic smp_cond_load_relaxed
   

The problem persists in 2f47a9a4dfa3674fad19a49b40c5103a9a8e1589 and
goes away if I revert deb9b13eb2571fbde164ae012c77985fd14f2f02 on top of
that. As deb9b13eb2571fbde164ae012c77985fd14f2f02 seems to be a revert
of 49a7d46a06c30c7beabbf9d1a8ea1de0f9e4fdfe, I suspect this problem
might have existed before 49a7d46a06c30c7beabbf9d1a8ea1de0f9e4fdfe. I
therefore tried to build