Re: [BISECTED] power8: watchdog: CPU 3 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x154/0x2d0
On 25/12/2021 12:31, Nicholas Piggin wrote: Excerpts from Stijn Tintel's message of December 22, 2021 11:20 am: Hi, After upgrading my Power8 server from 5.10 LTS to 5.15 LTS, I started experiencing CPU hard lockups, usually rather quickly after boot: watchdog: CPU 3 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x154/0x2d0 watchdog: CPU 3 TB:265651929071, last heartbeat TB:259344820187 (12318ms ago) snip Bisecting lead to the following commit: deb9b13eb2571fbde164ae012c77985fd14f2f02 is the first bad commit commit deb9b13eb2571fbde164ae012c77985fd14f2f02 Author: Davidlohr Bueso Date: Mon Mar 8 17:59:50 2021 -0800 powerpc/qspinlock: Use generic smp_cond_load_relaxed Thanks for bisecting and reporting this. Thanks for your response, much appreciated. As far as I can see, the code should be functionally identical, the difference is slightly in loop structure and priority nops but that shouldn't cause complete lock ups. I suspect possibly something is getting miscompiled. What distro do you use, what gcc version? And would you be able to send the output of objdump --disassemble=queued_spin_lock_slowpath vmlinux for your bad kernel? Gentoo hardened musl, both gcc 10.3.0 and 11.2.0 exhibit the lockups. /boot/disable/vmlinuz-5.12.0-rc3-ppc64le-00024-gdeb9b13eb257: file format elf64-powerpcle Disassembly of section .head.text: Disassembly of section .text: c010d0d4 : c010d0d4: e9 00 4c 3c addis r2,r12,233 c010d0d8: 2c f3 42 38 addi r2,r2,-3284 c010d0dc: 00 01 04 28 cmplwi r4,256 c010d0e0: 2c 00 82 40 bne c010d10c c010d0e4: 01 02 20 39 li r9,513 c010d0e8: a6 03 29 7d mtctr r9 c010d0ec: 02 00 83 e8 lwa r4,0(r3) c010d0f0: 00 01 04 2c cmpwi r4,256 c010d0f4: 14 00 82 40 bne c010d108 c010d0f8: 10 00 40 42 bdz c010d108 c010d0fc: 78 0b 21 7c mr r1,r1 c010d100: 78 13 42 7c mr r2,r2 c010d104: e8 ff ff 4b b c010d0ec c010d108: 20 00 84 78 clrldi r4,r4,32 c010d10c: 2e 00 84 54 rlwinm r4,r4,0,0,23 c010d110: 00 00 04 2c cmpwi r4,0 c010d114: 38 00 82 40 bne c010d14c c010d118: 00 01 40 39 li r10,256 c010d11c: 28 18 20 7d lwarx r9,0,r3 c010d120: 78 4b 48 7d or r8,r10,r9 c010d124: 2d 19 00 7d stwcx. r8,0,r3 c010d128: f4 ff c2 40 bne- c010d11c c010d12c: 2c 01 00 4c isync c010d130: 2e 00 28 55 rlwinm r8,r9,0,0,23 c010d134: 20 00 2a 79 clrldi r10,r9,32 c010d138: 00 00 08 2c cmpwi r8,0 c010d13c: e4 00 82 41 beq c010d220 c010d140: 00 ff 29 71 andi. r9,r9,65280 c010d144: 08 00 82 40 bne c010d14c c010d148: 01 00 23 99 stb r9,1(r3) c010d14c: 28 00 2d e9 ld r9,40(r13) c010d150: cf ff 42 3d addis r10,r2,-49 c010d154: 01 00 00 39 li r8,1 c010d158: 80 15 4a 39 addi r10,r10,5504 c010d15c: 78 53 46 7d mr r6,r10 c010d160: 14 4a c6 7c add r6,r6,r9 c010d164: 0e 00 26 e9 lwa r9,12(r6) c010d168: 01 00 e9 38 addi r7,r9,1 c010d16c: 03 00 09 2c cmpwi r9,3 c010d170: 0c 00 e6 90 stw r7,12(r6) c010d174: 00 00 ed a0 lhz r7,0(r13) c010d178: e4 00 81 41 bgt c010d25c c010d17c: e4 26 20 79 rldicr r0,r9,4,59 c010d180: 14 02 66 7d add r11,r6,r0 c010d184: 00 00 00 39 li r8,0 c010d188: 08 00 0b 91 stw r8,8(r11) c010d18c: 00 00 00 39 li r8,0 c010d190: 2a 01 06 7d stdx r8,r6,r0 c010d194: 00 00 03 81 lwz r8,0(r3) c010d198: b5 07 08 7d extsw. r8,r8 c010d19c: 04 01 82 41 beq c010d2a0 c010d1a0: 01 00 e7 38 addi r7,r7,1 c010d1a4: 1e 80 29 55 rlwinm r9,r9,16,0,15 c010d1a8: f8 ff e1 fb std r31,-8(r1) c010d1ac: 1a 90 e7 54 rlwinm r7,r7,18,0,13 c010d1b0: 78 3b 27 7d or r7,r9,r7 c010d1b4: ac 04 20 7c lwsync c010d1b8: 00 00 80 38 li r4,0 c010d1bc: 02 00 03 39 addi r8,r3,2 c010d1c0: f8 1e 0c 55 rlwinm r12,r8,3,27,28 c010d1c4: 3e 84 e5 54 rlwinm r5,r7,16,16,31 c
[BISECTED] power8: watchdog: CPU 3 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x154/0x2d0
Hi, After upgrading my Power8 server from 5.10 LTS to 5.15 LTS, I started experiencing CPU hard lockups, usually rather quickly after boot: watchdog: CPU 3 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x154/0x2d0 watchdog: CPU 3 TB:265651929071, last heartbeat TB:259344820187 (12318ms ago) watchdog: CPU 4 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x22c/0x2d0 watchdog: CPU 4 TB:265651929059, last heartbeat TB:259344820045 (12318ms ago) watchdog: CPU 5 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0 watchdog: CPU 5 TB:265651929037, last heartbeat TB:259349940303 (12308ms ago) watchdog: CPU 6 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x144/0x2d0 watchdog: CPU 6 TB:265651929056, last heartbeat TB:259349940294 (12308ms ago) watchdog: CPU 12 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x280/0x2d0 watchdog: CPU 12 TB:242479050267, last heartbeat TB:236822174350 (11048ms ago) watchdog: CPU 26 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x22c/0x2d0 watchdog: CPU 26 TB:265657049348, last heartbeat TB:259355060595 (12308ms ago) watchdog: CPU 40 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0 watchdog: CPU 40 TB:265657049289, last heartbeat TB:259360180427 (12298ms ago) watchdog: CPU 47 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x21c/0x2d0 watchdog: CPU 47 TB:265657049213, last heartbeat TB:259365300321 (12288ms ago) watchdog: CPU 60 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0 watchdog: CPU 60 TB:265651929348, last heartbeat TB:259370420527 (12268ms ago) watchdog: CPU 72 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0 watchdog: CPU 72 TB:265718488733, last heartbeat TB:259375540545 (12388ms ago) watchdog: CPU 13 detected hard LOCKUP on other CPUs 0-2,7,10,44 watchdog: CPU 13 TB:267541867921, last SMP heartbeat TB:259380660378 (15939ms ago) watchdog: CPU 34 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0 watchdog: CPU 34 TB:269913954376, last heartbeat TB:263456144470 (12612ms ago) watchdog: CPU 41 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0 watchdog: CPU 41 TB:267865972392, last heartbeat TB:261408162383 (12612ms ago) watchdog: CPU 74 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0 watchdog: CPU 74 TB:267766470637, last heartbeat TB:261423522630 (12388ms ago) watchdog: CPU 8 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0 watchdog: CPU 8 TB:274978264599, last heartbeat TB:269237436681 (11212ms ago) watchdog: CPU 9 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0 watchdog: CPU 9 TB:268029810836, last heartbeat TB:261397922093 (12952ms ago) watchdog: CPU 11 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0 watchdog: CPU 11 TB:279685725759, last heartbeat TB:273685814104 (11718ms ago) watchdog: CPU 16 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0 watchdog: CPU 16 TB:267865972449, last heartbeat TB:261397922458 (12632ms ago) watchdog: CPU 18 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0 watchdog: CPU 18 TB:269913954314, last heartbeat TB:263445904285 (12632ms ago) watchdog: CPU 24 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0 watchdog: CPU 24 TB:267865972338, last heartbeat TB:261403042311 (12622ms ago) watchdog: CPU 31 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x22c/0x2d0 watchdog: CPU 31 TB:268029811095, last heartbeat TB:261403042673 (12942ms ago) watchdog: CPU 32 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0 watchdog: CPU 32 TB:267865972528, last heartbeat TB:261403042589 (12622ms ago) watchdog: CPU 33 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0 watchdog: CPU 33 TB:268029811013, last heartbeat TB:261408162474 (12932ms ago) watchdog: CPU 35 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0 watchdog: CPU 35 TB:280174344471, last heartbeat TB:273696054625 (12652ms ago) watchdog: CPU 37 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x230/0x2d0 watchdog: CPU 37 TB:269913954356, last heartbeat TB:263456144501 (12612ms ago) watchdog: CPU 38 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x228/0x2d0 watchdog: CPU 38 TB:290393774681, last heartbeat TB:283946212510 (12592ms ago) Bisecting lead to the following commit: deb9b13eb2571fbde164ae012c77985fd14f2f02 is the first bad commit commit deb9b13eb2571fbde164ae012c77985fd14f2f02 Author: Davidlohr Bueso Date: Mon Mar 8 17:59:50 2021 -0800 powerpc/qspinlock: Use generic smp_cond_load_relaxed The problem persists in 2f47a9a4dfa3674fad19a49b40c5103a9a8e1589 and goes away if I revert deb9b13eb2571fbde164ae012c77985fd14f2f02 on top of that. As deb9b13eb2571fbde164ae012c77985fd14f2f02 seems to be a revert of 49a7d46a06c30c7beabbf9d1a8ea1de0f9e4fdfe, I suspect this problem might have existed before 49a7d46a06c30c7beabbf9d1a8ea1de0f9e4fdfe. I therefore tried to build 49a7d46a06c30c7