Re: ppc64le: `NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!` when turning off SMT
On Tue, Feb 08, 2022 at 02:17:03PM +0100, Frederic Weisbecker wrote:
> On Tue, Feb 08, 2022 at 08:32:37AM +0100, Paul Menzel wrote:
> > once warned about a NOHZ tick-stop error, when I executed `sudo
> > /usr/sbin/ppc64_cpu --smt=off` (so that KVM would work).
>
> I see, so I assume this sets some CPUs offline, right?

ppc64_cpu --smt=off sets all but the first CPU per core offline.

PC
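To illustrate what "all but the first CPU per core" means in terms of logical CPU numbers, here is a minimal sketch. It assumes that hardware threads of a core are numbered contiguously and that every core has the same number of threads; the helper name `cpus_left_online` and the core and thread counts are hypothetical, not taken from the reported machine.

```python
def cpus_left_online(num_cores, threads_per_core):
    """Return the logical CPU numbers expected to stay online after SMT is off.

    Assumption: the threads of core N are numbered
    N * threads_per_core .. N * threads_per_core + threads_per_core - 1,
    so the first thread of each core is N * threads_per_core.
    """
    return [core * threads_per_core for core in range(num_cores)]


if __name__ == "__main__":
    # Hypothetical example: 4 cores with 8 hardware threads each.
    print(cpus_left_online(4, 8))
```

Under these assumptions, every other logical CPU is taken offline through CPU hotplug, which is the path relevant to the warning discussed in this thread.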
Re: ppc64le: `NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!` when turning off SMT
On Tue, Feb 08, 2022 at 08:32:37AM +0100, Paul Menzel wrote:
> Dear Linux folks,
>
> On the POWER8 server IBM S822LC running Ubuntu 21.10, Linux 5.17-rc1+ built
> with
>
> $ grep HZ /boot/config-5.17.0-rc1+
> CONFIG_NO_HZ_COMMON=y
> # CONFIG_HZ_PERIODIC is not set
> CONFIG_NO_HZ_IDLE=y
> # CONFIG_NO_HZ_FULL is not set
> CONFIG_NO_HZ=y
> # CONFIG_HZ_100 is not set
> CONFIG_HZ_250=y
> # CONFIG_HZ_300 is not set
> # CONFIG_HZ_1000 is not set
> CONFIG_HZ=250
>
> once warned about a NOHZ tick-stop error, when I executed `sudo
> /usr/sbin/ppc64_cpu --smt=off` (so that KVM would work).

I see, so I assume this sets some CPUs offline, right?

>
> ```
> $ dmesg
> [0.00] Linux version 5.17.0-rc1+
> (pmen...@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (Ubuntu clang
> version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28 17:13:04 CET 2022
> […]
> [271272.030262] NOHZ tick-stop error: Non-RCU local softirq work is pending,
> handler #20!!!
> [271272.305726] NOHZ tick-stop error: Non-RCU local softirq work is pending,
> handler #20!!!
> [271272.549790] NOHZ tick-stop error: Non-RCU local softirq work is pending,
> handler #20!!!
> [271274.885167] NOHZ tick-stop error: Non-RCU local softirq work is pending,
> handler #20!!!
> [271275.113896] NOHZ tick-stop error: Non-RCU local softirq work is pending,
> handler #20!!!
> [271275.412902] NOHZ tick-stop error: Non-RCU local softirq work is pending,
> handler #20!!!
> [271275.625245] NOHZ tick-stop error: Non-RCU local softirq work is pending,
> handler #20!!!
> [271275.833107] NOHZ tick-stop error: Non-RCU local softirq work is pending,
> handler #20!!!
> [271276.041391] NOHZ tick-stop error: Non-RCU local softirq work is pending,
> handler #20!!!
> [271277.244880] NOHZ tick-stop error: Non-RCU local softirq work is pending,
> handler #20!!!
> ```

That's IRQ_POLL_SOFTIRQ. The problem here is probably that some of these
softirqs are pending even though ksoftirqd has been parked.

I see there is irq_poll_cpu_dead() that migrates the pending queue once the CPU is finally dead, so this is well handled. I'm preparing a patch to fix the warning. Thanks.
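
For readers wondering how `handler #20` maps to IRQ_POLL_SOFTIRQ, the following is a small sketch of the decoding. It assumes the number in the message is the pending-softirq bitmask printed in hexadecimal, which is consistent with the identification above; the softirq names follow the order of the enum in include/linux/interrupt.h, and the helper name `decode_pending` is hypothetical.

```python
# Softirq names in bit order, following the enum in include/linux/interrupt.h.
SOFTIRQ_NAMES = [
    "HI_SOFTIRQ",
    "TIMER_SOFTIRQ",
    "NET_TX_SOFTIRQ",
    "NET_RX_SOFTIRQ",
    "BLOCK_SOFTIRQ",
    "IRQ_POLL_SOFTIRQ",
    "TASKLET_SOFTIRQ",
    "SCHED_SOFTIRQ",
    "HRTIMER_SOFTIRQ",
    "RCU_SOFTIRQ",
]


def decode_pending(handler_field):
    """Decode the 'handler #XX' field, assumed to be a hex bitmask of pending softirqs."""
    mask = int(handler_field, 16)
    return [name for bit, name in enumerate(SOFTIRQ_NAMES) if mask & (1 << bit)]


if __name__ == "__main__":
    # 0x20 has only bit 5 set.
    print(decode_pending("20"))
```

With that assumption, 0x20 has only bit 5 set, and bit 5 is IRQ_POLL_SOFTIRQ.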
ppc64le: `NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!` when turning off SMT
Dear Linux folks,

On the POWER8 server IBM S822LC running Ubuntu 21.10, Linux 5.17-rc1+ built with

$ grep HZ /boot/config-5.17.0-rc1+
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_NO_HZ=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250

once warned about a NOHZ tick-stop error, when I executed `sudo /usr/sbin/ppc64_cpu --smt=off` (so that KVM would work).

```
$ dmesg
[0.00] Linux version 5.17.0-rc1+ (pmen...@flughafenberlinbrandenburgwillybrandt.molgen.mpg.de) (Ubuntu clang version 13.0.0-2, LLD 13.0.0) #1 SMP Fri Jan 28 17:13:04 CET 2022
[…]
[271272.030262] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[271272.305726] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[271272.549790] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[271274.885167] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[271275.113896] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[271275.412902] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[271275.625245] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[271275.833107] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[271276.041391] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[271277.244880] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
```

Kind regards,

Paul