On Thu, Feb 13, 2025 at 8:39 PM Yafang Shao <[email protected]> wrote:
>
> On Thu, Feb 13, 2025 at 8:32 PM Yafang Shao <[email protected]> wrote:
> >
> > On Thu, Feb 13, 2025 at 7:19 PM Petr Mladek <[email protected]> wrote:
> > >
> > > On Tue 2025-02-11 14:24:36, Yafang Shao wrote:
> > > > I encountered a hard lockup when attempting to reproduce the panic issue
> > > > occurred on our production servers [0]. The hard lockup is as follows,
> > > >
> > > > [15852778.150191] livepatch: klp_try_switch_task: grpc_executor:421106
> > > > is sleeping on function do_exit
> > > > [15852778.169471] livepatch: klp_try_switch_task: grpc_executor:421244
> > > > is sleeping on function do_exit
> > > > [15852778.188746] livepatch: klp_try_switch_task: grpc_executor:421457
> > > > is sleeping on function do_exit
> > > > [15852778.208021] livepatch: klp_try_switch_task: grpc_executor:422407
> > > > is sleeping on function do_exit
> > > > [15852778.227292] livepatch: klp_try_switch_task: grpc_executor:423184
> > > > is sleeping on function do_exit
> > > > [15852778.246576] livepatch: klp_try_switch_task: grpc_executor:423582
> > > > is sleeping on function do_exit
> > > > [15852778.265863] livepatch: klp_try_switch_task: grpc_executor:423738
> > > > is sleeping on function do_exit
> > > > [15852778.285149] livepatch: klp_try_switch_task: grpc_executor:423739
> > > > is sleeping on function do_exit
> > > > [15852778.304446] livepatch: klp_try_switch_task: grpc_executor:423833
> > > > is sleeping on function do_exit
> > > > [15852778.323738] livepatch: klp_try_switch_task: grpc_executor:423893
> > > > is sleeping on function do_exit
> > > > [15852778.343017] livepatch: klp_try_switch_task: grpc_executor:423894
> > > > is sleeping on function do_exit
> > > > [15852778.362292] livepatch: klp_try_switch_task: grpc_executor:423976
> > > > is sleeping on function do_exit
> > > > [15852778.381565] livepatch: klp_try_switch_task: grpc_executor:423977
> > > > is sleeping on function do_exit
> > > > [15852778.400847] livepatch: klp_try_switch_task: grpc_executor:424610
> > > > is sleeping on function do_exit
> > >
> > > This message does not exist in vanilla kernel. It looks like an extra
> > > debug message. And many extra messages might create stalls on its own.
> >
> > Right, the dynamic_debug is enabled:
> >
> > $ echo 'file kernel/* +p' > /sys/kernel/debug/dynamic_debug/control
> >
> > >
> > > AFAIK, your reproduced the problem even without these extra messages.
> >
> > There are two issues during the KLP transition:
> > 1. RCU warnings
> > 2. hard lockup
> >
> > RCU stalls can be easily reproduced without the extra messages.
> > However, hard lockups cannot be reproduced unless dynamic debugging is
> > enabled.
> >
> > > At least, I see the following at
> > > https://lore.kernel.org/r/caloahbb8j6rrpjayrkzpx2u6yhjweiprspoqq_7cvq+m0zg...@mail.gmail.com
> >
> > That's correct, this is related to the RCU warnings issue.
> >
> > >
> > > <paste>
> > > [20329703.332453] livepatch: enabling patch 'livepatch_61_release6'
> > > [20329703.340417] livepatch: 'livepatch_61_release6': starting
> > > patching transition
> > > [20329715.314215] rcu_tasks_wait_gp: rcu_tasks grace period 1109765 is
> > > 10166 jiffies old.
> > > [20329737.126207] rcu_tasks_wait_gp: rcu_tasks grace period 1109769 is
> > > 10219 jiffies old.
> > > [20329752.018236] rcu_tasks_wait_gp: rcu_tasks grace period 1109773 is
> > > 10199 jiffies old.
> > > [20329754.848036] livepatch: 'livepatch_61_release6': patching complete
> > > </paste>
> > >
> > > Could you please confirm that this the original _non-filtered_ log?
> >
> > Right.
> >
> > > I mean that the debug messages were _not_ printed and later filtered?
> >
> > Right.
> >
> > >
> > > I would like to know more about the system where RCU reported the
> > > stall. How many processes are running there in average?
> > > A rough number is enough. I wonder if it is about 1000, 10000, or
> > > 50000?
> >
> > Most of the servers have between 5,000 and 10,000 threads.
> >
> > >
> > > Also what is the CONFIG_HZ value, please?
> >
> > CONFIG_HZ_PERIODIC=y
> > CONFIG_HZ_1000=y
> > CONFIG_HZ=1000
> >
> > >
> > > Also we should get some statistics how long klp_try_switch_task()
> > > lasts in average. I have never did it but I guess that
> > > it should be rather easy with perf. Or maybe just by looking
> > > at function_graph trace.
> >
> > Currently, on one of my production servers, CPU usage is around 40%,
> > and the number of threads is approximately 9,100. The livepatch is
> > applied every 5 seconds. We can adjust the frequency, but the
> > difference won't be significant, as RCU stalls are easy to reproduce
> > even at very low frequencies.
> > The duration of klp_try_switch_task() is as follows:
> >
> > $ /usr/share/bcc/tools/funclatency klp_try_switch_task.part.0 -d 60
> > Tracing 1 functions for "klp_try_switch_task.part.0"... Hit Ctrl-C to end.
> > nsecs : count distribution
> > 0 -> 1 : 0 |
> > |
> > 2 -> 3 : 0 |
> > |
> > 4 -> 7 : 0 |
> > |
> > 8 -> 15 : 0 |
> > |
> > 16 -> 31 : 0 |
> > |
> > 32 -> 63 : 0 |
> > |
> > 64 -> 127 : 0 |
> > |
> > 128 -> 255 : 0 |
> > |
> > 256 -> 511 : 0 |
> > |
> > 512 -> 1023 : 1 |
> > |
> > 1024 -> 2047 : 26665 |***********
> > |
> > 2048 -> 4095 : 93834
> > |****************************************|
> > 4096 -> 8191 : 2695 |*
> > |
> > 8192 -> 16383 : 268 |
> > |
> > 16384 -> 32767 : 24 |
> > |
> > 32768 -> 65535 : 2 |
> > |
> >
> > avg = 2475 nsecs, total: 305745369 nsecs, count: 123489
> >
> > >
> > > I would like to be more sure that klp_try_complete_transition() really
> > > could block RCU for that long. I would like to confirm that
> > > the following is the reality:
> > >
> > > num_processes * average_klp_try_switch_task > 10second
> >
> > 9100 * 2.475 / 1000 is around 22 second
>
> It looks like I misread the nsec as usec, so the total average
> duration is actually around 22ms. ;)
> I’ll double-check it to confirm.
I've confirmed that during RCU stalls, the average_klp_try_switch_task
value isn't unusually large. However, the real issue lies in the
extended duration of the klp_try_complete_transition().
- The RCU stall
[Fri Feb 14 10:14:56 2025] livepatch: enabling patch 'livepatch_61_release6'
[Fri Feb 14 10:14:56 2025] livepatch: 'livepatch_61_release6':
starting patching transition
[Fri Feb 14 10:15:10 2025] rcu_tasks_wait_gp: rcu_tasks grace period
32001 is 10073 jiffies old.
[Fri Feb 14 10:15:14 2025] livepatch: 'livepatch_61_release6': patching complete
- klp_try_switch_task
10:17:50
nsecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 0 | |
128 -> 255 : 2 | |
256 -> 511 : 5 | |
512 -> 1023 : 3 | |
1024 -> 2047 : 806 |* |
2048 -> 4095 : 30382 |****************************************|
4096 -> 8191 : 10806 |************** |
8192 -> 16383 : 65 | |
16384 -> 32767 : 9 | |
avg = 3562 nsecs, total: 149912461 nsecs, count: 42078
The avg of klp_try_switch_task is only 3.5us.
- number of threads
Fri Feb 14 10:14:48
12106
Fri Feb 14 10:14:59
11900
Fri Feb 14 10:15:10
12004
Fri Feb 14 10:15:20
11899
Fri Feb 14 10:15:31
12293
Fri Feb 14 10:15:42
11831
At that point, the system has around 12,000 threads, so the total
duration comes out to approximately 12,000 * 3.5µs, or about 42ms—well
below the 10s threshold.
However, the duration of klp_try_complete_transition() exceeds 10 seconds.
10:15:41
nsecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 0 | |
128 -> 255 : 0 | |
256 -> 511 : 0 | |
512 -> 1023 : 0 | |
1024 -> 2047 : 0 | |
2048 -> 4095 : 0 | |
4096 -> 8191 : 0 | |
8192 -> 16383 : 0 | |
16384 -> 32767 : 0 | |
32768 -> 65535 : 0 | |
65536 -> 131071 : 0 | |
131072 -> 262143 : 0 | |
262144 -> 524287 : 0 | |
524288 -> 1048575 : 0 | |
1048576 -> 2097151 : 0 | |
2097152 -> 4194303 : 0 | |
4194304 -> 8388607 : 0 | |
8388608 -> 16777215 : 0 | |
16777216 -> 33554431 : 0 | |
33554432 -> 67108863 : 1 |********************|
67108864 -> 134217727 : 1 |********************|
134217728 -> 268435455 : 0 | |
268435456 -> 536870911 : 0 | |
536870912 -> 1073741823 : 0 | |
1073741824 -> 2147483647 : 0 | |
2147483648 -> 4294967295 : 0 | |
4294967296 -> 8589934591 : 1 |********************|
8589934592 -> 17179869183 : 1 |********************|
avg = 5630258522 nsecs, total: 22521034091 nsecs, count: 4
The longest duration of klp_try_complete_transition() ranges from 8.5
to 17.2 seconds.
It appears that the RCU stall is not only driven by num_processes *
average_klp_try_switch_task, but also by contention within
klp_try_complete_transition(), particularly around the tasklist_lock.
Interestingly, even after replacing "read_lock(&tasklist_lock)" with
"rcu_read_lock()", the RCU stall persists. My verification shows that
the only way to prevent the stall is by checking need_resched() during
each iteration of the loop.
--
Regards
Yafang