kernel test robot wrote:
>
> Hello,
>
> FYI. we don't have enough knowledge to understand how the issues we found
> in the tests are related with the code. we just run the tests up to 200 times
> for both this commit and parent, noticed there are various random issues on
> this commit, but always clean on parent.
>
> =========================================================================================
> tbox_group/testcase/rootfs/kconfig/compiler/sleep:
>
> vm-snb/boot/debian-11.1-i386-20220923.cgz/i386-randconfig-141-20260117/gcc-14/1
>
> 29317f8dc6ed601e bc62f5b308cbdedf29132fe96e9
> ---------------- ---------------------------
>        fail:runs  %reproduction    fail:runs
>            |             |             |
>            :200          2%           5:200     dmesg.BUG:soft_lockup-CPU##stuck_for#s![kworker##:#]
>            :200          2%           5:200     dmesg.BUG:soft_lockup-CPU##stuck_for#s![kworker:#:#]
>            :200          8%          17:200     dmesg.BUG:soft_lockup-CPU##stuck_for#s![swapper:#]
>            :200          2%           4:200     dmesg.BUG:workqueue_lockup-pool
>            :200          0%           1:200     dmesg.EIP:__schedule
>            :200          0%           1:200     dmesg.EIP:_raw_spin_unlock_irq
>            :200          2%           4:200     dmesg.EIP:_raw_spin_unlock_irqrestore
>            :200          6%          11:200     dmesg.EIP:console_emit_next_record
>            :200          0%           1:200     dmesg.EIP:finish_task_switch
>            :200          3%           6:200     dmesg.EIP:lock_acquire
>            :200          1%           2:200     dmesg.EIP:lock_release
>            :200          1%           2:200     dmesg.EIP:queue_work_on
>            :200          0%           1:200     dmesg.EIP:rcu_preempt_deferred_qs_irqrestore
>            :200          1%           2:200     dmesg.EIP:timekeeping_notify
>            :200          0%           1:200     dmesg.INFO:rcu_preempt_detected_stalls_on_CPUs/tasks
>            :200          0%           1:200     dmesg.INFO:task_blocked_for_more_than#seconds
>            :200         14%          27:200     dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
>
> below is full report.

So this is good data, but I do not know what to do with it. The RCU_STRICT_GRACE_PERIOD feature seems to want to make RCU usage bugs more detectable, at the risk of false positives. My concern is that this patch disturbs 32-bit x86 builds just enough to make the softlockup detector start getting upset about the rcu_gp::strict_work_handler workqueue.

Nothing in this patch touches workqueues or object lifetimes. So unless this causes actual boot failures, all I can assume is that this is a false-positive report, a side effect of instruction cache layout or similar.
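
For a sense of scale, the counts in the quoted table can be turned back into per-run rates. This is a rough sketch, not part of the robot's tooling: it assumes the 200 runs are independent and asks how likely a fully clean parent would be if the parent shared the commit's failure rate. The helper name is made up for illustration, and the counts are copied from three rows of the table.

```python
# Failures out of 200 runs on the commit, copied from the quoted table.
# The parent column was clean (0 of 200) for every row.
RUNS = 200
commit_failures = {
    "dmesg.BUG:soft_lockup-CPU##stuck_for#s![swapper:#]": 17,
    "dmesg.EIP:console_emit_next_record": 11,
    "dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks": 27,
}


def chance_parent_all_clean(failures, runs=RUNS):
    """Probability of `runs` clean runs in a row, assuming independent runs
    whose true failure rate equals the commit's observed rate.
    (Hypothetical helper, for illustration only.)"""
    rate = failures / runs
    return (1.0 - rate) ** runs


for name, failures in commit_failures.items():
    rate = failures / RUNS
    chance = chance_parent_all_clean(failures)
    print(f"{name}: {rate:.1%} per run, "
          f"P(parent clean by chance) = {chance:.1e}")
```

Under those assumptions the clean parent is very unlikely to be run-to-run noise. That only says the difference between the two builds is real. It says nothing about whether the lockups point at a bug in the patch or at a timing shift.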

