On Mon, Jul 06, 2020 at 02:49:41PM -0400, Qian Cai wrote:
> On Sun, Jul 05, 2020 at 10:37:03AM -0700, Paul E. McKenney wrote:
> > Good catch, but someone beat you to it.  This commit contains the fix:
> > 
> > 0504bc41a62c ("kernel/smp: Provide CSD lock timeout diagnostics")
> 
> Well, I can still reproduce this on next-20200706, which already contains that fix.
> 
> CSD_LOCK_WAIT_DEBUG=n

Indeed you can, good catch, and thank you!

There was a csd_lock_record(csd) call that should instead have been
csd_lock_record(NULL).  A fix is in progress.

                                                        Thanx, Paul

> commit 0504bc41a62c4a42b9316244da7208feca7295cb
> Author: Paul E. McKenney <[email protected]>
> Date:   Tue Jun 30 13:22:54 2020 -0700
> 
>     kernel/smp: Provide CSD lock timeout diagnostics
> 
>     This commit causes csd_lock_wait() to emit diagnostics when a CPU fails
>     to respond quickly enough to one of the smp_call_function() family of
>     function calls.  These diagnostics include NMI stack traces, and so the
>     exclusion of idle CPUs is also removed.  These diagnostics are enabled
>     by a new CSD_LOCK_WAIT_DEBUG Kconfig option that depends on DEBUG_KERNEL.
> 
>     This commit was inspired by an earlier patch by Josef Bacik.
> 
>     [ paulmck: Avoid 64-bit divides per kernel test robot feedback. ]
>     [ paulmck: Fix for [email protected] ]
>     Link: https://lore.kernel.org/lkml/[email protected]
>     Link: https://lore.kernel.org/lkml/[email protected]
>     Cc: Peter Zijlstra <[email protected]>
>     Cc: Ingo Molnar <[email protected]>
>     Cc: Thomas Gleixner <[email protected]>
>     Cc: Sebastian Andrzej Siewior <[email protected]>
>     Signed-off-by: Paul E. McKenney <[email protected]>
> 
> [19929.567055][    T0] BUG: KASAN: out-of-bounds in flush_smp_call_function_queue+0x65f/0x7c0
> csd_lock_record at kernel/smp.c:119
> (inlined by) flush_smp_call_function_queue at kernel/smp.c:395
> [19929.575391][    T0] Read of size 8 at addr ffffc900320879b8 by task swapper/35/0
> [19929.582845][    T0] 
> [19929.585060][    T0] CPU: 35 PID: 0 Comm: swapper/35 Tainted: G           O      5.8.0-rc3-next-20200706 #1
> [19929.594784][    T0] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
> [19929.604072][    T0] Call Trace:
> [19929.607253][    T0]  dump_stack+0x9d/0xe0
> [19929.611304][    T0]  ? flush_smp_call_function_queue+0x65f/0x7c0
> [19929.617355][    T0]  ? flush_smp_call_function_queue+0x65f/0x7c0
> [19929.623415][    T0]  print_address_description.constprop.8.cold.9+0x56/0x4fc
> [19929.630521][    T0]  ? log_store.cold.32+0x11/0x11
> [19929.635353][    T0]  ? lock_downgrade+0x720/0x720
> [19929.640097][    T0]  ? nr_iowait_cpu+0x78/0xf0
> [19929.644576][    T0]  ? flush_smp_call_function_queue+0x65f/0x7c0
> [19929.650625][    T0]  ? flush_smp_call_function_queue+0x65f/0x7c0
> [19929.656674][    T0]  kasan_report.cold.10+0x37/0x7c
> [19929.661587][    T0]  ? flush_smp_call_function_queue+0x65f/0x7c0
> [19929.667647][    T0]  flush_smp_call_function_queue+0x65f/0x7c0
> [19929.673535][    T0]  flush_smp_call_function_from_idle+0x41/0x71
> [19929.679598][    T0]  do_idle+0x2d6/0x4f0
> [19929.683557][    T0]  ? arch_cpu_idle_exit+0x40/0x40
> [19929.688480][    T0]  cpu_startup_entry+0x14/0x16
> [19929.693143][    T0]  secondary_startup_64+0xb6/0xc0
> [19929.698059][    T0] 
> [19929.700270][    T0] 
> [19929.702476][    T0] Memory state around the buggy address:
> [19929.708007][    T0]  ffffc90032087880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [19929.715986][    T0]  ffffc90032087900: 00 00 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00
> [19929.723963][    T0] >ffffc90032087980: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00
> [19929.731940][    T0]                                            ^
> [19929.737999][    T0]  ffffc90032087a00: 00 00 00 f2 f2 f2 00 00 00 00 00 00 00 00 00 00
> [19929.745982][    T0]  ffffc90032087a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
