Re: [PATCH 4.9 03/32] MIPS: Use async IPIs for arch_trigger_cpumask_backtrace()

2018-07-16 Thread Greg Kroah-Hartman
On Mon, Jul 16, 2018 at 05:29:05PM +0800, 陈华才 wrote:
> Hi, Greg,
> 
> Kernel 4.9 doesn't have call_single_data_t, so we should use struct
> call_single_data instead.

Can you send me a patch to merge with this one with that change so that
I know I get it right?

thanks,

greg k-h


Re: [PATCH 4.9 03/32] MIPS: Use async IPIs for arch_trigger_cpumask_backtrace()

2018-07-16 Thread 陈华才
Hi, Greg,

Kernel 4.9 doesn't have call_single_data_t, so we should use struct
call_single_data instead.

Huacai
 
-- Original --
From:  "Greg Kroah-Hartman";
Date:  Mon, Jul 16, 2018 03:36 PM
To:  "linux-kernel";
Cc:  "Greg Kroah-Hartman"; "stable"; "Paul Burton"; "James Hogan"; "Ralf Baechle"; "Huacai Chen"; "linux-mips";
Subject:  [PATCH 4.9 03/32] MIPS: Use async IPIs for arch_trigger_cpumask_backtrace()
 
4.9-stable review patch.  If anyone has any objections, please let me know.

--

From: Paul Burton 

commit b63e132b6433a41cf311e8bc382d33fd2b73b505 upstream.

The current MIPS implementation of arch_trigger_cpumask_backtrace() is
broken because it attempts to use synchronous IPIs despite the fact that
it may be run with interrupts disabled.

This means that when arch_trigger_cpumask_backtrace() is invoked, for
example by the RCU CPU stall watchdog, we may:

  - Deadlock due to use of synchronous IPIs with interrupts disabled,
    causing the CPU that's attempting to generate the backtrace output
    to hang itself.

  - Not succeed in generating the desired output from remote CPUs.

  - Produce warnings about this from smp_call_function_many(), for
    example:

[42760.526910] INFO: rcu_sched detected stalls on CPUs/tasks:
[42760.535755]  0-...!: (1 GPs behind) idle=ade/140/0 softirq=526944/526945 fqs=0
[42760.547874]  1-...!: (0 ticks this GP) idle=e4a/140/0 softirq=547885/547885 fqs=0
[42760.559869]  (detected by 2, t=2162 jiffies, g=266689, c=266688, q=33)
[42760.568927] ------------[ cut here ]------------
[42760.576146] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:416 smp_call_function_many+0x88/0x20c
[42760.587839] Modules linked in:
[42760.593152] CPU: 2 PID: 1216 Comm: sh Not tainted 4.15.4-00373-gee058bb4d0c2 #2
[42760.603767] Stack : 8e09bd20 8e09bd20 8e09bd20 fff0 0007 0006  8e09bca8
[42760.616937] 95b2b379 95b2b379 807a0080 0007 81944518 018a 0032 
[42760.630095]  0030 8000  806eca74 0009 8017e2b8 01a0
[42760.643169]  0002  8e09baa4 0008 808b8008 86d69080 8e09bca0
[42760.656282] 8e09ad50 805e20aa    8017e2b8 0009 801070ca
[42760.669424] ...
[42760.673919] Call Trace:
[42760.678672] [<27fde568>] show_stack+0x70/0xf0
[42760.685417] [<84751641>] dump_stack+0xaa/0xd0
[42760.692188] [<699d671c>] __warn+0x80/0x92
[42760.698549] [<68915d41>] warn_slowpath_null+0x28/0x36
[42760.705912] [] smp_call_function_many+0x88/0x20c
[42760.713696] [<6bbdfc2a>] arch_trigger_cpumask_backtrace+0x30/0x4a
[42760.722216] [] rcu_dump_cpu_stacks+0x6a/0x98
[42760.729580] [<796e7629>] rcu_check_callbacks+0x672/0x6ac
[42760.737476] [<059b3b43>] update_process_times+0x18/0x34
[42760.744981] [<6eb94941>] tick_sched_handle.isra.5+0x26/0x38
[42760.752793] [<478d3d70>] tick_sched_timer+0x1c/0x50
[42760.759882] [] __hrtimer_run_queues+0xc6/0x226
[42760.767418] [] hrtimer_interrupt+0x88/0x19a
[42760.775031] [<6765a19e>] gic_compare_interrupt+0x2e/0x3a
[42760.782761] [<0558bf5f>] handle_percpu_devid_irq+0x78/0x168
[42760.790795] [<90c11ba2>] generic_handle_irq+0x1e/0x2c
[42760.798117] [<1b6d462c>] gic_handle_local_int+0x38/0x86
[42760.805545] [] gic_irq_dispatch+0xa/0x14
[42760.812534] [<90c11ba2>] generic_handle_irq+0x1e/0x2c
[42760.820086] [] do_IRQ+0x16/0x20
[42760.826274] [<9aef3ce6>] plat_irq_dispatch+0x62/0x94
[42760.833458] [<6a94b53c>] except_vec_vi_end+0x70/0x78
[42760.840655] [<22284043>] smp_call_function_many+0x1ba/0x20c
[42760.848501] [<54022b58>] smp_call_function+0x1e/0x2c
[42760.855693] [] flush_tlb_mm+0x2a/0x98
[42760.862730] [<0844cdd0>] tlb_flush_mmu+0x1c/0x44
[42760.869628] [] arch_tlb_finish_mmu+0x26/0x3e
[42760.877021] [<1aeaaf74>] tlb_finish_mmu+0x18/0x66
[42760.883907] [] exit_mmap+0x76/0xea
[42760.890428] [] mmput+0x80/0x11a
[42760.896632] [] do_exit+0x1f4/0x80c
[42760.903158] [] do_group_exit+0x20/0x7e
[42760.909990] [<13fa8d54>] __wake_up_parent+0x0/0x1e
[42760.917045] [<46cf89d0>] smp_call_function_many+0x1a2/0x20c
[42760.924893] [<8c21a93b>] syscall_common+0x14/0x1c
[42760.931765] ---[ end trace 02aa09da9dc52a60 ]---
[42760.938342] ------------[ cut here ]------------
[42760.945311] WARNING: CPU: 2 PID: 1216 at kernel/smp.c:291 smp_call_function_single+0xee/0xf8
...

This patch switches MIPS' arch_trigger_cpumask_backtrace() to use async
IPIs & smp_call_function_single_async() in order to resolve this
problem. We ensure use of the pre-allocated call_single_data_t
structures is serialized by maintaining a cpumask indicating that
they're busy, and refusing to attempt to send an IPI when a CPU's bit is
set in this mask. This should only happen if 
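
As a rough illustration of the serialization described above, here is a user-space C model of a per-CPU busy mask. It is only a sketch, not the kernel patch: C11 atomics stand in for the kernel's cpumask operations, no IPI is actually sent, and NR_CPUS, backtrace_busy, handled, raise_backtrace() and handle_backtrace() are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 4	/* hypothetical CPU count for this model */

/* One bit per CPU: set while that CPU's pre-allocated request is in flight. */
static atomic_uint backtrace_busy;

/* Per-CPU count of handled requests; stands in for the backtrace output. */
static int handled[NR_CPUS];

/* Atomically set the CPU's busy bit; returns true if it was already set. */
static bool test_and_set_busy(int cpu)
{
	unsigned int bit = 1u << cpu;

	return atomic_fetch_or(&backtrace_busy, bit) & bit;
}

/* Models the handler on the target CPU: do the work, then release the slot. */
static void handle_backtrace(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	handled[cpu]++;
	atomic_fetch_and(&backtrace_busy, ~(1u << cpu));
}

/* Sender side: never waits. Returns false if the target's slot is still busy. */
static bool raise_backtrace(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (test_and_set_busy(cpu)) {
		fprintf(stderr, "CPU%d request still pending, not resending\n", cpu);
		return false;
	}
	/* A kernel would queue an asynchronous IPI here; the model only marks it. */
	return true;
}
```

In this model a second raise_backtrace() for the same CPU is refused until handle_backtrace() has run for it, so the sender never blocks and never reuses a request slot that is still in flight.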
