Re: [RFC] how to perform a safe NMI stack trace on all CPUs on x86?

2015-05-14 Thread long.wanglong
On 2015/5/13 22:26, Jiri Kosina wrote:
> On Wed, 13 May 2015, 王龙 wrote:
> 
>> Hi all,
>>
>> In kernels before 3.19, when trigger_all_cpu_backtrace() is called on x86,
>> it triggers an NMI on each CPU and calls show_regs(). But this can lead
>> to a hard lockup if the NMI comes in while another printk() is in progress.
>>
>> The commit a9edc88093287183ac934be44f295f183b2c62dd (x86/nmi: Perform a safe
>> NMI stack trace on all CPUs) fixes this problem in mainline. When the NMI
>> triggers, it switches the printk routine for that CPU to an NMI-safe printk
>> function that records the message in a per_cpu seq_buf descriptor. After all
>> NMIs have finished recording their data, the seq_bufs are printed in a safe
>> context. But how do we fix this problem in older kernels (e.g., 3.10 stable)?
>> The 3.10 stable kernel has no "switch printk routine" or "seq_buf"
>> infrastructure.
>>
>> Could anyone give me some ideas?
> 
> Either you backport the seq_buf-based approach to the older kernel, or, if you
> are working on a 3.4 kernel or earlier (basically any kernel preceding the
> printk() revamp that happened in 7ff9554bb57 and after), you can use a
> slightly simpler approach.
> 
> It's the approach we used initially, when we first ran into the issue, and it
> has been proven to work as well (but it's not applicable after Kay
> added all the complexity to printk()).
> 
> You can see it in our SLE11 kernel tree, available at
> 
> http://kernel.suse.com/cgit/kernel/commit/?h=SLE11-SP4&id=8d62ae68ff61d77ae3c4899f05dbd9c9742b14c9
> 
> for example.
> 
> It's up to you to judge which is the least painful way :)
> 

Hi Jiri Kosina,

For 3.10 stable, the only way to solve this problem is to backport the
seq_buf-based approach.

I will backport the necessary patches to 3.10 stable. Reviews of my
backport patches are welcome.

Best Regards
Wang Long




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] how to perform a safe NMI stack trace on all CPUs on x86?

2015-05-14 Thread long.wanglong
On 2015/5/13 22:22, Steven Rostedt wrote:
> On Wed, 13 May 2015 22:14:54 +0800
> "王龙" <wangl...@laoqinren.net> wrote:
> 
> 
>> context. But how do we fix this problem in older kernels (e.g., 3.10 stable)?
>> The 3.10 stable kernel has no "switch printk routine" or "seq_buf"
>> infrastructure.
>>
>> Could anyone give me some ideas?
>>
> 
> Backport the necessary patches.
> 
> -- Steve
> 
Hi Steve,

Thank you for your reply. I will backport the necessary patches to 3.10 stable.
Reviews of my backport patches are welcome.

Best Regards
Wang Long




Re: [RFC] how to perform a safe NMI stack trace on all CPUs on x86?

2015-05-13 Thread Jiri Kosina
On Wed, 13 May 2015, 王龙 wrote:

> Hi all,
> 
> In kernels before 3.19, when trigger_all_cpu_backtrace() is called on x86,
> it triggers an NMI on each CPU and calls show_regs(). But this can lead
> to a hard lockup if the NMI comes in while another printk() is in progress.
> 
> The commit a9edc88093287183ac934be44f295f183b2c62dd (x86/nmi: Perform a safe
> NMI stack trace on all CPUs) fixes this problem in mainline. When the NMI
> triggers, it switches the printk routine for that CPU to an NMI-safe printk
> function that records the message in a per_cpu seq_buf descriptor. After all
> NMIs have finished recording their data, the seq_bufs are printed in a safe
> context. But how do we fix this problem in older kernels (e.g., 3.10 stable)?
> The 3.10 stable kernel has no "switch printk routine" or "seq_buf"
> infrastructure.
> 
> Could anyone give me some ideas?

Either you backport the seq_buf-based approach to the older kernel, or, if you
are working on a 3.4 kernel or earlier (basically any kernel preceding the
printk() revamp that happened in 7ff9554bb57 and after), you can use a
slightly simpler approach.

It's the approach we used initially, when we first ran into the issue, and it
has been proven to work as well (but it's not applicable after Kay
added all the complexity to printk()).

You can see it in our SLE11 kernel tree, available at

http://kernel.suse.com/cgit/kernel/commit/?h=SLE11-SP4&id=8d62ae68ff61d77ae3c4899f05dbd9c9742b14c9

for example.

It's up to you to judge which is the least painful way :)

-- 
Jiri Kosina
SUSE Labs


[RFC] how to perform a safe NMI stack trace on all CPUs on x86?

2015-05-13 Thread 王龙
Hi all,

In kernels before 3.19, when trigger_all_cpu_backtrace() is called on x86,
it triggers an NMI on each CPU and calls show_regs(). But this can lead
to a hard lockup if the NMI comes in while another printk() is in progress.

The commit a9edc88093287183ac934be44f295f183b2c62dd (x86/nmi: Perform a safe
NMI stack trace on all CPUs) fixes this problem in mainline. When the NMI
triggers, it switches the printk routine for that CPU to an NMI-safe printk
function that records the message in a per_cpu seq_buf descriptor. After all
NMIs have finished recording their data, the seq_bufs are printed in a safe
context. But how do we fix this problem in older kernels (e.g., 3.10 stable)?
The 3.10 stable kernel has no "switch printk routine" or "seq_buf"
infrastructure.

Could anyone give me some ideas?

Best Regards
Wang Long

Re: [RFC] how to perform a safe NMI stack trace on all CPUs on x86?

2015-05-13 Thread Steven Rostedt
On Wed, 13 May 2015 22:14:54 +0800
"王龙" <wangl...@laoqinren.net> wrote:


> context. But how do we fix this problem in older kernels (e.g., 3.10 stable)?
> The 3.10 stable kernel has no "switch printk routine" or "seq_buf"
> infrastructure.
> 
> Could anyone give me some ideas?
> 

Backport the necessary patches.

-- Steve

