Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD
On Thu, May 22, 2014 at 09:55:38PM +0200, Borislav Petkov wrote: > From: Borislav Petkov > Date: Thu, 22 May 2014 16:40:54 +0200 > Subject: [PATCH] x86, MCE: Kill CPU_POST_DEAD > > In conjunction with cleaning up CPU hotplug, we want to get rid of > CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the > end of CPU_DEAD. > > Signed-off-by: Borislav Petkov > --- > arch/x86/kernel/cpu/mcheck/mce.c | 9 - > 1 file changed, 4 insertions(+), 5 deletions(-) > > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c > b/arch/x86/kernel/cpu/mcheck/mce.c > index 68317c80de7f..bfde4871848f 100644 > --- a/arch/x86/kernel/cpu/mcheck/mce.c > +++ b/arch/x86/kernel/cpu/mcheck/mce.c > @@ -2391,6 +2391,10 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned > long action, void *hcpu) > threshold_cpu_callback(action, cpu); > mce_device_remove(cpu); > mce_intel_hcpu_update(cpu); > + > + /* intentionally ignoring frozen here */ > + if (!(action & CPU_TASKS_FROZEN)) > + cmci_rediscover(); > break; > case CPU_DOWN_PREPARE: > smp_call_function_single(cpu, mce_disable_cpu, , 1); > @@ -2402,11 +2406,6 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned > long action, void *hcpu) > break; > } > > - if (action == CPU_POST_DEAD) { > - /* intentionally ignoring frozen here */ > - cmci_rediscover(); > - } > - > return NOTIFY_OK; > } Ok, so I did a little hammering on this one by running a hotplug toggler script, reading out files under /sys/devices/system/machinecheck... and suspending to disk and resuming, all at the same time. 'Round 10ish cycles I did and the box was chugging away happily without any issues. So, I'm going to queue this one for 3.17, along with the panic-on-timeout for the default tolerance level one: http://lkml.kernel.org/r/20140523091041.ga21...@pd.tnic if you don't have any objections. I'm saying 3.17 because both are not really critical stuff and could use a full cycle of simmering in linux-next just fine. Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD
On Thu, May 22, 2014 at 09:55:38PM +0200, Borislav Petkov wrote: From: Borislav Petkov b...@suse.de Date: Thu, 22 May 2014 16:40:54 +0200 Subject: [PATCH] x86, MCE: Kill CPU_POST_DEAD In conjunction with cleaning up CPU hotplug, we want to get rid of CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the end of CPU_DEAD. Signed-off-by: Borislav Petkov b...@suse.de --- arch/x86/kernel/cpu/mcheck/mce.c | 9 - 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index 68317c80de7f..bfde4871848f 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -2391,6 +2391,10 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) threshold_cpu_callback(action, cpu); mce_device_remove(cpu); mce_intel_hcpu_update(cpu); + + /* intentionally ignoring frozen here */ + if (!(action CPU_TASKS_FROZEN)) + cmci_rediscover(); break; case CPU_DOWN_PREPARE: smp_call_function_single(cpu, mce_disable_cpu, action, 1); @@ -2402,11 +2406,6 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) break; } - if (action == CPU_POST_DEAD) { - /* intentionally ignoring frozen here */ - cmci_rediscover(); - } - return NOTIFY_OK; } Ok, so I did a little hammering on this one by running a hotplug toggler script, reading out files under /sys/devices/system/machinecheck... and suspending to disk and resuming, all at the same time. 'Round 10ish cycles I did and the box was chugging away happily without any issues. So, I'm going to queue this one for 3.17, along with the panic-on-timeout for the default tolerance level one: http://lkml.kernel.org/r/20140523091041.ga21...@pd.tnic if you don't have any objections. I'm saying 3.17 because both are not really critical stuff and could use a full cycle of simmering in linux-next just fine. Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD
On 05/23/2014 01:25 AM, Borislav Petkov wrote: > On Thu, May 22, 2014 at 03:50:21PM +, Luck, Tony wrote: >>>> So I think we can reduce it to just the one rwsem (with recursion) if we >>>> shoot CPU_POST_DEAD in the head. >>> >>> Here's the first bullet. Stressing my box here with Steve's hotplug >>> script seems to work fine. >>> >>> Tony, any objections? >> >> what was this comment referring to: >> >> /* intentionally ignoring frozen here */ >> >> After you move the cmci_rediscover() call, it is now in a place where we are >> no longer ignoring frozen (i.e. the old placement did the rediscover even if >> the >> CPU_TASKS_FROZEN bit was set - with the new placement we will skip >> rediscovery. >> >> So we were working around some interaction between cpu hotplug and frozen. >> Do we no longer need to do that? > > Hmm, that FROZEN thing is supposedly for hotplug operations while > suspend is happening. I guess it makes a little sense to rediscover CMCI > banks while suspend is in progress. Whatever. > > Let's keep it before more crap ensues, that was a good catch, thanks. > > So, I guess something like that instead. > > Which means, I'd need to run a couple of suspend/resume rounds while > hotplugging cores to see whether we're still kosher. > > More fun tomorrow. > > --- > From: Borislav Petkov > Date: Thu, 22 May 2014 16:40:54 +0200 > Subject: [PATCH] x86, MCE: Kill CPU_POST_DEAD > > In conjunction with cleaning up CPU hotplug, we want to get rid of > CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the > end of CPU_DEAD. > > Signed-off-by: Borislav Petkov Reviewed-by: Srivatsa S. Bhat Regards, Srivatsa S. Bhat > --- > arch/x86/kernel/cpu/mcheck/mce.c | 9 - > 1 file changed, 4 insertions(+), 5 deletions(-) > > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c > b/arch/x86/kernel/cpu/mcheck/mce.c > index 68317c80de7f..bfde4871848f 100644 > --- a/arch/x86/kernel/cpu/mcheck/mce.c > +++ b/arch/x86/kernel/cpu/mcheck/mce.c > @@ -2391,6 +2391,10 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned > long action, void *hcpu) > threshold_cpu_callback(action, cpu); > mce_device_remove(cpu); > mce_intel_hcpu_update(cpu); > + > + /* intentionally ignoring frozen here */ > + if (!(action & CPU_TASKS_FROZEN)) > + cmci_rediscover(); > break; > case CPU_DOWN_PREPARE: > smp_call_function_single(cpu, mce_disable_cpu, , 1); > @@ -2402,11 +2406,6 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned > long action, void *hcpu) > break; > } > > - if (action == CPU_POST_DEAD) { > - /* intentionally ignoring frozen here */ > - cmci_rediscover(); > - } > - > return NOTIFY_OK; > } > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD
On 05/23/2014 03:01 AM, Borislav Petkov wrote: > On Fri, May 23, 2014 at 02:43:31AM +0530, Srivatsa S. Bhat wrote: After you move the cmci_rediscover() call, it is now in a place where we are no longer ignoring frozen (i.e. the old placement did the rediscover even if the CPU_TASKS_FROZEN bit was set - with the new placement we will skip rediscovery. >> >> That's not quite true. The existing code already ignores FROZEN for all the >> cases, >> by ignoring it at the top of the switch-case itself: > > No, Tony's right and you got confused: > > Before my change, the code did: > > if (action == CPU_POST_DEAD) { > /* intentionally ignoring frozen here */ > cmci_rediscover(); > } > > which is only CPU_POST_DEAD *without* the CPU_TASKS_FROZEN bit. > > If I move it in the switch-case, cmci_rediscover() *ignores the FROZEN > bit and gets executed for both: > > CPU_DEAD: > CPU_DEAD_FROZEN: > > because with the FROZEN bit masked out, they're the same. > > But we don't want to execute it for the FROZEN bit - look for the other > two tests for CPU_TASKS_FROZEN in mce.c for an example. > > So, before we go and change the FROZEN aspect and break things in > strange ways, let's keep the _FROZEN ignore. I certainly don't want to > go down that road and chase why we needed FROZEN or not. > > Ok? > Right, I got confused about who meant what by the term 'ignore' - ignore the FROZEN _bit_ as in execute all the time irrespective of that bit being set or unset, or ignore the FROZEN _case_ as in don't execute during suspend/resume. Anyway, sorry for the confusion! Your latest code looks correct to me. Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD
On Fri, May 23, 2014 at 02:43:31AM +0530, Srivatsa S. Bhat wrote: > >> After you move the cmci_rediscover() call, it is now in a place where we > >> are > >> no longer ignoring frozen (i.e. the old placement did the rediscover even > >> if the > >> CPU_TASKS_FROZEN bit was set - with the new placement we will skip > >> rediscovery. > >> > > That's not quite true. The existing code already ignores FROZEN for all the > cases, > by ignoring it at the top of the switch-case itself: No, Tony's right and you got confused: Before my change, the code did: if (action == CPU_POST_DEAD) { /* intentionally ignoring frozen here */ cmci_rediscover(); } which is only CPU_POST_DEAD *without* the CPU_TASKS_FROZEN bit. If I move it in the switch-case, cmci_rediscover() *ignores the FROZEN bit and gets executed for both: CPU_DEAD: CPU_DEAD_FROZEN: because with the FROZEN bit masked out, they're the same. But we don't want to execute it for the FROZEN bit - look for the other two tests for CPU_TASKS_FROZEN in mce.c for an example. So, before we go and change the FROZEN aspect and break things in strange ways, let's keep the _FROZEN ignore. I certainly don't want to go down that road and chase why we needed FROZEN or not. Ok? -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD
On 05/23/2014 01:25 AM, Borislav Petkov wrote: > On Thu, May 22, 2014 at 03:50:21PM +, Luck, Tony wrote: >>>> So I think we can reduce it to just the one rwsem (with recursion) if we >>>> shoot CPU_POST_DEAD in the head. >>> >>> Here's the first bullet. Stressing my box here with Steve's hotplug >>> script seems to work fine. >>> >>> Tony, any objections? >> >> what was this comment referring to: >> >> /* intentionally ignoring frozen here */ >> >> After you move the cmci_rediscover() call, it is now in a place where we are >> no longer ignoring frozen (i.e. the old placement did the rediscover even if >> the >> CPU_TASKS_FROZEN bit was set - with the new placement we will skip >> rediscovery. >> That's not quite true. The existing code already ignores FROZEN for all the cases, by ignoring it at the top of the switch-case itself: switch (action & ~CPU_TASKS_FROZEN) { case CPU_ONLINE: [...] break; case CPU_DEAD: if (threshold_cpu_callback) threshold_cpu_callback(action, cpu); mce_device_remove(cpu); mce_intel_hcpu_update(cpu); break; Then I started wondering what the comment really meant, and commit 1a65f970d1 made things clear: its actually the _other_ way around! That is, cmci_rediscover() didn't have to be invoked* during suspend/resume, so it was kept separate from the rest. * or maybe it was not _supposed_ to be invoked; I don't know which is the case.. the original commit 88ccbedd9 didn't explain that. Either way, cmci_rediscover() doesn't seem to have any reason why it should be called specifically in the POST_DEAD stage only. So I'm sure we can get rid of that one way or other easily. Regards, Srivatsa S. Bhat >> So we were working around some interaction between cpu hotplug and frozen. >> Do we no longer need to do that? > > Hmm, that FROZEN thing is supposedly for hotplug operations while > suspend is happening. I guess it makes a little sense to rediscover CMCI > banks while suspend is in progress. Whatever. > > Let's keep it before more crap ensues, that was a good catch, thanks. > > So, I guess something like that instead. > > Which means, I'd need to run a couple of suspend/resume rounds while > hotplugging cores to see whether we're still kosher. > > More fun tomorrow. > > --- > From: Borislav Petkov > Date: Thu, 22 May 2014 16:40:54 +0200 > Subject: [PATCH] x86, MCE: Kill CPU_POST_DEAD > > In conjunction with cleaning up CPU hotplug, we want to get rid of > CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the > end of CPU_DEAD. > > Signed-off-by: Borislav Petkov > --- > arch/x86/kernel/cpu/mcheck/mce.c | 9 - > 1 file changed, 4 insertions(+), 5 deletions(-) > > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c > b/arch/x86/kernel/cpu/mcheck/mce.c > index 68317c80de7f..bfde4871848f 100644 > --- a/arch/x86/kernel/cpu/mcheck/mce.c > +++ b/arch/x86/kernel/cpu/mcheck/mce.c > @@ -2391,6 +2391,10 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned > long action, void *hcpu) > threshold_cpu_callback(action, cpu); > mce_device_remove(cpu); > mce_intel_hcpu_update(cpu); > + > + /* intentionally ignoring frozen here */ > + if (!(action & CPU_TASKS_FROZEN)) > + cmci_rediscover(); > break; > case CPU_DOWN_PREPARE: > smp_call_function_single(cpu, mce_disable_cpu, , 1); > @@ -2402,11 +2406,6 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned > long action, void *hcpu) > break; > } > > - if (action == CPU_POST_DEAD) { > - /* intentionally ignoring frozen here */ > - cmci_rediscover(); > - } > - > return NOTIFY_OK; > } > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD
On Thu, May 22, 2014 at 03:50:21PM +, Luck, Tony wrote: > >> So I think we can reduce it to just the one rwsem (with recursion) if we > >> shoot CPU_POST_DEAD in the head. > > > > Here's the first bullet. Stressing my box here with Steve's hotplug > > script seems to work fine. > > > > Tony, any objections? > > what was this comment referring to: > > /* intentionally ignoring frozen here */ > > After you move the cmci_rediscover() call, it is now in a place where we are > no longer ignoring frozen (i.e. the old placement did the rediscover even if > the > CPU_TASKS_FROZEN bit was set - with the new placement we will skip > rediscovery. > > So we were working around some interaction between cpu hotplug and frozen. > Do we no longer need to do that? Hmm, that FROZEN thing is supposedly for hotplug operations while suspend is happening. I guess it makes a little sense to rediscover CMCI banks while suspend is in progress. Whatever. Let's keep it before more crap ensues, that was a good catch, thanks. So, I guess something like that instead. Which means, I'd need to run a couple of suspend/resume rounds while hotplugging cores to see whether we're still kosher. More fun tomorrow. --- From: Borislav Petkov Date: Thu, 22 May 2014 16:40:54 +0200 Subject: [PATCH] x86, MCE: Kill CPU_POST_DEAD In conjunction with cleaning up CPU hotplug, we want to get rid of CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the end of CPU_DEAD. Signed-off-by: Borislav Petkov --- arch/x86/kernel/cpu/mcheck/mce.c | 9 - 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index 68317c80de7f..bfde4871848f 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -2391,6 +2391,10 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) threshold_cpu_callback(action, cpu); mce_device_remove(cpu); mce_intel_hcpu_update(cpu); + + /* intentionally ignoring frozen here */ + if (!(action & CPU_TASKS_FROZEN)) + cmci_rediscover(); break; case CPU_DOWN_PREPARE: smp_call_function_single(cpu, mce_disable_cpu, , 1); @@ -2402,11 +2406,6 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) break; } - if (action == CPU_POST_DEAD) { - /* intentionally ignoring frozen here */ - cmci_rediscover(); - } - return NOTIFY_OK; } -- 1.9.0 -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] x86, MCE: Kill CPU_POST_DEAD
>> So I think we can reduce it to just the one rwsem (with recursion) if we >> shoot CPU_POST_DEAD in the head. > > Here's the first bullet. Stressing my box here with Steve's hotplug > script seems to work fine. > > Tony, any objections? what was this comment referring to: /* intentionally ignoring frozen here */ After you move the cmci_rediscover() call, it is now in a place where we are no longer ignoring frozen (i.e. the old placement did the rediscover even if the CPU_TASKS_FROZEN bit was set - with the new placement we will skip rediscovery. So we were working around some interaction between cpu hotplug and frozen. Do we no longer need to do that? -Tony N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�:+v���zZ+��+zf���h���~i���z��w���?�&�)ߢf��^jǫy�m��@A�a��� 0��h���i
[PATCH] x86, MCE: Kill CPU_POST_DEAD
On Thu, May 22, 2014 at 02:32:51PM +0200, Peter Zijlstra wrote: > So I think we can reduce it to just the one rwsem (with recursion) if we > shoot CPU_POST_DEAD in the head. Here's the first bullet. Stressing my box here with Steve's hotplug script seems to work fine. Tony, any objections? --- From: Borislav Petkov Date: Thu, 22 May 2014 16:40:54 +0200 Subject: [PATCH] x86, MCE: Kill CPU_POST_DEAD In conjunction with cleaning up CPU hotplug, we want to get rid of CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the end of CPU_DEAD. Signed-off-by: Borislav Petkov --- arch/x86/kernel/cpu/mcheck/mce.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index 68317c80de7f..ee35d621815b 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -2391,6 +2391,7 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) threshold_cpu_callback(action, cpu); mce_device_remove(cpu); mce_intel_hcpu_update(cpu); + cmci_rediscover(); break; case CPU_DOWN_PREPARE: smp_call_function_single(cpu, mce_disable_cpu, , 1); @@ -2402,11 +2403,6 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) break; } - if (action == CPU_POST_DEAD) { - /* intentionally ignoring frozen here */ - cmci_rediscover(); - } - return NOTIFY_OK; } -- 1.9.0 -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] x86, MCE: Kill CPU_POST_DEAD
On Thu, May 22, 2014 at 02:32:51PM +0200, Peter Zijlstra wrote: So I think we can reduce it to just the one rwsem (with recursion) if we shoot CPU_POST_DEAD in the head. Here's the first bullet. Stressing my box here with Steve's hotplug script seems to work fine. Tony, any objections? --- From: Borislav Petkov b...@suse.de Date: Thu, 22 May 2014 16:40:54 +0200 Subject: [PATCH] x86, MCE: Kill CPU_POST_DEAD In conjunction with cleaning up CPU hotplug, we want to get rid of CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the end of CPU_DEAD. Signed-off-by: Borislav Petkov b...@suse.de --- arch/x86/kernel/cpu/mcheck/mce.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index 68317c80de7f..ee35d621815b 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -2391,6 +2391,7 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) threshold_cpu_callback(action, cpu); mce_device_remove(cpu); mce_intel_hcpu_update(cpu); + cmci_rediscover(); break; case CPU_DOWN_PREPARE: smp_call_function_single(cpu, mce_disable_cpu, action, 1); @@ -2402,11 +2403,6 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) break; } - if (action == CPU_POST_DEAD) { - /* intentionally ignoring frozen here */ - cmci_rediscover(); - } - return NOTIFY_OK; } -- 1.9.0 -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] x86, MCE: Kill CPU_POST_DEAD
So I think we can reduce it to just the one rwsem (with recursion) if we shoot CPU_POST_DEAD in the head. Here's the first bullet. Stressing my box here with Steve's hotplug script seems to work fine. Tony, any objections? what was this comment referring to: /* intentionally ignoring frozen here */ After you move the cmci_rediscover() call, it is now in a place where we are no longer ignoring frozen (i.e. the old placement did the rediscover even if the CPU_TASKS_FROZEN bit was set - with the new placement we will skip rediscovery. So we were working around some interaction between cpu hotplug and frozen. Do we no longer need to do that? -Tony N�r��yb�X��ǧv�^�){.n�+{zX����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf��^jǫy�m��@A�a��� 0��h���i
Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD
On Thu, May 22, 2014 at 03:50:21PM +, Luck, Tony wrote: So I think we can reduce it to just the one rwsem (with recursion) if we shoot CPU_POST_DEAD in the head. Here's the first bullet. Stressing my box here with Steve's hotplug script seems to work fine. Tony, any objections? what was this comment referring to: /* intentionally ignoring frozen here */ After you move the cmci_rediscover() call, it is now in a place where we are no longer ignoring frozen (i.e. the old placement did the rediscover even if the CPU_TASKS_FROZEN bit was set - with the new placement we will skip rediscovery. So we were working around some interaction between cpu hotplug and frozen. Do we no longer need to do that? Hmm, that FROZEN thing is supposedly for hotplug operations while suspend is happening. I guess it makes a little sense to rediscover CMCI banks while suspend is in progress. Whatever. Let's keep it before more crap ensues, that was a good catch, thanks. So, I guess something like that instead. Which means, I'd need to run a couple of suspend/resume rounds while hotplugging cores to see whether we're still kosher. More fun tomorrow. --- From: Borislav Petkov b...@suse.de Date: Thu, 22 May 2014 16:40:54 +0200 Subject: [PATCH] x86, MCE: Kill CPU_POST_DEAD In conjunction with cleaning up CPU hotplug, we want to get rid of CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the end of CPU_DEAD. Signed-off-by: Borislav Petkov b...@suse.de --- arch/x86/kernel/cpu/mcheck/mce.c | 9 - 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index 68317c80de7f..bfde4871848f 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -2391,6 +2391,10 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) threshold_cpu_callback(action, cpu); mce_device_remove(cpu); mce_intel_hcpu_update(cpu); + + /* intentionally ignoring frozen here */ + if (!(action CPU_TASKS_FROZEN)) + cmci_rediscover(); break; case CPU_DOWN_PREPARE: smp_call_function_single(cpu, mce_disable_cpu, action, 1); @@ -2402,11 +2406,6 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) break; } - if (action == CPU_POST_DEAD) { - /* intentionally ignoring frozen here */ - cmci_rediscover(); - } - return NOTIFY_OK; } -- 1.9.0 -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD
On 05/23/2014 01:25 AM, Borislav Petkov wrote: On Thu, May 22, 2014 at 03:50:21PM +, Luck, Tony wrote: So I think we can reduce it to just the one rwsem (with recursion) if we shoot CPU_POST_DEAD in the head. Here's the first bullet. Stressing my box here with Steve's hotplug script seems to work fine. Tony, any objections? what was this comment referring to: /* intentionally ignoring frozen here */ After you move the cmci_rediscover() call, it is now in a place where we are no longer ignoring frozen (i.e. the old placement did the rediscover even if the CPU_TASKS_FROZEN bit was set - with the new placement we will skip rediscovery. That's not quite true. The existing code already ignores FROZEN for all the cases, by ignoring it at the top of the switch-case itself: switch (action ~CPU_TASKS_FROZEN) { case CPU_ONLINE: [...] break; case CPU_DEAD: if (threshold_cpu_callback) threshold_cpu_callback(action, cpu); mce_device_remove(cpu); mce_intel_hcpu_update(cpu); break; Then I started wondering what the comment really meant, and commit 1a65f970d1 made things clear: its actually the _other_ way around! That is, cmci_rediscover() didn't have to be invoked* during suspend/resume, so it was kept separate from the rest. * or maybe it was not _supposed_ to be invoked; I don't know which is the case.. the original commit 88ccbedd9 didn't explain that. Either way, cmci_rediscover() doesn't seem to have any reason why it should be called specifically in the POST_DEAD stage only. So I'm sure we can get rid of that one way or other easily. Regards, Srivatsa S. Bhat So we were working around some interaction between cpu hotplug and frozen. Do we no longer need to do that? Hmm, that FROZEN thing is supposedly for hotplug operations while suspend is happening. I guess it makes a little sense to rediscover CMCI banks while suspend is in progress. Whatever. Let's keep it before more crap ensues, that was a good catch, thanks. So, I guess something like that instead. Which means, I'd need to run a couple of suspend/resume rounds while hotplugging cores to see whether we're still kosher. More fun tomorrow. --- From: Borislav Petkov b...@suse.de Date: Thu, 22 May 2014 16:40:54 +0200 Subject: [PATCH] x86, MCE: Kill CPU_POST_DEAD In conjunction with cleaning up CPU hotplug, we want to get rid of CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the end of CPU_DEAD. Signed-off-by: Borislav Petkov b...@suse.de --- arch/x86/kernel/cpu/mcheck/mce.c | 9 - 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index 68317c80de7f..bfde4871848f 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -2391,6 +2391,10 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) threshold_cpu_callback(action, cpu); mce_device_remove(cpu); mce_intel_hcpu_update(cpu); + + /* intentionally ignoring frozen here */ + if (!(action CPU_TASKS_FROZEN)) + cmci_rediscover(); break; case CPU_DOWN_PREPARE: smp_call_function_single(cpu, mce_disable_cpu, action, 1); @@ -2402,11 +2406,6 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) break; } - if (action == CPU_POST_DEAD) { - /* intentionally ignoring frozen here */ - cmci_rediscover(); - } - return NOTIFY_OK; } -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD
On Fri, May 23, 2014 at 02:43:31AM +0530, Srivatsa S. Bhat wrote: After you move the cmci_rediscover() call, it is now in a place where we are no longer ignoring frozen (i.e. the old placement did the rediscover even if the CPU_TASKS_FROZEN bit was set - with the new placement we will skip rediscovery. That's not quite true. The existing code already ignores FROZEN for all the cases, by ignoring it at the top of the switch-case itself: No, Tony's right and you got confused: Before my change, the code did: if (action == CPU_POST_DEAD) { /* intentionally ignoring frozen here */ cmci_rediscover(); } which is only CPU_POST_DEAD *without* the CPU_TASKS_FROZEN bit. If I move it in the switch-case, cmci_rediscover() *ignores the FROZEN bit and gets executed for both: CPU_DEAD: CPU_DEAD_FROZEN: because with the FROZEN bit masked out, they're the same. But we don't want to execute it for the FROZEN bit - look for the other two tests for CPU_TASKS_FROZEN in mce.c for an example. So, before we go and change the FROZEN aspect and break things in strange ways, let's keep the _FROZEN ignore. I certainly don't want to go down that road and chase why we needed FROZEN or not. Ok? -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD
On 05/23/2014 03:01 AM, Borislav Petkov wrote: On Fri, May 23, 2014 at 02:43:31AM +0530, Srivatsa S. Bhat wrote: After you move the cmci_rediscover() call, it is now in a place where we are no longer ignoring frozen (i.e. the old placement did the rediscover even if the CPU_TASKS_FROZEN bit was set - with the new placement we will skip rediscovery. That's not quite true. The existing code already ignores FROZEN for all the cases, by ignoring it at the top of the switch-case itself: No, Tony's right and you got confused: Before my change, the code did: if (action == CPU_POST_DEAD) { /* intentionally ignoring frozen here */ cmci_rediscover(); } which is only CPU_POST_DEAD *without* the CPU_TASKS_FROZEN bit. If I move it in the switch-case, cmci_rediscover() *ignores the FROZEN bit and gets executed for both: CPU_DEAD: CPU_DEAD_FROZEN: because with the FROZEN bit masked out, they're the same. But we don't want to execute it for the FROZEN bit - look for the other two tests for CPU_TASKS_FROZEN in mce.c for an example. So, before we go and change the FROZEN aspect and break things in strange ways, let's keep the _FROZEN ignore. I certainly don't want to go down that road and chase why we needed FROZEN or not. Ok? Right, I got confused about who meant what by the term 'ignore' - ignore the FROZEN _bit_ as in execute all the time irrespective of that bit being set or unset, or ignore the FROZEN _case_ as in don't execute during suspend/resume. Anyway, sorry for the confusion! Your latest code looks correct to me. Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, MCE: Kill CPU_POST_DEAD
On 05/23/2014 01:25 AM, Borislav Petkov wrote: On Thu, May 22, 2014 at 03:50:21PM +, Luck, Tony wrote: So I think we can reduce it to just the one rwsem (with recursion) if we shoot CPU_POST_DEAD in the head. Here's the first bullet. Stressing my box here with Steve's hotplug script seems to work fine. Tony, any objections? what was this comment referring to: /* intentionally ignoring frozen here */ After you move the cmci_rediscover() call, it is now in a place where we are no longer ignoring frozen (i.e. the old placement did the rediscover even if the CPU_TASKS_FROZEN bit was set - with the new placement we will skip rediscovery. So we were working around some interaction between cpu hotplug and frozen. Do we no longer need to do that? Hmm, that FROZEN thing is supposedly for hotplug operations while suspend is happening. I guess it makes a little sense to rediscover CMCI banks while suspend is in progress. Whatever. Let's keep it before more crap ensues, that was a good catch, thanks. So, I guess something like that instead. Which means, I'd need to run a couple of suspend/resume rounds while hotplugging cores to see whether we're still kosher. More fun tomorrow. --- From: Borislav Petkov b...@suse.de Date: Thu, 22 May 2014 16:40:54 +0200 Subject: [PATCH] x86, MCE: Kill CPU_POST_DEAD In conjunction with cleaning up CPU hotplug, we want to get rid of CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the end of CPU_DEAD. Signed-off-by: Borislav Petkov b...@suse.de Reviewed-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com Regards, Srivatsa S. Bhat --- arch/x86/kernel/cpu/mcheck/mce.c | 9 - 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index 68317c80de7f..bfde4871848f 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -2391,6 +2391,10 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) threshold_cpu_callback(action, cpu); mce_device_remove(cpu); mce_intel_hcpu_update(cpu); + + /* intentionally ignoring frozen here */ + if (!(action CPU_TASKS_FROZEN)) + cmci_rediscover(); break; case CPU_DOWN_PREPARE: smp_call_function_single(cpu, mce_disable_cpu, action, 1); @@ -2402,11 +2406,6 @@ mce_cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu) break; } - if (action == CPU_POST_DEAD) { - /* intentionally ignoring frozen here */ - cmci_rediscover(); - } - return NOTIFY_OK; } -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/