Re: [PATCH v2] x86/mce: Distirbute the clear operation of mces_seen to Per-CPU rather than only monarch CPU
On Fri, May 23, 2014 at 4:57 AM, Chen Yucong wrote: > If (mca_cfg.tolerant == 2 || mce_cfg.tolerant == 3), what can you do for > it? Maybe we need to look again at the effects of "tolerant" - and maybe specify what happens at various levels, There are some obvious silly bits of code (picking one that is my fault): if (cfg->tolerant < 3) { if (no_way_out) mce_panic("Fatal machine check on current CPU", &m, msg); if (worst == MCE_AR_SEVERITY) { /* schedule action before return to userland */ mce_save_info(m.addr, m.mcgstatus & MCG_STATUS_RIPV); set_thread_flag(TIF_MCE_NOTIFY); } else if (kill_it) { force_sig(SIGBUS, current); } } Why is the MCE_AR_SEVERITY recovery code not even attempted if tolerant is >=3? That block of code dates back to before there were any recoverable cases ... so the insane option of just ignoring the error and hoping that the end result wasn't too bad made some sort of sense when compared against a machine crash and not getting any answer at all. Or one that Andi pointed out years ago (and had a fix in a tree for): if (order == 1) { /* CHECKME: Can this race with a parallel hotplug? */ int cpus = num_online_cpus(); /* * Monarch: Wait for everyone to go through their scanning * loops. */ while (atomic_read(&mce_executing) <= cpus) { What if some cpus were offline when this machine check arrived? Our "offline" code doesn't do anything to the h/w to prevent those cpus from joining in the machine check fun. So we'll see more than num_online_cpus() processors arrive to process the machine check. Andi's fix was in the start of do_machine_check() and just had each cpu that showed up check whether it was listed as "online" by Linux. If not, it just cleared MCG_STATUS and returned. I didn't apply it because I thought we needed to be a bit more robust (what if the offline cpu actually did have a problem? ... we should at least check that MCG_STATUS.RIPV=1 before rashly returning ... perhaps even more tests are needed if the cpu had never been online at all). So I'm happy that you are taking an interest in machine check code. I think there are places where it can be made a lot better. I don't think that moving where mces_seen gets cleared is one of those places. -Tony -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] x86/mce: Distirbute the clear operation of mces_seen to Per-CPU rather than only monarch CPU
On Thu, May 22, 2014 at 6:32 PM, Chen Yucong wrote: > As Naoya Horiguchi says, this patch also have a small benefit that it > can reduce the processing time of monarch CPU. This is indeed a benefit - but I'm not super worried about performance of machine check handler. >/* > * Now clear the mces_seen of current CPU -*final - so that it > does not > * reappear on the next mce. > */ >memset(final, 0, sizeof(struct mce)); >mce_wrmsrl(MSR_IA32_MCG_STATUS, 0); But if the monarch hasn't managed to clean mces_seen, then it certainly hasn't cleared MCG_STATUS ... so there can't be a "next" mce that would see these old values. Any extra MCE will result in system reset. So we are not arguing that your patch is wrong - it just doesn't seem to be any better that what we have now (except for an unimportant small performance improvement). -Tony -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] x86/mce: Distirbute the clear operation of mces_seen to Per-CPU rather than only monarch CPU
On Fri, 2014-05-23 at 11:10 +0200, Borislav Petkov wrote: > On Fri, May 23, 2014 at 09:32:19AM +0800, Chen Yucong wrote: > > ...if we reach a timeout, there is very little > > chance for recovering. Thought. the probability for this situation to > > happen is very slight, it's not impossible. Indeed, it's hard to know > > the precise causes for timeout. OK, we can exclude the timeout. Why can not we distribute the clear operations of mces_seen to Per-CPU? Why must monarch need to help all other CPUs to clean mces_seen? What's the advantage for this? Why do we have to discard the property of Per-CPU variable? Why can not we reduce the processing time of monarch CPU? ... > > Ok, enough talking, let's close that hole and get on with our lives: You can safely ignore all messages about this talking. > > There is very little and maybe practically nothing we can do to recover > from a system where at least one core has reached a timeout during the > whole monarch cores gathering. So panic when that happens. > Why do you prefer to use "very little" and "maybe practically"? Do you still not sure about that? > if ((s64)*t < SPINUNIT) { > - /* CHECKME: Make panic default for 1 too? */ > - if (mca_cfg.tolerant < 1) > + if (mca_cfg.tolerant <= 1) If (mca_cfg.tolerant == 2 || mce_cfg.tolerant == 3), what can you do for it? > mce_panic("Timeout synchronizing machine check over > CPUs", > NULL, NULL); > cpu_missing = 1; > -- > 1.9.0 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] x86/mce: Distirbute the clear operation of mces_seen to Per-CPU rather than only monarch CPU
On Fri, May 23, 2014 at 09:32:19AM +0800, Chen Yucong wrote: > ...if we reach a timeout, there is very little > chance for recovering. Thought. the probability for this situation to > happen is very slight, it's not impossible. Indeed, it's hard to know > the precise causes for timeout. Ok, enough talking, let's close that hole and get on with our lives: --- From: Borislav Petkov Date: Fri, 23 May 2014 11:06:35 +0200 Subject: [PATCH] mce: Panic when a core has reached a timeout There is very little and maybe practically nothing we can do to recover from a system where at least one core has reached a timeout during the whole monarch cores gathering. So panic when that happens. Signed-off-by: Borislav Petkov --- arch/x86/kernel/cpu/mcheck/mce.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index bfde4871848f..529ccc488f5a 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -704,8 +704,7 @@ static int mce_timed_out(u64 *t) if (!mca_cfg.monarch_timeout) goto out; if ((s64)*t < SPINUNIT) { - /* CHECKME: Make panic default for 1 too? */ - if (mca_cfg.tolerant < 1) + if (mca_cfg.tolerant <= 1) mce_panic("Timeout synchronizing machine check over CPUs", NULL, NULL); cpu_missing = 1; -- 1.9.0 -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] x86/mce: Distirbute the clear operation of mces_seen to Per-CPU rather than only monarch CPU
On Wed, 2014-05-21 at 21:09 +, Luck, Tony wrote: > Please do give us more detail on the scenario that you see that would > make your new version behave better. > > I'm sure the current code has no races w.r.t. clearing mces_seen. The > monarch clears them all in mce_reign() before clearing mce_executing > at the foot of mce_end() and allowing the others to run again. > Right. There are not races for cleaning mces_seen. But, if a timeout occurs in monarch, mces_seen will be not cleaned. It will affect all other CPUs. As Borislav Petkov says, if we reach a timeout, there is very little chance for recovering. Thought. the probability for this situation to happen is very slight, it's not impossible. Indeed, it's hard to know the precise causes for timeout. As Naoya Horiguchi says, this patch also have a small benefit that it can reduce the processing time of monarch CPU. > Your code has the monarch release all the other cpus from the spinloop > in mce_end() so they will all rush together through the final lines of > do_machine_check(). No. My code just distribute cleaning operation to Per-CPU. And all other CPUs still have to wait for clearing mce_executing by monarch. In fact, mces_seen is just used for system panics as quickly as possible if there is a truly data corrupting error. So there is not advantage for cleaning mces_seen in the monarch. > Some of them will have work to do if they saw > errors - they may have to send signals, or log the error. Others can > fly directly to the end of do_machine_check() and clear MCG_STATUS > and return to executing whatever code was interrupted. > > So it is possible that some processors will be out doing things that can > generate another machine check, before others have finished their > tasks and got to the point to clear mces_seen.(*) > > -Tony > > (*) maybe that doesn't matter because they haven't zeroed MCG_STATUS > yet - so this second machine check will force those cpus to shutdown. See MCIP > description in "15.3.1.2 IA32_MCG_STATUS_MSR" section of software > developer manual. Right. This I also know. This is the reason why you can find the following snippet in my code: /* * Now clear the mces_seen of current CPU -*final - so that it does not * reappear on the next mce. */ memset(final, 0, sizeof(struct mce)); mce_wrmsrl(MSR_IA32_MCG_STATUS, 0); --- Thanks very much for your reply. cyc -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH v2] x86/mce: Distirbute the clear operation of mces_seen to Per-CPU rather than only monarch CPU
>> mce_regin, which is only called by monarch CPU, can be used for system >> panics as quickly as possible if there is a truly data corrupting error. >> But Monarch CPU don't have to help all other CPU to clean mces_clean. >> One advantage of Per-CPU is the isolation of errors propagation, being >> so, why do not we clean mces_seen by Per-CPU? > > What kind of error propagations are you expecting/concerning here? > Could you explain the problem more in detail? Please do give us more detail on the scenario that you see that would make your new version behave better. I'm sure the current code has no races w.r.t. clearing mces_seen. The monarch clears them all in mce_reign() before clearing mce_executing at the foot of mce_end() and allowing the others to run again. Your code has the monarch release all the other cpus from the spinloop in mce_end() so they will all rush together through the final lines of do_machine_check(). Some of them will have work to do if they saw errors - they may have to send signals, or log the error. Others can fly directly to the end of do_machine_check() and clear MCG_STATUS and return to executing whatever code was interrupted. So it is possible that some processors will be out doing things that can generate another machine check, before others have finished their tasks and got to the point to clear mces_seen.(*) -Tony (*) maybe that doesn't matter because they haven't zeroed MCG_STATUS yet - so this second machine check will force those cpus to shutdown. See MCIP description in "15.3.1.2 IA32_MCG_STATUS_MSR" section of software developer manual.
Re: [PATCH v2] x86/mce: Distirbute the clear operation of mces_seen to Per-CPU rather than only monarch CPU
(2014/05/21 12:19), Chen Yucong wrote: > On Wed, 2014-05-21 at 11:43 +0900, Hidetoshi Seto wrote: >> (2014/05/21 11:03), Chen Yucong wrote: >>> On Wed, 2014-05-21 at 10:40 +0900, Hidetoshi Seto wrote: (2014/05/20 11:11), Chen Yucong wrote: > mces_seen is a Per-CPU variable which should only be accessed by Per-CPU > as possible. So the > clear operation of mces_seen should also be lcoal to Per-CPU rather than > monarch CPU. I don't think it should be local. Originally what we want to have here is memory to save mces_seen for each online cpus, such as a global array like mces_seen[cpus]. But at same time we don't want to preallocate big array enough for max possible cpus. So we use per-cpu store instead. >>> But mces_seen will just be updated by Per-CPU rather than monarch CPU. >>> It is only read by monarch CPU. >> >> Because mce status registers are per-cpu and monarch cannot access subjects' >> registers >> directly, > Right. This is one reason why we need to distribute the clear operation > to Per-CPU. And in fact it exactly assigns per-cpu property to > mces_seen. > >> all subjects read it's status for monarch, store the status to memory for >> monarch, >> and then monarch gather all status to make decision for all. > > mce_regin, which is only called by monarch CPU, can be used for system > panics as quickly as possible if there is a truly data corrupting error. > But Monarch CPU don't have to help all other CPU to clean mces_clean. > One advantage of Per-CPU is the isolation of errors propagation, being > so, why do not we clean mces_seen by Per-CPU? What kind of error propagations are you expecting/concerning here? Could you explain the problem more in detail? Thanks, H.Seto -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] x86/mce: Distirbute the clear operation of mces_seen to Per-CPU rather than only monarch CPU
On Wed, 2014-05-21 at 11:43 +0900, Hidetoshi Seto wrote: > (2014/05/21 11:03), Chen Yucong wrote: > > On Wed, 2014-05-21 at 10:40 +0900, Hidetoshi Seto wrote: > >> (2014/05/20 11:11), Chen Yucong wrote: > >>> mces_seen is a Per-CPU variable which should only be accessed by Per-CPU > >>> as possible. So the > >>> clear operation of mces_seen should also be lcoal to Per-CPU rather than > >>> monarch CPU. > >> > >> I don't think it should be local. > >> Originally what we want to have here is memory to save mces_seen for each > >> online cpus, > >> such as a global array like mces_seen[cpus]. But at same time we don't > >> want to preallocate > >> big array enough for max possible cpus. So we use per-cpu store instead. > >> > > But mces_seen will just be updated by Per-CPU rather than monarch CPU. > > It is only read by monarch CPU. > > Because mce status registers are per-cpu and monarch cannot access subjects' > registers > directly, Right. This is one reason why we need to distribute the clear operation to Per-CPU. And in fact it exactly assigns per-cpu property to mces_seen. > all subjects read it's status for monarch, store the status to memory for > monarch, > and then monarch gather all status to make decision for all. mce_regin, which is only called by monarch CPU, can be used for system panics as quickly as possible if there is a truly data corrupting error. But Monarch CPU don't have to help all other CPU to clean mces_clean. One advantage of Per-CPU is the isolation of errors propagation, being so, why do not we clean mces_seen by Per-CPU? thx! cyc > > At last monarch kindly clear gathered status for all. > It will be one of important steps to ready for next mce events. > > I think you should clarify why "distributing the clear operation" is required > here. > What is the benefit? > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] x86/mce: Distirbute the clear operation of mces_seen to Per-CPU rather than only monarch CPU
(2014/05/21 11:03), Chen Yucong wrote: > On Wed, 2014-05-21 at 10:40 +0900, Hidetoshi Seto wrote: >> (2014/05/20 11:11), Chen Yucong wrote: >>> mces_seen is a Per-CPU variable which should only be accessed by Per-CPU as >>> possible. So the >>> clear operation of mces_seen should also be lcoal to Per-CPU rather than >>> monarch CPU. >> >> I don't think it should be local. >> Originally what we want to have here is memory to save mces_seen for each >> online cpus, >> such as a global array like mces_seen[cpus]. But at same time we don't want >> to preallocate >> big array enough for max possible cpus. So we use per-cpu store instead. >> > But mces_seen will just be updated by Per-CPU rather than monarch CPU. > It is only read by monarch CPU. Because mce status registers are per-cpu and monarch cannot access subjects' registers directly, all subjects read it's status for monarch, store the status to memory for monarch, and then monarch gather all status to make decision for all. At last monarch kindly clear gathered status for all. It will be one of important steps to ready for next mce events. I think you should clarify why "distributing the clear operation" is required here. What is the benefit? Thanks, H.Seto -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] x86/mce: Distirbute the clear operation of mces_seen to Per-CPU rather than only monarch CPU
On Wed, 2014-05-21 at 10:40 +0900, Hidetoshi Seto wrote: > (2014/05/20 11:11), Chen Yucong wrote: > > mces_seen is a Per-CPU variable which should only be accessed by Per-CPU as > > possible. So the > > clear operation of mces_seen should also be lcoal to Per-CPU rather than > > monarch CPU. > > I don't think it should be local. > Originally what we want to have here is memory to save mces_seen for each > online cpus, > such as a global array like mces_seen[cpus]. But at same time we don't want > to preallocate > big array enough for max possible cpus. So we use per-cpu store instead. > But mces_seen will just be updated by Per-CPU rather than monarch CPU. It is only read by monarch CPU. > > > > Meanwhile, there is also a potential risk that mces_seen will not be be > > cleared if a timeout > > occors in mce_end for monarch CPU. As a reuslt, the stale value of > > mces_seen will reappear > > on the next mce. > > In case if we decide panic in mce path, uncleared mces_seen can be referred > from memory dump. > I suppose it will help trouble investigation. > > > > > Based on the above reasons, this patch distirbute the clear operation of > > mces_seen to Per-CPU > > rather than only monarch CPU. > > "local" "occurs" "result" "distribute" > I recommend you to use spell-checker for your postings... Thanks very much for your suggestion, I will use a spell-cheker for my carelessness. thx! cyc -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] x86/mce: Distirbute the clear operation of mces_seen to Per-CPU rather than only monarch CPU
(2014/05/20 11:11), Chen Yucong wrote: > mces_seen is a Per-CPU variable which should only be accessed by Per-CPU as > possible. So the > clear operation of mces_seen should also be lcoal to Per-CPU rather than > monarch CPU. I don't think it should be local. Originally what we want to have here is memory to save mces_seen for each online cpus, such as a global array like mces_seen[cpus]. But at same time we don't want to preallocate big array enough for max possible cpus. So we use per-cpu store instead. > > Meanwhile, there is also a potential risk that mces_seen will not be be > cleared if a timeout > occors in mce_end for monarch CPU. As a reuslt, the stale value of mces_seen > will reappear > on the next mce. In case if we decide panic in mce path, uncleared mces_seen can be referred from memory dump. I suppose it will help trouble investigation. > > Based on the above reasons, this patch distirbute the clear operation of > mces_seen to Per-CPU > rather than only monarch CPU. "local" "occurs" "result" "distribute" I recommend you to use spell-checker for your postings... Thanks, H.Seto -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] x86/mce: Distirbute the clear operation of mces_seen to Per-CPU rather than only monarch CPU
On Tue, 2014-05-20 at 19:33 +0200, Borislav Petkov wrote: > On Tue, May 20, 2014 at 10:11:25AM +0800, Chen Yucong wrote: > > mces_seen is a Per-CPU variable which should only be accessed by > > Per-CPU as possible. So the clear operation of mces_seen should also > > be lcoal to Per-CPU rather than monarch CPU. > > > > Meanwhile, there is also a potential risk that mces_seen will not > > be be cleared if a timeout occors in mce_end for monarch CPU. As a > > reuslt, the stale value of mces_seen will reappear on the next mce. > > I don't know how many times I have to tell you this already: if we reach > the timeout, we have a much bigger friggin' problem! > > What you could do instead is make the machine panic in the tolerant==1, > i.e., the default case, in mce_timed_out(). > > Basically, in the case any core is stuck and we reach a timeout, we want > to panic the whole box immediately. There's a very little chance we can > recover so panic is the only sane thing left to do. > > Ok? Do you have any strong reasons to claim that a timeout is raised by any core which is stuck? In other word, what is the probability that an timeout is caused by stuck cores? thx! cyc -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] x86/mce: Distirbute the clear operation of mces_seen to Per-CPU rather than only monarch CPU
On Tue, 2014-05-20 at 19:33 +0200, Borislav Petkov wrote: > On Tue, May 20, 2014 at 10:11:25AM +0800, Chen Yucong wrote: > > mces_seen is a Per-CPU variable which should only be accessed by > > Per-CPU as possible. So the clear operation of mces_seen should also > > be lcoal to Per-CPU rather than monarch CPU. > > > > Meanwhile, there is also a potential risk that mces_seen will not > > be be cleared if a timeout occors in mce_end for monarch CPU. As a > > reuslt, the stale value of mces_seen will reappear on the next mce. > > I don't know how many times I have to tell you this already: if we reach > the timeout, we have a much bigger friggin' problem! Even if we do not take into account timeout, we should distribute the clear operation of mces_seen to Per-CPU rather then monarch CPU. mce_regin, which is only called by monarch CPU, can be used for system panics as quickly as possible if there is a truly data corrupting error. But Monarch CPU don't have to help all other CPU to clean mces_clean. One advantage of Per-CPU is the isolation of errors propagation, being so, why do not we clean mces_seen by Per-CPU? You say, "you need to do the cleaning in mce_reign because the monarch cpu has to run last after all other cpus have scanned their mce banks." But this is not an adequate explanation. thx! cyc > > What you could do instead is make the machine panic in the tolerant==1, > i.e., the default case, in mce_timed_out(). > > Basically, in the case any core is stuck and we reach a timeout, we want > to panic the whole box immediately. There's a very little chance we can > recover so panic is the only sane thing left to do. > > Ok? > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2] x86/mce: Distirbute the clear operation of mces_seen to Per-CPU rather than only monarch CPU
On Tue, May 20, 2014 at 10:11:25AM +0800, Chen Yucong wrote: > mces_seen is a Per-CPU variable which should only be accessed by > Per-CPU as possible. So the clear operation of mces_seen should also > be lcoal to Per-CPU rather than monarch CPU. > > Meanwhile, there is also a potential risk that mces_seen will not > be be cleared if a timeout occors in mce_end for monarch CPU. As a > reuslt, the stale value of mces_seen will reappear on the next mce. I don't know how many times I have to tell you this already: if we reach the timeout, we have a much bigger friggin' problem! What you could do instead is make the machine panic in the tolerant==1, i.e., the default case, in mce_timed_out(). Basically, in the case any core is stuck and we reach a timeout, we want to panic the whole box immediately. There's a very little chance we can recover so panic is the only sane thing left to do. Ok? -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/