On Wed, May 21, 2014 at 4:51 PM, Borislav Petkov <b...@alien8.de> wrote:
> On Thu, May 22, 2014 at 08:30:33AM +0900, Linus Torvalds wrote:
>> If the OS then decides to take down the whole machine, the OS - not
>> the hardware - can choose to do something that will punch through
>> other CPU's NMI blocking (notably, init/reset), but the hardware doing
>> this on its own is just broken if true.
>
> Not that it is any consolation but MCE is not broadcast on AMD.
>
> Regardless, exceptions like MCE cannot be held pending and do pierce the
> NMI handler on both.
>
> Now, if the NMI handler experiences a non-broadcast MCE on the same CPU,
> while running, we're simply going to panic as we're in kernel space
> anyway.
>
> The only problem is if the NMI handler gets interrupted while running
> on a bystander CPU. And I think we could deal with this because the
> bystander would not see an MCE and will return safely. We just need
> to make sure that it returns back to the said NMI handler and not to
> userspace. Unless I'm missing something ...

Under my "always RET unless returning from IST to weird CS or to
specific known-invalid-stack regions" proposal this should work fine.

In the current code it'll also work fine *unless* it hits really early
in the NMI, in which case a second NMI can kill us.

--Andy