On 12/16/15, Jeff Merkey <linux....@gmail.com> wrote:
> Setting the (trap flag | resume flag) inside of an nmi handler results
> in a hard lockup while setting the resume flag works fine.
>
> The watchdog detector fails to detect the lockup.  I am currently
> examining the trap gate and interrupt gate setup on Linux and if
> anyone has any ideas it would be nice to be able to debug and step
> through the nmi handlers.  I got breakpoints to work.  I noticed
> kgdb/kdb just punts here and refuses to allow someone to step inside
> an nmi handler.
>
> There is no reason Linux should not allow this to work since windows
> does and every other OS out there.  I have seen this across some rex64
> sysret calls as well this lockup behavior.
>
> Anyone who is an intel expert with any clues would love some input if
> you know about this problem.
>
> Jeff
>
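For anyone following along, the two flags in question are single bits in the saved EFLAGS/RFLAGS image that the handler returns with. Below is a minimal sketch of the two cases being compared. The bit positions are architectural (TF is bit 8, RF is bit 16); struct saved_frame and the frame_* helpers are hypothetical names made up for illustration and are not the kernel's struct pt_regs or any kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural EFLAGS bits: TF is bit 8, RF is bit 16. */
#define EFLAGS_TF (UINT64_C(1) << 8)   /* trap flag: debug trap after the next instruction */
#define EFLAGS_RF (UINT64_C(1) << 16)  /* resume flag: suppress instruction breakpoint once */

static_assert(EFLAGS_TF == 0x100, "TF is bit 8");
static_assert(EFLAGS_RF == 0x10000, "RF is bit 16");

/* Hypothetical stand-in for the saved register frame (assumption). */
struct saved_frame {
    uint64_t ip;
    uint64_t flags;
};

/* Resume past a breakpoint without stepping: the case reported to work. */
static inline void frame_resume(struct saved_frame *f)
{
    f->flags |= EFLAGS_RF;
    f->flags &= ~EFLAGS_TF;
}

/* Resume and single-step: the case reported to lock up in an NMI handler. */
static inline void frame_single_step(struct saved_frame *f)
{
    f->flags |= EFLAGS_TF | EFLAGS_RF;
}

/* Clear TF so the interrupted context does not take an unexpected debug trap. */
static inline void frame_clear_step(struct saved_frame *f)
{
    f->flags &= ~EFLAGS_TF;
}
```

A debugger would call something like frame_single_step() on the saved frame before returning from the handler and frame_clear_step() once the resulting debug trap has been taken. If TF is still set when the frame returns to a context that does not expect it, that context takes the trap instead.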
This bug has been located.  It results from returning from an NMI with
the trap flag set to a userspace address, as Andy suspected, but it is
not due to the RSP value being different as he suggested.  This is a
separate bug from the rex64 sysret bug.

The lockup is tied to the code in the NMI handler that switches IDT
entries if an NMI fires while on a debug stack.  Ironic, since the code
claims it is switching stacks to enable debugging of NMI handlers and
does the opposite: it breaks them.  Commenting out this code gets rid of
the hard lockup.  The userspace process that gets the trap flag and
doesn't expect it just hangs (but only that process; the rest of the
system keeps running).  So there are still a few bugs to run down.  NMI
handlers can now be debugged, kind of.

This bug is closed and I will issue a patch for it.  It's a condition
where the trap flag is set inside an NMI handler that exits to a
userspace address.  The code for setting and clearing the trap flag in
the kernel all worked correctly for the userspace path, except that it
put the process to sleep when it shouldn't have.  It's not a condition
that can happen during normal operation unless you set the trap flag
from a debugger inside an NMI handler, try to debug it, and then exit
the handler into userspace, so I think the probability of this showing
up outside a debugging session is low.

I verified that kgdb/kdb also experiences this bug (if I comment out the
code blocking folks from debugging NMI handlers with kgdb/kdb).

Jeff