On Tue, 19 Jun 2018, Andy Lutomirski wrote:
> On Jun 19, 2018, at 9:15 AM, Siarhei Liakh <[email protected]> wrote:
> > > On Mon, 18 Jun 2018, Andy Lutomirski wrote:
> > > > On Thu, Jun 14, 2018 at 10:10 PM Siarhei Liakh <[email protected]> wrote:
> > > > >
> > > > > fpu__drop() has an explicit fwait which under some conditions can
> > > > > trigger a fixable FPU exception while in kernel. Thus, we should
> > > > > attempt to fixup the exception first, and only call notify_die() if
> > > > > the fixup failed just like in do_general_protection(). The original
> > > > > call sequence incorrectly triggers KDB entry on debug kernels under
> > > > > particular FPU-intensive workloads. This issue had been privately
> > > > > observed, fixed, and tested on 4.9.98, while this patch brings the
> > > > > fix to the upstream.
> > > >
> > > > Reviewed-by: Andy Lutomirski <[email protected]>
> > > >
> > > > With the caveat that you are perpetuating what is arguably a bug in
> > > > some of the other entries: math_error() can now be called with IRQs
> > > > off and return with IRQs on. If we actually start asserting good
> > > > behavior in the entry code, we'll need to fix this.
> > >
> > > Confused. math_error() is still invoked with interrupts off. What's
> > > different now is that notify_die() is called with interrupts
> > > conditionally enabled while upstream it's always called with
> > > interrupts disabled.
> >
> > I see that notify_die() is being called either way in upstream (ex:
> > do_general_protection() and do_iret_error() vs do_bounds() and etc.).
> > Is there some some sort of general policy/guide documentation available
> > which outlines the expectations of notify_die(), as well as its
> > notifiers?
>
> I doubt it.
>
> The right fix is to delete notify_die(), not to document it. kernel
> debuggers should hook die() directly, and other users (if any) should be
> moved into the error handlers.

Got it. Unfortunately, that looks like a separate refactoring project which
I cannot undertake at this time. In the meantime, this patch fixes the
immediate issue (KDB being tripped when it shouldn't be), even though it
does nothing to address the deficiencies in the framework itself.

Thank you.

