On Tue, 19 Jun 2018, Andy Lutomirski wrote:   

> On Jun 19, 2018, at 9:15 AM, Siarhei Liakh <siarhei.li...@concurrent-rt.com> wrote:
> 
> > On Mon, 18 Jun 2018, Andy Lutomirski wrote:
> > 
> > > > On Thu, Jun 14, 2018 at 10:10 PM Siarhei Liakh
> > > > <siarhei.li...@concurrent-rt.com> wrote:
> > > > >
> > > > > fpu__drop() has an explicit fwait which under some conditions can
> > > > > trigger a fixable FPU exception while in kernel. Thus, we should
> > > > > attempt to fixup the exception first, and only call notify_die() if
> > > > > the fixup failed just like in do_general_protection(). The original
> > > > > call sequence incorrectly triggers KDB entry on debug kernels under
> > > > > particular FPU-intensive workloads. This issue had been privately
> > > > > observed, fixed, and tested on 4.9.98, while this patch brings the
> > > > > fix to the upstream.
> > > > 
> > > > Reviewed-by: Andy Lutomirski <l...@kernel.org>
> > > > 
> > > > With the caveat that you are perpetuating what is arguably a bug in
> > > > some of the other entries: math_error() can now be called with IRQs
> > > > off and return with IRQs on.  If we actually start asserting good
> > > > behavior in the entry code, we'll need to fix this.
> > > 
> > > Confused. math_error() is still invoked with interrupts off. What's
> > > different now is that notify_die() is called with interrupts conditionally
> > > enabled while upstream it's always called with interrupts disabled.
> > 
> > I see that notify_die() is being called either way in upstream (e.g.,
> > do_general_protection() and do_iret_error() vs. do_bounds(), etc.).
> > Is there some sort of general policy/guide documentation available
> > which outlines the expectations of notify_die(), as well as its notifiers?
> 
> I doubt it.
> 
> The right fix is to delete notify_die(), not to document it. Kernel debuggers
> should hook die() directly, and other users (if any) should be moved into the
> error handlers.

Got it. Unfortunately, this looks like a whole separate code refactoring project
which I cannot undertake at this time. In the meantime, this patch offers a fix
for an immediate issue (KDB tripped when it shouldn't) even if it does nothing
to address the deficiencies in the framework itself.
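
For reference, the ordering the patch aims for in the kernel-mode path can be
sketched roughly as below. This is a user-space model, not the kernel code:
struct trap_ctx, enum trap_outcome and the model_* helpers are hypothetical
names made up for illustration. Only the ordering (try the fixup first, consult
the notifier chain only if the fixup fails) is taken from the patch description.

```c
#include <stdbool.h>

/* Hypothetical outcome codes for this model; not kernel identifiers. */
enum trap_outcome {
	TRAP_FIXED_UP,	/* the exception fixup handled the fault */
	TRAP_NOTIFIED,	/* a die notifier (e.g. a debugger) claimed it */
	TRAP_DIED	/* nothing handled it; the kernel would die() */
};

/* Hypothetical stand-in for the state of one in-kernel FPU trap. */
struct trap_ctx {
	bool has_fixup;		/* assumption: a fixup entry exists */
	bool notifier_claims;	/* assumption: a notifier wants the trap */
	int notify_calls;	/* how often the notifier chain was run */
};

/* Models the fixup attempt: succeeds only when a fixup entry exists. */
static bool model_fixup_exception(const struct trap_ctx *ctx)
{
	return ctx->has_fixup;
}

/* Models the notifier chain; counts calls so the ordering can be checked. */
static bool model_notify_die(struct trap_ctx *ctx)
{
	ctx->notify_calls++;
	return ctx->notifier_claims;
}

/* Kernel-mode path only: fixup first, notifiers only if the fixup failed. */
enum trap_outcome model_kernel_math_error(struct trap_ctx *ctx)
{
	if (model_fixup_exception(ctx))
		return TRAP_FIXED_UP;
	if (model_notify_die(ctx))
		return TRAP_NOTIFIED;
	return TRAP_DIED;
}
```

With has_fixup set, model_notify_die() is never reached, which corresponds to
KDB no longer being entered for a fixable fwait exception.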

Thank you.
