Re: frequent lockups in 3.18rc4

Frederic Weisbecker Fri, 21 Nov 2014 13:51:27 -0800

On Fri, Nov 21, 2014 at 01:34:08PM -0800, Andy Lutomirski wrote:
> On Fri, Nov 21, 2014 at 1:32 PM, Frederic Weisbecker <[email protected]> 
> wrote:
> > On Fri, Nov 21, 2014 at 12:01:51PM -0500, Steven Rostedt wrote:
> >> On Fri, Nov 21, 2014 at 11:25:06AM -0500, Tejun Heo wrote:
> >> >
> >> > * Static percpu areas wouldn't trigger fault lazily.  Note that this
> >> >   is not necessarily because the first percpu chunk which contains the
> >> >   static area is embedded inside the kernel linear mapping.  Depending
> >> >   on the memory layout and boot param, percpu allocator may choose to
> >> >   map the first chunk in vmalloc space too; however, this still works
> >> >   out fine because at that point there are no other page tables and
> >> >   the PUD entries covering the first chunk are faulted in before other
> >> >   page tables are copied from the kernel one.
> >>
> >> That sounds correct.
> >>
> >> >
> >> > * NMI used to be a problem because vmalloc fault handler couldn't
> >> >   safely nest inside NMI handler but this has been fixed since and it
> >> >   should work fine from NMI handlers now.
> >>
> >> Right. Of course "should work fine" does not exactly mean "will work fine".
> >>
> >>
> >> >
> >> > * Function tracers are problematic because they may end up nesting
> >> >   inside themselves through triggering a vmalloc fault while accessing
> >> >   dynamic percpu memory area.  This may lead to recursive locking and
> >> >   other surprises.
> >>
> >> The function tracer infrastructure now has a recursive check that happens
> >> rather early in the call. Unless the registered OPS specifically states
> >> it handles recursions (FTRACE_OPS_FL_RECURSION_SAFE), ftrace will add the
> >> necessary recursion checks. If a registered OPS lies about being recursion
> >> safe, well we can't stop suicide.
> >
> > Same if the recursion state is based on per cpu memory.
> >
> >>
> >> Looking at kernel/trace/trace_functions.c: function_trace_call() which is
> >> registered with RECURSION_SAFE, I see that the recursion check is done
> >> before the per_cpu_ptr() call to the dynamically allocated per_cpu data.
> >>
> >> It looks OK, but...
> >>
> >> Oh! but if we trace the page fault handler, and we fault here too
> >> we just nuked the cr2 register. Not good.
> >
> > If we fault in the page fault handler, we double fault and apparently
> > recovering from that isn't quite expected anyway.
> 
> Not quite.  We only double fault if we fault while pushing the
> hardware part of the state onto the stack.  That happens even before
> the entry asm gets run.
> 
> Otherwise if we have a page fault inside do_page_fault, it's just a
> nested page fault.


Oh ok!

But we still have the cr2 issue that Steve talked about.
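
To make the cr2 concern concrete, here is a rough user-space model of it.
This is only a sketch and not kernel code: fake_cr2, raise_fault(),
tracer_hook() and the two handlers are hypothetical names made up for the
illustration, and I'm not claiming the real entry path is ordered like
either of them. It just shows that if anything traceable runs (and faults)
before the outer handler has read cr2, the outer handler ends up servicing
the wrong address.

```c
/*
 * User-space model of the cr2 clobber. NOT kernel code: every name below
 * is a hypothetical stand-in chosen for illustration.
 */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address of the tracer's dynamic percpu data. */
#define TRACER_PERCPU_ADDR ((uintptr_t)0xdead0000u)

static uintptr_t fake_cr2;      /* models the CPU's cr2 register */
static bool tracer_faults;      /* will the tracer hit a vmalloc fault? */
static uintptr_t handled_addr;  /* address the outer handler serviced */

/* Models the hardware: latch the faulting address, then enter the handler. */
static void raise_fault(uintptr_t addr, void (*handler)(void))
{
    fake_cr2 = addr;
    handler();
}

/* Nested handler for the tracer's own fault: maps the page and returns. */
static void nested_handler(void)
{
}

/* Models a function tracer touching not-yet-mapped dynamic percpu memory. */
static void tracer_hook(void)
{
    if (tracer_faults) {
        tracer_faults = false;  /* fault once; the mapping exists afterwards */
        raise_fault(TRACER_PERCPU_ADDR, nested_handler);
    }
}

/* Problematic ordering: traced code runs before cr2 is read. */
static void fault_handler_read_late(void)
{
    tracer_hook();              /* nested fault overwrites fake_cr2 */
    handled_addr = fake_cr2;    /* reads the tracer's address instead */
}

/* Safer ordering: cr2 is saved before anything traceable runs. */
static void fault_handler_read_early(void)
{
    uintptr_t addr = fake_cr2;  /* snapshot first */
    tracer_hook();              /* the nested fault can no longer hurt us */
    handled_addr = addr;
}

/* Take a fault at addr with the given handler; return what it serviced. */
static uintptr_t simulate(uintptr_t addr, void (*handler)(void))
{
    tracer_faults = true;
    handled_addr = 0;
    raise_fault(addr, handler);
    return handled_addr;
}
```

Calling simulate(0x1000, fault_handler_read_late) returns
TRACER_PERCPU_ADDR, i.e. the original faulting address is lost, while
simulate(0x1000, fault_handler_read_early) returns 0x1000. So in this
model the nested fault itself is harmless as long as cr2 has been saved
before the first traceable call.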

> 
> --Andy
> 
> 
> -- 
> Andy Lutomirski
> AMA Capital Management, LLC