On Wed, 18 Jun 2014, Paul E. McKenney wrote:

> > > > - both RCU stall detector and 'echo l > sysrq-trigger' can (and we've 
> > > >   seen it happening for real) cause a complete, undebuggable, silent 
> > > > hang 
> > > >   of machine (deadlock in NMI context)
> > > 
> > > I could easily add an option to RCU to allow people to tell it not to
> > > use NMIs to dump the stack.  Would that help?
> > 
> > Well, that would make unfortunately the information provided by RCU stall 
> > detector rather useless ... workqueue-based stack dumping is very unlikely 
> > to point its finger to the real offender, as it'd be coming way too late.
> 
> I would not use workqueues, but rather have the CPU detecting the
> stall grovel through the other CPUs' stacks, which is what I do now for
> architectures that don't support NMI-based stack dumps.  Would that be
> a reasonable approach?

That would indeed solve lockups induced by RCU stall detector (and we 
should convert sysrq stack dumping code to use the same mechanism 
afterwards).

But then, the kernel is still polluted by quite a few instances of

        WARN_ON(in_nmi())

        BUG_IN(in_nmi())

        if (in_nmi())
                printk(....)

which need to be fixed separately afterwards anyway.

Thanks,

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to