On Thu, Jun 06, 2013 at 09:10:06AM -0700, [email protected] wrote:
> From: Ben Greear <[email protected]>
>
> The stop machine logic can lock up if all but one of
> the migration threads make it through the disable-irq
> step and the one remaining thread gets stuck in
> __do_softirq. The reason __do_softirq can hang is
> that it has a bail-out based on jiffies timeout, but
> in the lockup case, jiffies itself is not incremented.
>
> To work around this, re-add the max_restart counter in __do_irq
> and stop processing irqs after 10 restarts.
>
> Thanks to Tejun Heo and Rusty Russell and others for
> helping me track this down.
>
> This was introduced in 3.9 by commit: c10d73671ad30f5469
> (softirq: reduce latencies).
>
> It may be worth looking into ath9k to see if it has issues with
> it's irq handler at a later date.
>
> The hang stack traces look something like this:

Oops, you already posted the second version. :)

>  /*
> - * We restart softirq processing for at most 2 ms,
> - * and if need_resched() is not set.
> + * We restart softirq processing for at most MAX_SOFTIRQ_RESTART times,
> + * but break the loop if need_resched() is set or after 2 ms.
>   *
>   * These limits have been established via experimentation.
>   * The two things to balance is latency against fairness -
> @@ -204,6 +204,7 @@ EXPORT_SYMBOL(local_bh_enable_ip);
>   * should not be able to lock up the box.
>   */
>  #define MAX_SOFTIRQ_TIME msecs_to_jiffies(2)
> +#define MAX_SOFTIRQ_RESTART 10

As I wrote before, a brief explanation on why both are necessary would
be nice. Something like - "the time limit prevents from introducing
excessive latency from softirq handling and the loop limit protects
against softirq runaway which may happen during stop_machine - see
XXX".

Please cc Linus and also cc [email protected]. We definitely want
this backported.

Thanks a lot!

--
tejun
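
For illustration, here is a rough userspace sketch of why both limits matter. It is not the kernel code: struct sim, process_pending(), run_softirq_loop(), the tick-based MAX_SOFTIRQ_TIME_TICKS and the safety_cap parameter are all hypothetical stand-ins, and only the shape of the exit condition (time limit, need_resched, restart counter) follows the quoted patch. It assumes work is always re-raised, and that the clock either advances once per pass or is frozen, as jiffies is in the reported lockup.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for this sketch; not the kernel's definitions. */
#define MAX_SOFTIRQ_TIME_TICKS 2   /* plays the role of msecs_to_jiffies(2) */
#define MAX_SOFTIRQ_RESTART    10

struct sim {
	unsigned long ticks;    /* stand-in for jiffies */
	bool clock_running;     /* false models the case where jiffies never advances */
	bool need_resched;      /* stand-in for need_resched() */
	unsigned long passes;   /* passes made over the pending work */
};

/* One pass over pending work. Assumption: the work is always re-raised. */
static bool process_pending(struct sim *s)
{
	s->passes++;
	if (s->clock_running)
		s->ticks++;     /* the clock only moves if something advances it */
	return true;            /* still pending: models a runaway softirq source */
}

/*
 * Restart loop with the same exit conditions as the quoted patch.
 * safety_cap exists only so the time-limit-only variant terminates
 * in this demo; it has no counterpart in the patch.
 */
static unsigned long run_softirq_loop(struct sim *s, bool use_restart_limit,
				      unsigned long safety_cap)
{
	unsigned long end = s->ticks + MAX_SOFTIRQ_TIME_TICKS;
	int max_restart = MAX_SOFTIRQ_RESTART;

	for (;;) {
		if (!process_pending(s))
			break;
		/* time limit: bounds latency, but only while the clock ticks */
		if ((long)(s->ticks - end) >= 0 || s->need_resched)
			break;
		/* loop limit: bounds the number of passes even with a frozen clock */
		if (use_restart_limit && --max_restart == 0)
			break;
		if (s->passes >= safety_cap)
			break;
	}
	return s->passes;
}
```

With the clock frozen, the time check alone never fires and the loop only stops at the artificial safety_cap, while the restart counter ends it after MAX_SOFTIRQ_RESTART passes. With the clock running, the time limit ends the loop first.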

