On Tue, Oct 15, 2013 at 03:58:06PM -0400, Steven Rostedt wrote:
> The WARN_ON_ONCE() code is to trigger a waring only once when some
> condition happens. But due to the way it is written it is racy.
>
> 	if (unlikely(condition)) {
> 		if (WARN(!__warned))
> 			__warned = true;
> 	}
>
> The problem is that multiple CPUs could hit the same warning and
> produce multiple output dumps of the same warning, or an interrupt could
> happen and hit the same warning and do the warning in the middle of a
> previous one, especially since the WARN() does a dump of the current
> stack.
>
> Even more of a problem, a recent WARN_ON_ONCE() that was in the page
> fault handler triggered and the stack dump of the WARN() caused the
> same WARN_ON_ONCE() get hit again. Since the __warned = true is not
> updated until after the WARN() is completed, each WARN() triggered
> another page fault causing the stack to be filled and crashed the box.
>
> The point of WARN_ON() is to warn the user and not to crash the box.
>
> The easy fix is to update the __warned variable with a xchg(). This way
> only one WARN_ON_ONCE() will actually happen, and prevents any issues
> of the WARN() causing the same WARN() to be hit and crash the system.

How about just updating __warned without a cmpxchg? It's not that
critical if the update is not immediately visible to other CPUs. OTOH,
it is critical that it is immediately visible to the current CPU.

I mean, some warnings can be hard to reproduce and may keep hitting some
users over several kernel releases. If such a warning is repetitive, the
xchg might impact performance.

I may be overly paranoid, but I think barrier() alone (so that at least
we don't recurse locally) would be better.
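
A minimal userspace sketch of the ordering suggested above: set the flag first, then a compiler barrier, then warn. Everything here is an assumption for illustration and not kernel code. barrier() is redefined with GCC/Clang inline asm as a compiler-only barrier, and fake_warn(), check() and count_dumps() are hypothetical names. fake_warn() stands in for a WARN() whose stack dump hits the same check again, with the recursion capped at 5 so the unfixed ordering terminates. The sketch is single-threaded, so it only shows the local recursion case and says nothing about visibility on other CPUs.

```c
#include <stdbool.h>

/* Userspace stand-in for the kernel's barrier(): compiler-only, GCC/Clang. */
#define barrier() __asm__ __volatile__("" ::: "memory")

static bool warned;      /* plays the role of __warned */
static bool flag_first;  /* true: set the flag before warning */
static int dumps;        /* number of times the warning body ran */

static void check(bool condition);

/* Hypothetical WARN() stand-in whose "stack dump" re-enters the check. */
static void fake_warn(void)
{
	dumps++;
	if (dumps < 5)		/* cap the recursion for the demo */
		check(true);
}

static void check(bool condition)
{
	if (!condition || warned)
		return;

	if (flag_first) {
		warned = true;	/* plain store, no xchg */
		barrier();	/* compiler must emit the store before the call */
		fake_warn();
	} else {
		fake_warn();	/* original ordering: flag set too late */
		warned = true;
	}
}

/* Resets the state, hits the check twice, and returns the dump count. */
int count_dumps(bool set_flag_first)
{
	warned = false;
	dumps = 0;
	flag_first = set_flag_first;
	check(true);
	check(true);
	return dumps;
}
```

With the flag set first, the nested check sees warned already set and returns, so count_dumps(true) yields a single dump. With the original ordering, count_dumps(false) recurses until the cap is reached.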