On Thu, Dec 18, 2014 at 07:49:41PM -0800, Linus Torvalds wrote:

 > And when spinlocks start getting contention, *nested* spinlocks
 > really really hurt. And you've got all the spinlock debugging on etc,
 > don't you?

Yeah, though remember that this seems, for some reason, to have gotten
worse in more recent builds. I've been running kitchen-sink debug kernels
for my trinity runs for the last three years, and it's only in the
last few months that this has become enough of a problem that I'm
not seeing the more interesting bugs. (Or perhaps we're just getting
better at fixing them in -next now, so my runs are lasting longer.)

 > Also, you do have this:
 > 
 >   sched: RT throttling activated
 > 
 > so there's something going on with RT scheduling too.

I see that fairly often. I've never dug into exactly what causes it, but
it seems to be triggerable just by some long-running CPU hogs.

 > So your printouts are finally starting to make sense. But I'm also
 > starting to suspect strongly that the problem is that with all your
 > lock debugging and other overheads (does this still have
 > DEBUG_PAGEALLOC?) you really are getting into a "real" softlockup
 > because things are scaling so horribly badly.
 > 
 > If you now disable spinlock debugging and lockdep, hopefully that page
 > table lock now doesn't always get hung up on the lockdep locking, so
 > it starts scaling much better, and maybe you'd not see this...

I can give it a shot.  Hopefully there's some further mitigation that
could be done to allow a workload like this to survive under a debug
build though, as we've caught *so many* bugs with this stuff in the past.

        Dave

