On Thu, Nov 27, 2014 at 11:17:16AM -0800, Linus Torvalds wrote:
> On Wed, Nov 26, 2014 at 2:57 PM, Dave Jones <da...@redhat.com> wrote:
> >
> > So 3.17 also has this problem.
> > Good news I guess in that it's not a regression, but damn I really didn't
> > want to have to go digging through the mists of time to find the last
> > 'good' point.
>
> So I'm looking at the watchdog code, and it seems racy wrt parking and
> startup.
>
> In particular, it sets the high priority *after* starting the hrtimer,
> and it goes back to SCHED_NORMAL *before* canceling the timer.
>
> Which seems completely ass-backwards. And the smp_hotplug_thread stuff
> explicitly enables preemption around the setup/cleanup/park/unpark
> operations.
>
> However, that would be an issue only if trinity might be doing things
> that enable and disable the watchdog. And doing so under insane loads.
> Even then it seems unlikely.
>
> The insane loads you have. But even then, could a load average of 169
> possibly delay running a non-RT process for 22 seconds? Doubtful.
>
> But just in case: do you do cpu hotplug events (that will disable and
> re-enable the watchdog process?). Anything else that will park/unpark
> the hotplug thread?
That's root-only iirc, and I'm not running trinity as root, so that
shouldn't be happening. There's also no sign of such behaviour in dmesg
when the problem occurs.

> Quite frankly, I'm just grasping for straws here, but a lot of the
> watchdog traces really have seemed spurious...

Agreed.

Currently leaving 3.16 running. 21hrs so far.

	Dave