On Thu, Sep 04, 2014 at 11:05:02PM +0200, Catalin Iacob wrote:
> On Thu, Sep 4, 2014 at 10:17 PM, Frederic Weisbecker <[email protected]> wrote:
> > Yeah, that's expected. You need to apply the nine patches on top of -rc1:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > nohz/fixes
> >
> > "nohz: Restore NMI safe local irq work for local nohz kick" only fixes
> > part of the issue.
>
> Ok, but if the whole series is needed, isn't it better if it all goes
> into 3.17? Otherwise 3.17 is a clear regression for some users; it's
> definitely for me since before 3.17-rc1 I never saw this bug and now I
> see it every time I do something CPU intensive. Maybe the regression
> is acceptable because the it's confined to some CONFIG_NO_HZ_*
> combination (I think) which is still rather experimental, that's your
> call to make, but it's still a regression.

Yeah, the bug has been there for a while, but likely something got merged
in the last -rc1 that made it more likely to trigger. This is probably
because we converted the remote nohz kick to use irq work instead of the
scheduler IPI. So it fires more often and, if we are unlucky enough, some
tick sees the irq work before the irq work IPI can fire. Or some code
enqueues that irq work from the tick itself.

Anyway, you're right that it belongs to the category of regressions.
Unfortunately the fix is invasive. Also I don't know many users of nohz
full, so probably this won't have much impact. Or this could be a good way
to know who uses this feature after all :o)

I'm not sure what I should do. Let's see what the final fix will look
like; Peter is proposing some simplifications. Then we'll know better.

BTW, do you run some specific workloads to trigger this?

Thanks.

