On Wed, 17 Feb 2021 at 12:51, Valentin Schneider
<valentin.schnei...@arm.com> wrote:
>
> On 15/02/21 16:02, Vincent Guittot wrote:
> > On Fri, 12 Feb 2021 at 20:19, Valentin Schneider
> > <valentin.schnei...@arm.com> wrote:
> >> I don't think there is anything inherently wrong with it - the
> >> nohz_idle_balance() call resulting from the kick_ilb() IPI will just bail
> >> out due to the flags being cleared here. This wasn't immediately clear to
> >> me however.
> >
> > In fact, I forgot to replace the WARN_ON in nohz_csd_func() by a
> > simple return as reported by kernel test robot / oliver.s...@intel.com
> >
>
> Can't that actually be a problem? kick_ilb() says:
>
>  * Access to rq::nohz_csd is serialized by NOHZ_KICK_MASK; he who sets
>  * the first flag owns it; cleared by nohz_csd_func().
>
> So if you have:
>
>   kick_ilb() -> kicks CPU42
>
> And then said CPU42 goes through, before nohz_csd_func(),:
>
>   do_idle() -> nohz_run_idle_balance()
>
> you could have yet another CPU do:
>
>   kick_ilb() -> kicks CPU42
>
> which would break rq->nohz_csd serialization.

Yeah, there are even further problems: I get some rcu_sched log on my
large server with one benchmark and one specific parameter, which I can't
reproduce on my smaller system.

Right now, I'm working on making both exclusive, which should mainly be a
matter of testing whether this_cpu is set in nohz.idle_cpus_mask.

>
> >>
> >> > +}
> >> > +
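
For reference, the ownership rule quoted from kick_ilb() and the race
discussed above can be modelled in userspace. This is only a sketch using
C11 atomics, not kernel code: every model_* name, the flag bit values and
the csd_in_flight counter are hypothetical and exist purely to make the
double-owner case visible.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative flag bits; the values are assumptions for this model. */
#define MODEL_BALANCE_KICK 0x1u
#define MODEL_STATS_KICK   0x2u
#define MODEL_KICK_MASK    (MODEL_BALANCE_KICK | MODEL_STATS_KICK)

/* Hypothetical stand-in for the parts of struct rq that matter here. */
struct model_rq {
	atomic_uint nohz_flags;
	int csd_in_flight;	/* > 1 means two senders own the csd at once */
};

/*
 * Models the kick side: whoever sets the first flag owns the csd.
 * Returns true if the caller became the owner and "sent the IPI".
 */
static bool model_kick(struct model_rq *rq, unsigned int flags)
{
	unsigned int old = atomic_fetch_or(&rq->nohz_flags, flags);

	if (old & MODEL_KICK_MASK)
		return false;	/* a kick is already pending, someone else owns it */

	rq->csd_in_flight++;
	return true;
}

/* Models the idle path clearing the flags before the csd callback has run. */
static void model_idle_path_clear(struct model_rq *rq)
{
	atomic_fetch_and(&rq->nohz_flags, ~MODEL_KICK_MASK);
}

/*
 * Replays the sequence from the thread on a single target rq:
 * kick, optional early clear from the idle path, second kick.
 * Returns how many senders believe they own the csd at the end.
 */
static int model_race(bool idle_path_clears)
{
	struct model_rq rq;

	atomic_init(&rq.nohz_flags, 0);
	rq.csd_in_flight = 0;

	model_kick(&rq, MODEL_KICK_MASK);	/* first CPU kicks the target */
	if (idle_path_clears)
		model_idle_path_clear(&rq);	/* target clears flags early */
	model_kick(&rq, MODEL_KICK_MASK);	/* another CPU kicks the target */

	return rq.csd_in_flight;
}
```

In this model, model_race(false) leaves a single owner because the second
kick sees the pending flags and backs off, while model_race(true) ends with
two owners, which is the broken serialization described above.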