On Fri, Nov 30, 2012 at 10:38 AM, Frederic Weisbecker <fweis...@gmail.com> wrote:
> 2012/11/28 Hakan Akkan <hakanak...@gmail.com>:
>> +static int check_drop_timer_duty(int cpu)
>> +{
>> +	int curr_handler, prev_handler, new_handler;
>> +	int nrepeat = -1;
>> +	bool drop_recheck;
>> +
>> +repeat:
>> +	WARN_ON_ONCE(++nrepeat > 1);
>> +	drop_recheck = false;
>> +	curr_handler = cpu;
>> +	new_handler = TICK_DO_TIMER_NONE;
>> +
>> +#ifdef CONFIG_CPUSETS_NO_HZ
>> +	if (atomic_read(&nr_cpus_user_nohz) > 0) {
>
> Note atomic_read() is not SMP ordered. If another CPU does
> atomic_add() or atomic_add_return(), you may not see the new value as
> expected with atomic_read(). Doing atomic_add_return(0, &atomic_thing)
> returns the fully ordered result.
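Right. To make the trade-off concrete, here is a minimal sketch of the
two kinds of read (illustrative only, not from the patch; take_duty()
is a stand-in for the hand-off logic):

	/* Plain load: no barrier implied, the value may be stale. */
	if (atomic_read(&nr_cpus_user_nohz) > 0)
		take_duty();

	/*
	 * atomic_add_return(0, ...) is a read-modify-write with full
	 * barrier semantics, so it returns the fully ordered value and
	 * also orders us against stores the writer made before its
	 * atomic_inc().
	 */
	if (atomic_add_return(0, &nr_cpus_user_nohz) > 0)
		take_duty();

The second form is exactly the locked RMW overhead I wanted to keep out
of this path.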
That code doesn't necessarily try to achieve perfect ordering. I wasn't
very sure we want the overhead of atomic_add_return(0, ...) in such a
hot path, so this part of the function is racy. Nevertheless, it makes
sure that someone gets the duty:

#ifdef CONFIG_CPUSETS_NO_HZ
	if (atomic_read(&nr_cpus_user_nohz) > 0) {

The CPU that updated nr_cpus_user_nohz will certainly see the new value
and claim the duty if someone else hasn't yet. Another regular CPU will
eventually see the new value and assign itself the duty. So we can
tolerate some race here for the sake of avoiding heavier locks.

> You also need to do that to ensure full ordering against
> tick_cpu_sched.user_nohz.
>
> On the current layout we have:
>
> (Write side)				(Read side)
>
> ts->user_nohz = 1;
> atomic_inc(&nr_cpus_user_nohz)
>					if (atomic_read(&nr_cpus_user_nohz))
>						if (per_cpu(tick_cpu_sched, curr_handler).user_nohz)
>							....
>
> If you want to make sure that you see the expected value on user_nohz
> from the read side, you need to correctly order the write and read
> against nr_cpus_user_nohz.

Again, perfect ordering is not critical here. In the worst case we miss
that CPU's ts->user_nohz = 1 update, and that adaptive-nohz CPU will
keep the duty for a short while until someone else takes it. Trying to
avoid this would require more locking instructions like you suggest
below.

If we really want to fully isolate nohz CPUs and avoid "being stuck
with the jiffies update until someone else takes it away" situations,
we can have a cpumask denoting idle and/or non-adaptive-nohz CPUs and
dump the duty once we discover that we're "stuck". A rough sketch of
this idea follows at the end of this mail.

> For this you can use atomic_add_return() which implies the full barrier:
>
> (Write side)				(Read side)
>
> ts->user_nohz = 1;
> atomic_inc_return(&nr_cpus_user_nohz)
>
>					if (atomic_add_return(0, &nr_cpus_user_nohz))
>						if (per_cpu(tick_cpu_sched, curr_handler).user_nohz)
>							....
>
> I have much more comments on this patch, I will come back on this soon.
>
> Thanks.
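FWIW, the cpumask idea mentioned above could look roughly like this.
This is only a sketch of the shape of it: timer_duty_candidates and
drop_timer_duty_if_stuck() are made-up names, and a real patch would
need to integrate properly with the existing tick_do_timer_cpu
handling:

#include <linux/cpumask.h>
#include <linux/smp.h>

/*
 * Hypothetical mask of CPUs that are acceptable timekeepers,
 * i.e. idle and/or not in adaptive nohz mode. It would be
 * updated elsewhere as CPUs enter and leave adaptive nohz.
 */
static cpumask_t timer_duty_candidates;

static void drop_timer_duty_if_stuck(int cpu)
{
	int target;

	/* Only the current duty holder can hand the duty off. */
	if (tick_do_timer_cpu != cpu)
		return;

	/* Pick any candidate other than ourselves. */
	target = cpumask_any_but(&timer_duty_candidates, cpu);
	if (target >= nr_cpu_ids)
		return;	/* nobody else can take it, keep the duty */

	tick_do_timer_cpu = target;
	/* Kick the new handler so it notices on its next tick. */
	smp_send_reschedule(target);
}

The adaptive-nohz CPU would call something like this from its tick
path whenever it discovers it is still holding the duty.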