On 12/10, Srivatsa S. Bhat wrote:
>
> On 12/10/2012 01:52 AM, Oleg Nesterov wrote:
> > On 12/10, Srivatsa S. Bhat wrote:
> >>
> >> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
> >>
> >>> But yes, it is easy to blame somebody else's code ;) And I can't suggest
> >>> something better at least right now. If I understand correctly, we can not
> >>> use, say, synchronize_sched() in _cpu_down() path
> >>
> >> We can't sleep in that code.. so that's a no-go.
> >
> > But we can?
> >
> > Note that I meant _cpu_down(), not get_online_cpus_atomic() or 
> > take_cpu_down().
> >
>
> Maybe I'm missing something, but how would it help if we did a
> synchronize_sched() so early (in _cpu_down())? Another bunch of 
> preempt_disable()
> sections could start immediately after our call to synchronize_sched() no?
> How would we deal with that?

Sorry for the confusion. Of course synchronize_sched() alone is not enough.
But we can use it to synchronize with the preempt-disabled sections and
avoid the barriers/atomics in the fast path.

For example,

        bool writer_pending;
        DEFINE_RWLOCK(writer_rwlock);
        DEFINE_PER_CPU(int, reader_ctr);

        void get_online_cpus_atomic(void)
        {
                preempt_disable();

                /*
                 * Fast path: no writer pending, or this CPU already holds
                 * a reference (nested call). No barriers/atomics needed.
                 */
                if (likely(!writer_pending) || __this_cpu_read(reader_ctr)) {
                        __this_cpu_inc(reader_ctr);
                        return;
                }

                /* Slow path: synchronize with the pending writer */
                read_lock(&writer_rwlock);
                __this_cpu_inc(reader_ctr);
                read_unlock(&writer_rwlock);
        }

        // lacks release semantics, but we don't care
        void put_online_cpus_atomic(void)
        {
                __this_cpu_dec(reader_ctr);
                preempt_enable();
        }
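
The reader side elsewhere then looks as usual. Say (send_reschedule_to_any_cpu()
is made up, but any preempt_disable()-based hotplug user in your series would
look the same):

        void send_reschedule_to_any_cpu(void)
        {
                int cpu;

                get_online_cpus_atomic();
                /* the chosen cpu can't go offline until we drop the ref */
                cpu = cpumask_any(cpu_online_mask);
                smp_send_reschedule(cpu);
                put_online_cpus_atomic();
        }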

Now, _cpu_down() does

        writer_pending = true;
        synchronize_sched();

before stop_one_cpu(). When synchronize_sched() returns, we know that
every subsequent get_online_cpus_atomic() must see writer_pending == T.
And if any CPU has already incremented its reader_ctr, we must see that
it is not zero.
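
IOW, the writer side in _cpu_down() would be roughly (only a sketch: the
rest of _cpu_down() and the error handling are omitted, and I assume the
stop_one_cpu(take_cpu_down) call from your series, with its real argument
dropped):

        static int _cpu_down(unsigned int cpu, int tasks_frozen)
        {
                int err;

                writer_pending = true;
                /*
                 * After this, every new get_online_cpus_atomic() must see
                 * writer_pending == T, and any reader which did not see it
                 * has already incremented its reader_ctr, which
                 * take_cpu_down() below will see.
                 */
                synchronize_sched();

                err = stop_one_cpu(cpu, take_cpu_down, NULL /* real arg omitted */);

                writer_pending = false;
                return err;
        }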

take_cpu_down() does

        write_lock(&writer_rwlock);

        for_each_online_cpu(cpu) {
                while (per_cpu(reader_ctr, cpu))
                        cpu_relax();
        }

and takes the lock.

However, this can lead to the deadlock we already discussed. So
take_cpu_down() should instead do

 retry:
        write_lock(&writer_rwlock);

        for_each_online_cpu(cpu) {
                if (per_cpu(reader_ctr, cpu)) {
                        write_unlock(&writer_rwlock);
                        goto retry;
                }
        }

to take the lock. But this is livelockable. However, I do not think
the livelock can be avoided anyway.

Just in case, the code above is only for illustration; perhaps it is not
100% correct and perhaps we can do better. cpu_hotplug.active_writer is
ignored for simplicity; get/put should also check current == active_writer.

Oleg.
