Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-11 Thread Srivatsa S. Bhat
On 12/11/2012 07:37 PM, Tejun Heo wrote:
> Hello,
> 
> On Tue, Dec 11, 2012 at 07:32:13PM +0530, Srivatsa S. Bhat wrote:
>> On 12/11/2012 07:17 PM, Tejun Heo wrote:
>>> Hello, Srivatsa.
>>>
>>> On Tue, Dec 11, 2012 at 06:43:54PM +0530, Srivatsa S. Bhat wrote:
>>>> This approach (of using synchronize_sched()) also looks good. It is simple,
>>>> yet effective, but unfortunately inefficient at the writer side (because
>>>> he'll have to wait for a full synchronize_sched()).
>>>
>>> While synchronize_sched() is heavier on the writer side than the
>>> originally posted version, it doesn't stall the whole machine and
>>> wouldn't introduce latencies to others.  Shouldn't that be enough?
>>>
>>
>> Short answer: Yes. But we can do better, with almost comparable code
>> complexity. So I'm tempted to try that out.
>>
>> Long answer:
>> Even in the synchronize_sched() approach, we still have to identify the
>> readers who need to be converted to use the new get/put_online_cpus_atomic()
>> APIs and convert them. Then, if we can come up with a scheme such that
>> the writer has to wait only for those readers to complete, then why not?
>>
>> If such a scheme ends up becoming too complicated, then I agree, we
>> can use synchronize_sched() itself. (That's what I meant by saying that
>> we'll use this as a fallback).
>>
>> But even in this scheme which uses synchronize_sched(), we are
>> already half-way through (we already use 2 types of sync schemes -
>> counters and rwlocks). Just a little more logic can get rid of the
>> unnecessary full-wait too.. So why not give it a shot?
> 
> It's not really about the code complexity but about making the reader side
> as light as possible.  Please keep in mind that the reader side is still
> *way* hotter than the writer side.  Before, the writer side was heavy to
> the extent that it caused noticeable disruptions on the whole system, and
> I think that's what we're trying to hunt down here.  If we can shave off
> memory barriers from the reader side by using synchronize_sched() on the
> writer side, that is the *better* result, not worse.
> 

That's a fair point. Hmmm...

Regards,
Srivatsa S. Bhat



Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-11 Thread Tejun Heo
Hello,

On Tue, Dec 11, 2012 at 07:32:13PM +0530, Srivatsa S. Bhat wrote:
> On 12/11/2012 07:17 PM, Tejun Heo wrote:
> > Hello, Srivatsa.
> > 
> > On Tue, Dec 11, 2012 at 06:43:54PM +0530, Srivatsa S. Bhat wrote:
> >> This approach (of using synchronize_sched()) also looks good. It is simple,
> >> yet effective, but unfortunately inefficient at the writer side (because
> >> he'll have to wait for a full synchronize_sched()).
> > 
> > While synchronize_sched() is heavier on the writer side than the
> > originally posted version, it doesn't stall the whole machine and
> > wouldn't introduce latencies to others.  Shouldn't that be enough?
> > 
> 
> Short answer: Yes. But we can do better, with almost comparable code
> complexity. So I'm tempted to try that out.
> 
> Long answer:
> Even in the synchronize_sched() approach, we still have to identify the
> readers who need to be converted to use the new get/put_online_cpus_atomic()
> APIs and convert them. Then, if we can come up with a scheme such that
> the writer has to wait only for those readers to complete, then why not?
> 
> If such a scheme ends up becoming too complicated, then I agree, we
> can use synchronize_sched() itself. (That's what I meant by saying that
> we'll use this as a fallback).
> 
> But even in this scheme which uses synchronize_sched(), we are
> already half-way through (we already use 2 types of sync schemes -
> counters and rwlocks). Just a little more logic can get rid of the
> unnecessary full-wait too.. So why not give it a shot?

It's not really about the code complexity but about making the reader side
as light as possible.  Please keep in mind that the reader side is still
*way* hotter than the writer side.  Before, the writer side was heavy to
the extent that it caused noticeable disruptions on the whole system, and
I think that's what we're trying to hunt down here.  If we can shave off
memory barriers from the reader side by using synchronize_sched() on the
writer side, that is the *better* result, not worse.
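
For concreteness, a sketch of the kind of reader fast-path being argued for
(illustration only; writer_pending and reader_ctr are from Oleg's example
elsewhere in the thread, and the slow-path helper is a made-up name):

        void get_online_cpus_atomic(void)
        {
                preempt_disable();

                /*
                 * Plain per-cpu increment: no LOCK prefix, no memory
                 * barrier.  The writer's synchronize_sched() is what
                 * guarantees it will observe this section.
                 */
                if (likely(!writer_pending)) {
                        __this_cpu_inc(reader_ctr);
                        return;
                }

                hotplug_reader_slowpath();      /* hypothetical slow-path */
        }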

Thanks.

-- 
tejun


Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-11 Thread Srivatsa S. Bhat
On 12/11/2012 07:17 PM, Tejun Heo wrote:
> Hello, Srivatsa.
> 
> On Tue, Dec 11, 2012 at 06:43:54PM +0530, Srivatsa S. Bhat wrote:
>> This approach (of using synchronize_sched()) also looks good. It is simple,
>> yet effective, but unfortunately inefficient at the writer side (because
>> he'll have to wait for a full synchronize_sched()).
> 
> While synchronize_sched() is heavier on the writer side than the
> originally posted version, it doesn't stall the whole machine and
> wouldn't introduce latencies to others.  Shouldn't that be enough?
> 

Short answer: Yes. But we can do better, with almost comparable code
complexity. So I'm tempted to try that out.

Long answer:
Even in the synchronize_sched() approach, we still have to identify the
readers who need to be converted to use the new get/put_online_cpus_atomic()
APIs and convert them. Then, if we can come up with a scheme such that
the writer has to wait only for those readers to complete, then why not?

If such a scheme ends up becoming too complicated, then I agree, we
can use synchronize_sched() itself. (That's what I meant by saying that
we'll use this as a fallback).

But even in this scheme which uses synchronize_sched(), we are
already half-way through (we already use 2 types of sync schemes -
counters and rwlocks). Just a little more logic can get rid of the
unnecessary full-wait too.. So why not give it a shot?
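
A hypothetical sketch of the direction (illustration only; it deliberately
glosses over the reader/writer ordering that the rest of this thread is
about): the converted readers announce themselves in a per-cpu counter, and
the writer spins only on the counters of CPUs that actually have readers in
flight, instead of waiting out a full grace period:

        static DEFINE_PER_CPU(int, hotplug_reader_refcnt);     /* hypothetical */

        void get_online_cpus_atomic(void)
        {
                preempt_disable();
                __this_cpu_inc(hotplug_reader_refcnt);
                /* ordering vs. a pending writer elided here */
        }

        void put_online_cpus_atomic(void)
        {
                __this_cpu_dec(hotplug_reader_refcnt);
                preempt_enable();
        }

        /* writer: wait only for the readers that actually exist */
        static void wait_for_atomic_hotplug_readers(void)
        {
                unsigned int cpu;

                for_each_online_cpu(cpu)
                        while (per_cpu(hotplug_reader_refcnt, cpu))
                                cpu_relax();
        }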

Regards,
Srivatsa S. Bhat



Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-11 Thread Tejun Heo
Hello, Srivatsa.

On Tue, Dec 11, 2012 at 06:43:54PM +0530, Srivatsa S. Bhat wrote:
> This approach (of using synchronize_sched()) also looks good. It is simple,
> yet effective, but unfortunately inefficient at the writer side (because
> he'll have to wait for a full synchronize_sched()).

While synchronize_sched() is heavier on the writer side than the
originally posted version, it doesn't stall the whole machine and
wouldn't introduce latencies to others.  Shouldn't that be enough?

Thanks.

-- 
tejun


Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-11 Thread Srivatsa S. Bhat
On 12/10/2012 10:54 PM, Oleg Nesterov wrote:
> On 12/10, Srivatsa S. Bhat wrote:
>>
>> On 12/10/2012 01:52 AM, Oleg Nesterov wrote:
>>> On 12/10, Srivatsa S. Bhat wrote:

>>>> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
>>>>
>>>>> But yes, it is easy to blame somebody else's code ;) And I can't suggest
>>>>> something better at least right now. If I understand correctly, we can not
>>>>> use, say, synchronize_sched() in _cpu_down() path
>>>>
>>>> We can't sleep in that code.. so that's a no-go.
>>>
>>> But we can?
>>>
>>> Note that I meant _cpu_down(), not get_online_cpus_atomic() or 
>>> take_cpu_down().
>>>
>>
>> Maybe I'm missing something, but how would it help if we did a
>> synchronize_sched() so early (in _cpu_down())? Another bunch of 
>> preempt_disable()
>> sections could start immediately after our call to synchronize_sched() no?
>> How would we deal with that?
> 
> Sorry for confusion. Of course synchronize_sched() alone is not enough.
> But we can use it to synchronize with preempt-disabled section and avoid
> the barriers/atomic in the fast-path.
> 
> For example,
> 
>   bool writer_pending;
>   DEFINE_RWLOCK(writer_rwlock);
>   DEFINE_PER_CPU(int, reader_ctr);
> 
>   void get_online_cpus_atomic(void)
>   {
>           preempt_disable();
> 
>           if (likely(!writer_pending) || __this_cpu_read(reader_ctr)) {
>                   __this_cpu_inc(reader_ctr);
>                   return;
>           }
> 
>           read_lock(&writer_rwlock);
>           __this_cpu_inc(reader_ctr);
>           read_unlock(&writer_rwlock);
>   }
> 
>   // lacks release semantics, but we don't care
>   void put_online_cpus_atomic(void)
>   {
>           __this_cpu_dec(reader_ctr);
>           preempt_enable();
>   }
> 
> Now, _cpu_down() does
> 
>   writer_pending = true;
>   synchronize_sched();
> 
> before stop_one_cpu(). When synchronize_sched() returns, we know that
> every get_online_cpus_atomic() must see writer_pending == T. And, if
> any CPU incremented its reader_ctr we must see it is not zero.
> 
> take_cpu_down() does
> 
>   write_lock(&writer_rwlock);
> 
>   for_each_online_cpu(cpu) {
>           while (per_cpu(reader_ctr, cpu))
>                   cpu_relax();
>   }
> 
> and takes the lock.
> 
> However. This can lead to the deadlock we already discussed. So
> take_cpu_down() should do
> 
>  retry:
>   write_lock(&writer_rwlock);
> 
>   for_each_online_cpu(cpu) {
>           if (per_cpu(reader_ctr, cpu)) {
>                   write_unlock(&writer_rwlock);
>                   goto retry;
>           }
>   }
> 
> to take the lock. But this is livelockable. However, I do not think it
> is possible to avoid the livelock.
> 
> Just in case, the code above is only for illustration, perhaps it is not
> 100% correct and perhaps we can do it better. cpu_hotplug.active_writer
> is ignored for simplicity, get/put should check current == active_writer.
> 

This approach (of using synchronize_sched()) also looks good. It is simple,
yet effective, but unfortunately inefficient at the writer side (because
he'll have to wait for a full synchronize_sched()).

So I would say that we should keep this approach as a fallback, if we don't
come up with any better synchronization scheme (in terms of efficiency) at
a comparable level of simplicity/complexity.

I have come up with a v4 that simplifies several aspects of the synchronization
and makes it a lot simpler than v3. (Fewer race windows to take care of,
implicit ACKs, no atomic ops, etc.)

I'll post it soon. Let me know what you think...

Regards,
Srivatsa S. Bhat



Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-11 Thread Srivatsa S. Bhat
On 12/10/2012 10:58 PM, Oleg Nesterov wrote:
> On 12/10, Srivatsa S. Bhat wrote:
>>
>> On 12/10/2012 02:43 AM, Oleg Nesterov wrote:
>>> Damn, sorry for noise. I missed this part...
>>>
>>> On 12/10, Srivatsa S. Bhat wrote:

>>>> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
>>>>> the latency. And I guess something like kick_all_cpus_sync() is "too
>>>>> heavy".
>>>>
>>>> I hadn't considered that. Thinking of it, I don't think it would help us..
>>>> It won't get rid of the currently running preempt_disable() sections no?
>>>
>>> Sure. But (again, this is only my feeling so far) given that 
>>> get_online_cpus_atomic()
>>> does cli/sti,
>>
>> Ah, that one! Actually, the only reason I do that cli/sti is because, 
>> potentially
>> interrupt handlers can be hotplug readers too. So we need to protect the 
>> portion
>> of the code of get_online_cpus_atomic() which is not re-entrant.
> 
> Yes, I understand.
> 
>>> this can help to implement ensure-the-readers-must-see-the-pending-writer.
>>> IOW this might help to implement sync-with-readers.
>>>
>>
>> 2 problems:
>>
>> 1. It won't help with cases like this:
>>
>>    preempt_disable()
>>       ...
>>       preempt_disable()
>>       ...
>>       <--- Here
>>       ...
>>       preempt_enable()
>>       ...
>>    preempt_enable()
> 
> No, I meant that kick_all_cpus_sync() can be used to synchronize with
> cli/sti in get_online_cpus_atomic(), just like synchronize_sched() does
> in the code I posted a minute ago.
> 

Ah, OK.

>> 2. Part of the reason we want to get rid of stop_machine() is to avoid the
>> latency it induces on _all_ CPUs just to take *one* CPU offline. If we use
>> kick_all_cpus_sync(), we get into that territory again : we unfairly 
>> interrupt
>> every CPU, _even when_ that CPU's existing preempt_disabled() sections might
>> not actually be hotplug readers! (ie., not bothered about CPU Hotplug).
> 
> I agree, that is why I said it is "too heavy".
> 

Got it :)

Regards,
Srivatsa S. Bhat



Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-11 Thread Srivatsa S. Bhat
On 12/10/2012 11:45 PM, Oleg Nesterov wrote:
> On 12/10, Srivatsa S. Bhat wrote:
>>
>> On 12/10/2012 02:27 AM, Oleg Nesterov wrote:
>>> However. If this is true, then compared to preempt_disable/stop_machine
>>> livelock is possible. Probably this is fine, we have the same problem with
>>> get_online_cpus(). But if we can accept this fact I feel we can simplify
>>> this somehow... Can't prove, only feel ;)
>>
>> Not sure I follow..
> 
> I meant that write_lock_irqsave(&hotplug_rwlock) in take_cpu_down()
> can spin "forever".
> 
> Suppose that reader_acked() == T on every CPU, so that
> get_online_cpus_atomic() always takes read_lock(&hotplug_rwlock).
> 
> It is possible that this lock will never be released by readers,
> 
>   CPU_0                             CPU_1
> 
>   get_online_cpus_atomic()
>                                     get_online_cpus_atomic()
>   put_online_cpus_atomic()
> 
>   get_online_cpus_atomic()
>                                     put_online_cpus_atomic()
> 
>                                     get_online_cpus_atomic()
>   put_online_cpus_atomic()
> 
> and so on.
> 

Right, and we can't do anything about it :(

Regards,
Srivatsa S. Bhat



Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-10 Thread Oleg Nesterov
On 12/10, Srivatsa S. Bhat wrote:
>
> On 12/10/2012 02:27 AM, Oleg Nesterov wrote:
> > On 12/07, Srivatsa S. Bhat wrote:
> >>
> >> 4. No deadlock possibilities
> >>
> >>Per-cpu locking is not the way to go if we want to have relaxed rules
> >>for lock-ordering. Because, we can end up in circular-locking 
> >> dependencies
> >>as explained in https://lkml.org/lkml/2012/12/6/290
> >
> > OK, but this assumes that, contrary to what Steven said, read-write-read
> > deadlock is not possible when it comes to rwlock_t.
>
> What I meant is, with a single (global) rwlock, you can't deadlock like that.

Ah. I greatly misunderstood Steven's email,

http://marc.info/?l=linux-pm&m=135482212307876

Somehow I didn't notice he described the deadlock with _two_ rwlocks; I
wrongly thought that his point was that read_lock() is not recursive (like
down_read).

> Let me know if my assumptions are incorrect!

No, sorry, I misunderstood Steven.


> > However. If this is true, then compared to preempt_disable/stop_machine
> > livelock is possible. Probably this is fine, we have the same problem with
> > get_online_cpus(). But if we can accept this fact I feel we can simplify
> > this somehow... Can't prove, only feel ;)
>
> Not sure I follow..

I meant that write_lock_irqsave(&hotplug_rwlock) in take_cpu_down()
can spin "forever".

Suppose that reader_acked() == T on every CPU, so that
get_online_cpus_atomic() always takes read_lock(&hotplug_rwlock).

It is possible that this lock will never be released by readers,

        CPU_0                           CPU_1

        get_online_cpus_atomic()
                                        get_online_cpus_atomic()
        put_online_cpus_atomic()

        get_online_cpus_atomic()
                                        put_online_cpus_atomic()

                                        get_online_cpus_atomic()
        put_online_cpus_atomic()

and so on.


> Reader-side:
>    -> read_lock() your per-cpu rwlock and proceed.
> 
> Writer-side:
>    -> for_each_online_cpu(cpu)
>           write_lock(per-cpu rwlock of 'cpu');

Yes, yes, this is clear.

> Also, like Tejun said, one of the important measures for per-cpu rwlocks
> should be that, if a user replaces global rwlocks with percpu rwlocks (for
> performance reasons), he shouldn't suddenly end up in numerous deadlock
> possibilities which never existed before. The replacement should continue to
> remain safe, and perhaps improve the performance.

Sure, I agree.

Oleg.



Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-10 Thread Oleg Nesterov
On 12/10, Srivatsa S. Bhat wrote:
>
> On 12/10/2012 02:43 AM, Oleg Nesterov wrote:
> > Damn, sorry for noise. I missed this part...
> >
> > On 12/10, Srivatsa S. Bhat wrote:
> >>
> >> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
> >>> the latency. And I guess something like kick_all_cpus_sync() is "too 
> >>> heavy".
> >>
> >> I hadn't considered that. Thinking of it, I don't think it would help us..
> >> It won't get rid of the currently running preempt_disable() sections no?
> >
> > Sure. But (again, this is only my feeling so far) given that 
> > get_online_cpus_atomic()
> > does cli/sti,
>
> Ah, that one! Actually, the only reason I do that cli/sti is because, 
> potentially
> interrupt handlers can be hotplug readers too. So we need to protect the 
> portion
> of the code of get_online_cpus_atomic() which is not re-entrant.

Yes, I understand.

> > this can help to implement ensure-the-readers-must-see-the-pending-writer.
> > IOW this might help to implement sync-with-readers.
> >
>
> 2 problems:
>
> 1. It won't help with cases like this:
>
> >    preempt_disable()
> >       ...
> >       preempt_disable()
> >       ...
> >       <--- Here
> >       ...
> >       preempt_enable()
> >       ...
> >    preempt_enable()

No, I meant that kick_all_cpus_sync() can be used to synchronize with
cli/sti in get_online_cpus_atomic(), just like synchronize_sched() does
in the code I posted a minute ago.

> 2. Part of the reason we want to get rid of stop_machine() is to avoid the
> latency it induces on _all_ CPUs just to take *one* CPU offline. If we use
> kick_all_cpus_sync(), we get into that territory again : we unfairly interrupt
> every CPU, _even when_ that CPU's existing preempt_disabled() sections might
> not actually be hotplug readers! (ie., not bothered about CPU Hotplug).

I agree, that is why I said it is "too heavy".
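
A minimal sketch of that, reusing writer_pending from the code I posted
(illustration only):

        writer_pending = true;
        smp_mb();               /* publish the flag before kicking */

        /*
         * Unlike synchronize_sched(), this IPIs every CPU and waits
         * for the handlers to run; each CPU takes the IPI only with
         * interrupts enabled, i.e. outside the cli/sti section in
         * get_online_cpus_atomic().
         */
        kick_all_cpus_sync();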

Oleg.



Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-10 Thread Oleg Nesterov
On 12/10, Srivatsa S. Bhat wrote:
>
> On 12/10/2012 01:52 AM, Oleg Nesterov wrote:
> > On 12/10, Srivatsa S. Bhat wrote:
> >>
> >> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
> >>
> >>> But yes, it is easy to blame somebody else's code ;) And I can't suggest
> >>> something better at least right now. If I understand correctly, we can not
> >>> use, say, synchronize_sched() in _cpu_down() path
> >>
> >> We can't sleep in that code.. so that's a no-go.
> >
> > But we can?
> >
> > Note that I meant _cpu_down(), not get_online_cpus_atomic() or 
> > take_cpu_down().
> >
>
> Maybe I'm missing something, but how would it help if we did a
> synchronize_sched() so early (in _cpu_down())? Another bunch of 
> preempt_disable()
> sections could start immediately after our call to synchronize_sched() no?
> How would we deal with that?

Sorry for confusion. Of course synchronize_sched() alone is not enough.
But we can use it to synchronize with preempt-disabled section and avoid
the barriers/atomic in the fast-path.

For example,

        bool writer_pending;
        DEFINE_RWLOCK(writer_rwlock);
        DEFINE_PER_CPU(int, reader_ctr);

        void get_online_cpus_atomic(void)
        {
                preempt_disable();

                if (likely(!writer_pending) || __this_cpu_read(reader_ctr)) {
                        __this_cpu_inc(reader_ctr);
                        return;
                }

                read_lock(&writer_rwlock);
                __this_cpu_inc(reader_ctr);
                read_unlock(&writer_rwlock);
        }

        // lacks release semantics, but we don't care
        void put_online_cpus_atomic(void)
        {
                __this_cpu_dec(reader_ctr);
                preempt_enable();
        }

Now, _cpu_down() does

        writer_pending = true;
        synchronize_sched();

before stop_one_cpu(). When synchronize_sched() returns, we know that
every get_online_cpus_atomic() must see writer_pending == T. And, if
any CPU incremented its reader_ctr we must see it is not zero.

take_cpu_down() does

        write_lock(&writer_rwlock);

        for_each_online_cpu(cpu) {
                while (per_cpu(reader_ctr, cpu))
                        cpu_relax();
        }

and takes the lock.

However. This can lead to the deadlock we already discussed. So
take_cpu_down() should do

 retry:
        write_lock(&writer_rwlock);

        for_each_online_cpu(cpu) {
                if (per_cpu(reader_ctr, cpu)) {
                        write_unlock(&writer_rwlock);
                        goto retry;
                }
        }

to take the lock. But this is livelockable. However, I do not think it
is possible to avoid the livelock.

Just in case, the code above is only for illustration, perhaps it is not
100% correct and perhaps we can do it better. cpu_hotplug.active_writer
is ignored for simplicity, get/put should check current == active_writer.
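
For symmetry, the writer-side exit path (also only an illustration, and also
ignoring cpu_hotplug.active_writer) could be:

        static void hotplug_writer_done(void)
        {
                write_unlock(&writer_rwlock);   /* taken in take_cpu_down() */
                writer_pending = false;         /* readers go back to the fast-path */
        }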

Oleg.



Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-09 Thread Srivatsa S. Bhat
On 12/10/2012 02:27 AM, Oleg Nesterov wrote:
> On 12/07, Srivatsa S. Bhat wrote:
>>
>> 4. No deadlock possibilities
>>
>>Per-cpu locking is not the way to go if we want to have relaxed rules
>>for lock-ordering. Because, we can end up in circular-locking dependencies
>>as explained in https://lkml.org/lkml/2012/12/6/290
> 
> OK, but this assumes that, contrary to what Steven said, read-write-read
> deadlock is not possible when it comes to rwlock_t.

What I meant is, with a single (global) rwlock, you can't deadlock like that.
But if you used per-cpu rwlocks and didn't implement them properly, then you
could end up in circular locking dependencies, as shown above.

That is, if you take the same example and replace the lock with global
rwlock, you won't deadlock:


Readers:

     CPU 0                              CPU 1
     ------                             ------

1.   spin_lock(&random_lock);           read_lock(&my_rwlock);


2.   read_lock(&my_rwlock);             spin_lock(&random_lock);


Writer:

     CPU 2:
     ------

     write_lock(&my_rwlock);


Even if the writer does a write_lock() in-between steps 1 and 2 at the
reader-side, nothing will happen. The writer would spin, because CPU 1 already
got the rwlock for read, and hence both CPU 0 and 1 go ahead. When they finish,
the writer will get the lock and proceed. So, no deadlocks here.

So, what I was pointing out here was, if somebody replaced this global
rwlock with a "straight-forward" implementation of per-cpu rwlocks, he'll
immediately end up in circular locking dependency deadlock between the 3
entities as explained in the link above.

Let me know if my assumptions are incorrect!

> So far I think this
> is true and we can't deadlock. Steven?
> 
> However. If this is true, then compared to preempt_disable/stop_machine
> livelock is possible. Probably this is fine, we have the same problem with
> get_online_cpus(). But if we can accept this fact I feel we can simplify
> this somehow... Can't prove, only feel ;)
> 

Not sure I follow..

Anyway, my point is that, we _can't_ implement per-cpu rwlocks like lglocks
and expect it to work in this case. IOW, we can't do:

Reader-side:
   -> read_lock() your per-cpu rwlock and proceed.

Writer-side:
   -> for_each_online_cpu(cpu)
  write_lock(per-cpu rwlock of 'cpu');
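
Spelled out, the lglock-style scheme being ruled out would look roughly like
this (hypothetical sketch; the lock name is made up and initialization is
omitted):

        static DEFINE_PER_CPU(rwlock_t, hotplug_pcpu_rwlock);

        static void hotplug_reader_lock(void)
        {
                preempt_disable();
                read_lock(this_cpu_ptr(&hotplug_pcpu_rwlock));
        }

        static void hotplug_writer_lock(void)
        {
                unsigned int cpu;

                /* grabs every CPU's lock, one after the other */
                for_each_online_cpu(cpu)
                        write_lock(per_cpu_ptr(&hotplug_pcpu_rwlock, cpu));
        }

Substitute these per-cpu locks into the reader example above, and the writer
holding CPU 0's lock while spinning on CPU 1's can recreate the circular
dependency described in the link.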


Also, like Tejun said, one of the important measures for per-cpu rwlocks
should be that, if a user replaces global rwlocks with percpu rwlocks (for
performance reasons), he shouldn't suddenly end up in numerous deadlock
possibilities which never existed before. The replacement should continue to
remain safe, and perhaps improve the performance.

Regards,
Srivatsa S. Bhat



Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-09 Thread Srivatsa S. Bhat
On 12/10/2012 02:43 AM, Oleg Nesterov wrote:
> Damn, sorry for noise. I missed this part...
> 
> On 12/10, Srivatsa S. Bhat wrote:
>>
>> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
>>> the latency. And I guess something like kick_all_cpus_sync() is "too heavy".
>>
>> I hadn't considered that. Thinking of it, I don't think it would help us..
>> It won't get rid of the currently running preempt_disable() sections no?
> 
> Sure. But (again, this is only my feeling so far) given that 
> get_online_cpus_atomic()
> does cli/sti,

Ah, that one! Actually, the only reason I do that cli/sti is because,
potentially, interrupt handlers can be hotplug readers too. So we need to
protect the portion of the code of get_online_cpus_atomic() which is not
re-entrant. (Which reminds me to try and reduce the length of cli/sti in
that code, if possible).

> this can help to implement ensure-the-readers-must-see-the-pending-writer.
> IOW this might help to implement sync-with-readers.
> 

2 problems:

1. It won't help with cases like this:

   preempt_disable()
      ...
      preempt_disable()
      ...
      <--- Here
      ...
      preempt_enable()
      ...
   preempt_enable()

If the IPI hits at the point marked above, the IPI is useless, because at
that point we are already in a nested read-side critical section, and so we
can't switch the synchronization protocol. We need to wait till we start a
fresh non-nested read-side critical section, in order to switch to the global
rwlock (see the sketch below). The reason is that preempt_enable() or
put_online_cpus_atomic() can only undo what its predecessor
(preempt_disable()/get_online_cpus_atomic()) did.

2. Part of the reason we want to get rid of stop_machine() is to avoid the
latency it induces on _all_ CPUs just to take *one* CPU offline. If we use
kick_all_cpus_sync(), we get into that territory again : we unfairly interrupt
every CPU, _even when_ that CPU's existing preempt_disabled() sections might
not actually be hotplug readers! (ie., not bothered about CPU Hotplug).

So, I think whatever synchronization scheme we develop must not induce the
very same problems that stop_machine() had. Otherwise, we can end up going
full-circle and coming back to the same stop_machine() scenario that we
intended to get rid of.
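
Regarding problem 1 above, a hypothetical sketch of the nesting rule (the
counter and slow-path helper names are made up; writer_pending is from Oleg's
sketch):

        static DEFINE_PER_CPU(unsigned int, reader_nesting);

        void get_online_cpus_atomic(void)
        {
                unsigned long flags;

                preempt_disable();
                local_irq_save(flags);

                /*
                 * Only a fresh (non-nested) read-side critical section
                 * may look at the writer's signal and switch protocols;
                 * nested sections must inherit whatever the outermost
                 * level chose.
                 */
                if (__this_cpu_inc_return(reader_nesting) == 1 &&
                    unlikely(writer_pending))
                        hotplug_reader_slowpath();      /* hypothetical */

                local_irq_restore(flags);
        }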

(That's part of the reason why I initially tried to provide that _light()
variant of the reader APIs in the previous version of the patchset, so that
light readers can remain as undisturbed from cpu hotplug as possible, thereby
avoiding indirectly inducing the "stop_machine effect", like I mentioned here:
https://lkml.org/lkml/2012/12/5/369)

Regards,
Srivatsa S. Bhat



Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-09 Thread Srivatsa S. Bhat
On 12/10/2012 01:52 AM, Oleg Nesterov wrote:
> On 12/10, Srivatsa S. Bhat wrote:
>>
>> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
>>
>>> But yes, it is easy to blame somebody else's code ;) And I can't suggest
>>> something better at least right now. If I understand correctly, we can not
>>> use, say, synchronize_sched() in _cpu_down() path
>>
>> We can't sleep in that code.. so that's a no-go.
> 
> But we can?
> 
> Note that I meant _cpu_down(), not get_online_cpus_atomic() or 
> take_cpu_down().
> 

Maybe I'm missing something, but how would it help if we did a
synchronize_sched() so early (in _cpu_down())? Another bunch of
preempt_disable() sections could start immediately after our call to
synchronize_sched(), no? How would we deal with that?

What we need to ensure here is that all existing preempt_disable() sections
are over, and that *we* (meaning, the cpu offline writer) get to proceed
immediately after that, making all the new readers wait for us. And that is
possible only if we do our 'wait-for-readers' thing very close to our actual
cpu offline operation (which is take_cpu_down). Moreover, the writer needs to
remain stable (non-preemptible) while doing the wait-for-readers. Else (if the
writer himself keeps getting scheduled in and out of the CPU) I can't see how
he can possibly do justice to the wait.
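
In other words, the shape that seems to be needed is roughly this
(hypothetical sketch; the helper names are made up):

        static int take_cpu_down(void *unused)
        {
                /*
                 * Runs on the target CPU with preemption and interrupts
                 * off, so the writer stays stable while it waits.
                 */
                announce_hotplug_writer();         /* prevent new readers  */
                wait_for_atomic_hotplug_readers(); /* drain existing ones  */

                /* ... actually take the CPU down ... */
                return 0;
        }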

Also, synchronize_sched() only helps us do the 'wait-for-existing-readers'
logic. What would take care of the 'prevent-new-readers-from-going-ahead'
logic?

To add to it all, synchronize_sched() waits for _all_ preempt_disable()
sections to complete, whereas only a handful of them might actually care
about CPU hotplug. Which is an unnecessary burden for the writer (ie.,
waiting for unrelated readers to complete).

I bet you meant something else. Sorry if I misunderstood it.

Regards,
Srivatsa S. Bhat



Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-09 Thread Oleg Nesterov
Damn, sorry for noise. I missed this part...

On 12/10, Srivatsa S. Bhat wrote:
>
> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
> > the latency. And I guess something like kick_all_cpus_sync() is "too heavy".
>
> I hadn't considered that. Thinking of it, I don't think it would help us..
> It won't get rid of the currently running preempt_disable() sections no?

Sure. But (again, this is only my feeling so far) given that
get_online_cpus_atomic() does cli/sti, this can help to implement
ensure-the-readers-must-see-the-pending-writer. IOW this might help to
implement sync-with-readers.

Oleg.



Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-09 Thread Oleg Nesterov
On 12/07, Srivatsa S. Bhat wrote:
>
> 4. No deadlock possibilities
>
>Per-cpu locking is not the way to go if we want to have relaxed rules
>for lock-ordering. Because, we can end up in circular-locking dependencies
>as explained in https://lkml.org/lkml/2012/12/6/290

OK, but this assumes that, contrary to what Steven said, read-write-read
deadlock is not possible when it comes to rwlock_t. So far I think this
is true and we can't deadlock. Steven?

However. If this is true, then compared to preempt_disable/stop_machine
livelock is possible. Probably this is fine, we have the same problem with
get_online_cpus(). But if we can accept this fact I feel we can simplify
this somehow... Can't prove, only feel ;)

Oleg.



Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-09 Thread Oleg Nesterov
On 12/10, Srivatsa S. Bhat wrote:
>
> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
>
> > But yes, it is easy to blame somebody else's code ;) And I can't suggest
> > something better at least right now. If I understand correctly, we can not
> > use, say, synchronize_sched() in _cpu_down() path
>
> We can't sleep in that code.. so that's a no-go.

But we can?

Note that I meant _cpu_down(), not get_online_cpus_atomic() or take_cpu_down().

Oleg.



Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-09 Thread Srivatsa S. Bhat
On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
> On 12/07, Srivatsa S. Bhat wrote:
>>
>> Per-cpu counters can help solve the cache-line bouncing problem. So we
>> actually use the best of both: per-cpu counters (no-waiting) at the reader
>> side in the fast-path, and global rwlocks in the slowpath.
>>
>> [ Fastpath = no writer is active; Slowpath = a writer is active ]
>>
>> IOW, the hotplug readers just increment/decrement their per-cpu refcounts
>> when no writer is active.
> 
> Plus LOCK and cli/sti. I do not pretend I really know how bad this is
> performance-wise though. And at first glance this looks overcomplicated.
>

Hehe, I agree ;-) But I couldn't think of any other way to get rid of the
deadlock possibilities, other than using global rwlocks. So I designed a
way in which we can switch between per-cpu counters and global rwlocks
dynamically. Probably there is a smarter way to achieve what we want, dunno...
 
> But yes, it is easy to blame somebody else's code ;) And I can't suggest
> something better at least right now. If I understand correctly, we can not
> use, say, synchronize_sched() in _cpu_down() path

We can't sleep in that code.. so that's a no-go.

>, you also want to improve
> the latency. And I guess something like kick_all_cpus_sync() is "too heavy".
> 

I hadn't considered that. Thinking of it, I don't think it would help us..
It won't get rid of the currently running preempt_disable() sections no?

> Also. After the quick reading this doesn't look correct, please see below.
> 
>> +void get_online_cpus_atomic(void)
>> +{
>> +        unsigned int cpu = smp_processor_id();
>> +        unsigned long flags;
>> +
>> +        preempt_disable();
>> +        local_irq_save(flags);
>> +
>> +        if (cpu_hotplug.active_writer == current)
>> +                goto out;
>> +
>> +        smp_rmb(); /* Paired with smp_wmb() in drop_writer_signal() */
>> +
>> +        if (likely(!writer_active(cpu))) {
> 
> WINDOW. Suppose that reader_active() == F.
> 
>> +                mark_reader_fastpath();
>> +                goto out;
> 
> Why take_cpu_down() can't do announce_cpu_offline_begin() + sync_all_readers()
> in between?
> 
> Looks like we should increment the counter first, then check writer_active().

You are right, I missed the above race-conditions.
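
(IOW, the fastpath would have to become something like this -- a sketch,
with a made-up name for the per-cpu refcount:)

        this_cpu_inc(reader_refcnt);    /* announce ourselves first */
        smp_mb();                       /* pair with the writer's barrier */
        if (unlikely(writer_active(cpu))) {
                this_cpu_dec(reader_refcnt);    /* back out... */
                /* ...and fall through to the rwlock slowpath */
        }

with the writer doing the mirror image (announce, barrier, then scan the
per-cpu refcounts), so that at least one side is guaranteed to see the
other.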

> And sync_atomic_reader() needs rmb between 2 atomic_read's.
> 

OK.

> 
> Or. Again, suppose that reader_active() == F. But is_new_writer() == T.
> 
>> +        if (is_new_writer(cpu)) {
>> +                /*
>> +                 * ACK the writer's signal only if this is a fresh read-side
>> +                 * critical section, and not just an extension of a running
>> +                 * (nested) read-side critical section.
>> +                 */
>> +                if (!reader_active(cpu)) {
>> +                        ack_writer_signal();
> 
> What if take_cpu_down() does announce_cpu_offline_end() right before
> ack_writer_signal() ? In this case get_online_cpus_atomic() returns
> with writer_signal == -1. If nothing else this breaks the next
> raise_writer_signal().
> 

Oh, yes, this one is wrong too! We need to mark ourselves as active
reader right in the beginning. And probably change the check to
"reader_nested()" or something like that.

Thanks a lot Oleg!
Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-09 Thread Oleg Nesterov
On 12/07, Srivatsa S. Bhat wrote:
>
> Per-cpu counters can help solve the cache-line bouncing problem. So we
> actually use the best of both: per-cpu counters (no-waiting) at the reader
> side in the fast-path, and global rwlocks in the slowpath.
>
> [ Fastpath = no writer is active; Slowpath = a writer is active ]
>
> IOW, the hotplug readers just increment/decrement their per-cpu refcounts
> when no writer is active.

Plus LOCK and cli/sti. I do not pretend I really know how bad this is
performance-wise though. And at first glance this looks overcomplicated.

But yes, it is easy to blame somebody else's code ;) And I can't suggest
something better at least right now. If I understand correctly, we can not
use, say, synchronize_sched() in _cpu_down() path, you also want to improve
the latency. And I guess something like kick_all_cpus_sync() is "too heavy".

Also. After the quick reading this doesn't look correct, please see below.

> +void get_online_cpus_atomic(void)
> +{
> +        unsigned int cpu = smp_processor_id();
> +        unsigned long flags;
> +
> +        preempt_disable();
> +        local_irq_save(flags);
> +
> +        if (cpu_hotplug.active_writer == current)
> +                goto out;
> +
> +        smp_rmb(); /* Paired with smp_wmb() in drop_writer_signal() */
> +
> +        if (likely(!writer_active(cpu))) {

WINDOW. Suppose that reader_active() == F.

> +                mark_reader_fastpath();
> +                goto out;

Why take_cpu_down() can't do announce_cpu_offline_begin() + sync_all_readers()
in between?

Looks like we should increment the counter first, then check writer_active().
And sync_atomic_reader() needs rmb between 2 atomic_read's.


Or. Again, suppose that reader_active() == F. But is_new_writer() == T.

> +        if (is_new_writer(cpu)) {
> +                /*
> +                 * ACK the writer's signal only if this is a fresh read-side
> +                 * critical section, and not just an extension of a running
> +                 * (nested) read-side critical section.
> +                 */
> +                if (!reader_active(cpu)) {
> +                        ack_writer_signal();

What if take_cpu_down() does announce_cpu_offline_end() right before
ack_writer_signal() ? In this case get_online_cpus_atomic() returns
with writer_signal == -1. If nothing else this breaks the next
raise_writer_signal().

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-09 Thread Srivatsa S. Bhat
On 12/10/2012 01:52 AM, Oleg Nesterov wrote:
> On 12/10, Srivatsa S. Bhat wrote:
>
>> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
>>
>>> But yes, it is easy to blame somebody else's code ;) And I can't suggest
>>> something better at least right now. If I understand correctly, we can not
>>> use, say, synchronize_sched() in _cpu_down() path
>>
>> We can't sleep in that code.. so that's a no-go.
>
> But we can?
>
> Note that I meant _cpu_down(), not get_online_cpus_atomic() or take_cpu_down().
>

Maybe I'm missing something, but how would it help if we did a
synchronize_sched() so early (in _cpu_down())? Another bunch of
preempt_disable() sections could start immediately after our call to
synchronize_sched() no? How would we deal with that?

What we need to ensure here is that all existing preempt_disable() sections
are over and that *we* (meaning, the cpu offline writer) get to proceed
immediately after that, making all the new readers wait for us. And that is
possible only if we do our 'wait-for-readers' thing very close to our actual
cpu offline operation (which is take_cpu_down). Moreover, the writer needs to
remain stable (non-preemptible) while doing the wait-for-readers. Else (if
the writer himself keeps getting scheduled in and out of the CPU) I can't see
how he can possibly do justice to the wait.

Also, synchronize_sched() only helps us do the 'wait-for-existing-readers'
logic. What would take care of the 'prevent-new-readers-from-going-ahead'
logic?

To add to it all, synchronize_sched() waits for _all_ preempt_disable()
sections to complete, whereas only a handful of them might actually care
about CPU hotplug. Which is an unnecessary burden for the writer (ie.,
waiting for unrelated readers to complete).
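
(Schematically, the writer-side sequence I have in mind -- made-up helper
names, just to make the two halves explicit:)

        /* Part 1: prevent-new-readers */
        announce_writer_active();       /* new get_online_cpus_atomic() callers
                                         * now take the global rwlock slowpath */
        smp_mb();

        /* Part 2: wait-for-existing-readers -- but only the hotplug-aware
         * ones, not every preempt_disable() section in the kernel */
        for_each_online_cpu(cpu)
                while (per_cpu(reader_refcnt, cpu))
                        cpu_relax();

        /* ...take_cpu_down() via stop_machine()... then: */
        announce_writer_inactive();

Unlike synchronize_sched(), Part 2 waits only for readers that actually
announced themselves via get_online_cpus_atomic().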

I bet you meant something else. Sorry if I misunderstood it.

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-09 Thread Srivatsa S. Bhat
On 12/10/2012 02:43 AM, Oleg Nesterov wrote:
> Damn, sorry for noise. I missed this part...
>
> On 12/10, Srivatsa S. Bhat wrote:
>
>> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
>>> the latency. And I guess something like kick_all_cpus_sync() is "too heavy".
>>
>> I hadn't considered that. Thinking of it, I don't think it would help us..
>> It won't get rid of the currently running preempt_disable() sections no?
>
> Sure. But (again, this is only my feeling so far) given that
> get_online_cpus_atomic() does cli/sti,

Ah, that one! Actually, the only reason I do that cli/sti is because,
potentially, interrupt handlers can be hotplug readers too. So we need to
protect the portion of the code of get_online_cpus_atomic() which is not
re-entrant. (Which reminds me to try and reduce the length of cli/sti in
that code, if possible).

> this can help to implement ensure-the-readers-must-see-the-pending-writer.
> IOW this might help to implement sync-with-readers.
>

2 problems:

1. It won't help with cases like this:

        preempt_disable()
        ...
                preempt_disable()
                ...
                --- Here
                ...
                preempt_enable()
        ...
        preempt_enable()

If the IPI hits at the point marked above, the IPI is useless, because, at
that point, since we are already in a nested read-side critical section, we
can't switch the synchronization protocol. We need to wait till we start a
fresh non-nested read-side critical section, in order to switch to the
global rwlock. The reason is that preempt_enable() or put_online_cpus_atomic()
can only undo what its predecessor (preempt_disable()/get_online_cpus_atomic())
did.
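
(That constraint makes the reader look roughly like this -- a sketch, with
a made-up name for the per-cpu refcount:)

        void get_online_cpus_atomic(void)
        {
                ...
                if (this_cpu_read(reader_refcnt)) {
                        /* Nested section: the outermost section already
                         * chose the sync scheme; all we may do is bump the
                         * count, because put_online_cpus_atomic() will undo
                         * exactly what its counterpart did. */
                        this_cpu_inc(reader_refcnt);
                        goto out;
                }
                /* Fresh section: only here may we look at the writer's
                 * signal and choose fastpath-counter vs. global rwlock. */
                ...
        }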

2. Part of the reason we want to get rid of stop_machine() is to avoid the
latency it induces on _all_ CPUs just to take *one* CPU offline. If we use
kick_all_cpus_sync(), we get into that territory again: we unfairly interrupt
every CPU, _even when_ that CPU's existing preempt_disable() sections might
not actually be hotplug readers! (ie., not bothered about CPU Hotplug).

So, I think whatever synchronization scheme we develop must not induce the
very same problems that stop_machine() had. Otherwise, we can end up going
full-circle and coming back to the same stop_machine() scenario that we
intended to get rid of.

(That's part of the reason why I initially tried to provide that _light()
variant of the reader APIs in the previous version of the patchset, so that
light readers can remain as undisturbed by cpu hotplug as possible, thereby
avoiding indirectly inducing the stop_machine effect, like I mentioned here:
https://lkml.org/lkml/2012/12/5/369)

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-09 Thread Srivatsa S. Bhat
On 12/10/2012 02:27 AM, Oleg Nesterov wrote:
> On 12/07, Srivatsa S. Bhat wrote:
>
>> 4. No deadlock possibilities
>>
>>    Per-cpu locking is not the way to go if we want to have relaxed rules
>>    for lock-ordering. Because, we can end up in circular-locking dependencies
>>    as explained in https://lkml.org/lkml/2012/12/6/290
>
> OK, but this assumes that, contrary to what Steven said, read-write-read
> deadlock is not possible when it comes to rwlock_t.

What I meant is, with a single (global) rwlock, you can't deadlock like that.
But if you used per-cpu rwlocks and if we don't implement them properly, then we
can end up in circular locking dependencies like shown above.

That is, if you take the same example and replace the lock with global
rwlock, you won't deadlock:


Readers:

        CPU 0                           CPU 1
        -----                           -----

1.      spin_lock(random_lock);         read_lock(my_rwlock);


2.      read_lock(my_rwlock);           spin_lock(random_lock);


Writer:

        CPU 2:
        ------

        write_lock(my_rwlock);


Even if the writer does a write_lock() in-between steps 1 and 2 at the reader-
side, nothing will happen. The writer would spin because CPU 1 already got
the rwlock for read, and hence both CPU 0 and 1 go ahead. When they finish,
the writer will get the lock and proceed. So, no deadlocks here.

So, what I was pointing out here was, if somebody replaced this global
rwlock with a straight-forward implementation of per-cpu rwlocks, he'll
immediately end up in circular locking dependency deadlock between the 3
entities as explained in the link above.

Let me know if my assumptions are incorrect!

> So far I think this
> is true and we can't deadlock. Steven?
>
> However. If this is true, then, compared to preempt_disable/stop_machine,
> livelock is possible. Probably this is fine, we have the same problem with
> get_online_cpus(). But if we can accept this fact I feel we can simplify
> this somehow... Can't prove, only feel ;)
>

Not sure I follow..

Anyway, my point is that, we _can't_ implement per-cpu rwlocks like lglocks
and expect it to work in this case. IOW, we can't do:

Reader-side:
   - read_lock() your per-cpu rwlock and proceed.

Writer-side:
   - for_each_online_cpu(cpu)
         write_lock(per-cpu rwlock of 'cpu');
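
(Spelled out, the straight-forward conversion that must be avoided -- a
sketch, not code from anywhere:)

        static DEFINE_PER_CPU(rwlock_t, pcpu_rwlock);

        void bad_percpu_write_lock(void)
        {
                unsigned int cpu;

                /* Blocking acquisition in a fixed order: once we hold
                 * CPU 0's lock, a reader on CPU 1 that holds its own
                 * per-cpu lock while spinning on some unrelated lock of
                 * ours closes exactly the circular dependency shown in
                 * https://lkml.org/lkml/2012/12/6/290 */
                for_each_online_cpu(cpu)
                        write_lock(&per_cpu(pcpu_rwlock, cpu));
        }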


Also, like Tejun said, one of the important measures for per-cpu rwlocks
should be that, if a user replaces global rwlocks with percpu rwlocks (for
performance reasons), he shouldn't suddenly end up in numerous deadlock
possibilities which never existed before. The replacement should continue to
remain safe, and perhaps improve the performance.

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-07 Thread Srivatsa S. Bhat
On 12/08/2012 12:01 AM, Tejun Heo wrote:
> Hello, Srivatsa.
> 
> On Fri, Dec 07, 2012 at 11:54:01PM +0530, Srivatsa S. Bhat wrote:
>>> lg_lock doesn't do local nesting and I'm not sure how big a deal that
>>> is as I don't know how many should be converted.  But if nesting is an
>>> absolute necessity, it would be much better to implement generic
>>> rwlock variant (say, lg_rwlock) rather than implementing unusual
>>> cpuhotplug-specific percpu synchronization construct.
>>
>> To be honest, at a certain point in time while designing this, I did
>> realize that this was getting kinda overly complicated ;-) ... but I
>> wanted to see how this would actually work out when finished and get
>> some feedback on the same, hence I posted it out. But this also proves
>> that we _can_ actually compete with the flexibility of preempt_disable()
>> and still be safe with respect to locking, if we really want to ;-)
> 
> I got confused by comparison to preempt_disable() but you're right
> that percpu rwlock shouldn't be able to introduce locking dependency
> which doesn't exist with non-percpu rwlock.  ie. write locking should
> be atomic w.r.t. all readers.

Yep!

>  At the simplest, this can be
> implemented by writer backing out all the way if try-locking any CPU
> fails and retrying the whole thing.  That should be correct but has
> the potential of starving the writer.
> 

Exactly! This is what I mentioned yesterday in the link below, and said
that it's not good because of writer starvation ("too wasteful"):
https://lkml.org/lkml/2012/12/6/290

> What we need here is a generic percpu-rwlock.  I don't know which
> exact implementation strategy we should choose.  Maybe your switching
> to global rwlock is the right solution.  But, at any rate, I think it
> would be best to implement proper percpu-rwlock and then apply it to
> CPU hotplug.  It's actually gonna be pretty fitting as
> get_online_cpus() is being converted to percpu-rwsem.  IIUC, Oleg has
> been working on this for a while now.  Oleg, what do you think?
> 

Hmm, that sounds good.

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-07 Thread Srivatsa S. Bhat
On 12/07/2012 11:46 PM, Tejun Heo wrote:
> Hello, again.
> 
> On Fri, Dec 07, 2012 at 09:57:24AM -0800, Tejun Heo wrote:
>> possible.  Also, I think the right approach would be auditing each
>> get_online_cpus_atomic() callsite and figuring out the proper locking order
>> rather than implementing a construct this unusual especially as
>> hunting down the incorrect cases shouldn't be difficult given proper
>> lockdep annotation.
> 
> On the second look, it looks like you're implementing proper
> percpu_rwlock semantics

Ah, nice! I didn't realize that I was actually doing what I intended
to avoid! ;-)

Looking at the implementation of lglocks, and going by Oleg's earlier
comment that we just need to replace spinlock_t with rwlock_t in them
to get percpu_rwlocks, I was horrified at the kinds of circular locking
dependencies that they would be prone to, unless used carefully.

So I devised this scheme to be safe, while still having relaxed rules.
But if *this* is what percpu_rwlocks should ideally look like,
then great! :-)

> as readers aren't supposed to induce circular
> dependency directly.

Yep, in this scheme, nobody will end up in circular dependency.

>  Can you please work with Oleg to implement
> proper percpu-rwlock and use that for CPU hotplug rather than
> implementing it inside CPU hotplug?
> 

Sure, I'd be more than happy to!
 
Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-07 Thread Tejun Heo
Hello, Srivatsa.

On Fri, Dec 07, 2012 at 11:54:01PM +0530, Srivatsa S. Bhat wrote:
> > lg_lock doesn't do local nesting and I'm not sure how big a deal that
> > is as I don't know how many should be converted.  But if nesting is an
> > absolute necessity, it would be much better to implement generic
> > rwlock variant (say, lg_rwlock) rather than implementing unusual
> > cpuhotplug-specific percpu synchronization construct.
> 
> To be honest, at a certain point in time while designing this, I did
> realize that this was getting kinda overly complicated ;-) ... but I
> wanted to see how this would actually work out when finished and get
> some feedback on the same, hence I posted it out. But this also proves
> that we _can_ actually compete with the flexibility of preempt_disable()
> and still be safe with respect to locking, if we really want to ;-)

I got confused by comparison to preempt_disable() but you're right
that percpu rwlock shouldn't be able to introduce locking dependency
which doesn't exist with non-percpu rwlock.  ie. write locking should
be atomic w.r.t. all readers.  At the simplest, this can be
implemented by writer backing out all the way if try-locking any CPU
fails and retrying the whole thing.  That should be correct but has
the potential of starving the writer.
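
For reference, the back-out-and-retry writer would look roughly like this
(a sketch with a hypothetical per-cpu lock, not something I'm proposing):

        static DEFINE_PER_CPU(rwlock_t, pcpu_rwlock);

        void percpu_write_lock(void)
        {
                unsigned int cpu, undo;
        retry:
                for_each_online_cpu(cpu) {
                        if (write_trylock(&per_cpu(pcpu_rwlock, cpu)))
                                continue;
                        /* contended: back out everything taken so far */
                        for_each_online_cpu(undo) {
                                if (undo == cpu)
                                        break;
                                write_unlock(&per_cpu(pcpu_rwlock, undo));
                        }
                        cpu_relax();
                        goto retry;
                }
        }

Deadlock-free, but a steady stream of readers can keep some CPU read-locked
and starve this loop indefinitely.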

What we need here is a generic percpu-rwlock.  I don't know which
exact implementation strategy we should choose.  Maybe your switching
to global rwlock is the right solution.  But, at any rate, I think it
would be best to implement proper percpu-rwlock and then apply it to
CPU hotplug.  It's actually gonna be pretty fitting as
get_online_cpus() is being converted to percpu-rwsem.  IIUC, Oleg has
been working on this for a while now.  Oleg, what do you think?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-07 Thread Srivatsa S. Bhat
On 12/07/2012 11:27 PM, Tejun Heo wrote:
> On Fri, Dec 07, 2012 at 11:08:13PM +0530, Srivatsa S. Bhat wrote:
>> 4. No deadlock possibilities
>>
>>    Per-cpu locking is not the way to go if we want to have relaxed rules
>>    for lock-ordering. Because, we can end up in circular-locking dependencies
>>    as explained in https://lkml.org/lkml/2012/12/6/290
>>
>>    So, avoid per-cpu locking schemes (per-cpu locks/per-cpu atomic counters
>>    with spin-on-contention etc) as much as possible.
> 
> I really can't say I like this approach.  percpu locking is very
> tricky and difficult to get right and we should try our
> best to avoid implementing subsystem specific ones as much as
> possible.  Also, I think the right approach would be auditing each
> get_online_cpus_atomic() callsite and figuring out the proper locking order
> rather than implementing a construct this unusual especially as
> hunting down the incorrect cases shouldn't be difficult given proper
> lockdep annotation.
> 
> lg_lock doesn't do local nesting and I'm not sure how big a deal that
> is as I don't know how many should be converted.  But if nesting is an
> absolute necessity, it would be much better to implement generic
> rwlock variant (say, lg_rwlock) rather than implementing unusual
> cpuhotplug-specific percpu synchronization construct.
>

To be honest, at a certain point in time while designing this, I did
realize that this was getting kinda overly complicated ;-) ... but I
wanted to see how this would actually work out when finished and get
some feedback on the same, hence I posted it out. But this also proves
that we _can_ actually compete with the flexibility of preempt_disable()
and still be safe with respect to locking, if we really want to ;-)

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-07 Thread Tejun Heo
Hello, again.

On Fri, Dec 07, 2012 at 09:57:24AM -0800, Tejun Heo wrote:
> possible.  Also, I think the right approach would be auditing each
> get_online_cpus_atomic() callsite and figuring out the proper locking order
> rather than implementing a construct this unusual especially as
> hunting down the incorrect cases shouldn't be difficult given proper
> lockdep annotation.

On the second look, it looks like you're implementing proper
percpu_rwlock semantics as readers aren't supposed to induce circular
dependency directly.  Can you please work with Oleg to implement
proper percpu-rwlock and use that for CPU hotplug rather than
implementing it inside CPU hotplug?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context

2012-12-07 Thread Tejun Heo
On Fri, Dec 07, 2012 at 11:08:13PM +0530, Srivatsa S. Bhat wrote:
> 4. No deadlock possibilities
> 
>    Per-cpu locking is not the way to go if we want to have relaxed rules
>    for lock-ordering. Because, we can end up in circular-locking dependencies
>    as explained in https://lkml.org/lkml/2012/12/6/290
> 
>    So, avoid per-cpu locking schemes (per-cpu locks/per-cpu atomic counters
>    with spin-on-contention etc) as much as possible.

I really can't say I like this approach.  percpu locking is very
tricky and difficult to get right and we should try our
best to avoid implementing subsystem specific ones as much as
possible.  Also, I think the right approach would be auditing each
get_online_cpus_atomic() callsite and figuring out the proper locking order
rather than implementing a construct this unusual especially as
hunting down the incorrect cases shouldn't be difficult given proper
lockdep annotation.

lg_lock doesn't do local nesting and I'm not sure how big a deal that
is as I don't know how many should be converted.  But if nesting is an
absolute necessity, it would be much better to implement generic
rwlock variant (say, lg_rwlock) rather than implementing unusual
cpuhotplug-specific percpu synchronization construct.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

