Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
On 12/11/2012 07:37 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, Dec 11, 2012 at 07:32:13PM +0530, Srivatsa S. Bhat wrote:
>> On 12/11/2012 07:17 PM, Tejun Heo wrote:
>>> Hello, Srivatsa.
>>>
>>> On Tue, Dec 11, 2012 at 06:43:54PM +0530, Srivatsa S. Bhat wrote:
>>>> This approach (of using synchronize_sched()) also looks good. It is
>>>> simple, yet effective, but unfortunately inefficient at the writer side
>>>> (because he'll have to wait for a full synchronize_sched()).
>>>
>>> While synchronize_sched() is heavier on the writer side than the
>>> originally posted version, it doesn't stall the whole machine and
>>> wouldn't introduce latencies to others. Shouldn't that be enough?
>>>
>>
>> Short answer: Yes. But we can do better, with almost comparable code
>> complexity. So I'm tempted to try that out.
>>
>> Long answer:
>> Even in the synchronize_sched() approach, we still have to identify the
>> readers who need to be converted to use the new
>> get/put_online_cpus_atomic() APIs and convert them. Then, if we can come
>> up with a scheme such that the writer has to wait only for those readers
>> to complete, then why not?
>>
>> If such a scheme ends up becoming too complicated, then I agree, we can
>> use synchronize_sched() itself. (That's what I meant by saying that we'll
>> use this as a fallback).
>>
>> But even in this scheme which uses synchronize_sched(), we are already
>> half-way through (we already use 2 types of sync schemes - counters and
>> rwlocks). Just a little more logic can get rid of the unnecessary
>> full-wait too.. So why not give it a shot?
>
> It's not really about the code complexity but about making the reader side
> as light as possible. Please keep in mind that the reader side is still
> *way* hotter than the writer side. Before, the writer side was heavy to
> the extent that it caused noticeable disruptions on the whole system, and
> I think that's what we're trying to hunt down here. If we can shave off
> the memory barriers from the reader side by using synchronize_sched() on
> the writer side, that is the *better* result, not worse.
>

That's a fair point. Hmmm...

Regards,
Srivatsa S. Bhat
Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
Hello,

On Tue, Dec 11, 2012 at 07:32:13PM +0530, Srivatsa S. Bhat wrote:
> On 12/11/2012 07:17 PM, Tejun Heo wrote:
> > Hello, Srivatsa.
> >
> > On Tue, Dec 11, 2012 at 06:43:54PM +0530, Srivatsa S. Bhat wrote:
> >> This approach (of using synchronize_sched()) also looks good. It is
> >> simple, yet effective, but unfortunately inefficient at the writer side
> >> (because he'll have to wait for a full synchronize_sched()).
> >
> > While synchronize_sched() is heavier on the writer side than the
> > originally posted version, it doesn't stall the whole machine and
> > wouldn't introduce latencies to others. Shouldn't that be enough?
> >
>
> Short answer: Yes. But we can do better, with almost comparable code
> complexity. So I'm tempted to try that out.
>
> Long answer:
> Even in the synchronize_sched() approach, we still have to identify the
> readers who need to be converted to use the new
> get/put_online_cpus_atomic() APIs and convert them. Then, if we can come
> up with a scheme such that the writer has to wait only for those readers
> to complete, then why not?
>
> If such a scheme ends up becoming too complicated, then I agree, we can
> use synchronize_sched() itself. (That's what I meant by saying that we'll
> use this as a fallback).
>
> But even in this scheme which uses synchronize_sched(), we are already
> half-way through (we already use 2 types of sync schemes - counters and
> rwlocks). Just a little more logic can get rid of the unnecessary
> full-wait too.. So why not give it a shot?

It's not really about the code complexity but about making the reader side
as light as possible. Please keep in mind that the reader side is still
*way* hotter than the writer side. Before, the writer side was heavy to the
extent that it caused noticeable disruptions on the whole system, and I
think that's what we're trying to hunt down here. If we can shave off the
memory barriers from the reader side by using synchronize_sched() on the
writer side, that is the *better* result, not worse.

Thanks.

--
tejun
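[ Illustration: a minimal sketch of the scheme discussed above -- the
  writer pays a full synchronize_sched() so that the reader fastpath needs
  no memory barriers or atomic instructions. All names are illustrative
  (this condenses the example Oleg posts further down in this thread), not
  the actual patchset's API.

	#include <linux/percpu.h>
	#include <linux/preempt.h>
	#include <linux/rcupdate.h>
	#include <linux/spinlock.h>

	static bool writer_pending;
	static DEFINE_RWLOCK(hotplug_rwlock);
	static DEFINE_PER_CPU(int, reader_refcnt);

	void get_online_cpus_atomic(void)
	{
		preempt_disable();

		/*
		 * Fastpath: plain per-cpu ops, no barriers. The writer's
		 * synchronize_sched() waits for every preempt-disabled
		 * section, so a reader that saw !writer_pending here
		 * cannot be missed by the writer.
		 */
		if (likely(!writer_pending) || __this_cpu_read(reader_refcnt)) {
			__this_cpu_inc(reader_refcnt);
			return;
		}

		/* Slowpath: a writer announced itself; use the rwlock. */
		read_lock(&hotplug_rwlock);
		__this_cpu_inc(reader_refcnt);
		read_unlock(&hotplug_rwlock);
	}

	void put_online_cpus_atomic(void)
	{
		__this_cpu_dec(reader_refcnt);
		preempt_enable();
	}

	/* Writer side, early in _cpu_down(), before stopping the CPU: */
	static void announce_cpu_offline(void)
	{
		writer_pending = true;
		synchronize_sched(); /* new readers now see writer_pending */
	}

  In the common case a reader touches only its own per-cpu cacheline; only
  while writer_pending is set does it fall back to the shared rwlock. ]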
Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
On 12/11/2012 07:17 PM, Tejun Heo wrote:
> Hello, Srivatsa.
>
> On Tue, Dec 11, 2012 at 06:43:54PM +0530, Srivatsa S. Bhat wrote:
>> This approach (of using synchronize_sched()) also looks good. It is
>> simple, yet effective, but unfortunately inefficient at the writer side
>> (because he'll have to wait for a full synchronize_sched()).
>
> While synchronize_sched() is heavier on the writer side than the
> originally posted version, it doesn't stall the whole machine and
> wouldn't introduce latencies to others. Shouldn't that be enough?
>

Short answer: Yes. But we can do better, with almost comparable code
complexity. So I'm tempted to try that out.

Long answer:
Even in the synchronize_sched() approach, we still have to identify the
readers who need to be converted to use the new
get/put_online_cpus_atomic() APIs and convert them. Then, if we can come up
with a scheme such that the writer has to wait only for those readers to
complete, then why not?

If such a scheme ends up becoming too complicated, then I agree, we can use
synchronize_sched() itself. (That's what I meant by saying that we'll use
this as a fallback).

But even in this scheme which uses synchronize_sched(), we are already
half-way through (we already use 2 types of sync schemes - counters and
rwlocks). Just a little more logic can get rid of the unnecessary full-wait
too.. So why not give it a shot?

Regards,
Srivatsa S. Bhat
Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
Hello, Srivatsa.

On Tue, Dec 11, 2012 at 06:43:54PM +0530, Srivatsa S. Bhat wrote:
> This approach (of using synchronize_sched()) also looks good. It is
> simple, yet effective, but unfortunately inefficient at the writer side
> (because he'll have to wait for a full synchronize_sched()).

While synchronize_sched() is heavier on the writer side than the originally
posted version, it doesn't stall the whole machine and wouldn't introduce
latencies to others. Shouldn't that be enough?

Thanks.

--
tejun
Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
On 12/10/2012 10:54 PM, Oleg Nesterov wrote:
> On 12/10, Srivatsa S. Bhat wrote:
>>
>> On 12/10/2012 01:52 AM, Oleg Nesterov wrote:
>>> On 12/10, Srivatsa S. Bhat wrote:
>>>> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
>>>>> But yes, it is easy to blame somebody else's code ;) And I can't
>>>>> suggest something better at least right now. If I understand
>>>>> correctly, we can not use, say, synchronize_sched() in _cpu_down() path
>>>>
>>>> We can't sleep in that code.. so that's a no-go.
>>>
>>> But we can?
>>>
>>> Note that I meant _cpu_down(), not get_online_cpus_atomic() or
>>> take_cpu_down().
>>>
>>
>> Maybe I'm missing something, but how would it help if we did a
>> synchronize_sched() so early (in _cpu_down())? Another bunch of
>> preempt_disable() sections could start immediately after our call to
>> synchronize_sched() no? How would we deal with that?
>
> Sorry for the confusion. Of course synchronize_sched() alone is not
> enough. But we can use it to synchronize with preempt-disabled sections
> and avoid the barriers/atomics in the fast-path.
>
> For example,
>
>	bool writer_pending;
>	DEFINE_RWLOCK(writer_rwlock);
>	DEFINE_PER_CPU(int, reader_ctr);
>
>	void get_online_cpus_atomic(void)
>	{
>		preempt_disable();
>
>		if (likely(!writer_pending) || __this_cpu_read(reader_ctr)) {
>			__this_cpu_inc(reader_ctr);
>			return;
>		}
>
>		read_lock(&writer_rwlock);
>		__this_cpu_inc(reader_ctr);
>		read_unlock(&writer_rwlock);
>	}
>
>	// lacks release semantics, but we don't care
>	void put_online_cpus_atomic(void)
>	{
>		__this_cpu_dec(reader_ctr);
>		preempt_enable();
>	}
>
> Now, _cpu_down() does
>
>	writer_pending = true;
>	synchronize_sched();
>
> before stop_one_cpu(). When synchronize_sched() returns, we know that
> every get_online_cpus_atomic() must see writer_pending == T. And, if
> any CPU incremented its reader_ctr, we must see it is not zero.
>
> take_cpu_down() does
>
>	write_lock(&writer_rwlock);
>
>	for_each_online_cpu(cpu) {
>		while (per_cpu(reader_ctr, cpu))
>			cpu_relax();
>	}
>
> and takes the lock.
>
> However, this can lead to the deadlock we already discussed. So
> take_cpu_down() should do
>
>  retry:
>	write_lock(&writer_rwlock);
>
>	for_each_online_cpu(cpu) {
>		if (per_cpu(reader_ctr, cpu)) {
>			write_unlock(&writer_rwlock);
>			goto retry;
>		}
>	}
>
> to take the lock. But this is livelockable. However, I do not think it
> is possible to avoid the livelock.
>
> Just in case, the code above is only for illustration; perhaps it is not
> 100% correct and perhaps we can do it better. cpu_hotplug.active_writer
> is ignored for simplicity; get/put should check current == active_writer.
>

This approach (of using synchronize_sched()) also looks good. It is simple,
yet effective, but unfortunately inefficient at the writer side (because
he'll have to wait for a full synchronize_sched()).

So I would say that we should keep this approach as a fallback, if we don't
come up with any better synchronization scheme (in terms of efficiency) at
a comparable level of simplicity/complexity.

I have come up with a v4 that simplifies several aspects of the
synchronization and makes it look a lot simpler than v3. (Fewer race
windows to take care of, implicit ACKs, no atomic ops etc..) I'll post it
soon. Let me know what you think...

Regards,
Srivatsa S. Bhat
Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
On 12/10/2012 10:58 PM, Oleg Nesterov wrote:
> On 12/10, Srivatsa S. Bhat wrote:
>>
>> On 12/10/2012 02:43 AM, Oleg Nesterov wrote:
>>> Damn, sorry for noise. I missed this part...
>>>
>>> On 12/10, Srivatsa S. Bhat wrote:
>>>> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
>>>>> the latency. And I guess something like kick_all_cpus_sync() is "too
>>>>> heavy".
>>>>
>>>> I hadn't considered that. Thinking of it, I don't think it would help
>>>> us.. It won't get rid of the currently running preempt_disable()
>>>> sections no?
>>>
>>> Sure. But (again, this is only my feeling so far) given that
>>> get_online_cpus_atomic() does cli/sti,
>>
>> Ah, that one! Actually, the only reason I do that cli/sti is because,
>> potentially, interrupt handlers can be hotplug readers too. So we need to
>> protect the portion of the code of get_online_cpus_atomic() which is not
>> re-entrant.
>
> Yes, I understand.
>
>>> this can help to implement ensure-the-readers-must-see-the-pending-writer.
>>> IOW this might help to implement sync-with-readers.
>>>
>>
>> 2 problems:
>>
>> 1. It won't help with cases like this:
>>
>>	preempt_disable()
>>	...
>>		preempt_disable()
>>		...
>>		<--- Here
>>		...
>>		preempt_enable()
>>	...
>>	preempt_enable()
>
> No, I meant that kick_all_cpus_sync() can be used to synchronize with
> cli/sti in get_online_cpus_atomic(), just like synchronize_sched() does
> in the code I posted a minute ago.
>

Ah, OK.

>> 2. Part of the reason we want to get rid of stop_machine() is to avoid
>> the latency it induces on _all_ CPUs just to take *one* CPU offline. If
>> we use kick_all_cpus_sync(), we get into that territory again: we
>> unfairly interrupt every CPU, _even when_ that CPU's existing
>> preempt_disable() sections might not actually be hotplug readers! (ie.,
>> not bothered about CPU hotplug).
>
> I agree, that is why I said it is "too heavy".
>

Got it :)

Regards,
Srivatsa S. Bhat
Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
On 12/10/2012 11:45 PM, Oleg Nesterov wrote:
> On 12/10, Srivatsa S. Bhat wrote:
>>
>> On 12/10/2012 02:27 AM, Oleg Nesterov wrote:
>>> However. If this is true, then compared to preempt_disable/stop_machine,
>>> livelock is possible. Probably this is fine, we have the same problem
>>> with get_online_cpus(). But if we can accept this fact I feel we can
>>> simplify this somehow... Can't prove, only feel ;)
>>
>> Not sure I follow..
>
> I meant that write_lock_irqsave(&hotplug_rwlock) in take_cpu_down()
> can spin "forever".
>
> Suppose that reader_acked() == T on every CPU, so that
> get_online_cpus_atomic() always takes read_lock(&hotplug_rwlock).
>
> It is possible that this lock will be never released by readers,
>
>	CPU_0				CPU_1
>
>	get_online_cpus_atomic()
>					get_online_cpus_atomic()
>	put_online_cpus_atomic()
>
>	get_online_cpus_atomic()
>	put_online_cpus_atomic()
>
>					get_online_cpus_atomic()
>					put_online_cpus_atomic()
>
> and so on.
>

Right, and we can't do anything about it :(

Regards,
Srivatsa S. Bhat
Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
On 12/10, Srivatsa S. Bhat wrote:
>
> On 12/10/2012 02:27 AM, Oleg Nesterov wrote:
> > On 12/07, Srivatsa S. Bhat wrote:
> >>
> >> 4. No deadlock possibilities
> >>
> >>    Per-cpu locking is not the way to go if we want to have relaxed rules
> >>    for lock-ordering. Because, we can end up in circular-locking
> >>    dependencies as explained in https://lkml.org/lkml/2012/12/6/290
> >
> > OK, but this assumes that, contrary to what Steven said, read-write-read
> > deadlock is not possible when it comes to rwlock_t.
>
> What I meant is, with a single (global) rwlock, you can't deadlock like
> that.

Ah. I greatly misunderstood Steven's email,
http://marc.info/?l=linux-pm&m=135482212307876

Somehow I didn't notice he described the deadlock with _two_ rwlocks; I
wrongly thought that his point was that read_lock() is not recursive (like
down_read).

> Let me know if my assumptions are incorrect!

No, sorry, I misunderstood Steven.

> > However. If this is true, then compared to preempt_disable/stop_machine,
> > livelock is possible. Probably this is fine, we have the same problem
> > with get_online_cpus(). But if we can accept this fact I feel we can
> > simplify this somehow... Can't prove, only feel ;)
>
> Not sure I follow..

I meant that write_lock_irqsave(&hotplug_rwlock) in take_cpu_down() can
spin "forever".

Suppose that reader_acked() == T on every CPU, so that
get_online_cpus_atomic() always takes read_lock(&hotplug_rwlock).

It is possible that this lock will be never released by readers,

	CPU_0				CPU_1

	get_online_cpus_atomic()
					get_online_cpus_atomic()
	put_online_cpus_atomic()

	get_online_cpus_atomic()
	put_online_cpus_atomic()

					get_online_cpus_atomic()
					put_online_cpus_atomic()

and so on.

> Reader-side:
>    -> read_lock() your per-cpu rwlock and proceed.
>
> Writer-side:
>    -> for_each_online_cpu(cpu)
>           write_lock(per-cpu rwlock of 'cpu');

Yes, yes, this is clear.

> Also, like Tejun said, one of the important measures for per-cpu rwlocks
> should be that, if a user replaces global rwlocks with percpu rwlocks (for
> performance reasons), he shouldn't suddenly end up in numerous deadlock
> possibilities which never existed before. The replacement should continue
> to remain safe, and perhaps improve the performance.

Sure, I agree.

Oleg.
Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
On 12/10, Srivatsa S. Bhat wrote:
>
> On 12/10/2012 02:43 AM, Oleg Nesterov wrote:
> > Damn, sorry for noise. I missed this part...
> >
> > On 12/10, Srivatsa S. Bhat wrote:
> >>
> >> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
> >>> the latency. And I guess something like kick_all_cpus_sync() is "too
> >>> heavy".
> >>
> >> I hadn't considered that. Thinking of it, I don't think it would help
> >> us.. It won't get rid of the currently running preempt_disable()
> >> sections no?
> >
> > Sure. But (again, this is only my feeling so far) given that
> > get_online_cpus_atomic() does cli/sti,
>
> Ah, that one! Actually, the only reason I do that cli/sti is because,
> potentially, interrupt handlers can be hotplug readers too. So we need to
> protect the portion of the code of get_online_cpus_atomic() which is not
> re-entrant.

Yes, I understand.

> > this can help to implement ensure-the-readers-must-see-the-pending-writer.
> > IOW this might help to implement sync-with-readers.
> >
>
> 2 problems:
>
> 1. It won't help with cases like this:
>
>	preempt_disable()
>	...
>		preempt_disable()
>		...
>		<--- Here
>		...
>		preempt_enable()
>	...
>	preempt_enable()

No, I meant that kick_all_cpus_sync() can be used to synchronize with
cli/sti in get_online_cpus_atomic(), just like synchronize_sched() does in
the code I posted a minute ago.

> 2. Part of the reason we want to get rid of stop_machine() is to avoid
> the latency it induces on _all_ CPUs just to take *one* CPU offline. If
> we use kick_all_cpus_sync(), we get into that territory again: we
> unfairly interrupt every CPU, _even when_ that CPU's existing
> preempt_disable() sections might not actually be hotplug readers! (ie.,
> not bothered about CPU hotplug).

I agree, that is why I said it is "too heavy".

Oleg.
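[ Illustration: what the kick_all_cpus_sync()-based synchronization
  mentioned above might look like on the writer side. This is an
  assumption for illustration, not the patchset's code; writer_pending is
  the illustrative flag from the sketch earlier in the thread.
  kick_all_cpus_sync() IPIs every CPU and waits for the handlers to run,
  and a CPU cannot service the IPI while a reader on it has interrupts
  disabled -- which is how it synchronizes with the cli/sti section in
  get_online_cpus_atomic().

	#include <linux/smp.h>

	static void announce_writer_heavy(void)
	{
		writer_pending = true;
		kick_all_cpus_sync();	/* waits for an IPI through every CPU */
	}

  The cost is exactly the objection raised above: every CPU takes an
  interrupt, whether or not it cares about CPU hotplug. ]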
Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
On 12/10, Srivatsa S. Bhat wrote:
>
> On 12/10/2012 01:52 AM, Oleg Nesterov wrote:
> > On 12/10, Srivatsa S. Bhat wrote:
> >>
> >> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
> >>
> >>> But yes, it is easy to blame somebody else's code ;) And I can't
> >>> suggest something better at least right now. If I understand
> >>> correctly, we can not use, say, synchronize_sched() in _cpu_down() path
> >>
> >> We can't sleep in that code.. so that's a no-go.
> >
> > But we can?
> >
> > Note that I meant _cpu_down(), not get_online_cpus_atomic() or
> > take_cpu_down().
> >
>
> Maybe I'm missing something, but how would it help if we did a
> synchronize_sched() so early (in _cpu_down())? Another bunch of
> preempt_disable() sections could start immediately after our call to
> synchronize_sched() no? How would we deal with that?

Sorry for the confusion. Of course synchronize_sched() alone is not enough.
But we can use it to synchronize with preempt-disabled sections and avoid
the barriers/atomics in the fast-path.

For example,

	bool writer_pending;
	DEFINE_RWLOCK(writer_rwlock);
	DEFINE_PER_CPU(int, reader_ctr);

	void get_online_cpus_atomic(void)
	{
		preempt_disable();

		if (likely(!writer_pending) || __this_cpu_read(reader_ctr)) {
			__this_cpu_inc(reader_ctr);
			return;
		}

		read_lock(&writer_rwlock);
		__this_cpu_inc(reader_ctr);
		read_unlock(&writer_rwlock);
	}

	// lacks release semantics, but we don't care
	void put_online_cpus_atomic(void)
	{
		__this_cpu_dec(reader_ctr);
		preempt_enable();
	}

Now, _cpu_down() does

	writer_pending = true;
	synchronize_sched();

before stop_one_cpu(). When synchronize_sched() returns, we know that
every get_online_cpus_atomic() must see writer_pending == T. And, if any
CPU incremented its reader_ctr, we must see it is not zero.

take_cpu_down() does

	write_lock(&writer_rwlock);

	for_each_online_cpu(cpu) {
		while (per_cpu(reader_ctr, cpu))
			cpu_relax();
	}

and takes the lock.

However, this can lead to the deadlock we already discussed. So
take_cpu_down() should do

 retry:
	write_lock(&writer_rwlock);

	for_each_online_cpu(cpu) {
		if (per_cpu(reader_ctr, cpu)) {
			write_unlock(&writer_rwlock);
			goto retry;
		}
	}

to take the lock. But this is livelockable. However, I do not think it is
possible to avoid the livelock.

Just in case, the code above is only for illustration; perhaps it is not
100% correct and perhaps we can do it better. cpu_hotplug.active_writer is
ignored for simplicity; get/put should check current == active_writer.

Oleg.
Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
On 12/10/2012 02:27 AM, Oleg Nesterov wrote:
> On 12/07, Srivatsa S. Bhat wrote:
>>
>> 4. No deadlock possibilities
>>
>>    Per-cpu locking is not the way to go if we want to have relaxed rules
>>    for lock-ordering. Because, we can end up in circular-locking
>>    dependencies as explained in https://lkml.org/lkml/2012/12/6/290
>
> OK, but this assumes that, contrary to what Steven said, read-write-read
> deadlock is not possible when it comes to rwlock_t.

What I meant is, with a single (global) rwlock, you can't deadlock like
that. But if you used per-cpu rwlocks and we don't implement them properly,
then we can end up in circular locking dependencies like the ones shown
above.

That is, if you take the same example and replace the lock with a global
rwlock, you won't deadlock:

	Readers:

	CPU 0				CPU 1
	------				------

1.	spin_lock(&random_lock);	read_lock(&my_rwlock);

2.	read_lock(&my_rwlock);		spin_lock(&random_lock);

	Writer:

	CPU 2:
	------

	write_lock(&my_rwlock);

Even if the writer does a write_lock() in-between steps 1 and 2 at the
reader-side, nothing will happen. The writer would spin because CPU 1
already got the rwlock for read, and hence both CPU 0 and 1 go ahead. When
they finish, the writer will get the lock and proceed. So, no deadlocks
here.

So, what I was pointing out here was, if somebody replaced this global
rwlock with a "straight-forward" implementation of per-cpu rwlocks, he'll
immediately end up in a circular locking dependency deadlock between the 3
entities, as explained in the link above.

Let me know if my assumptions are incorrect!

> So far I think this is true and we can't deadlock. Steven?
>
> However. If this is true, then compared to preempt_disable/stop_machine,
> livelock is possible. Probably this is fine, we have the same problem
> with get_online_cpus(). But if we can accept this fact I feel we can
> simplify this somehow... Can't prove, only feel ;)
>

Not sure I follow.. Anyway, my point is that we _can't_ implement per-cpu
rwlocks like lglocks and expect it to work in this case. IOW, we can't do:

	Reader-side:
	   -> read_lock() your per-cpu rwlock and proceed.

	Writer-side:
	   -> for_each_online_cpu(cpu)
	          write_lock(per-cpu rwlock of 'cpu');

Also, like Tejun said, one of the important measures for per-cpu rwlocks
should be that, if a user replaces global rwlocks with percpu rwlocks (for
performance reasons), he shouldn't suddenly end up in numerous deadlock
possibilities which never existed before. The replacement should continue
to remain safe, and perhaps improve the performance.

Regards,
Srivatsa S. Bhat
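[ Illustration: to make the "straight-forward" lglock-style percpu rwlock
  concrete, here is a minimal sketch of that deadlock-prone form. Names
  are made up; this is the variant being argued against, not proposed
  code.

	#include <linux/percpu.h>
	#include <linux/preempt.h>
	#include <linux/spinlock.h>

	static DEFINE_PER_CPU(rwlock_t, pcpu_rwlock) =
		__RW_LOCK_UNLOCKED(pcpu_rwlock);

	static void percpu_read_lock(void)
	{
		preempt_disable();
		/* Readers touch only their own CPU's lock: cheap, scalable. */
		read_lock(this_cpu_ptr(&pcpu_rwlock));
	}

	static void percpu_read_unlock(void)
	{
		read_unlock(this_cpu_ptr(&pcpu_rwlock));
		preempt_enable();
	}

	static void percpu_write_lock(void)
	{
		int cpu;

		/*
		 * The writer takes every CPU's lock in turn. While waiting
		 * for CPU 1's lock it already holds CPU 0's -- and a reader
		 * spinning on CPU 0's lock while holding another lock (the
		 * random_lock of the example above) closes the circular
		 * dependency.
		 */
		for_each_possible_cpu(cpu)
			write_lock(&per_cpu(pcpu_rwlock, cpu));
	}

  The reader side is exactly as cheap as desired; all the trouble is
  concentrated in the multi-lock writer. ]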
Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
On 12/10/2012 02:43 AM, Oleg Nesterov wrote:
> Damn, sorry for noise. I missed this part...
>
> On 12/10, Srivatsa S. Bhat wrote:
>>
>> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
>>> the latency. And I guess something like kick_all_cpus_sync() is "too
>>> heavy".
>>
>> I hadn't considered that. Thinking of it, I don't think it would help
>> us.. It won't get rid of the currently running preempt_disable()
>> sections no?
>
> Sure. But (again, this is only my feeling so far) given that
> get_online_cpus_atomic() does cli/sti,

Ah, that one! Actually, the only reason I do that cli/sti is because,
potentially, interrupt handlers can be hotplug readers too. So we need to
protect the portion of the code of get_online_cpus_atomic() which is not
re-entrant. (Which reminds me to try and reduce the length of cli/sti in
that code, if possible).

> this can help to implement ensure-the-readers-must-see-the-pending-writer.
> IOW this might help to implement sync-with-readers.
>

2 problems:

1. It won't help with cases like this:

	preempt_disable()
	...
		preempt_disable()
		...
		<--- Here
		...
		preempt_enable()
	...
	preempt_enable()

If the IPI hits at the point marked above, the IPI is useless, because, at
that point, since we are already in a nested read-side critical section, we
can't switch the synchronization protocol. We need to wait till we start a
fresh non-nested read-side critical section, in order to switch to the
global rwlock. The reason is that preempt_enable() or
put_online_cpus_atomic() can only undo what its predecessor
(preempt_disable()/get_online_cpus_atomic()) did; a sketch of this
constraint follows below.

2. Part of the reason we want to get rid of stop_machine() is to avoid the
latency it induces on _all_ CPUs just to take *one* CPU offline. If we use
kick_all_cpus_sync(), we get into that territory again: we unfairly
interrupt every CPU, _even when_ that CPU's existing preempt_disable()
sections might not actually be hotplug readers! (ie., not bothered about
CPU hotplug).

So, I think whatever synchronization scheme we develop must not induce the
very same problems that stop_machine() had. Otherwise, we can end up going
full-circle and coming back to the same stop_machine() scenario that we
intended to get rid of.

(That's part of the reason why I initially tried to provide that _light()
variant of the reader APIs in the previous version of the patchset, so that
light readers can remain as undisturbed by CPU hotplug as possible, thereby
avoiding indirectly inducing the "stop_machine effect", like I mentioned
here: https://lkml.org/lkml/2012/12/5/369)

Regards,
Srivatsa S. Bhat
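[ Illustration: a sketch of the nesting constraint in point 1 above --
  only the outermost get_online_cpus_atomic() may switch synchronization
  protocols, which a per-cpu nesting counter can enforce. reader_nesting,
  writer_is_pending() and reader_take_global_rwlock() are hypothetical
  names invented for this sketch.

	#include <linux/percpu.h>
	#include <linux/preempt.h>
	#include <linux/types.h>

	static DEFINE_PER_CPU(int, reader_nesting);

	bool writer_is_pending(void);		/* hypothetical */
	void reader_take_global_rwlock(void);	/* hypothetical */

	void get_online_cpus_atomic(void)
	{
		preempt_disable();

		/*
		 * Only a fresh (outermost) read-side critical section may
		 * react to a pending writer; a nested one must keep using
		 * whatever scheme its enclosing section entered with.
		 */
		if (__this_cpu_inc_return(reader_nesting) == 1 &&
		    writer_is_pending())
			reader_take_global_rwlock();
	}

	void put_online_cpus_atomic(void)
	{
		/*
		 * The full version must also remember which scheme the
		 * outermost get used, and undo exactly that when the
		 * nesting count drops back to zero.
		 */
		__this_cpu_dec(reader_nesting);
		preempt_enable();
	}
  ]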
Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
On 12/10/2012 01:52 AM, Oleg Nesterov wrote:
> On 12/10, Srivatsa S. Bhat wrote:
>>
>> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
>>
>>> But yes, it is easy to blame somebody else's code ;) And I can't
>>> suggest something better at least right now. If I understand
>>> correctly, we can not use, say, synchronize_sched() in _cpu_down() path
>>
>> We can't sleep in that code.. so that's a no-go.
>
> But we can?
>
> Note that I meant _cpu_down(), not get_online_cpus_atomic() or
> take_cpu_down().
>

Maybe I'm missing something, but how would it help if we did a
synchronize_sched() so early (in _cpu_down())? Another bunch of
preempt_disable() sections could start immediately after our call to
synchronize_sched() no? How would we deal with that?

What we need to ensure here is that all existing preempt_disable() sections
are over and that *we* (meaning, the cpu offline writer) get to proceed
immediately after that, making all the new readers wait for us. And that is
possible only if we do our 'wait-for-readers' thing very close to our
actual cpu offline operation (which is take_cpu_down).

Moreover, the writer needs to remain stable (non-preemptible) while doing
the wait-for-readers. Else (if the writer himself keeps getting scheduled
in and out of the CPU) I can't see how he can possibly do justice to the
wait.

Also, synchronize_sched() only helps us do the 'wait-for-existing-readers'
logic. What would take care of the 'prevent-new-readers-from-going-ahead'
logic?

To add to it all, synchronize_sched() waits for _all_ preempt_disable()
sections to complete, whereas only a handful of them might actually care
about CPU hotplug. Which is an unnecessary burden for the writer (ie.,
waiting for unrelated readers to complete).

I bet you meant something else. Sorry if I misunderstood it.

Regards,
Srivatsa S. Bhat
Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
Damn, sorry for noise. I missed this part...

On 12/10, Srivatsa S. Bhat wrote:
>
> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
> > the latency. And I guess something like kick_all_cpus_sync() is "too
> > heavy".
>
> I hadn't considered that. Thinking of it, I don't think it would help
> us.. It won't get rid of the currently running preempt_disable()
> sections no?

Sure. But (again, this is only my feeling so far) given that
get_online_cpus_atomic() does cli/sti, this can help to implement
ensure-the-readers-must-see-the-pending-writer. IOW this might help to
implement sync-with-readers.

Oleg.
Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
On 12/07, Srivatsa S. Bhat wrote:
>
> 4. No deadlock possibilities
>
>    Per-cpu locking is not the way to go if we want to have relaxed rules
>    for lock-ordering. Because, we can end up in circular-locking
>    dependencies as explained in https://lkml.org/lkml/2012/12/6/290

OK, but this assumes that, contrary to what Steven said, read-write-read
deadlock is not possible when it comes to rwlock_t. So far I think this is
true and we can't deadlock. Steven?

However. If this is true, then compared to preempt_disable/stop_machine,
livelock is possible. Probably this is fine, we have the same problem with
get_online_cpus(). But if we can accept this fact I feel we can simplify
this somehow... Can't prove, only feel ;)

Oleg.
Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
On 12/10, Srivatsa S. Bhat wrote:
>
> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
>
> > But yes, it is easy to blame somebody else's code ;) And I can't
> > suggest something better at least right now. If I understand
> > correctly, we can not use, say, synchronize_sched() in _cpu_down() path
>
> We can't sleep in that code.. so that's a no-go.

But we can?

Note that I meant _cpu_down(), not get_online_cpus_atomic() or
take_cpu_down().

Oleg.
Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
> On 12/07, Srivatsa S. Bhat wrote:
>>
>> Per-cpu counters can help solve the cache-line bouncing problem. So we
>> actually use the best of both: per-cpu counters (no-waiting) at the
>> reader side in the fast-path, and global rwlocks in the slowpath.
>>
>> [ Fastpath = no writer is active; Slowpath = a writer is active ]
>>
>> IOW, the hotplug readers just increment/decrement their per-cpu
>> refcounts when no writer is active.
>
> Plus LOCK and cli/sti. I do not pretend I really know how bad this is
> performance-wise though. And at first glance this looks overcomplicated.
>

Hehe, I agree ;-) But I couldn't think of any other way to get rid of the
deadlock possibilities, other than using global rwlocks. So I designed a
way in which we can switch between per-cpu counters and global rwlocks
dynamically. Probably there is a smarter way to achieve what we want,
dunno...

> But yes, it is easy to blame somebody else's code ;) And I can't suggest
> something better at least right now. If I understand correctly, we can
> not use, say, synchronize_sched() in _cpu_down() path

We can't sleep in that code.. so that's a no-go.

> , you also want to improve the latency. And I guess something like
> kick_all_cpus_sync() is "too heavy".
>

I hadn't considered that. Thinking of it, I don't think it would help us..
It won't get rid of the currently running preempt_disable() sections no?

> Also. After the quick reading this doesn't look correct, please see
> below.
>
>> +void get_online_cpus_atomic(void)
>> +{
>> +	unsigned int cpu = smp_processor_id();
>> +	unsigned long flags;
>> +
>> +	preempt_disable();
>> +	local_irq_save(flags);
>> +
>> +	if (cpu_hotplug.active_writer == current)
>> +		goto out;
>> +
>> +	smp_rmb(); /* Paired with smp_wmb() in drop_writer_signal() */
>> +
>> +	if (likely(!writer_active(cpu))) {
>
> WINDOW. Suppose that reader_active() == F.
>
>> +		mark_reader_fastpath();
>> +		goto out;
>
> Why can't take_cpu_down() do announce_cpu_offline_begin() +
> sync_all_readers() in between?
>
> Looks like we should increment the counter first, then check
> writer_active().

You are right, I missed the above race-conditions.

> And sync_atomic_reader() needs an rmb between the 2 atomic_read's.
>

OK.

> Or. Again, suppose that reader_active() == F. But is_new_writer() == T.
>
>> +	if (is_new_writer(cpu)) {
>> +		/*
>> +		 * ACK the writer's signal only if this is a fresh read-side
>> +		 * critical section, and not just an extension of a running
>> +		 * (nested) read-side critical section.
>> +		 */
>> +		if (!reader_active(cpu)) {
>> +			ack_writer_signal();
>
> What if take_cpu_down() does announce_cpu_offline_end() right before
> ack_writer_signal()? In this case get_online_cpus_atomic() returns with
> writer_signal == -1. If nothing else this breaks the next
> raise_writer_signal().
>

Oh, yes, this one is wrong too! We need to mark ourselves as an active
reader right in the beginning. And probably change the check to
"reader_nested()" or something like that.

Thanks a lot Oleg!

Regards,
Srivatsa S. Bhat
Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
On 12/07, Srivatsa S. Bhat wrote:
>
> Per-cpu counters can help solve the cache-line bouncing problem. So we
> actually use the best of both: per-cpu counters (no-waiting) at the
> reader side in the fast-path, and global rwlocks in the slowpath.
>
> [ Fastpath = no writer is active; Slowpath = a writer is active ]
>
> IOW, the hotplug readers just increment/decrement their per-cpu
> refcounts when no writer is active.

Plus LOCK and cli/sti. I do not pretend I really know how bad this is
performance-wise though. And at first glance this looks overcomplicated.

But yes, it is easy to blame somebody else's code ;) And I can't suggest
something better at least right now. If I understand correctly, we can not
use, say, synchronize_sched() in _cpu_down() path, you also want to improve
the latency. And I guess something like kick_all_cpus_sync() is "too
heavy".

Also. After the quick reading this doesn't look correct, please see below.

> +void get_online_cpus_atomic(void)
> +{
> +	unsigned int cpu = smp_processor_id();
> +	unsigned long flags;
> +
> +	preempt_disable();
> +	local_irq_save(flags);
> +
> +	if (cpu_hotplug.active_writer == current)
> +		goto out;
> +
> +	smp_rmb(); /* Paired with smp_wmb() in drop_writer_signal() */
> +
> +	if (likely(!writer_active(cpu))) {

WINDOW. Suppose that reader_active() == F.

> +		mark_reader_fastpath();
> +		goto out;

Why can't take_cpu_down() do announce_cpu_offline_begin() +
sync_all_readers() in between?

Looks like we should increment the counter first, then check
writer_active(). And sync_atomic_reader() needs an rmb between the 2
atomic_read's.

Or. Again, suppose that reader_active() == F. But is_new_writer() == T.

> +	if (is_new_writer(cpu)) {
> +		/*
> +		 * ACK the writer's signal only if this is a fresh read-side
> +		 * critical section, and not just an extension of a running
> +		 * (nested) read-side critical section.
> +		 */
> +		if (!reader_active(cpu)) {
> +			ack_writer_signal();

What if take_cpu_down() does announce_cpu_offline_end() right before
ack_writer_signal()? In this case get_online_cpus_atomic() returns with
writer_signal == -1. If nothing else this breaks the next
raise_writer_signal().

Oleg.
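[ Illustration: a sketch of the reordering suggested above -- publish the
  reader's refcount first, then check for a writer, so the writer's sync
  loop cannot miss a reader in the window. reader_refcnt, writer_active()
  and reader_slowpath() stand in for the patch's internals, and the exact
  barrier pairing is an assumption.

	#include <linux/percpu.h>
	#include <linux/preempt.h>
	#include <linux/types.h>

	static DEFINE_PER_CPU(int, reader_refcnt);

	bool writer_active(unsigned int cpu);	/* from the patch */
	void reader_slowpath(void);		/* hypothetical */

	void get_online_cpus_atomic(void)
	{
		unsigned long flags;

		preempt_disable();
		local_irq_save(flags);

		/* 1. Announce ourselves before looking for a writer... */
		__this_cpu_inc(reader_refcnt);

		/* 2. ...with a barrier pairing against the writer's side. */
		smp_mb();

		/* 3. Only now check: a concurrent writer sees our count. */
		if (unlikely(writer_active(smp_processor_id()))) {
			/* Back out of the fastpath; take the slowpath. */
			__this_cpu_dec(reader_refcnt);
			reader_slowpath();
		}

		/* Preemption stays disabled until put_online_cpus_atomic(). */
		local_irq_restore(flags);
	}
  ]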
Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
On 12/08/2012 12:01 AM, Tejun Heo wrote:
> Hello, Srivatsa.
>
> On Fri, Dec 07, 2012 at 11:54:01PM +0530, Srivatsa S. Bhat wrote:
>>> lg_lock doesn't do local nesting and I'm not sure how big a deal that
>>> is as I don't know how many should be converted. But if nesting is an
>>> absolute necessity, it would be much better to implement a generic
>>> rwlock variant (say, lg_rwlock) rather than implementing an unusual
>>> cpuhotplug-specific percpu synchronization construct.
>>
>> To be honest, at a certain point in time while designing this, I did
>> realize that this was getting kinda overly complicated ;-) ... but I
>> wanted to see how this would actually work out when finished and get
>> some feedback on the same, hence I posted it out. But this also proves
>> that we _can_ actually compete with the flexibility of preempt_disable()
>> and still be safe with respect to locking, if we really want to ;-)
>
> I got confused by the comparison to preempt_disable() but you're right
> that a percpu rwlock shouldn't be able to introduce a locking dependency
> which doesn't exist with a non-percpu rwlock. ie. write locking should
> be atomic w.r.t. all readers.

Yep!

> At the simplest, this can be implemented by the writer backing out all
> the way if try-locking any CPU fails and retrying the whole thing. That
> should be correct but has the potential of starving the writer.
>

Exactly! This is what I mentioned yesterday in the link below, and said
that it's not good because of writer starvation ("too wasteful"):
https://lkml.org/lkml/2012/12/6/290

> What we need here is a generic percpu-rwlock. I don't know which exact
> implementation strategy we should choose. Maybe your switching to the
> global rwlock is the right solution. But, at any rate, I think it would
> be best to implement a proper percpu-rwlock and then apply it to CPU
> hotplug. It's actually gonna be pretty fitting as get_online_cpus() is
> being converted to percpu-rwsem. IIUC, Oleg has been working on this for
> a while now. Oleg, what do you think?
>

Hmm, that sounds good.

Regards,
Srivatsa S. Bhat
Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
On 12/07/2012 11:46 PM, Tejun Heo wrote:
> Hello, again.
>
> On Fri, Dec 07, 2012 at 09:57:24AM -0800, Tejun Heo wrote:
>> possible. Also, I think the right approach would be auditing each
>> get_online_cpus_atomic() callsite and figuring out the proper locking
>> order rather than implementing a construct this unusual, especially as
>> hunting down the incorrect cases shouldn't be difficult given proper
>> lockdep annotation.
>
> On second look, it looks like you're implementing proper percpu_rwlock
> semantics

Ah, nice! I didn't realize that I was actually doing what I intended to
avoid! ;-)

Looking at the implementation of lglocks, and going by Oleg's earlier
comment that we just need to replace spinlock_t with rwlock_t in them to
get percpu_rwlocks, I was horrified at the kinds of circular locking
dependencies that they would be prone to, unless used carefully. So I
devised this scheme to be safe, while still having relaxed rules. But if
*this* is what percpu_rwlocks should ideally look like, then great! :-)

> as readers aren't supposed to induce circular dependency directly.

Yep, in this scheme, nobody will end up in a circular dependency.

> Can you please work with Oleg to implement proper percpu-rwlock and use
> that for CPU hotplug rather than implementing it inside CPU hotplug?
>

Sure, I'd be more than happy to!

Regards,
Srivatsa S. Bhat
Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
Hello, Srivatsa.

On Fri, Dec 07, 2012 at 11:54:01PM +0530, Srivatsa S. Bhat wrote:
> > lg_lock doesn't do local nesting and I'm not sure how big a deal that
> > is as I don't know how many should be converted. But if nesting is an
> > absolute necessity, it would be much better to implement a generic
> > rwlock variant (say, lg_rwlock) rather than implementing an unusual
> > cpuhotplug-specific percpu synchronization construct.
>
> To be honest, at a certain point in time while designing this, I did
> realize that this was getting kinda overly complicated ;-) ... but I
> wanted to see how this would actually work out when finished and get
> some feedback on the same, hence I posted it out. But this also proves
> that we _can_ actually compete with the flexibility of preempt_disable()
> and still be safe with respect to locking, if we really want to ;-)

I got confused by the comparison to preempt_disable() but you're right
that a percpu rwlock shouldn't be able to introduce a locking dependency
which doesn't exist with a non-percpu rwlock. ie. write locking should be
atomic w.r.t. all readers. At the simplest, this can be implemented by the
writer backing out all the way if try-locking any CPU fails and retrying
the whole thing. That should be correct but has the potential of starving
the writer.

What we need here is a generic percpu-rwlock. I don't know which exact
implementation strategy we should choose. Maybe your switching to the
global rwlock is the right solution. But, at any rate, I think it would be
best to implement a proper percpu-rwlock and then apply it to CPU hotplug.
It's actually gonna be pretty fitting as get_online_cpus() is being
converted to percpu-rwsem. IIUC, Oleg has been working on this for a while
now. Oleg, what do you think?

Thanks.

--
tejun
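[ Illustration: a minimal sketch of the "back out all the way and retry"
  writer described above, assuming a per-CPU rwlock_t array like the
  pcpu_rwlock sketched earlier in the thread. Correct with respect to
  readers, but, as noted, it can starve the writer.

	static void percpu_write_lock(void)
	{
		int cpu, undo;

	retry:
		for_each_possible_cpu(cpu) {
			if (write_trylock(&per_cpu(pcpu_rwlock, cpu)))
				continue;

			/* Contended: release everything taken so far... */
			for_each_possible_cpu(undo) {
				if (undo == cpu)
					break;
				write_unlock(&per_cpu(pcpu_rwlock, undo));
			}

			/* ...and retry from scratch (starvation-prone). */
			cpu_relax();
			goto retry;
		}
	}
  ]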
Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
On 12/07/2012 11:27 PM, Tejun Heo wrote:
> On Fri, Dec 07, 2012 at 11:08:13PM +0530, Srivatsa S. Bhat wrote:
>> 4. No deadlock possibilities
>>
>>    Per-cpu locking is not the way to go if we want to have relaxed rules
>>    for lock-ordering. Because, we can end up in circular-locking
>>    dependencies as explained in https://lkml.org/lkml/2012/12/6/290
>>
>>    So, avoid per-cpu locking schemes (per-cpu locks/per-cpu atomic
>>    counters with spin-on-contention etc) as much as possible.
>
> I really can't say I like this approach. percpu locking is very tricky
> and difficult to get right, and we should try our best to avoid
> implementing subsystem-specific ones as much as possible. Also, I think
> the right approach would be auditing each get_online_cpus_atomic()
> callsite and figuring out the proper locking order rather than
> implementing a construct this unusual, especially as hunting down the
> incorrect cases shouldn't be difficult given proper lockdep annotation.
>
> lg_lock doesn't do local nesting and I'm not sure how big a deal that is
> as I don't know how many should be converted. But if nesting is an
> absolute necessity, it would be much better to implement a generic
> rwlock variant (say, lg_rwlock) rather than implementing an unusual
> cpuhotplug-specific percpu synchronization construct.
>

To be honest, at a certain point in time while designing this, I did
realize that this was getting kinda overly complicated ;-) ... but I
wanted to see how this would actually work out when finished and get some
feedback on the same, hence I posted it out. But this also proves that we
_can_ actually compete with the flexibility of preempt_disable() and still
be safe with respect to locking, if we really want to ;-)

Regards,
Srivatsa S. Bhat
Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
Hello, again.

On Fri, Dec 07, 2012 at 09:57:24AM -0800, Tejun Heo wrote:
> possible. Also, I think the right approach would be auditing each
> get_online_cpus_atomic() callsite and figuring out the proper locking
> order rather than implementing a construct this unusual, especially as
> hunting down the incorrect cases shouldn't be difficult given proper
> lockdep annotation.

On second look, it looks like you're implementing proper percpu_rwlock
semantics, as readers aren't supposed to induce circular dependency
directly. Can you please work with Oleg to implement proper percpu-rwlock
and use that for CPU hotplug rather than implementing it inside CPU
hotplug?

Thanks.

--
tejun
Re: [RFC PATCH v3 1/9] CPU hotplug: Provide APIs to prevent CPU offline from atomic context
On Fri, Dec 07, 2012 at 11:08:13PM +0530, Srivatsa S. Bhat wrote:
> 4. No deadlock possibilities
>
>    Per-cpu locking is not the way to go if we want to have relaxed rules
>    for lock-ordering. Because, we can end up in circular-locking
>    dependencies as explained in https://lkml.org/lkml/2012/12/6/290
>
>    So, avoid per-cpu locking schemes (per-cpu locks/per-cpu atomic
>    counters with spin-on-contention etc) as much as possible.

I really can't say I like this approach. percpu locking is very tricky and
difficult to get right, and we should try our best to avoid implementing
subsystem-specific ones as much as possible. Also, I think the right
approach would be auditing each get_online_cpus_atomic() callsite and
figuring out the proper locking order rather than implementing a construct
this unusual, especially as hunting down the incorrect cases shouldn't be
difficult given proper lockdep annotation.

lg_lock doesn't do local nesting and I'm not sure how big a deal that is
as I don't know how many should be converted. But if nesting is an
absolute necessity, it would be much better to implement a generic rwlock
variant (say, lg_rwlock) rather than implementing an unusual
cpuhotplug-specific percpu synchronization construct.

Thanks.

--
tejun
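[ Illustration: a hypothetical lg_rwlock, mirroring the lglock API of the
  time with the per-CPU spinlock swapped for an rwlock_t. This is a sketch
  of the idea, not an existing kernel interface, and it inherits the
  writer-side deadlock and starvation questions debated above.

	#include <linux/percpu.h>
	#include <linux/preempt.h>
	#include <linux/spinlock.h>

	struct lgrwlock {
		rwlock_t __percpu *lock;
	};

	#define DEFINE_LGRWLOCK(name)					\
		static DEFINE_PER_CPU(rwlock_t, name ## _lock) =	\
			__RW_LOCK_UNLOCKED(name ## _lock);		\
		static struct lgrwlock name = { .lock = &name ## _lock }

	static void lg_read_lock(struct lgrwlock *lg)
	{
		preempt_disable();
		read_lock(this_cpu_ptr(lg->lock)); /* local lock only */
	}

	static void lg_read_unlock(struct lgrwlock *lg)
	{
		read_unlock(this_cpu_ptr(lg->lock));
		preempt_enable();
	}

	static void lg_write_lock(struct lgrwlock *lg)
	{
		int cpu;

		/* The writer locks out readers on every CPU, lglock-style. */
		for_each_possible_cpu(cpu)
			write_lock(per_cpu_ptr(lg->lock, cpu));
	}

	static void lg_write_unlock(struct lgrwlock *lg)
	{
		int cpu;

		for_each_possible_cpu(cpu)
			write_unlock(per_cpu_ptr(lg->lock, cpu));
	}

  Since rwlock_t's read side is recursive, this variant would also give
  the local nesting that lg_lock lacks. ]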