Re: [PATCH] percpu-rwsem: use barrier in unlock path
On Thu, Oct 18, 2012 at 9:00 AM, Mikulas Patocka wrote:
>
> What is the procedure for making changes that require support of
> architectures? It is trivial to make a patch that moves this into
> arch-specific includes, the problem is that the patch breaks all the
> architectures - I wrote support for x86, sparc, parisc, alpha (I can test
> those) but not the others. We'd need to add it to everybody.

It shouldn't need per-architecture testing - since "smp_mb()" is always
safe. So we could just make all architectures default to that, and then
for x86 (and potentially others that have cheaper models for
release-consistency) just do the optimized one.

We *could* also simply do something like

    #ifndef smp_release_before_store
    #define smp_release_before_store() smp_mb()
    #endif

and basically make the rule be that only architectures that have a cheaper
one need to define it at all.

That may be the correct short-term fix, since there initially would be
only a single user.

             Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] percpu-rwsem: use barrier in unlock path
This patch looks sensible.

I'd apply either this or my previous patch that adds synchronize_rcu() to
percpu_up_write.

This patch avoids the memory barrier on non-x86 cpus in percpu_up_read, so
it is faster than the previous approach.

Mikulas

On Thu, 18 Oct 2012, Lai Jiangshan wrote:

> ---
>
> a very draft example of paired-mb()s is here:
>
> diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
> index cf80f7e..84a93c0 100644
> --- a/include/linux/percpu-rwsem.h
> +++ b/include/linux/percpu-rwsem.h
> @@ -12,6 +12,14 @@ struct percpu_rw_semaphore {
> 	struct mutex mtx;
> };
>
> +#if 1
> +#define light_mb() barrier()
> +#define heavy_mb() synchronize_sched()
> +#else
> +#define light_mb() smp_mb()
> +#define heavy_mb() smp_mb()
> +#endif
> +
> static inline void percpu_down_read(struct percpu_rw_semaphore *p)
> {
> 	rcu_read_lock();
> @@ -24,22 +32,12 @@ static inline void percpu_down_read(struct percpu_rw_semaphore *p)
> 	}
> 	this_cpu_inc(*p->counters);
> 	rcu_read_unlock();
> +	light_mb(); /* A, between read of p->locked and read of data, paired with D */
> }
>
> static inline void percpu_up_read(struct percpu_rw_semaphore *p)
> {
> -	/*
> -	 * On X86, write operation in this_cpu_dec serves as a memory unlock
> -	 * barrier (i.e. memory accesses may be moved before the write, but
> -	 * no memory accesses are moved past the write).
> -	 * On other architectures this may not be the case, so we need smp_mb()
> -	 * there.
> -	 */
> -#if defined(CONFIG_X86) && (!defined(CONFIG_X86_PPRO_FENCE) && !defined(CONFIG_X86_OOSTORE))
> -	barrier();
> -#else
> -	smp_mb();
> -#endif
> +	light_mb(); /* B, between read of the data and write to p->counter, paired with C */
> 	this_cpu_dec(*p->counters);
> }
>
> @@ -61,11 +59,12 @@ static inline void percpu_down_write(struct percpu_rw_semaphore *p)
> 	synchronize_rcu();
> 	while (__percpu_count(p->counters))
> 		msleep(1);
> -	smp_rmb(); /* paired with smp_mb() in percpu_sem_up_read() */
> +	heavy_mb(); /* C, between read of p->counter and write to data, paired with B */
> }
>
> static inline void percpu_up_write(struct percpu_rw_semaphore *p)
> {
> +	heavy_mb(); /* D, between write to data and write to p->locked, paired with A */
> 	p->locked = false;
> 	mutex_unlock(&p->mtx);
> }
Re: [PATCH] percpu-rwsem: use barrier in unlock path
On Tue, 16 Oct 2012, Linus Torvalds wrote:

> [ Architecture people, note the potential new SMP barrier! ]
>
> On Tue, Oct 16, 2012 at 4:30 PM, Mikulas Patocka wrote:
> > +	/*
> > +	 * The lock is considered unlocked when p->locked is set to false.
> > +	 * Use a barrier to prevent reordering of operations around p->locked.
> > +	 */
> > +#if defined(CONFIG_X86) && (!defined(CONFIG_X86_PPRO_FENCE) && !defined(CONFIG_X86_OOSTORE))
> > +	barrier();
> > +#else
> > +	smp_mb();
> > +#endif
> > 	p->locked = false;
>
> Ugh. The #if is too ugly to live.

One instance of this is already present in the code. I suggest that you
drop this patch and use the synchronize_rcu() one that I have just posted.

But another instance of this "#if defined(CONFIG_X86) && ..." is already
there in percpu_up_read.

What is the procedure for making changes that require support of
architectures? It is trivial to make a patch that moves this into
arch-specific includes; the problem is that the patch breaks all the
architectures - I wrote support for x86, sparc, parisc, alpha (I can test
those) but not the others.

Are you going to apply this patch, break the majority of architectures and
wait until architecture maintainers fix it? Or is there some arch-specific
git tree where the patch should be applied and where the maintainers fix
things?

> This is a classic case of "people who write their own serialization
> primitives invariably get them wrong". And this fix is just horrible,
> and code like this should not be allowed.
>
> I suspect we could make the above x86-specific optimization be valid
> by introducing a new barrier, called "smp_release_before_store()" or
> something like that, which on x86 just happens to be a no-op (but
> still a compiler barrier). That's because earlier reads will not pass
> later stores, and stores are viewed in program order, so all x86
> stores have "release consistency" modulo the theoretical PPro bugs
> (that I don't think we actually ever saw in practice).
>
> But it is possible that that is true on other architectures too, so
> your hand-crafted x86-specific #if is not just ugly, it's liable to
> get worse.
>
> The alternative is to just say that we should use "smp_mb()"
> unconditionally, depending on how critical this code actually ends up
> being.
>
> Added linux-arch in case architecture maintainers have comments on the
> "smp_release_before_store()" thing. It would be kind of similar to the
> odd "smp_read_barrier_depends()", except it would normally be a full
> memory barrier, except on architectures where a weaker barrier might
> be sufficient.
>
> I suspect there may be many architectures where a "smp_wmb()" is
> sufficient for this case, for the simple reason that no sane
> microarchitecture would *ever* move earlier reads down past a later
> write,

Alpha can move reads down past writes (if the read is not in cache and the
write hits the cache, it may make sense to do this reordering).

> so release consistency really only needs the local writes to be
> ordered, not the full memory ordering.
>
> Arch people?
>
> The more optimal solution may be to mark the store *itself* to be
> "store with release consistency", which on x86 would be a regular
> store (with the compiler barrier), but on other architectures may be a
> special memory operation. On architectures with
> release/acquire-consistency, there's not a separate barrier before the
> store; the store instruction itself is done with special semantics. So
> maybe the right thing to do is
>
>    #define smp_release_consistency_store(val, ptr) ...
>
> where on x86, the implementation would be a simple
>
>    do { barrier(); *(ptr)=(val); } while (0)

smp_release_consistency_store doesn't work. In
include/linux/percpu-rwsem.h it is required that
"this_cpu_dec(*p->counters);" works as an unlock barrier. So we could do
either:

	smp_mb();
	this_cpu_dec(*p->counters);

- generates a pointless barrier on x86;

	#if defined(CONFIG_X86) && (!defined(CONFIG_X86_PPRO_FENCE) && !defined(CONFIG_X86_OOSTORE))
	barrier();
	#else
	smp_mb();
	#endif
	this_cpu_dec(*p->counters);

- removes the barrier on the most common architecture (x86), but the code
looks dirty;

	smp_release_before_store();
	this_cpu_dec(*p->counters);

- the code is clean, but the downside is that it breaks architectures that
don't have smp_release_before_store().

> but on other architectures it might be an inline asm with the required
> magic store-with-release instruction.
>
> How important is this code sequence? Is the "percpu_up_write()"
> function really so critical that we can't have an extra memory
> barrier?

percpu_up_write() is not critical. But percpu_up_read() is critical and it
should be as fast as possible.

Mikulas

> Or do people perhaps see *other* places where
> release-consistency-stores might be worth doing?
>
> But in no event do I want to see that butt-ugly #if statement.
Re: [PATCH] percpu-rwsem: use barrier in unlock path
On Thu, 18 Oct 2012, Steven Rostedt wrote:

> On Thu, 2012-10-18 at 10:18 +0800, Lai Jiangshan wrote:
> > >
> > > Looking at the patch, you are correct. The read side doesn't need the
> > > memory barrier as the worse thing that will happen is that it sees the
> > > locked = false, and will just grab the mutex unnecessarily.
> >
> > -----
> > A memory barrier can be added iff these two things are known:
> > 1) which reordering it disables (between what and what).
> > 2) which corresponding mb() it pairs with.
>
> OK, I was just looking at the protection and actions of the locked flag,
> but I see what you are saying with the data itself.
>
> > You tried to add a mb() in percpu_up_write(); OK, I know it disables the
> > reordering between the writes to the protected data and the statement
> > "p->locked = false", but I can't find the corresponding mb() that it
> > pairs with.
> >
> > percpu_down_read()			writes to the data
> > 	the cpu caches/prefetches	writes to the data
> > 	the data, which is chaos	writes to the data
> > 					percpu_up_write()
> > 						mb()
> > 						p->locked = false;
> > 	unlikely(p->locked)
> > 		the cpu sees p->locked == false,
> > 		doesn't discard the cached/prefetched data
> > 	this_cpu_inc(*p->counters);
> > 	the code that read-accesses the data
> > 	*and we use the chaos data*
> >
> > So you need to add a mb() after "unlikely(p->locked)".
>
> Does it need a full mb() or could it be just a rmb()? The down_read I
> wouldn't think would need to protect against stores, would it? The
> memory barrier should probably go in front of the unlikely() too. The
> write to p->counters is handled by the synchronized sched, and adding a
> rmb() in front of the unlikely check would keep prefetched data from
> passing this barrier.
>
> This is a perfect example why this primitive should be vetted outside of
> mainline before it gets merged.
>
> -- Steve

If we do synchronize_rcu() in percpu_up_write, we don't need a barrier in
percpu_down_read(). So I would do that.

Mikulas
Re: [PATCH] percpu-rwsem: use barrier in unlock path
On Wed, 17 Oct 2012, Steven Rostedt wrote:

> On Wed, Oct 17, 2012 at 11:07:21AM -0400, Mikulas Patocka wrote:
> > >
> > > Even if the previous patch is applied, percpu_down_read() still
> > > needs mb() to pair with it.
> >
> > percpu_down_read uses rcu_read_lock which should guarantee that memory
> > accesses don't escape in front of a rcu-protected section.
>
> You do realize that rcu_read_lock() does nothing more than a barrier(),
> right?
>
> Paul worked really hard to get rcu_read_lock() to not call HW barriers.
>
> > If rcu_read_unlock has only an unlock barrier and not a full barrier,
> > memory accesses could be moved in front of rcu_read_unlock and reordered
> > with this_cpu_inc(*p->counters), but it doesn't matter because
> > percpu_down_write does synchronize_rcu(), so it never sees these accesses
> > halfway through.
>
> Looking at the patch, you are correct. The read side doesn't need the
> memory barrier as the worse thing that will happen is that it sees the
> locked = false, and will just grab the mutex unnecessarily.

It wasn't correct. CPU 1 is holding the write lock. CPU 2 could get to
percpu_down_read past rcu_read_lock and prefetch some data that are
accessed after percpu_down_read. CPU 1 goes into percpu_up_write(), calls
a barrier, p->locked = false; and mutex_unlock(&p->mtx); CPU 2 now sees
p->locked == false, so it goes through percpu_down_read. It exits
percpu_down_read and uses the invalid prefetched data.

It could be fixed by using synchronize_rcu(); in percpu_up_write, I sent a
patch for that.

Mikulas
Re: [PATCH] percpu-rwsem: use barrier in unlock path
On Thu, 18 Oct 2012, Lai Jiangshan wrote:

> On 10/18/2012 04:28 AM, Steven Rostedt wrote:
> > On Wed, Oct 17, 2012 at 11:07:21AM -0400, Mikulas Patocka wrote:
> >>> Even if the previous patch is applied, percpu_down_read() still
> >>> needs mb() to pair with it.
> >>
> >> percpu_down_read uses rcu_read_lock which should guarantee that memory
> >> accesses don't escape in front of a rcu-protected section.
> >
> > You do realize that rcu_read_lock() does nothing more than a barrier(),
> > right?
> >
> > Paul worked really hard to get rcu_read_lock() to not call HW barriers.
> >
> >> If rcu_read_unlock has only an unlock barrier and not a full barrier,
> >> memory accesses could be moved in front of rcu_read_unlock and reordered
> >> with this_cpu_inc(*p->counters), but it doesn't matter because
> >> percpu_down_write does synchronize_rcu(), so it never sees these accesses
> >> halfway through.
> >
> > Looking at the patch, you are correct. The read side doesn't need the
> > memory barrier as the worse thing that will happen is that it sees the
> > locked = false, and will just grab the mutex unnecessarily.
>
> A memory barrier can be added iff these two things are known:
> 1) which reordering it disables (between what and what).
> 2) which corresponding mb() it pairs with.
>
> You tried to add a mb() in percpu_up_write(); OK, I know it disables the
> reordering between the writes to the protected data and the statement
> "p->locked = false", but I can't find the corresponding mb() that it
> pairs with.

Or alternatively, instead of a barrier, we can do this.

Mikulas

---

percpu-rwsem: use barrier in unlock path

The lock is considered unlocked when p->locked is set to false.
Use a barrier to prevent reordering of operations around p->locked.

Signed-off-by: Mikulas Patocka <mpato...@redhat.com>

---
 include/linux/percpu-rwsem.h |    6 ++++++
 1 file changed, 6 insertions(+)

Index: linux-3.6.2-fast/include/linux/percpu-rwsem.h
===================================================================
--- linux-3.6.2-fast.orig/include/linux/percpu-rwsem.h	2012-10-17 20:48:40.0 +0200
+++ linux-3.6.2-fast/include/linux/percpu-rwsem.h	2012-10-18 17:19:24.0 +0200
@@ -66,6 +66,12 @@ static inline void percpu_down_write(str
 
 static inline void percpu_up_write(struct percpu_rw_semaphore *p)
 {
+	/*
+	 * Make sure that other processes that are in rcu section and that
+	 * may have seen partially modified state exit the rcu section and
+	 * try to grab the mutex.
+	 */
+	synchronize_rcu();
 	p->locked = false;
 	mutex_unlock(&p->mtx);
 }
Re: [PATCH] percpu-rwsem: use barrier in unlock path
On Thu, 2012-10-18 at 10:18 +0800, Lai Jiangshan wrote:

> > Looking at the patch, you are correct. The read side doesn't need the
> > memory barrier as the worse thing that will happen is that it sees the
> > locked = false, and will just grab the mutex unnecessarily.
>
> -----
> A memory barrier can be added iff these two things are known:
> 1) which reordering it disables (between what and what).
> 2) which corresponding mb() it pairs with.

OK, I was just looking at the protection and actions of the locked flag,
but I see what you are saying with the data itself.

> You tried to add a mb() in percpu_up_write(); OK, I know it disables the
> reordering between the writes to the protected data and the statement
> "p->locked = false", but I can't find the corresponding mb() that it
> pairs with.
>
> percpu_down_read()			writes to the data
> 	the cpu caches/prefetches	writes to the data
> 	the data, which is chaos	writes to the data
> 					percpu_up_write()
> 						mb()
> 						p->locked = false;
> 	unlikely(p->locked)
> 		the cpu sees p->locked == false,
> 		doesn't discard the cached/prefetched data
> 	this_cpu_inc(*p->counters);
> 	the code that read-accesses the data
> 	*and we use the chaos data*
>
> So you need to add a mb() after "unlikely(p->locked)".

Does it need a full mb() or could it be just a rmb()? The down_read I
wouldn't think would need to protect against stores, would it? The memory
barrier should probably go in front of the unlikely() too. The write to
p->counters is handled by the synchronized sched, and adding a rmb() in
front of the unlikely check would keep prefetched data from passing this
barrier.

This is a perfect example why this primitive should be vetted outside of
mainline before it gets merged.

-- Steve
Re: [PATCH] percpu-rwsem: use barrier in unlock path
On 10/18/2012 04:28 AM, Steven Rostedt wrote:
> On Wed, Oct 17, 2012 at 11:07:21AM -0400, Mikulas Patocka wrote:
> > > Even if the previous patch is applied, percpu_down_read() still
> > > needs a mb() to pair with it.
> >
> > percpu_down_read uses rcu_read_lock, which should guarantee that memory
> > accesses don't escape in front of a rcu-protected section.
>
> You do realize that rcu_read_lock() does nothing more than a barrier(),
> right?
>
> Paul worked really hard to get rcu_read_lock() to not call HW barriers.
>
> > If rcu_read_unlock has only an unlock barrier and not a full barrier,
> > memory accesses could be moved in front of rcu_read_unlock and reordered
> > with this_cpu_inc(*p->counters), but it doesn't matter because
> > percpu_down_write does synchronize_rcu(), so it never sees these accesses
> > halfway through.
>
> Looking at the patch, you are correct. The read side doesn't need the
> memory barrier as the worst thing that will happen is that it sees the
> locked = false, and will just grab the mutex unnecessarily.

A memory barrier can be added iff these two things are known:
1) it disables the disordering between what and what.
2) what is the corresponding mb() that it pairs with.

You tried to add a mb() in percpu_up_write(). OK, I know it disables the
disordering between the writes to the protected data and the statement
"p->locked = false", but I can't find the corresponding mb() that it
pairs with:

percpu_down_read()                       writes to the data
    the cpu caches/prefetches the data   writes to the data
    which is chaos                       writes to the data
                                         percpu_up_write()
                                             mb()
                                             p->locked = false;
unlikely(p->locked)
    the cpu sees p->locked == false and
    doesn't discard the cached/prefetched data
this_cpu_inc(*p->counters);
the code of read-access to the data
*and we use the chaos data*

So you need to add a mb() after "unlikely(p->locked)".

The RCU you use doesn't protect any data. It protects the code of the
fast path:

	unlikely(p->locked);
	this_cpu_inc(*p->counters);

and synchronize_rcu() ensures all previous fast paths have fully finished
"this_cpu_inc(*p->counters);". It doesn't protect other code/data; if you
want to protect other code or other data, please add more
synchronizations or mb()s.

---

I extremely hate a synchronization that protects code instead of data,
but sometimes I also have to do it.

---

A very draft example of paired mb()s is here:

diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
index cf80f7e..84a93c0 100644
--- a/include/linux/percpu-rwsem.h
+++ b/include/linux/percpu-rwsem.h
@@ -12,6 +12,14 @@ struct percpu_rw_semaphore {
 	struct mutex mtx;
 };
 
+#if 1
+#define light_mb()	barrier()
+#define heavy_mb()	synchronize_sched()
+#else
+#define light_mb()	smp_mb()
+#define heavy_mb()	smp_mb()
+#endif
+
 static inline void percpu_down_read(struct percpu_rw_semaphore *p)
 {
 	rcu_read_lock();
@@ -24,22 +32,12 @@ static inline void percpu_down_read(struct percpu_rw_semaphore *p)
 	}
 	this_cpu_inc(*p->counters);
 	rcu_read_unlock();
+	light_mb(); /* A, between read of p->locked and read of data, paired with D */
 }
 
 static inline void percpu_up_read(struct percpu_rw_semaphore *p)
 {
-	/*
-	 * On X86, write operation in this_cpu_dec serves as a memory unlock
-	 * barrier (i.e. memory accesses may be moved before the write, but
-	 * no memory accesses are moved past the write).
-	 * On other architectures this may not be the case, so we need smp_mb()
-	 * there.
-	 */
-#if defined(CONFIG_X86) && (!defined(CONFIG_X86_PPRO_FENCE) && !defined(CONFIG_X86_OOSTORE))
-	barrier();
-#else
-	smp_mb();
-#endif
+	light_mb(); /* B, between read of the data and write to p->counter, paired with C */
 	this_cpu_dec(*p->counters);
 }
 
@@ -61,11 +59,12 @@ static inline void percpu_down_write(struct percpu_rw_semaphore *p)
 	synchronize_rcu();
 	while (__percpu_count(p->counters))
 		msleep(1);
-	smp_rmb(); /* paired with smp_mb() in percpu_sem_up_read() */
+	heavy_mb(); /* C, between read of p->counter and write to data, paired with B */
 }
 
 static inline void percpu_up_write(struct percpu_rw_semaphore *p)
 {
+	heavy_mb(); /* D, between write to data and write to p->locked, paired with A */
 	p->locked = false;
 	mutex_unlock(&p->mtx);
 }
Re: [PATCH] percpu-rwsem: use barrier in unlock path
On Wed, Oct 17, 2012 at 11:07:21AM -0400, Mikulas Patocka wrote:
> > Even if the previous patch is applied, percpu_down_read() still
> > needs a mb() to pair with it.
>
> percpu_down_read uses rcu_read_lock, which should guarantee that memory
> accesses don't escape in front of a rcu-protected section.

You do realize that rcu_read_lock() does nothing more than a barrier(),
right?

Paul worked really hard to get rcu_read_lock() to not call HW barriers.

> If rcu_read_unlock has only an unlock barrier and not a full barrier,
> memory accesses could be moved in front of rcu_read_unlock and reordered
> with this_cpu_inc(*p->counters), but it doesn't matter because
> percpu_down_write does synchronize_rcu(), so it never sees these accesses
> halfway through.

Looking at the patch, you are correct. The read side doesn't need the
memory barrier as the worst thing that will happen is that it sees the
locked = false, and will just grab the mutex unnecessarily.

> > I suggest any new synchronization should stay in -tip for 2 or more
> > cycles before being merged to mainline.
>
> But the bug that this synchronization is fixing is quite serious (it
> causes random crashes when the block size is being changed; the crash
> happens regularly at multiple important business sites), so it must be
> fixed soon and not wait half a year.

I don't think Lai was suggesting to wait on this fix, but instead to
totally rip out the percpu_rwsems and work on them some more, and then
re-introduce them in half a year.

-- Steve
Re: [PATCH] percpu-rwsem: use barrier in unlock path
Hi

On Wed, 17 Oct 2012, Lai Jiangshan wrote:

> On 10/17/2012 10:23 AM, Linus Torvalds wrote:
> > [ Architecture people, note the potential new SMP barrier! ]
> >
> > On Tue, Oct 16, 2012 at 4:30 PM, Mikulas Patocka wrote:
> >> +	/*
> >> +	 * The lock is considered unlocked when p->locked is set to false.
> >> +	 * Use a barrier to prevent reordering of operations around p->locked.
> >> +	 */
> >> +#if defined(CONFIG_X86) && (!defined(CONFIG_X86_PPRO_FENCE) && !defined(CONFIG_X86_OOSTORE))
> >> +	barrier();
> >> +#else
> >> +	smp_mb();
> >> +#endif
> >> 	p->locked = false;
> >
> > Ugh. The #if is too ugly to live.
>
> Even if the previous patch is applied, percpu_down_read() still needs a
> mb() to pair with it.

percpu_down_read uses rcu_read_lock, which should guarantee that memory
accesses don't escape in front of a rcu-protected section.

If rcu_read_unlock has only an unlock barrier and not a full barrier,
memory accesses could be moved in front of rcu_read_unlock and reordered
with this_cpu_inc(*p->counters), but it doesn't matter because
percpu_down_write does synchronize_rcu(), so it never sees these accesses
halfway through.

> > This is a classic case of "people who write their own serialization
> > primitives invariably get them wrong". And this fix is just horrible,
> > and code like this should not be allowed.
>
> One of the biggest problems with 62ac665ff9fc07497ca524bd20d6a96893d11071
> is that it was merged without Acked-bys or Reviewed-bys from Paul or
> Peter or someone else who is an expert on synchronization/arch memory
> models.
>
> I suggest any new synchronization should stay in -tip for 2 or more
> cycles before being merged to mainline.

But the bug that this synchronization is fixing is quite serious (it
causes random crashes when the block size is being changed; the crash
happens regularly at multiple important business sites), so it must be
fixed soon and not wait half a year.

> Thanks,
> Lai

Mikulas
Re: [PATCH] percpu-rwsem: use barrier in unlock path
> later stores, and stores are viewed in program order, so all x86
> stores have "release consistency" modulo the theoretical PPro bugs
> (that I don't think we actually ever saw in practice).

We did see them for real. The PPro is on its way out however, so I hardly
think we care about optimising for that case.

Alan
Re: [PATCH] percpu-rwsem: use barrier in unlock path
On 10/17/2012 10:23 AM, Linus Torvalds wrote:
> [ Architecture people, note the potential new SMP barrier! ]
>
> On Tue, Oct 16, 2012 at 4:30 PM, Mikulas Patocka wrote:
>> +	/*
>> +	 * The lock is considered unlocked when p->locked is set to false.
>> +	 * Use a barrier to prevent reordering of operations around p->locked.
>> +	 */
>> +#if defined(CONFIG_X86) && (!defined(CONFIG_X86_PPRO_FENCE) && !defined(CONFIG_X86_OOSTORE))
>> +	barrier();
>> +#else
>> +	smp_mb();
>> +#endif
>> 	p->locked = false;
>
> Ugh. The #if is too ugly to live.

Even if the previous patch is applied, percpu_down_read() still needs a
mb() to pair with it.

> This is a classic case of "people who write their own serialization
> primitives invariably get them wrong". And this fix is just horrible,
> and code like this should not be allowed.

One of the biggest problems with 62ac665ff9fc07497ca524bd20d6a96893d11071
is that it was merged without Acked-bys or Reviewed-bys from Paul or
Peter or someone else who is an expert on synchronization/arch memory
models.

I suggest any new synchronization should stay in -tip for 2 or more
cycles before being merged to mainline.

Thanks,
Lai
Re: [PATCH] percpu-rwsem: use barrier in unlock path
[ Architecture people, note the potential new SMP barrier! ]

On Tue, Oct 16, 2012 at 4:30 PM, Mikulas Patocka wrote:
> +	/*
> +	 * The lock is considered unlocked when p->locked is set to false.
> +	 * Use a barrier to prevent reordering of operations around p->locked.
> +	 */
> +#if defined(CONFIG_X86) && (!defined(CONFIG_X86_PPRO_FENCE) && !defined(CONFIG_X86_OOSTORE))
> +	barrier();
> +#else
> +	smp_mb();
> +#endif
> 	p->locked = false;

Ugh. The #if is too ugly to live.

This is a classic case of "people who write their own serialization
primitives invariably get them wrong". And this fix is just horrible, and
code like this should not be allowed.

I suspect we could make the above x86-specific optimization be valid by
introducing a new barrier, called "smp_release_before_store()" or
something like that, which on x86 just happens to be a no-op (but still a
compiler barrier). That's because earlier reads will not pass later
stores, and stores are viewed in program order, so all x86 stores have
"release consistency" modulo the theoretical PPro bugs (that I don't
think we actually ever saw in practice).

But it is possible that that is true on other architectures too, so your
hand-crafted x86-specific #if is not just ugly, it's liable to get worse.

The alternative is to just say that we should use "smp_mb()"
unconditionally, depending on how critical this code actually ends up
being.

Added linux-arch in case architecture maintainers have comments on this
"smp_release_before_store()" thing. It would be kind of similar to the
odd "smp_read_barrier_depends()", except it would normally be a full
memory barrier, except on architectures where a weaker barrier might be
sufficient.

I suspect there may be many architectures where a "smp_wmb()" is
sufficient for this case, for the simple reason that no sane
microarchitecture would *ever* move earlier reads down past a later
write, so release consistency really only needs the local writes to be
ordered, not the full memory ordering.

Arch people?

The more optimal solution may be to mark the store *itself* to be "store
with release consistency", which on x86 would be a regular store (with
the compiler barrier), but on other architectures may be a special memory
operation. On architectures with release/acquire consistency, there's not
a separate barrier before the store; the store instruction itself is done
with special semantics.

So maybe the right thing to do is

	#define smp_release_consistency_store(val, ptr) ...

where on x86, the implementation would be a simple

	do { barrier(); *(ptr) = (val); } while (0)

but on other architectures it might be an inline asm with the required
magic store-with-release instruction.

How important is this code sequence? Is the "percpu_up_write()" function
really so critical that we can't have an extra memory barrier?

Or do people perhaps see *other* places where release-consistency stores
might be worth doing?

But in no event do I want to see that butt-ugly #if statement.

Linus
[PATCH] percpu-rwsem: use barrier in unlock path
Hi

When I read my patch 62ac665ff9fc07497ca524bd20d6a96893d11071 again, it
turns out that it is missing a barrier in percpu_up_write. Please apply
this.

Mikulas

---

percpu-rwsem: use barrier in unlock path

Fix missing barrier in patch 62ac665ff9fc07497ca524bd20d6a96893d11071.

The lock is considered unlocked when p->locked is set to false.
Use a barrier to prevent reordering of operations around p->locked.

Signed-off-by: Mikulas Patocka

---
 include/linux/percpu-rwsem.h | 9 +++++++++
 1 file changed, 9 insertions(+)

Index: linux-3.6.2-fast/include/linux/percpu-rwsem.h
===================================================================
--- linux-3.6.2-fast.orig/include/linux/percpu-rwsem.h	2012-10-17 01:17:55.0 +0200
+++ linux-3.6.2-fast/include/linux/percpu-rwsem.h	2012-10-17 01:21:46.0 +0200
@@ -66,6 +66,15 @@ static inline void percpu_down_write(str
 
 static inline void percpu_up_write(struct percpu_rw_semaphore *p)
 {
+	/*
+	 * The lock is considered unlocked when p->locked is set to false.
+	 * Use a barrier to prevent reordering of operations around p->locked.
+	 */
+#if defined(CONFIG_X86) && (!defined(CONFIG_X86_PPRO_FENCE) && !defined(CONFIG_X86_OOSTORE))
+	barrier();
+#else
+	smp_mb();
+#endif
 	p->locked = false;
 	mutex_unlock(&p->mtx);
 }