On Sat, Jan 18, 2014 at 04:25:48AM -0800, Paul E. McKenney wrote:
> On Sat, Jan 18, 2014 at 12:34:06PM +0100, Peter Zijlstra wrote:
> > On Sat, Jan 18, 2014 at 02:01:05AM -0800, Paul E. McKenney wrote:
> > > OK, I will bite...  Aside from fine-grained code timing, what code could
> > > you write to tell the difference between a real one-byte store and an
> > > RMW emulating that store?
> > 
> > Why isn't fine-grained code timing an issue? I'm sure Alpha people will
> > love it when their machine magically keels over every so often.
> > 
> > Suppose we have two bytes in a word that get concurrent updates:
> > 
> > union {
> >     struct {
> >             u8 a;
> >             u8 b;
> >     };
> >     int word;
> > } ponies = { .word = 0, };
> > 
> > then two threads concurrently do:
> > 
> > CPU0:               CPU1:
> > 
> > ponies.a = 5        ponies.b = 10
> > 
> > 
> > At which point you'd expect: a == 5 && b == 10
> > 
> > However, with a rmw you could end up like:
> > 
> > 
> >                     load r, ponies.word
> > load r, ponies.word
> > and  r, ~0xFF
> > or   r, 5
> > store ponies.word, r
> >                     and r, ~0xFF00
> >                     or r, 10 << 8
> >                     store ponies.word, r
> > 
> > which gives: a == 0 && b == 10
> > 
> > The same can be had on a single CPU if you make the second RMW an
> > interrupt.
> > 
> > 
> > In fact, we recently had such a RMW issue on PPC64 although from a
> > slightly different angle, but we managed to hit it quite consistently.
> > See commit ba1f14fbe7096.
> > 
> > The thing is, if we allow the above RMW 'atomic' store, we have to be
> > _very_ careful that there cannot be such overlapping stores, otherwise
> > things will go BOOM!
> > 
> > However, if we already have to make sure there's no overlapping stores,
> > we might as well write a wide store and not allow the narrow stores to
> > begin with, to force people to think about the issue.
> 
> Ah, I was assuming atomic rmw, which for Alpha would be implemented using
> the LL and SC instructions.  Yes, lots of overhead, but if the CPU
> designers chose not to provide a load/store byte...

I don't see how ll/sc will help any. Suppose we do the store to a as
smp_store_release() using ll/sc, but the store to b is unaware and doesn't
use ll/sc.

Then we're still up shit creek without no paddle.

Whatever you're going to do, you need to be intimately aware of what the
other bits in your word are doing.