Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-10 Thread Valdis . Kletnieks
On Sat, 11 Aug 2007 02:38:40 +0200, Segher Boessenkool said:
> >> That means GCC cannot compile Linux; it already optimises
> >> some accesses to scalars to smaller accesses when it knows
> >> it is allowed to.  Not often though, since it hardly ever
> >> helps in the cost model it employs.
> >
> > Please give an example code snippet + gcc version + arch
> > to back this up.
> 
>   unsigned char f(unsigned long *p)
>   {
>   return *p & 1;
>   }

Not really valid, because it's still able to do one atomic access to
compute the result.

Now, if you had found an example where it converts a 32-bit atomic access into
2 separate 16-bit accesses that weren't atomic as a whole


pgpvV5YbCLIyT.pgp
Description: PGP signature


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-10 Thread Segher Boessenkool

That means GCC cannot compile Linux; it already optimises
some accesses to scalars to smaller accesses when it knows
it is allowed to.  Not often though, since it hardly ever
helps in the cost model it employs.


Please give an example code snippet + gcc version + arch
to back this up.


unsigned char f(unsigned long *p)
{
return *p & 1;
}


This doesn't really matter since we only care about the LSB.


It is exactly what I claimed, and what you asked proof of.


Do you have an example where gcc reads it non-atmoically and
we care about all parts?


Like I explained in the original mail; no, I suspect such
a testcase will be really hard to construct, esp. as a small
testcase.  I have no reason to believe it is impossible to
do so though -- maybe someone else can write trickier code
than I can, in which case, please do so.


Segher

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-10 Thread Herbert Xu
On Sat, Aug 11, 2007 at 02:38:40AM +0200, Segher Boessenkool wrote:
> >>That means GCC cannot compile Linux; it already optimises
> >>some accesses to scalars to smaller accesses when it knows
> >>it is allowed to.  Not often though, since it hardly ever
> >>helps in the cost model it employs.
> >
> >Please give an example code snippet + gcc version + arch
> >to back this up.
> 
>   unsigned char f(unsigned long *p)
>   {
>   return *p & 1;
>   }

This doesn't really matter since we only care about the LSB.
Do you have an example where gcc reads it non-atmoically and
we care about all parts?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-10 Thread Segher Boessenkool

That means GCC cannot compile Linux; it already optimises
some accesses to scalars to smaller accesses when it knows
it is allowed to.  Not often though, since it hardly ever
helps in the cost model it employs.


Please give an example code snippet + gcc version + arch
to back this up.


unsigned char f(unsigned long *p)
{
return *p & 1;
}

with both

powerpc64-linux-gcc (GCC) 4.3.0 20070731 (experimental)

and

powerpc64-linux-gcc-4.2.0 (GCC) 4.2.0

(sorry, I don't have anything newer or older right now; if you
really care, I can test with those too)

generate (in 64-bit mode):

.L.f:
lbz 3,7(3)
rldicl 3,3,0,63
blr

and in 32-bit mode:

f:
stwu 1,-16(1)
nop
nop
lbz 3,3(3)
addi 1,1,16
rlwinm 3,3,0,31,31
blr

(the nops are because I use --with-cpu=970).


But perhaps you do not care for PowerPC, in which case:

i686-linux-gcc (GCC) 4.2.0 20060410 (experimental)

(sorry for the old version, I don't build x86 compilers
all that often; also I don't have a 64-bit version right
now):

f:
pushl   %ebp
movl%esp, %ebp
movl8(%ebp), %eax
popl%ebp
movzbl  (%eax), %eax
andl$1, %eax
ret


If you want testing with any other versions, and/or for
any other target architecture, I can do that; it takes a
few minutes to build a compiler.

It is quite hard to build a testcase that reads more than
one part of the "long", since for small testcases the
compiler will almost always be smart enough to do one
bigger read instead; but it certainly isn't inconceivable,
and anyway the compiler would be fully in its right to do
reads non-atomically if not instructed otherwise.


Segher

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-10 Thread Herbert Xu
On Fri, Aug 10, 2007 at 10:07:27PM +0200, Segher Boessenkool wrote:
> 
> That means GCC cannot compile Linux; it already optimises
> some accesses to scalars to smaller accesses when it knows
> it is allowed to.  Not often though, since it hardly ever
> helps in the cost model it employs.

Please give an example code snippet + gcc version + arch
to back this up.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-10 Thread Paul E. McKenney
On Fri, Aug 10, 2007 at 03:49:03PM -0400, Chris Snook wrote:
> Paul E. McKenney wrote:
> >On Thu, Aug 09, 2007 at 03:24:40PM -0400, Chris Snook wrote:
> >>Paul E. McKenney wrote:
> >>>On Thu, Aug 09, 2007 at 02:13:52PM -0400, Chris Snook wrote:
> Paul E. McKenney wrote:
> >On Thu, Aug 09, 2007 at 01:14:35PM -0400, Chris Snook wrote:
> >>  If you're depending on volatile writes 
> >>being visible to other CPUs, you're screwed either way, because the 
> >>CPU can hold that data in cache as long as it wants before it writes 
> >>it to memory.  When this finally does happen, it will happen 
> >>atomically, which is all that atomic_set guarantees.  If you need to 
> >>guarantee that the value is written to memory at a particular time in 
> >>your execution sequence, you either have to read it from memory to 
> >>force the compiler to store it first (and a volatile cast in 
> >>atomic_read will suffice for this) or you have to use LOCK_PREFIX 
> >>instructions which will invalidate remote cache lines containing the 
> >>same variable.  This patch doesn't change either of these cases.
> >The case that it -can- change is interactions with interrupt handlers.
> >And NMI/SMI handlers, for that matter.
> You have a point here, but only if you can guarantee that the interrupt 
> handler is running on a processor sharing the cache that has the 
> not-yet-written volatile value.  That implies a strictly non-SMP 
> architecture.  At the moment, none of those have volatile in their 
> declaration of atomic_t, so this patch can't break any of them.
> >>>This can also happen when using per-CPU variables.  And there are a
> >>>number of per-CPU variables that are either atomic themselves or are
> >>>structures containing atomic fields.
> >>Accessing per-CPU variables in this fashion reliably already requires a 
> >>suitable smp/non-smp read/write memory barrier.  I maintain that if we 
> >>break anything with this change, it was really already broken, if less 
> >>obviously.  Can you give a real or synthetic example of legitimate code 
> >>that could break?
> >
> >My main concern is actually the lack of symmetry -- I would expect
> >that an atomic_set() would have the same properties as atomic_read().
> >It is easy and cheap to provide them with similar properties, so why not?
> >Debugging even a single problem would consume far more time than simply
> >giving them corresponding semantics.
> >
> >But you asked for examples.  These are synthetic, and of course legitimacy
> >is in the eye of the beholder.
> >
> >1.  Watchdog variable.
> >
> > atomic_t watchdog = ATOMIC_INIT(0);
> >
> > ...
> >
> > int i;
> > while (!done) {
> >
> > /* Do so stuff that doesn't take more than a few us. */
> > /* Could do atomic increment, but throughput penalty. */
> >
> > i++;
> > atomic_set(&watchdog, i);
> > }
> > do_something_with(&watchdog);
> >
> >
> > /* Every so often on some other CPU... */
> >
> > if ((new_watchdog = atomic_read(&watchdog)) == old_watchdog)
> > die_horribly();
> > old_watchdog = new_watchdog;
> >
> >
> > If atomic_set() did not have volatile semantics, the compiler
> > would be within its rights optimizing it to simply get the
> > final value of "i" after exit from the loop.  This would cause
> > the watchdog check to fail spuriously.  Memory barriers are
> > not required in this case, because the CPU cannot hang onto
> > the value for very long -- we don't care about the exact value,
> > or about exact synchronization, but rather about whether or
> > not the value is changing.
> >
> > In this (toy) example, one might replace the atomic_set() with
> > an atomic increment (though that might be too expensive in some
> > cases) or with something like:
> >
> > atomic_set(&watchdog, atomic_read(&watchdog) + 1);
> >
> > However, other cases might not permit this transformation,
> > for example, an existing heavily used API might take int rather
> > than atomic_t.
> >
> > Some will no doubt argue that this example should use a
> > macro or an asm similar to the "forget()" asm put forward
> > elsewhere in this thread.
> >
> >2.  Communicating both with interrupt handler and with other CPUs.
> >For example, data elements that are built up in a location visible
> >to interrupts and NMIs, and then added as a unit to a data structure
> >visible to other CPUs.  This more-realistic example is abbreviated
> >to the point of pointlessness as follows:
> >
> > struct foo {
> > atomic_t a;
> > atomic_t b;
> > };
> >
> > DEFINE_PER_CPU(struct foo *, staging) = NULL;
> >
> > /* Create element in staging area. */
> >
> > __get_cpu_var(staging) = kzalloc(sizeof(*p), GFP_WHATEVER);
> > if (__get_cpu_v

Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-10 Thread Segher Boessenkool

The compiler is within its rights to read a 32-bit quantity 16 bits at
at time, even on a 32-bit machine.  I would be glad to help pummel any
compiler writer that pulls such a dirty trick, but the C standard 
really

does permit this.


Code all over the kernel assumes that 32-bit reads/writes
are atomic so while such a compiler might be legal it certainly
can't compile Linux.


That means GCC cannot compile Linux; it already optimises
some accesses to scalars to smaller accesses when it knows
it is allowed to.  Not often though, since it hardly ever
helps in the cost model it employs.


Segher

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-10 Thread Chris Snook

Paul E. McKenney wrote:

On Thu, Aug 09, 2007 at 03:24:40PM -0400, Chris Snook wrote:

Paul E. McKenney wrote:

On Thu, Aug 09, 2007 at 02:13:52PM -0400, Chris Snook wrote:

Paul E. McKenney wrote:

On Thu, Aug 09, 2007 at 01:14:35PM -0400, Chris Snook wrote:
  If you're depending on volatile writes 
being visible to other CPUs, you're screwed either way, because the CPU 
can hold that data in cache as long as it wants before it writes it to 
memory.  When this finally does happen, it will happen atomically, 
which is all that atomic_set guarantees.  If you need to guarantee that 
the value is written to memory at a particular time in your execution 
sequence, you either have to read it from memory to force the compiler 
to store it first (and a volatile cast in atomic_read will suffice for 
this) or you have to use LOCK_PREFIX instructions which will invalidate 
remote cache lines containing the same variable.  This patch doesn't 
change either of these cases.

The case that it -can- change is interactions with interrupt handlers.
And NMI/SMI handlers, for that matter.
You have a point here, but only if you can guarantee that the interrupt 
handler is running on a processor sharing the cache that has the 
not-yet-written volatile value.  That implies a strictly non-SMP 
architecture.  At the moment, none of those have volatile in their 
declaration of atomic_t, so this patch can't break any of them.

This can also happen when using per-CPU variables.  And there are a
number of per-CPU variables that are either atomic themselves or are
structures containing atomic fields.
Accessing per-CPU variables in this fashion reliably already requires a 
suitable smp/non-smp read/write memory barrier.  I maintain that if we 
break anything with this change, it was really already broken, if less 
obviously.  Can you give a real or synthetic example of legitimate code 
that could break?


My main concern is actually the lack of symmetry -- I would expect
that an atomic_set() would have the same properties as atomic_read().
It is easy and cheap to provide them with similar properties, so why not?
Debugging even a single problem would consume far more time than simply
giving them corresponding semantics.

But you asked for examples.  These are synthetic, and of course legitimacy
is in the eye of the beholder.

1.  Watchdog variable.

atomic_t watchdog = ATOMIC_INIT(0);

...

int i;
while (!done) {

/* Do so stuff that doesn't take more than a few us. */
/* Could do atomic increment, but throughput penalty. */

i++;
atomic_set(&watchdog, i);
}
do_something_with(&watchdog);


/* Every so often on some other CPU... */

if ((new_watchdog = atomic_read(&watchdog)) == old_watchdog)
die_horribly();
old_watchdog = new_watchdog;


If atomic_set() did not have volatile semantics, the compiler
would be within its rights optimizing it to simply get the
final value of "i" after exit from the loop.  This would cause
the watchdog check to fail spuriously.  Memory barriers are
not required in this case, because the CPU cannot hang onto
the value for very long -- we don't care about the exact value,
or about exact synchronization, but rather about whether or
not the value is changing.

In this (toy) example, one might replace the atomic_set() with
an atomic increment (though that might be too expensive in some
cases) or with something like:

atomic_set(&watchdog, atomic_read(&watchdog) + 1);

However, other cases might not permit this transformation,
for example, an existing heavily used API might take int rather
than atomic_t.

Some will no doubt argue that this example should use a
macro or an asm similar to the "forget()" asm put forward
elsewhere in this thread.

2.  Communicating both with interrupt handler and with other CPUs.
For example, data elements that are built up in a location visible
to interrupts and NMIs, and then added as a unit to a data structure
visible to other CPUs.  This more-realistic example is abbreviated
to the point of pointlessness as follows:

struct foo {
atomic_t a;
atomic_t b;
};

DEFINE_PER_CPU(struct foo *, staging) = NULL;

/* Create element in staging area. */

__get_cpu_var(staging) = kzalloc(sizeof(*p), GFP_WHATEVER);
if (__get_cpu_var(staging) == NULL)
die_horribly();
/* allocate an element of some per-CPU array, get the result in "i" */
atomic_set(__get_cpu_var(staging).a, i);
/* allocate another element of a per-CPU array, with result in "i" */
atomic_set(__get_cpu_var(staging).b, i);
rcu_assign_pointer(some_global_plac

Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-10 Thread Paul E. McKenney
On Fri, Aug 10, 2007 at 11:08:20AM +0200, Andi Kleen wrote:
> On Friday 10 August 2007 10:21:46 Herbert Xu wrote:
> > Paul E. McKenney <[EMAIL PROTECTED]> wrote:
> > >
> > > The compiler is within its rights to read a 32-bit quantity 16 bits at
> > > at time, even on a 32-bit machine.  I would be glad to help pummel any
> > > compiler writer that pulls such a dirty trick, but the C standard really
> > > does permit this.
> > 
> > Code all over the kernel assumes that 32-bit reads/writes
> > are atomic so while such a compiler might be legal it certainly
> > can't compile Linux.
> 
> Yes, the kernel requirements are much stricter than ISO-C. And besides
> it is a heavy user of C extensions anyways. On the other hand some of the
> C99 extensions are not allowed. And then there is sparse, which enforces
> a language which sometimes is quite far from standard C. You could say it is 
> written in Linux-C, not ISO C. 

Understood.  My question is "why do we want the semantics of atomic_read()
and atomic_set() to differ?"

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-10 Thread Andi Kleen
On Friday 10 August 2007 10:21:46 Herbert Xu wrote:
> Paul E. McKenney <[EMAIL PROTECTED]> wrote:
> >
> > The compiler is within its rights to read a 32-bit quantity 16 bits at
> > at time, even on a 32-bit machine.  I would be glad to help pummel any
> > compiler writer that pulls such a dirty trick, but the C standard really
> > does permit this.
> 
> Code all over the kernel assumes that 32-bit reads/writes
> are atomic so while such a compiler might be legal it certainly
> can't compile Linux.

Yes, the kernel requirements are much stricter than ISO-C. And besides
it is a heavy user of C extensions anyways. On the other hand some of the
C99 extensions are not allowed. And then there is sparse, which enforces
a language which sometimes is quite far from standard C. You could say it is 
written in Linux-C, not ISO C. 

-Andi
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-10 Thread Herbert Xu
Paul E. McKenney <[EMAIL PROTECTED]> wrote:
>
> The compiler is within its rights to read a 32-bit quantity 16 bits at
> at time, even on a 32-bit machine.  I would be glad to help pummel any
> compiler writer that pulls such a dirty trick, but the C standard really
> does permit this.

Code all over the kernel assumes that 32-bit reads/writes
are atomic so while such a compiler might be legal it certainly
can't compile Linux.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Paul E. McKenney
On Thu, Aug 09, 2007 at 03:24:40PM -0400, Chris Snook wrote:
> Paul E. McKenney wrote:
> >On Thu, Aug 09, 2007 at 02:13:52PM -0400, Chris Snook wrote:
> >>Paul E. McKenney wrote:
> >>>On Thu, Aug 09, 2007 at 01:14:35PM -0400, Chris Snook wrote:
>    If you're depending on volatile writes 
> being visible to other CPUs, you're screwed either way, because the CPU 
> can hold that data in cache as long as it wants before it writes it to 
> memory.  When this finally does happen, it will happen atomically, 
> which is all that atomic_set guarantees.  If you need to guarantee that 
> the value is written to memory at a particular time in your execution 
> sequence, you either have to read it from memory to force the compiler 
> to store it first (and a volatile cast in atomic_read will suffice for 
> this) or you have to use LOCK_PREFIX instructions which will invalidate 
> remote cache lines containing the same variable.  This patch doesn't 
> change either of these cases.
> >>>The case that it -can- change is interactions with interrupt handlers.
> >>>And NMI/SMI handlers, for that matter.
> >>You have a point here, but only if you can guarantee that the interrupt 
> >>handler is running on a processor sharing the cache that has the 
> >>not-yet-written volatile value.  That implies a strictly non-SMP 
> >>architecture.  At the moment, none of those have volatile in their 
> >>declaration of atomic_t, so this patch can't break any of them.
> >
> >This can also happen when using per-CPU variables.  And there are a
> >number of per-CPU variables that are either atomic themselves or are
> >structures containing atomic fields.
> 
> Accessing per-CPU variables in this fashion reliably already requires a 
> suitable smp/non-smp read/write memory barrier.  I maintain that if we 
> break anything with this change, it was really already broken, if less 
> obviously.  Can you give a real or synthetic example of legitimate code 
> that could break?

My main concern is actually the lack of symmetry -- I would expect
that an atomic_set() would have the same properties as atomic_read().
It is easy and cheap to provide them with similar properties, so why not?
Debugging even a single problem would consume far more time than simply
giving them corresponding semantics.

But you asked for examples.  These are synthetic, and of course legitimacy
is in the eye of the beholder.

1.  Watchdog variable.

atomic_t watchdog = ATOMIC_INIT(0);

...

int i;
while (!done) {

/* Do so stuff that doesn't take more than a few us. */
/* Could do atomic increment, but throughput penalty. */

i++;
atomic_set(&watchdog, i);
}
do_something_with(&watchdog);


/* Every so often on some other CPU... */

if ((new_watchdog = atomic_read(&watchdog)) == old_watchdog)
die_horribly();
old_watchdog = new_watchdog;


If atomic_set() did not have volatile semantics, the compiler
would be within its rights optimizing it to simply get the
final value of "i" after exit from the loop.  This would cause
the watchdog check to fail spuriously.  Memory barriers are
not required in this case, because the CPU cannot hang onto
the value for very long -- we don't care about the exact value,
or about exact synchronization, but rather about whether or
not the value is changing.

In this (toy) example, one might replace the atomic_set() with
an atomic increment (though that might be too expensive in some
cases) or with something like:

atomic_set(&watchdog, atomic_read(&watchdog) + 1);

However, other cases might not permit this transformation,
for example, an existing heavily used API might take int rather
than atomic_t.

Some will no doubt argue that this example should use a
macro or an asm similar to the "forget()" asm put forward
elsewhere in this thread.

2.  Communicating both with interrupt handler and with other CPUs.
For example, data elements that are built up in a location visible
to interrupts and NMIs, and then added as a unit to a data structure
visible to other CPUs.  This more-realistic example is abbreviated
to the point of pointlessness as follows:

struct foo {
atomic_t a;
atomic_t b;
};

DEFINE_PER_CPU(struct foo *, staging) = NULL;

/* Create element in staging area. */

__get_cpu_var(staging) = kzalloc(sizeof(*p), GFP_WHATEVER);
if (__get_cpu_var(staging) == NULL)
die_horribly();
/* allocate an element of some per-CPU array, get the result in "i" */
atomic_set(__get_cpu_var(staging).a, i);
/* allocate another element of a per-CPU array, with resul

Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Geert Uytterhoeven
On Thu, 9 Aug 2007, Chris Snook wrote:
> Segher Boessenkool wrote:
> > > > The only safe way to get atomic accesses is to write
> > > > assembler code.  Are there any downsides to that?  I don't
> > > > see any.
> > > 
> > > The assumption that aligned word reads and writes are atomic, and that
> > > words are aligned unless explicitly packed otherwise, is endemic in the
> > > kernel.  No sane compiler violates this assumption.  It's true that we're
> > > not portable to insane compilers after this patch, but we never were in
> > > the first place.
> > 
> > You didn't answer my question: are there any downsides to using
> > explicit coded-in-assembler accesses for atomic accesses?  You
> > can handwave all you want that it should "just work" with
> > volatile accesses, but volatility != atomicity, volatile in C
> > is really badly defined, GCC never officially gave stronger
> > guarantees, and we have a bugzilla full of PRs to show what a
> > minefield it is.
> > 
> > So, why not use the well-defined alternative?
> 
> Because we don't need to, and it hurts performance.

It hurts performance by implementing 32-bit atomic reads in assembler?

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [EMAIL PROTECTED]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Segher Boessenkool

So, why not use the well-defined alternative?

Because we don't need to, and it hurts performance.

It hurts performance by implementing 32-bit atomic reads in assembler?


No, I misunderstood the question.  Implementing 32-bit atomic reads in 
assembler is redundant, because any sane compiler, *particularly* and 
optimizing compiler (and we're only in this mess because of optimizing 
compilers)


Oh please, don't tell me you don't want an optimising compiler.
And if you _do_ want one, well you're in this mess because you
chose C as implementation language and C has some pretty strange
rules.  Trying to use not-all-that-well-defined-and-completely-
misunderstood features of the language doesn't make things easier;
trying to use something that isn't even part of the language and
that your particular compiler originally supported by accident,
and that isn't yet an officially supported feature, and that on
top of it all has a track record of problems -- well it makes me
wonder if you're in this game for fun or what.


 will give us that automatically without the assembler.


No, it does *not* give it to you automatically; you have to do
either the asm() thing, or the not-defined-at-all *(volatile *)&
thing.

Yes, it is legal for a compiler to violate this assumption.  It is 
also legal for us to refuse to maintain compatibility with compilers 
that suck this badly.


So that's rm include/linux/compiler-gcc*.h then.  Good luck with
the intel compiler, maybe it works more to your liking.


Segher

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Chris Snook

Geert Uytterhoeven wrote:

On Thu, 9 Aug 2007, Chris Snook wrote:

Segher Boessenkool wrote:

The only safe way to get atomic accesses is to write
assembler code.  Are there any downsides to that?  I don't
see any.

The assumption that aligned word reads and writes are atomic, and that
words are aligned unless explicitly packed otherwise, is endemic in the
kernel.  No sane compiler violates this assumption.  It's true that we're
not portable to insane compilers after this patch, but we never were in
the first place.

You didn't answer my question: are there any downsides to using
explicit coded-in-assembler accesses for atomic accesses?  You
can handwave all you want that it should "just work" with
volatile accesses, but volatility != atomicity, volatile in C
is really badly defined, GCC never officially gave stronger
guarantees, and we have a bugzilla full of PRs to show what a
minefield it is.

So, why not use the well-defined alternative?

Because we don't need to, and it hurts performance.


It hurts performance by implementing 32-bit atomic reads in assembler?


No, I misunderstood the question.  Implementing 32-bit atomic reads in assembler 
is redundant, because any sane compiler, *particularly* and optimizing compiler 
(and we're only in this mess because of optimizing compilers) will give us that 
automatically without the assembler.  Yes, it is legal for a compiler to violate 
this assumption.  It is also legal for us to refuse to maintain compatibility 
with compilers that suck this badly.  That decision was made a very long time 
ago, and I consider it the correct decision.


-- Chris
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Chris Snook

Segher Boessenkool wrote:

The compiler is within its rights to read a 32-bit quantity 16 bits at
at time, even on a 32-bit machine.  I would be glad to help pummel any
compiler writer that pulls such a dirty trick, but the C standard really
does permit this.


Yes, but we don't write code for these compilers.  There are countless 
pieces of kernel code which would break in this condition, and there 
doesn't seem to be any interest in fixing this.


"Other things are broken too".  Great argument :-)


We make plenty of practical assumptions in the kernel, and declare incorrect 
things which violate them, even in cases where there's no commandment from the 
heavens forbidding them.  Since the whole point of this exercise is to prevent 
badness with *optimizing* compilers, it's quite reasonable to declare broken any 
so-called optimizer which violates these trivial assumptions.



In short, please retain atomic_set()'s volatility, especially on those
architectures that declared the atomic_t's counter to be volatile.


Like i386 and x86_64?  These used to have volatile in the atomic_t 
declaration.  We removed it, and the sky did not fall.


And this proves what?  Lots of stuff "works" by accident.


If something breaks because of this, it was already broken, but hidden a lot 
better.  I don't see much of a downside to exposing and fixing those bugs.


-- Chris
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Chris Snook

Paul E. McKenney wrote:

On Thu, Aug 09, 2007 at 02:13:52PM -0400, Chris Snook wrote:

Paul E. McKenney wrote:

On Thu, Aug 09, 2007 at 01:14:35PM -0400, Chris Snook wrote:
   If you're depending on volatile writes 
being visible to other CPUs, you're screwed either way, because the CPU 
can hold that data in cache as long as it wants before it writes it to 
memory.  When this finally does happen, it will happen atomically, which 
is all that atomic_set guarantees.  If you need to guarantee that the 
value is written to memory at a particular time in your execution 
sequence, you either have to read it from memory to force the compiler to 
store it first (and a volatile cast in atomic_read will suffice for this) 
or you have to use LOCK_PREFIX instructions which will invalidate remote 
cache lines containing the same variable.  This patch doesn't change 
either of these cases.

The case that it -can- change is interactions with interrupt handlers.
And NMI/SMI handlers, for that matter.
You have a point here, but only if you can guarantee that the interrupt 
handler is running on a processor sharing the cache that has the 
not-yet-written volatile value.  That implies a strictly non-SMP 
architecture.  At the moment, none of those have volatile in their 
declaration of atomic_t, so this patch can't break any of them.


This can also happen when using per-CPU variables.  And there are a
number of per-CPU variables that are either atomic themselves or are
structures containing atomic fields.


Accessing per-CPU variables in this fashion reliably already requires a suitable 
smp/non-smp read/write memory barrier.  I maintain that if we break anything 
with this change, it was really already broken, if less obviously.  Can you give 
a real or synthetic example of legitimate code that could break?


-- Chris
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Segher Boessenkool
If you need to guarantee that the value is written to memory at a 
particular time in your execution sequence, you either have to read it 
from memory to force the compiler to store it first


That isn't enough.  The CPU will happily read the datum back from
its own store queue before it ever hit memory or cache or any
external bus.  If you need the value to be globally visible before
another store, you use wmb(); if you need it before some read,
you use mb().

These concepts are completely separate from both atomicity and
volatility (which aren't the same themselves).

No, but I can reason about it and be confident that this won't break 
anything that isn't already broken.  At worst, this patch will make 
any existing subtly incorrect uses of atomic_t much more obvious and 
easier to track down.


PR22278, PR29753 -- both P2 bugs, and both about basic *(volatile *)
usage.  Yeah, it's definitely not problematic, esp. not if you want
to support older compilers than 4.2 too.


Segher

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Segher Boessenkool

The only safe way to get atomic accesses is to write
assembler code.  Are there any downsides to that?  I don't
see any.


The assumption that aligned word reads and writes are atomic, and 
that words are aligned unless explicitly packed otherwise, is 
endemic in the kernel.  No sane compiler violates this assumption.  
It's true that we're not portable to insane compilers after this 
patch, but we never were in the first place.

You didn't answer my question: are there any downsides to using
explicit coded-in-assembler accesses for atomic accesses?  You
can handwave all you want that it should "just work" with
volatile accesses, but volatility != atomicity, volatile in C
is really badly defined, GCC never officially gave stronger
guarantees, and we have a bugzilla full of PRs to show what a
minefield it is.
So, why not use the well-defined alternative?


Because we don't need to,


You don't need to use volatile objects, or accesses through
valatile-cast pointers, either.


and it hurts performance.


Please show how it does this -- one load is one load either way,
and one store is one store.


Segher

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Chris Snook

Segher Boessenkool wrote:

The only safe way to get atomic accesses is to write
assembler code.  Are there any downsides to that?  I don't
see any.


The assumption that aligned word reads and writes are atomic, and that 
words are aligned unless explicitly packed otherwise, is endemic in 
the kernel.  No sane compiler violates this assumption.  It's true 
that we're not portable to insane compilers after this patch, but we 
never were in the first place.


You didn't answer my question: are there any downsides to using
explicit coded-in-assembler accesses for atomic accesses?  You
can handwave all you want that it should "just work" with
volatile accesses, but volatility != atomicity, volatile in C
is really badly defined, GCC never officially gave stronger
guarantees, and we have a bugzilla full of PRs to show what a
minefield it is.

So, why not use the well-defined alternative?


Because we don't need to, and it hurts performance.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Segher Boessenkool

The compiler is within its rights to read a 32-bit quantity 16 bits at
at time, even on a 32-bit machine.  I would be glad to help pummel any
compiler writer that pulls such a dirty trick, but the C standard 
really

does permit this.


Yes, but we don't write code for these compilers.  There are countless 
pieces of kernel code which would break in this condition, and there 
doesn't seem to be any interest in fixing this.


"Other things are broken too".  Great argument :-)

Sequence points enforce read-after-write ordering, not 
write-after-write.


Sequence points order *all* side effects; sequence points exist in the
domain of the abstract sequential model of the C language only.  The
compiler translates that to machine code that is equivalent to that C
code under the "as-if" rule; but this is still in that abstract model,
which doesn't include things such as SMP, visibility by I/O devices,
store queues, etc. etc.

We flush writes with reads for MMIO because of this effect as well as 
the CPU/bus effects.


You cannot flush all MMIO writes with reads; this is a PCI-specific
thing.  And even then, you need more than just the read itself: you
have to make sure the read completed and returned data.


In short, please retain atomic_set()'s volatility, especially on those
architectures that declared the atomic_t's counter to be volatile.


Like i386 and x86_64?  These used to have volatile in the atomic_t 
declaration.  We removed it, and the sky did not fall.


And this proves what?  Lots of stuff "works" by accident.


Segher

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Paul E. McKenney
On Thu, Aug 09, 2007 at 02:13:52PM -0400, Chris Snook wrote:
> Paul E. McKenney wrote:
> >On Thu, Aug 09, 2007 at 01:14:35PM -0400, Chris Snook wrote:
> >>If you're depending on volatile writes 
> >>being visible to other CPUs, you're screwed either way, because the CPU 
> >>can hold that data in cache as long as it wants before it writes it to 
> >>memory.  When this finally does happen, it will happen atomically, which 
> >>is all that atomic_set guarantees.  If you need to guarantee that the 
> >>value is written to memory at a particular time in your execution 
> >>sequence, you either have to read it from memory to force the compiler to 
> >>store it first (and a volatile cast in atomic_read will suffice for this) 
> >>or you have to use LOCK_PREFIX instructions which will invalidate remote 
> >>cache lines containing the same variable.  This patch doesn't change 
> >>either of these cases.
> >
> >The case that it -can- change is interactions with interrupt handlers.
> >And NMI/SMI handlers, for that matter.
> 
> You have a point here, but only if you can guarantee that the interrupt 
> handler is running on a processor sharing the cache that has the 
> not-yet-written volatile value.  That implies a strictly non-SMP 
> architecture.  At the moment, none of those have volatile in their 
> declaration of atomic_t, so this patch can't break any of them.

This can also happen when using per-CPU variables.  And there are a
number of per-CPU variables that are either atomic themselves or are
structures containing atomic fields.

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Segher Boessenkool

The only safe way to get atomic accesses is to write
assembler code.  Are there any downsides to that?  I don't
see any.


The assumption that aligned word reads and writes are atomic, and that 
words are aligned unless explicitly packed otherwise, is endemic in 
the kernel.  No sane compiler violates this assumption.  It's true 
that we're not portable to insane compilers after this patch, but we 
never were in the first place.


You didn't answer my question: are there any downsides to using
explicit coded-in-assembler accesses for atomic accesses?  You
can handwave all you want that it should "just work" with
volatile accesses, but volatility != atomicity, volatile in C
is really badly defined, GCC never officially gave stronger
guarantees, and we have a bugzilla full of PRs to show what a
minefield it is.

So, why not use the well-defined alternative?


Segher

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Chris Snook

Paul E. McKenney wrote:

On Thu, Aug 09, 2007 at 01:14:35PM -0400, Chris Snook wrote:
If you're depending on volatile writes 
being visible to other CPUs, you're screwed either way, because the CPU can 
hold that data in cache as long as it wants before it writes it to memory.  
When this finally does happen, it will happen atomically, which is all that 
atomic_set guarantees.  If you need to guarantee that the value is written 
to memory at a particular time in your execution sequence, you either have 
to read it from memory to force the compiler to store it first (and a 
volatile cast in atomic_read will suffice for this) or you have to use 
LOCK_PREFIX instructions which will invalidate remote cache lines 
containing the same variable.  This patch doesn't change either of these 
cases.


The case that it -can- change is interactions with interrupt handlers.
And NMI/SMI handlers, for that matter.


You have a point here, but only if you can guarantee that the interrupt handler 
is running on a processor sharing the cache that has the not-yet-written 
volatile value.  That implies a strictly non-SMP architecture.  At the moment, 
none of those have volatile in their declaration of atomic_t, so this patch 
can't break any of them.


-- Chris
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Paul E. McKenney
On Thu, Aug 09, 2007 at 01:14:35PM -0400, Chris Snook wrote:
> Paul E. McKenney wrote:
> >On Thu, Aug 09, 2007 at 12:36:17PM -0400, Chris Snook wrote:
> >>Paul E. McKenney wrote:
> >>>The compiler is within its rights to read a 32-bit quantity 16 bits at
> >>>at time, even on a 32-bit machine.  I would be glad to help pummel any
> >>>compiler writer that pulls such a dirty trick, but the C standard really
> >>>does permit this.
> >>Yes, but we don't write code for these compilers.  There are countless 
> >>pieces of kernel code which would break in this condition, and there 
> >>doesn't seem to be any interest in fixing this.
> >>
> >>>Use of volatile does in fact save you from the compiler pushing stores 
> >>>out
> >>>of loops regardless of whether you are also doing reads.  The C standard
> >>>has the notion of sequence points, which occur at various places 
> >>>including
> >>>the ends of statements and the control expressions for "if" and "while"
> >>>statements.  The compiler is not permitted to move volatile references
> >>>across a sequence point.  Therefore, the compiler is not allowed to
> >>>push a volatile store out of a loop.  Now the CPU might well do such a
> >>>reordering, but that is a separate issue to be dealt with via memory
> >>>barriers.  Note that it is the CPU and I/O system, not the compiler,
> >>>that is forcing you to use reads to flush writes to MMIO registers.
> >>Sequence points enforce read-after-write ordering, not write-after-write. 
> >>We flush writes with reads for MMIO because of this effect as well as the 
> >>CPU/bus effects.
> >
> >Neither volatile reads nor volatile writes may be moved across sequence
> >points.
> 
> By the compiler, or by the CPU?

As mentioned in earlier emails, by the compiler.  The CPU knows nothing
of C sequence points.

> If you're depending on volatile writes 
> being visible to other CPUs, you're screwed either way, because the CPU can 
> hold that data in cache as long as it wants before it writes it to memory.  
> When this finally does happen, it will happen atomically, which is all that 
> atomic_set guarantees.  If you need to guarantee that the value is written 
> to memory at a particular time in your execution sequence, you either have 
> to read it from memory to force the compiler to store it first (and a 
> volatile cast in atomic_read will suffice for this) or you have to use 
> LOCK_PREFIX instructions which will invalidate remote cache lines 
> containing the same variable.  This patch doesn't change either of these 
> cases.

The case that it -can- change is interactions with interrupt handlers.
And NMI/SMI handlers, for that matter.

> >>>And you would be amazed at what compiler writers will do in order to
> >>>get an additional fraction of a percent out of SpecCPU...
> >>Probably not :)
> >>
> >>>In short, please retain atomic_set()'s volatility, especially on those
> >>>architectures that declared the atomic_t's counter to be volatile.
> >>Like i386 and x86_64?  These used to have volatile in the atomic_t 
> >>declaration. We removed it, and the sky did not fall.
> >
> >Interesting.  You tested all possible configs on all possible hardware?
> 
> No, but I can reason about it and be confident that this won't break 
> anything that isn't already broken.  At worst, this patch will make any 
> existing subtly incorrect uses of atomic_t much more obvious and easier to 
> track down.

You took interrupt and NMI/SMI handlers into account?

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Chris Snook

Paul E. McKenney wrote:

On Thu, Aug 09, 2007 at 12:36:17PM -0400, Chris Snook wrote:

Paul E. McKenney wrote:

The compiler is within its rights to read a 32-bit quantity 16 bits at
at time, even on a 32-bit machine.  I would be glad to help pummel any
compiler writer that pulls such a dirty trick, but the C standard really
does permit this.
Yes, but we don't write code for these compilers.  There are countless 
pieces of kernel code which would break in this condition, and there 
doesn't seem to be any interest in fixing this.



Use of volatile does in fact save you from the compiler pushing stores out
of loops regardless of whether you are also doing reads.  The C standard
has the notion of sequence points, which occur at various places including
the ends of statements and the control expressions for "if" and "while"
statements.  The compiler is not permitted to move volatile references
across a sequence point.  Therefore, the compiler is not allowed to
push a volatile store out of a loop.  Now the CPU might well do such a
reordering, but that is a separate issue to be dealt with via memory
barriers.  Note that it is the CPU and I/O system, not the compiler,
that is forcing you to use reads to flush writes to MMIO registers.
Sequence points enforce read-after-write ordering, not write-after-write.  
We flush writes with reads for MMIO because of this effect as well as the 
CPU/bus effects.


Neither volatile reads nor volatile writes may be moved across sequence
points.


By the compiler, or by the CPU?  If you're depending on volatile writes being 
visible to other CPUs, you're screwed either way, because the CPU can hold that 
data in cache as long as it wants before it writes it to memory.  When this 
finally does happen, it will happen atomically, which is all that atomic_set 
guarantees.  If you need to guarantee that the value is written to memory at a 
particular time in your execution sequence, you either have to read it from 
memory to force the compiler to store it first (and a volatile cast in 
atomic_read will suffice for this) or you have to use LOCK_PREFIX instructions 
which will invalidate remote cache lines containing the same variable.  This 
patch doesn't change either of these cases.



And you would be amazed at what compiler writers will do in order to
get an additional fraction of a percent out of SpecCPU...

Probably not :)


In short, please retain atomic_set()'s volatility, especially on those
architectures that declared the atomic_t's counter to be volatile.
Like i386 and x86_64?  These used to have volatile in the atomic_t 
declaration. We removed it, and the sky did not fall.


Interesting.  You tested all possible configs on all possible hardware?


No, but I can reason about it and be confident that this won't break anything 
that isn't already broken.  At worst, this patch will make any existing subtly 
incorrect uses of atomic_t much more obvious and easier to track down.


-- Chris
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Paul E. McKenney
On Thu, Aug 09, 2007 at 12:36:17PM -0400, Chris Snook wrote:
> Paul E. McKenney wrote:
> >The compiler is within its rights to read a 32-bit quantity 16 bits at
> >at time, even on a 32-bit machine.  I would be glad to help pummel any
> >compiler writer that pulls such a dirty trick, but the C standard really
> >does permit this.
> 
> Yes, but we don't write code for these compilers.  There are countless 
> pieces of kernel code which would break in this condition, and there 
> doesn't seem to be any interest in fixing this.
> 
> >Use of volatile does in fact save you from the compiler pushing stores out
> >of loops regardless of whether you are also doing reads.  The C standard
> >has the notion of sequence points, which occur at various places including
> >the ends of statements and the control expressions for "if" and "while"
> >statements.  The compiler is not permitted to move volatile references
> >across a sequence point.  Therefore, the compiler is not allowed to
> >push a volatile store out of a loop.  Now the CPU might well do such a
> >reordering, but that is a separate issue to be dealt with via memory
> >barriers.  Note that it is the CPU and I/O system, not the compiler,
> >that is forcing you to use reads to flush writes to MMIO registers.
> 
> Sequence points enforce read-after-write ordering, not write-after-write.  
> We flush writes with reads for MMIO because of this effect as well as the 
> CPU/bus effects.

Neither volatile reads nor volatile writes may be moved across sequence
points.

> >And you would be amazed at what compiler writers will do in order to
> >get an additional fraction of a percent out of SpecCPU...
> 
> Probably not :)
> 
> >In short, please retain atomic_set()'s volatility, especially on those
> >architectures that declared the atomic_t's counter to be volatile.
> 
> Like i386 and x86_64?  These used to have volatile in the atomic_t 
> declaration. We removed it, and the sky did not fall.

Interesting.  You tested all possible configs on all possible hardware?

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Chris Snook

Paul E. McKenney wrote:

The compiler is within its rights to read a 32-bit quantity 16 bits at
at time, even on a 32-bit machine.  I would be glad to help pummel any
compiler writer that pulls such a dirty trick, but the C standard really
does permit this.


Yes, but we don't write code for these compilers.  There are countless pieces of 
kernel code which would break in this condition, and there doesn't seem to be 
any interest in fixing this.



Use of volatile does in fact save you from the compiler pushing stores out
of loops regardless of whether you are also doing reads.  The C standard
has the notion of sequence points, which occur at various places including
the ends of statements and the control expressions for "if" and "while"
statements.  The compiler is not permitted to move volatile references
across a sequence point.  Therefore, the compiler is not allowed to
push a volatile store out of a loop.  Now the CPU might well do such a
reordering, but that is a separate issue to be dealt with via memory
barriers.  Note that it is the CPU and I/O system, not the compiler,
that is forcing you to use reads to flush writes to MMIO registers.


Sequence points enforce read-after-write ordering, not write-after-write.  We 
flush writes with reads for MMIO because of this effect as well as the CPU/bus 
effects.



And you would be amazed at what compiler writers will do in order to
get an additional fraction of a percent out of SpecCPU...


Probably not :)


In short, please retain atomic_set()'s volatility, especially on those
architectures that declared the atomic_t's counter to be volatile.


Like i386 and x86_64?  These used to have volatile in the atomic_t declaration. 
 We removed it, and the sky did not fall.


-- Chris
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Chris Snook

Segher Boessenkool wrote:
We can't have split stores because we don't use atomic64_t on 32-bit 
architectures.


That's not true; the compiler is free to split all stores
(and reads) from memory however it wants.  It is debatable
whether "volatile" would prevent this as well, certainly
it is unsafe if you want to be portable.  GCC will do its
best to not split volatile memory accesses, but bugs in
this area do happen a lot (because the compiler code for
volatile isn't as well exercised as most other compiler
code, and because it is simply a hard subject; and there
is no real formalised model for what GCC should do).

The only safe way to get atomic accesses is to write
assembler code.  Are there any downsides to that?  I don't
see any.


The assumption that aligned word reads and writes are atomic, and that words are 
aligned unless explicitly packed otherwise, is endemic in the kernel.  No sane 
compiler violates this assumption.  It's true that we're not portable to insane 
compilers after this patch, but we never were in the first place.


-- Chris
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Paul E. McKenney
On Thu, Aug 09, 2007 at 11:24:22AM -0400, Chris Snook wrote:
> Paul E. McKenney wrote:
> >On Thu, Aug 09, 2007 at 10:53:14AM -0400, Chris Snook wrote:
> >>Paul E. McKenney wrote:
> >>>Why not the same access-once semantics for atomic_set() as
> >>>for atomic_read()?  As this patch stands, it might introduce
> >>>architecture-specific compiler-induced bugs due to the fact that
> >>>atomic_set() used to imply volatile behavior but no longer does.
> >>When we make the volatile cast in atomic_read(), we're casting an rvalue 
> >>to volatile.  This unambiguously tells the compiler that we want to 
> >>re-load that register from memory.  What's "volatile behavior" for an 
> >>lvalue?
> >
> >I was absolutely -not- suggesting volatile behavior for lvalues.
> >
> >Instead, I am asking for volatile behavior from an -rvalue-.  In the
> >case of atomic_read(), it is the atomic_t being read from.  In the case
> >of atomic_set(), it is the atomic_t being written to.  As suggested in
> >my previous email:
> >
> >#define atomic_set(v,i)  ((*(volatile int *)&(v)->counter) = 
> >(i))
> >#define atomic64_set(v,i)((*(volatile long *)&(v)->counter) = (i))
> 
> That looks like a volatile lvalue to me.  I confess I didn't exactly ace 
> compilers.  Care to explain this?

OK, so I am dylexic this morning.  Never could tell left from right...

This is indeed an lvalue, as is any non-function expression that you
can apply the "&" prefix operator to.  Including the expression in
your proposed definitions of atomic_set() and atomic_set64().

An lvalue is any expression that -could- appear on the left-hand side
of an assignment operator, regardless of where it actually appears.
More precisely, an lvalue is an expression that refers to a variable in
such a way that the variable might be both loaded from and stored to,
ignoring things like "const" for the moment.

Because "(v)->counter" could be either loaded from or stored to, it
is an lvalue, which is why the compiler didn't scream bloody murder
at you when you took the address of it.

> >Again, the architectures that used to have their "counter" declared
> >as volatile will lose volatile semantics on atomic_set() with your
> >patch, which might result in bugs due to overly imaginative compiler
> >optimizations.  The above would prevent any such bugs from appearing.
> >
> >>  A 
> >>write to an lvalue already implies an eventual write to memory, so this 
> >>would be a no-op. Maybe you'll write to the register a few times before 
> >>flushing it to memory, but it will happen eventually.  With an rvalue, 
> >>there's no guarantee that it will *ever* load from memory, which is what 
> >>volatile fixes.
> >>
> >>I think what you have in mind is LOCK_PREFIX behavior, which is not the 
> >>purpose of atomic_set.  We use LOCK_PREFIX in the inline assembly for the 
> >>atomic_* operations that read, modify, and write a value, only because it 
> >>is necessary to perform that entire transaction atomically.
> >
> >No LOCK_PREFIX, thank you!!!  I just want to make sure that the compiler
> >doesn't push the store down out of a loop, split the store, allow the
> >store to happen twice (e.g., to allow different code paths to be merged),
> >and all the other tricks that the C standard permits compilers to pull.
> 
> We can't have split stores because we don't use atomic64_t on 32-bit 
> architectures.  volatile won't save you from pushing stores out of loops 
> unless you're also doing reads.  This is why we use reads to flush writes 
> to mmio registers.  In this case, an atomic_read() with volatile in it will 
> suffice. Storing twice is perfectly legal here, though it's unlikely that 
> an optimizing compiler smart enough to create the problem we're addressing 
> would ever do that.

The compiler is within its rights to read a 32-bit quantity 16 bits at
at time, even on a 32-bit machine.  I would be glad to help pummel any
compiler writer that pulls such a dirty trick, but the C standard really
does permit this.

Use of volatile does in fact save you from the compiler pushing stores out
of loops regardless of whether you are also doing reads.  The C standard
has the notion of sequence points, which occur at various places including
the ends of statements and the control expressions for "if" and "while"
statements.  The compiler is not permitted to move volatile references
across a sequence point.  Therefore, the compiler is not allowed to
push a volatile store out of a loop.  Now the CPU might well do such a
reordering, but that is a separate issue to be dealt with via memory
barriers.  Note that it is the CPU and I/O system, not the compiler,
that is forcing you to use reads to flush writes to MMIO registers.

And you would be amazed at what compiler writers will do in order to
get an additional fraction of a percent out of SpecCPU...

In short, please retain atomic_set()'s volatility, especially on those
archit

Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Segher Boessenkool
We can't have split stores because we don't use atomic64_t on 32-bit 
architectures.


That's not true; the compiler is free to split all stores
(and reads) from memory however it wants.  It is debatable
whether "volatile" would prevent this as well, certainly
it is unsafe if you want to be portable.  GCC will do its
best to not split volatile memory accesses, but bugs in
this area do happen a lot (because the compiler code for
volatile isn't as well exercised as most other compiler
code, and because it is simply a hard subject; and there
is no real formalised model for what GCC should do).

The only safe way to get atomic accesses is to write
assembler code.  Are there any downsides to that?  I don't
see any.


Segher

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Chris Snook

Paul E. McKenney wrote:

On Thu, Aug 09, 2007 at 10:53:14AM -0400, Chris Snook wrote:

Paul E. McKenney wrote:

Why not the same access-once semantics for atomic_set() as
for atomic_read()?  As this patch stands, it might introduce
architecture-specific compiler-induced bugs due to the fact that
atomic_set() used to imply volatile behavior but no longer does.
When we make the volatile cast in atomic_read(), we're casting an rvalue to 
volatile.  This unambiguously tells the compiler that we want to re-load 
that register from memory.  What's "volatile behavior" for an lvalue?


I was absolutely -not- suggesting volatile behavior for lvalues.

Instead, I am asking for volatile behavior from an -rvalue-.  In the
case of atomic_read(), it is the atomic_t being read from.  In the case
of atomic_set(), it is the atomic_t being written to.  As suggested in
my previous email:

#define atomic_set(v,i) ((*(volatile int *)&(v)->counter) = (i))
#define atomic64_set(v,i)   ((*(volatile long *)&(v)->counter) = (i))


That looks like a volatile lvalue to me.  I confess I didn't exactly ace 
compilers.  Care to explain this?



Again, the architectures that used to have their "counter" declared
as volatile will lose volatile semantics on atomic_set() with your
patch, which might result in bugs due to overly imaginative compiler
optimizations.  The above would prevent any such bugs from appearing.

  A 
write to an lvalue already implies an eventual write to memory, so this 
would be a no-op. Maybe you'll write to the register a few times before 
flushing it to memory, but it will happen eventually.  With an rvalue, 
there's no guarantee that it will *ever* load from memory, which is what 
volatile fixes.


I think what you have in mind is LOCK_PREFIX behavior, which is not the 
purpose of atomic_set.  We use LOCK_PREFIX in the inline assembly for the 
atomic_* operations that read, modify, and write a value, only because it 
is necessary to perform that entire transaction atomically.


No LOCK_PREFIX, thank you!!!  I just want to make sure that the compiler
doesn't push the store down out of a loop, split the store, allow the
store to happen twice (e.g., to allow different code paths to be merged),
and all the other tricks that the C standard permits compilers to pull.


We can't have split stores because we don't use atomic64_t on 32-bit 
architectures.  volatile won't save you from pushing stores out of loops unless 
you're also doing reads.  This is why we use reads to flush writes to mmio 
registers.  In this case, an atomic_read() with volatile in it will suffice. 
Storing twice is perfectly legal here, though it's unlikely that an optimizing 
compiler smart enough to create the problem we're addressing would ever do that.


-- Chris
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Paul E. McKenney
On Thu, Aug 09, 2007 at 10:53:14AM -0400, Chris Snook wrote:
> Paul E. McKenney wrote:
> >Why not the same access-once semantics for atomic_set() as
> >for atomic_read()?  As this patch stands, it might introduce
> >architecture-specific compiler-induced bugs due to the fact that
> >atomic_set() used to imply volatile behavior but no longer does.
> 
> When we make the volatile cast in atomic_read(), we're casting an rvalue to 
> volatile.  This unambiguously tells the compiler that we want to re-load 
> that register from memory.  What's "volatile behavior" for an lvalue?

I was absolutely -not- suggesting volatile behavior for lvalues.

Instead, I am asking for volatile behavior from an -rvalue-.  In the
case of atomic_read(), it is the atomic_t being read from.  In the case
of atomic_set(), it is the atomic_t being written to.  As suggested in
my previous email:

#define atomic_set(v,i) ((*(volatile int *)&(v)->counter) = (i))
#define atomic64_set(v,i)   ((*(volatile long *)&(v)->counter) = (i))

Again, the architectures that used to have their "counter" declared
as volatile will lose volatile semantics on atomic_set() with your
patch, which might result in bugs due to overly imaginative compiler
optimizations.  The above would prevent any such bugs from appearing.

>   A 
> write to an lvalue already implies an eventual write to memory, so this 
> would be a no-op. Maybe you'll write to the register a few times before 
> flushing it to memory, but it will happen eventually.  With an rvalue, 
> there's no guarantee that it will *ever* load from memory, which is what 
> volatile fixes.
> 
> I think what you have in mind is LOCK_PREFIX behavior, which is not the 
> purpose of atomic_set.  We use LOCK_PREFIX in the inline assembly for the 
> atomic_* operations that read, modify, and write a value, only because it 
> is necessary to perform that entire transaction atomically.

No LOCK_PREFIX, thank you!!!  I just want to make sure that the compiler
doesn't push the store down out of a loop, split the store, allow the
store to happen twice (e.g., to allow different code paths to be merged),
and all the other tricks that the C standard permits compilers to pull.

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Chris Snook

Paul E. McKenney wrote:

Why not the same access-once semantics for atomic_set() as
for atomic_read()?  As this patch stands, it might introduce
architecture-specific compiler-induced bugs due to the fact that
atomic_set() used to imply volatile behavior but no longer does.


When we make the volatile cast in atomic_read(), we're casting an rvalue to 
volatile.  This unambiguously tells the compiler that we want to re-load that 
register from memory.  What's "volatile behavior" for an lvalue?  A write to an 
lvalue already implies an eventual write to memory, so this would be a no-op. 
Maybe you'll write to the register a few times before flushing it to memory, but 
it will happen eventually.  With an rvalue, there's no guarantee that it will 
*ever* load from memory, which is what volatile fixes.


I think what you have in mind is LOCK_PREFIX behavior, which is not the purpose 
of atomic_set.  We use LOCK_PREFIX in the inline assembly for the atomic_* 
operations that read, modify, and write a value, only because it is necessary to 
perform that entire transaction atomically.


-- Chris
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Paul E. McKenney
On Thu, Aug 09, 2007 at 09:24:42AM -0400, Chris Snook wrote:
> From: Chris Snook <[EMAIL PROTECTED]>
> 
> Purify volatile use for atomic[64]_t on alpha.

Why not the same access-once semantics for atomic_set() as
for atomic_read()?  As this patch stands, it might introduce
architecture-specific compiler-induced bugs due to the fact that
atomic_set() used to imply volatile behavior but no longer does.

See below for one approach to fixing this.

> Signed-off-by: Chris Snook <[EMAIL PROTECTED]>
> 
> --- linux-2.6.23-rc2-orig/include/asm-alpha/atomic.h  2007-07-08 
> 19:32:17.0 -0400
> +++ linux-2.6.23-rc2/include/asm-alpha/atomic.h   2007-08-09 
> 09:19:00.0 -0400
> @@ -14,18 +14,22 @@
> 
> 
>  /*
> - * Counter is volatile to make sure gcc doesn't try to be clever
> - * and move things around on us. We need to use _exactly_ the address
> - * the user gave us, not some alias that contains the same information.
> + * Make sure gcc doesn't try to be clever and move things around
> + * on us. We need to use _exactly_ the address the user gave us,
> + * not some alias that contains the same information.
>   */
> -typedef struct { volatile int counter; } atomic_t;
> -typedef struct { volatile long counter; } atomic64_t;
> +typedef struct { int counter; } atomic_t;
> +typedef struct { long counter; } atomic64_t;
> 
>  #define ATOMIC_INIT(i)   ( (atomic_t) { (i) } )
>  #define ATOMIC64_INIT(i) ( (atomic64_t) { (i) } )
> 
> -#define atomic_read(v)   ((v)->counter + 0)
> -#define atomic64_read(v) ((v)->counter + 0)
> +/*
> + * Casting to volatile here minimizes the need for barriers,
> + * without having to declare the type itself as volatile.
> + */
> +#define atomic_read(v)   (*(volatile int *)&(v)->counter + 0)
> +#define atomic64_read(v) (*(volatile long *)&(v)->counter + 0)
> 
>  #define atomic_set(v,i)  ((v)->counter = (i))
>  #define atomic64_set(v,i)((v)->counter = (i))

How about:

#define atomic_set(v,i) ((*(volatile int *)&(v)->counter) = (i))
#define atomic64_set(v,i)   ((*(volatile long *)&(v)->counter) = (i))

Thanx, Paul

> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/24] make atomic_read() behave consistently on alpha

2007-08-09 Thread Chris Snook
From: Chris Snook <[EMAIL PROTECTED]>

Purify volatile use for atomic[64]_t on alpha.

Signed-off-by: Chris Snook <[EMAIL PROTECTED]>

--- linux-2.6.23-rc2-orig/include/asm-alpha/atomic.h2007-07-08 
19:32:17.0 -0400
+++ linux-2.6.23-rc2/include/asm-alpha/atomic.h 2007-08-09 09:19:00.0 
-0400
@@ -14,18 +14,22 @@
 
 
 /*
- * Counter is volatile to make sure gcc doesn't try to be clever
- * and move things around on us. We need to use _exactly_ the address
- * the user gave us, not some alias that contains the same information.
+ * Make sure gcc doesn't try to be clever and move things around
+ * on us. We need to use _exactly_ the address the user gave us,
+ * not some alias that contains the same information.
  */
-typedef struct { volatile int counter; } atomic_t;
-typedef struct { volatile long counter; } atomic64_t;
+typedef struct { int counter; } atomic_t;
+typedef struct { long counter; } atomic64_t;
 
 #define ATOMIC_INIT(i) ( (atomic_t) { (i) } )
 #define ATOMIC64_INIT(i)   ( (atomic64_t) { (i) } )
 
-#define atomic_read(v) ((v)->counter + 0)
-#define atomic64_read(v)   ((v)->counter + 0)
+/*
+ * Casting to volatile here minimizes the need for barriers,
+ * without having to declare the type itself as volatile.
+ */
+#define atomic_read(v) (*(volatile int *)&(v)->counter + 0)
+#define atomic64_read(v)   (*(volatile long *)&(v)->counter + 0)
 
 #define atomic_set(v,i)((v)->counter = (i))
 #define atomic64_set(v,i)  ((v)->counter = (i))
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html