subject:"\[PATCH 0\/24\] make atomic_read\(\) behave consistently across all architectures"

[PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-09 Thread Chris Snook

As recent discussions[1], and bugs[2] have shown, there is a great deal of
confusion about the expected behavior of atomic_read(), compounded by the
fact that it is not the same on all architectures.  Since users expect calls
to atomic_read() to actually perform a read, it is not desirable to allow
the compiler to optimize this away.  Requiring the use of barrier() in this
case is inefficient, since we only want to re-load the atomic_t variable,
not everything else in scope.

This patchset makes the behavior of atomic_read uniform by removing the
volatile keyword from all atomic_t and atomic64_t definitions that currently
have it, and instead explicitly casts the variable as volatile in
atomic_read().  This leaves little room for creative optimization by the
compiler, and is in keeping with the principles behind "volatile considered
harmful".

Busy-waiters should still use cpu_relax(), but fast paths may be able to
reduce their use of barrier() between some atomic_read() calls.

-- Chris

1)  http://lkml.org/lkml/2007/7/1/52
2)  http://lkml.org/lkml/2007/8/8/122
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-09 Thread Arnd Bergmann

On Thursday 09 August 2007, Chris Snook wrote:
> This patchset makes the behavior of atomic_read uniform by removing the
> volatile keyword from all atomic_t and atomic64_t definitions that currently
> have it, and instead explicitly casts the variable as volatile in
> atomic_read().  This leaves little room for creative optimization by the
> compiler, and is in keeping with the principles behind "volatile considered
> harmful".
> 

Just an idea: since all architectures already include asm-generic/atomic.h,
why not move the definitions of atomic_t and atomic64_t, as well as anything
that does not involve architecture specific inline assembly into the generic
header?

Arnd <><
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-09 Thread Chris Snook


Arnd Bergmann wrote:

On Thursday 09 August 2007, Chris Snook wrote:

This patchset makes the behavior of atomic_read uniform by removing the
volatile keyword from all atomic_t and atomic64_t definitions that currently
have it, and instead explicitly casts the variable as volatile in
atomic_read().  This leaves little room for creative optimization by the
compiler, and is in keeping with the principles behind "volatile considered
harmful".



Just an idea: since all architectures already include asm-generic/atomic.h,
why not move the definitions of atomic_t and atomic64_t, as well as anything
that does not involve architecture specific inline assembly into the generic
header?

Arnd <><


a) chicken and egg: asm-generic/atomic.h depends on definitions in asm/atomic.h

If you can find a way to reshuffle the code and make it simpler, I personally am 
all for it.  I'm skeptical that you'll get much to show for the effort.


b) The definitions aren't precisely identical between all architectures, so it 
would be a mess of special cases, which gets us right back to where we are now.


-- Chris
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-09 Thread Arnd Bergmann

On Thursday 09 August 2007, Chris Snook wrote:
> a) chicken and egg: asm-generic/atomic.h depends on definitions in 
> asm/atomic.h

Ok, I see.
 
> If you can find a way to reshuffle the code and make it simpler, I personally 
> am 
> all for it. I'm skeptical that you'll get much to show for the effort. 

I guess it could  be done using more macros or new headers, but I don't see
a way that would actually improve the situation.

> b) The definitions aren't precisely identical between all architectures, so 
> it 
> would be a mess of special cases, which gets us right back to where we are 
> now.

Why are they not identical? Anything beyond the 32/64 bit difference should
be the same afaics, or it might cause more bugs.

Arnd <><
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-14 Thread Christoph Lameter

On Thu, 9 Aug 2007, Chris Snook wrote:

> This patchset makes the behavior of atomic_read uniform by removing the
> volatile keyword from all atomic_t and atomic64_t definitions that currently
> have it, and instead explicitly casts the variable as volatile in
> atomic_read().  This leaves little room for creative optimization by the
> compiler, and is in keeping with the principles behind "volatile considered
> harmful".

volatile is generally harmful even in atomic_read(). Barriers control
visibility and AFAICT things are fine.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-14 Thread Chris Snook


Christoph Lameter wrote:

On Thu, 9 Aug 2007, Chris Snook wrote:


This patchset makes the behavior of atomic_read uniform by removing the
volatile keyword from all atomic_t and atomic64_t definitions that currently
have it, and instead explicitly casts the variable as volatile in
atomic_read().  This leaves little room for creative optimization by the
compiler, and is in keeping with the principles behind "volatile considered
harmful".


volatile is generally harmful even in atomic_read(). Barriers control
visibility and AFAICT things are fine.


But barriers force a flush of *everything* in scope, which we generally don't 
want.  On the other hand, we pretty much always want to flush atomic_* 
operations.  One way or another, we should be restricting the volatile behavior 
to the thing that needs it.  On most architectures, this patch set just moves 
that from the declaration, where it is considered harmful, to the use, where it 
is considered an occasional necessary evil.


See the resubmitted patchset, which also puts a cast in the atomic[64]_set 
operations.


-- Chris
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-14 Thread Christoph Lameter

On Tue, 14 Aug 2007, Chris Snook wrote:

> But barriers force a flush of *everything* in scope, which we generally don't
> want.  On the other hand, we pretty much always want to flush atomic_*
> operations.  One way or another, we should be restricting the volatile
> behavior to the thing that needs it.  On most architectures, this patch set
> just moves that from the declaration, where it is considered harmful, to the
> use, where it is considered an occasional necessary evil.

Then we would need

atomic_read()

and

atomic_read_volatile()

atomic_read_volatile() would imply an object sized memory barrier before 
and after?
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-14 Thread Satyam Sharma

On Tue, 14 Aug 2007, Christoph Lameter wrote:

> On Thu, 9 Aug 2007, Chris Snook wrote:
> 
> > This patchset makes the behavior of atomic_read uniform by removing the
> > volatile keyword from all atomic_t and atomic64_t definitions that currently
> > have it, and instead explicitly casts the variable as volatile in
> > atomic_read().  This leaves little room for creative optimization by the
> > compiler, and is in keeping with the principles behind "volatile considered
> > harmful".
> 
> volatile is generally harmful even in atomic_read(). Barriers control
> visibility and AFAICT things are fine.

Frankly, I don't see the need for this series myself either. Personal
opinion (others may differ), but I consider "volatile" to be a sad /
unfortunate wart in C (numerous threads on this list and on the gcc
lists/bugzilla over the years stand testimony to this) and if we _can_
steer clear of it, then why not -- why use this ill-defined primitive
whose implementation has often differed over compiler versions and
platforms? Granted, barrier() _is_ heavy-handed in that it makes the
optimizer forget _everything_, but then somebody did post a forget()
macro on this thread itself ...

[ BTW, why do we want the compiler to not optimize atomic_read()'s in
  the first place? Atomic ops guarantee atomicity, which has nothing
  to do with "volatility" -- users that expect "volatility" from
  atomic ops are the ones who must be fixed instead, IMHO. ]
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-14 Thread Chris Snook


Satyam Sharma wrote:


On Tue, 14 Aug 2007, Christoph Lameter wrote:


On Thu, 9 Aug 2007, Chris Snook wrote:


This patchset makes the behavior of atomic_read uniform by removing the
volatile keyword from all atomic_t and atomic64_t definitions that currently
have it, and instead explicitly casts the variable as volatile in
atomic_read().  This leaves little room for creative optimization by the
compiler, and is in keeping with the principles behind "volatile considered
harmful".

volatile is generally harmful even in atomic_read(). Barriers control
visibility and AFAICT things are fine.


Frankly, I don't see the need for this series myself either. Personal
opinion (others may differ), but I consider "volatile" to be a sad /
unfortunate wart in C (numerous threads on this list and on the gcc
lists/bugzilla over the years stand testimony to this) and if we _can_
steer clear of it, then why not -- why use this ill-defined primitive
whose implementation has often differed over compiler versions and
platforms? Granted, barrier() _is_ heavy-handed in that it makes the
optimizer forget _everything_, but then somebody did post a forget()
macro on this thread itself ...

[ BTW, why do we want the compiler to not optimize atomic_read()'s in
  the first place? Atomic ops guarantee atomicity, which has nothing
  to do with "volatility" -- users that expect "volatility" from
  atomic ops are the ones who must be fixed instead, IMHO. ]


Because atomic operations are generally used for synchronization, which requires 
volatile behavior.  Most such codepaths currently use an inefficient barrier(). 
 Some forget to and we get bugs, because people assume that atomic_read() 
actually reads something, and atomic_write() actually writes something.  Worse, 
these are architecture-specific, even compiler version-specific bugs that are 
often difficult to track down.


-- Chris
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-14 Thread Christoph Lameter

On Tue, 14 Aug 2007, Chris Snook wrote:

> Because atomic operations are generally used for synchronization, which
> requires volatile behavior.  Most such codepaths currently use an inefficient
> barrier().  Some forget to and we get bugs, because people assume that
> atomic_read() actually reads something, and atomic_write() actually writes
> something.  Worse, these are architecture-specific, even compiler
> version-specific bugs that are often difficult to track down.

Looks like we need to have lock and unlock semantics?

atomic_read()

which has no barrier or volatile implications.

atomic_read_for_lock

Acquire semantics?


atomic_read_for_unlock

Release semantics?

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-14 Thread Paul E. McKenney

On Wed, Aug 15, 2007 at 04:38:54AM +0530, Satyam Sharma wrote:
> 
> 
> On Tue, 14 Aug 2007, Christoph Lameter wrote:
> 
> > On Thu, 9 Aug 2007, Chris Snook wrote:
> > 
> > > This patchset makes the behavior of atomic_read uniform by removing the
> > > volatile keyword from all atomic_t and atomic64_t definitions that 
> > > currently
> > > have it, and instead explicitly casts the variable as volatile in
> > > atomic_read().  This leaves little room for creative optimization by the
> > > compiler, and is in keeping with the principles behind "volatile 
> > > considered
> > > harmful".
> > 
> > volatile is generally harmful even in atomic_read(). Barriers control
> > visibility and AFAICT things are fine.
> 
> Frankly, I don't see the need for this series myself either. Personal
> opinion (others may differ), but I consider "volatile" to be a sad /
> unfortunate wart in C (numerous threads on this list and on the gcc
> lists/bugzilla over the years stand testimony to this) and if we _can_
> steer clear of it, then why not -- why use this ill-defined primitive
> whose implementation has often differed over compiler versions and
> platforms? Granted, barrier() _is_ heavy-handed in that it makes the
> optimizer forget _everything_, but then somebody did post a forget()
> macro on this thread itself ...
> 
> [ BTW, why do we want the compiler to not optimize atomic_read()'s in
>   the first place? Atomic ops guarantee atomicity, which has nothing
>   to do with "volatility" -- users that expect "volatility" from
>   atomic ops are the ones who must be fixed instead, IMHO. ]

Interactions between mainline code and interrupt/NMI handlers on the same
CPU (for example, when both are using per-CPU variables.  See examples
previously posted in this thread, or look at the rcu_read_lock() and
rcu_read_unlock() implementations in http://lkml.org/lkml/2007/8/7/280.

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-14 Thread Herbert Xu

Chris Snook <[EMAIL PROTECTED]> wrote:
> 
> Because atomic operations are generally used for synchronization, which 
> requires 
> volatile behavior.  Most such codepaths currently use an inefficient 
> barrier(). 
>  Some forget to and we get bugs, because people assume that atomic_read() 
> actually reads something, and atomic_write() actually writes something.  
> Worse, 
> these are architecture-specific, even compiler version-specific bugs that are 
> often difficult to track down.

I'm yet to see a single example from the current tree where
this patch series is the correct solution.  So far the only
example has been a buggy piece of code which has since been
fixed with a cpu_relax.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Heiko Carstens

On Wed, Aug 15, 2007 at 02:49:03PM +0800, Herbert Xu wrote:
> Chris Snook <[EMAIL PROTECTED]> wrote:
> > 
> > Because atomic operations are generally used for synchronization, which 
> > requires 
> > volatile behavior.  Most such codepaths currently use an inefficient 
> > barrier(). 
> >  Some forget to and we get bugs, because people assume that atomic_read() 
> > actually reads something, and atomic_write() actually writes something.  
> > Worse, 
> > these are architecture-specific, even compiler version-specific bugs that 
> > are 
> > often difficult to track down.
> 
> I'm yet to see a single example from the current tree where
> this patch series is the correct solution.  So far the only
> example has been a buggy piece of code which has since been
> fixed with a cpu_relax.

Btw.: we still have

include/asm-i386/mach-es7000/mach_wakecpu.h:  while (!atomic_read(deassert));
include/asm-i386/mach-default/mach_wakecpu.h: while (!atomic_read(deassert));

Looks like they need to be fixed as well.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Stefan Richter

Satyam Sharma wrote:
> [ BTW, why do we want the compiler to not optimize atomic_read()'s in
>   the first place? Atomic ops guarantee atomicity, which has nothing
>   to do with "volatility" -- users that expect "volatility" from
>   atomic ops are the ones who must be fixed instead, IMHO. ]

LDD3 says on page 125:  "The following operations are defined for the
type [atomic_t] and are guaranteed to be atomic with respect to all
processors of an SMP computer."

Doesn't "atomic WRT all processors" require volatility?
-- 
Stefan Richter
-=-=-=== =--- -
http://arcgraph.de/sr/
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Herbert Xu

On Wed, Aug 15, 2007 at 12:35:31PM +0200, Stefan Richter wrote:
> 
> LDD3 says on page 125:  "The following operations are defined for the
> type [atomic_t] and are guaranteed to be atomic with respect to all
> processors of an SMP computer."
> 
> Doesn't "atomic WRT all processors" require volatility?

Not at all.  We also require this to be atomic without any
hint of volatility.

extern int foo;
foo = 4;

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma

On Wed, 15 Aug 2007, Stefan Richter wrote:

> Satyam Sharma wrote:
> > [ BTW, why do we want the compiler to not optimize atomic_read()'s in
> >   the first place? Atomic ops guarantee atomicity, which has nothing
> >   to do with "volatility" -- users that expect "volatility" from
> >   atomic ops are the ones who must be fixed instead, IMHO. ]
> 
> LDD3 says on page 125:  "The following operations are defined for the
> type [atomic_t] and are guaranteed to be atomic with respect to all
> processors of an SMP computer."
> 
> Doesn't "atomic WRT all processors" require volatility?

No, it definitely doesn't. Why should it?

"Atomic w.r.t. all processors" is just your normal, simple "atomicity"
for SMP systems (ensure that that object is modified / set / replaced
in main memory atomically) and has nothing to do with "volatile"
behaviour.

"Volatile behaviour" itself isn't consistently defined (at least
definitely not consistently implemented in various gcc versions across
platforms), but it is /expected/ to mean something like: "ensure that
every such access actually goes all the way to memory, and is not
re-ordered w.r.t. to other accesses, as far as the compiler can take
care of these". The last "as far as compiler can take care" disclaimer
comes about due to CPUs doing their own re-ordering nowadays.

For example (say on i386):

(A)
$ cat tp1.c
int a;

void func(void)
{
a = 10;
a = 20;
}
$ gcc -Os -S tp1.c
$ cat tp1.s
...
movl$20, a
...

(B)
$ cat tp2.c
volatile int a;

void func(void)
{
a = 10;
a = 20;
}
$ gcc -Os -S tp2.c
$ cat tp2.s
...
movl$10, a
movl$20, a
...

(C)
$ cat tp3.c
int a;

void func(void)
{
*(volatile int *)&a = 10;
*(volatile int *)&a = 20;
}
$ gcc -Os -S tp3.c
$ cat tp3.s
...
movl$10, a
movl$20, a
...

In (A) the compiler optimized "a = 10;" away, but the actual store
of the final value "20" to "a" was still "atomic". (B) and (C) also
exhibit "volatile" behaviour apart from the "atomicity".

But as others replied, it seems some callers out there depend upon
atomic ops exhibiting "volatile" behaviour as well, so that answers
my initial question, actually. I haven't looked at the code Paul
pointed me at, but I wonder if that "forget(x)" macro would help
those cases. I'd wish to avoid the "volatile" primitive, personally.

Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Stefan Richter

Satyam Sharma wrote:
> On Wed, 15 Aug 2007, Stefan Richter wrote:
>> Doesn't "atomic WRT all processors" require volatility?
> 
> No, it definitely doesn't. Why should it?
> 
> "Atomic w.r.t. all processors" is just your normal, simple "atomicity"
> for SMP systems (ensure that that object is modified / set / replaced
> in main memory atomically) and has nothing to do with "volatile"
> behaviour.
> 
> "Volatile behaviour" itself isn't consistently defined (at least
> definitely not consistently implemented in various gcc versions across
> platforms), but it is /expected/ to mean something like: "ensure that
> every such access actually goes all the way to memory, and is not
> re-ordered w.r.t. to other accesses, as far as the compiler can take
> care of these". The last "as far as compiler can take care" disclaimer
> comes about due to CPUs doing their own re-ordering nowadays.
> 
> For example (say on i386):

[...]

> In (A) the compiler optimized "a = 10;" away, but the actual store
> of the final value "20" to "a" was still "atomic". (B) and (C) also
> exhibit "volatile" behaviour apart from the "atomicity".
> 
> But as others replied, it seems some callers out there depend upon
> atomic ops exhibiting "volatile" behaviour as well, so that answers
> my initial question, actually. I haven't looked at the code Paul
> pointed me at, but I wonder if that "forget(x)" macro would help
> those cases. I'd wish to avoid the "volatile" primitive, personally.

So, looking at load instead of store, understand I correctly that in
your opinion

int b;

b = atomic_read(&a);
if (b)
do_something_time_consuming();

b = atomic_read(&a);
if (b)
do_something_more();

should be changed to explicitly forget(&a) after
do_something_time_consuming?

If so, how about the following:

static inline void A(atomic_t *a)
{
int b = atomic_read(a);
if (b)
do_something_time_consuming();
}

static inline void B(atomic_t *a)
{
int b = atomic_read(a);
if (b)
do_something_more();
}

static void C(atomic_t *a)
{
A(a);
B(b);
}

Would this need forget(a) after A(a)?

(Is the latter actually answered in C99 or is it compiler-dependent?)
-- 
Stefan Richter
-=-=-=== =--- -
http://arcgraph.de/sr/
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Stefan Richter

I wrote:
> static inline void A(atomic_t *a)
> {
>   int b = atomic_read(a);
>   if (b)
>   do_something_time_consuming();
> }
> 
> static inline void B(atomic_t *a)
> {
>   int b = atomic_read(a);
>   if (b)
>   do_something_more();
> }
> 
> static void C(atomic_t *a)
> {
>   A(a);
>   B(b);
/* ^ typo */
B(a);
> }
> 
> Would this need forget(a) after A(a)?
> 
> (Is the latter actually answered in C99 or is it compiler-dependent?)


-- 
Stefan Richter
-=-=-=== =--- -
http://arcgraph.de/sr/
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Stefan Richter

On 8/15/2007 10:18 AM, Heiko Carstens wrote:
> On Wed, Aug 15, 2007 at 02:49:03PM +0800, Herbert Xu wrote:
>> Chris Snook <[EMAIL PROTECTED]> wrote:
>> > 
>> > Because atomic operations are generally used for synchronization, which 
>> > requires 
>> > volatile behavior.  Most such codepaths currently use an inefficient 
>> > barrier(). 
>> >  Some forget to and we get bugs, because people assume that atomic_read() 
>> > actually reads something, and atomic_write() actually writes something.  
>> > Worse, 
>> > these are architecture-specific, even compiler version-specific bugs that 
>> > are 
>> > often difficult to track down.
>> 
>> I'm yet to see a single example from the current tree where
>> this patch series is the correct solution.  So far the only
>> example has been a buggy piece of code which has since been
>> fixed with a cpu_relax.
> 
> Btw.: we still have
> 
> include/asm-i386/mach-es7000/mach_wakecpu.h:  while (!atomic_read(deassert));
> include/asm-i386/mach-default/mach_wakecpu.h: while (!atomic_read(deassert));
> 
> Looks like they need to be fixed as well.


I don't know if this here is affected:

/* drivers/ieee1394/ieee1394_core.h */
static inline unsigned int get_hpsb_generation(struct hpsb_host *host)
{
return atomic_read(&host->generation);
}

/* drivers/ieee1394/nodemgr.c */
static int nodemgr_host_thread(void *__hi)
{
[...]

for (;;) {
[... sleep until bus reset event ...]

/* Pause for 1/4 second in 1/16 second intervals,
 * to make sure things settle down. */
g = get_hpsb_generation(host);
for (i = 0; i < 4 ; i++) {
if (msleep_interruptible(63) ||
kthread_should_stop())
goto exit;

/* Now get the generation in which the node ID's we collect
 * are valid.  During the bus scan we will use this generation
 * for the read transactions, so that if another reset occurs
 * during the scan the transactions will fail instead of
 * returning bogus data. */

generation = get_hpsb_generation(host);

/* If we get a reset before we are done waiting, then
 * start the waiting over again */

if (generation != g)
g = generation, i = 0;
}

[... scan bus, using generation ...]

}
exit:
[...]
}



-- 
Stefan Richter
-=-=-=== =--- -
http://arcgraph.de/sr/
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma

On Wed, 15 Aug 2007, Stefan Richter wrote:

> Satyam Sharma wrote:
> > On Wed, 15 Aug 2007, Stefan Richter wrote:
> >> Doesn't "atomic WRT all processors" require volatility?
> > 
> > No, it definitely doesn't. Why should it?
> > 
> > "Atomic w.r.t. all processors" is just your normal, simple "atomicity"
> > for SMP systems (ensure that that object is modified / set / replaced
> > in main memory atomically) and has nothing to do with "volatile"
> > behaviour.
> > 
> > "Volatile behaviour" itself isn't consistently defined (at least
> > definitely not consistently implemented in various gcc versions across
> > platforms), but it is /expected/ to mean something like: "ensure that
> > every such access actually goes all the way to memory, and is not
> > re-ordered w.r.t. to other accesses, as far as the compiler can take
> > care of these". The last "as far as compiler can take care" disclaimer
> > comes about due to CPUs doing their own re-ordering nowadays.
> > 
> > For example (say on i386):
> 
> [...]
> 
> > In (A) the compiler optimized "a = 10;" away, but the actual store
> > of the final value "20" to "a" was still "atomic". (B) and (C) also
> > exhibit "volatile" behaviour apart from the "atomicity".
> > 
> > But as others replied, it seems some callers out there depend upon
> > atomic ops exhibiting "volatile" behaviour as well, so that answers
> > my initial question, actually. I haven't looked at the code Paul
> > pointed me at, but I wonder if that "forget(x)" macro would help
> > those cases. I'd wish to avoid the "volatile" primitive, personally.
> 
> So, looking at load instead of store, understand I correctly that in
> your opinion
> 
>   int b;
> 
>   b = atomic_read(&a);
>   if (b)
>   do_something_time_consuming();
> 
>   b = atomic_read(&a);
>   if (b)
>   do_something_more();
> 
> should be changed to explicitly forget(&a) after
> do_something_time_consuming?

No, I'd actually prefer something like what Christoph Lameter suggested,
i.e. users (such as above) who want "volatile"-like behaviour from atomic
ops can use alternative functions. How about something like:

#define atomic_read_volatile(v) \
({  \
forget(&(v)->counter);  \
((v)->counter); \
})

Or possibly, implement these "volatile" atomic ops variants in inline asm
like the patch that Sebastian Siewior has submitted on another thread just
a while back.

Of course, if we find there are more callers in the kernel who want the
volatility behaviour than those who don't care, we can re-define the
existing ops to such variants, and re-name the existing definitions to
somethine else, say "atomic_read_nonvolatile" for all I care.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma

Hi Stefan,

On Wed, 15 Aug 2007, Stefan Richter wrote:

> On 8/15/2007 10:18 AM, Heiko Carstens wrote:
> > On Wed, Aug 15, 2007 at 02:49:03PM +0800, Herbert Xu wrote:
> >> Chris Snook <[EMAIL PROTECTED]> wrote:
> >> > 
> >> > Because atomic operations are generally used for synchronization, which 
> >> > requires 
> >> > volatile behavior.  Most such codepaths currently use an inefficient 
> >> > barrier(). 
> >> >  Some forget to and we get bugs, because people assume that 
> >> > atomic_read() 
> >> > actually reads something, and atomic_write() actually writes something.  
> >> > Worse, 
> >> > these are architecture-specific, even compiler version-specific bugs 
> >> > that are 
> >> > often difficult to track down.
> >> 
> >> I'm yet to see a single example from the current tree where
> >> this patch series is the correct solution.  So far the only
> >> example has been a buggy piece of code which has since been
> >> fixed with a cpu_relax.
> > 
> > Btw.: we still have
> > 
> > include/asm-i386/mach-es7000/mach_wakecpu.h:  while 
> > (!atomic_read(deassert));
> > include/asm-i386/mach-default/mach_wakecpu.h: while 
> > (!atomic_read(deassert));
> > 
> > Looks like they need to be fixed as well.
> 
> 
> I don't know if this here is affected:

Yes, I think it is. You're clearly expecting the read to actually happen
when you call get_hpsb_generation(). It's clearly not a busy-loop, so
cpu_relax() sounds pointless / wrong solution for this case, so I'm now
somewhat beginning to appreciate the motivation behind this series :-)

But as I said, there are ways to achieve the same goals of this series
without using "volatile".

I think I'll submit a RFC/patch or two on this myself (will also fix
the code pieces listed here).

> /* drivers/ieee1394/ieee1394_core.h */
> static inline unsigned int get_hpsb_generation(struct hpsb_host *host)
> {
>   return atomic_read(&host->generation);
> }
> 
> /* drivers/ieee1394/nodemgr.c */
> static int nodemgr_host_thread(void *__hi)
> {
>   [...]
> 
>   for (;;) {
>   [... sleep until bus reset event ...]
> 
>   /* Pause for 1/4 second in 1/16 second intervals,
>* to make sure things settle down. */
>   g = get_hpsb_generation(host);
>   for (i = 0; i < 4 ; i++) {
>   if (msleep_interruptible(63) ||
>   kthread_should_stop())
>   goto exit;

Totally unrelated, but this looks weird. IMHO you actually wanted:

msleep_interruptible(63);
if (kthread_should_stop())
goto exit;

here, didn't you? Otherwise the thread will exit even when
kthread_should_stop() != TRUE (just because it received a signal),
and it is not good for a kthread to exit on its own if it uses
kthread_should_stop() or if some other piece of kernel code could
eventually call kthread_stop(tsk) on it.

Ok, probably the thread will never receive a signal in the first
place because it's spawned off kthreadd which ignores all signals
beforehand, but still ...

[PATCH] ieee1394: Fix kthread stopping in nodemgr_host_thread

The nodemgr host thread can exit on its own even when kthread_should_stop
is not true, on receiving a signal (might never happen in practice, as
it ignores signals). But considering kthread_stop() must not be mixed with
kthreads that can exit on their own, I think changing the code like this
is clearer. This change means the thread can cut its sleep short when
receive a signal but looking at the code around, that sounds okay (and
again, it might never actually recieve a signal in practice).

Signed-off-by: Satyam Sharma <[EMAIL PROTECTED]>

---

 drivers/ieee1394/nodemgr.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/ieee1394/nodemgr.c b/drivers/ieee1394/nodemgr.c
index 2ffd534..981a7da 100644
--- a/drivers/ieee1394/nodemgr.c
+++ b/drivers/ieee1394/nodemgr.c
@@ -1721,7 +1721,8 @@ static int nodemgr_host_thread(void *__hi)
 * to make sure things settle down. */
g = get_hpsb_generation(host);
for (i = 0; i < 4 ; i++) {
-   if (msleep_interruptible(63) || kthread_should_stop())
+   msleep_interruptible(63);
+   if (kthread_should_stop())
goto exit;

/* Now get the generation in which the node ID's we 
collect
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul E. McKenney

On Wed, Aug 15, 2007 at 07:17:29PM +0530, Satyam Sharma wrote:
> On Wed, 15 Aug 2007, Stefan Richter wrote:
> > Satyam Sharma wrote:
> > > On Wed, 15 Aug 2007, Stefan Richter wrote:
> > >> Doesn't "atomic WRT all processors" require volatility?
> > > 
> > > No, it definitely doesn't. Why should it?
> > > 
> > > "Atomic w.r.t. all processors" is just your normal, simple "atomicity"
> > > for SMP systems (ensure that that object is modified / set / replaced
> > > in main memory atomically) and has nothing to do with "volatile"
> > > behaviour.
> > > 
> > > "Volatile behaviour" itself isn't consistently defined (at least
> > > definitely not consistently implemented in various gcc versions across
> > > platforms), but it is /expected/ to mean something like: "ensure that
> > > every such access actually goes all the way to memory, and is not
> > > re-ordered w.r.t. to other accesses, as far as the compiler can take
> > > care of these". The last "as far as compiler can take care" disclaimer
> > > comes about due to CPUs doing their own re-ordering nowadays.
> > > 
> > > For example (say on i386):
> > 
> > [...]
> > 
> > > In (A) the compiler optimized "a = 10;" away, but the actual store
> > > of the final value "20" to "a" was still "atomic". (B) and (C) also
> > > exhibit "volatile" behaviour apart from the "atomicity".
> > > 
> > > But as others replied, it seems some callers out there depend upon
> > > atomic ops exhibiting "volatile" behaviour as well, so that answers
> > > my initial question, actually. I haven't looked at the code Paul
> > > pointed me at, but I wonder if that "forget(x)" macro would help
> > > those cases. I'd wish to avoid the "volatile" primitive, personally.
> > 
> > So, looking at load instead of store, understand I correctly that in
> > your opinion
> > 
> > int b;
> > 
> > b = atomic_read(&a);
> > if (b)
> > do_something_time_consuming();
> > 
> > b = atomic_read(&a);
> > if (b)
> > do_something_more();
> > 
> > should be changed to explicitly forget(&a) after
> > do_something_time_consuming?
> 
> No, I'd actually prefer something like what Christoph Lameter suggested,
> i.e. users (such as above) who want "volatile"-like behaviour from atomic
> ops can use alternative functions. How about something like:
> 
> #define atomic_read_volatile(v)   \
>   ({  \
>   forget(&(v)->counter);  \
>   ((v)->counter); \
>   })

Wouldn't the above "forget" the value, throw it away, then forget
that it forgot it, giving non-volatile semantics?

> Or possibly, implement these "volatile" atomic ops variants in inline asm
> like the patch that Sebastian Siewior has submitted on another thread just
> a while back.

Given that you are advocating a change (please keep in mind that
atomic_read() and atomic_set() had volatile semantics on almost all
platforms), care to give some example where these historical volatile
semantics are causing a problem?

> Of course, if we find there are more callers in the kernel who want the
> volatility behaviour than those who don't care, we can re-define the
> existing ops to such variants, and re-name the existing definitions to
> somethine else, say "atomic_read_nonvolatile" for all I care.

Do we really need another set of APIs?  Can you give even one example
where the pre-existing volatile semantics are causing enough of a problem
to justify adding yet more atomic_*() APIs?

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Herbert Xu

On Wed, Aug 15, 2007 at 08:05:38PM +0530, Satyam Sharma wrote:
>
> > I don't know if this here is affected:
> 
> Yes, I think it is. You're clearly expecting the read to actually happen
> when you call get_hpsb_generation(). It's clearly not a busy-loop, so
> cpu_relax() sounds pointless / wrong solution for this case, so I'm now
> somewhat beginning to appreciate the motivation behind this series :-)

Nope, we're calling schedule which is a rather heavy-weight
barrier.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Herbert Xu

On Wed, Aug 15, 2007 at 07:25:16AM -0700, Paul E. McKenney wrote:
> 
> Do we really need another set of APIs?  Can you give even one example
> where the pre-existing volatile semantics are causing enough of a problem
> to justify adding yet more atomic_*() APIs?

Let's turn this around.  Can you give a single example where
the volatile semantics is needed in a legitimate way?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul E. McKenney

On Wed, Aug 15, 2007 at 11:33:36PM +0800, Herbert Xu wrote:
> On Wed, Aug 15, 2007 at 07:25:16AM -0700, Paul E. McKenney wrote:
> > 
> > Do we really need another set of APIs?  Can you give even one example
> > where the pre-existing volatile semantics are causing enough of a problem
> > to justify adding yet more atomic_*() APIs?
> 
> Let's turn this around.  Can you give a single example where
> the volatile semantics is needed in a legitimate way?

Sorry, but you are the one advocating for the change.

Nice try, though!  ;-)

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Stefan Richter

Herbert Xu wrote:
> On Wed, Aug 15, 2007 at 08:05:38PM +0530, Satyam Sharma wrote:
>>> I don't know if this here is affected:

[...something like]
b = atomic_read(a);
for (i = 0; i < 4; i++) {
msleep_interruptible(63);
c = atomic_read(a);
if (c != b) {
b = c;
i = 0;
}
}

> Nope, we're calling schedule which is a rather heavy-weight
> barrier.

How does the compiler know that msleep() has got barrier()s?
-- 
Stefan Richter
-=-=-=== =--- -
http://arcgraph.de/sr/
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Chris Snook


Herbert Xu wrote:

Chris Snook <[EMAIL PROTECTED]> wrote:
Because atomic operations are generally used for synchronization, which requires 
volatile behavior.  Most such codepaths currently use an inefficient barrier(). 
 Some forget to and we get bugs, because people assume that atomic_read() 
actually reads something, and atomic_write() actually writes something.  Worse, 
these are architecture-specific, even compiler version-specific bugs that are 
often difficult to track down.


I'm yet to see a single example from the current tree where
this patch series is the correct solution.  So far the only
example has been a buggy piece of code which has since been
fixed with a cpu_relax.


Part of the motivation here is to fix heisenbugs.  If I knew where they 
were, I'd be posting patches for them.  Unlike most bugs, where we want 
to expose them as obviously as possible, these can be extremely 
difficult to track down, and are often due to people assuming that the 
atomic_* operations have the same semantics they've historically had. 
Remember that until recently, all SMP architectures except s390 (which 
very few kernel developers outside of IBM, Red Hat, and SuSE do much 
work on) had volatile declarations for atomic_t.  Removing the volatile 
declarations from i386 and x86_64 may have created heisenbugs that won't 
manifest themselves until GCC 6.0 comes out and people start compiling 
kernels with -O5.  We should have consistent semantics for atomic_* 
operations.


The other motivation is to reduce the need for the barriers used to 
prevent/fix such problems which clobber all your registers, and instead 
force atomic_* operations to behave in the way they're actually used. 
After the (resubmitted) patchset is merged, we'll be able to remove a 
whole bunch of barriers, shrinking our source and our binaries, and 
improving performance.


-- Chris
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul E. McKenney

On Wed, Aug 15, 2007 at 06:09:35PM +0200, Stefan Richter wrote:
> Herbert Xu wrote:
> > On Wed, Aug 15, 2007 at 08:05:38PM +0530, Satyam Sharma wrote:
> >>> I don't know if this here is affected:
> 
> [...something like]
>   b = atomic_read(a);
>   for (i = 0; i < 4; i++) {
>   msleep_interruptible(63);
>   c = atomic_read(a);
>   if (c != b) {
>   b = c;
>   i = 0;
>   }
>   }
> 
> > Nope, we're calling schedule which is a rather heavy-weight
> > barrier.
> 
> How does the compiler know that msleep() has got barrier()s?

Because msleep_interruptible() is in a separate compilation unit,
the compiler has to assume that it might modify any arbitrary global.
In many cases, the compiler also has to assume that msleep_interruptible()
might call back into a function in the current compilation unit, thus
possibly modifying global static variables.

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma



On Wed, 15 Aug 2007, Paul E. McKenney wrote:

> On Wed, Aug 15, 2007 at 06:09:35PM +0200, Stefan Richter wrote:
> > Herbert Xu wrote:
> > > On Wed, Aug 15, 2007 at 08:05:38PM +0530, Satyam Sharma wrote:
> > >>> I don't know if this here is affected:
> > 
> > [...something like]
> > b = atomic_read(a);
> > for (i = 0; i < 4; i++) {
> > msleep_interruptible(63);
> > c = atomic_read(a);
> > if (c != b) {
> > b = c;
> > i = 0;
> > }
> > }
> > 
> > > Nope, we're calling schedule which is a rather heavy-weight
> > > barrier.
> > 
> > How does the compiler know that msleep() has got barrier()s?
> 
> Because msleep_interruptible() is in a separate compilation unit,
> the compiler has to assume that it might modify any arbitrary global.
> In many cases, the compiler also has to assume that msleep_interruptible()
> might call back into a function in the current compilation unit, thus
> possibly modifying global static variables.

Yup, I've just verified this with a testcase. So a call to any function
outside of the current compilation unit acts as a compiler barrier. Cool.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma

On Wed, 15 Aug 2007, Paul E. McKenney wrote:

> On Wed, Aug 15, 2007 at 11:33:36PM +0800, Herbert Xu wrote:
> > On Wed, Aug 15, 2007 at 07:25:16AM -0700, Paul E. McKenney wrote:
> > > 
> > > Do we really need another set of APIs?  Can you give even one example
> > > where the pre-existing volatile semantics are causing enough of a problem
> > > to justify adding yet more atomic_*() APIs?
> > 
> > Let's turn this around.  Can you give a single example where
> > the volatile semantics is needed in a legitimate way?
> 
> Sorry, but you are the one advocating for the change.

Not for i386 and x86_64 -- those have atomic ops without any "volatile"
semantics (currently as per existing definitions).
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma

On Wed, 15 Aug 2007, Paul E. McKenney wrote:

> On Wed, Aug 15, 2007 at 10:48:28PM +0530, Satyam Sharma wrote:
> > [...]
> > Not for i386 and x86_64 -- those have atomic ops without any "volatile"
> > semantics (currently as per existing definitions).
> 
> I claim unit volumes with arm, and the majority of the architectures, but
> I cannot deny the popularity of i386 and x86_64 with many developers.  ;-)

Hmm, does arm really need that "volatile int counter;"? Hopefully RMK will
take a patch removing that "volatile" ... ;-)

> However, I am not aware of code in the kernel that would benefit
> from the compiler coalescing multiple atomic_set() and atomic_read()
> invocations, thus I don't see the downside to volatility in this case.
> Are there some performance-critical code fragments that I am missing?

I don't know, and yes, code with multiple atomic_set's and atomic_read's
getting optimized or coalesced does sound strange to start with. Anyway,
I'm not against "volatile semantics" per se. As replied elsewhere, I do
appreciate the motivation behind this series (to _avoid_ gotchas, not to
fix existing ones). Just that I'd like to avoid using "volatile", for
aforementioned reasons, especially given that there are perfectly
reasonable alternatives to achieve the same desired behaviour.

Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma

Hi Paul,


On Wed, 15 Aug 2007, Paul E. McKenney wrote:

> On Wed, Aug 15, 2007 at 07:17:29PM +0530, Satyam Sharma wrote:
> > [...]
> > No, I'd actually prefer something like what Christoph Lameter suggested,
> > i.e. users (such as above) who want "volatile"-like behaviour from atomic
> > ops can use alternative functions. How about something like:
> > 
> > #define atomic_read_volatile(v) \
> > ({  \
> > forget(&(v)->counter);  \
> > ((v)->counter); \
> > })
> 
> Wouldn't the above "forget" the value, throw it away, then forget
> that it forgot it, giving non-volatile semantics?

Nope, I don't think so. I wrote the following trivial testcases:
[ See especially tp4.c and tp4.s (last example). ]

==
$ cat tp1.c # Using volatile access casts

#define atomic_read(a)  (*(volatile int *)&a)

int a;

void func(void)
{
a = 0;
while (atomic_read(a))
;
}
==
$ gcc -Os -S tp1.c; cat tp1.s

func:
pushl   %ebp
movl%esp, %ebp
movl$0, a
.L2:
movla, %eax
testl   %eax, %eax
jne .L2
popl%ebp
ret
==
$ cat tp2.c # Using nothing; gcc will optimize the whole loop away

#define forget(x)

#define atomic_read(a)  \
({  \
forget(&(a));   \
(a);\
})

int a;

void func(void)
{
a = 0;
while (atomic_read(a))
;
}
==
$ gcc -Os -S tp2.c; cat tp2.s

func:
pushl   %ebp
movl%esp, %ebp
popl%ebp
movl$0, a
ret
==
$ cat tp3.c # Using a full memory clobber barrier

#define forget(x)   asm volatile ("":::"memory")

#define atomic_read(a)  \
({  \
forget(&(a));   \
(a);\
})

int a;

void func(void)
{
a = 0;
while (atomic_read(a))
;
}
==
$ gcc -Os -S tp3.c; cat tp3.s

func:
pushl   %ebp
movl%esp, %ebp
movl$0, a
.L2:
cmpl$0, a
jne .L2
popl%ebp
ret
==
$ cat tp4.c # Using a forget(var) macro

#define forget(a)   __asm__ __volatile__ ("" :"=m" (a) :"m" (a))

#define atomic_read(a)  \
({  \
forget(a);  \
(a);\
})

int a;

void func(void)
{
a = 0;
while (atomic_read(a))
;
}
==
$ gcc -Os -S tp4.c; cat tp4.s

func:
pushl   %ebp
movl%esp, %ebp
movl$0, a
.L2:
cmpl$0, a
jne .L2
popl%ebp
ret
==


Possibly these were too trivial to expose any potential problems that you
may have been referring to, so would be helpful if you could write a more
concrete example / sample code.


> > Or possibly, implement these "volatile" atomic ops variants in inline asm
> > like the patch that Sebastian Siewior has submitted on another thread just
> > a while back.
> 
> Given that you are advocating a change (please keep in mind that
> atomic_read() and atomic_set() had volatile semantics on almost all
> platforms), care to give some example where these historical volatile
> semantics are causing a problem?
> [...]
> Can you give even one example
> where the pre-existing volatile semantics are causing enough of a problem
> to justify adding yet more atomic_*() APIs?

Will take this to the other sub-thread ...


> > Of course, if we find there are more callers in the kernel who want the
> > volatility behaviour than those who don't care, we can re-define the
> > existing ops to such variants, and re-name the existing definitions to
> > somethine else, say "atomic_read_nonvolatile" for all I care.
> 
> Do we really need another set of APIs?

Well, if there's one set of users who do care about volatile behaviour,
and another set that doesn't, it only sounds correct to provide both
those APIs, instead of forcing one behaviour on all users.


Thanks,
Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul E. McKenney

On Wed, Aug 15, 2007 at 10:48:28PM +0530, Satyam Sharma wrote:
> On Wed, 15 Aug 2007, Paul E. McKenney wrote:
> > On Wed, Aug 15, 2007 at 11:33:36PM +0800, Herbert Xu wrote:
> > > On Wed, Aug 15, 2007 at 07:25:16AM -0700, Paul E. McKenney wrote:
> > > > 
> > > > Do we really need another set of APIs?  Can you give even one example
> > > > where the pre-existing volatile semantics are causing enough of a 
> > > > problem
> > > > to justify adding yet more atomic_*() APIs?
> > > 
> > > Let's turn this around.  Can you give a single example where
> > > the volatile semantics is needed in a legitimate way?
> > 
> > Sorry, but you are the one advocating for the change.
> 
> Not for i386 and x86_64 -- those have atomic ops without any "volatile"
> semantics (currently as per existing definitions).

I claim unit volumes with arm, and the majority of the architectures, but
I cannot deny the popularity of i386 and x86_64 with many developers.  ;-)

However, I am not aware of code in the kernel that would benefit
from the compiler coalescing multiple atomic_set() and atomic_read()
invocations, thus I don't see the downside to volatility in this case.
Are there some performance-critical code fragments that I am missing?

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread David Howells

Herbert Xu <[EMAIL PROTECTED]> wrote:

> Let's turn this around.  Can you give a single example where
> the volatile semantics is needed in a legitimate way?

Accessing H/W registers?  But apart from that...

David
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul E. McKenney

On Wed, Aug 15, 2007 at 07:19:57PM +0100, David Howells wrote:
> Herbert Xu <[EMAIL PROTECTED]> wrote:
> 
> > Let's turn this around.  Can you give a single example where
> > the volatile semantics is needed in a legitimate way?
> 
> Accessing H/W registers?  But apart from that...

Communicating between process context and interrupt/NMI handlers using
per-CPU variables.

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool


"Volatile behaviour" itself isn't consistently defined (at least
definitely not consistently implemented in various gcc versions across
platforms),


It should be consistent across platforms; if not, file a bug please.


but it is /expected/ to mean something like: "ensure that
every such access actually goes all the way to memory, and is not
re-ordered w.r.t. to other accesses, as far as the compiler can take
care of these". The last "as far as compiler can take care" disclaimer
comes about due to CPUs doing their own re-ordering nowadays.


You can *expect* whatever you want, but this isn't in line with
reality at all.

volatile _does not_ make accesses go all the way to memory.
volatile _does not_ prevent reordering wrt other accesses.

What volatile does are a) never optimise away a read (or write)
to the object, since the data can change in ways the compiler
cannot see; and b) never move stores to the object across a
sequence point.  This does not mean other accesses cannot be
reordered wrt the volatile access.

If the abstract machine would do an access to a volatile-
qualified object, the generated machine code will do that
access too.  But, for example, it can still be optimised
away by the compiler, if it can prove it is allowed to.

If you want stuff to go all the way to memory, you need some
architecture-specific flush sequence; to make a store globally
visible before another store, you need mb(); before some following
read, you need mb(); to prevent reordering you need a barrier.


Segher

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul E. McKenney

On Wed, Aug 15, 2007 at 08:31:25PM +0200, Segher Boessenkool wrote:
> >>How does the compiler know that msleep() has got barrier()s?
> >
> >Because msleep_interruptible() is in a separate compilation unit,
> >the compiler has to assume that it might modify any arbitrary global.
> 
> No; compilation units have nothing to do with it, GCC can optimise
> across compilation unit boundaries just fine, if you tell it to
> compile more than one compilation unit at once.

Last I checked, the Linux kernel build system did compile each .c file
as a separate compilation unit.

> What you probably mean is that the compiler has to assume any code
> it cannot currently see can do anything (insofar as allowed by the
> relevant standards etc.)

Indeed.

> >In many cases, the compiler also has to assume that 
> >msleep_interruptible()
> >might call back into a function in the current compilation unit, thus
> >possibly modifying global static variables.
> 
> It most often is smart enough to see what compilation-unit-local
> variables might be modified that way, though :-)

Yep.  For example, if it knows the current value of a given such local
variable, and if all code paths that would change some other variable
cannot be reached given that current value of the first variable.
At least given that gcc doesn't know about multiple threads of execution!

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul E. McKenney

On Wed, Aug 15, 2007 at 11:25:05PM +0530, Satyam Sharma wrote:
> Hi Paul,
> On Wed, 15 Aug 2007, Paul E. McKenney wrote:
> 
> > On Wed, Aug 15, 2007 at 07:17:29PM +0530, Satyam Sharma wrote:
> > > [...]
> > > No, I'd actually prefer something like what Christoph Lameter suggested,
> > > i.e. users (such as above) who want "volatile"-like behaviour from atomic
> > > ops can use alternative functions. How about something like:
> > > 
> > > #define atomic_read_volatile(v)   \
> > >   ({  \
> > >   forget(&(v)->counter);  \
> > >   ((v)->counter); \
> > >   })
> > 
> > Wouldn't the above "forget" the value, throw it away, then forget
> > that it forgot it, giving non-volatile semantics?
> 
> Nope, I don't think so. I wrote the following trivial testcases:
> [ See especially tp4.c and tp4.s (last example). ]

Right.  I should have said "wouldn't the compiler be within its rights
to forget the value, throw it away, then forget that it forgot it".
The value coming out of the #define above is an unadorned ((v)->counter),
which has no volatile semantics.

> ==
> $ cat tp1.c # Using volatile access casts
> 
> #define atomic_read(a)(*(volatile int *)&a)
> 
> int a;
> 
> void func(void)
> {
>   a = 0;
>   while (atomic_read(a))
>   ;
> }
> ==
> $ gcc -Os -S tp1.c; cat tp1.s
> 
> func:
>   pushl   %ebp
>   movl%esp, %ebp
>   movl$0, a
> .L2:
>   movla, %eax
>   testl   %eax, %eax
>   jne .L2
>   popl%ebp
>   ret
> ==
> $ cat tp2.c # Using nothing; gcc will optimize the whole loop away
> 
> #define forget(x)
> 
> #define atomic_read(a)\
>   ({  \
>   forget(&(a));   \
>   (a);\
>   })
> 
> int a;
> 
> void func(void)
> {
>   a = 0;
>   while (atomic_read(a))
>   ;
> }
> ==
> $ gcc -Os -S tp2.c; cat tp2.s
> 
> func:
>   pushl   %ebp
>   movl%esp, %ebp
>   popl%ebp
>   movl$0, a
>   ret
> ==
> $ cat tp3.c # Using a full memory clobber barrier
> 
> #define forget(x) asm volatile ("":::"memory")
> 
> #define atomic_read(a)\
>   ({  \
>   forget(&(a));   \
>   (a);\
>   })
> 
> int a;
> 
> void func(void)
> {
>   a = 0;
>   while (atomic_read(a))
>   ;
> }
> ==
> $ gcc -Os -S tp3.c; cat tp3.s
> 
> func:
>   pushl   %ebp
>   movl%esp, %ebp
>   movl$0, a
> .L2:
>   cmpl$0, a
>   jne .L2
>   popl%ebp
>   ret
> ==
> $ cat tp4.c # Using a forget(var) macro
> 
> #define forget(a) __asm__ __volatile__ ("" :"=m" (a) :"m" (a))
> 
> #define atomic_read(a)\
>   ({  \
>   forget(a);  \
>   (a);\
>   })
> 
> int a;
> 
> void func(void)
> {
>   a = 0;
>   while (atomic_read(a))
>   ;
> }
> ==
> $ gcc -Os -S tp4.c; cat tp4.s
> 
> func:
>   pushl   %ebp
>   movl%esp, %ebp
>   movl$0, a
> .L2:
>   cmpl$0, a
>   jne .L2
>   popl%ebp
>   ret
> ==
> 
> Possibly these were too trivial to expose any potential problems that you
> may have been referring to, so would be helpful if you could write a more
> concrete example / sample code.

The trick is to have a sufficiently complicated expression to force
the compiler to run out of registers.  If the value is non-volatile,
it will refetch it (and expect it not to have changed, possibly being
disappointed by an interrupt handler running on that same CPU).

> > > Or possibly, implement these "volatile" atomic ops variants in inline asm
> > > like the patch that Sebastian Siewior has submitted on another thread just
> > > a while back.
> > 
> > Given that you are advocating a change (please keep in mind that
> > atomic_read() and atomic_set() had volatile semantics on almost all
> > platforms), care to give some example where these historical volatile
> > semantics are causing a problem?
> > [...]
> > Can you give even one example
> > where the pre-existing volatile semantics are causing enough of a problem
> > to justify adding ye

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma



On Wed, 15 Aug 2007, Segher Boessenkool wrote:

> > "Volatile behaviour" itself isn't consistently defined (at least
> > definitely not consistently implemented in various gcc versions across
> > platforms),
> 
> It should be consistent across platforms; if not, file a bug please.
> 
> > but it is /expected/ to mean something like: "ensure that
> > every such access actually goes all the way to memory, and is not
> > re-ordered w.r.t. to other accesses, as far as the compiler can take
  ^
  (volatile)

(or, alternatively, "other accesses to the same volatile object" ...)

> > care of these". The last "as far as compiler can take care" disclaimer
> > comes about due to CPUs doing their own re-ordering nowadays.
> 
> You can *expect* whatever you want, but this isn't in line with
> reality at all.
> 
> volatile _does not_ prevent reordering wrt other accesses.
> [...]
> What volatile does are a) never optimise away a read (or write)
> to the object, since the data can change in ways the compiler
> cannot see; and b) never move stores to the object across a
> sequence point.  This does not mean other accesses cannot be
> reordered wrt the volatile access.
> 
> If the abstract machine would do an access to a volatile-
> qualified object, the generated machine code will do that
> access too.  But, for example, it can still be optimised
> away by the compiler, if it can prove it is allowed to.

As (now) indicated above, I had meant multiple volatile accesses to
the same object, obviously.

BTW:

#define atomic_read(a)  (*(volatile int *)&(a))
#define atomic_set(a,i) (*(volatile int *)&(a) = (i))

int a;

void func(void)
{
int b;

b = atomic_read(a);
atomic_set(a, 20);
b = atomic_read(a);
}

gives:

func:
pushl   %ebp
movla, %eax
movl%esp, %ebp
movl$20, a
movla, %eax
popl%ebp
ret

so the first atomic_read() wasn't optimized away.


> volatile _does not_ make accesses go all the way to memory.
> [...]
> If you want stuff to go all the way to memory, you need some
> architecture-specific flush sequence; to make a store globally
> visible before another store, you need mb(); before some following
> read, you need mb(); to prevent reordering you need a barrier.

Sure, which explains the "as far as the compiler can take care" bit.
Poor phrase / choice of words, probably.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma

[ The Cc: list scares me. Should probably trim it. ]

On Wed, 15 Aug 2007, Paul E. McKenney wrote:

> On Wed, Aug 15, 2007 at 08:31:25PM +0200, Segher Boessenkool wrote:
> > >>How does the compiler know that msleep() has got barrier()s?
> > >
> > >Because msleep_interruptible() is in a separate compilation unit,
> > >the compiler has to assume that it might modify any arbitrary global.
> > 
> > No; compilation units have nothing to do with it, GCC can optimise
> > across compilation unit boundaries just fine, if you tell it to
> > compile more than one compilation unit at once.
> 
> Last I checked, the Linux kernel build system did compile each .c file
> as a separate compilation unit.
> 
> > What you probably mean is that the compiler has to assume any code
> > it cannot currently see can do anything (insofar as allowed by the
> > relevant standards etc.)

I think this was just terminology confusion here again. Isn't "any code
that it cannot currently see" the same as "another compilation unit",
and wouldn't the "compilation unit" itself expand if we ask gcc to
compile more than one unit at once? Or is there some more specific
"definition" for "compilation unit" (in gcc lingo, possibly?)
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul E. McKenney

On Thu, Aug 16, 2007 at 01:24:42AM +0530, Satyam Sharma wrote:
> [ The Cc: list scares me. Should probably trim it. ]

Trim away!  ;-)

> On Wed, 15 Aug 2007, Paul E. McKenney wrote:
> 
> > On Wed, Aug 15, 2007 at 08:31:25PM +0200, Segher Boessenkool wrote:
> > > >>How does the compiler know that msleep() has got barrier()s?
> > > >
> > > >Because msleep_interruptible() is in a separate compilation unit,
> > > >the compiler has to assume that it might modify any arbitrary global.
> > > 
> > > No; compilation units have nothing to do with it, GCC can optimise
> > > across compilation unit boundaries just fine, if you tell it to
> > > compile more than one compilation unit at once.
> > 
> > Last I checked, the Linux kernel build system did compile each .c file
> > as a separate compilation unit.
> > 
> > > What you probably mean is that the compiler has to assume any code
> > > it cannot currently see can do anything (insofar as allowed by the
> > > relevant standards etc.)
> 
> I think this was just terminology confusion here again. Isn't "any code
> that it cannot currently see" the same as "another compilation unit",
> and wouldn't the "compilation unit" itself expand if we ask gcc to
> compile more than one unit at once? Or is there some more specific
> "definition" for "compilation unit" (in gcc lingo, possibly?)

This is indeed my understanding -- "compilation unit" is whatever the
compiler looks at in one go.  I have heard the word "module" used for
the minimal compilation unit covering a single .c file and everything
that it #includes, but there might be a better name for this.

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Christoph Lameter

On Wed, 15 Aug 2007, Stefan Richter wrote:

> LDD3 says on page 125:  "The following operations are defined for the
> type [atomic_t] and are guaranteed to be atomic with respect to all
> processors of an SMP computer."
> 
> Doesn't "atomic WRT all processors" require volatility?

Atomic operations only require exclusive access to the cacheline while the 
value is modified.


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool


How does the compiler know that msleep() has got barrier()s?


Because msleep_interruptible() is in a separate compilation unit,
the compiler has to assume that it might modify any arbitrary global.


No; compilation units have nothing to do with it, GCC can optimise
across compilation unit boundaries just fine, if you tell it to
compile more than one compilation unit at once.

What you probably mean is that the compiler has to assume any code
it cannot currently see can do anything (insofar as allowed by the
relevant standards etc.)

In many cases, the compiler also has to assume that 
msleep_interruptible()

might call back into a function in the current compilation unit, thus
possibly modifying global static variables.


It most often is smart enough to see what compilation-unit-local
variables might be modified that way, though :-)


Segher

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

I think this was just terminology confusion here again. Isn't "any 
code

that it cannot currently see" the same as "another compilation unit",
and wouldn't the "compilation unit" itself expand if we ask gcc to
compile more than one unit at once? Or is there some more specific
"definition" for "compilation unit" (in gcc lingo, possibly?)


This is indeed my understanding -- "compilation unit" is whatever the
compiler looks at in one go.  I have heard the word "module" used for
the minimal compilation unit covering a single .c file and everything
that it #includes, but there might be a better name for this.


Yes, that's what's called "compilation unit" :-)

[/me double checks]

Erm, the C standard actually calls it "translation unit".

To be exact, to avoid any more confusion:

5.1.1.1/1:
A C program need not all be translated at the same time. The
text of the program is kept in units called source files, (or
preprocessing files) in this International Standard. A source
file together with all the headers and source files included
via the preprocessing directive #include is known as a
preprocessing translation unit. After preprocessing, a
preprocessing translation unit is called a translation unit.



Segher

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool


What volatile does are a) never optimise away a read (or write)
to the object, since the data can change in ways the compiler
cannot see; and b) never move stores to the object across a
sequence point.  This does not mean other accesses cannot be
reordered wrt the volatile access.

If the abstract machine would do an access to a volatile-
qualified object, the generated machine code will do that
access too.  But, for example, it can still be optimised
away by the compiler, if it can prove it is allowed to.


As (now) indicated above, I had meant multiple volatile accesses to
the same object, obviously.


Yes, accesses to volatile objects are never reordered with
respect to each other.


BTW:

#define atomic_read(a)  (*(volatile int *)&(a))
#define atomic_set(a,i) (*(volatile int *)&(a) = (i))

int a;

void func(void)
{
int b;

b = atomic_read(a);
atomic_set(a, 20);
b = atomic_read(a);
}

gives:

func:
pushl   %ebp
movla, %eax
movl%esp, %ebp
movl$20, a
movla, %eax
popl%ebp
ret

so the first atomic_read() wasn't optimized away.


Of course.  It is executed by the abstract machine, so
it will be executed by the actual machine.  On the other
hand, try

b = 0;
if (b)
b = atomic_read(a);

or similar.


Segher

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool


What you probably mean is that the compiler has to assume any code
it cannot currently see can do anything (insofar as allowed by the
relevant standards etc.)


I think this was just terminology confusion here again. Isn't "any code
that it cannot currently see" the same as "another compilation unit",


It is not; try  gcc -combine  or the upcoming link-time optimisation
stuff, for example.


and wouldn't the "compilation unit" itself expand if we ask gcc to
compile more than one unit at once? Or is there some more specific
"definition" for "compilation unit" (in gcc lingo, possibly?)


"compilation unit" is a C standard term.  It typically boils down
to "single .c file".


Segher

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

Of course, if we find there are more callers in the kernel who want 
the

volatility behaviour than those who don't care, we can re-define the
existing ops to such variants, and re-name the existing definitions 
to

somethine else, say "atomic_read_nonvolatile" for all I care.


Do we really need another set of APIs?


Well, if there's one set of users who do care about volatile behaviour,
and another set that doesn't, it only sounds correct to provide both
those APIs, instead of forcing one behaviour on all users.


But since there currently is only one such API, and there are
users expecting the stronger behaviour, the only sane thing to
do is let the API provide that behaviour.  You can always add
a new API with weaker behaviour later, and move users that are
okay with it over to that new API.


Segher

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool


No; compilation units have nothing to do with it, GCC can optimise
across compilation unit boundaries just fine, if you tell it to
compile more than one compilation unit at once.


Last I checked, the Linux kernel build system did compile each .c file
as a separate compilation unit.


I have some patches to use -combine -fwhole-program for Linux.
Highly experimental, you need a patched bleeding edge toolchain.
If there's interest I'll clean it up and put it online.

David Woodhouse had some similar patches about a year ago.


In many cases, the compiler also has to assume that
msleep_interruptible()
might call back into a function in the current compilation unit, thus
possibly modifying global static variables.


It most often is smart enough to see what compilation-unit-local
variables might be modified that way, though :-)


Yep.  For example, if it knows the current value of a given such local
variable, and if all code paths that would change some other variable
cannot be reached given that current value of the first variable.


Or the most common thing: if neither the address of the translation-
unit local variable nor the address of any function writing to that
variable can "escape" from that translation unit, nothing outside
the translation unit can write to the variable.

At least given that gcc doesn't know about multiple threads of 
execution!


Heh, only about the threads it creates itself (not relevant to
the kernel, for sure :-) )


Segher

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

Possibly these were too trivial to expose any potential problems that 
you
may have been referring to, so would be helpful if you could write a 
more

concrete example / sample code.


The trick is to have a sufficiently complicated expression to force
the compiler to run out of registers.


You can use -ffixed-XXX to keep the testcase simple.


Segher

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul E. McKenney

On Wed, Aug 15, 2007 at 10:52:53PM +0200, Segher Boessenkool wrote:
> >>I think this was just terminology confusion here again. Isn't "any 
> >>code
> >>that it cannot currently see" the same as "another compilation unit",
> >>and wouldn't the "compilation unit" itself expand if we ask gcc to
> >>compile more than one unit at once? Or is there some more specific
> >>"definition" for "compilation unit" (in gcc lingo, possibly?)
> >
> >This is indeed my understanding -- "compilation unit" is whatever the
> >compiler looks at in one go.  I have heard the word "module" used for
> >the minimal compilation unit covering a single .c file and everything
> >that it #includes, but there might be a better name for this.
> 
> Yes, that's what's called "compilation unit" :-)
> 
> [/me double checks]
> 
> Erm, the C standard actually calls it "translation unit".
> 
> To be exact, to avoid any more confusion:
> 
> 5.1.1.1/1:
> A C program need not all be translated at the same time. The
> text of the program is kept in units called source files, (or
> preprocessing files) in this International Standard. A source
> file together with all the headers and source files included
> via the preprocessing directive #include is known as a
> preprocessing translation unit. After preprocessing, a
> preprocessing translation unit is called a translation unit.

I am OK with "translation" and "compilation" being near-synonyms.  ;-)

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul E. McKenney

On Wed, Aug 15, 2007 at 11:05:35PM +0200, Segher Boessenkool wrote:
> >>No; compilation units have nothing to do with it, GCC can optimise
> >>across compilation unit boundaries just fine, if you tell it to
> >>compile more than one compilation unit at once.
> >
> >Last I checked, the Linux kernel build system did compile each .c file
> >as a separate compilation unit.
> 
> I have some patches to use -combine -fwhole-program for Linux.
> Highly experimental, you need a patched bleeding edge toolchain.
> If there's interest I'll clean it up and put it online.
> 
> David Woodhouse had some similar patches about a year ago.

Sounds exciting...  ;-)

> >>>In many cases, the compiler also has to assume that
> >>>msleep_interruptible()
> >>>might call back into a function in the current compilation unit, thus
> >>>possibly modifying global static variables.
> >>
> >>It most often is smart enough to see what compilation-unit-local
> >>variables might be modified that way, though :-)
> >
> >Yep.  For example, if it knows the current value of a given such local
> >variable, and if all code paths that would change some other variable
> >cannot be reached given that current value of the first variable.
> 
> Or the most common thing: if neither the address of the translation-
> unit local variable nor the address of any function writing to that
> variable can "escape" from that translation unit, nothing outside
> the translation unit can write to the variable.

But there is usually at least one externally callable function in
a .c file.

> >At least given that gcc doesn't know about multiple threads of 
> >execution!
> 
> Heh, only about the threads it creates itself (not relevant to
> the kernel, for sure :-) )

;-)

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul Mackerras

Satyam Sharma writes:

> > Doesn't "atomic WRT all processors" require volatility?
> 
> No, it definitely doesn't. Why should it?
> 
> "Atomic w.r.t. all processors" is just your normal, simple "atomicity"
> for SMP systems (ensure that that object is modified / set / replaced
> in main memory atomically) and has nothing to do with "volatile"
> behaviour.

Atomic variables are "volatile" in the sense that they are liable to
be changed at any time by mechanisms that are outside the knowledge of
the C compiler, namely, other CPUs, or this CPU executing an interrupt
routine.

In the kernel we use atomic variables in precisely those situations
where a variable is potentially accessed concurrently by multiple
CPUs, and where each CPU needs to see updates done by other CPUs in a
timely fashion.  That is what they are for.  Therefore the compiler
must not cache values of atomic variables in registers; each
atomic_read must result in a load and each atomic_set must result in a
store.  Anything else will just lead to subtle bugs.

I have no strong opinion about whether or not the best way to achieve
this is through the use of the "volatile" C keyword.  Segher's idea of
using asm instead seems like a good one to me.

Paul.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Herbert Xu

On Wed, Aug 15, 2007 at 11:45:20AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 15, 2007 at 07:19:57PM +0100, David Howells wrote:
> > Herbert Xu <[EMAIL PROTECTED]> wrote:
> > 
> > > Let's turn this around.  Can you give a single example where
> > > the volatile semantics is needed in a legitimate way?
> > 
> > Accessing H/W registers?  But apart from that...
> 
> Communicating between process context and interrupt/NMI handlers using
> per-CPU variables.

Remeber we're talking about atomic_read/atomic_set.  Please
cite the actual file/function name you have in mind.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul E. McKenney

On Thu, Aug 16, 2007 at 07:40:21AM +0800, Herbert Xu wrote:
> On Wed, Aug 15, 2007 at 12:13:12PM -0400, Chris Snook wrote:
> >
> > Part of the motivation here is to fix heisenbugs.  If I knew where they 
> 
> By the same token we should probably disable optimisations
> altogether since that too can create heisenbugs.

Precisely the point -- use of volatile (whether in casts or on asms)
in these cases are intended to disable those optimizations likely to
result in heisenbugs.  But they are also intended to leave other
valuable optimizations in force.

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul E. McKenney

On Thu, Aug 16, 2007 at 07:41:46AM +0800, Herbert Xu wrote:
> On Wed, Aug 15, 2007 at 11:45:20AM -0700, Paul E. McKenney wrote:
> > On Wed, Aug 15, 2007 at 07:19:57PM +0100, David Howells wrote:
> > > Herbert Xu <[EMAIL PROTECTED]> wrote:
> > > 
> > > > Let's turn this around.  Can you give a single example where
> > > > the volatile semantics is needed in a legitimate way?
> > > 
> > > Accessing H/W registers?  But apart from that...
> > 
> > Communicating between process context and interrupt/NMI handlers using
> > per-CPU variables.
> 
> Remeber we're talking about atomic_read/atomic_set.  Please
> cite the actual file/function name you have in mind.

Yep, we are indeed talking about atomic_read()/atomic_set().

We have been through this issue already in this thread.

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Herbert Xu

On Wed, Aug 15, 2007 at 12:13:12PM -0400, Chris Snook wrote:
>
> Part of the motivation here is to fix heisenbugs.  If I knew where they 

By the same token we should probably disable optimisations
altogether since that too can create heisenbugs.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Herbert Xu

On Wed, Aug 15, 2007 at 04:53:35PM -0700, Paul E. McKenney wrote:
>
> > > Communicating between process context and interrupt/NMI handlers using
> > > per-CPU variables.
> > 
> > Remeber we're talking about atomic_read/atomic_set.  Please
> > cite the actual file/function name you have in mind.
> 
> Yep, we are indeed talking about atomic_read()/atomic_set().
> 
> We have been through this issue already in this thread.

Sorry, but I must've missed it.  Could you cite the file or
function for my benefit?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul E. McKenney

On Thu, Aug 16, 2007 at 08:12:48AM +0800, Herbert Xu wrote:
> On Wed, Aug 15, 2007 at 04:53:35PM -0700, Paul E. McKenney wrote:
> >
> > > > Communicating between process context and interrupt/NMI handlers using
> > > > per-CPU variables.
> > > 
> > > Remeber we're talking about atomic_read/atomic_set.  Please
> > > cite the actual file/function name you have in mind.
> > 
> > Yep, we are indeed talking about atomic_read()/atomic_set().
> > 
> > We have been through this issue already in this thread.
> 
> Sorry, but I must've missed it.  Could you cite the file or
> function for my benefit?

I might summarize the thread if there is interest, but I am not able to
do so right this minute.

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Christoph Lameter

On Thu, 16 Aug 2007, Paul Mackerras wrote:

> In the kernel we use atomic variables in precisely those situations
> where a variable is potentially accessed concurrently by multiple
> CPUs, and where each CPU needs to see updates done by other CPUs in a
> timely fashion.  That is what they are for.  Therefore the compiler
> must not cache values of atomic variables in registers; each
> atomic_read must result in a load and each atomic_set must result in a
> store.  Anything else will just lead to subtle bugs.

This may have been the intend. However, today the visibility is controlled 
using barriers. And we have barriers that we use with atomic operations. 
Having volatile be the default just lead to confusion. Atomic read should 
just read with no extras. Extras can be added by using variants like 
atomic_read_volatile or so.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Herbert Xu

On Wed, Aug 15, 2007 at 05:23:10PM -0700, Paul E. McKenney wrote:
> On Thu, Aug 16, 2007 at 08:12:48AM +0800, Herbert Xu wrote:
> > On Wed, Aug 15, 2007 at 04:53:35PM -0700, Paul E. McKenney wrote:
> > >
> > > > > Communicating between process context and interrupt/NMI handlers using
> > > > > per-CPU variables.
> > > > 
> > > > Remeber we're talking about atomic_read/atomic_set.  Please
> > > > cite the actual file/function name you have in mind.
> > > 
> > > Yep, we are indeed talking about atomic_read()/atomic_set().
> > > 
> > > We have been through this issue already in this thread.
> > 
> > Sorry, but I must've missed it.  Could you cite the file or
> > function for my benefit?
> 
> I might summarize the thread if there is interest, but I am not able to
> do so right this minute.

Thanks.  But I don't need a summary of the thread, I'm asking
for an extant code snippet in our kernel that benefits from
the volatile change and is not part of a busy-wait.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul Mackerras

Christoph Lameter writes:

> On Thu, 16 Aug 2007, Paul Mackerras wrote:
> 
> > In the kernel we use atomic variables in precisely those situations
> > where a variable is potentially accessed concurrently by multiple
> > CPUs, and where each CPU needs to see updates done by other CPUs in a
> > timely fashion.  That is what they are for.  Therefore the compiler
> > must not cache values of atomic variables in registers; each
> > atomic_read must result in a load and each atomic_set must result in a
> > store.  Anything else will just lead to subtle bugs.
> 
> This may have been the intend. However, today the visibility is controlled 
> using barriers. And we have barriers that we use with atomic operations. 

Those barriers are for when we need ordering between atomic variables
and other memory locations.  An atomic variable by itself doesn't and
shouldn't need any barriers for other CPUs to be able to see what's
happening to it.

Paul.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Christoph Lameter

On Thu, 16 Aug 2007, Paul Mackerras wrote:

> Those barriers are for when we need ordering between atomic variables
> and other memory locations.  An atomic variable by itself doesn't and
> shouldn't need any barriers for other CPUs to be able to see what's
> happening to it.

It does not need any barriers. As soon as one cpu acquires the 
cacheline for write it will be invalidated in the caches of the others. So 
the other cpu will have to refetch. No need for volatile.

The issue here may be that the compiler has fetched the atomic variable 
earlier and put it into a register. However, that prefetching is limited 
because it cannot cross functions calls etc. The only problem could be 
loops where the compiler does not refetch the variable since it assumes 
that it does not change and there are no function calls in the body of the 
loop. But AFAIK these loops need cpu_relax and other measures anyways to 
avoid bad effects from busy waiting.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Christoph Lameter

On Wed, 15 Aug 2007, Paul E. McKenney wrote:

> Seems to me that we face greater chance of confusion without the
> volatile than with, particularly as compiler optimizations become
> more aggressive.  Yes, we could simply disable optimization, but
> optimization can be quite helpful.

A volatile default would disable optimizations for atomic_read. 
atomic_read without volatile would allow for full optimization by the 
compiler. Seems that this is what one wants in many cases.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma

[ Sorry for empty subject line in previous mail. I intended to make
  a patch so cleared it to change it, but ultimately neither made
  a patch nor restored subject line. Done that now. ]

On Thu, 16 Aug 2007, Herbert Xu wrote:

> On Thu, Aug 16, 2007 at 06:06:00AM +0530, Satyam Sharma wrote:
> > 
> > that are:
> > 
> > while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
> > mdelay(1);
> > msecs--;
> > }
> > 
> > where mdelay() becomes __const_udelay() which happens to be in another
> > translation unit (arch/i386/lib/delay.c) and hence saves this callsite
> > from being a bug :-)
> 
> The udelay itself certainly should have some form of cpu_relax in it.

Yes, a form of barrier() must be present in mdelay() or udelay() itself
as you say, having it in __const_udelay() is *not* enough (superflous
actually, considering it is already a separate translation unit and
invisible to the compiler).

However, there are no compiler barriers on the macro-definition-path
between mdelay(1) and __const_udelay(), so the only thing that saves us
from being a bug here is indeed the different-translation-unit concept.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul E. McKenney

On Thu, Aug 16, 2007 at 08:30:23AM +0800, Herbert Xu wrote:
> On Wed, Aug 15, 2007 at 05:23:10PM -0700, Paul E. McKenney wrote:
> > On Thu, Aug 16, 2007 at 08:12:48AM +0800, Herbert Xu wrote:
> > > On Wed, Aug 15, 2007 at 04:53:35PM -0700, Paul E. McKenney wrote:
> > > >
> > > > > > Communicating between process context and interrupt/NMI handlers 
> > > > > > using
> > > > > > per-CPU variables.
> > > > > 
> > > > > Remeber we're talking about atomic_read/atomic_set.  Please
> > > > > cite the actual file/function name you have in mind.
> > > > 
> > > > Yep, we are indeed talking about atomic_read()/atomic_set().
> > > > 
> > > > We have been through this issue already in this thread.
> > > 
> > > Sorry, but I must've missed it.  Could you cite the file or
> > > function for my benefit?
> > 
> > I might summarize the thread if there is interest, but I am not able to
> > do so right this minute.
> 
> Thanks.  But I don't need a summary of the thread, I'm asking
> for an extant code snippet in our kernel that benefits from
> the volatile change and is not part of a busy-wait.

Sorry, can't help you there.  I really do believe that the information
you need (as opposed to the specific item you are asking for) really
has been put forth in this thread.

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Herbert Xu

On Thu, Aug 16, 2007 at 06:28:42AM +0530, Satyam Sharma wrote:
>
> > The udelay itself certainly should have some form of cpu_relax in it.
> 
> Yes, a form of barrier() must be present in mdelay() or udelay() itself
> as you say, having it in __const_udelay() is *not* enough (superflous
> actually, considering it is already a separate translation unit and
> invisible to the compiler).

As long as __const_udelay does something which has the same
effect as barrier it is enough even if it's in the same unit.
As a matter of fact it does on i386 where __delay either uses
rep_nop or asm/volatile.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Herbert Xu

On Wed, Aug 15, 2007 at 05:49:50PM -0700, Paul E. McKenney wrote:
> On Thu, Aug 16, 2007 at 08:30:23AM +0800, Herbert Xu wrote:
>
> > Thanks.  But I don't need a summary of the thread, I'm asking
> > for an extant code snippet in our kernel that benefits from
> > the volatile change and is not part of a busy-wait.
> 
> Sorry, can't help you there.  I really do believe that the information
> you need (as opposed to the specific item you are asking for) really
> has been put forth in this thread.

That only leads me to believe that such a code snippet simply
does not exist.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul E. McKenney

On Wed, Aug 15, 2007 at 05:42:07PM -0700, Christoph Lameter wrote:
> On Wed, 15 Aug 2007, Paul E. McKenney wrote:
> 
> > Seems to me that we face greater chance of confusion without the
> > volatile than with, particularly as compiler optimizations become
> > more aggressive.  Yes, we could simply disable optimization, but
> > optimization can be quite helpful.
> 
> A volatile default would disable optimizations for atomic_read. 
> atomic_read without volatile would allow for full optimization by the 
> compiler. Seems that this is what one wants in many cases.

The volatile cast should not disable all that many optimizations,
for example, it is much less hurtful than barrier().  Furthermore,
the main optimizations disabled (pulling atomic_read() and atomic_set()
out of loops) really do need to be disabled.

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul E. McKenney

On Wed, Aug 15, 2007 at 05:26:34PM -0700, Christoph Lameter wrote:
> On Thu, 16 Aug 2007, Paul Mackerras wrote:
> 
> > In the kernel we use atomic variables in precisely those situations
> > where a variable is potentially accessed concurrently by multiple
> > CPUs, and where each CPU needs to see updates done by other CPUs in a
> > timely fashion.  That is what they are for.  Therefore the compiler
> > must not cache values of atomic variables in registers; each
> > atomic_read must result in a load and each atomic_set must result in a
> > store.  Anything else will just lead to subtle bugs.
> 
> This may have been the intend. However, today the visibility is controlled 
> using barriers. And we have barriers that we use with atomic operations. 
> Having volatile be the default just lead to confusion. Atomic read should 
> just read with no extras. Extras can be added by using variants like 
> atomic_read_volatile or so.

Seems to me that we face greater chance of confusion without the
volatile than with, particularly as compiler optimizations become
more aggressive.  Yes, we could simply disable optimization, but
optimization can be quite helpful.

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma

On Wed, 15 Aug 2007, Segher Boessenkool wrote:

> [...]
> > BTW:
> > 
> > #define atomic_read(a)  (*(volatile int *)&(a))
> > #define atomic_set(a,i) (*(volatile int *)&(a) = (i))
> > 
> > int a;
> > 
> > void func(void)
> > {
> > int b;
> > 
> > b = atomic_read(a);
> > atomic_set(a, 20);
> > b = atomic_read(a);
> > }
> > 
> > gives:
> > 
> > func:
> > pushl   %ebp
> > movla, %eax
> > movl%esp, %ebp
> > movl$20, a
> > movla, %eax
> > popl%ebp
> > ret
> > 
> > so the first atomic_read() wasn't optimized away.
> 
> Of course.  It is executed by the abstract machine, so
> it will be executed by the actual machine.  On the other
> hand, try
> 
>   b = 0;
>   if (b)
>   b = atomic_read(a);
> 
> or similar.

Yup, obviously. Volatile accesses (or any access to volatile objects),
or even "__volatile__ asms" (which gcc normally promises never to elid)
can always be optimized for cases such as these where the compiler can
trivially determine that the code in question is not reachable.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool


No; compilation units have nothing to do with it, GCC can optimise
across compilation unit boundaries just fine, if you tell it to
compile more than one compilation unit at once.


Last I checked, the Linux kernel build system did compile each .c 
file

as a separate compilation unit.


I have some patches to use -combine -fwhole-program for Linux.
Highly experimental, you need a patched bleeding edge toolchain.
If there's interest I'll clean it up and put it online.

David Woodhouse had some similar patches about a year ago.


Sounds exciting...  ;-)


Yeah, the breakage is *quite* spectacular :-)


In many cases, the compiler also has to assume that
msleep_interruptible()
might call back into a function in the current compilation unit, 
thus

possibly modifying global static variables.


It most often is smart enough to see what compilation-unit-local
variables might be modified that way, though :-)


Yep.  For example, if it knows the current value of a given such 
local

variable, and if all code paths that would change some other variable
cannot be reached given that current value of the first variable.


Or the most common thing: if neither the address of the translation-
unit local variable nor the address of any function writing to that
variable can "escape" from that translation unit, nothing outside
the translation unit can write to the variable.


But there is usually at least one externally callable function in
a .c file.


Of course, but often none of those will (indirectly) write a certain
static variable.


Segher

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul E. McKenney

On Thu, Aug 16, 2007 at 08:53:16AM +0800, Herbert Xu wrote:
> On Wed, Aug 15, 2007 at 05:49:50PM -0700, Paul E. McKenney wrote:
> > On Thu, Aug 16, 2007 at 08:30:23AM +0800, Herbert Xu wrote:
> >
> > > Thanks.  But I don't need a summary of the thread, I'm asking
> > > for an extant code snippet in our kernel that benefits from
> > > the volatile change and is not part of a busy-wait.
> > 
> > Sorry, can't help you there.  I really do believe that the information
> > you need (as opposed to the specific item you are asking for) really
> > has been put forth in this thread.
> 
> That only leads me to believe that such a code snippet simply
> does not exist.

Whatever...

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma

Hi Herbert,


On Thu, 16 Aug 2007, Herbert Xu wrote:

> On Thu, Aug 16, 2007 at 06:28:42AM +0530, Satyam Sharma wrote:
> >
> > > The udelay itself certainly should have some form of cpu_relax in it.
> > 
> > Yes, a form of barrier() must be present in mdelay() or udelay() itself
> > as you say, having it in __const_udelay() is *not* enough (superflous
> > actually, considering it is already a separate translation unit and
> > invisible to the compiler).
> 
> As long as __const_udelay does something which has the same
> effect as barrier it is enough even if it's in the same unit.

Only if __const_udelay() is inlined. But as I said, __const_udelay()
-- although marked "inline" -- will never be inlined anywhere in the
kernel in reality. It's an exported symbol, and never inlined from
modules. Even from built-in targets, the definition of __const_udelay
is invisible when gcc is compiling the compilation units of those
callsites. The compiler has no idea that that function has barriers
or not, so we're saved here _only_ by the lucky fact that
__const_udelay() is in a different compilation unit.


> As a matter of fact it does on i386 where __delay either uses
> rep_nop or asm/volatile.

__delay() can be either delay_tsc() or delay_loop() on i386.

delay_tsc() uses the rep_nop() there for it's own little busy
loop, actually. But for a call site that inlines __const_udelay()
-- if it were ever moved to a .h file and marked inline -- the
call to __delay() will _still_ be across compilation units. So,
again for this case, it does not matter if the callee function
has compiler barriers or not (it would've been a different story
if we were discussing real/CPU barriers, I think), what saves us
here is just the fact that a call is made to a function from a
different compilation unit, which is invisible to the compiler
when compiling the callsite, and hence acting as the compiler
barrier.

Regarding delay_loop(), it uses "volatile" for the "asm" which
has quite different semantics from the C language "volatile"
type-qualifier keyword and does not imply any compiler barrier
at all.


Satyam
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Christoph Lameter

On Wed, 15 Aug 2007, Paul E. McKenney wrote:

> The volatile cast should not disable all that many optimizations,
> for example, it is much less hurtful than barrier().  Furthermore,
> the main optimizations disabled (pulling atomic_read() and atomic_set()
> out of loops) really do need to be disabled.

In many cases you do not need a barrier. Having volatile there *will* 
impact optimization because the compiler cannot use a register that may 
contain the value that was fetched earlier. And the compiler cannot choose 
freely when to fetch the value. The order of memory accesses are fixed if 
you use volatile. If the variable is not volatile then the compiler can 
arrange memory accesses any way they fit and thus generate better code.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul E. McKenney

On Wed, Aug 15, 2007 at 05:59:41PM -0700, Christoph Lameter wrote:
> On Wed, 15 Aug 2007, Paul E. McKenney wrote:
> 
> > The volatile cast should not disable all that many optimizations,
> > for example, it is much less hurtful than barrier().  Furthermore,
> > the main optimizations disabled (pulling atomic_read() and atomic_set()
> > out of loops) really do need to be disabled.
> 
> In many cases you do not need a barrier. Having volatile there *will* 
> impact optimization because the compiler cannot use a register that may 
> contain the value that was fetched earlier. And the compiler cannot choose 
> freely when to fetch the value. The order of memory accesses are fixed if 
> you use volatile. If the variable is not volatile then the compiler can 
> arrange memory accesses any way they fit and thus generate better code.

Understood.  My point is not that the impact is precisely zero, but
rather that the impact on optimization is much less hurtful than the
problems that could arise otherwise, particularly as compilers become
more aggressive in their optimizations.

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul Mackerras

Christoph Lameter writes:

> A volatile default would disable optimizations for atomic_read. 
> atomic_read without volatile would allow for full optimization by the 
> compiler. Seems that this is what one wants in many cases.

Name one such case.

An atomic_read should do a load from memory.  If the programmer puts
an atomic_read() in the code then the compiler should emit a load for
it, not re-use a value returned by a previous atomic_read.  I do not
believe it would ever be useful for the compiler to collapse two
atomic_read statements into a single load.

Paul.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Christoph Lameter

On Wed, 15 Aug 2007, Paul E. McKenney wrote:

> Understood.  My point is not that the impact is precisely zero, but
> rather that the impact on optimization is much less hurtful than the
> problems that could arise otherwise, particularly as compilers become
> more aggressive in their optimizations.

The problems arise because barriers are not used as required. Volatile 
has wishy washy semantics and somehow marries memory barriers with data 
access. It is clearer to separate the two. Conceptual cleanness usually 
translates into better code. If one really wants the volatile then lets 
make it explicit and use

atomic_read_volatile()
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

Part of the motivation here is to fix heisenbugs.  If I knew where 
they


By the same token we should probably disable optimisations
altogether since that too can create heisenbugs.


Almost everything is a tradeoff; and so is this.  I don't
believe most people would find disabling all compiler
optimisations an acceptable price to pay for some peace
of mind.


Segher

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool

Part of the motivation here is to fix heisenbugs.  If I knew where 
they


By the same token we should probably disable optimisations
altogether since that too can create heisenbugs.


Precisely the point -- use of volatile (whether in casts or on asms)
in these cases are intended to disable those optimizations likely to
result in heisenbugs.


The only thing volatile on an asm does is create a side effect
on the asm statement; in effect, it tells the compiler "do not
remove this asm even if you don't need any of its outputs".

It's not disabling optimisation likely to result in bugs,
heisen- or otherwise; _not_ putting the volatile on an asm
that needs it simply _is_ a bug :-)


Segher

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Herbert Xu

On Thu, Aug 16, 2007 at 11:51:42AM +1000, Paul Mackerras wrote:
> 
> Name one such case.

See sk_stream_mem_schedule in net/core/stream.c:

/* Under limit. */
if (atomic_read(sk->sk_prot->memory_allocated) < 
sk->sk_prot->sysctl_mem[0]) {
if (*sk->sk_prot->memory_pressure)
*sk->sk_prot->memory_pressure = 0;
return 1;
}

/* Over hard limit. */
if (atomic_read(sk->sk_prot->memory_allocated) > 
sk->sk_prot->sysctl_mem[2]) {
sk->sk_prot->enter_memory_pressure();
goto suppress_allocation;
}

We don't need to reload sk->sk_prot->memory_allocated here.

Now where is your example again?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma

On Wed, 15 Aug 2007, Christoph Lameter wrote:

> On Wed, 15 Aug 2007, Paul E. McKenney wrote:
> 
> > Understood.  My point is not that the impact is precisely zero, but
> > rather that the impact on optimization is much less hurtful than the
> > problems that could arise otherwise, particularly as compilers become
> > more aggressive in their optimizations.
> 
> The problems arise because barriers are not used as required. Volatile 
> has wishy washy semantics and somehow marries memory barriers with data 
> access. It is clearer to separate the two. Conceptual cleanness usually 
> translates into better code. If one really wants the volatile then lets 
> make it explicit and use
> 
>   atomic_read_volatile()

Completely agreed, again. To summarize again (had done so about ~100 mails
earlier in this thread too :-) ...

atomic_{read,set}_volatile() -- guarantees volatility also along with
atomicity (the two _are_ different concepts after all, irrespective of
whether callsites normally want one with the other or not)

atomic_{read,set}_nonvolatile() -- only guarantees atomicity, compiler
free to elid / coalesce / optimize such accesses, can keep the object
in question cached in a local register, leads to smaller text, etc.

As to which one should be the default atomic_read() is a question of
whether majority of callsites (more weightage to important / hot
codepaths, lesser to obscure callsites) want a particular behaviour.

Do we have a consensus here? (hoping against hope, probably :-)

[ This thread has gotten completely out of hand ... for my mail client
  alpine as well, it now seems. Reminds of that 1000+ GPLv3 fest :-) ]
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul Mackerras

Herbert Xu writes:

> See sk_stream_mem_schedule in net/core/stream.c:
> 
> /* Under limit. */
> if (atomic_read(sk->sk_prot->memory_allocated) < 
> sk->sk_prot->sysctl_mem[0]) {
> if (*sk->sk_prot->memory_pressure)
> *sk->sk_prot->memory_pressure = 0;
> return 1;
> }
> 
> /* Over hard limit. */
> if (atomic_read(sk->sk_prot->memory_allocated) > 
> sk->sk_prot->sysctl_mem[2]) {
> sk->sk_prot->enter_memory_pressure();
> goto suppress_allocation;
> }
> 
> We don't need to reload sk->sk_prot->memory_allocated here.

Are you sure?  How do you know some other CPU hasn't changed the value
in between?

Paul.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Segher Boessenkool


A volatile default would disable optimizations for atomic_read.
atomic_read without volatile would allow for full optimization by the
compiler. Seems that this is what one wants in many cases.


Name one such case.

An atomic_read should do a load from memory.  If the programmer puts
an atomic_read() in the code then the compiler should emit a load for
it, not re-use a value returned by a previous atomic_read.  I do not
believe it would ever be useful for the compiler to collapse two
atomic_read statements into a single load.


An atomic_read() implemented as a "normal" C variable read would
allow that read to be combined with another "normal" read from
that variable.  This could perhaps be marginally useful, although
I'd bet you cannot see it unless counting cycles on a simulator
or counting bits in the binary size.

With an asm() implementation, the compiler can not do this; with
a "volatile" implementation (either volatile variable or volatile-cast),
this invokes undefined behaviour (in both C and GCC).


Segher

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Herbert Xu

On Thu, Aug 16, 2007 at 07:45:44AM +0530, Satyam Sharma wrote:
>
> Completely agreed, again. To summarize again (had done so about ~100 mails
> earlier in this thread too :-) ...
> 
> atomic_{read,set}_volatile() -- guarantees volatility also along with
> atomicity (the two _are_ different concepts after all, irrespective of
> whether callsites normally want one with the other or not)
> 
> atomic_{read,set}_nonvolatile() -- only guarantees atomicity, compiler
> free to elid / coalesce / optimize such accesses, can keep the object
> in question cached in a local register, leads to smaller text, etc.
> 
> As to which one should be the default atomic_read() is a question of
> whether majority of callsites (more weightage to important / hot
> codepaths, lesser to obscure callsites) want a particular behaviour.
> 
> Do we have a consensus here? (hoping against hope, probably :-)

I can certainly agree with this.

But I have to say that I still don't know of a single place
where one would actually use the volatile variant.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Herbert Xu

On Thu, Aug 16, 2007 at 12:05:56PM +1000, Paul Mackerras wrote:
> Herbert Xu writes:
> 
> > See sk_stream_mem_schedule in net/core/stream.c:
> > 
> > /* Under limit. */
> > if (atomic_read(sk->sk_prot->memory_allocated) < 
> > sk->sk_prot->sysctl_mem[0]) {
> > if (*sk->sk_prot->memory_pressure)
> > *sk->sk_prot->memory_pressure = 0;
> > return 1;
> > }
> > 
> > /* Over hard limit. */
> > if (atomic_read(sk->sk_prot->memory_allocated) > 
> > sk->sk_prot->sysctl_mem[2]) {
> > sk->sk_prot->enter_memory_pressure();
> > goto suppress_allocation;
> > }
> > 
> > We don't need to reload sk->sk_prot->memory_allocated here.
> 
> Are you sure?  How do you know some other CPU hasn't changed the value
> in between?

Yes I'm sure, because we don't care if others have increased
the reservation.

Note that even if we did we'd be using barriers so volatile
won't do us any good here.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Christoph Lameter

On Thu, 16 Aug 2007, Paul Mackerras wrote:

> > We don't need to reload sk->sk_prot->memory_allocated here.
> 
> Are you sure?  How do you know some other CPU hasn't changed the value
> in between?

The cpu knows because the cacheline was not invalidated.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Chris Friesen


Herbert Xu wrote:


But I have to say that I still don't know of a single place
where one would actually use the volatile variant.


Given that many of the existing users do currently have "volatile", are 
you comfortable simply removing that behaviour from them?  Are you sure 
that you will not introduce any issues?


Forcing a re-read is only a performance penalty.  Removing it can cause 
behavioural changes.


I would be more comfortable making the default match the majority of the 
current implementations (ie: volatile semantics).  Then, if someone 
cares about performance they can explicitly validate the call path and 
convert it over to the non-volatile version.


Correctness before speed...

Chris
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma

On Thu, 16 Aug 2007, Paul Mackerras wrote:

> Herbert Xu writes:
> 
> > See sk_stream_mem_schedule in net/core/stream.c:
> > 
> > /* Under limit. */
> > if (atomic_read(sk->sk_prot->memory_allocated) < 
> > sk->sk_prot->sysctl_mem[0]) {
> > if (*sk->sk_prot->memory_pressure)
> > *sk->sk_prot->memory_pressure = 0;
> > return 1;
> > }
> > 
> > /* Over hard limit. */
> > if (atomic_read(sk->sk_prot->memory_allocated) > 
> > sk->sk_prot->sysctl_mem[2]) {
> > sk->sk_prot->enter_memory_pressure();
> > goto suppress_allocation;
> > }
> > 
> > We don't need to reload sk->sk_prot->memory_allocated here.
> 
> Are you sure?  How do you know some other CPU hasn't changed the value
> in between?

I can't speak for this particular case, but there could be similar code
examples elsewhere, where we do the atomic ops on an atomic_t object
inside a higher-level locking scheme that would take care of the kind of
problem you're referring to here. It would be useful for such or similar
code if the compiler kept the value of that atomic object in a register.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul E. McKenney

On Thu, Aug 16, 2007 at 03:23:28AM +0200, Segher Boessenkool wrote:
> No; compilation units have nothing to do with it, GCC can optimise
> across compilation unit boundaries just fine, if you tell it to
> compile more than one compilation unit at once.
> >>>
> >>>Last I checked, the Linux kernel build system did compile each .c 
> >>>file
> >>>as a separate compilation unit.
> >>
> >>I have some patches to use -combine -fwhole-program for Linux.
> >>Highly experimental, you need a patched bleeding edge toolchain.
> >>If there's interest I'll clean it up and put it online.
> >>
> >>David Woodhouse had some similar patches about a year ago.
> >
> >Sounds exciting...  ;-)
> 
> Yeah, the breakage is *quite* spectacular :-)

I bet!!!  ;-)

> >In many cases, the compiler also has to assume that
> >msleep_interruptible()
> >might call back into a function in the current compilation unit, 
> >thus
> >possibly modifying global static variables.
> 
> It most often is smart enough to see what compilation-unit-local
> variables might be modified that way, though :-)
> >>>
> >>>Yep.  For example, if it knows the current value of a given such 
> >>>local
> >>>variable, and if all code paths that would change some other variable
> >>>cannot be reached given that current value of the first variable.
> >>
> >>Or the most common thing: if neither the address of the translation-
> >>unit local variable nor the address of any function writing to that
> >>variable can "escape" from that translation unit, nothing outside
> >>the translation unit can write to the variable.
> >
> >But there is usually at least one externally callable function in
> >a .c file.
> 
> Of course, but often none of those will (indirectly) write a certain
> static variable.

But there has to be some path to the static functions, assuming that
they are not dead code.  Yes, there can be cases where the compiler
knows enough about the state of the variables to rule out some of code
paths to them, but I have no idea how often this happens in kernel
code.

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Nick Piggin


Segher Boessenkool wrote:

Part of the motivation here is to fix heisenbugs.  If I knew where they



By the same token we should probably disable optimisations
altogether since that too can create heisenbugs.



Almost everything is a tradeoff; and so is this.  I don't
believe most people would find disabling all compiler
optimisations an acceptable price to pay for some peace
of mind.


So why is this a good tradeoff?

I also think that just adding things to APIs in the hope it might fix
up some bugs isn't really a good road to go down. Where do you stop?

On the actual proposal to make atomic_operators volatile: I think the
better approach in the long term, for both maintainability of the
code and education of coders, is to make the use of barriers _more_
explicit rather than sprinkling these "just in case" ones around.

You may get rid of a few atomic_read heisenbugs (in noise when
compared to all bugs), but if the coder was using a regular atomic
load, or a test_bit (which is also atomic), etc. then they're going
to have problems.

It would be better for Linux if everyone was to have better awareness
of barriers than to hide some of the cases where they're required.
A pretty large number of bugs I see in lock free code in the VM is
due to memory ordering problems. It's hard to find those bugs, or
even be aware when you're writing buggy code if you don't have some
feel for barriers.

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Christoph Lameter

On Thu, 16 Aug 2007, Herbert Xu wrote:

> > Do we have a consensus here? (hoping against hope, probably :-)
> 
> I can certainly agree with this.

I agree too.

> But I have to say that I still don't know of a single place
> where one would actually use the volatile variant.

I suspect that what you say is true after we have looked at all callers.


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Christoph Lameter

On Wed, 15 Aug 2007, Christoph Lameter wrote:

> On Thu, 16 Aug 2007, Paul Mackerras wrote:
> 
> > > We don't need to reload sk->sk_prot->memory_allocated here.
> > 
> > Are you sure?  How do you know some other CPU hasn't changed the value
> > in between?
> 
> The cpu knows because the cacheline was not invalidated.

Crap my statement above is wrong. We do not care that the 
value was changed otherwise we would have put a barrier in there.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul E. McKenney

On Thu, Aug 16, 2007 at 03:30:44AM +0200, Segher Boessenkool wrote:
> >>>Part of the motivation here is to fix heisenbugs.  If I knew where 
> >>>they
> >>
> >>By the same token we should probably disable optimisations
> >>altogether since that too can create heisenbugs.
> >
> >Precisely the point -- use of volatile (whether in casts or on asms)
> >in these cases are intended to disable those optimizations likely to
> >result in heisenbugs.
> 
> The only thing volatile on an asm does is create a side effect
> on the asm statement; in effect, it tells the compiler "do not
> remove this asm even if you don't need any of its outputs".
> 
> It's not disabling optimisation likely to result in bugs,
> heisen- or otherwise; _not_ putting the volatile on an asm
> that needs it simply _is_ a bug :-)

Yep.  And the reason it is a bug is that it fails to disable
the relevant compiler optimizations.  So I suspect that we might
actually be saying the same thing here.

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul E. McKenney

On Wed, Aug 15, 2007 at 06:41:40PM -0700, Christoph Lameter wrote:
> On Wed, 15 Aug 2007, Paul E. McKenney wrote:
> 
> > Understood.  My point is not that the impact is precisely zero, but
> > rather that the impact on optimization is much less hurtful than the
> > problems that could arise otherwise, particularly as compilers become
> > more aggressive in their optimizations.
> 
> The problems arise because barriers are not used as required. Volatile 
> has wishy washy semantics and somehow marries memory barriers with data 
> access. It is clearer to separate the two. Conceptual cleanness usually 
> translates into better code. If one really wants the volatile then lets 
> make it explicit and use
> 
>   atomic_read_volatile()

There are indeed architectures where you can cause gcc to emit memory
barriers in response to volatile.  I am assuming that we are -not-
making gcc do this.  Given this, then volatiles and memory barrier
instructions are orthogonal -- one controls the compiler, the other
controls the CPU.

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul E. McKenney

On Thu, Aug 16, 2007 at 10:11:05AM +0800, Herbert Xu wrote:
> On Thu, Aug 16, 2007 at 12:05:56PM +1000, Paul Mackerras wrote:
> > Herbert Xu writes:
> > 
> > > See sk_stream_mem_schedule in net/core/stream.c:
> > > 
> > > /* Under limit. */
> > > if (atomic_read(sk->sk_prot->memory_allocated) < 
> > > sk->sk_prot->sysctl_mem[0]) {
> > > if (*sk->sk_prot->memory_pressure)
> > > *sk->sk_prot->memory_pressure = 0;
> > > return 1;
> > > }
> > > 
> > > /* Over hard limit. */
> > > if (atomic_read(sk->sk_prot->memory_allocated) > 
> > > sk->sk_prot->sysctl_mem[2]) {
> > > sk->sk_prot->enter_memory_pressure();
> > > goto suppress_allocation;
> > > }
> > > 
> > > We don't need to reload sk->sk_prot->memory_allocated here.
> > 
> > Are you sure?  How do you know some other CPU hasn't changed the value
> > in between?
> 
> Yes I'm sure, because we don't care if others have increased
> the reservation.
> 
> Note that even if we did we'd be using barriers so volatile
> won't do us any good here.

If the load-coalescing is important to performance, why not load into
a local variable?

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Satyam Sharma

On Thu, 16 Aug 2007, Satyam Sharma wrote:

> On Thu, 16 Aug 2007, Paul Mackerras wrote:
> > Herbert Xu writes:
> > 
> > > See sk_stream_mem_schedule in net/core/stream.c:
> > > 
> > > /* Under limit. */
> > > if (atomic_read(sk->sk_prot->memory_allocated) < 
> > > sk->sk_prot->sysctl_mem[0]) {
> > > if (*sk->sk_prot->memory_pressure)
> > > *sk->sk_prot->memory_pressure = 0;
> > > return 1;
> > > }
> > > 
> > > /* Over hard limit. */
> > > if (atomic_read(sk->sk_prot->memory_allocated) > 
> > > sk->sk_prot->sysctl_mem[2]) {
> > > sk->sk_prot->enter_memory_pressure();
> > > goto suppress_allocation;
> > > }
> > > 
> > > We don't need to reload sk->sk_prot->memory_allocated here.
> > 
> > Are you sure?  How do you know some other CPU hasn't changed the value
> > in between?
> 
> I can't speak for this particular case, but there could be similar code
> examples elsewhere, where we do the atomic ops on an atomic_t object
> inside a higher-level locking scheme that would take care of the kind of
> problem you're referring to here. It would be useful for such or similar
> code if the compiler kept the value of that atomic object in a register.

We might not be using atomic_t (and ops) if we already have a higher-level
locking scheme, actually. So as Herbert mentioned, such cases might just
not care. [ Too much of this thread, too little sleep, sorry! ]

Anyway, the problem, of course, is that this conversion to a stronger /
safer-by-default behaviour doesn't happen with zero cost to performance.
Converting atomic ops to "volatile" behaviour did add ~2K to kernel text
for archs such as i386 (possibly to important codepaths) that didn't have
those semantics already so it would be constructive to actually look at
those differences and see if there were really any heisenbugs that got
rectified. Or if there were legitimate optimizations that got wrongly
disabled. Onus lies on those proposing the modifications, I'd say ;-)
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul Mackerras

Satyam Sharma writes:

> I can't speak for this particular case, but there could be similar code
> examples elsewhere, where we do the atomic ops on an atomic_t object
> inside a higher-level locking scheme that would take care of the kind of
> problem you're referring to here. It would be useful for such or similar
> code if the compiler kept the value of that atomic object in a register.

If there is a higher-level locking scheme then there is no point to
using atomic_t variables.  Atomic_t is specifically for the situation
where multiple CPUs are updating a variable without locking.

Paul.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul Mackerras

Herbert Xu writes:

> > Are you sure?  How do you know some other CPU hasn't changed the value
> > in between?
> 
> Yes I'm sure, because we don't care if others have increased
> the reservation.

But others can also reduce the reservation.  Also, the code sets and
clears *sk->sk_prot->memory_pressure nonatomically with respect to the
reads of sk->sk_prot->memory_allocated, so in fact the code doesn't
guarantee any particular relationship between the two.

That code looks like a beautiful example of buggy, racy code where
someone has sprinkled magic fix-the-races dust (otherwise known as
atomic_t) around in a vain attempt to fix the races.

That's assuming that all that stuff actually performs any useful
purpose, of course, and that there isn't some lock held by the
callers.  In the latter case it is pointless using atomic_t.

Paul.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Paul Mackerras

Christoph Lameter writes:

> > But I have to say that I still don't know of a single place
> > where one would actually use the volatile variant.
> 
> I suspect that what you say is true after we have looked at all callers.

It seems that there could be a lot of places where atomic_t is used in
a non-atomic fashion, and that those uses are either buggy, or there
is some lock held at the time which guarantees that other CPUs aren't
changing the value.  In both cases there is no point in using
atomic_t; we might as well just use an ordinary int.

In particular, atomic_read seems to lend itself to buggy uses.  People
seem to do things like:

atomic_add(&v, something);
if (atomic_read(&v) > something_else) ...

and expect that there is some relationship between the value that the
atomic_add stored and the value that the atomic_read will return,
which there isn't.  People seem to think that using atomic_t magically
gets rid of races.  It doesn't.

I'd go so far as to say that anywhere where you want a non-"volatile"
atomic_read, either your code is buggy, or else an int would work just
as well.

Paul.

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-15 Thread Herbert Xu

On Thu, Aug 16, 2007 at 01:23:06PM +1000, Paul Mackerras wrote:
>
> In particular, atomic_read seems to lend itself to buggy uses.  People
> seem to do things like:
> 
>   atomic_add(&v, something);
>   if (atomic_read(&v) > something_else) ...

If you're referring to the code in sk_stream_mem_schedule
then it's working as intended.  The atomicity guarantees
that the atomic_add/atomic_sub won't be seen in parts by
other readers.

We certainly do not need to see other atomic_add/atomic_sub
operations immediately.

If you're referring to another code snippet please cite.

> I'd go so far as to say that anywhere where you want a non-"volatile"
> atomic_read, either your code is buggy, or else an int would work just
> as well.

An int won't work here because += and -= do not have the
atomicity guarantees that atomic_add/atomic_sub do.  In
particular, this may cause an atomic_read on another CPU
to give a bogus reading.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[EMAIL PROTECTED]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

1 2 3 >

1 - 100 of 291 matches

Mail list logo