Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-08 Thread David Woodhouse


[EMAIL PROTECTED] said:
>  I tried it with two compilers, one older than yours and one newer: 
> In both cases, the memory read occurs before "zzz". 

[EMAIL PROTECTED] said:
>  I guess Alexey's point is that the current compiler doesn't notice
> that.

I don't understand why we're bringing empirical evidence into this
discussion. Didn't we get into the horrible mess we've already got w.r.t.
compilers by relying on the existing behaviour never changing?

The GCC documentation exists. If it is unclear, file a bug, and/or obtain 
clarification from the developers or the C specifications.

Once that is done, code the kernel to that specification, without heed to 
the current _implementation_ of GCC.

Separately, you may wish to test that your particular version of GCC, on
your architecture, with this phase of the moon, conforms to that
specification. If it doesn't, file another bug.

Do not use the ambiguity of the specification as an excuse for coding to 
today's GCC implementation. We should have learned that lesson by now.

Please - come to a clear conclusion about the behaviour _permitted_ by the 
GCC documentation, and _then_ attempt to test it.

--
dwmw2




Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Jamie Lokier

Andrea Arcangeli wrote:
> >int *p;
> >int func()
> >{
> >  int x;
> >  x = *p;
> >  __asm__ __volatile__ ("" : : : "memory");
> >  x = *p;
> >  return x;
> >}
> 
> Definitely no difference here (-fstrict-aliasing doesn't change anything
> either).
> 
> andrea@inspiron:~ > gcc -v
> Reading specs from /usr/lib/gcc-lib/i486-suse-linux/2.95.2/specs
> gcc version 2.95.2 19991024 (release)

I tried it with two compilers, one older than yours and one newer:

.ident  "GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)"
.ident  "GCC: (GNU) 2.96 2724 (experimental)"

The command lines were:

   gcc -S -O2 test4.c
   kgcc -S -O2 test4.c

In both cases, the memory read occurs before "zzz".

-- Jamie



Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Jamie Lokier

Andrea Arcangeli wrote:
> Said that if your compiler puts the read before the spin_lock without the
> memory clobber, it is allowed to do that, and in such case you would proof
> it was a real world bug (not just a "documentation" one).

Yes, it does.

> Or maybe your testcase was a bit different then mine, in such case please
> send it to me (I'm curious indeed :).

int *p;
int func()
{
  int x = *p;
  __asm__ __volatile ("zzz" : :);
  x = *p;
  return x;
}

In output:

.ident  "GCC: (GNU) 2.96 2724 (experimental)"

From the Red Hat 7 beta.

-- Jamie



Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Jamie Lokier

Linus Torvalds wrote:
> Nope. "memory" fills that role too. Remember: "memory" doesn't actually
> say "this clobbers all memory". That would be silly: an asm that just
> wipes all memory would not be a very useful asm (or rather, it would have
> just _one_ use: "execve()"). So "memory" really says that the asm clobbers
> _some_ memory.
> 
> Which in turn means that the code scheduler has to synchronize all memory
> accesses around it - as if the asm was reading all memory. Because if the
> scheduler would move a store to after the asm, it would give the wrong
> results if the asm happened to clobber _that_ memory. And the scheduler
> obviously cannot just drop the store ("Oh, the asm will clobber this
> anyway"), because it doesn't know which memory regions get clobbered.

There are other reasons why stores can be eliminated, and "memory" does
not prevent those.

Example: you have two stores to the same address.  An "asm" with
"memory" is in between the stores.

The first store can still be dropped, even though we don't know _which_
memory the "memory" clobbers.
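For illustration, a minimal sketch of that two-store case (my construction,
not from the original mail; note that earlier in the thread both writes were
in fact emitted, so this is about what the documentation permits, not about
what a particular gcc does):

int *p;

void two_stores(void)
{
        *p = 0;         /* a pure clobber does not, by itself, oblige gcc
                           to keep this first store alive */
        __asm__ __volatile__ ("" : : : "memory");
        *p = 1;         /* second store to the same address */
}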

Now, "memory" may well mean "this asm is considered to read some memory
and clobber some memory".  But that's not what the manual says, and it
would be inconsistent with other kinds of clobber.

> Now, the fact that the "memory" clobber also seems to clobber local
> variables is a bug, I think.

If you are thinking of my examples, "memory" doesn't clobber local
variables.  It does affect the scheduling of reads from general memory
into a local variable.

-- Jamie



Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Jamie Lokier

Linus Torvalds wrote:
> Change it to something like
> 
>   __asm__("":"=r" (x):"0" (x));
> 
> and the "volatile" should matter.

Yes it does.  Without "volatile", the asm disappears :-)

> Not for memory references, perhaps. But for the movement issues.

The compiler isn't moving memory references around "asm volatile", but
it is doing CSE around them to _eliminate_ memory references.

Thus spin_lock needs the memory clobber, to prevent CSE of non-volatile
memory references between the critical region and outside the critical
region.
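For concreteness, a rough sketch (mine, not the kernel's exact code) of what
an i386-style spin_lock looks like once "memory" is in the clobber list, so
gcc cannot CSE ordinary loads and stores across the acquire:

typedef struct { volatile unsigned char lock; } my_spinlock_t;  /* 1 = unlocked */

static inline void my_spin_lock(my_spinlock_t *lp)
{
        __asm__ __volatile__(
                "1:     lock; decb %0\n\t"
                "       jns 3f\n"               /* went to 0: lock acquired */
                "2:     cmpb $0,%0\n\t"
                "       jle 2b\n\t"             /* busy-wait while held */
                "       jmp 1b\n"
                "3:"
                : "+m" (lp->lock)
                :
                : "memory", "cc");              /* "memory" stops CSE across the acquire */
}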

Maybe spin_unlock doesn't need one because CSE doesn't work the other
way.  (I'd put that clobber in anyway, to be sure).

-- Jamie





Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Jamie Lokier

Linus Torvalds wrote:
> "volatile" should be equivalent to clobbering memory, although the gcc
> manual pages are certainly not very verbose on the issue.

It isn't.  Try the following with/without the memory clobber:

int *p;
int func()
{
  int x;
  x = *p;
  __asm__ __volatile__ ("" : : : "memory");
  x = *p;
  return x;
}

With or without, the program reads *p only once.  However, with the
clobber it reads _after_ the asm; without, it reads _before_ the asm.

It's ok for the compiler to do that (given we don't know what "volatile"
means anyway :-).  But it does have implications for spin_lock:
spin_lock must say that it clobbers memory.

spin_unlock should also say it clobbers memory but I have no test case
to demonstrate the compiler moving reads down past an asm.
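A sketch of the kind of test one might try for that (my construction, using
the same "zzz" placeholder as the other tests in this thread); the question
is whether gcc would ever sink the load below the asm:

int *p;

int read_then_unlock(void)
{
        int x = *p;                        /* read meant to stay inside the critical region */
        __asm__ __volatile__ ("zzz" : :);  /* stand-in for spin_unlock */
        return x;
}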

-- Jamie



Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Jamie Lokier

Andrea Arcangeli wrote:
> >>int a = *p;
> >>__asm__ __volatile__("" : :);
> >>a = *p;
> >> 
> >> (to do two explicit reads)
> >
> >Sorry, that does just one read, kgcc (old stable gcc) and also with
> >gcc-2.96.  Type aliasing on/off makes no difference to the number of reads.
> 
> I wrote the above not as a complete testcase, but just to indicate the
> case I was talking about. You made `int a' a local variable, and the
> thing you noticed is an optimization that the compiler is allowed to do
> regardless of the "memory" clobber too (`int a' has to be at least extern,
> otherwise the compiler understands that the first read can go away).

Interestingly enough, the local variable case is one where "memory" does
make a difference.  Without "memory":

        movl    p, %eax
        movl    (%eax), %eax
#APP
#NO_APP

With "memory":

#APP
#NO_APP
        movl    p, %eax
        movl    (%eax), %eax

> Try to add "memory" as clobber to the above testcase and nothing will
> change. (that's what I meant in my previous email saying that even w/o
> "memory" things got compiled right at least in my simple testcases)

As you can see from above, there are cases where

   local_var = shared->mumble;
   // ...
   spin_lock (&spinlock);
   local_var = shared->mumble;

requires a "memory" clobber, otherwise the second read, which is in a
critical region, won't be emitted by the compiler.
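Spelled out as a compilable unit (the struct and field names here are made up
for illustration, and an empty asm stands in for the spin_lock body):

struct shared_data { int mumble; };
struct shared_data *shared;

int reader(void)
{
        int local_var;

        local_var = shared->mumble;               /* read outside the lock */
        /* ... */
        __asm__ __volatile__ ("" : : : "memory"); /* stand-in for spin_lock */
        local_var = shared->mumble;               /* must be re-read here */
        return local_var;
}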

-- Jamie

ps. There is a _clobber_ for memory, but no way to say "this asm _reads_
arbitrary memory".  __volatile__ may be filling that role though.



Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Jamie Lokier

asm *__volatile__* seems to make no difference.  I've tried a few things.

Andrea Arcangeli wrote:
> Maybe we can rely on the __volatile__ statement of the asm that will
> enforce that if we write:
> 
>   *p = 0;
>   __asm__ __volatile__("" : :);
>   *p = 1;
>
> in the assembler we'll then find both a write of 0 and then a write of 1
> to memory.

That does 2 writes with gcc-2.96 and also egcs-2.91.66/19990314
(Red Hat's kgcc), with or without -fstrict-aliasing.

It also does 2 writes without __volatile__.

>   int a = *p;
>   __asm__ __volatile__("" : :);
>   a = *p;
> 
> (to do two explicit reads)

Sorry, that does just one read, kgcc (old stable gcc) and also with
gcc-2.96.  Type aliasing on/off makes no difference to the number of reads.

Again, __volatile__ makes no difference.

I cannot tell from the GCC manual whether either of these behaviours is
correct or not.

-- Jamie



Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Franz Sirl

At 17:03 07.09.00, Andrea Arcangeli wrote:
>On Mon, 4 Sep 2000, Andrea Arcangeli wrote:
>
> >barrier()). I also noticed __sti()/__save_flags() doesn't need to clobber
> >"memory".
>
>I'm not sure anymore that __sti and spin_unlock() don't need to clobber
>memory (it looks necessary to make sure the compiler doesn't delay writing
>data to memory until outside the critical section).
>
>And in practice I can't reproduce any corruption with any subtle testcase
>by removing "memory" from the clobber list of all the
>spinlocks/__sti/__cli, so it may be the other way around. Maybe we can
>rely on the __volatile__ statement of the asm that will enforce that if we
>write:
>
> *p = 0;
> __asm__ __volatile__("" : :);
> *p = 1;
>
>in the assembler we'll then find both a write of 0 and then a write of 1
>to memory. I'd say with the current compiler we can rely on it (in fact the
>spinlocks don't seem to get miscompiled at the moment even though they
>aren't clobbering "memory").
>
>Same with:
>
> int a = *p;
> __asm__ __volatile__("" : :);
> a = *p;
>
>(to do two explicit reads)
>
>If "memory" isn't necessary in spin_lock, then all the "memory" clobbers
>around include/*/* (also in mb() and __cli() and even barrier()) are not
>necessary. But then "why" we need a "memory" clobber in first place if the
>"memory" isn't necessary? :)) Also the gcc documentation seems to say we
>need "memory" in all such operations (also in __sti() and
>spin_unlock() unlike I said in my earlier email).
>
>I'd like it if some compiler guy could comment (I got no reply yet). I
>tried to grep gcc but my gcc knowledge is too low to quickly reverse
>engineer the implemented semantics of the "memory" clobber (though I would
>appreciate a pointer into the gcc sources to start understanding some gcc
>code :).

In short:

- __volatile__ assures that the code isn't reordered against other 
__volatile__ and isn't hoisted out of loops, nothing else
- the "memory" clobber makes sure the asm isn't reordered against other 
memory accesses

Essentially, you _always_ have to tell the compiler if you touch memory 
behind its back; this includes inline assembly that flushes the cache or the 
write-back queue. Since you usually don't know which part of memory gets 
touched, you need the global "memory" clobber. If you do know which 
addresses you touch, you can get away with something like this, from 
asm-ppc/io.h:

extern inline unsigned in_be32(volatile unsigned *addr)
{
        unsigned ret;

        __asm__ __volatile__("lwz%U1%X1 %0,%1; eieio" : "=r" (ret) : "m" (*addr));
        return ret;
}

extern inline void out_le32(volatile unsigned *addr, int val)
{
        __asm__ __volatile__("stwbrx %1,0,%2; eieio" : "=m" (*addr) :
                             "r" (val), "r" (addr));
}

General rule of thumb for inline assembly:

   Give the compiler as much information as possible!!

If your inline assembly reads or writes memory, tell the compiler, in as much 
detail as possible!
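As a rough illustration of that rule (my sketch, not from asm-ppc/io.h): name
the exact object with an "m" operand when you can, and fall back to the
global clobber only when you can't:

/* Precise: the asm is tied to *addr only, so unrelated memory accesses
   may still be scheduled freely around it. */
static inline void touch_one(unsigned long *addr)
{
        __asm__ __volatile__ ("" : "+m" (*addr));
}

/* Blunt: gcc must assume the asm may read or write any memory at all. */
static inline void touch_everything(void)
{
        __asm__ __volatile__ ("" : : : "memory");
}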

Franz.




spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-04 Thread Andrea Arcangeli

On Tue, 29 Aug 2000, Andrea Arcangeli wrote:

>I'd prefer to have things like UnlockPage implemented as the architecture
>prefers (with a safe SMP common code default) instead of exporting zillions

I changed my mind. I'd prefer to implement a mechanism that allows common
code to take care of the implied barrier that some architectures have, but
still without having to implement zillions of mb_clear_bit_mb and all the
variants (there are too many, also considering the cases where we need a
barrier() or not).

>And surprise: I found a SMP race in UnlockPage. What we're doing now it's

I found some other races of the same kind (missing mb before releasing a
lock) in tasklet_unlock and net_tx_action.

>Furthermore, what we need aren't really mb() but smp_mb(), which are noops
>in a UP compilation.

Furthermore, things like spin_unlock don't even need mb() (as alpha was
doing), but just an asm mb instruction that must not be reordered.
spin_unlock doesn't need to clobber "memory" (no need for a compiler
barrier()). I also noticed __sti()/__save_flags() don't need to clobber
"memory".

Instead, all the spin_lock implementations in 2.2.x and 2.4.x are wrong:
they should clobber "memory".

Consider the code below:

int * p;
spinlock_t lock = SPIN_LOCK_UNLOCKED;

extern void dummy(int, int);

void myfunc(void) {
        int a, b;

        spin_lock(&lock);
        a = *p;
        spin_unlock(&lock);

        spin_lock(&lock);
        b = *p;
        spin_unlock(&lock);

        dummy(a, b);
}

The spin_lock isn't declaring "memory" as a clobber. However, the spin_lock
operation _definitely_ implies an unpredictable change to all of memory,
because any memory contents can take on a completely different meaning as
soon as we have acquired the lock (and we don't have any interface to tell
the compiler which memory is protected by the spinlock, which would allow it
to reload only the memory contents that change meaning once we acquire the
lock).

Actually, GCC is a little underoptimized here: with an asm that clobbers
"memory" (aka barrier()), the address of p is also reloaded into the
register. As far as I can tell p is a constant and nothing should be able to
change it; only self-modifying code could change it, but if the code has
been changed then the reload of p may not be there anymore either, and in
general C must never assume anything about self-modifying code. Thus gcc
shouldn't cause the address of p to be reloaded into registers either.

Without "memory" clobber gcc would be allowed to cache the value of *p and
to reuse it also for the second critical section.

mb()/cli() are just adding "memory" as a clobber, and they were in fact
implemented right. The fact that mb() clobbers "memory" pretty much proves
the current spin_lock is implemented wrong.
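For reference, mb() and barrier() on i386 were essentially the following
(quoted from memory, so treat the exact spelling as approximate):

#define mb()      __asm__ __volatile__ ("lock; addl $0,0(%%esp)" : : : "memory")
#define barrier() __asm__ __volatile__ ("" : : : "memory")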

The documentation I have is from gcc info pages:

   If your assembler instruction modifies memory in an unpredictable
fashion, add `memory' to the list of clobbered registers.  This will
cause GNU CC to not keep memory values cached in registers across the
assembler instruction.

and spin_lock, as said, definitely changes memory in an unpredictable
fashion (as cli and restore_flags do too), since the memory contents after
the lock is acquired will have a completely different meaning.

The test_and_set_* operations are also missing "memory" as a clobber.

The patch below fixes all those race conditions in the spinlocks (which
probably don't trigger because of -fno-strict-aliasing), and the ones
related to clear_bit not implying memory barriers. test_and_set_* still
implies an mb() in the case where the memory has been changed, because too
much code depends on this behaviour (even the alpha tree itself internally
depends on it). Anyway, alpha only inserts the mb() in the case where the
memory has been changed, so nothing would be gained by extracting the mb()
out of the test_and_* bitops.

I also added very lightweight, non-atomic and reorderable
__set_bit/__test_and_set* variants for ext2 and minix, which are protected
by the big kernel lock and only need to set bitflags efficiently beyond the
sizeof(unsigned long) limit.
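The flavour of helper being described is essentially this (a plain-C sketch,
not the actual patch):

/* Non-atomic, freely reorderable bit set: only safe when the caller is
   already serialized, e.g. under the big kernel lock, which is the
   ext2/minix case mentioned above. */
static inline void my_nonatomic_set_bit(int nr, unsigned long *addr)
{
        addr[nr / (8 * sizeof(unsigned long))] |=
                1UL << (nr % (8 * sizeof(unsigned long)));
}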

The patch also fixes an SMP race in the alpha read_unlock (missing mb). BTW,
the backport of this single bugfix to 2.2.x is here:


ftp://ftp.*.kernel.org/pub/linux/kernel/people/andrea/patches/v2.2/2.2.17pre20/alpha-read-unlock-SMP-race-1

(I'll submit it to Alan shortly together with other 2.2.x stuff)

Then I simplified the alpha set_bit and clear_bit asm implementations to
remove a conditional branch from the fast path (this may cause some more
dirty cachelines if somebody does a set_bit on a bit that is already set,
but it's faster in the userspace simulation and it takes less space in
memory). In any case I added a #define that you can unset to return to the
previous implementation of clear_bit/set_bit. I also removed an mb() from
the cmpxchg in the case where no change to memory is done (if nothing in
memory changes, there's no need for the mb). There are also some obvious
compile bugfixes for the alpha.

It 
