Re: spin_lock forgets to clobber memory and other smp fixes [was

2000-09-08 Thread kuznet

Hello!

> > I guess Alexey's point is that the current compiler doesn't notice that.

Rather, I proposed an explanation of why the missing barrier does not
have any effect. 8)8)

Alexey
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-08 Thread Andrea Arcangeli

On Fri, 8 Sep 2000, David Woodhouse wrote:

>[EMAIL PROTECTED] said:
>>  I guess Alexey's point is that the current compiler doesn't notice
>> that. 
>
>I don't understand why we're bringing empirical evidence into this
>discussion. Didn't we get into the horrible mess we've already got w.r.t.

I was the first to notice and to complain about the current behaviour; I
considered it a bug even before I had empirical evidence (which I still
don't have with my current compiler). I have already fixed it partly and
I'm completing the fix now.

Andrea




Re: spin_lock forgets to clobber memory and other smp fixes [was

2000-09-08 Thread Jamie Lokier

[EMAIL PROTECTED] wrote:
> > Well, now GCC does CSE across "asm" and will eliminate memory loads,
> > even though it may not move them!  I suspect it always did CSE across
> > "asm" and we just never got hit by the bug.
> 
> dummy_lock trick is equivalent to "memory" clobber.

For GCC 2.7.2 yes.  For current compilers with -fstrict-aliasing, no:
the dummy_lock type doesn't alias with some other memory accesses.

Longer term the compiler may well take into account the array size.

Of course we still need the dummy_lock trick anyway.

-- Jamie



Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-08 Thread David Woodhouse


[EMAIL PROTECTED] said:
>  I tried it with two compilers, one older than yours and one newer: 
> In both cases, the memory read occurs before "zzz". 

[EMAIL PROTECTED] said:
>  I guess Alexey's point is that the current compiler doesn't notice
> that. 

I don't understand why we're bringing empirical evidence into this
discussion. Didn't we get into the horrible mess we've already got w.r.t.
compilers by relying on the existing behaviour never changing?

The GCC documentation exists. If it is unclear, file a bug, and/or obtain 
clarification from the developers or the C specifications.

Once that is done, code the kernel to that specification, without heed to 
the current _implementation_ of GCC.

Separately, you may wish to test that your particular version of GCC, on
your architecture, with this phase of the moon, conforms to that
specification. If it doesn't, file another bug.

Do not use the ambiguity of the specification as an excuse for coding to 
today's GCC implementation. We should have learned that lesson by now.

Please - come to a clear conclusion about the behaviour _permitted_ by the 
GCC documentation, and _then_ attempt to test it.

--
dwmw2






Re: spin_lock forgets to clobber memory and other smp fixes [was

2000-09-07 Thread Andrea Arcangeli

On Thu, 7 Sep 2000, Richard Henderson wrote:

>Perhaps.  But that's not to say no future compiler won't.

I sure agree. That was my original concern, without looking at or knowing
how smart the current compiler was, in fact.

Andrea




Re: spin_lock forgets to clobber memory and other smp fixes [was

2000-09-07 Thread Richard Henderson

On Fri, Sep 08, 2000 at 12:34:24AM +0200, Andrea Arcangeli wrote:
> >No it's not.  We know how big the dummy_lock structure is, and
> >so might "know" that it doesn't overlap with something else.
> 
> I guess Alexey's point is that the current compiler doesn't notice that.

Perhaps.  But that's not to say no future compiler won't.


r~



Re: spin_lock forgets to clobber memory and other smp fixes [was

2000-09-07 Thread Andrea Arcangeli

On Thu, 7 Sep 2000, Richard Henderson wrote:

>No it's not.  We know how big the dummy_lock structure is, and
>so might "know" that it doesn't overlap with something else.

I guess Alexey's point is that the current compiler doesn't notice that.

Andrea




Re: spin_lock forgets to clobber memory and other smp fixes [was

2000-09-07 Thread Richard Henderson

On Thu, Sep 07, 2000 at 09:51:26PM +0400, [EMAIL PROTECTED] wrote:
> dummy_lock trick is equivalent to "memory" clobber.

No it's not.  We know how big the dummy_lock structure is, and
so might "know" that it doesn't overlap with something else.


r~



Re: spin_lock forgets to clobber memory and other smp fixes [was

2000-09-07 Thread kuznet

Hello!

> Well, now GCC does CSE across "asm" and will eliminate memory loads,
> even though it may not move them!  I suspect it always did CSE across
> "asm" and we just never got hit by the bug.

dummy_lock trick is equivalent to "memory" clobber.

So there is no real bug.

Alexey



Re: spin_lock forgets to clobber memory and other smp fixes [was

2000-09-07 Thread Andrea Arcangeli

On Thu, 7 Sep 2000, Jamie Lokier wrote:

>Common Subexpression Elimination.
>
>If the compiler sees an expression equivalent to one it evaluated
>earlier, there is no need to evaluate it a second time.
>
>So "a = x+x; b = x+x" will evaluate "x+x" just once and store it twice.

I didn't know the name of that particular optimization, thanks.

>In more complicated cases, where the result of the first read is
>actually used before the "asm", the "memory" clobber causes the second
>read to occur as well as the first.

Indeed.

Andrea




Re: spin_lock forgets to clobber memory and other smp fixes [was

2000-09-07 Thread Juan J. Quintela

> "andrea" == Andrea Arcangeli <[EMAIL PROTECTED]> writes:

andrea> On Thu, 7 Sep 2000, Jamie Lokier wrote:
>> Well, now GCC does CSE across "asm" and will eliminate memory loads,

andrea> What is "CSE"?

Common Subexpression Elimination.  You can get a lot of info in "The
Dragon Book".

Later, Juan.

-- 
In theory, practice and theory are the same, but in practice they 
are different -- Larry McVoy



Re: spin_lock forgets to clobber memory and other smp fixes [was

2000-09-07 Thread Jamie Lokier

Andrea Arcangeli wrote:
> >Well, now GCC does CSE across "asm" and will eliminate memory loads,
> 
> What is "CSE"?

Common Subexpression Elimination.

If the compiler sees an expression equivalent to one it evaluated
earlier, there is no need to evaluate it a second time.

So "a = x+x; b = x+x" will evaluate "x+x" just once and store it twice.

This applies to memory reads too.  If you read from memory once, and
read it again later, the later read can be eliminated.  You have the
right value in a register already.

Memory CSE depends on proving that no intermediate operation clobbers
the data at that address.  That is why "memory" clobbers are important.
They indicate that the data at that address _may_ have been clobbered,
so it must be read again after the clobbering operation.

Sometimes, after you've forced the second read, you can eliminate the
first one :-) That's why "memory" appears to just move the read in our
test case.  It's really forcing a read after the "asm", which in turn
allows the read before the "asm" to be eliminated.

In more complicated cases, where the result of the first read is
actually used before the "asm", the "memory" clobber causes the second
read to occur as well as the first.

-- Jamie



Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Andrea Arcangeli

On Thu, 7 Sep 2000, Jamie Lokier wrote:

>I tried it with two compilers, one older than yours and one newer:

So maybe I've just been unlucky/lucky (depends on the point of view :), or
maybe we patched something I'm not aware of to make the kernel compile
right.

>.ident  "GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)"
>.ident  "GCC: (GNU) 2.96 2724 (experimental)"
>
>The command lines were:
>
>   gcc -S -O2 test4.c
>   kgcc -S -O2 test4.c
>
>In both cases, the memory read occurs before "zzz".

andrea@inspiron:~ > cat p.c
int *p;
int func()
{
  int x;
  x = *p;
  __asm__ __volatile__ ("" : : );
  x = *p;
  return x;
}
andrea@inspiron:~ > gcc -O2 -S p.c
andrea@inspiron:~ > cat p.s
	.file	"p.c"
	.version	"01.01"
gcc2_compiled.:
.text
	.align 16
.globl func
	.type	func,@function
func:
	pushl %ebp
	movl %esp,%ebp
#APP
#NO_APP
^^^
	movl p,%eax
	movl %ebp,%esp
	popl %ebp
	movl (%eax),%eax

	ret
.Lfe1:
	.size	func,.Lfe1-func
	.comm	p,4,4
	.ident	"GCC: (GNU) 2.95.2 19991024 (release)"

Andrea




Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Andrea Arcangeli

On Thu, 7 Sep 2000, Jamie Lokier wrote:

>Yes, it does.

Nice.

>.ident  "GCC: (GNU) 2.96 2724 (experimental)"
>
>From the Red Hat 7 beta.

Ok.

Andrea




Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Jamie Lokier

Andrea Arcangeli wrote:
> >int *p;
> >int func()
> >{
> >  int x;
> >  x = *p;
> >  __asm__ __volatile__ ("" : : : "memory");
> >  x = *p;
> >  return x;
> >}
> 
> Definitely no difference here (-fstrict-aliasing doesn't change anything
> either).
> 
> andrea@inspiron:~ > gcc -v
> Reading specs from /usr/lib/gcc-lib/i486-suse-linux/2.95.2/specs
> gcc version 2.95.2 19991024 (release)

I tried it with two compilers, one older than yours and one newer:

.ident  "GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)"
.ident  "GCC: (GNU) 2.96 2724 (experimental)"

The command lines were:

   gcc -S -O2 test4.c
   kgcc -S -O2 test4.c

In both cases, the memory read occurs before "zzz".

-- Jamie



Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Jamie Lokier

Andrea Arcangeli wrote:
> Said that if your compiler puts the read before the spin_lock without the
> memory clobber, it is allowed to do that, and in such case you would proof
> it was a real world bug (not just a "documentation" one).

Yes, it does.

> Or maybe your testcase was a bit different then mine, in such case please
> send it to me (I'm curious indeed :).

int *p;
int func()
{
  int x = *p;
  __asm__ __volatile ("zzz" : :);
  x = *p;
  return x;
}

In output:

.ident  "GCC: (GNU) 2.96 2724 (experimental)"

From the Red Hat 7 beta.

-- Jamie



Re: spin_lock forgets to clobber memory and other smp fixes [was

2000-09-07 Thread Andrea Arcangeli

On Thu, 7 Sep 2000, Jamie Lokier wrote:

>Well, now GCC does CSE across "asm" and will eliminate memory loads,

What is "CSE"?

Andrea




Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Andrea Arcangeli

On Thu, 7 Sep 2000, Jamie Lokier wrote:

>int *p;
>int func()
>{
>  int x;
>  x = *p;
>  __asm__ __volatile__ ("" : : : "memory");
>  x = *p;
>  return x;
>}
>

Definitely no difference here (-fstrict-aliasing doesn't change anything
either).

andrea@inspiron:~ > gcc -v
Reading specs from /usr/lib/gcc-lib/i486-suse-linux/2.95.2/specs
gcc version 2.95.2 19991024 (release)

Andrea




Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Jamie Lokier

Linus Torvalds wrote:
> Nope. "memory" fills that role too. Remember: "memory" doesn't actually
> say "this clobbers all memory". That would be silly: an asm that just
> wipes all memory would not be a very useful asm (or rather, it would have
> just _one_ use: "execve()"). So "memory" really says that the asm clobbers
> _some_ memory.
> 
> Which in turn means that the code scheduler has to synchronize all memory
> accesses around it - as if the asm was reading all memory. Because if the
> scheduler would move a store to after the asm, it would give the wrong
> results if the asm happened to clobber _that_ memory. And the scheduler
> obviously cannot just drop the store ("Oh, the asm will clobber this
> anyway"), because it doesn't know which memory regions get clobbered.

There are other reasons why stores can be eliminated, and "memory" does
not prevent those.

Example: you have two stores to the same address.  An "asm" with
"memory" is in between the stores.

The first store can still be dropped, even though we don't know _which_
memory the "memory" clobbers.

Now, "memory" may well mean "this asm is considered to read some memory
and clobber some memory".  But that's not what the manual says, and it
would be inconsistent with other kinds of clobber.

> Now, the fact that the "memory" clobber also seems to clobber local
> variables is a bug, I think.

If you are thinking of my examples, "memory" doesn't clobber local
variables.  It does affect the scheduling of reads from general memory
into a local variable.

-- Jamie



Re: spin_lock forgets to clobber memory and other smp fixes [was

2000-09-07 Thread Jamie Lokier

[EMAIL PROTECTED] wrote:
> Just a hint. I remember the time when the "memory" clobber option
> was _absent_ in gcc. And we managed to compile the kernel with such a gcc. 8)
> As far as I understand, "asm" (like function calls) implied a barrier
> at that time, and constraints and the clobber option were used only for
> register allocation and reloading. 

Well, now GCC does CSE across "asm" and will eliminate memory loads,
even though it may not move them!  I suspect it always did CSE across
"asm" and we just never got hit by the bug.

-- Jamie



Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Jamie Lokier

Linus Torvalds wrote:
> Change it to something like
> 
>   __asm__("":"=r" (x):"0" (x));
> 
> and the "volatile" should matter.

Yes it does.  Without "volatile", the asm disappears :-)

> Not for memory references, perhaps. But for the movement issues.

The compiler isn't moving memory references around "asm volatile", but
it is doing CSE around them to _eliminate_ memory references.

Thus spin_lock needs the memory clobber, to prevent CSE of non-volatile
memory references between the critical region and outside the critical
region.

Maybe spin_unlock doesn't need one because CSE doesn't work the other
way.  (I'd put that clobber in anyway, to be sure).

-- Jamie





Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Linus Torvalds



On Thu, 7 Sep 2000, Jamie Lokier wrote:
> 
> It's ok for the compiler to do that (given we don't know what "volatile"
> means anyway :-).  But it does have implications for spin_lock:
> spin_lock must say that it clobbers memory.

Yup. We should just fix that.

Linus




Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Linus Torvalds



On Thu, 7 Sep 2000, Jamie Lokier wrote:
> 
> ps. There is a _clobber_ for memory, but no way to say "this asm _reads_
> arbitrary memory".  __volatile__ may be filling that role though.

Nope. "memory" fills that role too. Remember: "memory" doesn't actually
say "this clobbers all memory". That would be silly: an asm that just
wipes all memory would not be a very useful asm (or rather, it would have
just _one_ use: "execve()"). So "memory" really says that the asm clobbers
_some_ memory.

Which in turn means that the code scheduler has to synchronize all memory
accesses around it - as if the asm was reading all memory. Because if the
scheduler would move a store to after the asm, it would give the wrong
results if the asm happened to clobber _that_ memory. And the scheduler
obviously cannot just drop the store ("Oh, the asm will clobber this
anyway"), because it doesn't know which memory regions get clobbered.

Now, the fact that the "memory" clobber also seems to clobber local
variables is a bug, I think. Or at least a misfeature. It should not be
considered to clobber reloads, as those are really in "registers" - at
least as far as the asm is concerned (the compiler could have chosen to
just allocate a hard register to that local variable or argument).

Linus




Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Jamie Lokier

Linus Torvalds wrote:
> "volatile" should be equivalent to clobbering memory, although the gcc
> manual pages are certainly not very verbose on the issue.

It isn't.  Try the following with/without the memory clobber:

int *p;
int func()
{
  int x;
  x = *p;
  __asm__ __volatile__ ("" : : : "memory");
  x = *p;
  return x;
}

With or without, the program reads *p only once.  However, with the
clobber it reads _after_ the asm; without, it reads _before_ the asm.

It's ok for the compiler to do that (given we don't know what "volatile"
means anyway :-).  But it does have implications for spin_lock:
spin_lock must say that it clobbers memory.

spin_unlock should also say it clobbers memory but I have no test case
to demonstrate the compiler moving reads down past an asm.

-- Jamie



Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Jamie Lokier

Andrea Arcangeli wrote:
> >>int a = *p;
> >>__asm__ __volatile__("" : :);
> >>a = *p;
> >> 
> >> (to do two explicit reads)
> >
> >Sorry, that does just one read, kgcc (old stable gcc) and also with
> >gcc-2.96.  Type aliasing on/off makes no difference to the number of reads.
> 
> I wrote the above not as a complete testcase, but just to illustrate
> the case I was talking about. You made `int a' a local variable, and the
> thing you noticed is an optimization that the compiler is allowed to do
> regardless of the "memory" clobber too (`int a' has to be at least extern,
> otherwise the compiler understands the first read can go away).

Interestingly enough, the local variable case is one where "memory" does
make a difference.  Without "memory":

	movl	p, %eax
	movl	(%eax), %eax
#APP
#NO_APP

With "memory":

#APP
#NO_APP
	movl	p, %eax
	movl	(%eax), %eax

> Try to add "memory" as clobber to the above testcase and nothing will
> change. (that's what I meant in my previous email saying that even w/o
> "memory" things got compiled right at least in my simple testcases)

As you can see from above, there are cases where

   local_var = shared->mumble;
   // ...
   spin_lock ();
   local_var = shared->mumble;

requires a "memory" clobber, otherwise the second read, which is in a
critical region, won't be emitted by the compiler.

-- Jamie

ps. There is a _clobber_ for memory, but no way to say "this asm _reads_
arbitrary memory".  __volatile__ may be filling that role though.



Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Linus Torvalds



On Thu, 7 Sep 2000, Jamie Lokier wrote:

> asm *__volatile__* seems to make no difference.  I've tried a few things.
> 
> Andrea Arcangeli wrote:
> > Maybe we can rely on the __volatile__ statement of the asm that will
> > enforce that if we write:
> > 
> > *p = 0;
> > __asm__ __volatile__("" : :);
> > *p = 1;
> >
> > in the assembler we'll then find both a write of 0 and then a write of 1
> > to memory.
> 
> That does 2 writes with gcc-2.96 and also egcs-2.91.66/19990314
> (Red Hat's kgcc), with or without -fstrict-aliasing.
> 
> It also does 2 writes without __volatile__.

Your test is broken.

Read the gcc documentation. An inline asm with no outputs is implicitly
considered volatile. So _both_ your tests had volatile there.

Now, that may not matter that much for your test-case: gcc gets careful
around inline asm anyway, even without the volatile.


Change it to something like

__asm__("":"=r" (x):"0" (x));

and the "volatile" should matter.

Not for memory references, perhaps. But for the movement issues.

Linus




Re: spin_lock forgets to clobber memory and other smp fixes [was

2000-09-07 Thread kuznet

Hello!

> tried to grep gcc but my gcc knowledge is too low to reverse engineer the
> implemented semantics of the "memory" clobber fast

Just a hint. I remember the time when the "memory" clobber option
was _absent_ in gcc. And we managed to compile the kernel with such a gcc. 8)
As far as I understand, "asm" (like function calls) implied a barrier
at that time, and constraints and the clobber option were used only for
register allocation and reloading. 

Alexey


BTW, look also into asm-i386/bitops.h and the dummy cast to some crap there.
Are you impressed? 8)




Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Andrea Arcangeli

On Thu, 7 Sep 2000, Jamie Lokier wrote:

>asm *__volatile__* seems to make no difference.  I've tried a few things.

It makes a difference, see below.

>
>Andrea Arcangeli wrote:
>> Maybe we can rely on the __volatile__ statement of the asm that will
>> enforce that if we write:
>> 
>>  *p = 0;
>>  __asm__ __volatile__("" : :);
>>  *p = 1;
>>
>> in the assembler we'll then find both a write of 0 and then a write of 1
>> to memory.
>
>That does 2 writes with gcc-2.96 and also egcs-2.91.66/19990314
>(Red Hat's kgcc), with or without -fstrict-aliasing.
>
>It also does 2 writes without __volatile__.
>
>>  int a = *p;
>>  __asm__ __volatile__("" : :);
>>  a = *p;
>> 
>> (to do two explicit reads)
>
>Sorry, that does just one read, kgcc (old stable gcc) and also with
>gcc-2.96.  Type aliasing on/off makes no difference to the number of reads.

I wrote the above not as a complete testcase, but just to illustrate the
case I was talking about. You made `int a' a local variable, and the thing
you noticed is an optimization that the compiler is allowed to do
regardless of the "memory" clobber too (`int a' has to be at least extern,
otherwise the compiler understands the first read can go away). Try this:

int * p;
int a;

extern f(int);

main()
{
a = *p;

__asm__ __volatile__("zzz" : : );

a = *p;

f(a);
}

Try to add "memory" as clobber to the above testcase and nothing will
change. (that's what I meant in my previous email saying that even w/o
"memory" things got compiled right at least in my simple testcases)

>Again, __volatile__ makes no difference.

Note that __volatile__ really makes a difference if, for example, you
specify as output an operand that isn't used anymore.

Try this:

main()
{
int a;
__asm__("zzz" : "=r" (a));
}

and then this:

main()
{
int a;
__asm__ __volatile__("zzz" : "=r" (a));
}


--- p.s.nonvolatile Thu Sep  7 18:05:30 2000
+++ p.s Thu Sep  7 18:05:53 2000
@@ -6,10 +6,13 @@
 .globl main
 	.type	main,@function
 main:
 	pushl %ebp
 	movl %esp,%ebp
+#APP
+	zzz
+#NO_APP
 	movl %ebp,%esp
 	popl %ebp
 	ret
 .Lfe1:
 	.size	main,.Lfe1-main

>I cannot tell from the GCC manual whether either of these behaviours is
>correct or not.

The behaviour of what you described is definitely correct/safe.

My only wondering was about "memory" vs. no "memory" as a clobber, because
gcc was doing things right just with the __asm__ __volatile__ thing w/o
"memory" as a clobber. However, I had confirmation that my worries were
right and that "memory" is needed for all the spinlocks.

BTW, we had a bug in the alpha port last month in the
linux/include/asm-alpha/fpu.h:wrfpcr() function, read it for a real world
example of where __volatile__ must be used.

Andrea




Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Linus Torvalds



On Thu, 7 Sep 2000, Andrea Arcangeli wrote:

> On Mon, 4 Sep 2000, Andrea Arcangeli wrote:
> 
> >barrier()). I also noticed __sti()/__save_flags() doesn't need to clobber
> >"memory".
> 
> I'm not sure anymore if __sti and spin_unlock() doesn't need to clobber
> memory (it looks necessary to make sure the compiler doesn't delay to
> write data to the memory out of the critical section).

"volatile" should be equivalent to clobbering memory, although the gcc
manual pages are certainly not very verbose on the issue.

Adding a memory clobber certainly migth be a good idea. David, who knows
what gcc actually does wrt "volatile"?

Linus




Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Jamie Lokier

asm *__volatile__* seems to make no difference.  I've tried a few things.

Andrea Arcangeli wrote:
> Maybe we can rely on the __volatile__ statement of the asm that will
> enforce that if we write:
> 
>   *p = 0;
>   __asm__ __volatile__("" : :);
>   *p = 1;
>
> in the assembler we'll then find both a write of 0 and then a write of 1
> to memory.

That does 2 writes with gcc-2.96 and also egcs-2.91.66/19990314
(Red Hat's kgcc), with or without -fstrict-aliasing.

It also does 2 writes without __volatile__.

>   int a = *p;
>   __asm__ __volatile__("" : :);
>   a = *p;
> 
> (to do two explicit reads)

Sorry, that does just one read, kgcc (old stable gcc) and also with
gcc-2.96.  Type aliasing on/off makes no difference to the number of reads.

Again, __volatile__ makes no difference.

I cannot tell from the GCC manual whether either of these behaviours is
correct or not.

-- Jamie



Re: spin_lock forgets to clobber memory and other smp fixes [wasRe: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Andrea Arcangeli

On Thu, 7 Sep 2000, Franz Sirl wrote:

>In short terms:
>
>- __volatile__ assures that the code isn't reordered against other 
>__volatile__ and isn't hoisted out of loops, nothing else
>- the "memory" clobber makes sure the asm isn't reordered against other 
>memory accesses

Ok. That's all I wanted to hear.

So _definitely_ all spinlocks needs "memory" in the clobber list.

I'll do a new patch reinserting "memory" in __sti and inserting
"memory" also in the spin_unlock() case.

The reason for my doubt was that I only got one agreement from Pavel and no
other comments. Furthermore, in practice there was no miscompilation, thus I
was wondering if I had misunderstood the semantics of __volatile__ (but then
of course I was asking myself what "memory" was good for :))

>Essentially, you _always_ have to tell the compiler if you touch memory 
>behind it's back, this includes inline assembly to flush the cache or the 

I understand this completely. And as said we can't do that with the
spinlocks (at least with the current API of spin_lock and friends), thus we
need "memory" in the clobber list.

>General rule of thumb for inline assembly:
>
>   Give the compiler as much information as possible!!
>
>If you know inline assembly read/writes memory, tell it to the compiler, as 
>detailed as possible!

Indeed :). If we could teach all the memory changes to the inline assembly
then "memory" wouldn't be necessary anymore into the clobber list.

Andrea




Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Franz Sirl

At 17:03 07.09.00, Andrea Arcangeli wrote:
>On Mon, 4 Sep 2000, Andrea Arcangeli wrote:
>
> >barrier()). I also noticed __sti()/__save_flags() doesn't need to clobber
> >"memory".
>
>I'm not sure anymore if __sti and spin_unlock() doesn't need to clobber
>memory (it looks necessary to make sure the compiler doesn't delay to
>write data to the memory out of the critical section).
>
>And in practice I can't reproduce any corruption with any subtle testcase
>by removing "memory" from the clobber list of all the
>spinlocks/__sti/__cli, so it may be the other way around. Maybe we can
>rely on the __volatile__ statement of the asm that will enforce that if we
>write:
>
> *p = 0;
> __asm__ __volatile__("" : :);
> *p = 1;
>
>in the assembler we'll then find both a write of 0 and then a write of 1
>to memory. I'd say with the current compiler we can rely on it (infact the
>spinlock doesn't seems to get miscompiled at the moment even while they
>aren't clobbering "memory".
>
>Same with:
>
> int a = *p;
> __asm__ __volatile__("" : :);
> a = *p;
>
>(to do two explicit reads)
>
>If "memory" isn't necessary in spin_lock, then all the "memory" clobbers
>around include/*/* (also in mb() and __cli() and even barrier()) are not
>necessary. But then "why" we need a "memory" clobber in first place if the
>"memory" isn't necessary? :)) Also the gcc documentation seems to say we
>need "memory" in all such operations (also in __sti() and
>spin_unlock() unlike I said in my earlier email).
>
>I'd like if some compiler guy could comment (I got no one reply yet). I
>tried to grep gcc but my gcc knowledge is too low to reverse engeneer the
>implement semantics of the "memory" clobber fast (though I would
>appreciate a pointer in the gcc sources to start to understand some gcc
>code :).

In short terms:

- __volatile__ assures that the code isn't reordered against other 
__volatile__ and isn't hoisted out of loops, nothing else
- the "memory" clobber makes sure the asm isn't reordered against other 
memory accesses

Essentially, you _always_ have to tell the compiler if you touch memory 
behind its back; this includes inline assembly that flushes the cache or the
write back queue. Since you usually don't know which part of the memory 
gets touched, you need the global "memory" clobber. If you know which 
addresses you touch, you can get away with something like this, from 
asm-ppc/io.h:

extern inline unsigned in_be32(volatile unsigned *addr)
{
 unsigned ret;

 __asm__ __volatile__("lwz%U1%X1 %0,%1; eieio" : "=r" (ret) : "m" 
(*addr));
 return ret;
}

extern inline void out_le32(volatile unsigned *addr, int val)
{
 __asm__ __volatile__("stwbrx %1,0,%2; eieio" : "=m" (*addr) :
  "r" (val), "r" (addr));
}

General rule of thumb for inline assembly:

   Give the compiler as much information as possible!!

If you know inline assembly read/writes memory, tell it to the compiler, as 
detailed as possible!

Franz.




Re: spin_lock forgets to clobber memory and other smp fixes [wasRe: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Andrea Arcangeli

On Mon, 4 Sep 2000, Andrea Arcangeli wrote:

>barrier()). I also noticed __sti()/__save_flags() doesn't need to clobber
>"memory".

I'm not sure anymore if __sti and spin_unlock() doesn't need to clobber
memory (it looks necessary to make sure the compiler doesn't delay to
write data to the memory out of the critical section).

And in practice I can't reproduce any corruption with any subtle testcase
by removing "memory" from the clobber list of all the
spinlocks/__sti/__cli, so it may be the other way around. Maybe we can
rely on the __volatile__ statement of the asm that will enforce that if we
write:

*p = 0;
__asm__ __volatile__("" : :);
*p = 1;

in the assembler we'll then find both a write of 0 and then a write of 1
to memory. I'd say with the current compiler we can rely on it (in fact the
spinlock doesn't seem to get miscompiled at the moment even while they
aren't clobbering "memory").

Same with:

int a = *p;
__asm__ __volatile__("" : :);
a = *p;

(to do two explicit reads)  

If "memory" isn't necessary in spin_lock, then all the "memory" clobbers
around include/*/* (also in mb() and __cli() and even barrier()) are not
necessary. But then "why" we need a "memory" clobber in first place if the
"memory" isn't necessary? :)) Also the gcc documentation seems to say we
need "memory" in all such operations (also in __sti() and
spin_unlock() unlike I said in my earlier email).

I'd like it if some compiler guy could comment (I got no reply yet). I
tried to grep gcc but my gcc knowledge is too low to reverse engineer the
implemented semantics of the "memory" clobber fast (though I would
appreciate a pointer into the gcc sources to start to understand some gcc
code :).

Andrea




Re: spin_lock forgets to clobber memory and other smp fixes [was

2000-09-07 Thread kuznet

Hello!

> tried to grep gcc but my gcc knowledge is too low to reverse engineer the
> implemented semantics of the "memory" clobber fast

Just a hint. I remember the time when the "memory" clobber option
was _absent_ in gcc. And we managed to compile the kernel with such a gcc. 8)
As far as I understand, "asm" (like a function call) implied a barrier
at that time, and constraints and clobber options were used only for
register allocation and reloading.

Alexey


BTW, look also at asm-i386/bitops.h and the dummy cast to some crap there.
Are you impressed? 8)




Re: spin_lock forgets to clobber memory and other smp fixes [was

2000-09-07 Thread Andrea Arcangeli

On Thu, 7 Sep 2000, Jamie Lokier wrote:

> Well, now GCC does CSE across "asm" and will eliminate memory loads,

What is "CSE"?

Andrea




Re: spin_lock forgets to clobber memory and other smp fixes [was

2000-09-07 Thread Andrea Arcangeli

On Thu, 7 Sep 2000, Jamie Lokier wrote:

> Common Subexpression Elimination.
> 
> If the compiler sees an expression equivalent to one it evaluated
> earlier, there is no need to evaluate it a second time.
> 
> So "a = x+x; b = x+x" will evaluate "x+x" just once and store it twice.

I didn't know the name of that particular optimization, thanks.

> In more complicated cases, where the result of the first read is
> actually used before the "asm", the "memory" clobber causes the second
> read to occur as well as the first.

Indeed.

Andrea




Re: spin_lock forgets to clobber memory and other smp fixes [was

2000-09-07 Thread Juan J. Quintela

>>>>> "andrea" == Andrea Arcangeli <[EMAIL PROTECTED]> writes:

andrea> On Thu, 7 Sep 2000, Jamie Lokier wrote:
>> Well, now GCC does CSE across "asm" and will eliminate memory loads,

andrea> What is "CSE"?

Common Subexpression Elimination.  You can get a lot of info in "The
Dragon Book".

Later, Juan.

-- 
In theory, practice and theory are the same, but in practice they 
are different -- Larry McVoy



Re: spin_lock forgets to clobber memory and other smp fixes [was

2000-09-07 Thread kuznet

Hello!

> Well, now GCC does CSE across "asm" and will eliminate memory loads,
> even though it may not move them!  I suspect it always did CSE across
> "asm" and we just never got hit by the bug.

dummy_lock trick is equivalent to "memory" clobber.

So that there is no real bug.

Alexey



Re: spin_lock forgets to clobber memory and other smp fixes [was

2000-09-07 Thread Richard Henderson

On Thu, Sep 07, 2000 at 09:51:26PM +0400, [EMAIL PROTECTED] wrote:
> dummy_lock trick is equivalent to "memory" clobber.

No it's not.  We know how big the dummy_lock structure is, and
so might "know" that it doesn't overlap with something else.


r~



Re: spin_lock forgets to clobber memory and other smp fixes [was

2000-09-07 Thread Andrea Arcangeli

On Thu, 7 Sep 2000, Richard Henderson wrote:

> No it's not.  We know how big the dummy_lock structure is, and
> so might "know" that it doesn't overlap with something else.

I guess Alexey's point is that the current compiler doesn't notice that.

Andrea




Re: spin_lock forgets to clobber memory and other smp fixes [was

2000-09-07 Thread Richard Henderson

On Fri, Sep 08, 2000 at 12:34:24AM +0200, Andrea Arcangeli wrote:
> > No it's not.  We know how big the dummy_lock structure is, and
> > so might "know" that it doesn't overlap with something else.
> 
> I guess Alexey's point is that the current compiler doesn't notice that.

Perhaps.  But that's not to say no future compiler won't.


r~



Re: spin_lock forgets to clobber memory and other smp fixes [was

2000-09-07 Thread Andrea Arcangeli

On Thu, 7 Sep 2000, Richard Henderson wrote:

> Perhaps.  But that's not to say no future compiler won't.

I sure agree. That was my original concern w/o looking/knowing how smart
the current compiler in fact was.

Andrea




Re: spin_lock forgets to clobber memory and other smp fixes [wasRe: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Linus Torvalds



On Thu, 7 Sep 2000, Jamie Lokier wrote:

> asm *__volatile__* seems to make no difference.  I've tried a few things.
> 
> Andrea Arcangeli wrote:
> > Maybe we can rely on the __volatile__ statement of the asm that will
> > enforce that if we write:
> > 
> > 	*p = 0;
> > 	__asm__ __volatile__("" : :);
> > 	*p = 1;
> > 
> > in the assembler we'll then find both a write of 0 and then a write of 1
> > to memory.
> 
> That does 2 writes with gcc-2.96 and also egcs-2.91.66/19990314
> (Red Hat's kgcc), with or without -fstrict-aliasing.
> 
> It also does 2 writes without __volatile__.

Your test is broken.

Read the gcc documentation. An inline asm with no outputs is implicitly
considered volatile. So _both_ your tests had volatile there.

Now, that may not matter that much for your test-case: gcc gets careful
around inline asm anyway, even without the volatile.


Change it to something like

__asm__("":"=r" (x):"0" (x));

and the "volatile" should matter.

Not for memory references, perhaps. But for the movement issues.

Linus




Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Jamie Lokier

Andrea Arcangeli wrote:
> > > 	int a = *p;
> > > 	__asm__ __volatile__("" : :);
> > > 	a = *p;
> > > 
> > > (to do two explicit reads)
> > 
> > Sorry, that does just one read, kgcc (old stable gcc) and also with
> > gcc-2.96.  Type aliasing on/off makes no difference to the number of reads.
> 
> I wrote the above not as a complete testcase, but just to show the
> case I was talking about. You made `int a' a local variable, and the
> thing you noticed is an optimization that the compiler is allowed to do
> regardless of the "memory" clobber too (`int a' has to be at least extern,
> otherwise the compiler understands the first read can go away).

Interestingly enough, the local variable case is one where "memory" does
make a difference.  Without "memory":

	movl	p, %eax
	movl	(%eax), %eax
#APP
#NO_APP

With "memory":

#APP
#NO_APP
	movl	p, %eax
	movl	(%eax), %eax

> Try to add "memory" as clobber to the above testcase and nothing will
> change. (that's what I meant in my previous email saying that even w/o
> "memory" things got compiled right at least in my simple testcases)

As you can see from above, there are cases where

	local_var = shared->mumble;
	// ...
	spin_lock (spinlock);
	local_var = shared->mumble;

requires a "memory" clobber, otherwise the second read, which is in a
critical region, won't be emitted by the compiler.

-- Jamie

ps. There is a _clobber_ for memory, but no way to say "this asm _reads_
arbitrary memory".  __volatile__ may be filling that role though.



Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Jamie Lokier

Linus Torvalds wrote:
 "volatile" should be equivalent to clobbering memory, although the gcc
 manual pages are certainly not very verbose on the issue.

It isn't.  Try the following with/without the memory clobber:

int *p;
int func()
{
  int x;
  x = *p;
  __asm__ __volatile__ ("" : : : "memory");
  x = *p;
  return x;
}

With or without, the program reads *p only once.  However, with the
clobber it reads _after_ the asm; without, it reads _before_ the asm.

It's ok for the compiler to do that (given we don't know what "volatile"
means anyway :-).  But it does have implications for spin_lock:
spin_lock must say that it clobbers memory.

spin_unlock should also say it clobbers memory but I have no test case
to demonstrate the compiler moving reads down past an asm.

-- Jamie



Re: spin_lock forgets to clobber memory and other smp fixes [wasRe: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Linus Torvalds



On Thu, 7 Sep 2000, Jamie Lokier wrote:
 
> It's ok for the compiler to do that (given we don't know what "volatile"
> means anyway :-).  But it does have implications for spin_lock:
> spin_lock must say that it clobbers memory.

Yup. We should just fix that.

Linus




Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Jamie Lokier

Linus Torvalds wrote:
> Change it to something like
> 
> 	__asm__("":"=r" (x):"0" (x));
> 
> and the "volatile" should matter.

Yes it does.  Without "volatile", the asm disappears :-)

> Not for memory references, perhaps. But for the movement issues.

The compiler isn't moving memory references around "asm volatile", but
it is doing CSE around them to _eliminate_ memory references.

Thus spin_lock needs the memory clobber, to prevent CSE of non-volatile
memory references between the critical region and outside the critical
region.

Maybe spin_unlock doesn't need one because CSE doesn't work the other
way.  (I'd put that clobber in anyway, to be sure).

-- Jamie





Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Jamie Lokier

Linus Torvalds wrote:
> Nope. "memory" fills that role too. Remember: "memory" doesn't actually
> say "this clobbers all memory". That would be silly: an asm that just
> wipes all memory would not be a very useful asm (or rather, it would have
> just _one_ use: "execve()"). So "memory" really says that the asm clobbers
> _some_ memory.
> 
> Which in turn means that the code scheduler has to synchronize all memory
> accesses around it - as if the asm was reading all memory. Because if the
> scheduler would move a store to after the asm, it would give the wrong
> results if the asm happened to clobber _that_ memory. And the scheduler
> obviously cannot just drop the store ("Oh, the asm will clobber this
> anyway"), because it doesn't know which memory regions get clobbered.

There are other reasons why stores can be eliminated, and "memory" does
not prevent those.

Example: you have two stores to the same address.  An "asm" with
"memory" is in between the stores.

The first store can still be dropped, even though we don't know _which_
memory the "memory" clobbers.

Now, "memory" may well mean "this asm is considered to read some memory
and clobber some memory".  But that's not what the manual says, and it
would be inconsistent with other kinds of clobber.

> Now, the fact that the "memory" clobber also seems to clobber local
> variables is a bug, I think.

If you are thinking of my examples, "memory" doesn't clobber local
variables.  It does affect the scheduling of reads from general memory
into a local variable.

-- Jamie



Re: spin_lock forgets to clobber memory and other smp fixes [wasRe: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Andrea Arcangeli

On Thu, 7 Sep 2000, Jamie Lokier wrote:

> int *p;
> int func()
> {
> 	int x;
> 	x = *p;
> 	__asm__ __volatile__ ("" : : : "memory");
> 	x = *p;
> 	return x;
> }


Definitely no difference here (-fstrict-aliasing doesn't change anything
either).

andrea@inspiron:~ > gcc -v
Reading specs from /usr/lib/gcc-lib/i486-suse-linux/2.95.2/specs
gcc version 2.95.2 19991024 (release)

Andrea




Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Jamie Lokier

Andrea Arcangeli wrote:
> Said that, if your compiler puts the read before the spin_lock without the
> memory clobber, it is allowed to do that, and in such case you would prove
> it was a real world bug (not just a "documentation" one).

Yes, it does.

> Or maybe your testcase was a bit different from mine, in such case please
> send it to me (I'm curious indeed :).

int *p;
int func()
{
  int x = *p;
  __asm__ __volatile ("zzz" : :);
  x = *p;
  return x;
}

In output:

.ident  "GCC: (GNU) 2.96 2724 (experimental)"

From the Red Hat 7 beta.

-- Jamie



Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Jamie Lokier

Andrea Arcangeli wrote:
> > int *p;
> > int func()
> > {
> > 	int x;
> > 	x = *p;
> > 	__asm__ __volatile__ ("" : : : "memory");
> > 	x = *p;
> > 	return x;
> > }
> 
> Definitely no difference here (-fstrict-aliasing doesn't change anything
> either).
> 
> andrea@inspiron:~ > gcc -v
> Reading specs from /usr/lib/gcc-lib/i486-suse-linux/2.95.2/specs
> gcc version 2.95.2 19991024 (release)

I tried it with two compilers, one older than yours and one newer:

.ident  "GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)"
.ident  "GCC: (GNU) 2.96 2724 (experimental)"

The command lines were:

   gcc -S -O2 test4.c
   kgcc -S -O2 test4.c

In both cases, the memory read occurs before "zzz".

-- Jamie



Re: spin_lock forgets to clobber memory and other smp fixes [wasRe: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Andrea Arcangeli

On Thu, 7 Sep 2000, Jamie Lokier wrote:

> Yes, it does.

Nice.

> .ident  "GCC: (GNU) 2.96 2724 (experimental)"
> 
> From the Red Hat 7 beta.

Ok.

Andrea




Re: spin_lock forgets to clobber memory and other smp fixes [wasRe: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Andrea Arcangeli

On Thu, 7 Sep 2000, Jamie Lokier wrote:

> I tried it with two compilers, one older than yours and one newer:

So maybe I've just been unlucky/lucky (depends on the point of view :), or
maybe we patched something I'm not aware of to make the kernel compile
right.

> .ident  "GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)"
> .ident  "GCC: (GNU) 2.96 2724 (experimental)"
> 
> The command lines were:
> 
>    gcc -S -O2 test4.c
>    kgcc -S -O2 test4.c
> 
> In both cases, the memory read occurs before "zzz".

andrea@inspiron:~ > cat p.c
int *p;
int func()
{
	int x;
	x = *p;
	__asm__ __volatile__ ("" : : );
	x = *p;
	return x;
}
andrea@inspiron:~ > gcc -O2 -S p.c
andrea@inspiron:~ > cat p.s
	.file	"p.c"
	.version	"01.01"
gcc2_compiled.:
.text
	.align 16
.globl func
	.type	func,@function
func:
	pushl %ebp
	movl %esp,%ebp
#APP
#NO_APP
^^^
	movl p,%eax
	movl %ebp,%esp
	popl %ebp
	movl (%eax),%eax

	ret
.Lfe1:
	.size	func,.Lfe1-func
	.comm	p,4,4
	.ident	"GCC: (GNU) 2.95.2 19991024 (release)"

Andrea




Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Jamie Lokier

asm *__volatile__* seems to make no difference.  I've tried a few things.

Andrea Arcangeli wrote:
> Maybe we can rely on the __volatile__ statement of the asm that will
> enforce that if we write:
>
>	*p = 0;
>	__asm__ __volatile__("" : :);
>	*p = 1;
>
> in the assembler we'll then find both a write of 0 and then a write of 1
> to memory.

That does 2 writes with gcc-2.96 and also egcs-2.91.66/19990314
(Red Hat's kgcc), with or without -fstrict-aliasing.

It also does 2 writes without __volatile__.

>	int a = *p;
>	__asm__ __volatile__("" : :);
>	a = *p;
>
> (to do two explicit reads)

Sorry, that does just one read, kgcc (old stable gcc) and also with
gcc-2.96.  Type aliasing on/off makes no difference to the number of reads.

Again, __volatile__ makes no difference.

I cannot tell from the GCC manual whether either of these behaviours is
correct or not.

-- Jamie



Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Andrea Arcangeli

On Thu, 7 Sep 2000, Jamie Lokier wrote:

> asm *__volatile__* seems to make no difference.  I've tried a few things.

It makes a difference, see below.


> Andrea Arcangeli wrote:
> > Maybe we can rely on the __volatile__ statement of the asm that will
> > enforce that if we write:
> >
> >	*p = 0;
> >	__asm__ __volatile__("" : :);
> >	*p = 1;
> >
> > in the assembler we'll then find both a write of 0 and then a write of 1
> > to memory.
>
> That does 2 writes with gcc-2.96 and also egcs-2.91.66/19990314
> (Red Hat's kgcc), with or without -fstrict-aliasing.
>
> It also does 2 writes without __volatile__.
>
> >	int a = *p;
> >	__asm__ __volatile__("" : :);
> >	a = *p;
> >
> > (to do two explicit reads)
>
> Sorry, that does just one read, kgcc (old stable gcc) and also with
> gcc-2.96.  Type aliasing on/off makes no difference to the number of reads.

I wrote the above not as a complete testcase, but just to illustrate the
case I was talking about. You made `int a' a local variable, and the
thing you noticed is an optimization that the compiler is allowed to do
regardless of the "memory" clobber too (`int a' has to be at least extern,
otherwise the compiler understands the first read can go away). Try this:

int * p;
int a;

extern f(int);

main()
{
a = *p;

__asm__ __volatile__("zzz" : : );

a = *p;

f(a);
}

Try adding "memory" as a clobber to the above testcase and nothing will
change (that's what I meant in my previous email when I said that even w/o
"memory" things got compiled right, at least in my simple testcases).

> Again, __volatile__ makes no difference.

Note that __volatile__ really makes a difference if, for example, you
specify as output an operand that isn't used anymore.

Try this:

main()
{
int a;
__asm__("zzz" : "=r" (a));
}

and then this:

main()
{
int a;
__asm__ __volatile__("zzz" : "=r" (a));
}


--- p.s.nonvolatile	Thu Sep  7 18:05:30 2000
+++ p.s	Thu Sep  7 18:05:53 2000
@@ -6,10 +6,13 @@
 .globl main
 	.type	main,@function
 main:
 	pushl %ebp
 	movl %esp,%ebp
+#APP
+	zzz
+#NO_APP
 	movl %ebp,%esp
 	popl %ebp
 	ret
 .Lfe1:
 	.size	main,.Lfe1-main
> I cannot tell from the GCC manual whether either of these behaviours is
> correct or not.

The behaviour of what you described is definitely correct/safe.

My only wondering was about "memory" vs non-"memory" as clobber, because gcc
was doing things right just with the __asm__ __volatile__ thing w/o
"memory" as clobber. However, I had confirmation that my worries were right and
that "memory" is needed for all the spinlocks.

BTW, we had a bug in the alpha port last month in the
linux/include/asm-alpha/fpu.h:wrfpcr() function, read it for a real world
example of where __volatile__ must be used.

Andrea




Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]

2000-09-07 Thread Andrea Arcangeli

On Mon, 4 Sep 2000, Andrea Arcangeli wrote:

> barrier()). I also noticed __sti()/__save_flags() doesn't need to clobber
> "memory".

I'm not sure anymore that __sti() and spin_unlock() don't need to clobber
memory (it looks necessary to make sure the compiler doesn't delay writing
data to memory until after the critical section).

And in practice I can't reproduce any corruption with any subtle testcase
by removing "memory" from the clobber list of all the
spinlocks/__sti/__cli, so it may be the other way around. Maybe we can
rely on the __volatile__ statement of the asm that will enforce that if we
write:

*p = 0;
__asm__ __volatile__("" : :);
*p = 1;

in the assembler we'll then find both a write of 0 and then a write of 1
to memory. I'd say with the current compiler we can rely on it (in fact the
spinlocks don't seem to get miscompiled at the moment even though they
aren't clobbering "memory").

Same with:

int a = *p;
__asm__ __volatile__("" : :);
a = *p;

(to do two explicit reads)  

If "memory" isn't necessary in spin_lock, then all the "memory" clobbers
around include/*/* (also in mb() and __cli() and even barrier()) are not
necessary. But then why do we need a "memory" clobber in the first place if
"memory" isn't necessary? :)) Also the gcc documentation seems to say we
need "memory" in all such operations (also in __sti() and
spin_unlock(), unlike I said in my earlier email).

I'd like it if some compiler guy could comment (I've got no reply yet). I
tried to grep gcc but my gcc knowledge is too low to reverse engineer the
implemented semantics of the "memory" clobber quickly (though I would
appreciate a pointer into the gcc sources to start to understand some gcc
code :).

Andrea
