Re: spin_lock forgets to clobber memory and other smp fixes [was
Hello!

> > I guess Alexey point is that the current compiler doesn't notice that.

Rather, I proposed an explanation of why the missing barrier does not
have any effect. 8)8)

Alexey
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
On Fri, 8 Sep 2000, David Woodhouse wrote:

>[EMAIL PROTECTED] said:
>> I guess Alexey point is that the current compiler doesn't notice
>> that.
>
>I don't understand why we're bringing empirical evidence into this
>discussion. Didn't we get into the horrible mess we've already got w.r.t.

I'm the first who noticed and complained about the current behaviour; I
considered it a bug even before I had empirical evidence (which I still
don't have with my current compiler). I just fixed it partly and I'm
completing the fix now.

Andrea
Re: spin_lock forgets to clobber memory and other smp fixes [was
[EMAIL PROTECTED] wrote:
> > Well, now GCC does CSE across "asm" and will eliminate memory loads,
> > even though it may not move them! I suspect it always did CSE across
> > "asm" and we just never got hit by the bug.
>
> dummy_lock trick is equivalent to "memory" clobber.

For GCC 2.7.2, yes. For current compilers with -fstrict-aliasing, no:
the dummy_lock type doesn't alias with some other memory accesses.
Longer term, the compiler may well take the array size into account.

Of course we still need the dummy_lock trick anyway.

-- Jamie
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
[EMAIL PROTECTED] said:
> I tried it with two compilers, one older than yours and one newer:
> In both cases, the memory read occurs before "zzz".

[EMAIL PROTECTED] said:
> I guess Alexey point is that the current compiler doesn't notice
> that.

I don't understand why we're bringing empirical evidence into this
discussion. Didn't we get into the horrible mess we've already got
w.r.t. compilers by relying on the existing behaviour never changing?

The GCC documentation exists. If it is unclear, file a bug, and/or
obtain clarification from the developers or the C specifications. Once
that is done, code the kernel to that specification, without heed to the
current _implementation_ of GCC.

Separately, you may wish to test that your particular version of GCC, on
your architecture, with this phase of the moon, conforms to that
specification. If it doesn't, file another bug.

Do not use the ambiguity of the specification as an excuse for coding to
today's GCC implementation. We should have learned that lesson by now.

Please - come to a clear conclusion about the behaviour _permitted_ by
the GCC documentation, and _then_ attempt to test it.

-- dwmw2
Re: spin_lock forgets to clobber memory and other smp fixes [was
On Thu, 7 Sep 2000, Richard Henderson wrote:

>Perhaps. But that's not to say no future compiler won't.

I sure agree. That was my original concern, without looking at or
knowing how smart the current compiler in fact was.

Andrea
Re: spin_lock forgets to clobber memory and other smp fixes [was
On Fri, Sep 08, 2000 at 12:34:24AM +0200, Andrea Arcangeli wrote:
> >No it's not. We know how big the dummy_lock structure is, and
> >so might "know" that it doesn't overlap with something else.
>
> I guess Alexey point is that the current compiler doesn't notice that.

Perhaps. But that's not to say no future compiler won't.

r~
Re: spin_lock forgets to clobber memory and other smp fixes [was
On Thu, 7 Sep 2000, Richard Henderson wrote:

>No it's not. We know how big the dummy_lock structure is, and
>so might "know" that it doesn't overlap with something else.

I guess Alexey's point is that the current compiler doesn't notice that.

Andrea
Re: spin_lock forgets to clobber memory and other smp fixes [was
On Thu, Sep 07, 2000 at 09:51:26PM +0400, [EMAIL PROTECTED] wrote:
> dummy_lock trick is equivalent to "memory" clobber.

No it's not. We know how big the dummy_lock structure is, and so might
"know" that it doesn't overlap with something else.

r~
Re: spin_lock forgets to clobber memory and other smp fixes [was
Hello!

> Well, now GCC does CSE across "asm" and will eliminate memory loads,
> even though it may not move them! I suspect it always did CSE across
> "asm" and we just never got hit by the bug.

dummy_lock trick is equivalent to "memory" clobber. So there is no real
bug.

Alexey
Re: spin_lock forgets to clobber memory and other smp fixes [was
On Thu, 7 Sep 2000, Jamie Lokier wrote:

>Common Subexpression Elimination.
>
>If the compiler sees an expression equivalent to one it evaluated
>earlier, there is no need to evaluate it a second time.
>
>So "a = x+x; b = x+x" will evaluate "x+x" just once and store it twice.

I didn't know the name of that particular optimization, thanks.

>In more complicated cases, where the result of the first read is
>actually used before the "asm", the "memory" clobber causes the second
>read to occur as well as the first.

Indeed.

Andrea
Re: spin_lock forgets to clobber memory and other smp fixes [was
> "andrea" == Andrea Arcangeli <[EMAIL PROTECTED]> writes:

andrea> On Thu, 7 Sep 2000, Jamie Lokier wrote:
>> Well, now GCC does CSE across "asm" and will eliminate memory loads,

andrea> What is "CSE"?

Common Subexpression Elimination. You can find a lot of information in
"the Dragon Book".

Later,
Juan.

--
In theory, practice and theory are the same, but in practice they are
different -- Larry McVoy
Re: spin_lock forgets to clobber memory and other smp fixes [was
Andrea Arcangeli wrote:
> >Well, now GCC does CSE across "asm" and will eliminate memory loads,
>
> What is "CSE"?

Common Subexpression Elimination.

If the compiler sees an expression equivalent to one it evaluated
earlier, there is no need to evaluate it a second time. So
"a = x+x; b = x+x" will evaluate "x+x" just once and store it twice.

This applies to memory reads too. If you read from memory once, and read
it again later, the later read can be eliminated: you have the right
value in a register already.

Memory CSE depends on proving that no intermediate operation clobbers
the data at that address. That is why "memory" clobbers are important.
They indicate that the data at that address _may_ have been clobbered,
so it must be read again after the clobbering operation.

Sometimes, after you've forced the second read, you can eliminate the
first one :-) That's why "memory" appears to just move the read in our
test case. It's really forcing a read after the "asm", which in turn
allows the read before the "asm" to be eliminated.

In more complicated cases, where the result of the first read is
actually used before the "asm", the "memory" clobber causes the second
read to occur as well as the first.

-- Jamie
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
On Thu, 7 Sep 2000, Jamie Lokier wrote:

>I tried it with two compilers, one older than yours and one newer:

So maybe I've just been unlucky/lucky (depends on the point of view :)
or maybe we patched something I'm not aware of to make the kernel
compile right.

>.ident "GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)"
>.ident "GCC: (GNU) 2.96 2724 (experimental)"
>
>The command lines were:
>
>	gcc -S -O2 test4.c
>	kgcc -S -O2 test4.c
>
>In both cases, the memory read occurs before "zzz".

andrea@inspiron:~ > cat p.c
int *p;

int func()
{
	int x;

	x = *p;
	__asm__ __volatile__ ("" : : );
	x = *p;
	return x;
}
andrea@inspiron:~ > gcc -O2 -S p.c
andrea@inspiron:~ > cat p.s
	.file	"p.c"
	.version	"01.01"
gcc2_compiled.:
.text
	.align 16
.globl func
	.type	func,@function
func:
	pushl %ebp
	movl %esp,%ebp
#APP
#NO_APP
	^^^
	movl p,%eax
	movl %ebp,%esp
	popl %ebp
	movl (%eax),%eax
	ret
.Lfe1:
	.size	func,.Lfe1-func
	.comm	p,4,4
	.ident	"GCC: (GNU) 2.95.2 19991024 (release)"

Andrea
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
On Thu, 7 Sep 2000, Jamie Lokier wrote:

>Yes, it does.

Nice.

>.ident "GCC: (GNU) 2.96 2724 (experimental)"
>
>>From the Red Hat 7 beta.

Ok.

Andrea
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
Andrea Arcangeli wrote:
> >int *p;
> >int func()
> >{
> >	int x;
> >	x = *p;
> >	__asm__ __volatile__ ("" : : : "memory");
> >	x = *p;
> >	return x;
> >}
>
> Defintely none difference here (-fstrict-aliasing doesn't change
> anything either).
>
> andrea@inspiron:~ > gcc -v
> Reading specs from /usr/lib/gcc-lib/i486-suse-linux/2.95.2/specs
> gcc version 2.95.2 19991024 (release)

I tried it with two compilers, one older than yours and one newer:

	.ident "GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)"
	.ident "GCC: (GNU) 2.96 2724 (experimental)"

The command lines were:

	gcc -S -O2 test4.c
	kgcc -S -O2 test4.c

In both cases, the memory read occurs before "zzz".

-- Jamie
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
Andrea Arcangeli wrote:
> Said that if your compiler puts the read before the spin_lock without
> the memory clobber, it is allowed to do that, and in such case you
> would proof it was a real world bug (not just a "documentation" one).

Yes, it does.

> Or maybe your testcase was a bit different then mine, in such case
> please send it to me (I'm curious indeed :).

int *p;
int func()
{
	int x = *p;
	__asm__ __volatile ("zzz" : :);
	x = *p;
	return x;
}

In the output:

	.ident "GCC: (GNU) 2.96 2724 (experimental)"

>From the Red Hat 7 beta.

-- Jamie
Re: spin_lock forgets to clobber memory and other smp fixes [was
On Thu, 7 Sep 2000, Jamie Lokier wrote:

>Well, now GCC does CSE across "asm" and will eliminate memory loads,

What is "CSE"?

Andrea
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
On Thu, 7 Sep 2000, Jamie Lokier wrote:

>int *p;
>int func()
>{
>	int x;
>	x = *p;
>	__asm__ __volatile__ ("" : : : "memory");
>	x = *p;
>	return x;
>}

Definitely no difference here (-fstrict-aliasing doesn't change anything
either).

andrea@inspiron:~ > gcc -v
Reading specs from /usr/lib/gcc-lib/i486-suse-linux/2.95.2/specs
gcc version 2.95.2 19991024 (release)

Andrea
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
Linus Torvalds wrote:
> Nope. "memory" fills that role too. Remember: "memory" doesn't actually
> say "this clobbers all memory". That would be silly: an asm that just
> wipes all memory would not be a very useful asm (or rather, it would
> have just _one_ use: "execve()"). So "memory" really says that the asm
> clobbers _some_ memory.
>
> Which in turn means that the code scheduler has to synchronize all
> memory accesses around it - as if the asm was reading all memory.
> Because if the scheduler would move a store to after the asm, it would
> give the wrong results if the asm happened to clobber _that_ memory.
> And the scheduler obviously cannot just drop the store ("Oh, the asm
> will clobber this anyway"), because it doesn't know which memory
> regions get clobbered.

There are other reasons why stores can be eliminated, and "memory" does
not prevent those.

Example: you have two stores to the same address. An "asm" with "memory"
is in between the stores. The first store can still be dropped, even
though we don't know _which_ memory the "memory" clobbers.

Now, "memory" may well mean "this asm is considered to read some memory
and clobber some memory". But that's not what the manual says, and it
would be inconsistent with other kinds of clobber.

> Now, the fact that the "memory" clobber also seems to clobber local
> variables is a bug, I think.

If you are thinking of my examples, "memory" doesn't clobber local
variables. It does affect the scheduling of reads from general memory
into a local variable.

-- Jamie
Re: spin_lock forgets to clobber memory and other smp fixes [was
[EMAIL PROTECTED] wrote:
> Just hint. I remember the time when "memory" clobber option
> was _absent_ in gcc. And we managed to compile kernel with such gcc. 8)
> To all that I understand, "asm" (like function calls) implied barrier
> that time and constraints and clobber option were used only for
> register allocation and reloading.

Well, now GCC does CSE across "asm" and will eliminate memory loads,
even though it may not move them! I suspect it always did CSE across
"asm" and we just never got hit by the bug.

-- Jamie
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
Linus Torvalds wrote:
> Change it to something like
>
> 	__asm__("":"=r" (x):"0" (x));
>
> and the "volatile" should matter.

Yes it does. Without "volatile", the asm disappears :-)

> Not for memory references, perhaps. But for the movement issues.

The compiler isn't moving memory references around "asm volatile", but
it is doing CSE around them to _eliminate_ memory references.

Thus spin_lock needs the memory clobber, to prevent CSE of non-volatile
memory references between the critical region and outside the critical
region. Maybe spin_unlock doesn't need one because CSE doesn't work the
other way. (I'd put that clobber in anyway, to be sure.)

-- Jamie
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
On Thu, 7 Sep 2000, Jamie Lokier wrote:
>
> It's ok for the compiler to do that (given we don't know what
> "volatile" means anyway :-). But it does have implications for
> spin_lock: spin_lock must say that it clobbers memory.

Yup. We should just fix that.

Linus
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
On Thu, 7 Sep 2000, Jamie Lokier wrote:
>
> ps. There is a _clobber_ for memory, but no way to say "this asm
> _reads_ arbitrary memory". __volatile__ may be filling that role
> though.

Nope. "memory" fills that role too. Remember: "memory" doesn't actually
say "this clobbers all memory". That would be silly: an asm that just
wipes all memory would not be a very useful asm (or rather, it would
have just _one_ use: "execve()"). So "memory" really says that the asm
clobbers _some_ memory.

Which in turn means that the code scheduler has to synchronize all
memory accesses around it - as if the asm was reading all memory.
Because if the scheduler would move a store to after the asm, it would
give the wrong results if the asm happened to clobber _that_ memory. And
the scheduler obviously cannot just drop the store ("Oh, the asm will
clobber this anyway"), because it doesn't know which memory regions get
clobbered.

Now, the fact that the "memory" clobber also seems to clobber local
variables is a bug, I think. Or at least a misfeature. It should not be
considered to clobber reloads, as those are really in "registers" - at
least as far as the asm is concerned (the compiler could have chosen to
just allocate a hard register to that local variable or argument).

Linus
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
Linus Torvalds wrote:
> "volatile" should be equivalent to clobbering memory, although the gcc
> manual pages are certainly not very verbose on the issue.

It isn't. Try the following with/without the memory clobber:

int *p;

int func()
{
	int x;

	x = *p;
	__asm__ __volatile__ ("" : : : "memory");
	x = *p;
	return x;
}

With or without, the program reads *p only once. However, with the
clobber it reads _after_ the asm; without, it reads _before_ the asm.

It's ok for the compiler to do that (given we don't know what "volatile"
means anyway :-). But it does have implications for spin_lock:
spin_lock must say that it clobbers memory.

spin_unlock should also say it clobbers memory, but I have no test case
to demonstrate the compiler moving reads down past an asm.

-- Jamie
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
Andrea Arcangeli wrote:
> >>	int a = *p;
> >>	__asm__ __volatile__("" : :);
> >>	a = *p;
> >>
> >> (to do two explicit reads)
> >
> >Sorry, that does just one read, kgcc (old stable gcc) and also with
> >gcc-2.96. Type aliasing on/off makes no difference to the number of
> >reads.
>
> I wrote the above not as a complete testcase, but just to show the
> case I was talking about. You made `int a' a local variable, and the
> thing you noticed is an optimization that the compiler is allowed to
> do regardless of the "memory" clobber too (`int a' has to be at least
> extern, otherwise the compiler understands the first read can go
> away).

Interestingly enough, the local variable case is one where "memory" does
make a difference.

Without "memory":

	movl	p, %eax
	movl	(%eax), %eax
#APP
#NO_APP

With "memory":

#APP
#NO_APP
	movl	p, %eax
	movl	(%eax), %eax

> Try to add "memory" as clobber to the above testcase and nothing will
> change. (That's what I meant in my previous email saying that even w/o
> "memory" things got compiled right, at least in my simple testcases.)

As you can see from above, there are cases where

	local_var = shared->mumble;
	// ...
	spin_lock ();
	local_var = shared->mumble;

requires a "memory" clobber, otherwise the second read, which is in a
critical region, won't be emitted by the compiler.

-- Jamie

ps. There is a _clobber_ for memory, but no way to say "this asm _reads_
arbitrary memory". __volatile__ may be filling that role though.
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
On Thu, 7 Sep 2000, Jamie Lokier wrote:
> asm *__volatile__* seems to make no difference. I've tried a few
> things.
>
> Andrea Arcangeli wrote:
> > Maybe we can rely on the __volatile__ statement of the asm that will
> > enforce that if we write:
> >
> > 	*p = 0;
> > 	__asm__ __volatile__("" : :);
> > 	*p = 1;
> >
> > in the assembler we'll then find both a write of 0 and then a write
> > of 1 to memory.
>
> That does 2 writes with gcc-2.96 and also egcs-2.91.66/19990314
> (Red Hat's kgcc), with or without -fstrict-aliasing.
>
> It also does 2 writes without __volatile__.

Your test is broken. Read the gcc documentation. An inline asm with no
outputs is implicitly considered volatile. So _both_ your tests had
volatile there.

Now, that may not matter that much for your test-case: gcc gets careful
around inline asm anyway, even without the volatile.

Change it to something like

	__asm__("":"=r" (x):"0" (x));

and the "volatile" should matter. Not for memory references, perhaps.
But for the movement issues.

Linus
Re: spin_lock forgets to clobber memory and other smp fixes [was
Hello!

> tried to grep gcc but my gcc knowledge is too low to reverse engineer
> the implemented semantics of the "memory" clobber fast

Just a hint: I remember the time when the "memory" clobber option was
_absent_ in gcc, and we managed to compile the kernel with such a
gcc. 8) To all that I understand, "asm" (like function calls) implied a
barrier at that time, and constraints and clobber options were used only
for register allocation and reloading.

Alexey

BTW Look also into asm-i386/bitops.h and the dummy cast to some crap
there. Are you impressed? 8)
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
On Thu, 7 Sep 2000, Jamie Lokier wrote:

>asm *__volatile__* seems to make no difference. I've tried a few things.

It makes a difference, see below.

>Andrea Arcangeli wrote:
>> Maybe we can rely on the __volatile__ statement of the asm that will
>> enforce that if we write:
>>
>> 	*p = 0;
>> 	__asm__ __volatile__("" : :);
>> 	*p = 1;
>>
>> in the assembler we'll then find both a write of 0 and then a write
>> of 1 to memory.
>
>That does 2 writes with gcc-2.96 and also egcs-2.91.66/19990314
>(Red Hat's kgcc), with or without -fstrict-aliasing.
>
>It also does 2 writes without __volatile__.
>
>> 	int a = *p;
>> 	__asm__ __volatile__("" : :);
>> 	a = *p;
>>
>> (to do two explicit reads)
>
>Sorry, that does just one read, kgcc (old stable gcc) and also with
>gcc-2.96. Type aliasing on/off makes no difference to the number of
>reads.

I wrote the above not as a complete testcase, but just to show the case
I was talking about. You made `int a' a local variable, and the thing
you noticed is an optimization that the compiler is allowed to do
regardless of the "memory" clobber too (`int a' has to be at least
extern, otherwise the compiler understands the first read can go away).

Try this:

int *p;
int a;

extern f(int);

main()
{
	a = *p;
	__asm__ __volatile__("zzz" : : );
	a = *p;
	f(a);
}

Try to add "memory" as clobber to the above testcase and nothing will
change. (That's what I meant in my previous email saying that even w/o
"memory" things got compiled right, at least in my simple testcases.)

>Again, __volatile__ makes no difference.

Note that __volatile__ really makes a difference if for example you
specify as output an operand that isn't used anymore. Try this:

main()
{
	int a;
	__asm__("zzz" : "=r" (a));
}

and then this:

main()
{
	int a;
	__asm__ __volatile__("zzz" : "=r" (a));
}

--- p.s.nonvolatile	Thu Sep  7 18:05:30 2000
+++ p.s	Thu Sep  7 18:05:53 2000
@@ -6,10 +6,13 @@
 .globl main
 	.type	main,@function
 main:
 	pushl %ebp
 	movl %esp,%ebp
+#APP
+	zzz
+#NO_APP
 	movl %ebp,%esp
 	popl %ebp
 	ret
 .Lfe1:
 	.size	main,.Lfe1-main

>I cannot tell from the GCC manual whether either of these behaviours is
>correct or not.

The behaviour of what you described is definitely correct/safe. My only
wondering was about "memory" vs non-"memory" as clobber, because gcc was
doing things right just with the __asm__ __volatile__ thing w/o "memory"
as clobber. However, I had my worries confirmed: "memory" is needed for
all the spinlocks.

BTW, we had a bug in the alpha port last month in the
linux/include/asm-alpha/fpu.h:wrfpcr() function; read it for a real
world example of where __volatile__ must be used.

Andrea
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
On Thu, 7 Sep 2000, Andrea Arcangeli wrote:
> On Mon, 4 Sep 2000, Andrea Arcangeli wrote:
>
> >barrier()). I also noticed __sti()/__save_flags() doesn't need to
> >clobber "memory".
>
> I'm not sure anymore if __sti and spin_unlock() don't need to clobber
> memory (it looks necessary to make sure the compiler doesn't delay
> writing data to memory until out of the critical section).

"volatile" should be equivalent to clobbering memory, although the gcc
manual pages are certainly not very verbose on the issue. Adding a
memory clobber certainly might be a good idea.

David, who knows what gcc actually does wrt "volatile"?

Linus
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
asm *__volatile__* seems to make no difference. I've tried a few things.

Andrea Arcangeli wrote:
> Maybe we can rely on the __volatile__ statement of the asm that will
> enforce that if we write:
>
> 	*p = 0;
> 	__asm__ __volatile__("" : :);
> 	*p = 1;
>
> in the assembler we'll then find both a write of 0 and then a write of
> 1 to memory.

That does 2 writes with gcc-2.96 and also egcs-2.91.66/19990314
(Red Hat's kgcc), with or without -fstrict-aliasing.

It also does 2 writes without __volatile__.

> 	int a = *p;
> 	__asm__ __volatile__("" : :);
> 	a = *p;
>
> (to do two explicit reads)

Sorry, that does just one read, with kgcc (old stable gcc) and also with
gcc-2.96. Type aliasing on/off makes no difference to the number of
reads.

Again, __volatile__ makes no difference.

I cannot tell from the GCC manual whether either of these behaviours is
correct or not.

-- Jamie
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
On Thu, 7 Sep 2000, Franz Sirl wrote:
> In short terms:
>
> - __volatile__ assures that the code isn't reordered against other
>   __volatile__ and isn't hoisted out of loops, nothing else
> - the "memory" clobber makes sure the asm isn't reordered against other
>   memory accesses

Ok. That's all I wanted to hear. So _definitely_ all spinlocks need
"memory" in the clobber list. I'll do a new patch reinserting "memory" in
__sti and inserting "memory" also in the spin_unlock() case.

The reason for my doubt was that I only got one agreement from Pavel and
no other comments. Furthermore, in practice there was no miscompilation,
so I was wondering if I had misunderstood the semantics of __volatile__
(but then of course I was asking myself what "memory" was good for :))

> Essentially, you _always_ have to tell the compiler if you touch memory
> behind its back, this includes inline assembly to flush the cache or the

I understand this completely. And as said we can't do that with the
spinlocks (at least with the current API of spin_lock and friends), thus
we need "memory" in the clobber list.

> General rule of thumb for inline assembly:
>
>   Give the compiler as much information as possible!!
>
> If you know inline assembly reads/writes memory, tell it to the
> compiler, as detailed as possible!

Indeed :). If we could describe all the memory changes to the inline
assembly then "memory" wouldn't be necessary anymore in the clobber list.

Andrea
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
At 17:03 07.09.00, Andrea Arcangeli wrote:
> On Mon, 4 Sep 2000, Andrea Arcangeli wrote:
>
> > barrier()). I also noticed __sti()/__save_flags() doesn't need to clobber
> > "memory".
>
> I'm not sure anymore that __sti and spin_unlock() don't need to clobber
> memory (it looks necessary to make sure the compiler doesn't delay
> writing data to memory until after the critical section).
>
> And in practice I can't reproduce any corruption with any subtle testcase
> by removing "memory" from the clobber list of all the
> spinlocks/__sti/__cli, so it may be the other way around. Maybe we can
> rely on the __volatile__ statement of the asm that will enforce that if we
> write:
>
> 	*p = 0;
> 	__asm__ __volatile__("" : :);
> 	*p = 1;
>
> in the assembler we'll then find both a write of 0 and then a write of 1
> to memory. I'd say with the current compiler we can rely on it (in fact
> the spinlocks don't seem to get miscompiled at the moment even while they
> aren't clobbering "memory").
>
> Same with:
>
> 	int a = *p;
> 	__asm__ __volatile__("" : :);
> 	a = *p;
>
> (to do two explicit reads)
>
> If "memory" isn't necessary in spin_lock, then all the "memory" clobbers
> around include/*/* (also in mb() and __cli() and even barrier()) are not
> necessary. But then "why" do we need a "memory" clobber in the first
> place if the "memory" isn't necessary? :)) Also the gcc documentation
> seems to say we need "memory" in all such operations (also in __sti() and
> spin_unlock(), unlike I said in my earlier email).
>
> I'd like it if some compiler guy could comment (I got no reply yet). I
> tried to grep gcc but my gcc knowledge is too low to reverse engineer the
> implemented semantics of the "memory" clobber quickly (though I would
> appreciate a pointer into the gcc sources to start to understand some gcc
> code :).
In short terms:

- __volatile__ assures that the code isn't reordered against other
  __volatile__ and isn't hoisted out of loops, nothing else
- the "memory" clobber makes sure the asm isn't reordered against other
  memory accesses

Essentially, you _always_ have to tell the compiler if you touch memory
behind its back; this includes inline assembly to flush the cache or the
write-back queue. Since you usually don't know which part of memory gets
touched, you need the global "memory" clobber. If you know which
addresses you touch, you can get away with something like this, from
asm-ppc/io.h:

	extern inline unsigned in_be32(volatile unsigned *addr)
	{
		unsigned ret;

		__asm__ __volatile__("lwz%U1%X1 %0,%1; eieio"
				     : "=r" (ret) : "m" (*addr));
		return ret;
	}

	extern inline void out_le32(volatile unsigned *addr, int val)
	{
		__asm__ __volatile__("stwbrx %1,0,%2; eieio"
				     : "=m" (*addr) : "r" (val), "r" (addr));
	}

General rule of thumb for inline assembly:

  Give the compiler as much information as possible!!

If you know inline assembly reads/writes memory, tell it to the compiler,
as detailed as possible!

Franz.
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
On Mon, 4 Sep 2000, Andrea Arcangeli wrote:
> barrier()). I also noticed __sti()/__save_flags() doesn't need to clobber
> "memory".

I'm not sure anymore that __sti and spin_unlock() don't need to clobber
memory (it looks necessary to make sure the compiler doesn't delay
writing data to memory until after the critical section).

And in practice I can't reproduce any corruption with any subtle testcase
by removing "memory" from the clobber list of all the
spinlocks/__sti/__cli, so it may be the other way around. Maybe we can
rely on the __volatile__ statement of the asm that will enforce that if
we write:

	*p = 0;
	__asm__ __volatile__("" : :);
	*p = 1;

in the assembler we'll then find both a write of 0 and then a write of 1
to memory. I'd say with the current compiler we can rely on it (in fact
the spinlocks don't seem to get miscompiled at the moment even while they
aren't clobbering "memory").

Same with:

	int a = *p;
	__asm__ __volatile__("" : :);
	a = *p;

(to do two explicit reads)

If "memory" isn't necessary in spin_lock, then all the "memory" clobbers
around include/*/* (also in mb() and __cli() and even barrier()) are not
necessary. But then "why" do we need a "memory" clobber in the first
place if the "memory" isn't necessary? :)) Also the gcc documentation
seems to say we need "memory" in all such operations (also in __sti() and
spin_unlock(), unlike I said in my earlier email).

I'd like it if some compiler guy could comment (I got no reply yet). I
tried to grep gcc but my gcc knowledge is too low to reverse engineer the
implemented semantics of the "memory" clobber quickly (though I would
appreciate a pointer into the gcc sources to start to understand some gcc
code :).

Andrea
Re: spin_lock forgets to clobber memory and other smp fixes [was
Hello!

> tried to grep gcc but my gcc knowledge is too low to reverse engineer
> the implemented semantics of the "memory" clobber

Just a hint. I remember the time when the "memory" clobber option was
_absent_ in gcc. And we managed to compile the kernel with such a gcc. 8)
As far as I understand, "asm" (like function calls) implied a barrier at
that time, and constraints and clobber options were used only for
register allocation and reloading.

Alexey

BTW Look also into asm-i386/bitops.h and the dummy cast to some crap
there. Are you impressed? 8)
Re: spin_lock forgets to clobber memory and other smp fixes [was
On Thu, 7 Sep 2000, Jamie Lokier wrote:
> Well, now GCC does CSE across "asm" and will eliminate memory loads,

What is "CSE"?

Andrea
Re: spin_lock forgets to clobber memory and other smp fixes [was
On Thu, 7 Sep 2000, Jamie Lokier wrote:
> Common Subexpression Elimination. If the compiler sees an expression
> equivalent to one it evaluated earlier, there is no need to evaluate it
> a second time. So "a = x+x; b = x+x" will evaluate "x+x" just once and
> store it twice.

I didn't know the name of that particular optimization, thanks.

> In more complicated cases, where the result of the first read is
> actually used before the "asm", the "memory" clobber causes the second
> read to occur as well as the first.

Indeed.

Andrea
Re: spin_lock forgets to clobber memory and other smp fixes [was
"andrea" == Andrea Arcangeli <[EMAIL PROTECTED]> writes:

andrea> On Thu, 7 Sep 2000, Jamie Lokier wrote:
>> Well, now GCC does CSE across "asm" and will eliminate memory loads,

andrea> What is "CSE"?

Common Subexpression Elimination. You can get a lot of info in "The
Dragon Book".

Later,
Juan.

--
In theory, practice and theory are the same, but in practice they are
different -- Larry McVoy
Re: spin_lock forgets to clobber memory and other smp fixes [was
Hello!

> Well, now GCC does CSE across "asm" and will eliminate memory loads,
> even though it may not move them!  I suspect it always did CSE across
> "asm" and we just never got hit by the bug.

The dummy_lock trick is equivalent to a "memory" clobber. So there is no
real bug.

Alexey
Re: spin_lock forgets to clobber memory and other smp fixes [was
On Thu, Sep 07, 2000 at 09:51:26PM +0400, [EMAIL PROTECTED] wrote:
> dummy_lock trick is equivalent to "memory" clobber.

No it's not. We know how big the dummy_lock structure is, and so might
"know" that it doesn't overlap with something else.

r~
Re: spin_lock forgets to clobber memory and other smp fixes [was
On Thu, 7 Sep 2000, Richard Henderson wrote:
> No it's not. We know how big the dummy_lock structure is, and so might
> "know" that it doesn't overlap with something else.

I guess Alexey's point is that the current compiler doesn't notice that.

Andrea
Re: spin_lock forgets to clobber memory and other smp fixes [was
On Fri, Sep 08, 2000 at 12:34:24AM +0200, Andrea Arcangeli wrote:
> > No it's not. We know how big the dummy_lock structure is, and so might
> > "know" that it doesn't overlap with something else.
>
> I guess Alexey's point is that the current compiler doesn't notice that.

Perhaps. But that's not to say a future compiler won't.

r~
Re: spin_lock forgets to clobber memory and other smp fixes [was
On Thu, 7 Sep 2000, Richard Henderson wrote:
> Perhaps. But that's not to say a future compiler won't.

I sure agree. That was my original concern, in fact, without looking at
or knowing how smart the current compiler was.

Andrea
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
On Thu, 7 Sep 2000, Jamie Lokier wrote:
> asm *__volatile__* seems to make no difference. I've tried a few things.
>
> Andrea Arcangeli wrote:
> > Maybe we can rely on the __volatile__ statement of the asm that will
> > enforce that if we write:
> >
> > 	*p = 0;
> > 	__asm__ __volatile__("" : :);
> > 	*p = 1;
> >
> > in the assembler we'll then find both a write of 0 and then a write of 1
> > to memory.
>
> That does 2 writes with gcc-2.96 and also egcs-2.91.66/19990314 (Red
> Hat's kgcc), with or without -fstrict-aliasing. It also does 2 writes
> without __volatile__.

Your test is broken. Read the gcc documentation: an inline asm with no
outputs is implicitly considered volatile. So _both_ your tests had
volatile there.

Now, that may not matter that much for your test-case: gcc gets careful
around inline asm anyway, even without the volatile. Change it to
something like

	__asm__("":"=r" (x):"0" (x));

and the "volatile" should matter. Not for memory references, perhaps.
But for the movement issues.

		Linus
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
Andrea Arcangeli wrote:
> > > 	int a = *p;
> > > 	__asm__ __volatile__("" : :);
> > > 	a = *p;
> > >
> > > (to do two explicit reads)
> >
> > Sorry, that does just one read, with kgcc (old stable gcc) and also
> > with gcc-2.96. Type aliasing on/off makes no difference to the number
> > of reads.
>
> I wrote the above not as a complete testcase, but just to indicate the
> case I was talking about. You made `int a' a local variable, and the
> thing you noticed is an optimization that the compiler is allowed to do
> regardless of the "memory" clobber too (`int a' has to be at least
> extern, otherwise the compiler understands the first read can go away).

Interestingly enough, the local variable case is one where "memory" does
make a difference.

Without "memory":

	movl p, %eax
	movl (%eax), %eax
#APP
#NO_APP

With "memory":

#APP
#NO_APP
	movl p, %eax
	movl (%eax), %eax

> Try to add "memory" as a clobber to the above testcase and nothing will
> change. (That's what I meant in my previous email saying that even w/o
> "memory" things got compiled right, at least in my simple testcases.)

As you can see from above, there are cases where

	local_var = shared->mumble;
	// ...
	spin_lock (spinlock);
	local_var = shared->mumble;

requires a "memory" clobber, otherwise the second read, which is in a
critical region, won't be emitted by the compiler.

-- Jamie

ps. There is a _clobber_ for memory, but no way to say "this asm _reads_
arbitrary memory". __volatile__ may be filling that role though.
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
Linus Torvalds wrote:
> "volatile" should be equivalent to clobbering memory, although the gcc
> manual pages are certainly not very verbose on the issue.

It isn't. Try the following with/without the memory clobber:

	int *p;

	int func ()
	{
		int x;
		x = *p;
		__asm__ __volatile__ ("" : : : "memory");
		x = *p;
		return x;
	}

With or without, the program reads *p only once. However, with the
clobber it reads _after_ the asm; without, it reads _before_ the asm.

It's ok for the compiler to do that (given we don't know what "volatile"
means anyway :-). But it does have implications for spin_lock: spin_lock
must say that it clobbers memory. spin_unlock should also say it clobbers
memory, but I have no test case to demonstrate the compiler moving reads
down past an asm.

-- Jamie
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
On Thu, 7 Sep 2000, Jamie Lokier wrote:
> It's ok for the compiler to do that (given we don't know what "volatile"
> means anyway :-). But it does have implications for spin_lock:
> spin_lock must say that it clobbers memory.

Yup. We should just fix that.

		Linus
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
Linus Torvalds wrote:
> Change it to something like
>
> 	__asm__("":"=r" (x):"0" (x));
>
> and the "volatile" should matter.

Yes it does. Without "volatile", the asm disappears :-)

> Not for memory references, perhaps. But for the movement issues.

The compiler isn't moving memory references around "asm volatile", but
it is doing CSE around them to _eliminate_ memory references.

Thus spin_lock needs the memory clobber, to prevent CSE of non-volatile
memory references between the critical region and outside the critical
region. Maybe spin_unlock doesn't need one because CSE doesn't work the
other way. (I'd put that clobber in anyway, to be sure.)

-- Jamie
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
Linus Torvalds wrote:
> Nope. "memory" fills that role too.
>
> Remember: "memory" doesn't actually say "this clobbers all memory".
> That would be silly: an asm that just wipes all memory would not be a
> very useful asm (or rather, it would have just _one_ use: "execve()").
>
> So "memory" really says that the asm clobbers _some_ memory. Which in
> turn means that the code scheduler has to synchronize all memory
> accesses around it - as if the asm was reading all memory. Because if
> the scheduler would move a store to after the asm, it would give the
> wrong results if the asm happened to clobber _that_ memory. And the
> scheduler obviously cannot just drop the store ("Oh, the asm will
> clobber this anyway"), because it doesn't know which memory regions
> get clobbered.

There are other reasons why stores can be eliminated, and "memory" does
not prevent those. Example: you have two stores to the same address. An
"asm" with "memory" is in between the stores. The first store can still
be dropped, even though we don't know _which_ memory the "memory"
clobbers.

Now, "memory" may well mean "this asm is considered to read some memory
and clobber some memory". But that's not what the manual says, and it
would be inconsistent with other kinds of clobber.

> Now, the fact that the "memory" clobber also seems to clobber local
> variables is a bug, I think.

If you are thinking of my examples, "memory" doesn't clobber local
variables. It does affect the scheduling of reads from general memory
into a local variable.

-- Jamie
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
On Thu, 7 Sep 2000, Jamie Lokier wrote:
> 	int *p;
>
> 	int func ()
> 	{
> 		int x;
> 		x = *p;
> 		__asm__ __volatile__ ("" : : : "memory");
> 		x = *p;
> 		return x;
> 	}

Definitely no difference here (-fstrict-aliasing doesn't change anything
either).

	andrea@inspiron:~ gcc -v
	Reading specs from /usr/lib/gcc-lib/i486-suse-linux/2.95.2/specs
	gcc version 2.95.2 19991024 (release)

Andrea
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
Andrea Arcangeli wrote:
> Said that, if your compiler puts the read before the spin_lock without
> the memory clobber, it is allowed to do that, and in such case you
> would prove it was a real world bug (not just a "documentation" one).

Yes, it does.

> Or maybe your testcase was a bit different from mine, in such case
> please send it to me (I'm curious indeed :).

	int * p;

	int func ()
	{
		int x = *p;
		__asm__ __volatile ("zzz" : :);
		x = *p;
		return x;
	}

In output:

	.ident	"GCC: (GNU) 2.96 2724 (experimental)"

From the Red Hat 7 beta.

-- Jamie
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
Andrea Arcangeli wrote:
> > 	int *p;
> >
> > 	int func ()
> > 	{
> > 		int x;
> > 		x = *p;
> > 		__asm__ __volatile__ ("" : : : "memory");
> > 		x = *p;
> > 		return x;
> > 	}
>
> Definitely no difference here (-fstrict-aliasing doesn't change anything
> either).
>
> 	andrea@inspiron:~ gcc -v
> 	Reading specs from /usr/lib/gcc-lib/i486-suse-linux/2.95.2/specs
> 	gcc version 2.95.2 19991024 (release)

I tried it with two compilers, one older than yours and one newer:

	.ident	"GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)"
	.ident	"GCC: (GNU) 2.96 2724 (experimental)"

The command lines were:

	gcc -S -O2 test4.c
	kgcc -S -O2 test4.c

In both cases, the memory read occurs before "zzz".

-- Jamie
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
On Thu, 7 Sep 2000, Jamie Lokier wrote:
> Yes, it does.

Nice.

> 	.ident	"GCC: (GNU) 2.96 2724 (experimental)"
>
> From the Red Hat 7 beta.

Ok.

Andrea
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
On Thu, 7 Sep 2000, Jamie Lokier wrote:
> I tried it with two compilers, one older than yours and one newer:

So maybe I've just been unlucky/lucky (depends on the point of view :) or
maybe we patched something I'm not aware of to make the kernel compile
right.

> 	.ident	"GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)"
> 	.ident	"GCC: (GNU) 2.96 2724 (experimental)"
>
> The command lines were:
>
> 	gcc -S -O2 test4.c
> 	kgcc -S -O2 test4.c
>
> In both cases, the memory read occurs before "zzz".

	andrea@inspiron:~ cat p.c
	int * p;

	int func ()
	{
		int x;
		x = *p;
		__asm__ __volatile__ ("" : : );
		x = *p;
		return x;
	}
	andrea@inspiron:~ gcc -O2 -S p.c
	andrea@inspiron:~ cat p.s
		.file	"p.c"
		.version	"01.01"
	gcc2_compiled.:
	.text
		.align 16
	.globl func
		.type	func,@function
	func:
		pushl %ebp
		movl %esp,%ebp
	#APP
	#NO_APP
		^^^
		movl p,%eax
		movl %ebp,%esp
		popl %ebp
		movl (%eax),%eax
		ret
	.Lfe1:
		.size	func,.Lfe1-func
		.comm	p,4,4
		.ident	"GCC: (GNU) 2.95.2 19991024 (release)"

Andrea
Re: spin_lock forgets to clobber memory and other smp fixes [was Re: [patch] waitqueue optimization, 2.4.0-test7]
On Thu, 7 Sep 2000, Jamie Lokier wrote:
> asm *__volatile__* seems to make no difference. I've tried a few things.

It makes a difference, see below.

> Andrea Arcangeli wrote:
> > Maybe we can rely on the __volatile__ statement of the asm that will
> > enforce that if we write:
> >
> > 	*p = 0;
> > 	__asm__ __volatile__("" : :);
> > 	*p = 1;
> >
> > in the assembler we'll then find both a write of 0 and then a write of 1
> > to memory.
>
> That does 2 writes with gcc-2.96 and also egcs-2.91.66/19990314 (Red
> Hat's kgcc), with or without -fstrict-aliasing. It also does 2 writes
> without __volatile__.
>
> > 	int a = *p;
> > 	__asm__ __volatile__("" : :);
> > 	a = *p;
> >
> > (to do two explicit reads)
>
> Sorry, that does just one read, with kgcc (old stable gcc) and also with
> gcc-2.96. Type aliasing on/off makes no difference to the number of
> reads.

I wrote the above not as a complete testcase, but just to indicate the
case I was talking about. You made `int a' a local variable, and the
thing you noticed is an optimization that the compiler is allowed to do
regardless of the "memory" clobber too (`int a' has to be at least
extern, otherwise the compiler understands the first read can go away).
Try this:

	int * p;
	int a;

	extern f(int);

	main ()
	{
		a = *p;
		__asm__ __volatile__("zzz" : : );
		a = *p;
		f(a);
	}

Try to add "memory" as a clobber to the above testcase and nothing will
change. (That's what I meant in my previous email saying that even w/o
"memory" things got compiled right, at least in my simple testcases.)

> Again, __volatile__ makes no difference.

Note that __volatile__ really makes a difference if for example you
specify as output an operand that isn't used anymore.
Try this:

	main ()
	{
		int a;
		__asm__("zzz" : "=r" (a));
	}

and then this:

	main ()
	{
		int a;
		__asm__ __volatile__("zzz" : "=r" (a));
	}

	--- p.s.nonvolatile	Thu Sep  7 18:05:30 2000
	+++ p.s	Thu Sep  7 18:05:53 2000
	@@ -6,10 +6,13 @@
	 .globl main
	 	.type	main,@function
	 main:
	 	pushl %ebp
	 	movl %esp,%ebp
	+#APP
	+	zzz
	+#NO_APP
	 	movl %ebp,%esp
	 	popl %ebp
	 	ret
	 .Lfe1:
	 	.size	main,.Lfe1-main

> I cannot tell from the GCC manual whether either of these behaviours is
> correct or not.

The behaviour of what you described is definitely correct/safe. My only
wondering was about "memory"/non-"memory" as clobber, because gcc was
doing things right just with the __asm__ __volatile__ thing w/o "memory"
as clobber. However I had confirmation that my worries were right and
that "memory" is needed for all the spinlocks.

BTW, we had a bug in the alpha port last month in the
linux/include/asm-alpha/fpu.h:wrfpcr() function; read it for a real
world example of where __volatile__ must be used.

Andrea