Re: GCC proposal for "@" asm constraint
On Fri, Sep 22, 2000 at 11:08:21AM -0700, Linus Torvalds wrote:
> The same is true in the filesystems - many of them want to change block or
> inode allocation bitmap bits, but they have to hold a lock anyway (usually
> the kernel lock in addition to the superblock lock).

The patch addresses the filesystems case by introducing __set_bit,
__test_and_clear_bit and __test_and_set_bit, and in fact it starts the split
of the bitops. In the patch the __*_bit non-atomic API isn't fully
implemented though (__clear_bit and __test_and_change_bit are missing, for
example), and no common code is using the current __*_bit non-atomic variant
of the bitops directly anyway (the fs happens to use it because of the
minix_* and ext2_* macros in the arch-specific part).

Finishing the implementation of the non-atomic bitops is very easy and I
agree it's a good idea (I'll soon upload an incremental patch to implement
the few missing __*_bit calls); then the common code (like the alloc/free
paths you pointed out) can start using the __ variant to allow the compiler
not to trash registers and to skip SMP locks when not necessary.

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
Re: GCC proposal for "@" asm constraint
On Fri, 22 Sep 2000, Andrea Arcangeli wrote:
> If you don't like the name smp_mb__before/after_clear_bit (not that I like
> it too much either) suggestions are welcome. Maybe we could use a single
> smp_mb__across_bitops() instead?

I suspect that we should just split up the bitops.

We've done this once already. We used to have just one set of bitops that
always returned the old value, and it was inefficient: many users don't
actually care about the old value, and you can often do the operation faster
if the return value doesn't matter. Thus the (long ago) split of "set_bit()"
into "set_bit()" and "test_and_set_bit()".

These days we have similar issues. In many cases we don't even want the SMP
safety - we just want to set a bit, and we already do locking. This shows up
in the mm code, for example - we use test_and_change_bit() for the buddy
allocator, and it doesn't have to be atomic because we have to have locking
for other reasons anyway. So on x86 we basically waste cycles on locked
accesses several times for each alloc/free page.

The same is true in the filesystems - many of them want to change block or
inode allocation bitmap bits, but they have to hold a lock anyway (usually
the kernel lock in addition to the superblock lock).

And in some circumstances (page locking, possible buffer cache locking) you
obviously do want the memory barrier stuff, and you're right, making it
associated with the bit operations allows for faster code on x86, where the
locked bit ops are barriers in themselves.

Ugh. There's quite a lot of combinations there. Maybe your approach is the
cleanest one.

Linus
Re: GCC proposal for "@" asm constraint
> On Fri, 22 Sep 2000, Andrea Arcangeli wrote:
> >
> > This patch fixes the spinlock problems in read_lock/write_lock and also
> > some alpha SMP race where clear_bit isn't enforcing a memory barrier
>
> Why would clear_bit() be a memory barrier?

Because historically it sort of was. This came up with the old PPP code for
one.

> Anything that expects clear_bit() to be a memory barrier should be fixed.

No argument about that however.
Re: GCC proposal for "@" asm constraint
On Fri, 22 Sep 2000, Andrea Arcangeli wrote:
>
> This patch fixes the spinlock problems in read_lock/write_lock and also
> some alpha SMP race where clear_bit isn't enforcing a memory barrier

Why would clear_bit() be a memory barrier?

Anything that expects clear_bit() to be a memory barrier should be fixed.
Anything that wants a spinlock should use the spinlocks. Not the bit ops.
They are atomic, but they are not memory barriers.

Linus
Re: GCC proposal for "@" asm constraint
This patch fixes the spinlock problems in read_lock/write_lock and also some
alpha SMP race where clear_bit isn't enforcing a memory barrier, plus some
improvement in the places where we can check that the waitqueue is not full
before entering wake_up. There are some minor alpha compile fixes too.

On the alpha side I also removed the conditional branches in
set_bit/clear_bit (this was running faster on my userspace testcase). Anyway
I made that a compile-time #define/#undef decision so returning to the
previous behaviour is trivial. I removed the mb in the UP case in the
test_and_set_bit-like functions. The patch also avoids using atomic
operations for ext2 and minix, which only need bitops that go beyond
sizeof(long) (previously suggested by Ingo).

set_bit/clear_bit don't clobber "memory" (they are __asm__ __volatile__
though), but they declare the memory written as volatile, so gcc should make
sure not to use a copy of such memory but its real memory repository. Also
the other places previously using the "fool gcc" trick now declare the
memory as volatile instead (some of the gcc-fooling code wasn't declaring
the array as volatile, for example the dummy_lock in rwlock.h). The atomic_t
is now declared volatile in UP too (for irqs and a preemptive UP kernel).

For atomic_set_mask/atomic_clear_mask I'm not casting the pointer to the
memory as a pointer to volatile, because "memory" is clobbered and that
should be enough to ensure gcc doesn't use a copy of the memory.

Sparc64 now implements a more fine-grained wmb as alpha does (and rmb the
same way as wmb). br_read_lock puts an mb() after acquiring its per-CPU
slot. Dave, I think you could replace that mb() (currently it's an rmb()
that looks wrong for sparc64 even with the current rmb implementation) with
membar("#StoreLoad"), but there's no common-code API for such a fine-grained
memory barrier that orders only stores against loads (mb() is the most
fine-grained call at the moment).
ftp://ftp.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.0-test9-pre5/spinlock-1

The patch is against test9-pre5 but it's been tested only based on
test8-pre5, with a night of cerberus load with 100Mbit network transfers in
background on IA32 2-way SMP. The alpha side has now passed 59 minutes of
cerberus overload as well without any apparent problem (load of 23 with over
300 mbytes in swap on a 2G RAM machine, 2-way SMP, with a flood of input and
output TCP at 100Mbit).

The asm generated by the "memory" clobber doesn't seem to be inferior to the
previous asm, btw (I compared two kernel images). There was not one bug in
the previous asm compared to the new one though (on IA32).

One thing I'd still like to change (not addressed in the patch) is the eax
constraint on the pointer to the rwlocks in read_lock/write_lock where the
spinlock is not a builtin constant. I removed the constraint (using "r") and
that generated better asm. I think it's better to increase the size of the
slow path by a few bytes (doing the push of eax to save it on the stack
while we use it to pass the spinlock pointer to the slow-path function) than
to harm the fast path. Taking only the builtin variant should be enough to
fix it.

The only architectures audited are IA32 and alpha; others may need these
changes as well.

Andrea
Re: GCC proposal for "@" asm constraint
On Tue, Sep 19, 2000 at 04:16:07PM -0400, John Wehle wrote:
> Umm ... "miscompilation"? As in the compiler produced the wrong code
> based on the input provided?

That's not a gcc bug (gcc is doing the right thing). It's the kernel that
should use the "memory" clobber in the spinlock implementation. The
sad-looking code generated was in reality the _right_ code.

I was blind not to notice the missing $ (I missed it the first time in the
first testcase I tried and I kept missing it; I was probably also biased by
assuming the current spinlocks were safe with the commonly used compilers,
so I thought I was fixing only a theoretical bug). I'm sorry for that (and
thanks again to Richard and Jamie).

Andrea
Re: GCC proposal for "@" asm constraint
> I see. So Jamie was right and we reproduced a case of miscompilation.

Umm ... "miscompilation"? As in the compiler produced the wrong code
based on the input provided?

  int * p;
  ...
  a = *p;

  movl p,%eax
  movl (%eax),%edx

The assembly code appears to load the address stored at p (keep in mind
that p is a pointer), then use that address to fetch the value which is
placed in a. What do you believe should have been generated by the
compiler?

-- John
-------------------------------------------------------------------------
|  Feith Systems  |  Voice: 1-215-646-8000  |  Email: [EMAIL PROTECTED] |
|  John Wehle     |  Fax: 1-215-540-5495    |                           |
-------------------------------------------------------------------------
Re: GCC proposal for "@" asm constraint
On Tue, Sep 19, 2000 at 07:22:48PM +0200, Jamie Lokier wrote:
> That instruction loads the _value_ of p. I.e. reads the memory from
> location 0x80495a4 into %eax. The source instruction was:
>
>    movl p,%eax
>
> The instructions that you're thinking of, that load fixed addresses,
> look like these:
>
>    mov $0x80495a4,%eax
>    lea 0x80495a4,%eax
>
> or in source form:
>
>    movl $p,%eax
>    leal p,%eax

Thanks for noticing my error.

Andrea
Re: GCC proposal for "@" asm constraint
On Tue, Sep 19, 2000 at 10:23:05AM -0700, Richard Henderson wrote:
> On Tue, Sep 19, 2000 at 04:32:16PM +0200, Andrea Arcangeli wrote:
> > Wrong: it's really loading the _address_.
> [...]
> > 80483f5: a1 a4 95 04 08    mov 0x80495a4,%eax
>
> No, that's an absolute memory load. If we were loading
> the address, there'd be a '$' before that number.

I see. So Jamie was right and we reproduced a case of miscompilation.

Andrea
Re: GCC proposal for "@" asm constraint
On Tue, Sep 19, 2000 at 04:32:16PM +0200, Andrea Arcangeli wrote:
> Wrong: it's really loading the _address_.
[...]
> 80483f5: a1 a4 95 04 08    mov 0x80495a4,%eax

No, that's an absolute memory load. If we were loading
the address, there'd be a '$' before that number.

r~
Re: GCC proposal for "@" asm constraint
On Tue, Sep 19, 2000 at 04:01:26PM +0100, David Howells wrote:
> I can't remember exactly what it was now, but I think it was either
> something to do with spinlocks or bitops. I'll re-investigate tonight and
> see if I can come back with some benchmarks/code-snippets tomorrow.

Yes, you should tell us which inlined function generated different asm (if
you post the two different asm listings or the two different .o files we can
probably find it ourselves). I've seen that the rw_spin_locks are silly in
requesting the address of the spinlock to be in the register eax when the
address of the spinlock isn't a constant (it should instead at least use "r"
and not "a"), and I was going to fix it; however, that hasn't changed
between test7 and test8...

Andrea
Re: GCC proposal for "@" asm constraint
I've been writing a kernel module and I've noticed a measurable performance
drop between the same code compiled against linux-2.4.0-test7 and against
test8. I disassembled the code to try and work out what was going on, and I
saw the following happen:

 * [test8] One of the atomic memory access primitives supplied by the
   kernel was putting immediate data into a register outside of the
   inline-asm instruction group and then using the register inside.

 * [test7] The immediate data was passed directly to the inline-asm
   instruction.

In test8, of course, this means that the compiler has to scratch up a spare
register, which is totally unnecessary, as the immediate data could be
attached directly to the instruction opcode as was done in test7. This had
the effect of making the compiler push the old contents of the register into
a slot on the stack (I think it held a local variable at the time), which
had the further effects of using more stack memory, introducing more
register rearrangement (the code ended up longer), and burning up more CPU
cycles.

I can't remember exactly what it was now, but I think it was either
something to do with spinlocks or bitops. I'll re-investigate tonight and
see if I can come back with some benchmarks/code-snippets tomorrow.

David Howells
Re: GCC proposal for "@" asm constraint
On Mon, Sep 18, 2000 at 09:37:43PM -0400, John Wehle wrote:
> It's perhaps not optimal, however I'm not sure that it's wrong. In

It's not "wrong" in the sense that something breaks, but it's definitely
suboptimal. There's no reason to reload a value that can't change because
it's embedded into the opcodes of the asm and set at static linking time.

> any case if you can supply a small standalone test case (i.e. preprocessed
> source code) I'll take a quick look at things. I take it that you haven't
> tried the current gcc sources?

The first testcase is the current spinlock implementation; the second
testcase adds the "memory" clobber and it generates the _spurious_ reload of
`p'. You should be able to compile them with `gcc -O2 -S p-*.i`.

Andrea

# 1 "p.c"
# 1 "/home/andrea/kernel/devel/2.2.18pre9aa1/include/linux/spinlock.h" 1
# 1 "/usr/include/asm/spinlock.h" 1 3
# 134 "/usr/include/asm/spinlock.h" 3
typedef struct { volatile unsigned int lock; } spinlock_t;
typedef struct { unsigned long a[100]; } __dummy_lock_t;
# 169 "/usr/include/asm/spinlock.h" 3
typedef struct { volatile unsigned int lock; unsigned long previous; } rwlock_t;
# 231 "/usr/include/asm/spinlock.h" 3
# 249 "/usr/include/asm/spinlock.h" 3
# 3 "/home/andrea/kernel/devel/2.2.18pre9aa1/include/linux/spinlock.h" 2
# 1 "p.c" 2

int *p;
spinlock_t lock = (spinlock_t) { 0 };

void dummy(int x, int y) {}

main()
{
	int a, b;

	__asm__ __volatile__(
		"\n1:\t"
		"lock ; btsl $0,%0\n\t"
		"jc 2f\n"
		".section .text.lock,\"ax\"\n"
		"2:\t"
		"testb $1,%0\n\t"
		"jne 2b\n\t"
		"jmp 1b\n"
		".previous"
		: "=m" (*(__dummy_lock_t *)(&lock)));
	a = *p;
	__asm__ __volatile__(
		"lock ; btrl $0,%0"
		: "=m" (*(__dummy_lock_t *)(&lock)));

	__asm__ __volatile__(
		"\n1:\t"
		"lock ; btsl $0,%0\n\t"
		"jc 2f\n"
		".section .text.lock,\"ax\"\n"
		"2:\t"
		"testb $1,%0\n\t"
		"jne 2b\n\t"
		"jmp 1b\n"
		".previous"
		: "=m" (*(__dummy_lock_t *)(&lock)));
	b = *p;
	__asm__ __volatile__(
		"lock ; btrl $0,%0"
		: "=m" (*(__dummy_lock_t *)(&lock)));

	dummy(a, b);
}

# 1 "p.c"
# 1 "/home/andrea/kernel/devel/2.2.18pre9aa1/include/linux/spinlock.h" 1
# 1 "/usr/include/asm/spinlock.h" 1 3
# 134 "/usr/include/asm/spinlock.h" 3
typedef struct { volatile unsigned int lock; } spinlock_t;
typedef struct { unsigned long a[100]; } __dummy_lock_t;
# 169 "/usr/include/asm/spinlock.h" 3
typedef struct { volatile unsigned int lock; unsigned long previous; } rwlock_t;
# 231 "/usr/include/asm/spinlock.h" 3
# 249 "/usr/include/asm/spinlock.h" 3
# 3 "/home/andrea/kernel/devel/2.2.18pre9aa1/include/linux/spinlock.h" 2
# 1 "p.c" 2

int *p;
spinlock_t lock = (spinlock_t) { 0 };

void dummy(int x, int y) {}

main()
{
	int a, b;

	__asm__ __volatile__(
		"\n1:\t"
		"lock ; btsl $0,%0\n\t"
		"jc 2f\n"
		".section .text.lock,\"ax\"\n"
		"2:\t"
		"testb $1,%0\n\t"
		"jne 2b\n\t"
		"jmp 1b\n"
		".previous"
		: "=m" (*(__dummy_lock_t *)(&lock)) : : "memory");
	a = *p;
	__asm__ __volatile__(
		"lock ; btrl $0,%0"
		: "=m" (*(__dummy_lock_t *)(&lock)) : : "memory");

	__asm__ __volatile__(
		"\n1:\t"
		"lock ; btsl $0,%0\n\t"
		"jc 2f\n"
		".section .text.lock,\"ax\"\n"
		"2:\t"
		"testb $1,%0\n\t"
		"jne 2b\n\t"
		"jmp 1b\n"
		".previous"
		: "=m" (*(__dummy_lock_t *)(&lock)) : : "memory");
	b = *p;
	__asm__ __volatile__(
		"lock ; btrl $0,%0"
		: "=m" (*(__dummy_lock_t *)(&lock)) : : "memory");

	dummy(a, b);
}
Re: GCC proposal for "@" asm constraint
Andrea Arcangeli wrote:
> int * p;
> [...]
> spin_lock(&lock);
> a = *p;
> spin_unlock(&lock);
>
> spin_lock(&lock);
> b = *p;
> spin_unlock(&lock);
>
> [With "memory" clobber] the [second] reload of the address of `p'
> isn't necessary and gcc is wrong in generating it.

Wrong, GCC is behaving correctly.

> p is a constant embedded into the .text section and set at link time,

p is a variable. The _address_ of p is constant, but the reload is not
loading the address of p, it's loading the _value_. That value can be
changed by other threads.

In fact, you have demonstrated why the "memory" clobber is necessary for
spinlocks. A perfect test case!

In your first example, without the clobber, the asm code is incorrect. A
parallel thread can change the value of p between the first spin_unlock and
the second spin_lock, and the GCC-generated code does not notice.

> The above reloads are just wasted CPU cycles that we're little worried
> to waste.

Here, the saved cycle is a kernel bug.

-- Jamie
Re: GCC proposal for @ asm constraint
Andrea Arcangeli wrote: int * p; [...] spin_lock(lock); a = *p; spin_unlock(lock); spin_lock(lock); b = *p; spin_unlock(lock); [With "memory" clobber"] the [second] reload of the address of `p' isn't necessary and gcc is wrong in generating it. Wrong, GCC is behaving correctly. p is a constant embedded into the .text section and set at link time, p is a variable. The _address_ of p is constant, but the reload is not loading the address of p, it's loading the _value_. That value can be changed by other threads. In fact, you have demonstrated why the "memory" clobber is necessary for spinlocks. A perfect test case! In your first example, without the clobber, the asm code is incorrect. A parallel thread can change the value of p between the first spin_unlock and the second spin_lock, and the GCC-generated code does not notice. The above reload are just wasted CPU cycles that we're little worried to waste. Here, the saved cycle is a kernel bug. -- Jamie - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: GCC proposal for @ asm constraint
I've been writing a kernel module and I've noticed a measurable performance drop between the same code compiled against linux-2.4.0-test7 and against test8. I disassembled the code to try and work out what was going on and I saw the following happen: * [test8] One of the atomic memory access primitives supplied by the kernel was putting immediate data into a register outside of the inline-asm instruction group and then using the register inside. * [test7] The immediate data was passed directly to the inline-asm instruction. In test8, of course, this means that the compiler has to scratch up a spare register, which is totally unnecessary, as the immediate data could be attached directly to the instruction opcode as was done in test7. This had the effect of making the compiler have to push the old contents of the register into a slot on the stack (I think it held a local variable at the time), which had the further effects of using more stack memory, introducing more register rearrangement (the code ended up longer), and burning up more CPU cycles. I can't remember exactly what it was now, but I think it was either something to do with spinlocks or bitops. I'll re-investigate tonight and see if I can come back with some benchmarks/code-snippets tomorrow. David Howells - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: GCC proposal for @ asm constraint
On Tue, Sep 19, 2000 at 04:01:26PM +0100, David Howells wrote: I can't remember exactly what it was now, but I think it was either something to do with spinlocks or bitops. I'll re-investigate tonight and see if I can come back with some benchmarks/code-snippets tomorrow. Yes you should tell us which is the inlined function that generated different asm (if you post the two differnt asm or the two different .o we can probably find it ourself). I seen the rw_spin_locks are silly requesting the address of the spinlock to be in the register eax when the address of the spinlock isn't a constant (while it should instead at least use "r" and not "a") and I was going to fix it, however that's not changed between test7 and test8... Andrea - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: GCC proposal for @ asm constraint
On Mon, Sep 18, 2000 at 09:37:43PM -0400, John Wehle wrote: It's perhaps not optimal, however I'm not sure that it's wrong. In It's not "wrong" in the sense that something breaks but it's definitely suboptimal. There's no reason to reload a value that can't change because it's embedded into the opcodes of the asm and set a static linking time. any case if you can supply a small standalone test case (i.e. preprocessed source code) I'll take a quick look at things. I take it that you haven't tried the current gcc sources? The first testcase is the current spinlock implementation, the second testcase adds the "memory" clobber and it generates the _spurious_ reload of `p'. You should be able to compile with `gcc -O2 -S p-*.i`. Andrea # 1 "p.c" # 1 "/home/andrea/kernel/devel/2.2.18pre9aa1/include/linux/spinlock.h" 1 # 1 "/usr/include/asm/spinlock.h" 1 3 # 134 "/usr/include/asm/spinlock.h" 3 typedef struct { volatile unsigned int lock; } spinlock_t; typedef struct { unsigned long a[100]; } __dummy_lock_t; # 169 "/usr/include/asm/spinlock.h" 3 typedef struct { volatile unsigned int lock; unsigned long previous; } rwlock_t; # 231 "/usr/include/asm/spinlock.h" 3 # 249 "/usr/include/asm/spinlock.h" 3 # 3 "/home/andrea/kernel/devel/2.2.18pre9aa1/include/linux/spinlock.h" 2 # 1 "p.c" 2 int * p; spinlock_t lock = (spinlock_t) { 0 } ; void dummy(int x , int y) {} main() { int a, b; __asm__ __volatile__( "\n1:\t" "lock ; btsl $0,%0\n\t" "jc 2f\n" ".section .text.lock,\"ax\"\n" "2:\t" "testb $1,%0\n\t" "jne 2b\n\t" "jmp 1b\n" ".previous" :"=m" ((*(__dummy_lock_t *)( lock )) )) ; a = *p; __asm__ __volatile__( "lock ; btrl $0,%0" :"=m" ((*(__dummy_lock_t *)( lock )) )) ; __asm__ __volatile__( "\n1:\t" "lock ; btsl $0,%0\n\t" "jc 2f\n" ".section .text.lock,\"ax\"\n" "2:\t" "testb $1,%0\n\t" "jne 2b\n\t" "jmp 1b\n" ".previous" :"=m" ((*(__dummy_lock_t *)( lock )) )) ; b = *p; __asm__ __volatile__( "lock ; btrl $0,%0" :"=m" ((*(__dummy_lock_t *)( lock )) )) ; dummy(a,b); } # 1 "p.c" # 
1 "/home/andrea/kernel/devel/2.2.18pre9aa1/include/linux/spinlock.h" 1 # 1 "/usr/include/asm/spinlock.h" 1 3 # 134 "/usr/include/asm/spinlock.h" 3 typedef struct { volatile unsigned int lock; } spinlock_t; typedef struct { unsigned long a[100]; } __dummy_lock_t; # 169 "/usr/include/asm/spinlock.h" 3 typedef struct { volatile unsigned int lock; unsigned long previous; } rwlock_t; # 231 "/usr/include/asm/spinlock.h" 3 # 249 "/usr/include/asm/spinlock.h" 3 # 3 "/home/andrea/kernel/devel/2.2.18pre9aa1/include/linux/spinlock.h" 2 # 1 "p.c" 2 int * p; spinlock_t lock = (spinlock_t) { 0 } ; void dummy(int x , int y) {} main() { int a, b; __asm__ __volatile__( "\n1:\t" "lock ; btsl $0,%0\n\t" "jc 2f\n" ".section .text.lock,\"ax\"\n" "2:\t" "testb $1,%0\n\t" "jne 2b\n\t" "jmp 1b\n" ".previous" :"=m" ((*(__dummy_lock_t *)( lock )) ) : : "memory") ; a = *p; __asm__ __volatile__( "lock ; btrl $0,%0" :"=m" ((*(__dummy_lock_t *)( lock )) ) : : "memory") ; __asm__ __volatile__( "\n1:\t" "lock ; btsl $0,%0\n\t" "jc 2f\n" ".section .text.lock,\"ax\"\n" "2:\t" "testb $1,%0\n\t" "jne 2b\n\t" "jmp 1b\n" ".previous" :"=m" ((*(__dummy_lock_t *)( lock )) ) : : "memory") ; b = *p; __asm__ __volatile__( "lock ; btrl $0,%0" :"=m" ((*(__dummy_lock_t *)( lock )) ) : : "memory") ; dummy(a,b); }
Re: GCC proposal for "@" asm constraint
On Tue, Sep 19, 2000 at 04:32:16PM +0200, Andrea Arcangeli wrote:
> Wrong: it's really loading the _address_.
[...]
>  80483f5:	a1 a4 95 04 08		mov    0x80495a4,%eax
> 					^^^^^^^^^	   ^

No, that's an absolute memory load.  If we were loading the address,
there'd be a '$' before that number.

r~
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
Re: GCC proposal for "@" asm constraint
On Tue, Sep 19, 2000 at 10:23:05AM -0700, Richard Henderson wrote:
> On Tue, Sep 19, 2000 at 04:32:16PM +0200, Andrea Arcangeli wrote:
> > Wrong: it's really loading the _address_.
> [...]
> >  80483f5:	a1 a4 95 04 08		mov    0x80495a4,%eax
> > 					^^^^^^^^^	   ^
>
> No, that's an absolute memory load.  If we were loading the address,
> there'd be a '$' before that number.

I see. So Jamie was right and we reproduced a case of miscompilation.

Andrea
Re: GCC proposal for "@" asm constraint
> I see. So Jamie was right and we reproduced a case of miscompilation.

Umm ... "miscompilation"?  As in the compiler produced the wrong code
based on the input provided?

  int * p;
  ...
  a = *p;

	movl p,%eax
	movl (%eax),%edx

The assembly code appears to load the address stored at p (keep in mind
that p is a pointer), then use that address to fetch the value which is
placed in a.  What do you believe should have been generated by the
compiler?

-- John
-------------------------------------------------------------------------
|  Feith Systems  |  Voice: 1-215-646-8000  |  Email: [EMAIL PROTECTED]  |
|   John Wehle    |  Fax: 1-215-540-5495    |                            |
-------------------------------------------------------------------------
Re: GCC proposal for "@" asm constraint
> The reload of the address of `p' isn't necessary and gcc is wrong in
> generating it. p is a constant embedded into the .text section ...

It's perhaps not optimal, however I'm not sure that it's wrong.  In
any case if you can supply a small standalone test case (i.e. preprocessed
source code) I'll take a quick look at things.  I take it that you haven't
tried the current gcc sources?

-- John
Re: GCC proposal for "@" asm constraint
On Mon, Sep 18, 2000 at 07:53:04PM -0400, John Wehle wrote:
> What version of gcc?  Recently some work was done to improve the
> handling of constant memory.

I'm using 2.95.2 19991024. Take this small testcase:

#include <linux/spinlock.h>

int * p;
spinlock_t lock = SPIN_LOCK_UNLOCKED;

extern void dummy(int, int);

myfunc()
{
	int a, b;

	spin_lock(&lock);
	a = *p;
	spin_unlock(&lock);

	spin_lock(&lock);
	b = *p;
	spin_unlock(&lock);

	dummy(a,b);
}

If I compile it with:

	gcc -O2 -D__SMP__ -I ~/kernel/devel/2.2.18pre9aa1/include/ -S p.c

where 2.2.18pre9aa1 is the current spinlock implementation without the
"memory" clobber and with the __dummy trick, I get:

	.file	"p.c"
	.version	"01.01"
gcc2_compiled.:
.globl lock
.data
	.align 4
	.type	lock,@object
	.size	lock,4
lock:
	.long 0
.text
	.align 16
.globl myfunc
	.type	myfunc,@function
myfunc:
	pushl %ebp
	movl %esp,%ebp
	subl $8,%esp
#APP
1:	lock ; btsl $0,lock
	jc 2f
	.section .text.lock,"ax"
2:	testb $1,lock
	jne 2b
	jmp 1b
	.previous
#NO_APP
	movl p,%eax
	     ^^^
	movl (%eax),%edx
#APP
	lock ; btrl $0,lock
1:	lock ; btsl $0,lock
	jc 2f
	.section .text.lock,"ax"
2:	testb $1,lock
	jne 2b
	jmp 1b
	.previous
#NO_APP
	movl (%eax),%eax
#APP
	lock ; btrl $0,lock
#NO_APP
	addl $-8,%esp
	pushl %eax
	pushl %edx
	call dummy
	movl %ebp,%esp
	popl %ebp
	ret
.Lfe1:
	.size	myfunc,.Lfe1-myfunc
	.comm	p,4,4
	.ident	"GCC: (GNU) 2.95.2 19991024 (release)"

If now I repeat the same after applying this patch to the kernel tree
that I was inlining:

--- 2.2.18pre9aa1/include/asm-i386/spinlock.h.~1~	Mon Sep 18 04:56:28 2000
+++ 2.2.18pre9aa1/include/asm-i386/spinlock.h	Tue Sep 19 03:04:56 2000
@@ -173,12 +173,12 @@
 #define spin_lock(lock) \
 	__asm__ __volatile__( \
 		spin_lock_string \
-		:"=m" (__dummy_lock(lock)))
+		:"=m" (__dummy_lock(lock)) : : "memory")
 
 #define spin_unlock(lock) \
 	__asm__ __volatile__( \
 		spin_unlock_string \
-		:"=m" (__dummy_lock(lock)))
+		:"=m" (__dummy_lock(lock)) : : "memory")
 
 #define spin_trylock(lock)	(!test_and_set_bit(0,(lock)))

I then get this assembler:

	.file	"p.c"
	.version	"01.01"
gcc2_compiled.:
.globl lock
.data
	.align 4
	.type	lock,@object
	.size	lock,4
lock:
	.long 0
.text
	.align 16
.globl myfunc
	.type	myfunc,@function
myfunc:
	pushl %ebp
	movl %esp,%ebp
	subl $8,%esp
#APP
1:	lock ; btsl $0,lock
	jc 2f
	.section .text.lock,"ax"
2:	testb $1,lock
	jne 2b
	jmp 1b
	.previous
#NO_APP
	movl p,%eax
	     ^^^
	movl (%eax),%edx
#APP
	lock ; btrl $0,lock
1:	lock ; btsl $0,lock
	jc 2f
	.section .text.lock,"ax"
2:	testb $1,lock
	jne 2b
	jmp 1b
	.previous
#NO_APP
	movl p,%eax
	     ^^^
	movl (%eax),%eax
#APP
	lock ; btrl $0,lock
#NO_APP
	addl $-8,%esp
	pushl %eax
	pushl %edx
	call dummy
	movl %ebp,%esp
	popl %ebp
	ret
.Lfe1:
	.size	myfunc,.Lfe1-myfunc
	.comm	p,4,4
	.ident	"GCC: (GNU) 2.95.2 19991024 (release)"

The diff between the generated asms:

--- p.s.default-spinlocks	Tue Sep 19 03:10:14 2000
+++ p.s.memory	Tue Sep 19 03:10:29 2000
@@ -39,6 +39,7 @@
 	jmp 1b
 	.previous
 #NO_APP
+	movl p,%eax
 	movl (%eax),%eax
 #APP
 	lock ; btrl $0,lock

The reload of the address of `p' isn't necessary and gcc is wrong in
generating it. p is a constant embedded into the .text section and set at
link time; the only way it could change would be if the asm that declares
"memory" as clobbered were self-modifying the code, and gcc should assume
nothing about self-modifying code anyway (no bit of IA32 Linux is
self-modifying). The above reload is just wasted CPU cycles that we'd
rather not waste.

Andrea
Re: GCC proposal for "@" asm constraint
> I read the asm produced by some of my testcases. The current spinlock
> implementation seems to do exactly the _right_ thing in practice and
> nothing more. "memory" was instead causing reloads of constant addresses
> into registers and stuff that shouldn't be necessary (I was in fact
> wondering about the reason for those spurious loads in my first email
> too).

What version of gcc?  Recently some work was done to improve the handling
of constant memory.

-- John
Re: GCC proposal for "@" asm constraint
On Mon, Sep 18, 2000 at 03:39:50PM -0700, Linus Torvalds wrote:
> Have you looked at the code it generates? Quite sad, really.

I read the asm produced by some of my testcases. The current spinlock
implementation seems to do exactly the _right_ thing in practice and
nothing more. "memory" was instead causing reloads of constant addresses
into registers and stuff that shouldn't be necessary (I was in fact
wondering about the reason for those spurious loads in my first email
too).

That said, I've heard that some recent gcc miscompiles the kernel, and we
also have to always compile with -fno-strict-aliasing. I think the gcc
developers should comment on this issue. If they say the __dummy way is
still going to be safe for future gcc releases, we can skip the spurious
loads caused by the "memory" clobber. From the email I received from
Richard Henderson in this thread, it seems they prefer that the kernel
doesn't rely on that __dummy trick right now (and they have the right to
complain, because that's a kernel bug).

Andrea
Re: GCC proposal for "@" asm constraint
On Tue, 19 Sep 2000, Andrea Arcangeli wrote:
>
> I think we can remove all the __dummy stuff and put the "memory" in such
> asm statements.
>
> Comments?

Have you looked at the code it generates? Quite sad, really.

		Linus
Re: GCC proposal for "@" asm constraint
On Fri, Sep 08, 2000 at 01:41:05PM +0200, Jamie Lokier wrote:
> Casting via __dummy is there so that the "m" (or "=m") memory constraint
> will make that operand refer to the actual object in memory, and not a
> copy (in a different area of memory).

Are you really sure gcc could pass a copy even when I specify "memory" in
the clobber list?

My point is that "memory" could also mean that the address where gcc is
keeping the _copy_ could be clobbered by the asm itself as a side effect,
so gcc shouldn't be allowed to assume anything: it should first write the
result of the previous operations back to its real final location before
starting any asm that clobbers "memory".

I think the "memory" clobber should be enough to ensure that the address
used with the "m"-family constraints refers to the real backing store of
the memory passed as a parameter to the inline asm.

I think we can remove all the __dummy stuff and put the "memory" in such
asm statements.

Comments?

Andrea
GCC proposal for "@" asm constraint
Andrea Arcangeli wrote:
> >BTW Look also into asm-i386/bitops.h and dummy cast to some crap there.
> >Are you impressed? 8)
>
> Yep 8). If we add "memory" such stuff could be removed I think. As far I
> can see the object of such stuff is to cause gcc to say `I'm too lazy to
> see exactly what memory this guy is trying to change, so just assume he
> added "memory" in the clobber list' :))

No, that's not the reason for __dummy.  It is an important side effect,
though as ever it isn't guaranteed.

Someone should add "memory" to the bitops _iff_ the bitops are supposed
to imply a compiler memory barrier.  It's a kernel policy decision.

--- For the benefit of GCC list readers:

Linux uses asm constraints like this: `"m" (*(volatile struct __dummy *)
&object)', where __dummy is defined by `struct __dummy { unsigned long
a[100]; }'.

This is used extensively in asms for spinlocks, semaphores and atomic bit
operations, atomic counters etc.  In short, anything needing to operate
on a specific memory object.

Passing the address as an operand would be correct but generates worse
code, because in general we don't need a register to hold the address of
`object'.  It is often part of a larger struct, and the __dummy method
lets it be addressed using offset addressing, and often fewer registers.

Casting via __dummy is there so that the "m" (or "=m") memory constraint
will make that operand refer to the actual object in memory, and not a
copy (in a different area of memory).  Most of the time there is no
reason for GCC to use a copy of the object for an "m" constraint, but
things like CSE can allow the compiler to choose a different object known
to have the same contents.  Other scenarios cause __dummy to be required
for "=m" constraints.  (Even with __dummy there is no guarantee that a
future GCC won't use a different object anyway, but I expect that is
years away.)

I'm posting this to the GCC list to make a feature request.

GCC feature request
---

An additional constraint character, like "m" but meaning "the object at
the specified address".  The operand in C source code would be the
object's address (not dereferenced as Linux does now), so there is no
need for bizarre semantics.

So if I write (assume `@' because most letters are taken):

   asm ("movl %1,%0" : "=g" (result) : "@" (&object));

it would be a clean equivalent to this:

   asm ("movl %1,%0" : "=g" (result)
        : "m" (*(volatile struct __dummy *) &object));

and more or less equivalent to this (but generates better code):

   asm ("movl (%1),%0" : "=g" (result) : "r" (&object));

An alternative would be a modifier to "m" that means "definitely use the
actual object referred to".  I prefer the direct approach.  What do GCC
designers think?

This is useful for any code that must use asms for atomic operations such
as semaphores in threads etc.

-- Jamie