Re: GCC proposal for "@" asm constraint

2000-09-22 Thread Andrea Arcangeli

On Fri, Sep 22, 2000 at 11:08:21AM -0700, Linus Torvalds wrote:
> The same is true in the filesystems - many of them want to change block or
> inode allocation bitmap bits, but they have to hold a lock anyway (usually
> the kernel lock in addition to the superblock lock).

The patch addresses the filesystem case by introducing __set_bit,
__test_and_clear_bit and __test_and_set_bit, and it in fact starts the split
of the bitops.

In the patch the __*_bit non-atomic API isn't fully implemented though
(__clear_bit and __test_and_change_bit are missing, for example), and no common
code uses the current __*_bit non-atomic variant of the bitops directly anyway
(the filesystems happen to use it only through the minix_*/ext2_* macros in the
arch-specific part). Finishing the non-atomic bitops is very easy and I agree
it's a good idea (I'll soon upload an incremental patch implementing the few
missing __*_bit calls); then the common code (like the alloc/free paths you
pointed out) can start using the __ variant to allow the compiler not to trash
registers and to skip the SMP locks when they aren't necessary.
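
For illustration, a minimal sketch of what the split looks like on i386 (not
the actual patch; the volatile cast and the "Ir" constraint just follow the
existing i386 bitops convention):

static inline void set_bit(int nr, volatile void *addr)
{
        __asm__ __volatile__(
                "lock ; btsl %1,%0"     /* atomic: lock prefix, SMP-safe */
                :"=m" (*(volatile long *) addr)
                :"Ir" (nr));
}

static inline void __set_bit(int nr, volatile void *addr)
{
        __asm__(
                "btsl %1,%0"            /* non-atomic: caller already holds a lock */
                :"=m" (*(volatile long *) addr)
                :"Ir" (nr));
}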

Andrea



Re: GCC proposal for "@" asm constraint

2000-09-22 Thread Linus Torvalds



On Fri, 22 Sep 2000, Andrea Arcangeli wrote:
>
> If you don't like the name smp_mb__before/after_clear_bit (not that I like it
> too much either) suggestions are welcome. Maybe we could use a single
> smp_mb__across_bitops() instead?

I suspect that we should just split up the bitops.

We've done this once already. We used to have just one set of bitops that
always returned the old value, which was inefficient: many users don't
actually care, and you can often do it faster when the return value doesn't
matter. Thus the (long ago) split of "set_bit()" into "set_bit()" and
"test_and_set_bit()".

These days we have similar issues. In many cases we don't even want the
SMP safety - we just want to set a bit, and we already do locking. This
shows up in the mm code, for example - we use test_and_change_bit() for
the buddy allocator, and it doesn't have to be atomic because we have to
have locking for other reasons anyway. So on x86 we basically waste cycles
on locked accesses several times for each alloc/free page.

The same is true in the filesystems - many of them want to change block or
inode allocation bitmap bits, but they have to hold a lock anyway (usually
the kernel lock in addition to the superblock lock).

And in some circumstances (page locking, possible buffer cache locking)
you obviously do want the memory barrier stuff, and you're right, making
it associated with the bit operations allows for faster code on x86 where
the locked bit ops are barriers in themselves.
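
For illustration, the unlock side of such a bit-based lock might then look
like this at a call site (a sketch only; BH_Lock and bh are plausible
stand-ins for the buffer-locking case, not code from the patch):

        smp_mb__before_clear_bit();     /* order prior stores before the clear */
        clear_bit(BH_Lock, &bh->b_state);

On x86 smp_mb__before_clear_bit() can then compile to nothing, since the
locked bit op is already a barrier; on alpha it would have to expand to a
real mb().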

Ugh. There's quite a lot of combinations there. Maybe your approach is the
cleanest one.

Linus




Re: GCC proposal for "@" asm constraint

2000-09-22 Thread Alan Cox

> On Fri, 22 Sep 2000, Andrea Arcangeli wrote:
> >
> > This patch fixes the spinlock problems in read_lock/write_lock and also some
> > alpha SMP race where clear_bit isn't enforcing a memory barrier
> 
> Why would clear_bit() be a memory barrier?

Because historically it sort of was. This came up with the old PPP code
for one.

> Anything that expects clear_bit() to be a memory barrier should be fixed.

No argument about that however.




Re: GCC proposal for "@" asm constraint

2000-09-22 Thread Linus Torvalds



On Fri, 22 Sep 2000, Andrea Arcangeli wrote:
>
> This patch fixes the spinlock problems in read_lock/write_lock and also some
> alpha SMP race where clear_bit isn't enforcing a memory barrier

Why would clear_bit() be a memory barrier?

Anything that expects clear_bit() to be a memory barrier should be fixed.

Anything that wants a spinlock should use the spinlocks. Not the bit ops.
They are atomic, but they are not memory barriers.

Linus




Re: GCC proposal for "@" asm constraint

2000-09-22 Thread Andrea Arcangeli

This patch fixes the spinlock problems in read_lock/write_lock and also an
alpha SMP race where clear_bit isn't enforcing a memory barrier, plus an
improvement in some places where we can check the waitqueue before entering
wake_up. There are some minor alpha compile fixes too.

On the alpha side I also removed the conditional branches in set_bit/clear_bit
(this was running faster in my userspace testcase). Anyway, I made that a
compile-time #define/#undef decision, so returning to the previous behaviour is
trivial. I also removed the mb in the UP case from the test_and_set_bit-like
functions.

It also avoids using atomic operations for ext2 and minix, which only need
bitops that can span more than sizeof(long). (Previously suggested by Ingo.)

set_bit/clear_bit don't clobber "memory" (they are __asm__ __volatile__
though), but they declare the memory written as volatile, so gcc should make
sure not to pass a copy of such memory but its real memory repository.  Also
the other places previously using the "fool gcc" trick now declare the memory
as volatile instead (some of the gcc-fooling wasn't declaring the array as
volatile, for example the dummy_lock in rwlock.h). The atomic_t is now declared
volatile in UP as well (for irqs and a preemptive UP kernel).

For atomic_set_mask/atomic_clear_mask I'm not casting the pointer to the
memory to a pointer to volatile, because "memory" is clobbered and that should
be enough to ensure gcc doesn't use a copy of the memory.
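
For illustration, that clobber-based idiom would look roughly like this (a
sketch of the convention just described, not the patch itself):

static inline void atomic_clear_mask(unsigned long mask, unsigned long *addr)
{
        __asm__ __volatile__(
                "lock ; andl %1,%0"
                :"=m" (*addr)           /* plain pointer, no volatile cast */
                :"r" (~mask)
                :"memory");             /* the clobber keeps gcc honest instead */
}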

Sparc64 now implements a more fine-grained wmb, as alpha does (and rmb in the
same way as wmb).

br_read_lock puts an mb() after acquiring its per-CPU slot. Dave, I think you
could replace that mb() (currently it's an rmb(), which looks wrong for sparc64
even with the current rmb implementation) with membar("#StoreLoad"), but
there's no common-code API for such a fine-grained memory barrier only between
stores and loads (mb() is the most fine-grained call available at the moment).


ftp://ftp.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.0-test9-pre5/spinlock-1

The patch is against test9-pre5, but it's been tested only on top of
test8-pre5, with a night of cerberus load with 100Mbit network transfers in the
background on 2-way SMP IA32.  The alpha side has now passed 59 minutes of
cerberus overload as well without any apparent problem (load of 23 with over
300 MB in swap on a 2-way SMP machine with 2G of RAM and a flood of TCP input
and output at 100Mbit).

The asm generated with the "memory" clobber doesn't seem to be inferior to the
previous asm, btw (I compared the two kernel images). There wasn't a single bug
in the previous asm compared to the new one, though (on IA32).

One thing I'd still like to change (not addressed in the patch) is the eax
constraint on the pointer to the rwlocks in read_lock/write_lock where the
spinlock is not a builtin constant. I removed the constraint (using "r") and
that generated better asm. I think it's better to increase the size of the slow
path by a few bytes (pushing eax to save it on the stack while we use it to
pass the spinlock pointer to the slow-path function) than to harm the fast
path. Taking the eax constraint only in the builtin-constant variant should be
enough to fix it.
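
The constraint change in isolation, using a trivial locked op as a stand-in
for the real read_lock fast path (rw is the rwlock pointer; illustrative
only):

        /* today: the lock pointer is pinned to %eax */
        __asm__ __volatile__("lock ; decl (%0)" : : "a" (rw) : "memory");

        /* proposed: gcc may pick any register, which generates better
           asm when the address isn't a builtin constant */
        __asm__ __volatile__("lock ; decl (%0)" : : "r" (rw) : "memory");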

The only architectures audited are IA32 and alpha; the others may need those
changes as well.

Andrea

Re: GCC proposal for "@" asm constraint

2000-09-19 Thread Andrea Arcangeli

On Tue, Sep 19, 2000 at 04:16:07PM -0400, John Wehle wrote:
> Umm ... "miscompilation"?  As in the compiler produced the wrong code
> based on the input provided?

That's not a gcc bug (gcc is doing the right thing). It's the kernel that
should use the "memory" clobber in the spinlock implementation.

The sad code generated was in reality the _right_ code. I was blind not to
notice the missing $ (I missed it the first time in the first testcase I
tried and I kept missing it; I was probably also biased by assuming the current
spinlocks were safe with the commonly used compilers, and I was thinking I was
fixing only a theoretical bug). I'm sorry for that (and thanks again to Richard
and Jamie).

Andrea



Re: GCC proposal for "@" asm constraint

2000-09-19 Thread John Wehle

> I see. So Jamie was right and we reproduced a case of miscompilation.

Umm ... "miscompilation"?  As in the compiler produced the wrong code
based on the input provided?

  int * p;

  ...

  a = *p;

movl p,%eax
movl (%eax),%edx

The assembly code appears to load the address stored at p (keep in mind
that p is a pointer), then use that address to fetch the value which is
placed in a.  What do you believe should have been generated by the compiler?

-- John
-
|   Feith Systems  |   Voice: 1-215-646-8000  |  Email: [EMAIL PROTECTED]  |
|John Wehle| Fax: 1-215-540-5495  | |
-




Re: GCC proposal for "@" asm constraint

2000-09-19 Thread Andrea Arcangeli

On Tue, Sep 19, 2000 at 07:22:48PM +0200, Jamie Lokier wrote:
> That instruction loads the _value_ of p.  I.e. reads the memory from
> location 0x80495a4 into %eax.  The source instruction was:
> 
>movl p,%eax
> 
> The instructions that you're thinking of, that load fixed addresses,
> look like these:
> 
>mov $0x80495a4,%eax
>lea 0x80495a4,%eax
> 
> or in source form:
> 
>movl $p,%eax
>leal p,%eax

Thanks for noticing my error.

Andrea



Re: GCC proposal for "@" asm constraint

2000-09-19 Thread Andrea Arcangeli

On Tue, Sep 19, 2000 at 10:23:05AM -0700, Richard Henderson wrote:
> On Tue, Sep 19, 2000 at 04:32:16PM +0200, Andrea Arcangeli wrote:
> > Wrong: it's really loading the _address_.
> [...]
> >  80483f5:   a1 a4 95 04 08  mov    0x80495a4,%eax
> >^^^ ^
> 
> No, that's an absolute memory load.  If we were loading
> the address, there'd be a '$' before that number.

I see. So Jamie was right and we reproduced a case of miscompilation.

Andrea



Re: GCC proposal for "@" asm constraint

2000-09-19 Thread Richard Henderson

On Tue, Sep 19, 2000 at 04:32:16PM +0200, Andrea Arcangeli wrote:
> Wrong: it's really loading the _address_.
[...]
>  80483f5:   a1 a4 95 04 08  mov    0x80495a4,%eax
>^^^ ^

No, that's an absolute memory load.  If we were loading
the address, there'd be a '$' before that number.


r~



Re: GCC proposal for "@" asm constraint

2000-09-19 Thread Andrea Arcangeli

On Tue, Sep 19, 2000 at 04:01:26PM +0100, David Howells wrote:
> I can't remember exactly what it was now, but I think it was either something
> to do with spinlocks or bitops. I'll re-investigate tonight and see if I can
> come back with some benchmarks/code-snippets tomorrow.

Yes, you should tell us which inlined function generated the different asm (if
you post the two different asms, or the two different .o files, we can probably
find it ourselves).

I've seen that the rw_spin_locks are silly in requesting the address of the
spinlock to be in register eax even when the address of the spinlock isn't a
constant (they should instead at least use "r" and not "a"), and I was going to
fix that; however, it hasn't changed between test7 and test8...

Andrea



Re: GCC proposal for "@" asm constraint

2000-09-19 Thread David Howells


I've been writing a kernel module and I've noticed a measurable performance
drop between the same code compiled against linux-2.4.0-test7 and against
test8. I disassembled the code to try and work out what was going on and I saw
the following happen:

 * [test8]
   One of the atomic memory access primitives supplied by the kernel
   was putting immediate data into a register outside of the inline-asm
   instruction group and then using the register inside.

 * [test7]
   The immediate data was passed directly to the inline-asm instruction.

In test8, of course, this means that the compiler has to scratch up a spare
register, which is totally unnecessary, as the immediate data could be
attached directly to the instruction opcode as was done in test7.

This had the effect of making the compiler have to push the old contents of
the register into a slot on the stack (I think it held a local variable at the
time), which had the further effects of using more stack memory, introducing
more register rearrangement (the code ended up longer), and burning up more
CPU cycles.

I can't remember exactly what it was now, but I think it was either something
to do with spinlocks or bitops. I'll re-investigate tonight and see if I can
come back with some benchmarks/code-snippets tomorrow.
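
In other words, the difference boils down to the constraint used for the
operand; an illustrative sketch (word is just a stand-in, not the actual
kernel primitive):

        unsigned long *word;

        /* test7-like: "i" among the alternatives lets gcc fold the
           constant straight into the opcode */
        __asm__ __volatile__("lock ; orl %1,%0"
                : "=m" (*word) : "ir" (0x40UL));

        /* test8-like effect: "r" alone forces the constant into a
           register first, so gcc must scratch one up (and maybe spill) */
        __asm__ __volatile__("lock ; orl %1,%0"
                : "=m" (*word) : "r" (0x40UL));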

David Howells



Re: GCC proposal for "@" asm constraint

2000-09-19 Thread Andrea Arcangeli

On Mon, Sep 18, 2000 at 09:37:43PM -0400, John Wehle wrote:
> It's perhaps not optimal, however I'm not sure that it's wrong.  In

It's not "wrong" in the sense that something breaks, but it's definitely
suboptimal. There's no reason to reload a value that can't change, because it's
embedded into the opcodes of the asm and set at static link time.

> any case if you can supply a small standalone test case (i.e. preprocessed
> source code) I'll take a quick look at things.  I take it that you haven't
> tried the current gcc sources?

The first testcase is the current spinlock implementation; the second testcase
adds the "memory" clobber, and it generates the _spurious_ reload of `p'. You
should be able to compile them with `gcc -O2 -S p-*.i`.

Andrea


# 1 "p.c"
# 1 "/home/andrea/kernel/devel/2.2.18pre9aa1/include/linux/spinlock.h" 1


# 1 "/usr/include/asm/spinlock.h" 1 3



# 134 "/usr/include/asm/spinlock.h" 3


 



typedef struct {
volatile unsigned int lock;
} spinlock_t;




 








typedef struct { unsigned long a[100]; } __dummy_lock_t;



# 169 "/usr/include/asm/spinlock.h" 3




























 









typedef struct {
volatile unsigned int lock;
unsigned long previous;
} rwlock_t;



 






# 231 "/usr/include/asm/spinlock.h" 3






# 249 "/usr/include/asm/spinlock.h" 3




















# 3 "/home/andrea/kernel/devel/2.2.18pre9aa1/include/linux/spinlock.h" 2


# 1 "p.c" 2


int * p;
spinlock_t lock = (spinlock_t) { 0 } ;

void dummy(int x , int y) {}

main() {
int a, b;
__asm__ __volatile__( "\n1:\t" "lock ; btsl $0,%0\n\t" "jc 2f\n" ".section .text.lock,\"ax\"\n" "2:\t" "testb $1,%0\n\t" "jne 2b\n\t" "jmp 1b\n" ".previous" :"=m" ((*(__dummy_lock_t *)(&lock)))) ;
a = *p;
__asm__ __volatile__( "lock ; btrl $0,%0" :"=m" ((*(__dummy_lock_t *)(&lock)))) ;

__asm__ __volatile__( "\n1:\t" "lock ; btsl $0,%0\n\t" "jc 2f\n" ".section .text.lock,\"ax\"\n" "2:\t" "testb $1,%0\n\t" "jne 2b\n\t" "jmp 1b\n" ".previous" :"=m" ((*(__dummy_lock_t *)(&lock)))) ;
b = *p;
__asm__ __volatile__( "lock ; btrl $0,%0" :"=m" ((*(__dummy_lock_t *)(&lock)))) ;

dummy(a,b);
}


# 1 "p.c"
# 1 "/home/andrea/kernel/devel/2.2.18pre9aa1/include/linux/spinlock.h" 1


# 1 "/usr/include/asm/spinlock.h" 1 3



# 134 "/usr/include/asm/spinlock.h" 3


 



typedef struct {
volatile unsigned int lock;
} spinlock_t;




 








typedef struct { unsigned long a[100]; } __dummy_lock_t;



# 169 "/usr/include/asm/spinlock.h" 3




























 









typedef struct {
volatile unsigned int lock;
unsigned long previous;
} rwlock_t;



 






# 231 "/usr/include/asm/spinlock.h" 3






# 249 "/usr/include/asm/spinlock.h" 3




















# 3 "/home/andrea/kernel/devel/2.2.18pre9aa1/include/linux/spinlock.h" 2


# 1 "p.c" 2


int * p;
spinlock_t lock = (spinlock_t) { 0 } ;

void dummy(int x , int y) {}

main() {
int a, b;
__asm__ __volatile__( "\n1:\t" "lock ; btsl $0,%0\n\t" "jc 2f\n" ".section .text.lock,\"ax\"\n" "2:\t" "testb $1,%0\n\t" "jne 2b\n\t" "jmp 1b\n" ".previous" :"=m" ((*(__dummy_lock_t *)(&lock))) : : "memory") ;
a = *p;
__asm__ __volatile__( "lock ; btrl $0,%0" :"=m" ((*(__dummy_lock_t *)(&lock))) : : "memory") ;

__asm__ __volatile__( "\n1:\t" "lock ; btsl $0,%0\n\t" "jc 2f\n" ".section .text.lock,\"ax\"\n" "2:\t" "testb $1,%0\n\t" "jne 2b\n\t" "jmp 1b\n" ".previous" :"=m" ((*(__dummy_lock_t *)(&lock))) : : "memory") ;
b = *p;
__asm__ __volatile__( "lock ; btrl $0,%0" :"=m" ((*(__dummy_lock_t *)(&lock))) : : "memory") ;

dummy(a,b);
}



Re: GCC proposal for "@" asm constraint

2000-09-19 Thread Jamie Lokier

Andrea Arcangeli wrote:
> int * p;
> [...]
> spin_lock(&lock);
> a = *p;
> spin_unlock(&lock);
> 
> spin_lock(&lock);
> b = *p;
> spin_unlock(&lock);

> [With "memory" clobber] the [second] reload of the address of `p'
> isn't necessary and gcc is wrong in generating it.

Wrong, GCC is behaving correctly.

> p is a constant embedded into the .text section and set at link time,

p is a variable.  The _address_ of p is constant, but the reload is
not loading the address of p, it's loading the _value_.  That value can
be changed by other threads.

In fact, you have demonstrated why the "memory" clobber is necessary for
spinlocks.  A perfect test case!

In your first example, without the clobber, the asm code is incorrect.
A parallel thread can change the value of p between the first
spin_unlock and the second spin_lock, and the GCC-generated code does
not notice.
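
For instance (a hypothetical concurrent writer; new_table and i are made up
for illustration):

        /* on another CPU, between the two critical sections: */
        spin_lock(&lock);
        p = &new_table[i];      /* the value of p changes */
        spin_unlock(&lock);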

> The above reloads are just wasted CPU cycles that we'd rather not
> waste.

Here, the saved cycle is a kernel bug.

-- Jamie

Re: GCC proposal for "@" asm constraint

2000-09-18 Thread John Wehle

> The reload of the address of `p' isn't necessary and gcc is wrong in
> generating it. p is a constant embedded into the .text section ...

It's perhaps not optimal, however I'm not sure that it's wrong.  In
any case if you can supply a small standalone test case (i.e. preprocessed
source code) I'll take a quick look at things.  I take it that you haven't
tried the current gcc sources?

-- John
-
|   Feith Systems  |   Voice: 1-215-646-8000  |  Email: [EMAIL PROTECTED]  |
|John Wehle| Fax: 1-215-540-5495  | |
-




Re: GCC proposal for "@" asm constraint

2000-09-18 Thread Andrea Arcangeli

On Mon, Sep 18, 2000 at 07:53:04PM -0400, John Wehle wrote:
> What version of gcc?  Recently some work was done to improve the handling of
> constant memory.

I'm using 2.95.2 19991024.

Take this small testcase:

#include <linux/spinlock.h>

int * p;
spinlock_t lock = SPIN_LOCK_UNLOCKED;

extern void dummy(int, int);

myfunc() {
int a, b; 
spin_lock(&lock);
a = *p;
spin_unlock(&lock);

spin_lock(&lock);
b = *p;
spin_unlock(&lock);

dummy(a,b);
}

If I compile it with:

gcc -O2 -D__SMP__ -I ~/kernel/devel/2.2.18pre9aa1/include/ -S p.c

where 2.2.18pre9aa1 contains the current spinlock implementation (without
the "memory" clobber and with the __dummy trick), I get:

.file   "p.c"
.version        "01.01"
gcc2_compiled.:
.globl lock
.data
.align 4
.type   lock,@object
.size   lock,4
lock:
.long 0
.text
.align 16
.globl myfunc
.type   myfunc,@function
myfunc:
pushl %ebp
movl %esp,%ebp
subl $8,%esp
#APP

1:  lock ; btsl $0,lock
jc 2f
.section .text.lock,"ax"
2:  testb $1,lock
jne 2b
jmp 1b
.previous
#NO_APP
movl p,%eax
^^^
movl (%eax),%edx
#APP
lock ; btrl $0,lock

1:  lock ; btsl $0,lock
jc 2f
.section .text.lock,"ax"
2:  testb $1,lock
jne 2b
jmp 1b
.previous
#NO_APP
movl (%eax),%eax
#APP
lock ; btrl $0,lock
#NO_APP
addl $-8,%esp
pushl %eax
pushl %edx
call dummy
movl %ebp,%esp
popl %ebp
ret
.Lfe1:
.size   myfunc,.Lfe1-myfunc
.comm   p,4,4
.ident  "GCC: (GNU) 2.95.2 19991024 (release)"

If I now repeat the same after applying this patch to the kernel tree I was
inlining against:

--- 2.2.18pre9aa1/include/asm-i386/spinlock.h.~1~   Mon Sep 18 04:56:28 2000
+++ 2.2.18pre9aa1/include/asm-i386/spinlock.h   Tue Sep 19 03:04:56 2000
@@ -173,12 +173,12 @@
 #define spin_lock(lock) \
 __asm__ __volatile__( \
spin_lock_string \
-   :"=m" (__dummy_lock(lock)))
+   :"=m" (__dummy_lock(lock)) : : "memory")
 
 #define spin_unlock(lock) \
 __asm__ __volatile__( \
spin_unlock_string \
-   :"=m" (__dummy_lock(lock)))
+   :"=m" (__dummy_lock(lock)) : : "memory")
 
 #define spin_trylock(lock) (!test_and_set_bit(0,(lock)))

I then get this assembler:

.file   "p.c"
.version        "01.01"
gcc2_compiled.:
.globl lock
.data
.align 4
.type   lock,@object
.size   lock,4
lock:
.long 0
.text
.align 16
.globl myfunc
.type   myfunc,@function
myfunc:
pushl %ebp
movl %esp,%ebp
subl $8,%esp
#APP

1:  lock ; btsl $0,lock
jc 2f
.section .text.lock,"ax"
2:  testb $1,lock
jne 2b
jmp 1b
.previous
#NO_APP
movl p,%eax
^^^
movl (%eax),%edx
#APP
lock ; btrl $0,lock

1:  lock ; btsl $0,lock
jc 2f
.section .text.lock,"ax"
2:  testb $1,lock
jne 2b
jmp 1b
.previous
#NO_APP
movl p,%eax
^^^
movl (%eax),%eax
#APP
lock ; btrl $0,lock
#NO_APP
addl $-8,%esp
pushl %eax
pushl %edx
call dummy
movl %ebp,%esp
popl %ebp
ret
.Lfe1:
.size   myfunc,.Lfe1-myfunc
.comm   p,4,4
.ident  "GCC: (GNU) 2.95.2 19991024 (release)"

The diff between the generated asms:

--- p.s.default-spinlocks   Tue Sep 19 03:10:14 2000
+++ p.s.memory  Tue Sep 19 03:10:29 2000
@@ -39,6 +39,7 @@
jmp 1b
 .previous
 #NO_APP
+   movl p,%eax
movl (%eax),%eax
 #APP
lock ; btrl $0,lock


The reload of the address of `p' isn't necessary, and gcc is wrong in
generating it. p is a constant embedded into the .text section and set at link
time; the only way it could change would be if the asm that declares "memory"
as clobbered were self-modifying the code itself, and gcc should assume nothing
about self-modifying code (no bit of IA32 Linux is self-modifying).

The above reloads are just wasted CPU cycles that we'd rather not waste.

Andrea



Re: GCC proposal for "@" asm constraint

2000-09-18 Thread John Wehle

> I read the asm produced by some of my testcases.  The current spinlock
> implementation seems to do exactly the _right_ thing in practice and nothing
> more. "memory" was instead causing reloads of constant addresses into registers
> and stuff that shouldn't be necessary (I was in fact wondering about the reason
> for those spurious loads in my first email too).

What version of gcc?  Recently some work was done to improve the handling
of constant memory.

-- John
-
|   Feith Systems  |   Voice: 1-215-646-8000  |  Email: [EMAIL PROTECTED]  |
|John Wehle| Fax: 1-215-540-5495  | |
-




Re: GCC proposal for "@" asm constraint

2000-09-18 Thread Andrea Arcangeli

On Mon, Sep 18, 2000 at 03:39:50PM -0700, Linus Torvalds wrote:

> Have you looked at the code it generates? Quite sad, really.

I read the asm produced by some of my testcases.  The current spinlock
implementation seems to do exactly the _right_ thing in practice and nothing
more. "memory" was instead causing reloads of constant addresses into registers
and stuff that shouldn't be necessary (I was in fact wondering about the reason
for those spurious loads in my first email too).

That said, I've heard that some recent gcc miscompiles the kernel, and we also
have to always compile with -fno-strict-aliasing. I think the gcc developers
should comment on this issue. If they say the __dummy way is still going to be
safe for some gcc releases, we can skip those spurious loads caused by the
"memory" clobber. From the email I received from Richard Henderson in this
thread, it seems they'd prefer that the kernel not rely on that __dummy trick
right now (and they have the right to complain, because that's a kernel bug).

Andrea



Re: GCC proposal for "@" asm constraint

2000-09-18 Thread Linus Torvalds



On Tue, 19 Sep 2000, Andrea Arcangeli wrote:
> 
> I think we can remove all the __dummy stuff and put the "memory" in such asm
> statements.
> 
> Comments?

Have you looked at the code it generates? Quite sad, really.

Linus




Re: GCC proposal for "@" asm constraint

2000-09-18 Thread Andrea Arcangeli

On Fri, Sep 08, 2000 at 01:41:05PM +0200, Jamie Lokier wrote:
> Casting via __dummy is there so that the "m" (or "=m") memory constraint
> will make that operand refer to the actual object in memory, and not a
> copy (in a different area of memory).

Are you really sure gcc could pass a copy even when I specify "memory" in the
clobber list? My point is that "memory" could also mean that the address where
gcc keeps the _copy_ could be clobbered by the asm itself as a side effect, so
gcc shouldn't be allowed to assume anything: it should first copy the result
values of the previous actions to their real final locations before starting
any asm that clobbers "memory". I think the "memory" clobber should be enough
to ensure that the address used with the .*m constraints refers to the real
backing store of the memory passed as a parameter to the inline asm.
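
(For reference, the __dummy trick in question, roughly as it appears in the
i386 headers and in the preprocessed testcases elsewhere in this thread:)

        typedef struct { unsigned long a[100]; } __dummy_lock_t;
        #define __dummy_lock(lock) (*(__dummy_lock_t *)(lock))

        #define spin_lock(lock) \
                __asm__ __volatile__( \
                        spin_lock_string \
                        :"=m" (__dummy_lock(lock)))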

I think we can remove all the __dummy stuff and put the "memory" in such asm
statements.

Comments?

Andrea

GCC proposal for "@" asm constraint

2000-09-08 Thread Jamie Lokier

Andrea Arcangeli wrote:
> >BTW Look also into asm-i386/bitops.h and dummy cast to some crap there.
> >Are you impressed? 8)
> 
> Yep 8). If we add "memory" such stuff could be removed I think. As far as I
> can see the object of such stuff is to cause gcc to say `I'm too lazy to
> see exactly what memory this guy is trying to change, so just assume he
> added "memory" in the clobber list' :))

No, that's not the reason for __dummy.  It is an important side effect,
though as ever it isn't guaranteed.  Someone should add "memory" to the
bitops _iff_ the bitops are supposed to imply a compiler memory barrier.
It's a kernel policy decision.

 ---

For the benefit of GCC list readers: Linux uses asm constraints like
this:  `"m" (*(volatile struct __dummy *) &object)', where __dummy is
defined by  `struct __dummy { unsigned long a[100]; }'.

This is used extensively in asms for spinlocks, semaphores and atomic
bit operations, atomic counters etc.  In short, anything needing to
operate on a specific memory object.

Passing the address as an operand would be correct but generates worse
code, because in general we don't need a register to hold the address of
`object'.  It is often part of a larger struct, and the __dummy method
lets it be addressed using offset addressing, and often fewer registers.

Casting via __dummy is there so that the "m" (or "=m") memory constraint
will make that operand refer to the actual object in memory, and not a
copy (in a different area of memory).

Most of the time there is no reason for GCC to use a copy of the object
for an "m" constraint, but things like CSE can allow the compiler to
choose a different object known to have the same contents.  Other
scenarios cause __dummy to be required for "=m" constraints.

(Even with __dummy there is no guarantee that a future GCC won't use a
different object anyway, but I expect that is years away).

I'm posting this to the GCC list to make a feature request.

GCC feature request
---

An additional constraint character, like "m" but meaning "the object at
the specified address".

The operand in C source code would be the object's address (not
dereferenced as Linux does now), so there is no need for bizarre
semantics.

So if I write (assume `@' because most letters are taken):

   asm ("movl %1,%0" : "=g" (result) : "@" ());

it would be a clean equivalent of this:

   asm ("movl %1,%0" : "=g" (result)
 : "m" (*(volatile struct __dummy *) ));

and more or less equivalent to this (but generates better code):

   asm ("movl (%1),%0" : "=g" (result) : "r" ());

An alternative would be a modifier to "m" that means "definitely use the
actual object referred to".  I prefer the direct approach.

What do GCC designers think?  This is useful for any code that must use
asms for atomic operations such as semaphores in threads etc.

-- Jamie