Re: [PATCH] move xchg/cmpxchg to atomic.h

2001-01-02 Thread Daniel Phillips

Roman Zippel wrote:
> 
> On Tue, 2 Jan 2001, David S. Miller wrote:
> 
> >We really can't.  We _only_ have load-and-zero.  And it has to be
> >16-byte aligned.  xchg() is just not something the CPU implements.
> >
> > Oh bugger... you do have real problems.
> 
> For 2.5 we could move all the atomic functions from atomic.h, bitops.h,
> system.h and give them a common interface. We could also give them a new
> argument atomic_spinlock_t, which is a normal spinlock, but only used on
> architectures which need it; everyone else can "optimize" it away. I think
> one such lock per major subsystem should be enough, as the lock is only
> held for a very short time, so contention should be no problem.
> Anyway, this would have the huge advantage that we could use the complete
> 32/64 bits of the atomic value, e.g. for pointer operations.

*Yes*, and I could write:
waiters = xchg(&bdflush_waiters.counter, 0);

instead of:
waiters = atomic_read(&bdflush_waiters);
atomic_sub(waiters, &bdflush_waiters);

in my daemon wakeup patch.

--
Daniel



Re: [PATCH] move xchg/cmpxchg to atomic.h

2001-01-02 Thread Roman Zippel

Hi,

On Tue, 2 Jan 2001, David S. Miller wrote:

>We really can't.  We _only_ have load-and-zero.  And it has to be
>16-byte aligned.  xchg() is just not something the CPU implements.
> 
> Oh bugger... you do have real problems.

For 2.5 we could move all the atomic functions from atomic.h, bitops.h,
system.h and give them a common interface. We could also give them a new
argument atomic_spinlock_t, which is a normal spinlock, but only used on
architectures which need it; everyone else can "optimize" it away. I think
one such lock per major subsystem should be enough, as the lock is only
held for a very short time, so contention should be no problem.
Anyway, this would have the huge advantage that we could use the complete
32/64 bits of the atomic value, e.g. for pointer operations.
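
A rough sketch of what that interface could look like (all names here are
invented for illustration, not taken from an actual patch): on architectures
with real atomic instructions the lock type is empty and compiles away,
while lock-based architectures pass one shared spinlock per subsystem.

#ifdef __HAVE_REAL_ATOMICS
typedef struct { /* empty, optimized away */ } atomic_spinlock_t;
#define atomic_xchg(lock, v, new)  xchg(&(v)->counter, (new))
#else
typedef spinlock_t atomic_spinlock_t;   /* one per major subsystem */

static inline long atomic_xchg(atomic_spinlock_t *lock,
                               atomic_t *v, long new)
{
        long old;

        spin_lock(lock);
        old = v->counter;       /* full 32/64 bits usable, even pointers */
        v->counter = new;
        spin_unlock(lock);
        return old;
}
#endif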

bye, Roman





Re: [PATCH] move xchg/cmpxchg to atomic.h

2001-01-02 Thread Grant Grundler


David,
Sorry for being dense - but I don't see the problem in using
a spinlock to implement xchg(). The example algorithm looks broken.
Or am I missing something obvious here?

"David S. Miller" wrote:
> It is very common to do things like:
> 
> producer(elem)
> {
>   elem->next = list->head;
>   xchg(&list->head, elem);
> }
> 
> consumer()
> {
>   local_list = xchg(&list->head, NULL);
>   for_each(elem, local_list)
>   do_something(elem);
> }

producer() looks broken. The problem is two producers can race and
one will put the wrong value of list->head in elem->next.
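
Concretely, with list->head initially H, the lost-update interleaving could
look like this (a hypothetical trace, not taken from the original mail):

        CPU1: e1->next = H;
        CPU2: e2->next = H;
        CPU1: xchg(&list->head, e1);    /* head == e1 */
        CPU2: xchg(&list->head, e2);    /* head == e2, and e1 is dropped,
                                           because e2->next still points at H */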

I think prepending to list->head needs to either be protected by a spinlock
or be a per-cpu data structure. consumer() should be ok assuming the code
can tolerate picking up "late arrivals" in the next pass.
Or am I missing something obvious here?
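
For what it's worth, where a real cmpxchg exists, the usual lock-free fix for
the producer is a retry loop; a sketch under that assumption, in the same
pseudocode style as the example above:

producer(elem)
{
        do {
                old = list->head;
                elem->next = old;
        } while (cmpxchg(&list->head, old, elem) != old);
}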

It's worse if producer() were inlined: the arch-specific optimisers might
re-order the "elem->next = list->head" statement to be quite a bit more
than 1 or 2 cycles away from the xchg() operation.

thanks,
grant

Grant Grundler
Unix Systems Enablement Lab
+1.408.447.7253



Re: [PATCH] move xchg/cmpxchg to atomic.h

2001-01-02 Thread Linus Torvalds

In article <[EMAIL PROTECTED]>,
Alan Cox <[EMAIL PROTECTED]> wrote:
>> > We really can't.  We _only_ have load-and-zero.  And it has to be 16-byte
>> > aligned.  xchg() is just not something the CPU implements.
>> 
>> The network code relies on the reader-xchg semantics David described in 
>> several places.
>
>I guess the network code will just have to change for 2.5. read_xchg_val()
>can be a null macro for everyone else, at least.

You can easily do reader-xchg semantics even if you don't have an atomic
xchg and are using spinlocks. In fact, it will work the obvious way
correctly, assuming that the reader either gets the old value or the new
value but not some "partway old and new".

An xchg() that uses spinlocks and a simple read+write inside the
spinlock will give exactly that behaviour, as long as the
"load-and-zero" is not used for the xchg _value_ itself but only for
the spinlock, which is the obvious implementation.
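
A minimal sketch of such an implementation (arch_xchg_lock is a hypothetical
lock name; on parisc the spinlock itself would be built from load-and-zero):

static inline unsigned long locked_xchg(volatile unsigned long *ptr,
                                        unsigned long new)
{
        unsigned long old, flags;

        spin_lock_irqsave(&arch_xchg_lock, flags);
        old = *ptr;     /* plain load: a reader never sees a transient zero */
        *ptr = new;     /* plain store: the word is always old or new */
        spin_unlock_irqrestore(&arch_xchg_lock, flags);
        return old;
}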

So we're fine. The parisc implementation isn't the fastest in the world,
but hey, that's what you get for having bad hardware support for SMP.

Linus



Re: [PATCH] move xchg/cmpxchg to atomic.h

2001-01-02 Thread Alan Cox

> > We really can't.  We _only_ have load-and-zero.  And it has to be 16-byte
> > aligned.  xchg() is just not something the CPU implements.
> 
> The network code relies on the reader-xchg semantics David described in 
> several places.

I guess the network code will just have to change for 2.5. read_xchg_val()
can be a null macro for everyone else, at least.




Re: [PATCH] move xchg/cmpxchg to atomic.h

2001-01-02 Thread Andi Kleen

On Tue, Jan 02, 2001 at 11:22:42AM +0000, Matthew Wilcox wrote:
> On Tue, Jan 02, 2001 at 01:03:48AM -0800, David S. Miller wrote:
> > If you require an external agent (f.e. your spinlock) because you
> > cannot implement xchg with a real atomic sequence, this breaks the
> > above assumptions.
> 
> We really can't.  We _only_ have load-and-zero.  And it has to be 16-byte
> aligned.  xchg() is just not something the CPU implements.

The network code relies on the reader-xchg semantics David described in 
several places.

-Andi



Re: [PATCH] move xchg/cmpxchg to atomic.h

2001-01-02 Thread Matthew Wilcox

On Tue, Jan 02, 2001 at 01:03:48AM -0800, David S. Miller wrote:
> If you require an external agent (f.e. your spinlock) because you
> cannot implement xchg with a real atomic sequence, this breaks the
> above assumptions.

We really can't.  We _only_ have load-and-zero.  And it has to be 16-byte
aligned.  xchg() is just not something the CPU implements.

-- 
Revolutions do not require corporate support.



Re: [PATCH] move xchg/cmpxchg to atomic.h

2001-01-02 Thread David S. Miller

   Date: Tue, 2 Jan 2001 11:22:42 +0000
   From: Matthew Wilcox <[EMAIL PROTECTED]>

   We really can't.  We _only_ have load-and-zero.  And it has to be
   16-byte aligned.  xchg() is just not something the CPU implements.

Oh bugger... you do have real problems.

Later,
David S. Miller
[EMAIL PROTECTED]



Re: [PATCH] move xchg/cmpxchg to atomic.h

2001-01-02 Thread David S. Miller

   Date: Tue, 2 Jan 2001 00:11:57 -0800 (PST)
   From: Grant Grundler <[EMAIL PROTECTED]>

   Fundamental problem is parisc only supports one atomic operation
   (LDCW/LDCD) and uses spinlocks for all atomic operations including
   xchg/cmpxchg.

Using spinlocks for the implementation of xchg on SMP might be
problematic.

If you implement things like this, several subtle things might
break.  For example, there is code in a few spots (or, at least at one
time there was) which assumed the update of the datum itself is atomic
and uses this assumption to do lock-free read-only accesses of the
data.

If you require an external agent (f.e. your spinlock) because you
cannot implement xchg with a real atomic sequence, this breaks the
above assumptions.
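
For instance, a lock-free reader of that kind might do no more than the
following (an illustrative sketch of the assumption, with inspect() as a
stand-in, not a quote from actual kernel code):

        snap = list->head;      /* plain read, no lock taken: assumed to
                                   see either the old or the new head */
        if (snap)
                inspect(snap);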

It is very common to do things like:

producer(elem)
{
        elem->next = list->head;
        xchg(&list->head, elem);
}

consumer()
{
        local_list = xchg(&list->head, NULL);
        for_each(elem, local_list)
                do_something(elem);
}

In fact we had code exactly like this in the buffer cache at one
point in time.

Later,
David S. Miller
[EMAIL PROTECTED]



[PATCH] move xchg/cmpxchg to atomic.h

2001-01-01 Thread Grant Grundler


On the parisc-linux mailing list, Grant Grundler wrote:
> After surveying all the arches that define __HAVE_ARCH_CMPXCHG:
> 
> ./include/asm-alpha/system.h:#define __HAVE_ARCH_CMPXCHG 1
> ./include/asm-i386/system.h:#define __HAVE_ARCH_CMPXCHG 1
> ./include/asm-ia64/system.h:#define __HAVE_ARCH_CMPXCHG 1
> ./include/asm-ppc/system.h:#define __HAVE_ARCH_CMPXCHG  1
> ./include/asm-sparc64/system.h:#define __HAVE_ARCH_CMPXCHG 1
> 
> I've come to the conclusion xchg/cmpxchg definitions do NOT
> belong in system.h.  AFAICT, all the above use Load Linked semantics
> (or in the i386 case, operation is atomic). In other words, xchg/cmpxchg
> are atomic operations.  Shouldn't xchg/cmpxchg definitions live
> with other atomic operations - asm/atomic.h?
 
On Sat, 30 Dec 2000 16:46:57 +0000 (GMT), Alan Cox replied:
| Seems a reasonable thing to try and move to atomic.h yes

The fundamental problem is that parisc only supports one atomic operation
(LDCW/LDCD) and uses spinlocks for all atomic operations, including
xchg/cmpxchg. The issue is that the dependencies between system.h, atomic.h
and spinlock.h are *really* ugly and prevented the parisc port from
inlining the xchg/cmpxchg definitions. This is a first step in fixing
that problem.
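
For reference, a sketch of the kind of spinlock parisc can build from its one
primitive (illustrative only; the names and asm details are simplified from
the real port): LDCW atomically loads a 16-byte-aligned word and stores zero
to it, so "lock free" is any non-zero value and "lock held" is zero.

typedef struct {
        volatile unsigned int lock __attribute__((aligned(16)));
} pa_spinlock_t;

static inline void pa_spin_lock(pa_spinlock_t *s)
{
        unsigned int got;

        do {
                /* atomically read the lock word and leave zero behind */
                __asm__ __volatile__("ldcw 0(%1),%0"
                                     : "=r" (got)
                                     : "r" (&s->lock)
                                     : "memory");
        } while (got == 0);     /* zero means someone else already held it */
}

static inline void pa_spin_unlock(pa_spinlock_t *s)
{
        __asm__ __volatile__("" : : : "memory");  /* compiler barrier */
        s->lock = 1;            /* any non-zero value releases the lock */
}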

I've already made this change to the parisc-linux source tree for parisc
and parisc64 builds. Below is the i386 patch for linux-2.4.0-prerelease.
This is a simple cut/paste.

thanks,
grant

diff -ruNp linux/include/asm-i386/atomic.h linux.patch/include/asm-i386/atomic.h
--- linux/include/asm-i386/atomic.h Sun Dec 31 11:10:16 2000
+++ linux.patch/include/asm-i386/atomic.h   Mon Jan  1 23:28:08 2001
@@ -2,6 +2,7 @@
 #define __ARCH_I386_ATOMIC__
 
 #include <linux/config.h>
+#include <linux/bitops.h> /* for LOCK_PREFIX */
 
 /*
  * Atomic operations that C can't guarantee us.  Useful for
@@ -111,4 +112,136 @@ __asm__ __volatile__(LOCK "andl %0,%1" \
 __asm__ __volatile__(LOCK "orl %0,%1" \
 : : "r" (mask),"m" (*addr) : "memory")
 
+
+/* xchg/cmpxchg moved from asm/system.h */
+#define xchg(ptr,v) ((__typeof__(*(ptr)))__xchg((unsigned 
+long)(v),(ptr),sizeof(*(ptr
+
+#define tas(ptr) (xchg((ptr),1))
+
+struct __xchg_dummy { unsigned long a[100]; };
+#define __xg(x) ((struct __xchg_dummy *)(x))
+
+
+/*
+ * The semantics of XCHGCMP8B are a bit strange, this is why
+ * there is a loop and the loading of %%eax and %%edx has to
+ * be inside. This inlines well in most cases, the cached
+ * cost is around ~38 cycles. (in the future we might want
+ * to do an SIMD/3DNOW!/MMX/FPU 64-bit store here, but that
+ * might have an implicit FPU-save as a cost, so it's not
+ * clear which path to go.)
+ */
+extern inline void __set_64bit (unsigned long long * ptr,
+   unsigned int low, unsigned int high)
+{
+__asm__ __volatile__ (
+   "1: movl (%0), %%eax;
+   movl 4(%0), %%edx;
+   cmpxchg8b (%0);
+   jnz 1b"
+   ::  "D"(ptr),
+   "b"(low),
+   "c"(high)
+   :
+   "ax","dx","memory");
+}
+
+extern void inline __set_64bit_constant (unsigned long long *ptr,
+unsigned long long value)
+{
+   __set_64bit(ptr,(unsigned int)(value), (unsigned int)((value)>>32ULL));
+}
+#define ll_low(x)  *(((unsigned int*)&(x))+0)
+#define ll_high(x) *(((unsigned int*)&(x))+1)
+
+extern void inline __set_64bit_var (unsigned long long *ptr,
+unsigned long long value)
+{
+   __set_64bit(ptr,ll_low(value), ll_high(value));
+}
+
+#define set_64bit(ptr,value) \
+(__builtin_constant_p(value) ? \
+ __set_64bit_constant(ptr, value) : \
+ __set_64bit_var(ptr, value) )
+
+#define _set_64bit(ptr,value) \
+(__builtin_constant_p(value) ? \
+ __set_64bit(ptr, (unsigned int)(value), (unsigned int)((value)>>32ULL) ) : \
+ __set_64bit(ptr, ll_low(value), ll_high(value)) )
+
+/*
+ * Note: no "lock" prefix even on SMP: xchg always implies lock anyway
+ * Note 2: xchg has side effect, so that attribute volatile is necessary,
+ *   but generally the primitive is invalid, *ptr is output argument. --ANK
+ */
+static inline unsigned long __xchg(unsigned long x, volatile void * ptr, int size)
+{
+   switch (size) {
+   case 1:
+   __asm__ __volatile__("xchgb %b0,%1"
+   :"=q" (x)
+   :"m" (*__xg(ptr)), "0" (x)
+   :"memory");
+   break;
+   case 2:
+   __asm__ __volatile__("xchgw %w0,%1"
+   :"=r" (x)
+   :"m" (*__xg(ptr)), "0" (x)
+   :"memory");
+   break;
+   case 4:
+   __asm__ __volatile__("xchgl %0,%1"
+   :"=r" (x)
+   :"m" (*__xg(ptr)), "0" (x)
+   :"memory");
+   
