Re: [PATCH] move xchg/cmpxchg to atomic.h
Roman Zippel wrote:
> On Tue, 2 Jan 2001, David S. Miller wrote:
> >    We really can't.  We _only_ have load-and-zero.  And it has to be
> >    16-byte aligned.  xchg() is just not something the CPU implements.
> >
> > Oh bugger... you do have real problems.
>
> For 2.5 we could move all the atomic functions from atomic.h, bitops.h
> and system.h and give them a common interface.  We could also give them
> a new argument of type atomic_spinlock_t, which is a normal spinlock,
> but only used on architectures which need it; everyone else can
> "optimize" it away.  I think one such lock per major subsystem should
> be enough, as the lock is only held for a very short time, so
> contention should be no problem.  Anyway, this would have the huge
> advantage that we could use the complete 32/64 bits of the atomic
> value, e.g. for pointer operations.

*Yes*, and I could write:

	waiters = xchg(&bdflush_waiters.counter, 0);

instead of:

	waiters = atomic_read(&bdflush_waiters);
	atomic_sub(waiters, &bdflush_waiters);

in my daemon wakeup patch.

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] move xchg/cmpxchg to atomic.h
Hi,

On Tue, 2 Jan 2001, David S. Miller wrote:

>    We really can't.  We _only_ have load-and-zero.  And it has to be
>    16-byte aligned.  xchg() is just not something the CPU implements.
>
> Oh bugger... you do have real problems.

For 2.5 we could move all the atomic functions from atomic.h, bitops.h
and system.h and give them a common interface.  We could also give them
a new argument of type atomic_spinlock_t, which is a normal spinlock,
but only used on architectures which need it; everyone else can
"optimize" it away.  I think one such lock per major subsystem should be
enough, as the lock is only held for a very short time, so contention
should be no problem.  Anyway, this would have the huge advantage that
we could use the complete 32/64 bits of the atomic value, e.g. for
pointer operations.

bye, Roman
Re: [PATCH] move xchg/cmpxchg to atomic.h
David,
Sorry for being dense - but I don't see the problem in using a spinlock
to implement xchg().  The example algorithm looks broken.  Or am I
missing something obvious here?

"David S. Miller" wrote:
> It is very common to do things like:
>
>	producer(elem)
>	{
>		elem->next = list->head;
>		xchg(&list->head, elem);
>	}
>
>	consumer()
>	{
>		local_list = xchg(&list->head, NULL);
>		for_each(elem, local_list)
>			do_something(elem);
>	}

producer() looks broken.  The problem is that two producers can race,
and one will put the wrong value of list->head in elem->next.  I think
prepending to list->head needs either to be protected by a spinlock or
to be a per-CPU data structure.

consumer() should be OK, assuming the code can tolerate picking up
"late arrivals" in the next pass.

Or am I missing something obvious here?

It's worse if producer() were inlined: the arch-specific optimisers
might re-order the "elem->next = list->head" statement to be quite a bit
more than 1 or 2 cycles away from the xchg() operation.

thanks,
grant

Grant Grundler
Unix Systems Enablement Lab
+1.408.447.7253
Re: [PATCH] move xchg/cmpxchg to atomic.h
In article <[EMAIL PROTECTED]>, Alan Cox <[EMAIL PROTECTED]> wrote:
> > > We really can't.  We _only_ have load-and-zero.  And it has to be
> > > 16-byte aligned.  xchg() is just not something the CPU implements.
> >
> > The network code relies on the reader-xchg semantics David described
> > in several places.
>
> I guess the network code will just have to change for 2.5.
> read_xchg_val() can be a null macro for everyone else at least

You can easily do reader-xchg semantics even if you don't have an atomic
xchg and are using spinlocks.  In fact, it will work the obvious way
correctly, assuming that the reader gets either the old value or the new
value, but not something "partway old and new".

An xchg() that uses spinlocks and a simple read+write inside the
spinlock will give exactly that behaviour, as long as the
"load-and-zero" is not used on the xchg _value_ itself, but only on the
spinlock - which is the obvious implementation anyway.

So we're fine.  The parisc implementation isn't the fastest in the
world, but hey, that's what you get for having bad hardware support for
SMP.

		Linus
Re: [PATCH] move xchg/cmpxchg to atomic.h
> > We really can't.  We _only_ have load-and-zero.  And it has to be
> > 16-byte aligned.  xchg() is just not something the CPU implements.
>
> The network code relies on the reader-xchg semantics David described in
> several places.

I guess the network code will just have to change for 2.5.
read_xchg_val() can be a null macro for everyone else, at least.
Re: [PATCH] move xchg/cmpxchg to atomic.h
On Tue, Jan 02, 2001 at 11:22:42AM +0000, Matthew Wilcox wrote:
> On Tue, Jan 02, 2001 at 01:03:48AM -0800, David S. Miller wrote:
> > If you require an external agent (f.e. your spinlock) because you
> > cannot implement xchg with a real atomic sequence, this breaks the
> > above assumptions.
>
> We really can't.  We _only_ have load-and-zero.  And it has to be
> 16-byte aligned.  xchg() is just not something the CPU implements.

The network code relies on the reader-xchg semantics David described in
several places.

-Andi
Re: [PATCH] move xchg/cmpxchg to atomic.h
On Tue, Jan 02, 2001 at 01:03:48AM -0800, David S. Miller wrote:
> If you require an external agent (f.e. your spinlock) because you
> cannot implement xchg with a real atomic sequence, this breaks the
> above assumptions.

We really can't.  We _only_ have load-and-zero.  And it has to be
16-byte aligned.  xchg() is just not something the CPU implements.

--
Revolutions do not require corporate support.
Re: [PATCH] move xchg/cmpxchg to atomic.h
   Date: Tue, 2 Jan 2001 11:22:42 +0000
   From: Matthew Wilcox <[EMAIL PROTECTED]>

   We really can't.  We _only_ have load-and-zero.  And it has to be
   16-byte aligned.  xchg() is just not something the CPU implements.

Oh bugger... you do have real problems.

Later,
David S. Miller
[EMAIL PROTECTED]
Re: [PATCH] move xchg/cmpxchg to atomic.h
   Date: Tue, 2 Jan 2001 00:11:57 -0800 (PST)
   From: Grant Grundler <[EMAIL PROTECTED]>

   Fundamental problem is parisc only supports one atomic operation
   (LDCW/LDCD) and uses spinlocks for all atomic operations including
   xchg/cmpxchg.

Using spinlocks for the implementation of xchg on SMP might be
problematic.  If you implement things like this, several subtle things
might break.  For example, there is code in a few spots (or, at least at
one time there was) which assumed the update of the datum itself is
atomic, and uses this assumption to do lock-free read-only accesses of
the data.

If you require an external agent (f.e. your spinlock) because you
cannot implement xchg with a real atomic sequence, this breaks the
above assumptions.

It is very common to do things like:

	producer(elem)
	{
		elem->next = list->head;
		xchg(&list->head, elem);
	}

	consumer()
	{
		local_list = xchg(&list->head, NULL);
		for_each(elem, local_list)
			do_something(elem);
	}

In fact we had code exactly like this in the buffer cache at one point
in time.

Later,
David S. Miller
[EMAIL PROTECTED]
[PATCH] move xchg/cmpxchg to atomic.h
On the parisc-linux mailing list, Grant Grundler wrote:
> After surveying all the arches that define __HAVE_ARCH_CMPXCHG:
>
> ./include/asm-alpha/system.h:#define __HAVE_ARCH_CMPXCHG 1
> ./include/asm-i386/system.h:#define __HAVE_ARCH_CMPXCHG 1
> ./include/asm-ia64/system.h:#define __HAVE_ARCH_CMPXCHG 1
> ./include/asm-ppc/system.h:#define __HAVE_ARCH_CMPXCHG 1
> ./include/asm-sparc64/system.h:#define __HAVE_ARCH_CMPXCHG 1
>
> I've come to the conclusion that the xchg/cmpxchg definitions do NOT
> belong in system.h.  AFAICT, all the above use Load Linked semantics
> (or in the i386 case, the operation is atomic).  In other words,
> xchg/cmpxchg are atomic operations.  Shouldn't the xchg/cmpxchg
> definitions live with the other atomic operations - in asm/atomic.h?

On Sat, 30 Dec 2000 16:46:57 +0000 (GMT), Alan Cox replied:
| Seems a reasonable thing to try and move to atomic.h yes

The fundamental problem is that parisc only supports one atomic
operation (LDCW/LDCD) and uses spinlocks for all atomic operations,
including xchg/cmpxchg.  The issue is that the dependencies between
system.h, atomic.h and spinlock.h are *really* ugly and prevented the
parisc port from inlining the xchg/cmpxchg definitions.  This is a
first step in fixing that problem.

I've already made this change to the parisc-linux source tree for the
parisc and parisc64 builds.  Below is the i386 patch for
linux-2.4.0-prerelease.  This is a simple cut/paste.

thanks,
grant

diff -ruNp linux/include/asm-i386/atomic.h linux.patch/include/asm-i386/atomic.h
--- linux/include/asm-i386/atomic.h	Sun Dec 31 11:10:16 2000
+++ linux.patch/include/asm-i386/atomic.h	Mon Jan  1 23:28:08 2001
@@ -2,6 +2,7 @@
 #define __ARCH_I386_ATOMIC__
 
 #include <linux/config.h>
+#include <linux/bitops.h>	/* for LOCK_PREFIX */
 
 /*
  * Atomic operations that C can't guarantee us.  Useful for
@@ -111,4 +112,136 @@
 __asm__ __volatile__(LOCK "andl %0,%1" \
 __asm__ __volatile__(LOCK "orl %0,%1" \
 : : "r" (mask),"m" (*addr) : "memory")
+
+/* xchg/cmpxchg moved from asm/system.h */
+#define xchg(ptr,v) ((__typeof__(*(ptr)))__xchg((unsigned long)(v),(ptr),sizeof(*(ptr))))
+
+#define tas(ptr) (xchg((ptr),1))
+
+struct __xchg_dummy { unsigned long a[100]; };
+#define __xg(x) ((struct __xchg_dummy *)(x))
+
+/*
+ * The semantics of XCHGCMP8B are a bit strange, this is why
+ * there is a loop and the loading of %%eax and %%edx has to
+ * be inside.  This inlines well in most cases, the cached
+ * cost is around ~38 cycles.  (in the future we might want
+ * to do an SIMD/3DNOW!/MMX/FPU 64-bit store here, but that
+ * might have an implicit FPU-save as a cost, so it's not
+ * clear which path to go.)
+ */
+extern inline void __set_64bit (unsigned long long *ptr,
+		unsigned int low, unsigned int high)
+{
+	__asm__ __volatile__ (
+		"1:	movl (%0), %%eax;
+			movl 4(%0), %%edx;
+			cmpxchg8b (%0);
+			jnz 1b"
+		::	"D"(ptr),
+			"b"(low),
+			"c"(high)
+		:
+			"ax","dx","memory");
+}
+
+extern void inline __set_64bit_constant (unsigned long long *ptr,
+		unsigned long long value)
+{
+	__set_64bit(ptr, (unsigned int)(value), (unsigned int)((value)>>32ULL));
+}
+
+#define ll_low(x)	*(((unsigned int*)&(x))+0)
+#define ll_high(x)	*(((unsigned int*)&(x))+1)
+
+extern void inline __set_64bit_var (unsigned long long *ptr,
+		unsigned long long value)
+{
+	__set_64bit(ptr, ll_low(value), ll_high(value));
+}
+
+#define set_64bit(ptr,value) \
+(__builtin_constant_p(value) ? \
+ __set_64bit_constant(ptr, value) : \
+ __set_64bit_var(ptr, value) )
+
+#define _set_64bit(ptr,value) \
+(__builtin_constant_p(value) ? \
+ __set_64bit(ptr, (unsigned int)(value), (unsigned int)((value)>>32ULL) ) : \
+ __set_64bit(ptr, ll_low(value), ll_high(value)) )
+
+/*
+ * Note: no "lock" prefix even on SMP: xchg always implies lock anyway
+ * Note 2: xchg has side effect, so that attribute volatile is necessary,
+ *	   but generally the primitive is invalid, *ptr is output argument. --ANK
+ */
+static inline unsigned long __xchg(unsigned long x, volatile void *ptr, int size)
+{
+	switch (size) {
+		case 1:
+			__asm__ __volatile__("xchgb %b0,%1"
+				:"=q" (x)
+				:"m" (*__xg(ptr)), "0" (x)
+				:"memory");
+			break;
+		case 2:
+			__asm__ __volatile__("xchgw %w0,%1"
+				:"=r" (x)
+				:"m" (*__xg(ptr)), "0" (x)
+				:"memory");
+			break;
+		case 4:
+			__asm__ __volatile__("xchgl %0,%1"
+				:"=r" (x)
+				:"m" (*__xg(ptr)), "0" (x)
+				:"memory");