Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
Russell King <[EMAIL PROTECTED]> wrote: > Yes you can. Well, you can on ARM at least. Between the load exclusive > you can do anything you like until you hit the store exclusive. How come atomic_set() on arm6 is implemented as: static inline void atomic_set(atomic_t *v, int i) { unsigned long tmp; __asm__ __volatile__("@ atomic_set\n" "1: ldrex %0, [%1]\n" " strex %0, %2, [%1]\n" " teq %0, #0\n" " bne 1b" : "=&r" (tmp) : "r" (&v->counter), "r" (i) : "cc"); } Why LDREX/STREX and not direct assignment? David - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
Russell King wrote: On Fri, Dec 08, 2006 at 12:18:52PM +1100, Nick Piggin wrote: Russell King wrote: On Thu, Dec 07, 2006 at 08:31:08PM +1100, Nick Piggin wrote: Implementing ll/sc based accessor macros allows both ll/sc _and_ cmpxchg architectures to produce optimal code. Implementing an cmpxchg based accessor macro allows cmpxchg architectures to produce optimal code and ll/sc non-optimal code. See my point? Wrong. Your ll/sc implementation with cmpxchg is buggy. The cmpxchg load_locked is not locked at all, Intentional - cmpxchg architectures don't generally have a load locked. Exactly, so it is wrong -- you can't implement that behaviour with load + cmpxchg. I disagree. I _have_ implemented the required behaviour. I really don't understand your point saying that it is wrong. and there can be interleaving writes between the load and cmpxchg which do not cause the store_conditional to fail. In which case the cmpxchg fails and we do the atomic operation again, in exactly the same way that we do the operation again if the 'sc' fails in the ll/sc case. Not if cmpxchg sees the same value, it won't fail, regardless of how many writes have hit that memory address. Don't see anything wrong with that. If that was a problem, atomic implementations using cmpxchg on x86 would be impossible. I think you're trying to implement ll/sc semantics on CPUs without ll/sc which is exactly not what I'm trying to do. I'd argue that's impossible. Yes, I did think that from reading your emails. It is not a problem as such, but it is important to be clear on semantics. I'm trying to suggest a better implementation for atomic ops rather than just bowing to this x86-centric "cmpxchg is the best, everyone must implement it" mentality. Even if ARM is able to handle any arbitrary C code between the "load locked" and store conditional API, other architectures can not by definition. -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Fri, 8 Dec 2006, Russell King wrote: >> > The only instructions which affect the exclusive access state are the > exclusive access instructions themselves. Not according to the docs I found. The ARM1136 manual explicitly states that any attempt to modify that address clears the tag (for shared memory regions, by _any_ CPU, and for nonshared regions by _that_ CPU). And btw, that _has_ to be true, because otherwise the whole ldrex/strex sequence would be totally unusable as a way to do atomic bit operations on UP in the presense of interrupts (well, you could have a clrex instruction in the interrupt handler, but as far as I know you don't, so that seems to be a moot point - you only seem to do it in __switch_to). In other words, I _really_ think you're wrong. So the very code sequence you quote MUST NOT WORK the way you claim it does. And not only that, since the granularity of the mark is not just for the bytes in question, but potentially apparently up to 128 bytes, any store even _close_ to the memory you had xclusive access to will break the exclusive access. Really, Russell. Your stance makes no sense. It doesn't make any sense from a microarchitectural standpoint (it's not how you'd normally implement these things), but it ALSO makes no sense from the way you already use those instructions (as a way to protect against other processors - including your own - touching that word). Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Fri, 8 Dec 2006, Russell King wrote: > > No such restriction on ARM. > > Also not true. The architectural implementation is: I checked the ARM manuals, and quite fankly, they don't back you up. They do not claim that the physical address tag is byte-granular, and in fact they make it pretty clear that the same tag is used for all the sizes, which implies that the tag granularity is NOT byte granular, but likely something else. Both the manuals I checked also say: "Other events might cause the tag to be cleared", without going into particular details other than saying that a region that is marked non-shared might still clear the tag on access by other CPU's - but they leave it open whether that's by design or not. In other words, if there actually is an architectural guarantee that ldrex/strex are really as strong as you imply, it's not in the standard architecture manuals from ARM at least for the ARM11562 or the ARM1136. So I suspect you're wrong, and that the ldrex/strex tags actually are not all that different from other archtiectures which tend to have cacheline granularities or more (I _think_ the original alpha granularity was the whole address space, and any cache traffic would clear it. That's _really_ pathetically weak, but hey, I might remember wrong, and it was the very first implementation. I doubt ARM is _that_ weak, but I doubt it's as strong as you claim). Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Fri, Dec 08, 2006 at 11:35:39AM -0800, Linus Torvalds wrote: > > > On Fri, 8 Dec 2006, Russell King wrote: > > > > Yes you can. Well, you can on ARM at least. Between the load exclusive > > you can do anything you like until you hit the store exclusive. If you > > haven't touched the location (with anything other than another load > > exclusive) and no other CPU has issued a load exclusive, your store > > exclusive succeeds. > > Is that actually true? > > Almost all LL/SC implementations have granularity rules, where "touch the > location" is not a byte-granular thing, but actually ends up being > something like "touch the same cachline". ARM's implementation works by using some CPU core peripheral hardware to hold two pieces of information: 1. the physical address being accessed by the load exclusive instruction. 2. the cpu number. When an exclusive load is performed, this hardware "remembers" bits [31:N] (7 >= N >= 2 *) of the physical address and the CPU number. When an exclusive store is performed, the address and CPU number are compared to the stored versions, and the store is only allowed to succeed if they match. Hence, it is independent of the actual cache state, and is relatively cheap to implement. The only instructions which affect the exclusive access state are the exclusive access instructions themselves. Hence, to the same memory location with the following sequence: load exclusive load normal modify store normal store exclusive the store exclusive _will_ succeed even though you've modified the value in that memory location via standard loads/stores. (* - while it is true that the physical address is stored, implementations may themselves choose not to store between the lower 7 to 2 bits inclusive - which incidentally means putting two atomic_t's next together is probably a Bad Thing(tm)... and rather sucks. However, non-exclusive accesses within that address region do not affect the exclusive access state.) -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Fri, Dec 08, 2006 at 11:37:45AM -0800, Linus Torvalds wrote: > > > On Fri, 8 Dec 2006, Russell King wrote: > > > > I utterly disagree. I could code atomic_add() as: > > Sure. And Alpha could do that too. If you write the C code a specific way, > you can make it work. That does NOT mean that you can expose it widely as > a portable interface - it's still just a very _nonportable_ interface that > you use internally within one architecture to implement other interfaces. However, nothing stops you wrapping the non-portable nature of ll/sc up into the store part though. If you can efficiently implement cmpxchg inside an ll/sc based portable interface (yes you can) and you can implement problematical ll/sc structures inside a cmpxchg() interface, you can do it either way around. Only one way doesn't penalise broken ll/sc based implementations though. That is the essence of my argument. -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Fri, 8 Dec 2006, Russell King wrote: > > Yes you can. Well, you can on ARM at least. Between the load exclusive > you can do anything you like until you hit the store exclusive. If you > haven't touched the location (with anything other than another load > exclusive) and no other CPU has issued a load exclusive, your store > exclusive succeeds. Is that actually true? Almost all LL/SC implementations have granularity rules, where "touch the location" is not a byte-granular thing, but actually ends up being something like "touch the same cachline". They also often have _other_ rules like: "the cacheline has to stay in the L1 in exclusive state" etc. Which means that in a direct-mapped L1 cache, you can't even load anything that might be in the same way, because it would cause a cache eviction that invalidates the SC. It's possible that ARM has really strong LL/SC, but quite frankly, that sounds unlikely. I've never heard of anybody ever _architecturally_ saying that they support that strong requirements, even if certain micro- architectures might actually support stronger semantics than the ones guaranteed by the architectural rules. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Fri, 8 Dec 2006, Russell King wrote: > > I utterly disagree. I could code atomic_add() as: Sure. And Alpha could do that too. If you write the C code a specific way, you can make it work. That does NOT mean that you can expose it widely as a portable interface - it's still just a very _nonportable_ interface that you use internally within one architecture to implement other interfaces. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Fri, Dec 08, 2006 at 11:15:58AM -0800, Linus Torvalds wrote: > On Fri, 8 Dec 2006, Christoph Lameter wrote: > > > > As also shown in this thread: There are restrictions on what you can do > > between ll/sc > > This, btw, is almost certainly true on ARM too. > > There are three major reasons for restrictions on ll/sc: > > - bus-cycle induced things (eg variations of "you cannot do a store in >between the ll and the sc, because it will touch the cache and clear >the bit", where "the store" might be a load too, and "the cache" might >be just "the bus interface") No such restriction on ARM. > - trap handling usually clears the internal lock bit too, which means >that depending on the micro-architecture, even internal microtraps >(like even just branch misprediction, but more commonly things like TLB >misses etc) can cause a sc to always fail. Also not true. The architectural implementation is: ldrex: tags the _physical_ address + cpu number, transitions to exclusive access. strex: if in exclusive access state and matches the previous tagged physical address + cpu number pair, store succeeds. This is typically implemented in hardware, and in the case of a SMP system, external to the CPU cores themselves. So all it's doing is looking at the exclusive accesses. It is not embedded into the CPU core so that micro-architectural stuff affects it. > - timing. Livelock in particular. That is a problem that we're facing, and the solution is rather simple. You need to introduce a CPU-number specific number of cycles before retrying the operation on failure. > All of which means that _nobody_ can really do this reliably in C. I utterly disagree. I could code atomic_add() as: extern void cpu_specific_delay(void); static inline int atomic_add_return(int i, atomic_t *v) { do { asm("ldrex %0, [%1]" : "=r" (val) : "r" (v)); val += i; asm("strex %0, %1, [%2]" : "=r" (res) : "r" (val), "r" (v)); if (res) cpu_specific_delay(); } while (res); return val; } and actually we /are/ going to have to go down this path to break the livelock problem, like it or not. Ditto for the ARM bitops operations, so we might as well have ARM at least implement a generic ll/sc thing. Coding the cpu specific delays into every strex site is going to make them far too heavy to put inline. > So right now, I think the "cmpxchg" or the "bitmask set" approach are the > alternatives. Russell - LL/SC simply isn't on the table as an interface, > whether you like it or not. Buggerit then. cmpxchg sucks as an interface, and I am going to continue to assert that. -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Fri, 8 Dec 2006, Christoph Lameter wrote: > > As also shown in this thread: There are restrictions on what you can do > between ll/sc This, btw, is almost certainly true on ARM too. There are three major reasons for restrictions on ll/sc: - bus-cycle induced things (eg variations of "you cannot do a store in between the ll and the sc, because it will touch the cache and clear the bit", where "the store" might be a load too, and "the cache" might be just "the bus interface") - trap handling usually clears the internal lock bit too, which means that depending on the micro-architecture, even internal microtraps (like even just branch misprediction, but more commonly things like TLB misses etc) can cause a sc to always fail. - timing. Livelock in particular. The last one is the one that hits everybody, regardless of microarchitecture. The rule may be that the LL/SC need to be within a certain number of cycles (which can be very small - like ten) in order to guarantee that the cacheline can't be stolen. All of which means that _nobody_ can really do this reliably in C. Even if there are no other microarchitectural rules (and it sounds like that might be true on ARM), the timing issue means that you can _still_ only use it for very specific and simple sequences, and trying to expose it as a higher-level thing is not going to work in general for anything even remotely complicated. (The timing may also mean that you end up having to do random back-off etc, just to make sure _somebody_ makes progress. Ie it might not be a matter of "within ten cycles", but "you need to randomize the timing"). In other words, it's simply not an option to expose LL/SC as an interface. It would be VERY convenient to do, since cmpxchg can emulate ll/sc (the "ll" part is a normal load, the "sc" part is a "compare that the old value still matches, and store the new one if so"). But because you can't expose LL/SC anyway in any reasonably portable way, that just doesn't work. So, you really do end up with three possibilities: - do things with TRULY PORTABLE interfaces. And like it or not, cmpxchg is the closest thing you can get to that. It's trivial to do cmpxchg using ll/sc (modulo the "random backoff part" if you need it, which is still pretty simple, but no longer totally trivial), and architectures that have neither ll/sc _nor_ a native cmpxchg can just go screw themselves with spinlocks - they really aren't worth worrying about in SMP. At some point you have to tell hardware designers that their hardware just sucks. - have ugly conditional code in generic code. I personally think this is a _much_ worse option in most cases. - have a much higher-level interface and make it _all_ architecture- dependent (possibly with a "generic" version for sane architectures). This works, but the more high-level it is, the more you end up having the same thign written in many different ways, and nasty maintenance. So we generally set the bar pretty low. Things like semaphore locking primitives are high-level enough already that we prefer to try to make them use common lower-level interfaces (spinlocks, cmpxchg etc). Something like kernel/workqueue.c is _way_ too high a level to do arch-specific. So right now, I think the "cmpxchg" or the "bitmask set" approach are the alternatives. Russell - LL/SC simply isn't on the table as an interface, whether you like it or not. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Fri, Dec 08, 2006 at 10:46:17AM -0800, Linus Torvalds wrote: > > > On Fri, 8 Dec 2006, David Howells wrote: > > > > In fact I think more things have LL/SC than have CMPXCHG. > > But you cannot expose ll/sc to C, so that's a bogus argument. Yes you can. Well, you can on ARM at least. Between the load exclusive you can do anything you like until you hit the store exclusive. If you haven't touched the location (with anything other than another load exclusive) and no other CPU has issued a load exclusive, your store exclusive succeeds. Whether you have branches, loads or stores to other locations, etc. None of that matters. Not so bogus as you make out. -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Fri, 8 Dec 2006, David Howells wrote: > > In fact I think more things have LL/SC than have CMPXCHG. But you cannot expose ll/sc to C, so that's a bogus argument. If you do ll/sc, you need to program in assembly language. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Fri, 8 Dec 2006, Russell King wrote: > As proven previously the reverse is also true. And as shown previously > the cheaper out of the two for all platforms is the LL/SC based > implementation, where the architecture specific implementation can > be _either_ LL/SC based or cmpxchg based depending on what is > supported in their hardware. As also shown in this thread: There are restrictions on what you can do between ll/sc. You would not want to use C code there. ll/sc is an thing that needs to be restricted to asm code. So this is not a viable proposal at all. ll/sc is useful to construct various atomic functions but cannot be directly used in C code. cmpxchg can be effectively realized using ll/sc. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Fri, Dec 08, 2006 at 09:06:00AM -0800, Christoph Lameter wrote: > On Fri, 8 Dec 2006, David Howells wrote: > > > > It is the most universal atomic instruction that I know of. > > > > I think TAS-type things and XCHG-type things are more common. > > Huh? The most popular architectures are i386 x86_64 sparc ia64 etc which > all have one or the other form of cmpxchg (some issues with early sparc > and i386). According to the latest figures, 621 million ARM processors were shipped in Q3, which equates to about 78 processors per second. (taken from http://www.arm.com/news/15300.html). I'd like to know the figures for these other so-called "popular architectures". But in any case that's an utterly irrelevant point to the argument at hand. > > In fact I think more things have LL/SC than have CMPXCHG. > > LL/SC can be easily used to come up with a cmpxchg equivalent. As proven previously the reverse is also true. And as shown previously the cheaper out of the two for all platforms is the LL/SC based implementation, where the architecture specific implementation can be _either_ LL/SC based or cmpxchg based depending on what is supported in their hardware. I'm going to keep saying this until people get it. -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Fri, 8 Dec 2006, David Howells wrote: > > It is the most universal atomic instruction that I know of. > > I think TAS-type things and XCHG-type things are more common. Huh? The most popular architectures are i386 x86_64 sparc ia64 etc which all have one or the other form of cmpxchg (some issues with early sparc and i386). And yes the xchg was the first multiprocessor instruction and therefore is also available on very old processors. > In fact I think more things have LL/SC than have CMPXCHG. LL/SC can be easily used to come up with a cmpxchg equivalent. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Fri, Dec 08, 2006 at 08:53:22AM -0800, Christoph Lameter wrote: > On Fri, 8 Dec 2006, Russell King wrote: > > > > Not having cmpxchg is even worse because it requires the introduction and > > > maintenance of large sets of arch specific operations. Much more complex. > > > > And which bit of "not available on many architectures" have you not grasped > > yet? > > We discussed various forms of emulating that functionality on this thread. > Seems to work satisfactorily. You can discover the information you skipped > by going back to some earlier messages of this thread. Oh for god sake. I've put forward a coherent argument. I've disproved every point put forward by people for my approach. ARM continues to be broken because people like you are stuck with doing things only one way. Grow up and open your mind to other possibilities. (It should be noted that LDREX/STREX is broken on some ARM implementations at the moment - they will cause a livelock due to silicon bugs if both CPUs have exactly the same number of cycles between LDREX and STREX. This makes loops-within-loops implementations using cmpxchg _extremely_ expensive on ARM.) -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
Christoph Lameter <[EMAIL PROTECTED]> wrote: > cmpxchg is the simplest solution to realize many other atomic operations and > its widely available on a wide variety of platforms. But by no means all. Where it doesn't exist it can only be properly emulated by something such as LL/SC if they are there. If not, then you can't do a safe cmpxchg on SMP. > It is the most universal atomic instruction that I know of. I think TAS-type things and XCHG-type things are more common. In fact I think more things have LL/SC than have CMPXCHG. David - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Fri, 8 Dec 2006, Russell King wrote: > > Not having cmpxchg is even worse because it requires the introduction and > > maintenance of large sets of arch specific operations. Much more complex. > > And which bit of "not available on many architectures" have you not grasped > yet? We discussed various forms of emulating that functionality on this thread. Seems to work satisfactorily. You can discover the information you skipped by going back to some earlier messages of this thread. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Fri, Dec 08, 2006 at 08:43:09AM -0800, Christoph Lameter wrote: > On Fri, 8 Dec 2006, Russell King wrote: > > > You're advocating cmpxchg is adopted by all architectures. It isn't > > available on many architectures, and those which it can be requires > > unnecessarily complicated coding. > > Not having cmpxchg is even worse because it requires the introduction and > maintenance of large sets of arch specific operations. Much more complex. And which bit of "not available on many architectures" have you not grasped yet? -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Fri, 8 Dec 2006, Russell King wrote: > You're advocating cmpxchg is adopted by all architectures. It isn't > available on many architectures, and those which it can be requires > unnecessarily complicated coding. Not having cmpxchg is even worse because it requires the introduction and maintenance of large sets of arch specific operations. Much more complex. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Fri, Dec 08, 2006 at 08:06:23AM -0800, Christoph Lameter wrote: > On Fri, 8 Dec 2006, Russell King wrote: > > > I'm trying to suggest a better implementation for atomic ops rather > > than just bowing to this x86-centric "cmpxchg is the best, everyone > > must implement it" mentality. > > cmpxchg is the simplest solution to realize many other atomic operations > and its widely available on a wide variety of platforms. It is the most > universal atomic instruction that I know of. Other atomic operations may > be more efficient but certainly cmpxchg is the most universal. > > Having multiple instructions with restrictions of what can be done in > between just complicates the use and seems to be arch specific. I have not > seen a better solution. Are you really advocating the weirdly complex > ll/sc be adopted by other architectures? You're advocating cmpxchg is adopted by all architectures. It isn't available on many architectures, and those which it can be requires unnecessarily complicated coding. -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Fri, 8 Dec 2006, Russell King wrote: > I'm trying to suggest a better implementation for atomic ops rather > than just bowing to this x86-centric "cmpxchg is the best, everyone > must implement it" mentality. cmpxchg is the simplest solution to realize many other atomic operations and its widely available on a wide variety of platforms. It is the most universal atomic instruction that I know of. Other atomic operations may be more efficient but certainly cmpxchg is the most universal. Having multiple instructions with restrictions of what can be done in between just complicates the use and seems to be arch specific. I have not seen a better solution. Are you really advocating the weirdly complex ll/sc be adopted by other architectures? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Thu, Dec 07, 2006 at 03:06:06PM +, Russell King wrote: > On Wed, Dec 06, 2006 at 11:05:22AM -0800, Linus Torvalds wrote: > > > > > > On Wed, 6 Dec 2006, Christoph Lameter wrote: > > > > > > I'd really appreciate a cmpxchg that is generically available for > > > all arches. It will allow lockless implementation for various performance > > > criticial portions of the kernel. > > > > I suspect ARM may have been the last one without one, no? > > > > That said, cmpxchg won't necessarily be "high-performance" unless the hw > > supports it naturally in hardware, so.. > > > > Russell, are you ok with the code DavidH posted (the "try 2" one)? I'd > > like to get an ack from the ARM maintainer before applying it, but it > > looked ok. > > Things seem to have moved on since this request. Do I need to do > anything? I dunno. ARM's still broken. -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Fri, Dec 08, 2006 at 12:18:52PM +1100, Nick Piggin wrote: > Russell King wrote: > >On Thu, Dec 07, 2006 at 08:31:08PM +1100, Nick Piggin wrote: > > >>>Implementing ll/sc based accessor macros allows both ll/sc _and_ cmpxchg > >>>architectures to produce optimal code. > >>> > >>>Implementing an cmpxchg based accessor macro allows cmpxchg architectures > >>>to produce optimal code and ll/sc non-optimal code. > >>> > >>>See my point? > >> > >>Wrong. Your ll/sc implementation with cmpxchg is buggy. The cmpxchg > >>load_locked is not locked at all, > > > > > >Intentional - cmpxchg architectures don't generally have a load locked. > > Exactly, so it is wrong -- you can't implement that behaviour with > load + cmpxchg. I disagree. I _have_ implemented the required behaviour. I really don't understand your point saying that it is wrong. > >>and there can be interleaving writes > >>between the load and cmpxchg which do not cause the store_conditional > >>to fail. > > > > > >In which case the cmpxchg fails and we do the atomic operation again, > >in exactly the same way that we do the operation again if the 'sc' > >fails in the ll/sc case. > > Not if cmpxchg sees the same value, it won't fail, regardless of how > many writes have hit that memory address. Don't see anything wrong with that. If that was a problem, atomic implementations using cmpxchg on x86 would be impossible. I think you're trying to implement ll/sc semantics on CPUs without ll/sc which is exactly not what I'm trying to do. I'd argue that's impossible. I'm trying to suggest a better implementation for atomic ops rather than just bowing to this x86-centric "cmpxchg is the best, everyone must implement it" mentality. -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
Russell King wrote: On Thu, Dec 07, 2006 at 08:31:08PM +1100, Nick Piggin wrote: Implementing ll/sc based accessor macros allows both ll/sc _and_ cmpxchg architectures to produce optimal code. Implementing an cmpxchg based accessor macro allows cmpxchg architectures to produce optimal code and ll/sc non-optimal code. See my point? Wrong. Your ll/sc implementation with cmpxchg is buggy. The cmpxchg load_locked is not locked at all, Intentional - cmpxchg architectures don't generally have a load locked. Exactly, so it is wrong -- you can't implement that behaviour with load + cmpxchg. and there can be interleaving writes between the load and cmpxchg which do not cause the store_conditional to fail. In which case the cmpxchg fails and we do the atomic operation again, in exactly the same way that we do the operation again if the 'sc' fails in the ll/sc case. Not if cmpxchg sees the same value, it won't fail, regardless of how many writes have hit that memory address. I do not see any problem. This was not the big problem -- as I said, if this was the only problem we could opt for a "watered down" version that doesn't actually load locked [the ll/sc interface would be much cooler than cmpxchg, IMO :)] The main problem is the restrictions between the ll and sc. This is why I implemented atomic_cmpxchg rather than atomic_ll/sc. However I agree that in critical code, a higher level API should be used instead (eg. see atomic_add_unless, which can be implemented optimally on RISCs). Nick -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Thu, Dec 07, 2006 at 08:31:08PM +1100, Nick Piggin wrote: > Wrong. Your ll/sc implementation with cmpxchg is buggy. The cmpxchg > load_locked is not locked at all, and there can be interleaving writes > between the load and cmpxchg which do not cause the store_conditional > to fail. > > It might be reasonable to implement this watered down version, but: > don't some architectures have restrictions on what instructions can > be issued between the ll and the sc? On MIPS the restriction is no loads or stores or sync instructions between ll/sc. Also there may be at most 2048 bytes between the address of the ll and sc instructions. Which means ll/sc sequences should better be written in assembler or gcc might do a bit too creative things ... Ralf - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Thu, 7 Dec 2006, Nick Piggin wrote: > > It might be reasonable to implement this watered down version, but: > don't some architectures have restrictions on what instructions can > be issued between the ll and the sc? Yes. You really probably do not want to expose ll/sc on a C level because of this. On alpha, the architecture manual says (I didn't go back and check, but I'm pretty sure) that a ld.l and st.c cannot have a taken branch in between then, for example. That basically means that you can't allow the compiler to reorder the basic blocks (which it often will with a while-loop). Now, I actually suspect that this was not a microarchitectural flaw, and that a branch would _work_ there, and that the architecture manual was just being anal, but strictly speaking, it means that these things had better always be in assembly, and you can sadly not expose them (on alpha, at least) as higher-level primitives as such - you can only expose the end result ("cmpxchg" or similar). Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, Dec 06, 2006 at 11:05:22AM -0800, Linus Torvalds wrote: > > > On Wed, 6 Dec 2006, Christoph Lameter wrote: > > > > I'd really appreciate a cmpxchg that is generically available for > > all arches. It will allow lockless implementation for various performance > > criticial portions of the kernel. > > I suspect ARM may have been the last one without one, no? > > That said, cmpxchg won't necessarily be "high-performance" unless the hw > supports it naturally in hardware, so.. > > Russell, are you ok with the code DavidH posted (the "try 2" one)? I'd > like to get an ack from the ARM maintainer before applying it, but it > looked ok. Things seem to have moved on since this request. Do I need to do anything? I dunno. -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Thu, Dec 07, 2006 at 08:31:08PM +1100, Nick Piggin wrote: > Russell King wrote: > >On Wed, Dec 06, 2006 at 11:16:55AM -0800, Christoph Lameter wrote: > > >No. If you read what I said, you'll see that you can _cheaply_ use > >cmpxchg in a ll/sc based implementation. Take an atomic increment > >operation. > > > > do { > > old = load_locked(addr); > > } while (store_exclusive(old, old + 1, addr); > > > >On a cmpxchg, that "store_exclusive" (loosely) becomes your cmpxchg > >instruction, comparing the first arg, and if equal storing the second. > >The "load_locked" macro becomes a standard pointer deref. Ergo, x86 > >becomes: > > > > do { > > load value > > manipulate it > > conditional store > > } while not stored > > > >On ll/sc, the load_locked() macro is the load locked instruction. The > >store_exclusive() macro is the exclusive store and it doesn't need to > >use the first parameter at all. Ergo, ARM becomes: > > > > do { > > ldrex r1, [r2] > > manipulate r1 > > strex r0, r1, [r2] > > } while failed > > > >Notice that both are optimal. > > > >Now let's consider the cmpxchg case. > > > > do { > > val = *addr; > > } while (cmpxchg(val, val + 1, addr); > > > >The x86 case is _identical_ to the ll/sc based implementation. Absolutely > >entirely. No impact what so ever. > > > >Let's look at the ll/sc case. The cmpxchg code implemented on this has > >to reload the original value, compare it, if equal store the new value. > >So: > > > > do { > > val = *addr; > > (r2 = addr, > > ldrex r1, [r2] > > compare r1, r0 > > strexeq r4, r3, [r2] (store exclusive if equal) > > } while store failed or comparecondition failed > > > >Note how the cmpxchg has _forced_ the ll/sc implementation to become > >more complex. > > > >So, let's recap. > > > >Implementing ll/sc based accessor macros allows both ll/sc _and_ cmpxchg > >architectures to produce optimal code. > > > >Implementing an cmpxchg based accessor macro allows cmpxchg architectures > >to produce optimal code and ll/sc non-optimal code. > > > >See my point? > > Wrong. Your ll/sc implementation with cmpxchg is buggy. The cmpxchg > load_locked is not locked at all, Intentional - cmpxchg architectures don't generally have a load locked. > and there can be interleaving writes > between the load and cmpxchg which do not cause the store_conditional > to fail. In which case the cmpxchg fails and we do the atomic operation again, in exactly the same way that we do the operation again if the 'sc' fails in the ll/sc case. I do not see any problem. -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Thu, Dec 07, 2006 at 08:31:08PM +1100, Nick Piggin wrote: > It might be reasonable to implement this watered down version, but: > don't some architectures have restrictions on what instructions can > be issued between the ll and the sc? Yes. On Alpha you cannot execute other load/stores, taken branches etc. So in general case the code between ll and sc cannot be written in C... Ivan. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
Russell King wrote: On Wed, Dec 06, 2006 at 11:16:55AM -0800, Christoph Lameter wrote: No. If you read what I said, you'll see that you can _cheaply_ use cmpxchg in a ll/sc based implementation. Take an atomic increment operation. do { old = load_locked(addr); } while (store_exclusive(old, old + 1, addr); On a cmpxchg, that "store_exclusive" (loosely) becomes your cmpxchg instruction, comparing the first arg, and if equal storing the second. The "load_locked" macro becomes a standard pointer deref. Ergo, x86 becomes: do { load value manipulate it conditional store } while not stored On ll/sc, the load_locked() macro is the load locked instruction. The store_exclusive() macro is the exclusive store and it doesn't need to use the first parameter at all. Ergo, ARM becomes: do { ldrex r1, [r2] manipulate r1 strex r0, r1, [r2] } while failed Notice that both are optimal. Now let's consider the cmpxchg case. do { val = *addr; } while (cmpxchg(val, val + 1, addr); The x86 case is _identical_ to the ll/sc based implementation. Absolutely entirely. No impact what so ever. Let's look at the ll/sc case. The cmpxchg code implemented on this has to reload the original value, compare it, if equal store the new value. So: do { val = *addr; (r2 = addr, ldrex r1, [r2] compare r1, r0 strexeq r4, r3, [r2] (store exclusive if equal) } while store failed or comparecondition failed Note how the cmpxchg has _forced_ the ll/sc implementation to become more complex. So, let's recap. Implementing ll/sc based accessor macros allows both ll/sc _and_ cmpxchg architectures to produce optimal code. Implementing an cmpxchg based accessor macro allows cmpxchg architectures to produce optimal code and ll/sc non-optimal code. See my point? Wrong. Your ll/sc implementation with cmpxchg is buggy. The cmpxchg load_locked is not locked at all, and there can be interleaving writes between the load and cmpxchg which do not cause the store_conditional to fail. It might be reasonable to implement this watered down version, but: don't some architectures have restrictions on what instructions can be issued between the ll and the sc? But in general I agree with you, in that a higher level primitive is preferable (eg. atomic_add_unless). -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
Linus Torvalds wrote: On Thu, 7 Dec 2006, Roman Zippel wrote: On Wed, 6 Dec 2006, Matthew Wilcox wrote: To be honest, it'd be much easier if we only defined these operations on atomic_t's. We have all the infrastructure in place for them, and they're fairly well understood. If you need different sizes, I'm OK with an atomic_pointer_t, or whatever. FWIW Seconded. I disagree. Any _real_ CPU will simply never care about _anything_ else than just the size of the datum in question. There's absolutely no point to only allow it on certain types, especially as we _know_ those certain types are already going to be more than one, and we also know that they are going to be different sizes. In other words, in reality, we have to handle a sizeable subset of the whole generic situation, and the "on certain types only" situation is only going to be awkward and irritating. For example, would we have a different "cmpxchg_ptr()" function for the atomic pointer thing? With any reasonable CPU just depending on the _size_ of the type, I don't see what the problem is with just doing the bog-standard "cmpxchg_8/16/32/64" and having the bog-standard case- statement in a header file to do it all automatically for you, and then we don't need to worry about it. What's wrong with using atomic_cmpxchg? For unsigned long / pointers, there is a patch to implement atomic_long_cmpxchg. Some architectures simply can't implement cmpxchg on memory that may be modified in arbitrary ways. If you add some more conditions to use cmpxchg, then you weaken it (ie. can't use it for synchronisation with userspace) and on top of that you don't get the easy static checking that atomic_t gives you. -- SUSE Labs, Novell Inc. Send instant messages to your online friends http://au.messenger.yahoo.com - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
Matthew Wilcox <[EMAIL PROTECTED]> writes: > On Wed, Dec 06, 2006 at 05:36:29PM -0800, Linus Torvalds wrote: >> Or are you saying that gcc aligns normal 32-bit entities at >> 16-bit alignment? Neither of those sound very likely. > > alignof(u32) is 2 on m68k. Crazy, huh? The original 68000 had a 16-bit bus (but 32-bit registers), which is probably why it's that way. -Doug - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
Hi, On Wed, 6 Dec 2006, Linus Torvalds wrote: > > m68060 produces a trap for unaligned atomic access, unfortunately standard > > alignment is smaller than this. > > Umm. on 68060, since it's 32-bit, you'd only have the 32-bit case. Are you > saying that you can't do a 32-bit atomic access at any 32-bit aligned > boundary? Or are you saying that gcc aligns normal 32-bit entities at > 16-bit alignment? Neither of those sound very likely. The latter. And yes, I'm starting to hate it too, especially after I've seen all the weird bugs this causes in userspace, which then only trigger on m68k. I'm thinking of changing it when switching to proper TLS support, but it's not there yet. BTW there is another reason I actually like the atomic type - documentation. It makes it more clear that this intended to be used as atomic value, so an unhealthy mixture of different accesses is less likely and if they pile up at some place it's a good indicator to maybe switch back to spinlocks. bye, Roman - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, Dec 06, 2006 at 05:36:29PM -0800, Linus Torvalds wrote: > Or are you saying that gcc aligns normal 32-bit entities at > 16-bit alignment? Neither of those sound very likely. alignof(u32) is 2 on m68k. Crazy, huh? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Thu, 7 Dec 2006, Roman Zippel wrote: > > m68060 produces a trap for unaligned atomic access, unfortunately standard > alignment is smaller than this. Umm. on 68060, since it's 32-bit, you'd only have the 32-bit case. Are you saying that you can't do a 32-bit atomic access at any 32-bit aligned boundary? Or are you saying that gcc aligns normal 32-bit entities at 16-bit alignment? Neither of those sound very likely. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
Hi, On Wed, 6 Dec 2006, Linus Torvalds wrote: > > > Any _real_ CPU will simply never care about _anything_ else than just the > > > size of the datum in question. > > > > ..or alignment which a dedicated atomic type would allow to be attached. > > Can you give any example of a real CPU where alignment matters? > > Sure, it needs to be naturally aligned, but that's true of _any_ type in > the kernel. We don't do unaligneds without "get_unaligned()" and friends. m68060 produces a trap for unaligned atomic access, unfortunately standard alignment is smaller than this. bye, Roman - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Thu, 7 Dec 2006, Roman Zippel wrote: > > > > Any _real_ CPU will simply never care about _anything_ else than just the > > size of the datum in question. > > ..or alignment which a dedicated atomic type would allow to be attached. Can you give any example of a real CPU where alignment matters? Sure, it needs to be naturally aligned, but that's true of _any_ type in the kernel. We don't do unaligneds without "get_unaligned()" and friends. Btw, if you want to leave out 8-bit and 16-bit data, that's fine. So generally you'd only need to handle 32-bit (and 64-bit on a 64-bit architecture) accesses anyway. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
From: Al Viro <[EMAIL PROTECTED]> Date: Wed, 6 Dec 2006 19:08:28 + > On Wed, Dec 06, 2006 at 11:05:22AM -0800, Linus Torvalds wrote: > > > > > > On Wed, 6 Dec 2006, Christoph Lameter wrote: > > > > > > I'd really appreciate a cmpxchg that is generically available for > > > all arches. It will allow lockless implementation for various performance > > > criticial portions of the kernel. > > > > I suspect ARM may have been the last one without one, no? > > No. sparc32 doesn't have one, for instance. That's correct. It has an atomic swap, but not a cmpxchg. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
Hi, On Wed, 6 Dec 2006, Linus Torvalds wrote: > > > To be honest, it'd be much easier if we only defined these operations on > > > atomic_t's. We have all the infrastructure in place for them, and > > > they're fairly well understood. If you need different sizes, I'm OK > > > with an atomic_pointer_t, or whatever. > > > > FWIW Seconded. > > I disagree. > > Any _real_ CPU will simply never care about _anything_ else than just the > size of the datum in question. ..or alignment which a dedicated atomic type would allow to be attached. bye, Roman - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Thu, 7 Dec 2006, Roman Zippel wrote: > On Wed, 6 Dec 2006, Matthew Wilcox wrote: > > > To be honest, it'd be much easier if we only defined these operations on > > atomic_t's. We have all the infrastructure in place for them, and > > they're fairly well understood. If you need different sizes, I'm OK > > with an atomic_pointer_t, or whatever. > > FWIW Seconded. I disagree. Any _real_ CPU will simply never care about _anything_ else than just the size of the datum in question. There's absolutely no point to only allow it on certain types, especially as we _know_ those certain types are already going to be more than one, and we also know that they are going to be different sizes. In other words, in reality, we have to handle a sizeable subset of the whole generic situation, and the "on certain types only" situation is only going to be awkward and irritating. For example, would we have a different "cmpxchg_ptr()" function for the atomic pointer thing? With any reasonable CPU just depending on the _size_ of the type, I don't see what the problem is with just doing the bog-standard "cmpxchg_8/16/32/64" and having the bog-standard case- statement in a header file to do it all automatically for you, and then we don't need to worry about it. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, Dec 06, 2006 at 11:16:55AM -0800, Christoph Lameter wrote: > But then its also just requires disable/enable interrupts on UP which may > be cheaper than an atomic operation. > > > For CPUs with load locked + store conditional, it is expensive. > > Because it locks the bus? I am not that familiar with those architectures > but it seems that those will have a general problem anyways. On a decent implementation ll/sc will have the same cost as ordinary non-atomic load and store instructions. A likely uniprocessor implementation uses a single flip-flop ("llbit") in the CPU which is set by ll and cleared by any exception handler, especially interrupt. A later store conditional will then simply fail if that bit is cleared. That is extremly trivial stuff. On SMP it's somewhat more complex; A processor will have to remember the address used with ll and start snooping the bus for writes to it. The store conditional will then go and upgrade the cache line to exclusive state if the llbit is still set and perform the store. The llbit would be cleared if the processor has snooped any other write to the cacheline. Details are fun but that's bascially how it's implemented. Of course load linked / store conditional are typically used in loops so there is a little extra overhead from that especially where when the branch is misspredicted. Also note there is no locked cycle required to implement load linked / store conditional. Ralf - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
Hi, On Wed, 6 Dec 2006, Matthew Wilcox wrote: > To be honest, it'd be much easier if we only defined these operations on > atomic_t's. We have all the infrastructure in place for them, and > they're fairly well understood. If you need different sizes, I'm OK > with an atomic_pointer_t, or whatever. FWIW Seconded. bye, Roman - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, 6 Dec 2006, Christoph Lameter wrote: > > This means we tolerate the assignment race for SMP that was pointed out > earlier? The assignment "race" doesn't really exist in real code, I suspect. There's two cases of assignment: - pure initialization. This one isn't racy, since it's literally the initial setting of some data structure, and if it's visible in uninitialized state on other CPU's before being initialized, you have bigger problems than some silly assignment race ;) So the race doesn't exist for this class of assignment at all, and that's the _common_ case of a pure assignment (ie the "we start out with this field having value X") - mixing assignment on one CPU (and not using "atomic_set()") and using "cmpxchg()". This would be a problem on architectures that use an external spinlock or something, but I don't really see how it could be a valid code sequence ANYWAY. Any code that does this had better have some _higherlevel_ serialization around that data structure, since while it's not a _bug_ on architectures that do "cmpxchg()" in hardware (as far as the cmpxchg itself is concerned), the fact is, in the absense of any other locking, what the hell is such code supposed to _mean_? In other words - the second case would be a bug in that it bypasses the serialization of the spinlock, but why would you ever do that anyway? What could _possibly_ be a valid use of such a "set value blindly" kind of sequence? Since another CPU is clearly doing something to the value (ie the cmpxchg), the "set it to value X" is just not a well-defined operation in the first place, since you don't know what the end result is (will it be "X", or will the "cmpxchg" on the other CPU have set it to something else?). So I don't think the race really exists for any sane code anyway. Even if people don't use "atomic_set()". But hey, maybe I'm just ignoring some really odd usage case. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, 6 Dec 2006, Matthew Wilcox wrote: > To be honest, it'd be much easier if we only defined these operations on > atomic_t's. We have all the infrastructure in place for them, and > they're fairly well understood. If you need different sizes, I'm OK > with an atomic_pointer_t, or whatever. An pointer is probably one of the most important entities to use with cmpxchg (aside from the ints and longs that we already support). A pointer can be defined to point to any other type. So we would need atomic_pointer_t()? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, Dec 06, 2006 at 01:52:20PM -0800, Christoph Lameter wrote: > On Wed, 6 Dec 2006, Matthew Wilcox wrote: > > > And for those of us with only load-and-zero, that's simply: > > > > #define load_locked(addr) spin_lock(hash(addr)), *addr > > #define store_exclusive(addr, old, new) \ > > *addr = new, spin_unlock(hash(addr)), 0 > > > > which is also optimal for us. > > This means we tolerate the assignment race for SMP that was pointed out > earlier? What gave you that impression? It simply wasn't part of this example. To be honest, it'd be much easier if we only defined these operations on atomic_t's. We have all the infrastructure in place for them, and they're fairly well understood. If you need different sizes, I'm OK with an atomic_pointer_t, or whatever. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, 6 Dec 2006, Matthew Wilcox wrote: > And for those of us with only load-and-zero, that's simply: > > #define load_locked(addr) spin_lock(hash(addr)), *addr > #define store_exclusive(addr, old, new) \ > *addr = new, spin_unlock(hash(addr)), 0 > > which is also optimal for us. This means we tolerate the assignment race for SMP that was pointed out earlier? cmpxchg emulation may then also be tolerable just replace the irq enable/disable in David's implementation with taking a spin lock? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, Dec 06, 2006 at 07:58:20PM +, Russell King wrote: > No. If you read what I said, you'll see that you can _cheaply_ use > cmpxchg in a ll/sc based implementation. Take an atomic increment > operation. > > do { > old = load_locked(addr); > } while (store_exclusive(old, old + 1, addr); [...] > Implementing ll/sc based accessor macros allows both ll/sc _and_ cmpxchg > architectures to produce optimal code. > > Implementing an cmpxchg based accessor macro allows cmpxchg architectures > to produce optimal code and ll/sc non-optimal code. And for those of us with only load-and-zero, that's simply: #define load_locked(addr) spin_lock(hash(addr)), *addr #define store_exclusive(addr, old, new) \ *addr = new, spin_unlock(hash(addr)), 0 which is also optimal for us. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, Dec 06, 2006 at 12:11:01PM -0800, Christoph Lameter wrote: > On Wed, 6 Dec 2006, Matthew Wilcox wrote: > > Follow the thread back. I'm talking about parisc. Machines exist (are > > still being sold) with up to 128 cores. I think the largest we've > > confirmed work are 8 CPUs. > > And you only have an atomic get and clear atomic op? Amazing. Is this > really still in production? Yes. And yes. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, Dec 06, 2006 at 07:47:22PM +, David Howells wrote: > > > Pre-v6 ARM doesn't support SMP according to ARM's atomic.h, > > > > That's not quite true, there exist ARMv5 processors that in theory > > can support SMP. > > I meant that the Linux ARM arch doesn't support it: Ah, yes, not currently. That's true. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, 6 Dec 2006, Matthew Wilcox wrote: > On Wed, Dec 06, 2006 at 11:47:40AM -0800, Christoph Lameter wrote: > > Nope this is a UP implementation. There is no cpu B. > > Follow the thread back. I'm talking about parisc. Machines exist (are > still being sold) with up to 128 cores. I think the largest we've > confirmed work are 8 CPUs. And you only have an atomic get and clear atomic op? Amazing. Is this really still in production? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, Dec 06, 2006 at 11:16:55AM -0800, Christoph Lameter wrote: > On Wed, 6 Dec 2006, Russell King wrote: > > > On Wed, Dec 06, 2006 at 10:56:14AM -0800, Christoph Lameter wrote: > > > I'd really appreciate a cmpxchg that is generically available for > > > all arches. It will allow lockless implementation for various performance > > > criticial portions of the kernel. > > > > Let's recap on cmpxchg. > > > > For CPUs with no atomic operation other than SWP, it is not lockless. > > But then its also just requires disable/enable interrupts on UP which may > be cheaper than an atomic operation. No. SWP is atomic on the CPU it's being issued on, especially wrt interrupts. Only on one ARM CPU (which is UP) does it have a questionable use, and there we do it via interrupt disable/enable. > > For CPUs with load locked + store conditional, it is expensive. > > Because it locks the bus? I am not that familiar with those architectures > but it seems that those will have a general problem anyways. No. That certainly would be bad for performance. I can talk authoritively from the ARM implementation. When you use a special "ldrex" (load exclusive) instruction, the CPU remembers the "address + cpu" pair. If another access occurs to the same address, this state is reset. Only if this state is preserved will a "strex" (store exclusive) instruction succeed. This instruction returns status indicating whether it succeeded. So, to implement cmpxchg, you need to do this: ; r1 = temporary register ; r2 = address ; r4 = new value ; r3 = returned status ldrex r1, [r2] cmp r1, old_value streqex r3, r4, [r2] > > If you want an operation for performance critical portions of the > > kernel, please allow architecture maintainers the freedom to use their > > best performance enhancements. > > And thereby denying the kernel developers to use a simple atomic SMP > operation? Adding additional defines for each arch and each performance > critical piece of kernel logic? No. If you read what I said, you'll see that you can _cheaply_ use cmpxchg in a ll/sc based implementation. Take an atomic increment operation. do { old = load_locked(addr); } while (store_exclusive(old, old + 1, addr); On a cmpxchg, that "store_exclusive" (loosely) becomes your cmpxchg instruction, comparing the first arg, and if equal storing the second. The "load_locked" macro becomes a standard pointer deref. Ergo, x86 becomes: do { load value manipulate it conditional store } while not stored On ll/sc, the load_locked() macro is the load locked instruction. The store_exclusive() macro is the exclusive store and it doesn't need to use the first parameter at all. Ergo, ARM becomes: do { ldrex r1, [r2] manipulate r1 strex r0, r1, [r2] } while failed Notice that both are optimal. Now let's consider the cmpxchg case. do { val = *addr; } while (cmpxchg(val, val + 1, addr); The x86 case is _identical_ to the ll/sc based implementation. Absolutely entirely. No impact what so ever. Let's look at the ll/sc case. The cmpxchg code implemented on this has to reload the original value, compare it, if equal store the new value. So: do { val = *addr; (r2 = addr, ldrex r1, [r2] compare r1, r0 strexeq r4, r3, [r2] (store exclusive if equal) } while store failed or comparecondition failed Note how the cmpxchg has _forced_ the ll/sc implementation to become more complex. So, let's recap. Implementing ll/sc based accessor macros allows both ll/sc _and_ cmpxchg architectures to produce optimal code. Implementing an cmpxchg based accessor macro allows cmpxchg architectures to produce optimal code and ll/sc non-optimal code. See my point? -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, Dec 06, 2006 at 11:05:22AM -0800, Linus Torvalds wrote: > > > On Wed, 6 Dec 2006, Christoph Lameter wrote: > > > > I'd really appreciate a cmpxchg that is generically available for > > all arches. It will allow lockless implementation for various performance > > criticial portions of the kernel. > > I suspect ARM may have been the last one without one, no? > > That said, cmpxchg won't necessarily be "high-performance" unless the hw > supports it naturally in hardware, so.. > > Russell, are you ok with the code DavidH posted (the "try 2" one)? I'd > like to get an ack from the ARM maintainer before applying it, but it > looked ok. I've only recently got in today, so it'll be a while (== tomorrow) before I can get around to doing any tests. (It's evening this side of the pond.) -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, 6 Dec 2006, Linus Torvalds wrote: > > Your problem will be, of course, that any architecture that does this in > hardware will just DoTheRightThing, and as such, broken architectures with > bad locking primitives will have to test and do source-level analysis > more. (The underlying thread here being that the workqueue stuff _should_ be safe in this regard. But the "testing will not catch bugs like these is certainly true in general) Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, Dec 06, 2006 at 11:47:40AM -0800, Christoph Lameter wrote: > Nope this is a UP implementation. There is no cpu B. Follow the thread back. I'm talking about parisc. Machines exist (are still being sold) with up to 128 cores. I think the largest we've confirmed work are 8 CPUs. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, 6 Dec 2006, Matthew Wilcox wrote: > > That doesn't help, since assignment can't be guarded by any lock. True. Pure assignment will be lost, and is only ok for the case of a pure initializer (where nobody can see the old state). Your problem will be, of course, that any architecture that does this in hardware will just DoTheRightThing, and as such, broken architectures with bad locking primitives will have to test and do source-level analysis more. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
Lennert Buytenhek <[EMAIL PROTECTED]> wrote: > > Pre-v6 ARM doesn't support SMP according to ARM's atomic.h, > > That's not quite true, there exist ARMv5 processors that in theory > can support SMP. I meant that the Linux ARM arch doesn't support it: #else /* ARM_ARCH_6 */ #include #ifdef CONFIG_SMP #error SMP not supported on pre-ARMv6 CPUs #endif David - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, 6 Dec 2006, Matthew Wilcox wrote: > On Wed, Dec 06, 2006 at 11:29:42AM -0800, Christoph Lameter wrote: > > On Wed, 6 Dec 2006, Matthew Wilcox wrote: > > > > > It's just been pointed out to me that the parisc one isn't safe. > > > > > > imagine variable X is set to 3 > > > CPU A issues cmpxchg(&X, 3, 5) > > > you'd expect that to change X to 5 > > > but what if CPU B assigns 6 to X between cmpxchg reading X > > > and it setting X? > > > > The same could happen with a regular cmpxchg. Cmpxchg changes it to 5 and > > then other cpu performs a store before the next instruction. > > For someone who's advocating use of cmpxchg, it seems you don't > understand its semantics! In the scenario dhowells pointed out, X would > be left set to 5. X should have the value 6 under any legitimate > implementation: Nope this is a UP implementation. There is no cpu B. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
Linus Torvalds <[EMAIL PROTECTED]> wrote: > Also, I don't see quite why you think "cmpxchg()" and "atomic_cmpxchg()" > would be different. ANY cmpxchg() needs to be atomic - if it's not, > there's no point to the operation at all, since you'd just write it as It's not that atomic_cmpxchg() is different to cmpxchg(), it's that atomic_set() is different to direct assignment. atomic_set() on PA-RISC, for example, has spinlocks in it. David - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
Matthew Wilcox <[EMAIL PROTECTED]> wrote: > > Ok. For SMP-safety, it's important that any architecture that can't do > > this needs to _share_ the same spinlock (on SMP only, of course) that it > > uses for the bitops. > > That doesn't help, since assignment can't be guarded by any lock. It's not a problem for workqueues, since the only direct assignment to the management member variable is during initialisation. But in general cmpxchg() might be a problem with respect to assignment. atomic_cmpxchg() should be safe wrt atomic_set(). David - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, Dec 06, 2006 at 11:34:52AM -0800, Linus Torvalds wrote: > On Wed, 6 Dec 2006, Matthew Wilcox wrote: > > Given parisc's paucity of atomic operations (load-and-zero-32bit and > > load-and-zero-64bit), cmpxchg() is impossible to implement safely. > > There has to be something we can hook to exclude another processor > > modifying the variable. I'm OK with using atomic_cmpxchg(); we have > > atomic_set locked against it. > > How do you to the atomic bitops? The same way we do atomic_t. What I hadn't realised (because I hadn't read dhowell's implementation ... because it hasn't shown up on git2.kernel.org yet) is that he doesn't actually *use* this unlocked-assignment that would cause the problem. He uses bitops which use the same locks. > Also, I don't see quite why you think "cmpxchg()" and "atomic_cmpxchg()" > would be different. ANY cmpxchg() needs to be atomic - if it's not, > there's no point to the operation at all, since you'd just write it as > > if (*p == x) > *p = y; > > instead, and not use cmpxchg(). The difference is that we can (and do) acquire a lock for atomic_set. We can't acquire one for X = 6. > - on SMP, use a spinlock (with interrupts disabled), and share that >spinlock with all the other atomic ops (bitops at a minimum - the >"atomic_t" operations have traditionally been in another "locking >space" because of sparc32 historic braindamage, but I'd suggest >sharing the same spinlock between them all). Yep, we agree. > And yeah, it sucks. You _can_ (if you really want to) make the spinlock be > hashed based on the address of the atomic data structure. That at least > allows you to do _multiple_ spinlocks, but let's face it, your real > problem is _likely_ going to be cacheline bouncing, not contention, and > then using a hashed lock won't be likely to buy you all that much. We do hash based on the address -- and we try to arrange things such that different spinlocks are acquired for different cachelines. I don't know if anyone's benchmarked it recently to see how well it performs. It's a bit of a waltzing bear [1] at times ;-) [1] The wonder is not how well it waltzes, but that it waltzes at all - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, Dec 06, 2006 at 11:29:42AM -0800, Christoph Lameter wrote: > On Wed, 6 Dec 2006, Matthew Wilcox wrote: > > > It's just been pointed out to me that the parisc one isn't safe. > > > > imagine variable X is set to 3 > > CPU A issues cmpxchg(&X, 3, 5) > > you'd expect that to change X to 5 > > but what if CPU B assigns 6 to X between cmpxchg reading X > > and it setting X? > > The same could happen with a regular cmpxchg. Cmpxchg changes it to 5 and > then other cpu performs a store before the next instruction. For someone who's advocating use of cmpxchg, it seems you don't understand its semantics! In the scenario dhowells pointed out, X would be left set to 5. X should have the value 6 under any legitimate implementation: CPU A CPU B cmpxchg(3,5) X = 6 CPU A CPU B X = 6 cmpxhcg(3,5) CPU A cmpxchg(3, X = 6 5) Given that even yourself got confused about how to use it, perhaps it's not a good idea to expose this primitive to most programmers anyway? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, 6 Dec 2006, Matthew Wilcox wrote: > > Given parisc's paucity of atomic operations (load-and-zero-32bit and > load-and-zero-64bit), cmpxchg() is impossible to implement safely. > There has to be something we can hook to exclude another processor > modifying the variable. I'm OK with using atomic_cmpxchg(); we have > atomic_set locked against it. How do you to the atomic bitops? Also, I don't see quite why you think "cmpxchg()" and "atomic_cmpxchg()" would be different. ANY cmpxchg() needs to be atomic - if it's not, there's no point to the operation at all, since you'd just write it as if (*p == x) *p = y; instead, and not use cmpxchg(). So yes, architectures without native support (where "native" includes load-locked + store-conditional) always need to - on UP, just disable interrupts - on SMP, use a spinlock (with interrupts disabled), and share that spinlock with all the other atomic ops (bitops at a minimum - the "atomic_t" operations have traditionally been in another "locking space" because of sparc32 historic braindamage, but I'd suggest sharing the same spinlock between them all). And yeah, it sucks. You _can_ (if you really want to) make the spinlock be hashed based on the address of the atomic data structure. That at least allows you to do _multiple_ spinlocks, but let's face it, your real problem is _likely_ going to be cacheline bouncing, not contention, and then using a hashed lock won't be likely to buy you all that much. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, 6 Dec 2006, Christoph Lameter wrote: > > > For CPUs with load locked + store conditional, it is expensive. > > Because it locks the bus? I am not that familiar with those architectures > but it seems that those will have a general problem anyways. load_locked + store_conditional should _not_ be any more expensive than any atomic sequence will always be. Atomic sequences in SMP are obviously never "cheap". But cmpxchg shouldn't be any more expensive than any other atomic sequence if you have load-locked and store-conditional. There are obviously _implementation_ bugs. The early alpha's had such an atrocious ldl/stc that it wasn't even funny. That might be true in other implementations too, but it's definitely not cmpxchg-specific if so. It will affect _any_ atomic ops on such an architecture (atomic_inc() and friends) Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, Dec 06, 2006 at 11:05:22AM -0800, Linus Torvalds wrote: > On Wed, 6 Dec 2006, Christoph Lameter wrote: > > > > I'd really appreciate a cmpxchg that is generically available for > > all arches. It will allow lockless implementation for various performance > > criticial portions of the kernel. > > I suspect ARM may have been the last one without one, no? It's just been pointed out to me that the parisc one isn't safe. imagine variable X is set to 3 CPU A issues cmpxchg(&X, 3, 5) you'd expect that to change X to 5 but what if CPU B assigns 6 to X between cmpxchg reading X and it setting X? Given parisc's paucity of atomic operations (load-and-zero-32bit and load-and-zero-64bit), cmpxchg() is impossible to implement safely. There has to be something we can hook to exclude another processor modifying the variable. I'm OK with using atomic_cmpxchg(); we have atomic_set locked against it. Of course, using cmpxchg() isn't really lockless. It's just hidden locking. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, 6 Dec 2006, Matthew Wilcox wrote: > It's just been pointed out to me that the parisc one isn't safe. > > imagine variable X is set to 3 > CPU A issues cmpxchg(&X, 3, 5) > you'd expect that to change X to 5 > but what if CPU B assigns 6 to X between cmpxchg reading X > and it setting X? The same could happen with a regular cmpxchg. Cmpxchg changes it to 5 and then other cpu performs a store before the next instruction. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, Dec 06, 2006 at 11:25:35AM -0800, Linus Torvalds wrote: > Ok. For SMP-safety, it's important that any architecture that can't do > this needs to _share_ the same spinlock (on SMP only, of course) that it > uses for the bitops. That doesn't help, since assignment can't be guarded by any lock. > It would be good (but perhaps not as strict a requirement) if the atomic > counters also use the same lock. But that is probably impossible on > sparc32 (since it has a per-counter "lock"-like thing, iirc). So doing a > cmpxchg() on an atomic_t would be a bug. sparc32 switched over to the parisc way of doing things, so they could expand their atomic_t to a full 32 bits. They still have the old atomic_24_t lying around for their arch-private use, but atomic_t uses a hashed spinlock. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, 6 Dec 2006, Al Viro wrote: > > No. sparc32 doesn't have one, for instance. Ok. For SMP-safety, it's important that any architecture that can't do this needs to _share_ the same spinlock (on SMP only, of course) that it uses for the bitops. It would be good (but perhaps not as strict a requirement) if the atomic counters also use the same lock. But that is probably impossible on sparc32 (since it has a per-counter "lock"-like thing, iirc). So doing a cmpxchg() on an atomic_t would be a bug. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it [try #2]
Have a look at arch/i386/kernel/cpu/intel.c. You can probably replace my code that simulates cmpxchg for 386s arch/i386/kernel/cpu/intel.c: #ifndef CONFIG_X86_CMPXCHG unsigned long cmpxchg_386_u8(volatile void *ptr, u8 old, u8 new) { u8 prev; unsigned long flags; /* Poor man's cmpxchg for 386. Unsuitable for SMP */ local_irq_save(flags); prev = *(u8 *)ptr; if (prev == old) *(u8 *)ptr = new; local_irq_restore(flags); return prev; } EXPORT_SYMBOL(cmpxchg_386_u8); unsigned long cmpxchg_386_u16(volatile void *ptr, u16 old, u16 new) { u16 prev; unsigned long flags; /* Poor man's cmpxchg for 386. Unsuitable for SMP */ local_irq_save(flags); prev = *(u16 *)ptr; if (prev == old) *(u16 *)ptr = new; local_irq_restore(flags); return prev; } EXPORT_SYMBOL(cmpxchg_386_u16); unsigned long cmpxchg_386_u32(volatile void *ptr, u32 old, u32 new) { u32 prev; unsigned long flags; /* Poor man's cmpxchg for 386. Unsuitable for SMP */ local_irq_save(flags); prev = *(u32 *)ptr; if (prev == old) *(u32 *)ptr = new; local_irq_restore(flags); return prev; } EXPORT_SYMBOL(cmpxchg_386_u32); #endif - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, 6 Dec 2006, Russell King wrote: > On Wed, Dec 06, 2006 at 10:56:14AM -0800, Christoph Lameter wrote: > > I'd really appreciate a cmpxchg that is generically available for > > all arches. It will allow lockless implementation for various performance > > criticial portions of the kernel. > > Let's recap on cmpxchg. > > For CPUs with no atomic operation other than SWP, it is not lockless. But then its also just requires disable/enable interrupts on UP which may be cheaper than an atomic operation. > For CPUs with load locked + store conditional, it is expensive. Because it locks the bus? I am not that familiar with those architectures but it seems that those will have a general problem anyways. > If you want an operation for performance critical portions of the > kernel, please allow architecture maintainers the freedom to use their > best performance enhancements. And thereby denying the kernel developers to use a simple atomic SMP operation? Adding additional defines for each arch and each performance critical piece of kernel logic? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, Dec 06, 2006 at 04:43:14PM +, David Howells wrote: > Pre-v6 ARM doesn't support SMP according to ARM's atomic.h, That's not quite true, there exist ARMv5 processors that in theory can support SMP. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, Dec 06, 2006 at 11:05:22AM -0800, Linus Torvalds wrote: > > > On Wed, 6 Dec 2006, Christoph Lameter wrote: > > > > I'd really appreciate a cmpxchg that is generically available for > > all arches. It will allow lockless implementation for various performance > > criticial portions of the kernel. > > I suspect ARM may have been the last one without one, no? No. sparc32 doesn't have one, for instance. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, 6 Dec 2006, Christoph Lameter wrote: > > I'd really appreciate a cmpxchg that is generically available for > all arches. It will allow lockless implementation for various performance > criticial portions of the kernel. I suspect ARM may have been the last one without one, no? That said, cmpxchg won't necessarily be "high-performance" unless the hw supports it naturally in hardware, so.. Russell, are you ok with the code DavidH posted (the "try 2" one)? I'd like to get an ack from the ARM maintainer before applying it, but it looked ok. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it [try #2]
On Wed, 6 Dec 2006, David Howells wrote: > Implement generic UP cmpxchg() where an arch doesn't otherwise support it. > This assuming that the arch doesn't have support SMP without providing its own > cmpxchg() implementation. > > Signed-Off-By: David Howells <[EMAIL PROTECTED]> I cannot evaluate the ARM implementation but otherwise. Acked-by: Christoph Lameter <[EMAIL PROTECTED]> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, Dec 06, 2006 at 10:56:14AM -0800, Christoph Lameter wrote: > I'd really appreciate a cmpxchg that is generically available for > all arches. It will allow lockless implementation for various performance > criticial portions of the kernel. Let's recap on cmpxchg. For CPUs with no atomic operation other than SWP, it is not lockless. For CPUs with load locked + store conditional, it is expensive. I've shown in the past that a generic implementation based around ll/sc can be cheaply implemented using cmpxchg. The reverse is *not* true. If you want an operation for performance critical portions of the kernel, please allow architecture maintainers the freedom to use their best performance enhancements. -- Russell King Linux kernel2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
I'd really appreciate a cmpxchg that is generically available for all arches. It will allow lockless implementation for various performance criticial portions of the kernel. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it
On Wed, 6 Dec 2006, David Howells wrote: > > Implement generic UP cmpxchg() where an arch doesn't otherwise support it. > This assuming that the arch doesn't have support SMP without providing its own > cmpxchg() implementation. This is too ugly to live. At least the kernel/workqueue.c part. The requirement that everybody implement a workable cmpxchg (and falling back to spinlocks if required as per the atomic bit operations) looks ok, but don't show that in generic code. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/