Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code
On Tue, 2008-05-20 at 15:53 -0700, David Miller wrote:
> From: Scott Wood
> Date: Tue, 20 May 2008 17:43:58 -0500
>
>> David Miller wrote:
>>> The __volatile__ in the asm construct disallows movement of the
>>> inline asm relative to statements surrounding it.
>>>
>>> The only reason barrier() in kernel.h needs a memory clobber is
>>> because of a bug in ancient versions of gcc. In fact, I think
>>> that memory clobber might even be removable.
>>
>> Current versions of GCC seem quite happy to move non-asm memory
>> accesses around a volatile asm without a memory clobber; see the
>> test Trent posted.
>
> Indeed, and even the GCC manual is clear about this.

So what is the scope of that problem? I.e., take an x86 version of that
test: write to memory, do a writel to some MMIO, then do another memory
write. Can those be reordered with the current x86 version of writel?

static inline void writel(unsigned int b, volatile void __iomem *addr)
{
    *(volatile unsigned int __force *)addr = b;
}

This is becoming a serious issue...

Ben.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
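As a rough illustration of the test described above, the following user-space sketch does a plain store, an MMIO-style volatile store, and then another plain store. The names writel_plain, writel_clobber, fake_mmio, buf and check_results are hypothetical stand-ins, and the empty asm with a "memory" clobber is the common GCC compiler-barrier idiom, not any architecture's actual writel. Comparing the output of gcc -O2 -S for test_plain and test_clobber is one way to check whether the compiler keeps the first store to buf[0] ahead of the volatile store.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an MMIO register; real code would use an
 * ioremap()ed address, which user space cannot do. */
static volatile uint32_t fake_mmio;

/* Ordinary memory, standing in for a DMA buffer. */
static uint32_t buf[4];

/* Same shape as the x86 writel quoted above: a bare volatile store. */
static inline void writel_plain(uint32_t val, volatile uint32_t *addr)
{
    *addr = val;
}

/* Assumed variant: the same store bracketed by compiler barriers. */
static inline void writel_clobber(uint32_t val, volatile uint32_t *addr)
{
    __asm__ __volatile__("" ::: "memory");
    *addr = val;
    __asm__ __volatile__("" ::: "memory");
}

uint32_t test_plain(void)
{
    buf[0] = 44;    /* the compiler is free to drop or sink this store */
    writel_plain(200, &fake_mmio);
    buf[0] = 45;
    return buf[0] + fake_mmio;
}

uint32_t test_clobber(void)
{
    buf[0] = 44;    /* must be emitted before the volatile store */
    writel_clobber(200, &fake_mmio);
    buf[0] = 45;
    return buf[0] + fake_mmio;
}

/* Both variants agree on the final values; only the code generated
 * between the stores may differ. */
void check_results(void)
{
    assert(test_plain() == 245);
    assert(test_clobber() == 245);
}
```

Single-threaded results are identical by construction, so the difference only shows up in the generated assembly, which is what a device doing DMA would observe.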
Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code
On Fri, 23 May 2008, Benjamin Herrenschmidt wrote:
> On Tue, 2008-05-20 at 15:53 -0700, David Miller wrote:
>> From: Scott Wood
>> Date: Tue, 20 May 2008 17:43:58 -0500
>>
>>> David Miller wrote:
>>>> The __volatile__ in the asm construct disallows movement of the
>>>> inline asm relative to statements surrounding it.
>>>>
>>>> The only reason barrier() in kernel.h needs a memory clobber is
>>>> because of a bug in ancient versions of gcc. In fact, I think
>>>> that memory clobber might even be removable.
>>>
>>> Current versions of GCC seem quite happy to move non-asm memory
>>> accesses around a volatile asm without a memory clobber; see the
>>> test Trent posted.
>>
>> Indeed, and even the GCC manual is clear about this.
>
> So what is the scope of that problem? I.e., take an x86 version of
> that test: write to memory, do a writel to some MMIO, then do another
> memory write. Can those be reordered with the current x86 version of
> writel?

Yes, the same thing can happen on x86. As far as I could tell, this can
happen on every other arch as well. Usually aliasing prevents it, but
it's not hard to construct a test case where it doesn't.
Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code
Trent Piepho writes:

> On Wed, 21 May 2008, Andreas Schwab wrote:
>> Trent Piepho writes:
>>
>>> It's the _le versions that have a problem, since we can't get gcc
>>> to just use the register indexed mode. It seems like an obvious
>>> thing to have a constraint for, but I guess there weren't enough
>>> instructions that only come in 'x' versions to bother with it.
>>> There is a 'Z' constraint, "Memory operand that is an indexed or
>>> indirect from a register", but I tried it and it can use both
>>> rb,ri and disp(rb) forms. Actually, I'm not sure how 'Z' is any
>>> different than "m"?
>>
>> 'Z' will never emit a non-zero constant displacement.
>
> It's too bad gas doesn't appear to be smart enough to turn:
>
>     stwbrx 0, 0(3) -or- stwbr 0, 0(3)

Use the %y modifier when substituting the operand.

Andreas.

-- 
Andreas Schwab, SuSE Labs
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code
> Depends on what you define as necessary. It seems clear that I/O
> accessors _do not_ need to be strictly ordered with respect to normal
> memory accesses, by what's defined in memory-barriers.txt. So if by
> "necessary" you mean what the Linux standard for I/O accessors
> requires (and what other archs provide), then yes, they have the
> necessary ordering guarantees. But if you want them to be strictly
> ordered w.r.t. normal memory, that's not the case.

They should be.

> For example, in something like:
>
>     u32 *dmabuf = kmalloc(...);
>     ...
>     dmabuf[0] = 1;
>     out_be32(regs->dmactl, DMA_SEND_BUFFER);
>     dmabuf[0] = 2;
>     out_be32(regs->dmactl, DMA_SEND_BUFFER);
>
> gcc might decide to optimize this code to:
>
>     out_be32(regs->dmactl, DMA_SEND_BUFFER);
>     out_be32(regs->dmactl, DMA_SEND_BUFFER);
>     dmabuf[0] = 2;

If that's the case, there is a bug. Ignoring possible gcc optimisations,
the accessors contain the necessary memory barriers for things to work
the way you describe above. If the use of volatile and clobber in our
macros isn't enough to also prevent optimisations, then we have a bug
and you are welcome to provide a patch to fix it.

> gcc will often not do this optimization, because there might be
> aliasing between regs->dmactl and dmabuf, but it _can_ do it. gcc
> can't optimize the two identical out_be32's into one, or re-order
> them if they were to different registers, but it can move the normal
> memory accesses around them.

The Linux kernel -cannot- be compiled with strict aliasing rules. This
is one of the many areas where those are violated. Frankly, this strict
aliasing stuff is just a total nightmare, turning a perfectly nice and
usable language into something it's not meant to be.

> Here's a quick hack I stuck in a driver to test. Compile with
> -save-temps and check the resulting asm. gcc will do the optimization
> I described above.
>
>     static void __iomem *baz = (void*)0x1234;
>     static struct bar {
>         u32 bar[256];
>     } bar;
>
>     void foo(void)
>     {
>         bar.bar[0] = 44;
>         out_be32(baz+100, 200);
>         bar.bar[0] = 45;
>         out_be32(baz+101, 201);
>     }

Have you removed -fno-strict-aliasing? Just don't do that.

Ben.
Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code
On Tue, 2008-05-20 at 15:55 -0700, Trent Piepho wrote:
> There don't appear to be any barriers to use for coherent DMA other
> than mb() and wmb().
>
> Correct me if I'm wrong, but I think the sync isn't actually
> _required_ (by memory-barriers.txt's definitions), and it would be
> enough to use eieio, except there is code that doesn't use mmiowb()
> between I/O access and unlocking.
>
> So, as I understand it, the minimum needed is eieio. To provide
> strict ordering w.r.t. spin locks without using mmiowb(), you need
> sync. To provide strict ordering w.r.t. normal memory, you need sync
> and a compiler barrier.
>
> Right now no archs provide the last option. powerpc is currently the
> middle option. I don't know if anything uses the first option, maybe
> alpha? I'm almost certain x86 is the middle option (the first isn't
> possible, the arch already has more ordering than that), which is
> probably why powerpc used that option and not the first.

I don't have time for that now. Can you dig into the archives? The
whole thing has been discussed at length already.

Ben.
Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code
On Wed, 21 May 2008, Benjamin Herrenschmidt wrote:
>> Depends on what you define as necessary. It seems clear that I/O
>> accessors _do not_ need to be strictly ordered with respect to
>> normal memory accesses, by what's defined in memory-barriers.txt.
>> So if by "necessary" you mean what the Linux standard for I/O
>> accessors requires (and what other archs provide), then yes, they
>> have the necessary ordering guarantees. But if you want them to be
>> strictly ordered w.r.t. normal memory, that's not the case.
>
> They should be.

Someone should update memory-barriers.txt, because it doesn't say that,
and all I/O accessors for all the arches, because none of them are.

>> Here's a quick hack I stuck in a driver to test. Compile with
>> -save-temps and check the resulting asm. gcc will do the
>> optimization I described above.
>>
>>     static void __iomem *baz = (void*)0x1234;
>>     static struct bar {
>>         u32 bar[256];
>>     } bar;
>>
>>     void foo(void)
>>     {
>>         bar.bar[0] = 44;
>>         out_be32(baz+100, 200);
>>         bar.bar[0] = 45;
>>         out_be32(baz+101, 201);
>>     }
>
> Have you removed -fno-strict-aliasing? Just don't do that.

No, it's compiled with a normal kernel build, which includes
-fno-strict-aliasing.
Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code
On Wed, 21 May 2008, Andreas Schwab wrote:
> Trent Piepho writes:
>> On Wed, 21 May 2008, Andreas Schwab wrote:
>>> Trent Piepho writes:
>>>
>>>> It's the _le versions that have a problem, since we can't get gcc
>>>> to just use the register indexed mode. It seems like an obvious
>>>> thing to have a constraint for, but I guess there weren't enough
>>>> instructions that only come in 'x' versions to bother with it.
>>>> There is a 'Z' constraint, "Memory operand that is an indexed or
>>>> indirect from a register", but I tried it and it can use both
>>>> rb,ri and disp(rb) forms. Actually, I'm not sure how 'Z' is any
>>>> different than "m"?
>>>
>>> 'Z' will never emit a non-zero constant displacement.
>>
>> It's too bad gas doesn't appear to be smart enough to turn:
>>
>>     stwbrx 0, 0(3) -or- stwbr 0, 0(3)
>
> Use the %y modifier when substituting the operand.

Of course, the undocumented y modifier! "Print AltiVec or SPE memory
operand". Why didn't I think of that?

That appears to work. I can get gcc to emit 0,reg and reg,reg but not
disp(reg). It won't try to use an update address either, though it will
with an "m" constraint.

But gcc 4.0.2 can't handle the 'Z' constraint. It looks like it's not
supported. Other than this, I can build a kernel with 4.0.2 that
appears to work. Is it OK to break compatibility with 4.0.2, or should
I put in a gcc version check?
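A minimal sketch of the combination discussed above, assuming a GCC recent enough to accept the 'Z' constraint: the operand is declared with "=Z" and printed with %y, so the assembler always sees the "ra,rb" form that stwbrx needs. The helper names store_le32 and byte_at are hypothetical, the barrier instructions a real accessor would need are left out, and the non-PowerPC branch is only a portable fallback so the example can be built and checked on other hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store val at addr in little-endian byte order. */
static inline void store_le32(volatile uint32_t *addr, uint32_t val)
{
    assert(addr != NULL);
#if defined(__powerpc__) || defined(__powerpc64__)
    /* "Z" restricts the operand to indirect or indexed addresses, and
     * %y prints it in the two-register form stwbrx requires. */
    __asm__ __volatile__("stwbrx %1,%y0" : "=Z" (*addr) : "r" (val));
#else
    /* Fallback for other hosts: write the bytes least-significant
     * first. This is not a single store and is for checking only. */
    volatile unsigned char *p = (volatile unsigned char *)addr;

    p[0] = (unsigned char)(val & 0xff);
    p[1] = (unsigned char)((val >> 8) & 0xff);
    p[2] = (unsigned char)((val >> 16) & 0xff);
    p[3] = (unsigned char)((val >> 24) & 0xff);
#endif
}

/* Read back byte i of a word, to inspect the layout in memory. */
static inline unsigned char byte_at(const uint32_t *word, size_t i)
{
    return ((const unsigned char *)word)[i];
}
```

After store_le32(&w, 0x11223344), byte_at(&w, 0) is 0x44 and byte_at(&w, 3) is 0x11 on either branch, since both leave the word in little-endian order in memory.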
Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code
On Wed, 2008-05-21 at 12:44 -0700, Trent Piepho wrote:
> Someone should update memory-barriers.txt, because it doesn't say
> that, and all I/O accessors for all the arches, because none of them
> are.

There have been long discussions about that. The end result was that
being too weakly ordered is just asking for trouble, because the
majority of drivers are written and tested on x86, which is in order.

If you look at our accessors, minus that gcc problem you found, the
barriers in there should pretty much guarantee ordering in the cases
that matter, which are basically an MMIO read followed by memory
accesses, and memory writes followed by MMIO. In fact, MMIO reads are
fully synchronous.

> No, it's compiled with a normal kernel build, which includes
> -fno-strict-aliasing

OK, so there is a very bad bug indeed; we need to fix that.

Ben.
Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code
Benjamin Herrenschmidt wrote:
> On Tue, 2008-05-20 at 13:40 -0700, Trent Piepho wrote:
>> There was some discussion on a Freescale list about whether the
>> powerpc I/O accessors should be strictly ordered w.r.t. normal
>> memory. Currently they are not. It does not appear as if any other
>> architecture's I/O accessors are strictly ordered in this manner.
>> memory-barriers.txt explicitly states that the I/O space accessors
>> (inb, outw, etc.) are NOT strictly ordered w.r.t. normal memory
>> accesses, and it's implied the other I/O accessors (e.g., writel)
>> are the same.
>>
>> However, it is somewhat harder to program for this model, and there
>> are almost certainly a number of drivers using coherent DMA which
>> have subtle bugs because they do not include the necessary barriers.
>>
>> But clearly any change to this would be a subject for a different
>> patch.
>
> The current accessors should provide all the necessary ordering
> guarantees...

It looks like we rely on -fno-strict-aliasing to prevent reordering
ordinary memory accesses (such as to DMA descriptors) past the I/O
access. It won't prevent reordering of memory reads around an I/O read,
though, which could be a problem if the I/O read result determines the
validity of the DMA buffer.

IMHO, a memory clobber would be better.

-Scott
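The read-side concern above can be sketched in user space as follows. This is an illustration under stated assumptions, not the powerpc accessor: in32_clobber, read_if_done and DMA_DONE are hypothetical names, and the empty asm with a "memory" clobber constrains only the compiler. On real hardware the accessor would also need whatever barrier instruction the architecture requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_DONE 0x1u   /* hypothetical status bit */

/* MMIO-style read followed by a compiler barrier, so that later loads
 * from ordinary memory cannot be hoisted above the register read. */
static inline uint32_t in32_clobber(const volatile uint32_t *reg)
{
    uint32_t v;

    assert(reg != NULL);
    v = *reg;
    __asm__ __volatile__("" ::: "memory");
    return v;
}

/* Returns the first word of the buffer once the device reports
 * completion, or 0 while the transfer is still in flight. */
uint32_t read_if_done(const volatile uint32_t *status,
                      const uint32_t *dmabuf)
{
    if (!(in32_clobber(status) & DMA_DONE))
        return 0;
    return dmabuf[0];   /* only meaningful after DMA_DONE was seen */
}
```

Without the clobber, nothing in the language stops the compiler from loading dmabuf[0] before the status read when it can prove the two do not alias, which is the case described in the message above.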
Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code
Trent Piepho writes:

> For the LE versions, eventually they boil down to an asm that will
> look something like this:
>
>     asm("sync; stwbrx %1,0,%2" : "=m" (*addr) : "r" (val), "r" (addr));
>
> While not perfect, this appears to be the best one can do. The issue
> is that the stwbrx instruction only comes in an indexed, or 'x',
> version, in which the address is represented by the sum of two
> registers (the "0,%2"). Unfortunately, gcc doesn't have a constraint
> for an indexed memory reference.

There is the "Z" constraint, which matches either an indirect or an
indexed memory address. That should fit here.

Andreas.
Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code
On Tue, 2008-05-20 at 16:38 -0500, Scott Wood wrote:
> It looks like we rely on -fno-strict-aliasing to prevent reordering
> ordinary memory accesses (such as to DMA descriptors) past the I/O
> access. It won't prevent reordering of memory reads around an I/O
> read, though, which could be a problem if the I/O read result
> determines the validity of the DMA buffer.
>
> IMHO, a memory clobber would be better.

We probably want a full memory clobber then...

Ben.
Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code
On Tue, 20 May 2008, Benjamin Herrenschmidt wrote:
> On Tue, 2008-05-20 at 13:40 -0700, Trent Piepho wrote:
>> There was some discussion on a Freescale list about whether the
>> powerpc I/O accessors should be strictly ordered w.r.t. normal
>> memory. Currently they are not. It does not appear as if any other
>> architecture's I/O accessors are strictly ordered in this manner.
>> memory-barriers.txt explicitly states that the I/O space accessors
>> (inb, outw, etc.) are NOT strictly ordered w.r.t. normal memory
>> accesses, and it's implied the other I/O accessors (e.g., writel)
>> are the same.
>>
>> However, it is somewhat harder to program for this model, and there
>> are almost certainly a number of drivers using coherent DMA which
>> have subtle bugs because they do not include the necessary barriers.
>>
>> But clearly any change to this would be a subject for a different
>> patch.
>
> The current accessors should provide all the necessary ordering
> guarantees...

Depends on what you define as necessary. It seems clear that I/O
accessors _do not_ need to be strictly ordered with respect to normal
memory accesses, by what's defined in memory-barriers.txt. So if by
"necessary" you mean what the Linux standard for I/O accessors requires
(and what other archs provide), then yes, they have the necessary
ordering guarantees.

But if you want them to be strictly ordered w.r.t. normal memory,
that's not the case.

For example, in something like:

    u32 *dmabuf = kmalloc(...);
    ...
    dmabuf[0] = 1;
    out_be32(regs->dmactl, DMA_SEND_BUFFER);
    dmabuf[0] = 2;
    out_be32(regs->dmactl, DMA_SEND_BUFFER);

gcc might decide to optimize this code to:

    out_be32(regs->dmactl, DMA_SEND_BUFFER);
    out_be32(regs->dmactl, DMA_SEND_BUFFER);
    dmabuf[0] = 2;

gcc will often not do this optimization, because there might be
aliasing between regs->dmactl and dmabuf, but it _can_ do it. gcc can't
optimize the two identical out_be32's into one, or re-order them if
they were to different registers, but it can move the normal memory
accesses around them.

Here's a quick hack I stuck in a driver to test. Compile with
-save-temps and check the resulting asm. gcc will do the optimization I
described above.

    static void __iomem *baz = (void*)0x1234;
    static struct bar {
        u32 bar[256];
    } bar;

    void foo(void)
    {
        bar.bar[0] = 44;
        out_be32(baz+100, 200);
        bar.bar[0] = 45;
        out_be32(baz+101, 201);
    }
Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code
On Wed, 21 May 2008, Andreas Schwab wrote:
> Trent Piepho writes:
>> For the LE versions, eventually they boil down to an asm that will
>> look something like this:
>>
>>     asm("sync; stwbrx %1,0,%2" : "=m" (*addr) : "r" (val), "r" (addr));
>>
>> While not perfect, this appears to be the best one can do. The issue
>> is that the stwbrx instruction only comes in an indexed, or 'x',
>> version, in which the address is represented by the sum of two
>> registers (the "0,%2"). Unfortunately, gcc doesn't have a constraint
>> for an indexed memory reference.
>
> There is the "Z" constraint, which matches either an indirect or an
> indexed memory address. That should fit here.

This came up on the Freescale list. I should have put what I wrote
there into my patch description:

    It's the _le versions that have a problem, since we can't get gcc
    to just use the register indexed mode. It seems like an obvious
    thing to have a constraint for, but I guess there weren't enough
    instructions that only come in 'x' versions to bother with it.
    There is a 'Z' constraint, "Memory operand that is an indexed or
    indirect from a register", but I tried it and it can use both
    rb,ri and disp(rb) forms. Actually, I'm not sure how 'Z' is any
    different than "m"?
Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code
On Tue, 20 May 2008, Benjamin Herrenschmidt wrote:
> On Tue, 2008-05-20 at 16:38 -0500, Scott Wood wrote:
>> It looks like we rely on -fno-strict-aliasing to prevent reordering
>> ordinary memory accesses (such as to DMA descriptors) past the I/O
>> access. It won't prevent reordering of memory reads around an I/O
>> read, though, which could be a problem if the I/O read result
>> determines the validity of the DMA buffer.
>>
>> IMHO, a memory clobber would be better.
>
> We probably want a full memory clobber then...

As far as I could tell, no other arch has a full memory clobber. I can
see the argument for changing the Linux model to be stricter and less
efficient, but easier to program for. Not that I entirely agree with
it. But I don't see a good reason why powerpc should be different from
everything else.
Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code
Alan Cox wrote:
>> It looks like we rely on -fno-strict-aliasing to prevent reordering
>> ordinary memory accesses (such as to DMA descriptors) past the I/O
>
> DMA descriptors in main memory are dependent on cache behaviour
> anyway and the dma_* operators should be the ones enforcing the
> needed behaviour.

What about memory obtained from dma_alloc_coherent()? We still need a
sync and a compiler barrier. The current I/O accessors have the former,
but not the latter.

-Scott
Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code
From: Scott Wood
Date: Tue, 20 May 2008 17:35:56 -0500

> Alan Cox wrote:
>>> It looks like we rely on -fno-strict-aliasing to prevent reordering
>>> ordinary memory accesses (such as to DMA descriptors) past the I/O
>>
>> DMA descriptors in main memory are dependent on cache behaviour
>> anyway and the dma_* operators should be the ones enforcing the
>> needed behaviour.
>
> What about memory obtained from dma_alloc_coherent()? We still need
> a sync and a compiler barrier. The current I/O accessors have the
> former, but not the latter.

The __volatile__ in the asm construct disallows movement of the inline
asm relative to statements surrounding it.

The only reason barrier() in kernel.h needs a memory clobber is because
of a bug in ancient versions of gcc. In fact, I think that memory
clobber might even be removable.
Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code
David Miller wrote:
> From: Scott Wood
> Date: Tue, 20 May 2008 17:35:56 -0500
>
>> Alan Cox wrote:
>>>> It looks like we rely on -fno-strict-aliasing to prevent
>>>> reordering ordinary memory accesses (such as to DMA descriptors)
>>>> past the I/O
>>>
>>> DMA descriptors in main memory are dependent on cache behaviour
>>> anyway and the dma_* operators should be the ones enforcing the
>>> needed behaviour.
>>
>> What about memory obtained from dma_alloc_coherent()? We still need
>> a sync and a compiler barrier. The current I/O accessors have the
>> former, but not the latter.
>
> The __volatile__ in the asm construct disallows movement of the
> inline asm relative to statements surrounding it.
>
> The only reason barrier() in kernel.h needs a memory clobber is
> because of a bug in ancient versions of gcc. In fact, I think that
> memory clobber might even be removable.

Current versions of GCC seem quite happy to move non-asm memory
accesses around a volatile asm without a memory clobber; see the test
Trent posted.

-Scott
Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code
Trent Piepho writes:

> It's the _le versions that have a problem, since we can't get gcc to
> just use the register indexed mode. It seems like an obvious thing to
> have a constraint for, but I guess there weren't enough instructions
> that only come in 'x' versions to bother with it. There is a 'Z'
> constraint, "Memory operand that is an indexed or indirect from a
> register", but I tried it and it can use both rb,ri and disp(rb)
> forms. Actually, I'm not sure how 'Z' is any different than "m"?

'Z' will never emit a non-zero constant displacement.

Andreas.
Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code
From: Scott Wood
Date: Tue, 20 May 2008 17:43:58 -0500

> David Miller wrote:
>> The __volatile__ in the asm construct disallows movement of the
>> inline asm relative to statements surrounding it.
>>
>> The only reason barrier() in kernel.h needs a memory clobber is
>> because of a bug in ancient versions of gcc. In fact, I think that
>> memory clobber might even be removable.
>
> Current versions of GCC seem quite happy to move non-asm memory
> accesses around a volatile asm without a memory clobber; see the test
> Trent posted.

Indeed, and even the GCC manual is clear about this.
Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code
> It looks like we rely on -fno-strict-aliasing to prevent reordering
> ordinary memory accesses (such as to DMA descriptors) past the I/O

DMA descriptors in main memory are dependent on cache behaviour anyway
and the dma_* operators should be the ones enforcing the needed
behaviour.

Alan
Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code
On Tue, 20 May 2008, Scott Wood wrote:
> Alan Cox wrote:
>>> It looks like we rely on -fno-strict-aliasing to prevent reordering
>>> ordinary memory accesses (such as to DMA descriptors) past the I/O
>>
>> DMA descriptors in main memory are dependent on cache behaviour
>> anyway and the dma_* operators should be the ones enforcing the
>> needed behaviour.
>
> What about memory obtained from dma_alloc_coherent()? We still need
> a sync and a compiler barrier. The current I/O accessors have the
> former, but not the latter.

There don't appear to be any barriers to use for coherent DMA other
than mb() and wmb().

Correct me if I'm wrong, but I think the sync isn't actually _required_
(by memory-barriers.txt's definitions), and it would be enough to use
eieio, except there is code that doesn't use mmiowb() between I/O
access and unlocking.

So, as I understand it, the minimum needed is eieio. To provide strict
ordering w.r.t. spin locks without using mmiowb(), you need sync. To
provide strict ordering w.r.t. normal memory, you need sync and a
compiler barrier.

Right now no archs provide the last option. powerpc is currently the
middle option. I don't know if anything uses the first option, maybe
alpha? I'm almost certain x86 is the middle option (the first isn't
possible, the arch already has more ordering than that), which is
probably why powerpc used that option and not the first.
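The three options listed above could be sketched roughly as follows. Everything here is an assumption made for illustration: the names hw_eieio, hw_sync, compiler_barrier and out32_* are hypothetical, the placement of each barrier before the store is a guess rather than the kernel's definition, and on non-PowerPC hosts the hardware barriers are empty placeholders so that the sketch still builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__powerpc__) || defined(__powerpc64__)
/* Hardware barriers only; deliberately no "memory" clobber. */
#define hw_eieio()  __asm__ __volatile__("eieio")
#define hw_sync()   __asm__ __volatile__("sync")
#else
/* Placeholders for other hosts: they emit no instruction and give no
 * hardware ordering at all. */
#define hw_eieio()  __asm__ __volatile__("")
#define hw_sync()   __asm__ __volatile__("")
#endif

/* Compiler-only barrier: GCC will not move memory accesses across it. */
#define compiler_barrier()  __asm__ __volatile__("" ::: "memory")

/* First option: eieio, the stated minimum. */
static inline void out32_eieio(volatile uint32_t *reg, uint32_t val)
{
    assert(reg != NULL);
    hw_eieio();
    *reg = val;
}

/* Middle option: sync, for ordering against spin locks without
 * relying on mmiowb(). */
static inline void out32_sync(volatile uint32_t *reg, uint32_t val)
{
    assert(reg != NULL);
    hw_sync();
    *reg = val;
}

/* Last option: sync plus compiler barriers, for strict ordering
 * against normal memory accesses as well. */
static inline void out32_strict(volatile uint32_t *reg, uint32_t val)
{
    assert(reg != NULL);
    compiler_barrier();
    hw_sync();
    *reg = val;
    compiler_barrier();
}
```

All three perform the same store; they differ only in what the hardware and the compiler are allowed to reorder around it, which is the cost trade-off being debated in the thread.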
Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code
On Wed, 21 May 2008, Andreas Schwab wrote:
> Trent Piepho writes:
>
>> It's the _le versions that have a problem, since we can't get gcc
>> to just use the register indexed mode. It seems like an obvious
>> thing to have a constraint for, but I guess there weren't enough
>> instructions that only come in 'x' versions to bother with it.
>> There is a 'Z' constraint, "Memory operand that is an indexed or
>> indirect from a register", but I tried it and it can use both rb,ri
>> and disp(rb) forms. Actually, I'm not sure how 'Z' is any different
>> than "m"?
>
> 'Z' will never emit a non-zero constant displacement.

It's too bad gas doesn't appear to be smart enough to turn:

    stwbrx 0, 0(3) -or- stwbr 0, 0(3)

into the desired:

    stwbrx 0, 0, 3