Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-22 Thread Benjamin Herrenschmidt

On Tue, 2008-05-20 at 15:53 -0700, David Miller wrote:
 From: Scott Wood [EMAIL PROTECTED]
 Date: Tue, 20 May 2008 17:43:58 -0500
 
  David Miller wrote:
   The __volatile__ in the asm construct disallows movement of the
   inline asm relative to statements surrounding it.
   
   The only reason barrier() in kernel.h needs a memory clobber is
   because of a bug in ancient versions of gcc.  In fact, I think
   that memory clobber might even be removable.
  
  Current versions of GCC seem quite happy to move non-asm memory accesses 
  around a volatile asm without a memory clobber; see the test Trent posted.
 
 Indeed, and even the GCC manual is clear about this.

So what is the scope of that problem ?

IE. Take an x86 version of that test, writing to memory, doing a writel
to some MMIO, then another memory write, can those be re-ordered with
the current x86 version of writel ?

static inline void writel(unsigned int b, volatile void __iomem *addr)
{
	*(volatile unsigned int __force *)addr = b;
}

This is becoming a serious issue...

Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-22 Thread Trent Piepho

On Fri, 23 May 2008, Benjamin Herrenschmidt wrote:

On Tue, 2008-05-20 at 15:53 -0700, David Miller wrote:

From: Scott Wood [EMAIL PROTECTED]
Date: Tue, 20 May 2008 17:43:58 -0500

David Miller wrote:

The __volatile__ in the asm construct disallows movement of the
inline asm relative to statements surrounding it.

The only reason barrier() in kernel.h needs a memory clobber is
because of a bug in ancient versions of gcc.  In fact, I think
that memory clobber might even be removable.


Current versions of GCC seem quite happy to move non-asm memory accesses
around a volatile asm without a memory clobber; see the test Trent posted.


Indeed, and even the GCC manual is clear about this.


So what is the scope of that problem ?

IE. Take an x86 version of that test, writing to memory, doing a writel
to some MMIO, then another memory write, can those be re-ordered with
the current x86 version of writel ?


Yes, the same thing can happen on x86.  As far as I could tell, this is
something that all other arches can have happen.  Usually aliasing prevents
it, but it's not hard to construct a test case where it doesn't.


Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-21 Thread Andreas Schwab
Trent Piepho [EMAIL PROTECTED] writes:

 On Wed, 21 May 2008, Andreas Schwab wrote:
 Trent Piepho [EMAIL PROTECTED] writes:

 It's the _le versions that have a problem, since we can't get gcc to just use
 the register indexed mode.  It seems like an obvious thing to have a
 constraint for, but I guess there weren't enough instructions that only come
 in 'x' versions to bother with it.  There is a 'Z' constraint, Memory operand
 that is an indexed or indirect from a register, but I tried it and it can use
 both rb,ri and disp(rb) forms.  Actually, I'm not sure how 'Z' is any
 different than m?

 'Z' will never emit a non-zero constant displacement.

 It's too bad gas doesn't appear to be smart enough to turn:
 stwbrx 0, 0(3)   -or-   stwbr 0, 0(3)

Use the %y modifier when substituting the operand.

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-21 Thread Benjamin Herrenschmidt

 Depends on what you define as necessary.  It seems clear that I/O accessors
 _do not_ need to be strictly ordered with respect to normal memory accesses,
 by what's defined in memory-barriers.txt.  So if by necessary you mean what
 the Linux standard for I/O accessors requires (and what other archs provide),
 then yes, they have the necessary ordering guarantees.
 
 But, if you want them to be strictly ordered w.r.t. normal memory, that's
 not the case.

They should be.

 For example, in something like:
 
 u32 *dmabuf = kmalloc(...);
 ...
 dmabuf[0] = 1;
 out_be32(regs->dmactl, DMA_SEND_BUFFER);
 dmabuf[0] = 2;
 out_be32(regs->dmactl, DMA_SEND_BUFFER);
 
 gcc might decide to optimize this code to:
 
 out_be32(regs->dmactl, DMA_SEND_BUFFER);
 out_be32(regs->dmactl, DMA_SEND_BUFFER);
 dmabuf[0] = 2;

If that's the case, there is a bug. Ignoring possible gcc optimisations,
the accessors contain the necessary memory barriers for things to work
the way you describe above. If the use of volatile and clobber in our
macros isn't enough to also prevent optimisations, then we have a bug
and you are welcome to provide a patch to fix it.

 gcc will often not do this optimization, because there might be aliasing
 between regs->dmactl and dmabuf, but it _can_ do it.  gcc can't optimize
 the two identical out_be32's into one, or re-order them if they were to
 different registers, but it can move the normal memory accesses around them.

The Linux kernel -cannot- be compiled with strict aliasing rules. This
is one of the many areas where those are violated. Frankly, this strict
aliasing stuff is just a total nightmare turning a perfectly nice and
usable language into something it's not meant to be.

 Here's a quick hack I stuck in a driver to test.  compile with -save-temps and
 check the resulting asm.  gcc will do the optimization I described above.
 
 static void __iomem *baz = (void*)0x1234;
 static struct bar {
  u32 bar[256];
 } bar;
 
 void foo(void) {
  bar.bar[0] = 44;
  out_be32(baz+100, 200);
  bar.bar[0] = 45;
  out_be32(baz+101, 201);
 }

Have you removed -fno-strict-aliasing ? Just don't do that.

Ben.




Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-21 Thread Benjamin Herrenschmidt

On Tue, 2008-05-20 at 15:55 -0700, Trent Piepho wrote:
 There doesn't appear to be any barriers to use for coherent dma other than
 mb() and wmb().
 
 Correct me if I'm wrong, but I think the sync isn't actually _required_ (by
 memory-barriers.txt's definitions), and it would be enough to use eieio,
 except there is code that doesn't use mmiowb() between I/O access and
 unlocking.
 
 So, as I understand it, the minimum needed is eieio.  To provide strict
 ordering w.r.t. spin locks without using mmiowb(), you need sync.  To provide
 strict ordering w.r.t. normal memory, you need sync and a compiler barrier.
 
 Right now no archs provide the last option.  powerpc is currently the middle
 option.  I don't know if anything uses the first option, maybe alpha?  I'm
 almost certain x86 is the middle option (the first isn't possible, the arch
 already has more ordering than that), which is probably why powerpc used that
 option and not the first.

I don't have time for that now. Can you dig into the archives ? The
whole thing has been discussed at length already.

Ben.



Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-21 Thread Trent Piepho

On Wed, 21 May 2008, Benjamin Herrenschmidt wrote:

Depends on what you define as necessary.  It seems clear that I/O accessors
_do not_ need to be strictly ordered with respect to normal memory accesses,
by what's defined in memory-barriers.txt.  So if by necessary you mean what
the Linux standard for I/O accessors requires (and what other archs provide),
then yes, they have the necessary ordering guarantees.

But, if you want them to be strictly ordered w.r.t. normal memory, that's
not the case.


They should be.


Someone should update memory-barriers.txt, because it doesn't say that, and
all I/O accessors for all the arches, because none of them are.


Here's a quick hack I stuck in a driver to test.  compile with -save-temps and
check the resulting asm.  gcc will do the optimization I described above.

static void __iomem *baz = (void*)0x1234;
static struct bar {
 u32 bar[256];
} bar;

void foo(void) {
 bar.bar[0] = 44;
 out_be32(baz+100, 200);
 bar.bar[0] = 45;
 out_be32(baz+101, 201);
}


Have you removed -fno-strict-aliasing ? Just don't do that.


No, it's compiled with a normal kernel build, which includes
-fno-strict-aliasing.


Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-21 Thread Trent Piepho

On Wed, 21 May 2008, Andreas Schwab wrote:

Trent Piepho [EMAIL PROTECTED] writes:

On Wed, 21 May 2008, Andreas Schwab wrote:

Trent Piepho [EMAIL PROTECTED] writes:

It's the _le versions that have a problem, since we can't get gcc to just use
the register indexed mode.  It seems like an obvious thing to have a
constraint for, but I guess there weren't enough instructions that only come
in 'x' versions to bother with it.  There is a 'Z' constraint, Memory operand
that is an indexed or indirect from a register, but I tried it and it can use
both rb,ri and disp(rb) forms.  Actually, I'm not sure how 'Z' is any
different than m?


'Z' will never emit a non-zero constant displacement.


It's too bad gas doesn't appear to be smart enough to turn:
stwbrx 0, 0(3)   -or-   stwbr 0, 0(3)


Use the %y modifier when substituting the operand.


Of course, the undocumented y modifier!  "Print AltiVec or SPE memory operand."
Why didn't I think of that?

That appears to work.  I can get gcc to emit 0,reg and reg,reg but not
disp(reg).  It won't try to use an update address either, though it will
with an m constraint.

But, gcc 4.0.2 can't handle the 'Z' constraint.  It looks like it's not
supported.  Other than this, I can build a kernel with 4.0.2 that appears to
work.  Is it ok to break compatibility with 4.0.2, or should I put in a gcc
version check?
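
To make the suggestion concrete, here is a rough sketch of a byte-reversed store using the 'Z' constraint together with the %y modifier. The function names are made up for illustration, the sync and any clobbers of the real accessor are left out, and the non-powerpc branch is only a fallback so that the sketch compiles elsewhere (it assumes a compiler that provides __builtin_bswap32).

```c
#include <assert.h>
#include <stdint.h>

/* Store val byte-reversed relative to the CPU's native byte order. */
static inline void store_byte_reversed32(volatile uint32_t *addr, uint32_t val)
{
	assert(addr != 0);
#if defined(__powerpc__) || defined(__powerpc64__)
	/* 'Z' limits the operand to indirect or indexed addresses, and %y0
	 * prints it as "0,rB" or "rA,rB", the form stwbrx requires. */
	__asm__ __volatile__("stwbrx %1,%y0" : "=Z" (*addr) : "r" (val));
#else
	/* Fallback for other architectures: same result in plain C. */
	*addr = __builtin_bswap32(val);
#endif
}

/* Small demonstration: store a known pattern and read it back natively. */
uint32_t demo_store(void)
{
	uint32_t word = 0;

	store_byte_reversed32(&word, 0x11223344u);
	return word;
}
```

Because the operand is a real memory operand rather than a bare address register, gcc knows which word is written and can pick either the 0,reg or the reg,reg form itself.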


Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-21 Thread Benjamin Herrenschmidt

On Wed, 2008-05-21 at 12:44 -0700, Trent Piepho wrote:
 
 Someone should update memory-barriers.txt, because it doesn't say that, and
 all I/O accessors for all the arches, because none of them are.

There have been long discussions about that. The end result was that
being too weakly ordered is just asking for trouble because the majority
of drivers are written & tested on x86, which is in order.

If you look at our accessors, minus that gcc problem you found, the
barriers in there should pretty much guarantee ordering in the cases
that matter, which are basically MMIO read followed by memory accesses
and memory writes followed by MMIO. In fact, MMIO reads are fully
synchronous.

 No, it's compiled with a normal kernel build, which includes
 -fno-strict-aliasing

Ok, so there is a very bad bug indeed, we need to fix that.

Ben.




Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread Scott Wood

Benjamin Herrenschmidt wrote:

On Tue, 2008-05-20 at 13:40 -0700, Trent Piepho wrote:

There was some discussion on a Freescale list if the powerpc I/O accessors
should be strictly ordered w.r.t.  normal memory.  Currently they are not.  It
does not appear as if any other architecture's I/O accessors are strictly
ordered in this manner.  memory-barriers.txt explicitly states that the I/O
space (inb, outw, etc.) are NOT strictly ordered w.r.t. normal memory
accesses and it's implied the other I/O accessors (e.g., writel) are the same.

However, it is somewhat harder to program for this model, and there are almost
certainly a number of drivers using coherent DMA which have subtle bugs because
they do not include the necessary barriers.

But clearly any change to this would be a subject for a different patch.


The current accessors should provide all the necessary ordering
guarantees...


It looks like we rely on -fno-strict-aliasing to prevent reordering 
ordinary memory accesses (such as to DMA descriptors) past the I/O 
access.  It won't prevent reordering of memory reads around an I/O read, 
though, which could be a problem if the I/O read result determines the 
validity of the DMA buffer.  IMHO, a memory clobber would be better.
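
A minimal sketch of the read case, assuming GCC-style inline asm. The names mmio_read32_ordered(), STATUS_DONE, fake_status and dmabuf are hypothetical stand-ins and not the kernel's accessors, and the hardware barrier a real powerpc accessor would need is left out so that the sketch builds on any architecture. It only shows the compiler-level effect of a memory clobber.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical read accessor: a volatile load followed by an empty asm with
 * a "memory" clobber.  The "+r" operand ties the asm to the loaded value, and
 * the clobber keeps GCC from hoisting later ordinary loads above it. */
static inline uint32_t mmio_read32_ordered(const volatile uint32_t *reg)
{
	uint32_t val;

	assert(reg != 0);
	val = *reg;
	__asm__ __volatile__("" : "+r" (val) : : "memory");
	return val;
}

#define STATUS_DONE 0x1u	/* made-up status bit */

static uint32_t fake_status = STATUS_DONE;	/* stand-in for a device register */
static uint32_t dmabuf[4] = { 42, 0, 0, 0 };	/* stand-in for a DMA buffer */

/* Returns the first buffer word if the "device" says it is valid, else -1. */
int64_t read_if_done(void)
{
	if (!(mmio_read32_ordered(&fake_status) & STATUS_DONE))
		return -1;
	return dmabuf[0];	/* this load may not move above the status read */
}
```

Without the clobber, nothing in the asm tells gcc that dmabuf may have changed, so it is free to load dmabuf[0] before the status read.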


-Scott


Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread Andreas Schwab
Trent Piepho [EMAIL PROTECTED] writes:

 For the LE versions, eventually they boil down to an asm that will look
 something like this:
 asm("sync; stwbrx %1,0,%2" : "=m" (*addr) : "r" (val), "r" (addr));

 While not perfect, this appears to be the best one can do.  The issue is
 that the stwbrx instruction only comes in an indexed, or 'x', version, in
 which the address is represented by the sum of two registers (the 0,%2).
 Unfortunately, gcc doesn't have a constraint for an indexed memory
 reference.

There is the Z constraint, which matches either an indirect or an
indexed memory address.  That should fit here.

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread Benjamin Herrenschmidt

On Tue, 2008-05-20 at 16:38 -0500, Scott Wood wrote:
 It looks like we rely on -fno-strict-aliasing to prevent reordering
 ordinary memory accesses (such as to DMA descriptors) past the I/O
 access.  It won't prevent reordering of memory reads around an I/O read,
 though, which could be a problem if the I/O read result determines the
 validity of the DMA buffer.  IMHO, a memory clobber would be better.

We probably want a full memory clobber then...

Ben.




Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread Trent Piepho

On Tue, 20 May 2008, Benjamin Herrenschmidt wrote:

On Tue, 2008-05-20 at 13:40 -0700, Trent Piepho wrote:

There was some discussion on a Freescale list if the powerpc I/O accessors
should be strictly ordered w.r.t.  normal memory.  Currently they are not.  It
does not appear as if any other architecture's I/O accessors are strictly
ordered in this manner.  memory-barriers.txt explicitly states that the I/O
space (inb, outw, etc.) are NOT strictly ordered w.r.t. normal memory
accesses and it's implied the other I/O accessors (e.g., writel) are the same.

However, it is somewhat harder to program for this model, and there are almost
certainly a number of drivers using coherent DMA which have subtle bugs because
they do not include the necessary barriers.

But clearly any change to this would be a subject for a different patch.


The current accessors should provide all the necessary ordering
guarantees...


Depends on what you define as necessary.  It seems clear that I/O accessors
_do not_ need to be strictly ordered with respect to normal memory accesses,
by what's defined in memory-barriers.txt.  So if by necessary you mean what
the Linux standard for I/O accessors requires (and what other archs provide),
then yes, they have the necessary ordering guarantees.

But, if you want them to be strictly ordered w.r.t. normal memory, that's
not the case.

For example, in something like:

u32 *dmabuf = kmalloc(...);
...
dmabuf[0] = 1;
out_be32(regs->dmactl, DMA_SEND_BUFFER);
dmabuf[0] = 2;
out_be32(regs->dmactl, DMA_SEND_BUFFER);

gcc might decide to optimize this code to:

out_be32(regs->dmactl, DMA_SEND_BUFFER);
out_be32(regs->dmactl, DMA_SEND_BUFFER);
dmabuf[0] = 2;

gcc will often not do this optimization, because there might be aliasing
between regs->dmactl and dmabuf, but it _can_ do it.  gcc can't optimize
the two identical out_be32's into one, or re-order them if they were to
different registers, but it can move the normal memory accesses around them.

Here's a quick hack I stuck in a driver to test.  compile with -save-temps and
check the resulting asm.  gcc will do the optimization I described above.

static void __iomem *baz = (void*)0x1234;
static struct bar {
	u32 bar[256];
} bar;

void foo(void) {
	bar.bar[0] = 44;
	out_be32(baz+100, 200);
	bar.bar[0] = 45;
	out_be32(baz+101, 201);
}
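
For anyone wanting to repeat this outside a kernel tree, here is a rough userspace stand-in. It assumes GCC; fake_out32() and fake_regs are hypothetical substitutes that keep the same shape as the accessor under discussion (volatile asm, "=m" output for the target word, no memory clobber) but emit no instruction, so the file builds on any arch. Compile with -O2 -S and look at where the stores to bar.bar[0] end up. What the compiler actually does depends on its version, so no particular output is promised.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t fake_regs[256];		/* pretend MMIO window */
static struct bar {
	uint32_t bar[256];
} bar;

/* Same constraint shape as the accessor being discussed: the asm is
 * volatile and claims to write *addr, but has no "memory" clobber.
 * The asm body is empty, so nothing is really stored. */
static inline void fake_out32(volatile uint32_t *addr, uint32_t val)
{
	__asm__ __volatile__("" : "=m" (*addr) : "r" (val));
}

void foo(void)
{
	bar.bar[0] = 44;
	fake_out32(&fake_regs[100], 200);
	bar.bar[0] = 45;
	fake_out32(&fake_regs[101], 201);
}

/* The final value is 45 either way; only the order of the stores relative
 * to the asm statements can differ, which is what the -S output shows. */
void check_foo(void)
{
	foo();
	assert(bar.bar[0] == 45);
}
```

Adding "memory" to the clobber list of fake_out32() and comparing the two -S outputs shows the difference a clobber makes.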


Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread Trent Piepho

On Wed, 21 May 2008, Andreas Schwab wrote:

Trent Piepho [EMAIL PROTECTED] writes:


For the LE versions, eventually they boil down to an asm that will look
something like this:
asm("sync; stwbrx %1,0,%2" : "=m" (*addr) : "r" (val), "r" (addr));

While not perfect, this appears to be the best one can do.  The issue is
that the stwbrx instruction only comes in an indexed, or 'x', version, in
which the address is represented by the sum of two registers (the 0,%2).
Unfortunately, gcc doesn't have a constraint for an indexed memory
reference.


There is the Z constraint, which matches either an indirect or an
indexed memory address.  That should fit here.


This came up on the Freescale list.  I should have put what I wrote there into
my patch description:

It's the _le versions that have a problem, since we can't get gcc to just use
the register indexed mode.  It seems like an obvious thing to have a
constraint for, but I guess there weren't enough instructions that only come
in 'x' versions to bother with it.  There is a 'Z' constraint, Memory operand
that is an indexed or indirect from a register, but I tried it and it can use
both rb,ri and disp(rb) forms.  Actually, I'm not sure how 'Z' is any
different than m?


Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread Trent Piepho

On Tue, 20 May 2008, Benjamin Herrenschmidt wrote:

On Tue, 2008-05-20 at 16:38 -0500, Scott Wood wrote:

It looks like we rely on -fno-strict-aliasing to prevent reordering
ordinary memory accesses (such as to DMA descriptors) past the I/O
access.  It won't prevent reordering of memory reads around an I/O read,
though, which could be a problem if the I/O read result determines the
validity of the DMA buffer.  IMHO, a memory clobber would be better.


We probably want a full memory clobber then...


As far as I could tell, no other arch has a full memory clobber.  I can see
the argument for changing the Linux model to be stricter and less efficient,
but easier to program for.  Not that I entirely agree with it.

But I don't see a good reason for why powerpc should be different than
everything else.


Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread Scott Wood

Alan Cox wrote:
It looks like we rely on -fno-strict-aliasing to prevent reordering 
ordinary memory accesses (such as to DMA descriptors) past the I/O 


DMA descriptors in main memory are dependent on cache behaviour anyway
and the dma_* operators should be the ones enforcing the needed behaviour.


What about memory obtained from dma_alloc_coherent()?  We still need a 
sync and a compiler barrier.  The current I/O accessors have the former, 
but not the latter.


-Scott


Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread David Miller
From: Scott Wood [EMAIL PROTECTED]
Date: Tue, 20 May 2008 17:35:56 -0500

 Alan Cox wrote:
  It looks like we rely on -fno-strict-aliasing to prevent reordering 
  ordinary memory accesses (such as to DMA descriptors) past the I/O 
  
  DMA descriptors in main memory are dependent on cache behaviour anyway
  and the dma_* operators should be the ones enforcing the needed behaviour.
 
 What about memory obtained from dma_alloc_coherent()?  We still need a 
 sync and a compiler barrier.  The current I/O accessors have the former, 
 but not the latter.

The __volatile__ in the asm construct disallows movement of the
inline asm relative to statements surrounding it.

The only reason barrier() in kernel.h needs a memory clobber is
because of a bug in ancient versions of gcc.  In fact, I think
that memory clobber might even be removable.



Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread Scott Wood

David Miller wrote:

From: Scott Wood [EMAIL PROTECTED]
Date: Tue, 20 May 2008 17:35:56 -0500


Alan Cox wrote:
It looks like we rely on -fno-strict-aliasing to prevent reordering 
ordinary memory accesses (such as to DMA descriptors) past the I/O 

DMA descriptors in main memory are dependent on cache behaviour anyway
and the dma_* operators should be the ones enforcing the needed behaviour.
What about memory obtained from dma_alloc_coherent()?  We still need a 
sync and a compiler barrier.  The current I/O accessors have the former, 
but not the latter.


The __volatile__ in the asm construct disallows movement of the
inline asm relative to statements surrounding it.

The only reason barrier() in kernel.h needs a memory clobber is
because of a bug in ancient versions of gcc.  In fact, I think
that memory clobber might even be removable.


Current versions of GCC seem quite happy to move non-asm memory accesses 
around a volatile asm without a memory clobber; see the test Trent posted.


-Scott


Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread Andreas Schwab
Trent Piepho [EMAIL PROTECTED] writes:

 It's the _le versions that have a problem, since we can't get gcc to just use
 the register indexed mode.  It seems like an obvious thing to have a
 constraint for, but I guess there weren't enough instructions that only come
 in 'x' versions to bother with it.  There is a 'Z' constraint, Memory operand
 that is an indexed or indirect from a register, but I tried it and it can use
 both rb,ri and disp(rb) forms.  Actually, I'm not sure how 'Z' is any
 different than m?

'Z' will never emit a non-zero constant displacement.

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread David Miller
From: Scott Wood [EMAIL PROTECTED]
Date: Tue, 20 May 2008 17:43:58 -0500

 David Miller wrote:
  The __volatile__ in the asm construct disallows movement of the
  inline asm relative to statements surrounding it.
  
  The only reason barrier() in kernel.h needs a memory clobber is
  because of a bug in ancient versions of gcc.  In fact, I think
  that memory clobber might even be removable.
 
 Current versions of GCC seem quite happy to move non-asm memory accesses 
 around a volatile asm without a memory clobber; see the test Trent posted.

Indeed, and even the GCC manual is clear about this.


Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread Alan Cox
 It looks like we rely on -fno-strict-aliasing to prevent reordering 
 ordinary memory accesses (such as to DMA descriptors) past the I/O 

DMA descriptors in main memory are dependent on cache behaviour anyway
and the dma_* operators should be the ones enforcing the needed behaviour.

Alan


Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread Trent Piepho

On Tue, 20 May 2008, Scott Wood wrote:

Alan Cox wrote:
  It looks like we rely on -fno-strict-aliasing to prevent reordering 
  ordinary memory accesses (such as to DMA descriptors) past the I/O


 DMA descriptors in main memory are dependent on cache behaviour anyway
 and the dma_* operators should be the ones enforcing the needed behaviour.


What about memory obtained from dma_alloc_coherent()?  We still need a sync 
and a compiler barrier.  The current I/O accessors have the former, but not 
the latter.


There doesn't appear to be any barriers to use for coherent dma other than
mb() and wmb().

Correct me if I'm wrong, but I think the sync isn't actually _required_ (by
memory-barriers.txt's definitions), and it would be enough to use eieio,
except there is code that doesn't use mmiowb() between I/O access and
unlocking.

So, as I understand it, the minimum needed is eieio.  To provide strict
ordering w.r.t. spin locks without using mmiowb(), you need sync.  To provide
strict ordering w.r.t. normal memory, you need sync and a compiler barrier.

Right now no archs provide the last option.  powerpc is currently the middle
option.  I don't know if anything uses the first option, maybe alpha?  I'm
almost certain x86 is the middle option (the first isn't possible, the arch
already has more ordering than that), which is probably why powerpc used that
option and not the first.
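
The three levels above could be sketched as the store helpers below. The names are invented for illustration and are not kernel APIs, and placing the barrier before the store is an assumption of the sketch. The eieio and sync instructions are emitted only when building for powerpc; elsewhere the asm strings are empty so the sketch still compiles, which means only the compiler-level difference is visible there.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__powerpc__) || defined(__powerpc64__)
#define HW_EIEIO "eieio"
#define HW_SYNC  "sync"
#else
#define HW_EIEIO ""	/* placeholders on non-powerpc builds */
#define HW_SYNC  ""
#endif

/* First option: eieio only, the minimum level described in the text. */
static inline void out32_eieio(volatile uint32_t *reg, uint32_t val)
{
	__asm__ __volatile__(HW_EIEIO : : );
	*reg = val;
}

/* Middle option: sync, but no compiler barrier (no "memory" clobber). */
static inline void out32_sync(volatile uint32_t *reg, uint32_t val)
{
	__asm__ __volatile__(HW_SYNC : : );
	*reg = val;
}

/* Last option: sync plus a compiler barrier on both sides of the store. */
static inline void out32_sync_barrier(volatile uint32_t *reg, uint32_t val)
{
	__asm__ __volatile__(HW_SYNC : : : "memory");
	*reg = val;
	__asm__ __volatile__("" : : : "memory");
}

static uint32_t fake_reg;	/* stand-in for a device register */

uint32_t demo_levels(void)
{
	out32_eieio(&fake_reg, 1);
	out32_sync(&fake_reg, 2);
	out32_sync_barrier(&fake_reg, 3);
	assert(fake_reg == 3);
	return fake_reg;
}
```

The extended-asm form (with the colons) is used even for the operand-less barriers so that gcc applies only the clobbers that are written out.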


Re: [PATCH] [POWERPC] Improve (in|out)_beXX() asm code

2008-05-20 Thread Trent Piepho

On Wed, 21 May 2008, Andreas Schwab wrote:

Trent Piepho [EMAIL PROTECTED] writes:


It's the _le versions that have a problem, since we can't get gcc to just use
the register indexed mode.  It seems like an obvious thing to have a
constraint for, but I guess there weren't enough instructions that only come
in 'x' versions to bother with it.  There is a 'Z' constraint, Memory operand
that is an indexed or indirect from a register, but I tried it and it can use
both rb,ri and disp(rb) forms.  Actually, I'm not sure how 'Z' is any
different than m?


'Z' will never emit a non-zero constant displacement.


It's too bad gas doesn't appear to be smart enough to turn:
stwbrx 0, 0(3)   -or-   stwbr 0, 0(3)

into the desired:
stwbrx 0, 0, 3