Re: [PATCH] Use lwsync in PowerPC sync_* builtins

2008-09-04 Thread Paolo Bonzini
David Edelsohn wrote:
 On Wed, Sep 3, 2008 at 6:53 PM, Anton Blanchard [EMAIL PROTECTED] wrote:
 The only thing lwsync wont order is a store followed by a load. Since
 the lwsync will always be paired with a store (the stwcx), we will order
 all accesses before it and provide a release barrier.
 
 Anton,
 
 My one other concern is developers using the builtins for applications on
 embedded PowerPC processors.  lwsync will not order accesses to device
 memory space, AFAICT.

Don't you need eieio+sync for that?  GCC does not generate the eieio now.

Paolo


Re: [PATCH] Use lwsync in PowerPC sync_* builtins

2008-09-04 Thread Joel Sherrill

David Daney wrote:

On Wed, Sep 3, 2008 at 6:09 PM, David Edelsohn [EMAIL PROTECTED] wrote:
  

On Wed, Sep 3, 2008 at 6:53 PM, Anton Blanchard [EMAIL PROTECTED] wrote:


The only thing lwsync wont order is a store followed by a load. Since
the lwsync will always be paired with a store (the stwcx), we will order
all accesses before it and provide a release barrier.
  

Anton,

My one other concern is developers using the builtins for applications on
embedded PowerPC processors.  lwsync will not order accesses to device
memory space, AFAICT.  I do not know if developers would rely on GCC builtins
in that context and assume it implements the correct semantics.  Otherwise,
I agree that the memory barrier operations probably can use lwsync.



Would it be possible to have a conservative default and use a more
optimal form based on a specific CPU specified by -mcpu=?

I was thinking of doing something similar on MIPS where there are
similar issues.

  

Another related issue is that psim in gdb does not currently
support the lwsync instruction so any code generated using it
would fail there.  Since this is used as the test platform for
the embedded gcc targets (at least powerpc-elf and powerpc-rtems)
if gcc is going to generate this instruction, the simulator
needs to support it or there will be lots of failures.

David Daney
  



--
Joel Sherrill, Ph.D. Director of Research  Development
[EMAIL PROTECTED]On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
  Support Available (256) 722-9985




Re: [PATCH] Use lwsync in PowerPC sync_* builtins

2008-09-04 Thread David Edelsohn
On Thu, Sep 4, 2008 at 8:31 AM, Joel Sherrill [EMAIL PROTECTED] wrote:
 Another related issue is that psim in gdb does not currently
 support the lwsync instruction so any code generated using it
 would fail there.  Since this is used as the test platform for
 the embedded gcc targets (at least powerpc-elf and powerpc-rtems)
 if gcc is going to generate this instruction, the simulator
 needs to support it or there will be lots of failures.

Joel,

That is unfortunate, but it is a long-term, known problem with PSIM.  Someone
maintaining PSIM needs to update it.  At worst, just make lwsync an alias
for sync.  It does not make sense to delay improving performance because
a simulator is far out of date.  You also can make a private change to GCC
rs6000 port sources to set TARGET_NO_LWSYNC so that sync is generated
instead of lwsync.

David


Re: [PATCH] Use lwsync in PowerPC sync_* builtins

2008-09-04 Thread Joel Sherrill

David Edelsohn wrote:

On Thu, Sep 4, 2008 at 8:31 AM, Joel Sherrill [EMAIL PROTECTED] wrote:
  

Another related issue is that psim in gdb does not currently
support the lwsync instruction so any code generated using it
would fail there.  Since this is used as the test platform for
the embedded gcc targets (at least powerpc-elf and powerpc-rtems)
if gcc is going to generate this instruction, the simulator
needs to support it or there will be lots of failures.



Joel,

That is unfortunate, but it is a long-term, known problem with PSIM.  Someone
maintaining PSIM needs to update it.  At worst, just make lwsync an alias
for sync.  

Honestly the psim code is very cryptic in this area.  I would
add it if I could figure out how.  :(

It does not make sense to delay improving performance because
a simulator is far out of date. 

No I would never say that it did either.

 You also can make a private change to GCC
rs6000 port sources to set TARGET_NO_LWSYNC so that sync is generated
instead of lwsync.
  

That's what we do now but I would like to not have to do
that.

David
  



--
Joel Sherrill, Ph.D. Director of Research  Development
[EMAIL PROTECTED]On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
  Support Available (256) 722-9985




Re: [PATCH] Use lwsync in PowerPC sync_* builtins

2008-09-04 Thread Daniel Jacobowitz
On Thu, Sep 04, 2008 at 09:04:30AM -0400, David Edelsohn wrote:
 That is unfortunate, but it is a long-term, known problem with PSIM.  Someone
 maintaining PSIM needs to update it.

Also unfortunately, there is no one maintaining PSIM.  It's shipped
with GDB, but I consider that only a convenience for the
implementation of 'target sim'; it's really an independent project.

-- 
Daniel Jacobowitz
CodeSourcery


Re: [PATCH] Use lwsync in PowerPC sync_* builtins

2008-09-03 Thread Richard Henderson

Anton Blanchard wrote:

I noticed that sync_lock_release uses lwsync if available but every other
sync_* builtin uses a heavyweight sync. eg:


Every other sync builtin has full-barrier semantics.  AFAIK, isync is 
correct.



r~


Re: [PATCH] Use lwsync in PowerPC sync_* builtins

2008-09-03 Thread Anton Blanchard

Hi Richard,

 I noticed that sync_lock_release uses lwsync if available but every other
 sync_* builtin uses a heavyweight sync. eg:

 Every other sync builtin has full-barrier semantics.  AFAIK, isync is  
 correct.

I think we can change the sync to an lwsync and still maintain full
barrier semantics. The code sequence now is:

fetch_and_add:
  60:   7c 00 04 ac sync
  64:   7c 69 1b 78 mr  r9,r3
  68:   7c 60 48 28 lwarx   r3,0,r9
  6c:   39 63 00 01 addir11,r3,1
  70:   7d 60 49 2d stwcx.  r11,0,r9
  74:   40 a2 ff f4 bne-68 sync_fetch_and_add+0x8
  78:   4c 00 01 2c isync
  7c:   4e 80 00 20 blr

The only thing lwsync wont order is a store followed by a load. Since
the lwsync will always be paired with a store (the stwcx), we will order
all accesses before it and provide a release barrier.

The stwcx; bne; isync combination provides the acquire barrier.

Anton


Re: [PATCH] Use lwsync in PowerPC sync_* builtins

2008-09-03 Thread David Edelsohn
On Wed, Sep 3, 2008 at 6:53 PM, Anton Blanchard [EMAIL PROTECTED] wrote:
 The only thing lwsync wont order is a store followed by a load. Since
 the lwsync will always be paired with a store (the stwcx), we will order
 all accesses before it and provide a release barrier.

Anton,

My one other concern is developers using the builtins for applications on
embedded PowerPC processors.  lwsync will not order accesses to device
memory space, AFAICT.  I do not know if developers would rely on GCC builtins
in that context and assume it implements the correct semantics.  Otherwise,
I agree that the memory barrier operations probably can use lwsync.

David


Re: [PATCH] Use lwsync in PowerPC sync_* builtins

2008-09-03 Thread David Daney
On Wed, Sep 3, 2008 at 6:09 PM, David Edelsohn [EMAIL PROTECTED] wrote:
 On Wed, Sep 3, 2008 at 6:53 PM, Anton Blanchard [EMAIL PROTECTED] wrote:
 The only thing lwsync wont order is a store followed by a load. Since
 the lwsync will always be paired with a store (the stwcx), we will order
 all accesses before it and provide a release barrier.

 Anton,

 My one other concern is developers using the builtins for applications on
 embedded PowerPC processors.  lwsync will not order accesses to device
 memory space, AFAICT.  I do not know if developers would rely on GCC builtins
 in that context and assume it implements the correct semantics.  Otherwise,
 I agree that the memory barrier operations probably can use lwsync.

Would it be possible to have a conservative default and use a more
optimal form based on a specific CPU specified by -mcpu=?

I was thinking of doing something similar on MIPS where there are
similar issues.

David Daney


Re: [PATCH] Use lwsync in PowerPC sync_* builtins

2008-09-03 Thread Anton Blanchard
 
Hi David,

 My one other concern is developers using the builtins for applications on
 embedded PowerPC processors.  lwsync will not order accesses to device
 memory space, AFAICT.  I do not know if developers would rely on GCC builtins
 in that context and assume it implements the correct semantics.  Otherwise,
 I agree that the memory barrier operations probably can use lwsync.

Thats a good point, but I think embedded driver writers need to create
their own synchronisation macros (like Linux and Xorg do). The current
gcc builtins only solve some of the issues for drivers and I worry it
will give a false confidence if we say they can be used for IO. A few
places they will trip up on PowerPC:

- eieio required to enforce ordering between non cacheable loads and stores.
- eieio doesn't order cacheable and non cacheable storage accesses.

To get it right they are going to need IO macros etc.

Anton