Re: [PATCH] Use lwsync in PowerPC sync_* builtins
David Edelsohn wrote: On Wed, Sep 3, 2008 at 6:53 PM, Anton Blanchard [EMAIL PROTECTED] wrote: The only thing lwsync wont order is a store followed by a load. Since the lwsync will always be paired with a store (the stwcx), we will order all accesses before it and provide a release barrier. Anton, My one other concern is developers using the builtins for applications on embedded PowerPC processors. lwsync will not order accesses to device memory space, AFAICT. Don't you need eieio+sync for that? GCC does not generate the eieio now. Paolo
Re: [PATCH] Use lwsync in PowerPC sync_* builtins
David Daney wrote: On Wed, Sep 3, 2008 at 6:09 PM, David Edelsohn [EMAIL PROTECTED] wrote: On Wed, Sep 3, 2008 at 6:53 PM, Anton Blanchard [EMAIL PROTECTED] wrote: The only thing lwsync wont order is a store followed by a load. Since the lwsync will always be paired with a store (the stwcx), we will order all accesses before it and provide a release barrier. Anton, My one other concern is developers using the builtins for applications on embedded PowerPC processors. lwsync will not order accesses to device memory space, AFAICT. I do not know if developers would rely on GCC builtins in that context and assume it implements the correct semantics. Otherwise, I agree that the memory barrier operations probably can use lwsync. Would it be possible to have a conservative default and use a more optimal form based on a specific CPU specified by -mcpu=? I was thinking of doing something similar on MIPS where there are similar issues. Another related issue is that psim in gdb does not currently support the lwsync instruction so any code generated using it would fail there. Since this is used as the test platform for the embedded gcc targets (at least powerpc-elf and powerpc-rtems) if gcc is going to generate this instruction, the simulator needs to support it or there will be lots of failures. David Daney -- Joel Sherrill, Ph.D. Director of Research Development [EMAIL PROTECTED]On-Line Applications Research Ask me about RTEMS: a free RTOS Huntsville AL 35805 Support Available (256) 722-9985
Re: [PATCH] Use lwsync in PowerPC sync_* builtins
On Thu, Sep 4, 2008 at 8:31 AM, Joel Sherrill [EMAIL PROTECTED] wrote: Another related issue is that psim in gdb does not currently support the lwsync instruction so any code generated using it would fail there. Since this is used as the test platform for the embedded gcc targets (at least powerpc-elf and powerpc-rtems) if gcc is going to generate this instruction, the simulator needs to support it or there will be lots of failures. Joel, That is unfortunate, but it is a long-term, known problem with PSIM. Someone maintaining PSIM needs to update it. At worst, just make lwsync an alias for sync. It does not make sense to delay improving performance because a simulator is far out of date. You also can make a private change to GCC rs6000 port sources to set TARGET_NO_LWSYNC so that sync is generated instead of lwsync. David
Re: [PATCH] Use lwsync in PowerPC sync_* builtins
David Edelsohn wrote: On Thu, Sep 4, 2008 at 8:31 AM, Joel Sherrill [EMAIL PROTECTED] wrote: Another related issue is that psim in gdb does not currently support the lwsync instruction so any code generated using it would fail there. Since this is used as the test platform for the embedded gcc targets (at least powerpc-elf and powerpc-rtems) if gcc is going to generate this instruction, the simulator needs to support it or there will be lots of failures. Joel, That is unfortunate, but it is a long-term, known problem with PSIM. Someone maintaining PSIM needs to update it. At worst, just make lwsync an alias for sync. Honestly the psim code is very cryptic in this area. I would add it if I could figure out how. :( It does not make sense to delay improving performance because a simulator is far out of date. No I would never say that it did either. You also can make a private change to GCC rs6000 port sources to set TARGET_NO_LWSYNC so that sync is generated instead of lwsync. That's what we do now but I would like to not have to do that. David -- Joel Sherrill, Ph.D. Director of Research Development [EMAIL PROTECTED]On-Line Applications Research Ask me about RTEMS: a free RTOS Huntsville AL 35805 Support Available (256) 722-9985
Re: [PATCH] Use lwsync in PowerPC sync_* builtins
On Thu, Sep 04, 2008 at 09:04:30AM -0400, David Edelsohn wrote: That is unfortunate, but it is a long-term, known problem with PSIM. Someone maintaining PSIM needs to update it. Also unfortunately, there is no one maintaining PSIM. It's shipped with GDB, but I consider that only a convenience for the implementation of 'target sim'; it's really an independent project. -- Daniel Jacobowitz CodeSourcery
Re: [PATCH] Use lwsync in PowerPC sync_* builtins
Anton Blanchard wrote: I noticed that sync_lock_release uses lwsync if available but every other sync_* builtin uses a heavyweight sync. eg: Every other sync builtin has full-barrier semantics. AFAIK, isync is correct. r~
Re: [PATCH] Use lwsync in PowerPC sync_* builtins
Hi Richard, I noticed that sync_lock_release uses lwsync if available but every other sync_* builtin uses a heavyweight sync. eg: Every other sync builtin has full-barrier semantics. AFAIK, isync is correct. I think we can change the sync to an lwsync and still maintain full barrier semantics. The code sequence now is: fetch_and_add: 60: 7c 00 04 ac sync 64: 7c 69 1b 78 mr r9,r3 68: 7c 60 48 28 lwarx r3,0,r9 6c: 39 63 00 01 addir11,r3,1 70: 7d 60 49 2d stwcx. r11,0,r9 74: 40 a2 ff f4 bne-68 sync_fetch_and_add+0x8 78: 4c 00 01 2c isync 7c: 4e 80 00 20 blr The only thing lwsync wont order is a store followed by a load. Since the lwsync will always be paired with a store (the stwcx), we will order all accesses before it and provide a release barrier. The stwcx; bne; isync combination provides the acquire barrier. Anton
Re: [PATCH] Use lwsync in PowerPC sync_* builtins
On Wed, Sep 3, 2008 at 6:53 PM, Anton Blanchard [EMAIL PROTECTED] wrote: The only thing lwsync wont order is a store followed by a load. Since the lwsync will always be paired with a store (the stwcx), we will order all accesses before it and provide a release barrier. Anton, My one other concern is developers using the builtins for applications on embedded PowerPC processors. lwsync will not order accesses to device memory space, AFAICT. I do not know if developers would rely on GCC builtins in that context and assume it implements the correct semantics. Otherwise, I agree that the memory barrier operations probably can use lwsync. David
Re: [PATCH] Use lwsync in PowerPC sync_* builtins
On Wed, Sep 3, 2008 at 6:09 PM, David Edelsohn [EMAIL PROTECTED] wrote: On Wed, Sep 3, 2008 at 6:53 PM, Anton Blanchard [EMAIL PROTECTED] wrote: The only thing lwsync wont order is a store followed by a load. Since the lwsync will always be paired with a store (the stwcx), we will order all accesses before it and provide a release barrier. Anton, My one other concern is developers using the builtins for applications on embedded PowerPC processors. lwsync will not order accesses to device memory space, AFAICT. I do not know if developers would rely on GCC builtins in that context and assume it implements the correct semantics. Otherwise, I agree that the memory barrier operations probably can use lwsync. Would it be possible to have a conservative default and use a more optimal form based on a specific CPU specified by -mcpu=? I was thinking of doing something similar on MIPS where there are similar issues. David Daney
Re: [PATCH] Use lwsync in PowerPC sync_* builtins
Hi David, My one other concern is developers using the builtins for applications on embedded PowerPC processors. lwsync will not order accesses to device memory space, AFAICT. I do not know if developers would rely on GCC builtins in that context and assume it implements the correct semantics. Otherwise, I agree that the memory barrier operations probably can use lwsync. Thats a good point, but I think embedded driver writers need to create their own synchronisation macros (like Linux and Xorg do). The current gcc builtins only solve some of the issues for drivers and I worry it will give a false confidence if we say they can be used for IO. A few places they will trip up on PowerPC: - eieio required to enforce ordering between non cacheable loads and stores. - eieio doesn't order cacheable and non cacheable storage accesses. To get it right they are going to need IO macros etc. Anton