Hello Evandro!

x87 registers. In contrast, the x86_64 ABI specifies that FP values are passed in SSE registers, which avoids costly SSE reg->stack moves. Until the i386 ABI (together with the supporting math functions) is changed to something similar to x86_64, -mfpmath=sse won't show its full potential.
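A small sketch of where the reg->stack moves come from, assuming the standard System V calling conventions. `mul_add` is a hypothetical function, not taken from any testcase. Compiling it with `gcc -O2 -m32 -msse2 -mfpmath=sse -S` and with plain `gcc -O2 -S` on x86_64, then comparing the assembly, should show the difference described in the comments. The exact instruction sequence depends on the GCC version.

```c
#include <assert.h>

/* Hypothetical example function (not from PR19780). */
double mul_add(double a, double b, double c)
{
    /* i386 SysV ABI: a, b and c arrive on the stack. With -mfpmath=sse they
       are loaded into XMM registers and computed there, but the double result
       must be returned in x87 st(0). That typically means storing the XMM
       value to memory and reloading it with fld.
       x86_64 SysV ABI: a, b and c arrive in xmm0..xmm2 and the result is
       returned in xmm0, so no round trip through memory is needed. */
    return a * b + c;
}
```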

Actually, in many cases, SSE did help x86 performance as well.  That
happens in FP-intensive applications which spend a lot of time in loops
when the XMM register set can be used more efficiently than the x87 stack.

 There is an annoying piece of code attached to PR19780
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19780), a loop that shuffles
registers around a lot:

 int i;

 real v1x, v1y, v1z;
 real v2x, v2y, v2z;
 real v3x, v3y, v3z;

 for (i = 0; i < 100000000; i++)
   {
     v3x = v1y * v2z - v1z * v2y;
     v3y = v1z * v2x - v1x * v2z;
     v3z = v1x * v2y - v1y * v2x;

     v1x = v2x;
     v1y = v2y;
     v1z = v2z;

     v2x = v3x;
     v2y = v3y;
     v2z = v3z;
   }

 This code could be a perfect example of how the XMM register file beats the x87
register stack. However, contrary to all expectations, the x87 code is 20% faster(!!)
on P4. It would be interesting to see this comparison on x86_64, or perhaps on
32-bit AMD. The code structure produced with -mfpmath=sse is the same as the code
structure produced with -mfpmath=387, so IMO there are no register allocator
effects in play.
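To reproduce the comparison, the loop above can be wrapped in a small timing harness. This is only a sketch under stated assumptions. `real` is assumed to be `double` here, because the snippet does not show its definition. The starting vectors are chosen by me, not taken from the PR; with these unit vectors the values cycle with period 3 and never overflow. `shuffle_loop` and `run_pr19780_benchmark` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Assumption: the PR19780 testcase's `real` is taken to be double here;
   change it to float to compare single precision. */
typedef double real;

/* Hypothetical helper: runs the loop body from PR19780 n times, starting
   from v1 and v2, and writes the final v2 vector into out[0..2]. */
static void shuffle_loop(long n, const real v1[3], const real v2[3], real out[3])
{
    real v1x = v1[0], v1y = v1[1], v1z = v1[2];
    real v2x = v2[0], v2y = v2[1], v2z = v2[2];
    real v3x, v3y, v3z;

    assert(n >= 0);
    for (long i = 0; i < n; i++) {
        /* v3 = v1 x v2 (cross product) */
        v3x = v1y * v2z - v1z * v2y;
        v3y = v1z * v2x - v1x * v2z;
        v3z = v1x * v2y - v1y * v2x;

        /* shift the vectors: v1 <- v2, v2 <- v3 */
        v1x = v2x; v1y = v2y; v1z = v2z;
        v2x = v3x; v2y = v3y; v2z = v3z;
    }
    out[0] = v2x; out[1] = v2y; out[2] = v2z;
}

/* Hypothetical driver: times the same iteration count as the original loop
   and prints the result so the compiler cannot discard the computation. */
double run_pr19780_benchmark(void)
{
    const real v1[3] = {1.0, 0.0, 0.0};
    const real v2[3] = {0.0, 1.0, 0.0};
    real out[3];

    clock_t start = clock();
    shuffle_loop(100000000L, v1, v2, out);
    clock_t end = clock();

    double secs = (double)(end - start) / CLOCKS_PER_SEC;
    printf("result: %g %g %g, time: %.2f s\n", out[0], out[1], out[2], secs);
    return secs;
}
```

Call `run_pr19780_benchmark()` from a `main` and build it twice, for example with `gcc -O2 -m32 -msse2 -mfpmath=sse` and with `gcc -O2 -m32 -mfpmath=387`. Keeping the loop in a separate function makes it easy to locate in the `-S` output when comparing the two code structures.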

 I was trying to look into this problem, but at first sight the code seems
optimal to me...

Uros.
