
fancy x87 ops, SSE and -mfpmath=sse,387 performance

2006-08-05 Thread tbp


Basically I'd like to have my cake and eat it too.

With g++-4.2-20060805/cygwin on a K8 box, on a code path with lots of
single-precision float ops but no transcendentals or library calls, I get:
  -mfpmath=sse,387: 5.2 Mray/s
  -mfpmath=sse:     6 Mray/s
That 15% performance difference is no surprise when you see things like
 4037c8:   flds   0x4(%esp)
 4037cc:   mulss  %xmm5,%xmm2
 4037d0:   fsubrp %st,%st(1)
 4037d2:   movss  %xmm1,0x4(%esp)
 4037d8:   addss  0x278(%esp,%ecx,4),%xmm0
 4037e1:   flds   0x4(%esp)
 4037e5:   fsubrp %st,%st(1)
 4037e7:   addss  %xmm2,%xmm0
 4037eb:   movss  %xmm0,0x4(%esp)
 4037f1:   flds   0x4(%esp)
 4037f5:   fdivrp %st,%st(1)
 4037f7:   fcomi  %st(1),%st
 4037f9:   fldz
 4037fb:   setae  %dl
 4037fe:   fcomip %st(1),%st
 403800:   seta   %al
 403803:   or     %al,%dl
 403805:   je     4036ca

Therefore -mfpmath=sse is the way to go, and it is in fact on par with or
better than what I get out of icc 9.1 for the same code.
Where it gets ugly is when, for example, you throw some cosf() into
the same compilation unit: with -mfpmath=sse you then pay for some really,
really slow library function calls (at least on cygwin).
Wishful thinking got me trying -march=k8 -mfpmath=sse
-mfancy-math-387, to no avail :(
Is there a way to enable such exotic codegen for 32-bit environments?
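
[Editor's sketch, not from the original poster.] For readers who want to see the situation described above, here is a minimal, hypothetical compilation unit: one function with only single-precision arithmetic, and one that pulls in cosf(). The function names and the file name kernel.cpp are made up for illustration; the flags in the comments are the ones quoted in the message.

```cpp
#include <math.h>

// Hypothetical hot-path function: only single-precision arithmetic,
// no transcendentals and no library calls.
// To compare generated code, one could inspect the assembly from e.g.:
//   g++ -O2 -march=k8 -mfpmath=sse     -S kernel.cpp
//   g++ -O2 -march=k8 -mfpmath=sse,387 -S kernel.cpp
float dot3(const float* a, const float* b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Same compilation unit, now with a transcendental. Per the message above,
// with -mfpmath=sse this ends up as a call into the C library's cosf()
// rather than an inline x87 instruction.
float attenuate(float intensity, float angle) {
    return intensity * cosf(angle);
}
```

Looking at the assembly for attenuate() under each flag set should show whether cosf() stays a library call or not; actual timings will depend on the platform's libm.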

Re: fancy x87 ops, SSE and -mfpmath=sse,387 performance

2006-08-06 Thread Paolo Bonzini




> Is there a way to enable such exotic codegen for 32-bit environments?


With libgcc-math you didn't have exotic instructions, but you had
transcendental operations compiled with -mfpmath=sse and with a special
ABI.  -mfpmath=sse won about 8% over -mfpmath=387 for tramp3d, which
does have transcendental operations.


Let's see what happens for 4.3.

Paolo

Re: fancy x87 ops, SSE and -mfpmath=sse,387 performance

2006-08-07 Thread tbp


On 8/6/06, Paolo Bonzini <[EMAIL PROTECTED]> wrote:

> > Is there a way to enable such exotic codegen for 32-bit environments?
>
> With libgcc-math you didn't have exotic instructions, but you had
> transcendental operations compiled with -mfpmath=sse and with a special
> ABI.  -mfpmath=sse won about 8% over -mfpmath=387 for tramp3d, which
> does have transcendental operations.
>
> Let's see what happens for 4.3.

I'm not sure I grokked the fuss about libgcc-math.
What I know is that -mfpmath=sse in recent gcc does wonders, as do the
SSE implementations of such library calls that I get in
a sane environment like linux x86-64. But it's truly horrible on
cygwin, off the mark by an order of magnitude.

My complaint is that at the moment the only stopgap on such a platform is to
resort to -mfpmath=sse,387, which is not without drawbacks.

I understand -march=k8 -mfpmath=sse -mfancy-math-387 is out of the
question, but could you clarify what I should expect from 4.3?
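
[Editor's sketch, not from the thread.] One untested stopgap a reader might try in the meantime is to keep the transcendental calls in their own compilation unit, built with x87 math, so the hot code can stay on plain -mfpmath=sse. The names cos_x87, project and trig387.cpp are hypothetical, and it is an assumption (not confirmed in this thread) that GCC expands cosf() inline as fcos under the flags shown in the comment. Both functions are shown in one block only to keep the example self-contained.

```cpp
#include <math.h>

// Hypothetical wrapper, imagined as living in its own file (say trig387.cpp)
// and built with x87 math, for example:
//   g++ -O2 -march=k8 -mfpmath=387 -ffast-math -c trig387.cpp
// Assumption: with these flags GCC may emit an inline fcos here instead of
// calling the C library.
float cos_x87(float x) {
    return cosf(x);
}

// Imagined as part of the hot file, built with -mfpmath=sse. It only sees
// an ordinary function call, so its own arithmetic stays in SSE registers.
float project(float length, float angle) {
    return length * cos_x87(angle);
}
```

The call boundary is not free, since the result has to travel from the x87 stack to an SSE register, so whether this beats -mfpmath=sse,387 would have to be measured.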
