On Sun, Mar 23, 2003 at 06:43:16PM +0100, Michael Nottebrock wrote:
> On Sunday 23 March 2003 18:02, Till Riedel wrote:
> > why not
> > +_CPUCFLAGS = -march=pentium4 -mno-sse2
> >
> > > choose, and in the case of pentium4 producing broken code the
> > > obvious fallback would be pentium3...
> >
> > above would be in fact the same because only the SSE2 code differs from
> > march=pentium3 which in turn only defines SSE additionally (which
> > probably generates the slower code compared to pentiumpro) as i see it.
> > code generation for all x86 uses the same rules (i386.md)
> > except that some rules only apply if TARGET_SSE2 is defined.
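At least I now know to some extent what makes -march=pentium4 slow: someone
at gcc hacked a stupid cost table for its operations. Using the pentiumpro
cost table for pentium4 makes it fast again (the diff below lists the patched
i386.c first and i386.c.orig second, so read it from the second hunk to the
first):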
*** i386.c      Sun Mar 23 17:32:38 2003
--- i386.c.orig Sun Mar 23 17:45:35 2003
***************
*** 893,895 ****
{"pentium3", PROCESSOR_PENTIUMPRO, PTA_MMX | PTA_SSE | PTA_PREFETCH_SSE},
!       {"pentium4", PROCESSOR_PENTIUMPRO, PTA_SSE | PTA_SSE2 |
                     PTA_MMX | PTA_PREFETCH_SSE},
--- 893,895 ----
{"pentium3", PROCESSOR_PENTIUMPRO, PTA_MMX | PTA_SSE | PTA_PREFETCH_SSE},
!  {"pentium4", PROCESSOR_PENTIUM4, PTA_SSE | PTA_SSE2 | PTA_MMX | PTA_PREFETCH_SSE},
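
A rough way to see why the patch lines pentium4 up with pentium3: the sketch
below models the two table entries from the diff as a (cost model, feature
flags) pair. Only the PROCESSOR_* and PTA_* names come from the diff; the
dictionary layout and the helpers patched() and select() are hypothetical
and are not how gcc itself is structured.

```python
# Hypothetical, simplified model of the two table entries shown in the diff.
# Only the PROCESSOR_* and PTA_* names are taken from the diff; the data
# layout and helper names are assumptions made for illustration.

ORIGINAL = {
    "pentium3": ("PROCESSOR_PENTIUMPRO",
                 {"PTA_MMX", "PTA_SSE", "PTA_PREFETCH_SSE"}),
    "pentium4": ("PROCESSOR_PENTIUM4",
                 {"PTA_MMX", "PTA_SSE", "PTA_SSE2", "PTA_PREFETCH_SSE"}),
}


def patched(table):
    """Return a copy in which pentium4 reuses the pentiumpro cost model."""
    result = dict(table)
    _, features = table["pentium4"]
    result["pentium4"] = ("PROCESSOR_PENTIUMPRO", set(features))
    return result


def select(table, march, disabled=frozenset()):
    """Model -march=<march> plus -mno-* switches given as PTA_* names."""
    cost_model, features = table[march]
    return cost_model, frozenset(features - set(disabled))


if __name__ == "__main__":
    table = patched(ORIGINAL)
    p3 = select(table, "pentium3")
    p4_no_sse2 = select(table, "pentium4", disabled={"PTA_SSE2"})
    # After the patch both selections share cost model and feature set.
    print(p3 == p4_no_sse2)
```

In this model the patched pentium4 entry with SSE2 disabled is
indistinguishable from pentium3, while the unpatched entry still differs in
its cost model.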

> 
> Just out of curiosity, have you tried using -mfpmath=sse? I remember someone
> on this list claiming that the SSE fpa-code works much better than the i387 
> code which is used by default (even with -march=pentium4).
It seems to be equally fast with the whetstone benchmark, but it makes
SSE2 code slower, because most SSE2 rules depend on i387 math.
Here are some results after the cost patch above:

-march=pentiumpro
 whetstone took: 1.05 secs for 954 MFLOPS (w/  math lib)
 whetstone took: 0.28 secs for 3555 MFLOPS (w/o math lib)
-march=pentium3
 whetstone took: 1.05 secs for 954 MFLOPS (w/  math lib)
 whetstone took: 0.28 secs for 3556 MFLOPS (w/o math lib)
-march=pentium3 -mfpmath=sse
 whetstone took: 1.05 secs for 953 MFLOPS (w/  math lib)
 whetstone took: 0.28 secs for 3555 MFLOPS (w/o math lib)
-march=pentium4
 whetstone took: 1.06 secs for 942 MFLOPS (w/  math lib)
 whetstone took: 0.29 secs for 3393 MFLOPS (w/o math lib)
-march=pentium4 -mno-sse2 (after the patch this should be the same as pentium3)
 whetstone took: 1.05 secs for 954 MFLOPS (w/  math lib)
 whetstone took: 0.28 secs for 3555 MFLOPS (w/o math lib)
-march=pentium4 -mfpmath=sse
 whetstone took: 1.14 secs for 880 MFLOPS (w/  math lib)
 whetstone took: 0.36 secs for 2768 MFLOPS (w/o math lib)
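
To make the differences easier to compare, here is a small sketch that
expresses each run as a percent change in MFLOPS against the
-march=pentiumpro run. The figures are copied from the results above;
the choice of baseline and the helper name relative_change() are
assumptions for illustration only.

```python
# MFLOPS figures copied from the results above:
# (with math lib, without math lib).
RESULTS = {
    "-march=pentiumpro": (954, 3555),
    "-march=pentium3": (954, 3556),
    "-march=pentium3 -mfpmath=sse": (953, 3555),
    "-march=pentium4": (942, 3393),
    "-march=pentium4 -mno-sse2": (954, 3555),
    "-march=pentium4 -mfpmath=sse": (880, 2768),
}

# Assumed baseline for the comparison; any other entry would work too.
BASELINE = "-march=pentiumpro"


def relative_change(flags):
    """Percent change in MFLOPS versus the baseline, for both runs."""
    base = RESULTS[BASELINE]
    return tuple(
        round(100.0 * (value - ref) / ref, 1)
        for value, ref in zip(RESULTS[flags], base)
    )


if __name__ == "__main__":
    for flags in RESULTS:
        with_lib, without_lib = relative_change(flags)
        print(f"{flags:32s} {with_lib:+6.1f}% {without_lib:+6.1f}%")
```

With these numbers, plain -march=pentium4 comes out roughly 1.3% and 4.6%
below the baseline, and -march=pentium4 -mfpmath=sse roughly 7.8% and 22.1%
below it; the other runs are within about 0.1% of the baseline.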

till
