Comparing the code ICC outputs with what gcc outputs, it's really obvious gcc
could make better use of x86's baroque addressing modes.
More specifically, I rarely ever see it use the *8 scale factor, even when
addressing nicely power-of-2 sized data, and that's definitely a performance
problem when dealing with those large SSE vectors.
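
For illustration only (this is not the attached testcase, and the function
name is made up): a loop over 8-byte elements is the kind of access where the
*8 scale factor applies, since the address of a[i] is base + i*8 and x86 can
encode that in a single operand such as (%edx,%eax,8). The registers in the
comments are just an example, not actual compiler output.

```c
#include <assert.h>
#include <stddef.h>

/* The sketch assumes 8-byte elements, as on x86. */
static_assert(sizeof(double) == 8, "example assumes 8-byte doubles");

/* Hypothetical helper: sums n doubles.
 * Each a[i] lives at base + i*8, which fits the x86 form
 * disp(base,index,8), e.g. (%edx,%eax,8) in AT&T syntax, so no
 * separate shift or multiply is needed to form the address. */
double sum_doubles(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}
```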

In the testcase the *8 scale factor is used only once, and even if it's
questionable whether a fancier mode would help in this particular
testcase, there's no doubt it would in the Real World.

Also, take note of the horrible code for massage48; in the Real World it's even
worse:
  4012a6:       mov    $0x30,%edx
  4012af:       imul   %edx,%eax
That's not from the testcase; it's in a loop, and edx gets reloaded each time.
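
As a rough sketch of why the imul looks avoidable (names here are made up and
this is not the testcase): 48 = 3 * 16, so an index into 48-byte elements can
be formed with an add-and-scale step followed by a shift, the sort of thing a
lea plus a shl can express. Whether that is actually faster on a given CPU is
an assumption, not something measured here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 48-byte element (12 floats of 4 bytes each). */
struct elem48 { float v[12]; };
static_assert(sizeof(struct elem48) == 48, "example assumes 48-byte elements");

/* Straightforward form: byte offset of element i via a multiply by 48. */
size_t offset48_mul(size_t i)
{
    return i * sizeof(struct elem48);
}

/* Same offset without a multiply: i*48 == (i + i*2) << 4. */
size_t offset48_lea_shift(size_t i)
{
    size_t i3 = i + i * 2;  /* roughly: lea (%eax,%eax,2),%eax */
    return i3 << 4;         /* roughly: shl $4,%eax */
}
```

Both functions return the same offset for any index that does not overflow
size_t.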

Tested with today's cvs and something like -O3 -march=k8 -fomit-frame-pointer
-mfpmath=sse and -O3 -march=pentium4 -fomit-frame-pointer -mfpmath=sse.

-- 
           Summary: [missed-optimization] gcc4 is really reluctant to use
                    fancy x86 addressing modes
           Product: gcc
           Version: 4.0.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P2
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: tbptbp at gmail dot com
                CC: gcc-bugs at gcc dot gnu dot org
  GCC host triplet: cygwin


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19680
