Consider this function:

unsigned long long x(unsigned long long l) {
  return l >> 4;
}

gcc uses the shrd instruction here, which is much slower than doing the shift
"by hand" on at least the Athlon, Pentium 3, and VIA C3.  On Core 2, shrd
appears to be faster.
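
For illustration, here is a minimal sketch of what a "by hand" shift might look
like when written in C: the 64-bit value is split into two 32-bit halves, and
the low four bits of the high half are moved into the top of the low half.
The name shr4_by_hand is hypothetical and the code is not taken from the
benchmark program. An optimizing compiler may well recognize the pattern and
emit shrd anyway, so this shows the idea rather than a guaranteed way to avoid
the instruction.

```c
#include <stdint.h>

/* Hypothetical helper (not from the report): shift a 64-bit value right
   by 4 using only 32-bit shifts and an OR, instead of a double-precision
   shift such as shrd. */
static uint64_t shr4_by_hand(uint64_t v) {
  uint32_t lo = (uint32_t)v;          /* low 32 bits  */
  uint32_t hi = (uint32_t)(v >> 32);  /* high 32 bits */

  /* The low 4 bits of hi become the top 4 bits of the new low half. */
  lo = (lo >> 4) | (hi << 28);
  hi >>= 4;

  return ((uint64_t)hi << 32) | lo;
}
```

For any input, shr4_by_hand(l) should return the same value as l >> 4; only
the generated instruction sequence is meant to differ.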

On my Athlon 64, a loop of 100 iterations took 350 cycles by hand vs 441 with
shrd.  On my Core 2, it took 672 cycles by hand vs 624 with shrd.

So my suggestion is: if -march= selects a Pentium 3 or a non-Intel CPU, don't
use the shrd and shrl sequence.

My benchmark program is at http://dl.fefe.de/shrd.c


-- 
           Summary: gcc generates suboptimal code for long long shifts
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: felix-gcc at fefe dot de
 GCC build triplet: i386-pc-linux-gnu
  GCC host triplet: i386-pc-linux-gnu
GCC target triplet: i386-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33716
