> Posting some random numbers without a test-case and precise command line
> parameters for both compilers makes the numbers useless, IMHO. You also
> only mention instruction counts. Have you actually benchmarked the
> resulting code? CPUs are complicated and what you might perceive as worse
> code might actually be superior thanks to scheduling and internal CPU
> parallelism etc.

Thanks for the reminder.
After some investigation, I can reproduce the issue with the following
piece of code:
-------------------------------------begin here-------------------
extern int *p[5];

# define REAL_RADIX_2            24
# define REAL_MUL_2(x, y)        (((long long)(x) * (long long)(y)) >> REAL_RADIX_2)


void func(int *b1, int *b2)
{
  int c0 = p[3][0];
  int c1 = p[3][1];

  b2[0x18] = b1[0x18] + b1[0x1B];
  b2[0x1B] = REAL_MUL_2((b1[0x18] - b1[0x1B]) , c0);

  b2[0x19] = b1[0x19] + b1[0x1A];
  b2[0x1A] = REAL_MUL_2((b1[0x19] - b1[0x1A]) , c1);

  b2[0x1C] = b1[0x1C] + b1[0x1F];
  b2[0x1F] = REAL_MUL_2((b1[0x1F] - b1[0x1C]) , c0);

  b2[0x1D] = b1[0x1D] + b1[0x1E];
  b2[0x1E] = REAL_MUL_2((b1[0x1E] - b1[0x1D]) , c1);
}
-------------------------------------cut here-------------------

It seems GCC 4.3.4 always expands the long long multiplication into
three 32-bit multiply instructions (mult, madd, multu), like this:
-------------------------------------begin here-------------------
#  b2[0x1A] = REAL_MUL_2((b1[0x19] - b1[0x1A]) , c1);

        lw      $6,104($4)
        lw      $2,100($4)
        subu    $2,$2,$6
        mult    $11,$2
        sra     $6,$2,31
        madd    $6,$9
        mflo    $6
        multu   $2,$9
        mfhi    $3
        addu    $3,$6,$3
        sll     $6,$3,8
        mflo    $2
        srl     $7,$2,24
        or      $7,$6,$7
        sw      $7,104($5)
-------------------------------------cut here-------------------

while GCC 3.4.4 treats the long long multiplication like an ordinary
one and generates only one mult insn for each statement, like this:
-------------------------------------begin here-------------------
#  b2[0x1A] = REAL_MUL_2((b1[0x19] - b1[0x1A]) , c1);

        lw      $2,100($4)
        lw      $7,104($4)
        subu    $3,$2,$7
        mult    $3,$9
        mflo    $6
        mfhi    $25
        srl     $15,$6,24
        sll     $24,$25,8
        or      $14,$15,$24
        sw      $14,104($5)
-------------------------------------cut here-------------------

As I understand it, three mult insns are not necessary to implement the
long long multiplication here, since both operands are sign-extended from
int, so a single signed 32x32->64 mult already yields the full product.
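
To illustrate the point in C, here is a minimal sketch (not compiler
output) comparing the macro with the hi/lo combination that the GCC 3.4.4
code performs after a single mult. The function names real_mul_ref,
real_mul_hilo and check_samples are made up for this example. It assumes
GCC's behaviour of arithmetic right shift on negative signed values and
wrap-around on narrowing to int32_t.

```c
#include <assert.h>
#include <stdint.h>

#define REAL_RADIX_2 24

/* Reference: same expression as REAL_MUL_2 in the test case. */
static int32_t real_mul_ref(int32_t x, int32_t y)
{
    return (int32_t)(((long long)x * (long long)y) >> REAL_RADIX_2);
}

/* Models one signed 32x32->64 multiply followed by mfhi/mflo, then
 * combines the halves as (lo >> 24) | (hi << 8). */
static int32_t real_mul_hilo(int32_t x, int32_t y)
{
    int64_t  prod = (int64_t)x * (int64_t)y;          /* single mult  */
    uint32_t lo   = (uint32_t)prod;                   /* mflo         */
    uint32_t hi   = (uint32_t)((uint64_t)prod >> 32); /* mfhi         */
    return (int32_t)((lo >> REAL_RADIX_2) | (hi << (32 - REAL_RADIX_2)));
}

/* Spot-check that both forms agree on a few sample operands
 * (hypothetical values chosen to include the extremes). */
static void check_samples(void)
{
    static const int32_t v[] = {
        0, 1, -1, 1 << 24, -(1 << 24), 0x12345678, INT32_MAX, INT32_MIN
    };
    const int n = (int)(sizeof v / sizeof v[0]);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            assert(real_mul_ref(v[i], v[j]) == real_mul_hilo(v[i], v[j]));
}
```

Calling check_samples() from a small main() is enough to try it. Since
both operands fit in 32 bits, the 64-bit product never needs the extra
cross-term multiplications.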

As before, the compiler options are "-march=mips32r2 -O3".

Thanks.

-- 
Best Regards.
