Re: [MIPS] MADD issue

2007-04-20 Thread Richard Sandiford
Nigel Stephens [EMAIL PROTECTED] writes:
 OK, so maybe as the person who removed adddi3 from the MIPS backend, and 
 the main  proponent of the new fused opcodes, you get to volunteer to 
 implement this? :)

Hey, I was pretty happy with the status quo ;)

Richard


Re: [MIPS] MADD issue

2007-04-20 Thread Nigel Stephens



Richard Sandiford wrote:

Nigel Stephens [EMAIL PROTECTED] writes:
  
I notice that at least the 32-bit rs6000, i386, sparc, frv, sh, cris, 
mcore, score, arm  pa backends still implement adddi3 as either a 
define_insn which outputs two instructions or an explicit define_expand 
followed define_split and associated sub patterns.  Are we setting the 
bar too high for MIPS? :)



I don't think that follows.  The main reason that ports like rs6000,
i386, arm, sparc and pa define adddi3 is because those architectures
provide special add-with-carry type instructions, or similar
architecture-specific optimisations.  


Right, good point.


MIPS has nothing like that.
  


Actually the MIPS DSP ASE does have addsc and addwc, which could be used 
for this purpose. Sadly not subsc and subwc, though.



The old MIPS patterns just re-implemented the standard optabs version
(and often did so less efficiently, as I said before).

  
Whilst I'm sure that your proposal is the right one going forward, it 
still feels like it could be significant amount of work to implement. 
And the simplified optab/expand support would only work for multiply-add 
sequences expressed within a single expression, and wouldn't be able to 
optimise disjoint multiply, then add expressions. Or have I missed 
something.



I don't think that follows either.  Out-of-ssa should coalesce them
if they are in the same basic block.  And if they aren't in the
same basic block, that's presumably because the tree optimisers
think they shouldn't be.  Combine wouldn't look across basic
block boundaries either AFAIK.

E.g. compiling:


typedef unsigned long long ull;
typedef unsigned int ui;

ull foo (ui x, ui y, ull z)
{
  ull tmp = (ull) x * y;
  return tmp + z;
}


with a recent snapshot and -O2 -fdump-tree-all shows that the
final_cleanup dump does indeed have the combined form:


;; Function foo (foo)

foo (x, y, z)
{
bb 2:
  return (long long unsigned int) y * (long long unsigned int) x + z;

}


  


Ah I see. Fair enough.


I realise no-one else has spoken out in support of me, so perhaps
I'm in a minority of one here.  But it does seem to me that in the
Tree-SSA world, it makes less sense to duplicate standard optabs
in the backend purely for the reason of keeping DImode arithmetic
around as DImode arithmetic for longer.

  
in the short term we really do need to reenable madd/msub for MIPS32 
targets in GCC 4. We could do that with a local patch to put back 
adddi3, but it would be better if we could coordinate this with you.



If removing the patterns had been purely a clean-up, I would be more
open to the idea of putting the patterns back.  But given that removing
the patterns had an optimisation benefit of its own, I'm less open to
the idea of adding them back, especially when (as far as I'm concerned)
there's a clean way of getting the best of both worlds.

  


OK, so maybe as the person who removed adddi3 from the MIPS backend, and 
the main  proponent of the new fused opcodes, you get to volunteer to 
implement this? :)


In the meantime the performance gain from being able to use a widening 
madd is more important to us than the benefit of improved optimisation 
of 64-bit addition, so we'll probably have to put adddi3 back in as a 
local patch.


Nigel



Re: [MIPS] MADD issue

2007-04-20 Thread Nigel Stephens



Richard Sandiford wrote:

Nigel Stephens [EMAIL PROTECTED] writes:
  
While I agree with you philosophically, it feels like (b) might be quite 
a major task. A number of optimisation passes which currently recognise 
and MUL and PLUS separately (e.g. loop strength reduction) would now 
need to be extended to handle the fused MULPLUS and MULSUB operators.


And although the reduction in instruction count due to your previous 
change is good, what is it as a percentage of the total? After all it 
only helps code which uses 64-bit integer types with a 32-bit ABI, which 
is probably quite a small proportion of most real-life applications -- 
whereas for some algorithms the ability to use MADD is absolutely 
critical to performance, and for them losing the ability to generate 
MADD is a significant backward step for the compiler.


How about, as a workaround until (b) sees the light of day, we 
reimplement adddi3 and subdi3 only (not the other di mode patterns), 
qualified by ISA_HAS_MADD_MSUB. Perhaps they could also be implemented 
more cleanly nowadays, using  define_insn_and_split and/or a # 
template, to avoid generating multi-instruction assembler sequences.



The old patterns had a define_split too.  That wasn't really the problem.

If you don't want to add a tree code yet, it would still be possible to
add the optab and expand support, recognising mult-add sequences in a
similar way to how we recognise widening multiplies now.  I feel at
least that's a step in the right direction.

  


Richard

I notice that at least the 32-bit rs6000, i386, sparc, frv, sh, cris, 
mcore, score, arm  pa backends still implement adddi3 as either a 
define_insn which outputs two instructions or an explicit define_expand 
followed define_split and associated sub patterns.  Are we setting the 
bar too high for MIPS? :)


Whilst I'm sure that your proposal is the right one going forward, it 
still feels like it could be significant amount of work to implement. 
And the simplified optab/expand support would only work for multiply-add 
sequences expressed within a single expression, and wouldn't be able to 
optimise disjoint multiply, then add expressions. Or have I missed 
something.


in the short term we really do need to reenable madd/msub for MIPS32 
targets in GCC 4. We could do that with a local patch to put back 
adddi3, but it would be better if we could coordinate this with you.


Nigel



Re: [MIPS] MADD issue

2007-04-20 Thread Richard Sandiford
Nigel Stephens [EMAIL PROTECTED] writes:
 I notice that at least the 32-bit rs6000, i386, sparc, frv, sh, cris, 
 mcore, score, arm  pa backends still implement adddi3 as either a 
 define_insn which outputs two instructions or an explicit define_expand 
 followed define_split and associated sub patterns.  Are we setting the 
 bar too high for MIPS? :)

I don't think that follows.  The main reason that ports like rs6000,
i386, arm, sparc and pa define adddi3 is because those architectures
provide special add-with-carry type instructions, or similar
architecture-specific optimisations.  MIPS has nothing like that.
The old MIPS patterns just re-implemented the standard optabs version
(and often did so less efficiently, as I said before).

 Whilst I'm sure that your proposal is the right one going forward, it 
 still feels like it could be significant amount of work to implement. 
 And the simplified optab/expand support would only work for multiply-add 
 sequences expressed within a single expression, and wouldn't be able to 
 optimise disjoint multiply, then add expressions. Or have I missed 
 something.

I don't think that follows either.  Out-of-ssa should coalesce them
if they are in the same basic block.  And if they aren't in the
same basic block, that's presumably because the tree optimisers
think they shouldn't be.  Combine wouldn't look across basic
block boundaries either AFAIK.

E.g. compiling:


typedef unsigned long long ull;
typedef unsigned int ui;

ull foo (ui x, ui y, ull z)
{
  ull tmp = (ull) x * y;
  return tmp + z;
}


with a recent snapshot and -O2 -fdump-tree-all shows that the
final_cleanup dump does indeed have the combined form:


;; Function foo (foo)

foo (x, y, z)
{
bb 2:
  return (long long unsigned int) y * (long long unsigned int) x + z;

}


I realise no-one else has spoken out in support of me, so perhaps
I'm in a minority of one here.  But it does seem to me that in the
Tree-SSA world, it makes less sense to duplicate standard optabs
in the backend purely for the reason of keeping DImode arithmetic
around as DImode arithmetic for longer.

 in the short term we really do need to reenable madd/msub for MIPS32 
 targets in GCC 4. We could do that with a local patch to put back 
 adddi3, but it would be better if we could coordinate this with you.

If removing the patterns had been purely a clean-up, I would be more
open to the idea of putting the patterns back.  But given that removing
the patterns had an optimisation benefit of its own, I'm less open to
the idea of adding them back, especially when (as far as I'm concerned)
there's a clean way of getting the best of both worlds.

Richard


Re: [MIPS] MADD issue

2007-04-20 Thread Ian Lance Taylor
Richard Sandiford [EMAIL PROTECTED] writes:

 I realise no-one else has spoken out in support of me, so perhaps
 I'm in a minority of one here.  But it does seem to me that in the
 Tree-SSA world, it makes less sense to duplicate standard optabs
 in the backend purely for the reason of keeping DImode arithmetic
 around as DImode arithmetic for longer.

The main issue I know of is the RTL level loop optimizers (combine and
CSE can mostly work off of REG_EQUAL notes).  If you define_expand
adddi3, they won't be able to handle loops using long long types.
Whether this matters in practice for real code, I don't know.

Certainly adddi3 and friends should not be straight define_insns, as
they used to be for MIPS.  With the lower-subreg pass, they should be
either define_expand to individual insns or define_insn_and_split with
an unconditional split before reload.

Ian


Re: [MIPS] MADD issue

2007-04-20 Thread Richard Sandiford
Ian Lance Taylor [EMAIL PROTECTED] writes:
 Richard Sandiford [EMAIL PROTECTED] writes:
 I realise no-one else has spoken out in support of me, so perhaps
 I'm in a minority of one here.  But it does seem to me that in the
 Tree-SSA world, it makes less sense to duplicate standard optabs
 in the backend purely for the reason of keeping DImode arithmetic
 around as DImode arithmetic for longer.

 The main issue I know of is the RTL level loop optimizers (combine and
 CSE can mostly work off of REG_EQUAL notes).  If you define_expand
 adddi3, they won't be able to handle loops using long long types.
 Whether this matters in practice for real code, I don't know.

My point was that I thought the interesting multi-word optimisations
(including loop optimisations) ought to be done at the tree level instead,
and that the main focus of the RTL optimisers ought to be optimising
things after machine-specific information has been exposed.  In contrast,
the MIPS define_insn  define_splits existed specifically to avoid
exposing machine-specific information to those optimisations.  I'm not
sure from your reply whether you disagree (although it sounds like you might).

 Certainly adddi3 and friends should not be straight define_insns, as
 they used to be for MIPS.  With the lower-subreg pass, they should be
 either define_expand to individual insns or define_insn_and_split with
 an unconditional split before reload.

Well, the old patterns had define_splits too.  I don't think that
was really the problem.

Richard


Re: [MIPS] MADD issue

2007-04-20 Thread Ian Lance Taylor
Richard Sandiford [EMAIL PROTECTED] writes:

 Ian Lance Taylor [EMAIL PROTECTED] writes:
  Richard Sandiford [EMAIL PROTECTED] writes:
  I realise no-one else has spoken out in support of me, so perhaps
  I'm in a minority of one here.  But it does seem to me that in the
  Tree-SSA world, it makes less sense to duplicate standard optabs
  in the backend purely for the reason of keeping DImode arithmetic
  around as DImode arithmetic for longer.
 
  The main issue I know of is the RTL level loop optimizers (combine and
  CSE can mostly work off of REG_EQUAL notes).  If you define_expand
  adddi3, they won't be able to handle loops using long long types.
  Whether this matters in practice for real code, I don't know.
 
 My point was that I thought the interesting multi-word optimisations
 (including loop optimisations) ought to be done at the tree level instead,
 and that the main focus of the RTL optimisers ought to be optimising
 things after machine-specific information has been exposed.  In contrast,
 the MIPS define_insn  define_splits existed specifically to avoid
 exposing machine-specific information to those optimisations.  I'm not
 sure from your reply whether you disagree (although it sounds like you might).

I suppose I neither agree nor disagree.  It's a matter for testing.

It's clear that with our present scheme there are loop optimization
opportunities at the RTL level in the form of hoisting new loop
invariants created by expanding the addressing modes.  And, of course,
some machines have specific loop instructions which currently can only
be handled at the RTL level.  However, those should be more or less
independent of adddi3.

Ian


Re: [MIPS] MADD issue

2007-04-20 Thread Richard Sandiford
Ian Lance Taylor [EMAIL PROTECTED] writes:
 Richard Sandiford [EMAIL PROTECTED] writes:

 Ian Lance Taylor [EMAIL PROTECTED] writes:
  Richard Sandiford [EMAIL PROTECTED] writes:
  I realise no-one else has spoken out in support of me, so perhaps
  I'm in a minority of one here.  But it does seem to me that in the
  Tree-SSA world, it makes less sense to duplicate standard optabs
  in the backend purely for the reason of keeping DImode arithmetic
  around as DImode arithmetic for longer.
 
  The main issue I know of is the RTL level loop optimizers (combine and
  CSE can mostly work off of REG_EQUAL notes).  If you define_expand
  adddi3, they won't be able to handle loops using long long types.
  Whether this matters in practice for real code, I don't know.
 
 My point was that I thought the interesting multi-word optimisations
 (including loop optimisations) ought to be done at the tree level instead,
 and that the main focus of the RTL optimisers ought to be optimising
 things after machine-specific information has been exposed.  In contrast,
 the MIPS define_insn  define_splits existed specifically to avoid
 exposing machine-specific information to those optimisations.  I'm not
 sure from your reply whether you disagree (although it sounds like you 
 might).

 I suppose I neither agree nor disagree.  It's a matter for testing.

 It's clear that with our present scheme there are loop optimization
 opportunities at the RTL level in the form of hoisting new loop
 invariants created by expanding the addressing modes.  And, of course,
 some machines have specific loop instructions which currently can only
 be handled at the RTL level.

Right.  In case my message might be interpreted as saying that we
shouldn't have RTL loop optimisers, that wasn't the intention at all.
MIPS definitely benefits from them with %hi accesses, etc.  It was
more the opposite: splitting the instructions early ought to give
the RTL optimisers the opportunity to do more things that the tree
optimisers simply couldn't.  OTOH, trying to give the RTL optimisers
the same sort of arithmetic operations that the tree level had seems
like going out of our way to make the RTL optimisers repeat work.

 However, those should be more or less independent of adddi3.

Yeah, I hope so.

Richard


Re: [MIPS] MADD issue

2007-04-13 Thread Paolo Bonzini



(define_insn adddi3_internal_1
  [(set (match_operand:DI 0 register_operand =d,d)
(plus:DI (match_operand:DI 1 register_operand 0,d)
 (match_operand:DI 2 register_operand d,d)))
   (clobber (match_operand:SI 3 register_operand =d,d))]
  !TARGET_64BIT  !TARGET_DEBUG_G_MODE  !TARGET_MIPS16
{
  return (REGNO (operands[0]) == REGNO (operands[1])
   REGNO (operands[0]) == REGNO (operands[2]))
? srl\t%3,%L0,31\;sll\t%M0,%M0,1\;sll\t%L0,%L1,1\;addu\t%M0,%M0,%3
: 
addu\t%L0,%L1,%L2\;sltu\t%3,%L0,%L2\;addu\t%M0,%M1,%M2\;addu\t%M0,%M0,%3;
}


This should be a post-reload (i.e. predicated on reload_completed) 
split, I think.


Paolo


Re: [MIPS] MADD issue

2007-04-13 Thread Richard Sandiford
Fu, Chao-Ying [EMAIL PROTECTED] writes:
   After tracing GCC 4.x to see why MADD is not generated for MIPS32,
 I found out the main issue is that the pattern adddi3
 is not available for MIPS32.  Because the missing
 of adddi3, GCC 4.x needs to split 64-bit addition to 4 separate
 RTL insns.  This leads to that the combining phase fails
 to combine RTL insns to a single madd pattern.

   Could we enable adddi3 for MIPS32 in GCC 4.x?  Or is there a 
 better way to generate MADD?  Thanks a lot!

The problem with:

 Ex: (mips.md in GCC 3.4)
 (define_expand adddi3
   [(parallel [(set (match_operand:DI 0 register_operand )
(plus:DI (match_operand:DI 1 register_operand )
 (match_operand:DI 2 arith_operand )))
   (clobber (match_dup 3))])]
   TARGET_64BIT || (!TARGET_DEBUG_G_MODE  !TARGET_MIPS16)
 {
 

 (define_insn adddi3_internal_1
   [(set (match_operand:DI 0 register_operand =d,d)
 (plus:DI (match_operand:DI 1 register_operand 0,d)
  (match_operand:DI 2 register_operand d,d)))
(clobber (match_operand:SI 3 register_operand =d,d))]
   !TARGET_64BIT  !TARGET_DEBUG_G_MODE  !TARGET_MIPS16
 {
   return (REGNO (operands[0]) == REGNO (operands[1])
REGNO (operands[0]) == REGNO (operands[2]))
 ? srl\t%3,%L0,31\;sll\t%M0,%M0,1\;sll\t%L0,%L1,1\;addu\t%M0,%M0,%3
 : 
 addu\t%L0,%L1,%L2\;sltu\t%3,%L0,%L2\;addu\t%M0,%M1,%M2\;addu\t%M0,%M0,%3;
 }
   [(set_attr type darith)
(set_attr mode DI)
(set_attr length   16)])

...this was that it tended to be very poor for the additions themselves.
When optabs.c implements the additions instead, the early RTL optimisers
get to see the individual instructions, and are often able to handle
constant or part-constant operands better.  This led to a noticable
size improvement when I tested it originally.  (I imagine the effects
are even better now, thanks to the subreg lowering pass.)  See:

http://gcc.gnu.org/ml/gcc-patches/2004-05/msg00947.html

for the patch that made this change, and some rationale.

As far as madd goes, I think it would be better to either
(a) get combine to handle this situation or (b) get expand
to generate a fused multiply-add from the outset.

(b) sounds like it might be useful in its own right.  At the moment we
treat the generation of floating-point multiply-adds as an optimisation,
but in some applications it's critical not to round the intermediate
result.  (I don't know if there's a bugzilla entry about this.)
If we treated fused multiply-add as a primitive operation, we could
extend it to integer types too.  In this case we'd also need to
handle widening multiplications, but we already need to do that
for stand-alone multiplications.

Just random musings, and probably not the answer you wanted to hear,
sorry.

Richard


Re: [MIPS] MADD issue

2007-04-13 Thread Nigel Stephens



Richard Sandiford wrote:


As far as madd goes, I think it would be better to either
(a) get combine to handle this situation or (b) get expand
to generate a fused multiply-add from the outset.

(b) sounds like it might be useful in its own right.  At the moment we
treat the generation of floating-point multiply-adds as an optimisation,
but in some applications it's critical not to round the intermediate
result.  (I don't know if there's a bugzilla entry about this.)
If we treated fused multiply-add as a primitive operation, we could
extend it to integer types too.  In this case we'd also need to
handle widening multiplications, but we already need to do that
for stand-alone multiplications.
  


Richard

While I agree with you philosophically, it feels like (b) might be quite 
a major task. A number of optimisation passes which currently recognise 
and MUL and PLUS separately (e.g. loop strength reduction) would now 
need to be extended to handle the fused MULPLUS and MULSUB operators.


And although the reduction in instruction count due to your previous 
change is good, what is it as a percentage of the total? After all it 
only helps code which uses 64-bit integer types with a 32-bit ABI, which 
is probably quite a small proportion of most real-life applications -- 
whereas for some algorithms the ability to use MADD is absolutely 
critical to performance, and for them losing the ability to generate 
MADD is a significant backward step for the compiler.


How about, as a workaround until (b) sees the light of day, we 
reimplement adddi3 and subdi3 only (not the other di mode patterns), 
qualified by ISA_HAS_MADD_MSUB. Perhaps they could also be implemented 
more cleanly nowadays, using  define_insn_and_split and/or a # 
template, to avoid generating multi-instruction assembler sequences.


Nigel


Re: [MIPS] MADD issue

2007-04-13 Thread Richard Sandiford
Nigel Stephens [EMAIL PROTECTED] writes:
 While I agree with you philosophically, it feels like (b) might be quite 
 a major task. A number of optimisation passes which currently recognise 
 and MUL and PLUS separately (e.g. loop strength reduction) would now 
 need to be extended to handle the fused MULPLUS and MULSUB operators.

 And although the reduction in instruction count due to your previous 
 change is good, what is it as a percentage of the total? After all it 
 only helps code which uses 64-bit integer types with a 32-bit ABI, which 
 is probably quite a small proportion of most real-life applications -- 
 whereas for some algorithms the ability to use MADD is absolutely 
 critical to performance, and for them losing the ability to generate 
 MADD is a significant backward step for the compiler.

 How about, as a workaround until (b) sees the light of day, we 
 reimplement adddi3 and subdi3 only (not the other di mode patterns), 
 qualified by ISA_HAS_MADD_MSUB. Perhaps they could also be implemented 
 more cleanly nowadays, using  define_insn_and_split and/or a # 
 template, to avoid generating multi-instruction assembler sequences.

The old patterns had a define_split too.  That wasn't really the problem.

If you don't want to add a tree code yet, it would still be possible to
add the optab and expand support, recognising mult-add sequences in a
similar way to how we recognise widening multiplies now.  I feel at
least that's a step in the right direction.

Richard


Re: [MIPS] MADD issue

2007-04-13 Thread Nigel Stephens



Richard Sandiford wrote:

Nigel Stephens [EMAIL PROTECTED] writes:
  
While I agree with you philosophically, it feels like (b) might be quite 
a major task. A number of optimisation passes which currently recognise 
and MUL and PLUS separately (e.g. loop strength reduction) would now 
need to be extended to handle the fused MULPLUS and MULSUB operators.


And although the reduction in instruction count due to your previous 
change is good, what is it as a percentage of the total? After all it 
only helps code which uses 64-bit integer types with a 32-bit ABI, which 
is probably quite a small proportion of most real-life applications -- 
whereas for some algorithms the ability to use MADD is absolutely 
critical to performance, and for them losing the ability to generate 
MADD is a significant backward step for the compiler.


How about, as a workaround until (b) sees the light of day, we 
reimplement adddi3 and subdi3 only (not the other di mode patterns), 
qualified by ISA_HAS_MADD_MSUB. Perhaps they could also be implemented 
more cleanly nowadays, using  define_insn_and_split and/or a # 
template, to avoid generating multi-instruction assembler sequences.



The old patterns had a define_split too.  That wasn't really the problem.

If you don't want to add a tree code yet, it would still be possible to
add the optab and expand support, recognising mult-add sequences in a
similar way to how we recognise widening multiplies now.  I feel at
least that's a step in the right direction.
  


OK, we'll have a think about that.

Thanks

Nigel


Re: [MIPS] MADD issue

2007-04-13 Thread Ian Lance Taylor
Paolo Bonzini [EMAIL PROTECTED] writes:

  (define_insn adddi3_internal_1
[(set (match_operand:DI 0 register_operand =d,d)
  (plus:DI (match_operand:DI 1 register_operand 0,d)
   (match_operand:DI 2 register_operand d,d)))
 (clobber (match_operand:SI 3 register_operand =d,d))]
!TARGET_64BIT  !TARGET_DEBUG_G_MODE  !TARGET_MIPS16
  {
return (REGNO (operands[0]) == REGNO (operands[1])
 REGNO (operands[0]) == REGNO (operands[2]))
  ? srl\t%3,%L0,31\;sll\t%M0,%M0,1\;sll\t%L0,%L1,1\;addu\t%M0,%M0,%3
  : 
  addu\t%L0,%L1,%L2\;sltu\t%3,%L0,%L2\;addu\t%M0,%M1,%M2\;addu\t%M0,%M0,%3;
  }
 
 This should be a post-reload (i.e. predicated on reload_completed)
 split, I think.

Actually, with the relatively recent lower-subreg work, it is
desirable to split this sort of instruction before reload.  That is,
do an unconditional split.

Ian


Re: [MIPS] MADD issue

2007-04-13 Thread Paolo Bonzini



This should be a post-reload (i.e. predicated on reload_completed)
split, I think.


Actually, with the relatively recent lower-subreg work, it is
desirable to split this sort of instruction before reload.  That is,
do an unconditional split.


Right.  Combine cannot cope with the resulting 4-insn sequence and merge 
it back with the multiplication, but split is ran after combine and 
before the second lower-subreg pass.


So, making this an unconditional split should be the best of both worlds.

Paolo


Re: [MIPS] MADD issue

2007-04-13 Thread Richard Sandiford
Paolo Bonzini [EMAIL PROTECTED] writes:
 This should be a post-reload (i.e. predicated on reload_completed)
 split, I think.
 
 Actually, with the relatively recent lower-subreg work, it is
 desirable to split this sort of instruction before reload.  That is,
 do an unconditional split.

 Right.  Combine cannot cope with the resulting 4-insn sequence and merge 
 it back with the multiplication, but split is ran after combine and 
 before the second lower-subreg pass.

 So, making this an unconditional split should be the best of both worlds.

The problem is, combine is also one of the passes that was able to
optimise the split form so effectively.  It's not the best of both
worlds from that point of view.

Richard


[MIPS] MADD issue

2007-04-12 Thread Fu, Chao-Ying
Hi Richard,

  After tracing GCC 4.x to see why MADD is not generated for MIPS32,
I found out the main issue is that the pattern adddi3
is not available for MIPS32.  Because the missing
of adddi3, GCC 4.x needs to split 64-bit addition to 4 separate
RTL insns.  This leads to that the combining phase fails
to combine RTL insns to a single madd pattern.

  Could we enable adddi3 for MIPS32 in GCC 4.x?  Or is there a 
better way to generate MADD?  Thanks a lot!

Ex: (b67.c)
long long test (long long a, int b, int c)
{
  return a + (long long) b * (long long) c;
}

# gcc -S b67.c -O3 -mips32
(b67.s)
test:
.frame  $sp,0,$31
.mask   0x,0
.fmask  0x,0
.setnoreorder
.setnomacro

mtlo$5
mthi$4
madd$6,$7
mflo$3
j   $31
mfhi$2

Regards,
Chao-ying
  
-
Ex: (mips.md in GCC 3.4)
(define_expand adddi3
  [(parallel [(set (match_operand:DI 0 register_operand )
   (plus:DI (match_operand:DI 1 register_operand )
(match_operand:DI 2 arith_operand )))
  (clobber (match_dup 3))])]
  TARGET_64BIT || (!TARGET_DEBUG_G_MODE  !TARGET_MIPS16)
{


(define_insn adddi3_internal_1
  [(set (match_operand:DI 0 register_operand =d,d)
(plus:DI (match_operand:DI 1 register_operand 0,d)
 (match_operand:DI 2 register_operand d,d)))
   (clobber (match_operand:SI 3 register_operand =d,d))]
  !TARGET_64BIT  !TARGET_DEBUG_G_MODE  !TARGET_MIPS16
{
  return (REGNO (operands[0]) == REGNO (operands[1])
   REGNO (operands[0]) == REGNO (operands[2]))
? srl\t%3,%L0,31\;sll\t%M0,%M0,1\;sll\t%L0,%L1,1\;addu\t%M0,%M0,%3
: 
addu\t%L0,%L1,%L2\;sltu\t%3,%L0,%L2\;addu\t%M0,%M1,%M2\;addu\t%M0,%M0,%3;
}
  [(set_attr type darith)
   (set_attr mode DI)
   (set_attr length   16)])

(define_insn *smul_acc_di
  [(set (match_operand:DI 0 register_operand =x)
(plus:DI
 (mult:DI (sign_extend:DI (match_operand:SI 1 register_operand d))
  (sign_extend:DI (match_operand:SI 2 register_operand d)))
 (match_operand:DI 3 register_operand 0)))]
  (TARGET_MAD || ISA_HAS_MACC)
!TARGET_64BIT
{
  if (TARGET_MAD)
return mad\t%1,%2;
  else if (TARGET_MIPS5500)
return madd\t%1,%2;
  else
return macc\t%.,%1,%2;
}
  [(set_attr type   imadd)
   (set_attr mode   SI)])
-