Re: [Patch,AVR]: Fix PR 50358
Denis Chertykov schrieb: 2011/9/12 Georg-Johann Lay a...@gjlay.de: This patch introduces patterns for multiply-add and multiply-sub. On the enhanced core, these operations can be performed with the product in R0; there is no need to MOVW it out of that register. The code is smaller and faster and has lower register pressure. Tested without regressions. Ok to commit? Ok. Denis. This is the second part to fix this PR; it introduced multiply-add/-sub for QImode and one insn for HI = sign_extend (QI 1). With this patch PR50358 is fixed up to some corner cases. The insns with CONST_INT split the load of the constant after reload. avr_rtx_costs describes these costs, but it would be advantageous to do the split pre-reload because IRA/reload could reuse constants. The trouble is that reload_in_progress is false in IRA and therefore the patterns match in IRA, so here is the same trouble I faced in the patch for widening multiply where a function like avr_gate_split1() was regarded as to hackish because it tested the pass-number to help out. Without such a function in the insn condition, the insn matches in IRA and a register is re-replaced with CONST_INT again, leading to crash in split2 because of gen_reg_rtx or because of !reload_completed in the insn condition. So this patch comes without reusing constants. Besides that, the Write as one pattern changes gather two insns and write them down as one; that's no functional change, it's just about using iterators to reduce lines of code. The order of insns changes, but that does not matter here. I didn't make an extra patch for that. Passed without regression. Ok to install? Johann PR target/50358 * config/avr/avr.md (*ashiftqihi2.signx.1): New insn. (*maddqi4, *maddqi4.const): New insns. (*msubqi4, *msubqi4.const): New insns. (umulqihi3, mulqihi3): Write as one pattern. (umulqi3_highpart, smulqi3_highpart): Ditto. (*maddqihi4.const, *umaddqihi4.uconst): Ditto. (*msubqihi4.const, *umsubqihi4.uconst): Ditto. (*muluqihi3.uconst, *mulsqihi3.sconst): Ditto. * config/avr/avr.c (avr_rtx_costs): Record costs of above in cases PLUS:QI and MINUS:QI. Increase costs of multiply-add/-sub for HImode by 1 in the case of multiplying with a CONST_INT. Record cost of *ashiftqihi2.signx.1 in case ASHIFT:QI. Index: config/avr/avr.md === --- config/avr/avr.md (revision 178806) +++ config/avr/avr.md (working copy) @@ -1027,31 +1027,21 @@ (define_insn *mulqi3_call [(set_attr type xcall) (set_attr cc clobber)]) -(define_insn smulqi3_highpart - [(set (match_operand:QI 0 register_operand =r) - (truncate:QI - (lshiftrt:HI (mult:HI (sign_extend:HI (match_operand:QI 1 register_operand d)) - (sign_extend:HI (match_operand:QI 2 register_operand d))) +;; umulqi3_highpart +;; smulqi3_highpart +(define_insn extend_sumulqi3_highpart + [(set (match_operand:QI 0 register_operand =r) +(truncate:QI + (lshiftrt:HI (mult:HI (any_extend:HI (match_operand:QI 1 register_operand mul_r_d)) + (any_extend:HI (match_operand:QI 2 register_operand mul_r_d))) (const_int 8] AVR_HAVE_MUL - muls %1,%2 + mulextend_s %1,%2 mov %0,r1 clr __zero_reg__ [(set_attr length 3) (set_attr cc clobber)]) -(define_insn umulqi3_highpart - [(set (match_operand:QI 0 register_operand =r) - (truncate:QI - (lshiftrt:HI (mult:HI (zero_extend:HI (match_operand:QI 1 register_operand r)) - (zero_extend:HI (match_operand:QI 2 register_operand r))) - (const_int 8] - AVR_HAVE_MUL - mul %1,%2 - mov %0,r1 - clr __zero_reg__ - [(set_attr length 3) - (set_attr cc clobber)]) ;; Used when expanding div or mod inline for some special values (define_insn *subqi3.ashiftrt7 @@ -1064,25 +1054,16 @@ (define_insn *subqi3.ashiftrt7 [(set_attr length 2) (set_attr cc clobber)]) -(define_insn mulqihi3 - [(set (match_operand:HI 0 register_operand =r) - (mult:HI (sign_extend:HI (match_operand:QI 1 register_operand d)) - (sign_extend:HI (match_operand:QI 2 register_operand d] - AVR_HAVE_MUL - muls %1,%2 - movw %0,r0 - clr r1 - [(set_attr length 3) - (set_attr cc clobber)]) - -(define_insn umulqihi3 - [(set (match_operand:HI 0 register_operand =r) - (mult:HI (zero_extend:HI (match_operand:QI 1 register_operand r)) - (zero_extend:HI (match_operand:QI 2 register_operand r] +;; umulqihi3 +;; mulqihi3 +(define_insn extend_umulqihi3 + [(set (match_operand:HI 0 register_operand =r) +(mult:HI (any_extend:HI (match_operand:QI 1 register_operand mul_r_d)) + (any_extend:HI (match_operand:QI 2 register_operand mul_r_d] AVR_HAVE_MUL - mul %1,%2 +
Re: [Patch,AVR]: Fix PR 50358
Georg-Johann Lay schrieb: Denis Chertykov schrieb: 2011/9/12 Georg-Johann Lay a...@gjlay.de: This patch introduces patterns for multiply-add and multiply-sub. On the enhanced core, these operations can be performed with the product in R0; there is no need to MOVW it out of that register. The code is smaller and faster and has lower register pressure. Tested without regressions. Ok to commit? Ok. Denis. This is the second part to fix this PR; it introduced multiply-add/-sub for QImode and one insn for HI = sign_extend (QI 1). With this patch PR50358 is fixed up to some corner cases. The insns with CONST_INT split the load of the constant after reload. avr_rtx_costs describes these costs, but it would be advantageous to do the split pre-reload because IRA/reload could reuse constants. The trouble is that reload_in_progress is false in IRA and therefore the patterns match in IRA, so here is the same trouble I faced in the patch for widening multiply where a function like avr_gate_split1() was regarded as to hackish because it tested the pass-number to help out. Without such a function in the insn condition, the insn matches in IRA and a register is re-replaced with CONST_INT again, leading to crash in split2 because of gen_reg_rtx or because of !reload_completed in the insn condition. So this patch comes without reusing constants. Besides that, the Write as one pattern changes gather two insns and write them down as one; that's no functional change, it's just about using iterators to reduce lines of code. The order of insns changes, but that does not matter here. I didn't make an extra patch for that. Split the two patches Ok to install? Johann Patch-1 PR target/50358 * config/avr/avr.md (*ashiftqihi2.signx.1): New insn. (*maddqi4, *maddqi4.const): New insns. (*msubqi4, *msubqi4.const): New insns. * config/avr/avr.c (avr_rtx_costs): Record costs of above in cases PLUS:QI and MINUS:QI. Increase costs of multiply-add/-sub for HImode by 1 in the case of multiplying with a CONST_INT. Record cost of *ashiftqihi2.signx.1 in case ASHIFT:QI. Patch-2 * config/avr/avr.md: (umulqihi3, mulqihi3): Write as one pattern. (umulqi3_highpart, smulqi3_highpart): Ditto. (*maddqihi4.const, *umaddqihi4.uconst): Ditto. (*msubqihi4.const, *umsubqihi4.uconst): Ditto. (*muluqihi3.uconst, *mulsqihi3.sconst): Ditto. Index: config/avr/avr.md === --- config/avr/avr.md (revision 178806) +++ config/avr/avr.md (working copy) @@ -1138,6 +1138,72 @@ (define_insn *oumulqihi3 (set_attr cc clobber)]) ;** +; multiply-add/sub QI: $0 = $3 +/- $1*$2 +;** + +(define_insn *maddqi4 + [(set (match_operand:QI 0 register_operand =r) +(plus:QI (mult:QI (match_operand:QI 1 register_operand r) + (match_operand:QI 2 register_operand r)) + (match_operand:QI 3 register_operand 0)))] + + AVR_HAVE_MUL + mul %1,%2 + add %A0,r0 + clr __zero_reg__ + [(set_attr length 4) + (set_attr cc clobber)]) + +(define_insn *msubqi4 + [(set (match_operand:QI 0 register_operand =r) +(minus:QI (match_operand:QI 3 register_operand 0) + (mult:QI (match_operand:QI 1 register_operand r) + (match_operand:QI 2 register_operand r] + AVR_HAVE_MUL + mul %1,%2 + sub %A0,r0 + clr __zero_reg__ + [(set_attr length 4) + (set_attr cc clobber)]) + +(define_insn_and_split *maddqi4.const + [(set (match_operand:QI 0 register_operand =r) +(plus:QI (mult:QI (match_operand:QI 1 register_operand r) + (match_operand:QI 2 const_int_operand n)) + (match_operand:QI 3 register_operand 0))) + (clobber (match_scratch:QI 4 =d))] + AVR_HAVE_MUL + # + reload_completed + [(set (match_dup 4) +(match_dup 2)) + ; *maddqi4 + (set (match_dup 0) +(plus:QI (mult:QI (match_dup 1) + (match_dup 4)) + (match_dup 3)))] + ) + +(define_insn_and_split *msubqi4.const + [(set (match_operand:QI 0 register_operand=r) +(minus:QI (match_operand:QI 3 register_operand 0) + (mult:QI (match_operand:QI 1 register_operand r) + (match_operand:QI 2 const_int_operand n + (clobber (match_scratch:QI 4 =d))] + AVR_HAVE_MUL + # + reload_completed + [(set (match_dup 4) +(match_dup 2)) + ; *msubqi4 + (set (match_dup 0) +(minus:QI (match_dup 3) + (mult:QI
Re: [Patch,AVR]: Fix PR 50358
2011/9/16 Georg-Johann Lay a...@gjlay.de: Georg-Johann Lay schrieb: Denis Chertykov schrieb: 2011/9/12 Georg-Johann Lay a...@gjlay.de: This patch introduces patterns for multiply-add and multiply-sub. On the enhanced core, these operations can be performed with the product in R0; there is no need to MOVW it out of that register. The code is smaller and faster and has lower register pressure. Tested without regressions. Ok to commit? Ok. Denis. This is the second part to fix this PR; it introduced multiply-add/-sub for QImode and one insn for HI = sign_extend (QI 1). With this patch PR50358 is fixed up to some corner cases. The insns with CONST_INT split the load of the constant after reload. avr_rtx_costs describes these costs, but it would be advantageous to do the split pre-reload because IRA/reload could reuse constants. The trouble is that reload_in_progress is false in IRA and therefore the patterns match in IRA, so here is the same trouble I faced in the patch for widening multiply where a function like avr_gate_split1() was regarded as to hackish because it tested the pass-number to help out. Without such a function in the insn condition, the insn matches in IRA and a register is re-replaced with CONST_INT again, leading to crash in split2 because of gen_reg_rtx or because of !reload_completed in the insn condition. So this patch comes without reusing constants. Besides that, the Write as one pattern changes gather two insns and write them down as one; that's no functional change, it's just about using iterators to reduce lines of code. The order of insns changes, but that does not matter here. I didn't make an extra patch for that. Split the two patches Ok to install? Please, commit. Denis.
Re: [Patch,AVR]: Fix PR 50358
2011/9/12 Georg-Johann Lay a...@gjlay.de: This patch introduces patterns for multiply-add and multiply-sub. On the enhanced core, these operations can be performed with the product in R0; there is no need to MOVW it out of that register. The code is smaller and faster and has lower register pressure. Tested without regressions. Ok to commit? Ok. Denis.