Re: [Patch,AVR]: Fix PR 50358

2011-09-16 Thread Georg-Johann Lay
Denis Chertykov schrieb:
 2011/9/12 Georg-Johann Lay a...@gjlay.de:
 This patch introduces patterns for multiply-add and multiply-sub.

 On the enhanced core, these operations can be performed with the product in 
 R0;
 there is no need to MOVW it out of that register.  The code is smaller and
 faster and has lower register pressure.

 Tested without regressions.

 Ok to commit?
 
 Ok.
 
 Denis.

This is the second part to fix this PR; it introduced multiply-add/-sub for
QImode and one insn for HI = sign_extend (QI  1).

With this patch PR50358 is fixed up to some corner cases.

The insns with CONST_INT split the load of the constant after reload.
avr_rtx_costs describes these costs, but it would be advantageous to do the
split pre-reload because IRA/reload could reuse constants.

The trouble is that reload_in_progress is false in IRA and therefore the
patterns match in IRA, so here is the same trouble I faced in the patch for
widening multiply where a function like avr_gate_split1() was regarded as to
hackish because it tested the pass-number to help out.  Without such a function
in the insn condition, the insn matches in IRA and a register is re-replaced
with CONST_INT again, leading to crash in split2 because of gen_reg_rtx or
because of !reload_completed in the insn condition.

So this patch comes without reusing constants.

Besides that, the Write as one pattern changes gather two insns and write
them down as one; that's no functional change, it's just about using iterators
to reduce lines of code.  The order of insns changes, but that does not matter
here.  I didn't make an extra patch for that.

Passed without regression.

Ok to install?

Johann

PR target/50358
* config/avr/avr.md (*ashiftqihi2.signx.1): New insn.
(*maddqi4, *maddqi4.const): New insns.
(*msubqi4, *msubqi4.const): New insns.
(umulqihi3, mulqihi3): Write as one pattern.
(umulqi3_highpart, smulqi3_highpart): Ditto.
(*maddqihi4.const, *umaddqihi4.uconst): Ditto.
(*msubqihi4.const, *umsubqihi4.uconst): Ditto.
(*muluqihi3.uconst, *mulsqihi3.sconst): Ditto.
* config/avr/avr.c (avr_rtx_costs): Record costs of above in cases
PLUS:QI and MINUS:QI.  Increase costs of multiply-add/-sub for
HImode by 1 in the case of multiplying with a CONST_INT.
Record cost of *ashiftqihi2.signx.1 in case ASHIFT:QI.
Index: config/avr/avr.md
===
--- config/avr/avr.md	(revision 178806)
+++ config/avr/avr.md	(working copy)
@@ -1027,31 +1027,21 @@ (define_insn *mulqi3_call
   [(set_attr type xcall)
(set_attr cc clobber)])
 
-(define_insn smulqi3_highpart
-  [(set (match_operand:QI 0 register_operand =r)
-	(truncate:QI
- (lshiftrt:HI (mult:HI (sign_extend:HI (match_operand:QI 1 register_operand d))
-   (sign_extend:HI (match_operand:QI 2 register_operand d)))
+;; umulqi3_highpart
+;; smulqi3_highpart
+(define_insn extend_sumulqi3_highpart
+  [(set (match_operand:QI 0 register_operand   =r)
+(truncate:QI
+ (lshiftrt:HI (mult:HI (any_extend:HI (match_operand:QI 1 register_operand mul_r_d))
+   (any_extend:HI (match_operand:QI 2 register_operand mul_r_d)))
   (const_int 8]
   AVR_HAVE_MUL
-  muls %1,%2
+  mulextend_s %1,%2
 	mov %0,r1
 	clr __zero_reg__
   [(set_attr length 3)
(set_attr cc clobber)])
   
-(define_insn umulqi3_highpart
-  [(set (match_operand:QI 0 register_operand =r)
-	(truncate:QI
- (lshiftrt:HI (mult:HI (zero_extend:HI (match_operand:QI 1 register_operand r))
-   (zero_extend:HI (match_operand:QI 2 register_operand r)))
-  (const_int 8]
-  AVR_HAVE_MUL
-  mul %1,%2
-	mov %0,r1
-	clr __zero_reg__
-  [(set_attr length 3)
-   (set_attr cc clobber)])
 
 ;; Used when expanding div or mod inline for some special values
 (define_insn *subqi3.ashiftrt7
@@ -1064,25 +1054,16 @@ (define_insn *subqi3.ashiftrt7
   [(set_attr length 2)
(set_attr cc clobber)])
 
-(define_insn mulqihi3
-  [(set (match_operand:HI 0 register_operand =r)
-	(mult:HI (sign_extend:HI (match_operand:QI 1 register_operand d))
-		 (sign_extend:HI (match_operand:QI 2 register_operand d]
-  AVR_HAVE_MUL
-  muls %1,%2
-	movw %0,r0
-	clr r1
-  [(set_attr length 3)
-   (set_attr cc clobber)])
-
-(define_insn umulqihi3
-  [(set (match_operand:HI 0 register_operand =r)
-	(mult:HI (zero_extend:HI (match_operand:QI 1 register_operand r))
-		 (zero_extend:HI (match_operand:QI 2 register_operand r]
+;; umulqihi3
+;; mulqihi3
+(define_insn extend_umulqihi3
+  [(set (match_operand:HI 0 register_operand =r)
+(mult:HI (any_extend:HI (match_operand:QI 1 register_operand mul_r_d))
+ (any_extend:HI (match_operand:QI 2 register_operand mul_r_d]
   AVR_HAVE_MUL
-  mul %1,%2
+  

Re: [Patch,AVR]: Fix PR 50358

2011-09-16 Thread Georg-Johann Lay
Georg-Johann Lay schrieb:
 Denis Chertykov schrieb:
 2011/9/12 Georg-Johann Lay a...@gjlay.de:
 This patch introduces patterns for multiply-add and multiply-sub.

 On the enhanced core, these operations can be performed with the product in 
 R0;
 there is no need to MOVW it out of that register.  The code is smaller and
 faster and has lower register pressure.

 Tested without regressions.

 Ok to commit?
 Ok.

 Denis.
 
 This is the second part to fix this PR; it introduced multiply-add/-sub for
 QImode and one insn for HI = sign_extend (QI  1).
 
 With this patch PR50358 is fixed up to some corner cases.
 
 The insns with CONST_INT split the load of the constant after reload.
 avr_rtx_costs describes these costs, but it would be advantageous to do the
 split pre-reload because IRA/reload could reuse constants.
 
 The trouble is that reload_in_progress is false in IRA and therefore the
 patterns match in IRA, so here is the same trouble I faced in the patch for
 widening multiply where a function like avr_gate_split1() was regarded as to
 hackish because it tested the pass-number to help out.  Without such a 
 function
 in the insn condition, the insn matches in IRA and a register is re-replaced
 with CONST_INT again, leading to crash in split2 because of gen_reg_rtx or
 because of !reload_completed in the insn condition.
 
 So this patch comes without reusing constants.
 
 Besides that, the Write as one pattern changes gather two insns and write
 them down as one; that's no functional change, it's just about using iterators
 to reduce lines of code.  The order of insns changes, but that does not matter
 here.  I didn't make an extra patch for that.

Split the two patches

Ok to install?

Johann


Patch-1
PR target/50358
* config/avr/avr.md (*ashiftqihi2.signx.1): New insn.
(*maddqi4, *maddqi4.const): New insns.
(*msubqi4, *msubqi4.const): New insns.
* config/avr/avr.c (avr_rtx_costs): Record costs of above in cases
PLUS:QI and MINUS:QI.  Increase costs of multiply-add/-sub for
HImode by 1 in the case of multiplying with a CONST_INT.
Record cost of *ashiftqihi2.signx.1 in case ASHIFT:QI.


Patch-2
* config/avr/avr.md: (umulqihi3, mulqihi3): Write as one pattern.
(umulqi3_highpart, smulqi3_highpart): Ditto.
(*maddqihi4.const, *umaddqihi4.uconst): Ditto.
(*msubqihi4.const, *umsubqihi4.uconst): Ditto.
(*muluqihi3.uconst, *mulsqihi3.sconst): Ditto.
Index: config/avr/avr.md
===
--- config/avr/avr.md	(revision 178806)
+++ config/avr/avr.md	(working copy)
@@ -1138,6 +1138,72 @@ (define_insn *oumulqihi3
(set_attr cc clobber)])
 
 ;**
+; multiply-add/sub QI: $0 = $3 +/- $1*$2
+;**
+
+(define_insn *maddqi4
+  [(set (match_operand:QI 0 register_operand  =r)
+(plus:QI (mult:QI (match_operand:QI 1 register_operand r)
+  (match_operand:QI 2 register_operand r))
+ (match_operand:QI 3 register_operand  0)))]
+  
+  AVR_HAVE_MUL
+  mul %1,%2
+	add %A0,r0
+	clr __zero_reg__
+  [(set_attr length 4)
+   (set_attr cc clobber)])
+
+(define_insn *msubqi4
+  [(set (match_operand:QI 0 register_operand   =r)
+(minus:QI (match_operand:QI 3 register_operand  0)
+  (mult:QI (match_operand:QI 1 register_operand r)
+   (match_operand:QI 2 register_operand r]
+  AVR_HAVE_MUL
+  mul %1,%2
+	sub %A0,r0
+	clr __zero_reg__
+  [(set_attr length 4)
+   (set_attr cc clobber)])
+
+(define_insn_and_split *maddqi4.const
+  [(set (match_operand:QI 0 register_operand   =r)
+(plus:QI (mult:QI (match_operand:QI 1 register_operand  r)
+  (match_operand:QI 2 const_int_operand n))
+ (match_operand:QI 3 register_operand   0)))
+   (clobber (match_scratch:QI 4 =d))]
+  AVR_HAVE_MUL
+  #
+   reload_completed
+  [(set (match_dup 4)
+(match_dup 2))
+   ; *maddqi4
+   (set (match_dup 0)
+(plus:QI (mult:QI (match_dup 1)
+  (match_dup 4))
+ (match_dup 3)))]
+  )
+
+(define_insn_and_split *msubqi4.const
+  [(set (match_operand:QI 0 register_operand=r)
+(minus:QI (match_operand:QI 3 register_operand   0)
+  (mult:QI (match_operand:QI 1 register_operand  r)
+   (match_operand:QI 2 const_int_operand n
+   (clobber (match_scratch:QI 4  =d))]
+  AVR_HAVE_MUL
+  #
+   reload_completed
+  [(set (match_dup 4)
+(match_dup 2))
+   ; *msubqi4
+   (set (match_dup 0)
+(minus:QI (match_dup 3)
+  (mult:QI 

Re: [Patch,AVR]: Fix PR 50358

2011-09-16 Thread Denis Chertykov
2011/9/16 Georg-Johann Lay a...@gjlay.de:
 Georg-Johann Lay schrieb:
 Denis Chertykov schrieb:
 2011/9/12 Georg-Johann Lay a...@gjlay.de:
 This patch introduces patterns for multiply-add and multiply-sub.

 On the enhanced core, these operations can be performed with the product 
 in R0;
 there is no need to MOVW it out of that register.  The code is smaller and
 faster and has lower register pressure.

 Tested without regressions.

 Ok to commit?
 Ok.

 Denis.

 This is the second part to fix this PR; it introduced multiply-add/-sub for
 QImode and one insn for HI = sign_extend (QI  1).

 With this patch PR50358 is fixed up to some corner cases.

 The insns with CONST_INT split the load of the constant after reload.
 avr_rtx_costs describes these costs, but it would be advantageous to do the
 split pre-reload because IRA/reload could reuse constants.

 The trouble is that reload_in_progress is false in IRA and therefore the
 patterns match in IRA, so here is the same trouble I faced in the patch for
 widening multiply where a function like avr_gate_split1() was regarded as to
 hackish because it tested the pass-number to help out.  Without such a 
 function
 in the insn condition, the insn matches in IRA and a register is re-replaced
 with CONST_INT again, leading to crash in split2 because of gen_reg_rtx or
 because of !reload_completed in the insn condition.

 So this patch comes without reusing constants.

 Besides that, the Write as one pattern changes gather two insns and write
 them down as one; that's no functional change, it's just about using 
 iterators
 to reduce lines of code.  The order of insns changes, but that does not 
 matter
 here.  I didn't make an extra patch for that.

 Split the two patches

 Ok to install?


Please, commit.

Denis.


Re: [Patch,AVR]: Fix PR 50358

2011-09-13 Thread Denis Chertykov
2011/9/12 Georg-Johann Lay a...@gjlay.de:
 This patch introduces patterns for multiply-add and multiply-sub.

 On the enhanced core, these operations can be performed with the product in 
 R0;
 there is no need to MOVW it out of that register.  The code is smaller and
 faster and has lower register pressure.

 Tested without regressions.

 Ok to commit?

Ok.

Denis.