On Wed, Dec 12, 2012 at 3:45 PM, Richard Biener
<richard.guent...@gmail.com> wrote:

>> I assume that this is not the right way to fix such a simple
>> performance anomaly, since we do redundant work - combine the load
>> into the conditional move and then split it back out in peephole2.
>> Does that look reasonable? Why should we produce an inefficient
>> instruction that must be split later?
>
> Well, either don't allow this instruction variant from the start, or allow
> the extra freedom for register allocation this creates.  It doesn't make
> sense to reject it only when combine generates it - that doesn't address
> the cases where it materializes in some other way.

Please check the attached patch; it implements this limitation in a correct way:
- it keeps memory operands for -Os or in cold parts of the executable
- it doesn't increase register pressure
- it handles all situations where a memory operand can propagate into the
  RTX (see the example below)
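
To illustrate the intended effect, here is a small example of my own (the
exact registers are of course up to the register allocator):

  int f (int a, int *b)
  {
    /* Combine used to merge the load into the conditional move,
       giving something like "cmovs (%rsi), %eax".  With the new
       peephole2 patterns, when optimizing for speed, the load is
       split out through a scratch register:

         movl   (%rsi), %ecx
         cmovs  %ecx, %eax

       At -Os, or in cold code, the memory form is kept.  */
    return a < 0 ? *b : a;
  }

Note that this split is always safe: cmov with a memory source operand
reads memory unconditionally anyway, so hoisting the load into a separate
move doesn't introduce a trap that wasn't already possible.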

Yuri, can you please check if this patch fixes the performance problem for you?

BTW: I would really like to add a TARGET_USE_CMOVE_WITH_MEMOP
target macro and conditionalize the new peephole2 patterns on it,
along the lines of the sketch below.
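
Something like this (the tuning-flag name and wiring are only a sketch,
nothing is implemented yet):

  /* In config/i386/i386.h, driven by a hypothetical per-CPU tuning
     flag in the X86_TUNE_* tables: */
  #define TARGET_USE_CMOVE_WITH_MEMOP \
    ix86_tune_features[X86_TUNE_USE_CMOVE_WITH_MEMOP]

The insn condition of each new peephole2 would then become e.g.

  "TARGET_CMOVE && !TARGET_USE_CMOVE_WITH_MEMOP
   && optimize_insn_for_speed_p ()"

so that CPUs which handle cmov with a memory operand well can keep the
combined form.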

Uros.
Index: i386.md
===================================================================
--- i386.md     (revision 194451)
+++ i386.md     (working copy)
@@ -16122,6 +16122,31 @@
   operands[3] = gen_lowpart (SImode, operands[3]);
 })
 
+;; Don't do conditional moves with memory inputs
+(define_peephole2
+  [(match_scratch:SWI248 2 "r")
+   (set (match_operand:SWI248 0 "register_operand")
+       (if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator"
+                              [(reg FLAGS_REG) (const_int 0)])
+         (match_dup 0)
+         (match_operand:SWI248 3 "memory_operand")))]
+  "TARGET_CMOVE && optimize_insn_for_speed_p ()"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 0)
+       (if_then_else:SWI248 (match_dup 1) (match_dup 0) (match_dup 2)))])
+
+(define_peephole2
+  [(match_scratch:SWI248 2 "r")
+   (set (match_operand:SWI248 0 "register_operand")
+       (if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator"
+                              [(reg FLAGS_REG) (const_int 0)])
+         (match_operand:SWI248 3 "memory_operand")
+         (match_dup 0)))]
+  "TARGET_CMOVE && optimize_insn_for_speed_p ()"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 0)
+       (if_then_else:SWI248 (match_dup 1) (match_dup 2) (match_dup 0)))])
+
 (define_expand "mov<mode>cc"
   [(set (match_operand:X87MODEF 0 "register_operand")
        (if_then_else:X87MODEF
@@ -16209,6 +16234,35 @@
   [(set_attr "type" "fcmov,fcmov,icmov,icmov")
    (set_attr "mode" "SF,SF,SI,SI")])
 
+;; Don't do conditional moves with memory inputs
+(define_peephole2
+  [(match_scratch:MODEF 2 "r")
+   (set (match_operand:MODEF 0 "register_and_not_any_fp_reg_operand")
+       (if_then_else:MODEF (match_operator 1 "fcmov_comparison_operator"
+                             [(reg FLAGS_REG) (const_int 0)])
+         (match_dup 0)
+         (match_operand:MODEF 3 "memory_operand")))]
+  "(<MODE>mode != DFmode || TARGET_64BIT)
+   && TARGET_80387 && TARGET_CMOVE
+   && optimize_insn_for_speed_p ()"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 0)
+       (if_then_else:MODEF (match_dup 1) (match_dup 0) (match_dup 2)))])
+
+(define_peephole2
+  [(match_scratch:MODEF 2 "r")
+   (set (match_operand:MODEF 0 "register_and_not_any_fp_reg_operand")
+       (if_then_else:MODEF (match_operator 1 "fcmov_comparison_operator"
+                             [(reg FLAGS_REG) (const_int 0)])
+         (match_operand:MODEF 3 "memory_operand")
+         (match_dup 0)))]
+  "(<MODE>mode != DFmode || TARGET_64BIT)
+   && TARGET_80387 && TARGET_CMOVE
+   && optimize_insn_for_speed_p ()"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 0)
+       (if_then_else:MODEF (match_dup 1) (match_dup 2) (match_dup 0)))])
+
 ;; All moves in XOP pcmov instructions are 128 bits and hence we restrict
 ;; the scalar versions to have only XMM registers as operands.
 
