On Wed, Dec 12, 2012 at 3:45 PM, Richard Biener <richard.guent...@gmail.com> wrote:
>> I assume that this is not the right way for fixing such a simple performance
>> anomaly, since we need to do redundant work - combine the load into the
>> conditional and then split it back out in peephole2? Does it look
>> reasonable? Why should we produce a non-efficient instruction that must
>> be split later?
>
> Well, either don't allow this instruction variant from the start, or allow
> the extra freedom for register allocation this creates. It doesn't make
> sense to just reject it being generated by combine - that doesn't address
> when it materializes in another way.

Please check the attached patch; it implements this limitation in a correct way:

- keeps memory operands for -Os or cold parts of the executable
- doesn't increase register pressure
- handles all situations where a memory operand can propagate into the RTX

Yuri, can you please check if this patch fixes the performance problem for you?

BTW: I would really like to add some TARGET_USE_CMOVE_WITH_MEMOP target macro and conditionalize the new peephole2 patterns on it.

Uros.
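For illustration, below is a minimal C case of the kind affected; the assembly is only a sketch of the expected before/after shape of the code (register names and exact insn ordering are assumptions, not verified output of this patch):

    /* Conditional select whose false arm is a memory load.  */
    int
    select (int a, const int *p)
    {
      return a > 0 ? a : *p;
    }

    /* Without the peephole2, combine can fold the load into the cmov:

           movl    %edi, %eax
           testl   %edi, %edi
           cmovle  (%rsi), %eax

       With the peephole2 active (optimizing for speed), the load is first
       split out to a scratch register, so the cmov stays register-register:

           movl    (%rsi), %edx
           movl    %edi, %eax
           testl   %edi, %edi
           cmovle  %edx, %eax  */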
Index: i386.md
===================================================================
--- i386.md	(revision 194451)
+++ i386.md	(working copy)
@@ -16122,6 +16122,31 @@
     operands[3] = gen_lowpart (SImode, operands[3]);
 })
 
+;; Don't do conditional moves with memory inputs
+(define_peephole2
+  [(match_scratch:SWI248 2 "r")
+   (set (match_operand:SWI248 0 "register_operand")
+	(if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator"
+			       [(reg FLAGS_REG) (const_int 0)])
+	  (match_dup 0)
+	  (match_operand:SWI248 3 "memory_operand")))]
+  "TARGET_CMOVE && optimize_insn_for_speed_p ()"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 0)
+	(if_then_else:SWI248 (match_dup 1) (match_dup 0) (match_dup 2)))])
+
+(define_peephole2
+  [(match_scratch:SWI248 2 "r")
+   (set (match_operand:SWI248 0 "register_operand")
+	(if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator"
+			       [(reg FLAGS_REG) (const_int 0)])
+	  (match_operand:SWI248 3 "memory_operand")
+	  (match_dup 0)))]
+  "TARGET_CMOVE && optimize_insn_for_speed_p ()"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 0)
+	(if_then_else:SWI248 (match_dup 1) (match_dup 2) (match_dup 0)))])
+
 (define_expand "mov<mode>cc"
   [(set (match_operand:X87MODEF 0 "register_operand")
 	(if_then_else:X87MODEF
@@ -16209,6 +16234,35 @@
   [(set_attr "type" "fcmov,fcmov,icmov,icmov")
    (set_attr "mode" "SF,SF,SI,SI")])
 
+;; Don't do conditional moves with memory inputs
+(define_peephole2
+  [(match_scratch:MODEF 2 "r")
+   (set (match_operand:MODEF 0 "register_and_not_any_fp_reg_operand")
+	(if_then_else:MODEF (match_operator 1 "fcmov_comparison_operator"
+			      [(reg FLAGS_REG) (const_int 0)])
+	  (match_dup 0)
+	  (match_operand:MODEF 3 "memory_operand")))]
+  "(<MODE>mode != DFmode || TARGET_64BIT)
+   && TARGET_80387 && TARGET_CMOVE
+   && optimize_insn_for_speed_p ()"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 0)
+	(if_then_else:MODEF (match_dup 1) (match_dup 0) (match_dup 2)))])
+
+(define_peephole2
+  [(match_scratch:MODEF 2 "r")
+   (set (match_operand:MODEF 0 "register_and_not_any_fp_reg_operand")
+	(if_then_else:MODEF (match_operator 1 "fcmov_comparison_operator"
+			      [(reg FLAGS_REG) (const_int 0)])
+	  (match_operand:MODEF 3 "memory_operand")
+	  (match_dup 0)))]
+  "(<MODE>mode != DFmode || TARGET_64BIT)
+   && TARGET_80387 && TARGET_CMOVE
+   && optimize_insn_for_speed_p ()"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 0)
+	(if_then_else:MODEF (match_dup 1) (match_dup 2) (match_dup 0)))])
+
 ;; All moves in XOP pcmov instructions are 128 bits and hence we restrict
 ;; the scalar versions to have only XMM registers as operands.