PR 49095 requested the following optimization:

-       movl    -120(%rax), %ecx
-       leal    -1(%rcx), %edx
-       movl    %edx, -120(%rax)
-       testl   %edx, %edx
+       subl    $1, -120(%rax)
        jne     .L92
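For illustration only, here is a minimal C sketch of the kind of source that can lead to a load / decrement / store / test sequence like the one above. The function name dec_and_test is hypothetical and is not the testcase from the PR. Whether the compiler actually emits the lea form depends on the target, the options, and register allocation.

```c
#include <stdbool.h>

/* Hypothetical helper, not taken from PR 49095: decrement a counter in
   memory and report whether the new value is nonzero.  On x86 this is
   the shape that ideally becomes a single "subl $1, mem" followed by a
   conditional jump on the flags it sets.  */
bool
dec_and_test (int *counter)
{
  *counter -= 1;          /* read-modify-write of the memory location */
  return *counter != 0;   /* compare the stored result against zero */
}
```

Compiling something like this with -O2 and inspecting the assembly is one way to check which form is generated.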
The PR was fixed by adding a peephole, but that peephole doesn't actually trigger for the code sequence quoted above. This is because its pattern expects to see a parallel including a clobber of CC, which is what you'd get for a normal add or logical operation. A lea does not match: its pattern has no clobber, and its input and output operands can be different registers.
This shows up with some IRA cost changes I'm testing for a different PR. The following patch adds a variant peephole. It would be a prerequisite for those IRA changes so as to not regress an existing testcase. The new peephole triggers a few times in my collection of .i files.
Bootstrapped and tested on x86_64-linux. Ok?

Bernd
        * config/i386/i386.md (operation on memory peephole): Duplicate an
        existing peephole and adapt it to match lea rather than an operation
        that clobbers CC.

Index: gcc/config/i386/i386.md
===================================================================
--- gcc/config/i386/i386.md (revision 233451)
+++ gcc/config/i386/i386.md (working copy)
@@ -17952,6 +17952,38 @@ (define_peephole2
                                        operands[5], const0_rtx);
 })
 
+;; Likewise for instances where we have a lea pattern.
+(define_peephole2
+  [(set (match_operand:SWI 0 "register_operand")
+        (match_operand:SWI 1 "memory_operand"))
+   (set (match_operand:SWI 3 "register_operand")
+        (plus (match_dup 0)
+              (match_operand:SWI 2 "<nonmemory_operand>")))
+   (set (match_dup 1) (match_dup 3))
+   (set (reg FLAGS_REG) (compare (match_dup 3) (const_int 0)))]
+  "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+   && peep2_reg_dead_p (4, operands[3])
+   && (rtx_equal_p (operands[0], operands[3])
+       || peep2_reg_dead_p (2, operands[0]))
+   && !reg_overlap_mentioned_p (operands[0], operands[1])
+   && !reg_overlap_mentioned_p (operands[3], operands[1])
+   && !reg_overlap_mentioned_p (operands[0], operands[2])
+   && (<MODE>mode != QImode
+       || immediate_operand (operands[2], QImode)
+       || any_QIreg_operand (operands[2], QImode))
+   && ix86_match_ccmode (peep2_next_insn (3), CCGOCmode)"
+  [(parallel [(set (match_dup 4) (match_dup 5))
+              (set (match_dup 1) (plus:SWI (match_dup 1)
+                                           (match_dup 2)))])]
+{
+  operands[4] = SET_DEST (PATTERN (peep2_next_insn (3)));
+  operands[5] = gen_rtx_PLUS (<MODE>mode,
+                              copy_rtx (operands[1]),
+                              copy_rtx (operands[2]));
+  operands[5] = gen_rtx_COMPARE (GET_MODE (operands[4]),
+                                 operands[5], const0_rtx);
+})
+
 (define_peephole2
   [(parallel [(set (match_operand:SWI 0 "register_operand")
                    (match_operator:SWI 2 "plusminuslogic_operator"