Paolo Bonzini wrote:
> On 10/22/2010 01:16 PM, Georg Lay wrote:
>> I already tried to fix this by introducing a different return-pattern,
>> i.e. a
>> PARALLEL of return and bunch of clobbers of unused regs. That fixes
>> this problem
>> but has many other disadvantages compared to a simple return.
> 
> What were these disadvantages?

Consider the following snippet from libm's cosh computation. We are in pass DCE,
somewhere after peephole2 and before BBRO:

;; d2 = d4 * d6

(call_insn/u 16 250 17 3 e_cosh.c:926 (parallel [
            (set (reg:DF 2 d2)
                (call (mem:HI (symbol_ref:SI ("__d_mul") [flags 0x41]) [0 S2 A16])
                    (const_int 0 [0x0])))
            (use (const_int 0 [0x0]))
        ]) 92 {call_value_insn}
    (expr_list:REG_DEP_TRUE (use (reg:DF 6 d6))
        (expr_list:REG_DEP_TRUE (use (reg:DF 4 d4))
            (nil))))

;; d8 = d2

(insn 17 16 209 3 e_cosh.c:926 (set (reg/v:DF 8 d8 [orig:39 w ] [39])
        (reg:DF 2 d2)) 15 {*movdf_insn} (nil))

;; goto 161

(jump_insn 209 17 210 3 e_cosh.c:926 (set (pc)
        (label_ref 161)) 84 {jump} (nil)
 -> 161)

......

;; 161:
;; Pred edge  14 [100.0%]  (fallthru)
;; Pred edge  5 [50.0%]
;; Pred edge  3 [100.0%]
;; Pred edge  6 [100.0%]
;; Pred edge  8 [100.0%]
;; Pred edge  10 [100.0%]
;; Pred edge  13 [100.0%]
(code_label 161 159 162 15 3 "" [6 uses])

(note 162 161 167 15 [bb 15] NOTE_INSN_BASIC_BLOCK)

;; d2 = d8
(insn 167 162 170 15 e_cosh.c:956 (set (reg/i:DF 2 d2)
        (reg/v:DF 8 d8 [orig:39 w ] [39])) 15 {*movdf_insn} (nil))

(insn 170 167 247 15 e_cosh.c:956 (use (reg/i:DF 2 d2)) -1 (nil))

(note 247 170 248 15 NOTE_INSN_EPILOGUE_BEG)

;; return d2
(jump_insn 248 247 249 15 e_cosh.c:956 (return) 119 {return_insn} (nil))
;; End of basic block 15 -> ( 1)

So d8 (call-saved) is set in several places and then just copied to the return
register d2. Up to now the function has just one exit, in bb 15.
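
For orientation, the source shape behind such a dump can be sketched in C. This
is not the actual e_cosh.c code: the function name toy_select and its parameters
are made up for illustration, and the comment about a library call assumes a
target without hardware double multiplication.

```c
/* Hypothetical illustration only, not taken from e_cosh.c.
 * One variable (w) is assigned on several paths and returned at a
 * single exit, which is the shape seen in the dump above. */
double toy_select(double x, double y, int which)
{
    double w;           /* plays the role of "w" (pseudo 39) in the dump */

    if (which == 0)
        w = x * y;      /* assumption: soft-float target, so this becomes
                           a library call such as __d_mul */
    else if (which == 1)
        w = x + y;
    else
        w = x;

    return w;           /* the single exit: copy w to the return register */
}
```

Each assignment to w ends with a jump to the common exit, which corresponds to
the jump_insn 209 / code_label 161 pair in the dump.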

BB reordering now duplicates the small bb 15, which holds the return code, and
removes jump insns like 209. The BBRO dump then is:

;; d2 = d4 * d6
(call_insn/u 16 250 17 10 e_cosh.c:926 (parallel [
            (set (reg:DF 2 d2)
                (call (mem:HI (symbol_ref:SI ("__d_mul") ...

;; d8 = d2
(insn 17 16 256 10 e_cosh.c:926 (set (reg/v:DF 8 d8 [orig:39 w ] [39])
        (reg:DF 2 d2)) 15 {*movdf_insn} (nil))

;; d2 = d8
(insn 256 17 257 10 e_cosh.c:956 (set (reg/i:DF 2 d2)
        (reg/v:DF 8 d8 [orig:39 w ] [39])) 15 {*movdf_insn} (nil))

(insn 257 256 258 10 e_cosh.c:956 (use (reg/i:DF 2 d2)) -1 (nil))

(note 258 257 259 10 NOTE_INSN_EPILOGUE_BEG)

;; return d2
(jump_insn 259 258 262 10 e_cosh.c:956 (return) 119 {return_insn} (nil))

This code is not nice.

;; d2 = d4 * d6
;; d8 = d2
;; d2 = d8
;; return d2

The copy to d8 is superfluous; all in all this is just a tail call of
__d_mul, which is the ABI name of muldf3. So the code could be

;; return d4 * d6 // tail-call of __d_mul

Because all this runs after peephole2, I had to introduce a second peephole2
run in passes.c like this:

          NEXT_PASS (pass_postreload_cse);
          NEXT_PASS (pass_gcse2);
          NEXT_PASS (pass_split_after_reload);
          NEXT_PASS (pass_branch_target_load_optimize1);
          NEXT_PASS (pass_thread_prologue_and_epilogue);
          NEXT_PASS (pass_rtl_dse2);
          NEXT_PASS (pass_stack_adjustments);
          NEXT_PASS (pass_peephole2);
          NEXT_PASS (pass_if_after_reload);
          NEXT_PASS (pass_regrename);
          NEXT_PASS (pass_cprop_hardreg);
          NEXT_PASS (pass_fast_rtl_dce);
          NEXT_PASS (pass_reorder_blocks);
+#ifdef __TRICORE__
+          NEXT_PASS (pass_cprop_hardreg);
+          NEXT_PASS (pass_fast_rtl_dce);
+          NEXT_PASS (pass_peephole2);
+#endif /* __TRICORE__ */

          NEXT_PASS (pass_branch_target_load_optimize2);
          NEXT_PASS (pass_leaf_regs);
          NEXT_PASS (pass_split_before_sched2);
          NEXT_PASS (pass_sched2);
          NEXT_PASS (pass_stack_regs);

The second run of DCE then eliminates the superfluous move to/from d8 and the
second peephole2 maps the CALL/RET sequence to a tail call.
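
The effect of the three extra passes on the four-insn sequence can be sketched
with a toy model in C. This is not GCC code: struct insn, toy_cprop, toy_dce,
toy_peephole and toy_run are hypothetical names, the model handles only
straight-line code, it ignores the call's argument registers and clobbers, and
it assumes that only d2 is live at the return.

```c
#include <stddef.h>

enum { NREGS = 16 };
enum op { OP_CALL, OP_MOVE, OP_RETURN, OP_TAILCALL, OP_DELETED };

struct insn {
    enum op op;
    int dst;    /* register written, -1 if none */
    int src;    /* register read, -1 if none */
};

/* Toy stand-in for cprop_hardreg: forward copy propagation. */
static void toy_cprop(struct insn *code, size_t n)
{
    int copy_of[NREGS];
    for (int r = 0; r < NREGS; r++)
        copy_of[r] = -1;

    for (size_t i = 0; i < n; i++) {
        struct insn *in = &code[i];
        if (in->op == OP_MOVE && copy_of[in->src] != -1)
            in->src = copy_of[in->src];   /* read the original instead */
        if (in->dst < 0)
            continue;
        /* A write to dst invalidates every recorded copy involving dst. */
        for (int r = 0; r < NREGS; r++)
            if (copy_of[r] == in->dst)
                copy_of[r] = -1;
        copy_of[in->dst] = -1;
        if (in->op == OP_MOVE && in->src != in->dst)
            copy_of[in->dst] = in->src;
    }
}

/* Toy stand-in for fast DCE: backward scan with a liveness bitmap.
 * Assumption: no register is live after the final insn. */
static void toy_dce(struct insn *code, size_t n)
{
    int live[NREGS] = {0};
    for (size_t i = n; i-- > 0; ) {
        struct insn *in = &code[i];
        /* Moves are deleted if they are no-ops or write a dead register;
         * calls and returns have side effects and always stay. */
        if (in->op == OP_MOVE && (in->src == in->dst || !live[in->dst])) {
            in->op = OP_DELETED;
            continue;
        }
        if (in->dst >= 0)
            live[in->dst] = 0;
        if (in->src >= 0)
            live[in->src] = 1;
    }
}

/* Toy stand-in for the second peephole2: drop deleted insns, then turn
 * an adjacent CALL/RETURN pair into one TAILCALL. Returns the new length. */
static size_t toy_peephole(struct insn *code, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++)
        if (code[i].op != OP_DELETED)
            code[out++] = code[i];

    for (size_t i = 0; i + 1 < out; i++) {
        if (code[i].op == OP_CALL && code[i + 1].op == OP_RETURN) {
            code[i].op = OP_TAILCALL;
            for (size_t j = i + 1; j + 1 < out; j++)
                code[j] = code[j + 1];
            out--;
        }
    }
    return out;
}

/* Run the three toy passes in the order used in the passes.c patch. */
size_t toy_run(struct insn *code, size_t n)
{
    toy_cprop(code, n);
    toy_dce(code, n);
    return toy_peephole(code, n);
}
```

Feeding it the sequence { CALL d2; MOVE d8 <- d2; MOVE d2 <- d8; RETURN (uses d2) }
shows why the order matters: copy propagation turns the second move into a no-op,
DCE can then drop both moves, and only after that are the call and the return
adjacent so that the peephole can match.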

All this needs liveness information to be up to date, or d8 won't be eliminated
and not all peephole2s will find their targets. I must admit that I don't like
peephole2 and that I just consider it a kind of last-resort optimization. I very
much prefer passes that run before reload, like insn-combine, but in cases like
this I have no clue how to fix the bad code without peep2 and without patching
passes.c.

To come back to your question, AFAIR, if the return pattern is too complex (i.e.
PARALLEL with CLOBBERs) pass BBRO won't generate copies of the exit block. So
the code will have many jumps to a more or less trivial epilogue.

My original post, however, addresses the first, original peephole2 pass that
comes with the GCC sources.


Georg
