When compiling for ARM/Thumb with -Os (optimize for size), I've seen some cases
where GCC generates unnecessary move instructions.
It seems there is sometimes room to improve how 2-operand and 3-operand
instruction forms are used.

Some patterns I see are:

----------------
Generated code Case 1:

mov Ry, Rx
...
add Ry,Ry,Rz
....
mov Rx,Ry

  ------> can be transformed to

add Rx, Rz
mov Ry, Rx

--------------------
Generated code Case 2:

mov Ry, Rx
....
add Ry, Ry, Rx

  -----> can be transformed to

add Ry,Rx,Rx

-------------
Generated code Case 3:

mov Ry,Rx
....
add Rz,Ry,Rx
...
mov Rx,Ry

  -----> can be transformed to

add Rz,Rx,Rx
mov Ry,Rx

----------------------
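
As a sanity check, the three cases above can be modelled with a toy register
file in C. This is only a sketch: the names RX, RY, RZ and the helpers mov/add
are hypothetical stand-ins, and it assumes that the elided instructions ("...")
do not touch any of the three registers. For Case 2 the replacement is modelled
as add Ry,Rx,Rx, since Ry only holds the value of Rx once the mov has executed.

```c
#include <stdint.h>
#include <string.h>

/* Toy register file; RX, RY, RZ are hypothetical stand-ins for Rx, Ry, Rz. */
enum { RX, RY, RZ, NREGS };
typedef struct { uint32_t r[NREGS]; } Regs;

static void mov(Regs *s, int d, int a)        { s->r[d] = s->r[a]; }
static void add(Regs *s, int d, int a, int b) { s->r[d] = s->r[a] + s->r[b]; }

/* Case 1: mov Ry,Rx ; add Ry,Ry,Rz ; mov Rx,Ry  vs.  add Rx,Rz ; mov Ry,Rx */
static Regs case1_orig(Regs s) { mov(&s, RY, RX); add(&s, RY, RY, RZ); mov(&s, RX, RY); return s; }
static Regs case1_new(Regs s)  { add(&s, RX, RX, RZ); mov(&s, RY, RX); return s; }

/* Case 2: mov Ry,Rx ; add Ry,Ry,Rx  vs.  add Ry,Rx,Rx */
static Regs case2_orig(Regs s) { mov(&s, RY, RX); add(&s, RY, RY, RX); return s; }
static Regs case2_new(Regs s)  { add(&s, RY, RX, RX); return s; }

/* Case 3: mov Ry,Rx ; add Rz,Ry,Rx ; mov Rx,Ry  vs.  add Rz,Rx,Rx ; mov Ry,Rx */
static Regs case3_orig(Regs s) { mov(&s, RY, RX); add(&s, RZ, RY, RX); mov(&s, RX, RY); return s; }
static Regs case3_new(Regs s)  { add(&s, RZ, RX, RX); mov(&s, RY, RX); return s; }

static int same(Regs a, Regs b) { return memcmp(a.r, b.r, sizeof a.r) == 0; }

/* Returns 1 if all three rewrites leave Rx, Ry and Rz in the same final state. */
int check_all(uint32_t x, uint32_t y, uint32_t z) {
    Regs s = { { x, y, z } };
    return same(case1_orig(s), case1_new(s))
        && same(case2_orig(s), case2_new(s))
        && same(case3_orig(s), case3_new(s));
}
```

Calling check_all with a few arbitrary start values is enough to catch a
rewrite that reads a register before it has been set up.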

I'm sure there are a lot more similar patterns; I guess 'add' could be 'sub'
or other instructions.
It seems the optimizers sometimes prefer the additional move. Maybe for
performance it's equal, due to other instruction stalls etc.,
but when optimizing for size it's quite straightforward that you gain
bytes from these transformations, so they should be preferred where possible.

What I was wondering is whether it would be possible to add a more generic GCC
pass that could check for such "transformations",
like a 'merge_multi_operator_insn' pass, that could do these transformations
for any target, not only 2-op to 3-op transforms.
Or maybe this is a peephole2 type of pass. The pass could perhaps be run only
when optimizing for size, where the cost is obvious (bytes generated).
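
To make the idea a bit more concrete, here is a rough sketch in C of what such
a rewrite could look like on a toy instruction list. It is not GCC's RTL and
uses none of GCC's internal APIs: the Insn type and the functions fold_mov_add
and code_bytes are made up for illustration. It only handles the Case 2
pattern, only when the two instructions are directly adjacent, and it assumes
every instruction costs 2 bytes (as with a 16-bit Thumb encoding).

```c
#include <stddef.h>

/* Hypothetical toy IR, not GCC's RTL. */
typedef enum { OP_MOV, OP_ADD } Opcode;
typedef struct {
    Opcode op;
    int dst, src1, src2;   /* src2 is ignored for OP_MOV */
} Insn;

/* Rewrites an adjacent pair  mov Ry,Rx ; add Ry,Ry,Rx  (sources in either
   order) into the single instruction  add Ry,Rx,Rx.  The array is compacted
   in place and the new instruction count is returned. */
size_t fold_mov_add(Insn *code, size_t n) {
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n
            && code[i].op == OP_MOV
            && code[i + 1].op == OP_ADD
            && code[i + 1].dst == code[i].dst
            && ((code[i + 1].src1 == code[i].dst && code[i + 1].src2 == code[i].src1)
             || (code[i + 1].src1 == code[i].src1 && code[i + 1].src2 == code[i].dst))) {
            /* Build the merged add before overwriting anything. */
            Insn folded = { OP_ADD, code[i].dst, code[i].src1, code[i].src1 };
            code[out++] = folded;
            i++;                      /* skip the add that was merged */
        } else {
            code[out++] = code[i];    /* keep the instruction unchanged */
        }
    }
    return out;
}

/* Size estimate under the assumption of 2 bytes per instruction. */
size_t code_bytes(size_t n) { return 2 * n; }
```

With -Os the cost check is then just a comparison of code_bytes before and
after the rewrite. A real pass would also have to prove that nothing between
the two instructions reads or writes the registers involved, which this sketch
sidesteps by requiring adjacency.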

The pass could maybe be executed after reload, when all hard registers are
assigned, but before the scheduling passes, like sched2.
I propose placing it between "pass_cprop_hardreg" and "pass_fast_rtl_dce".

I'm new to these topics, so maybe I'm all wrong, but please comment on my ideas
if you have the time =)

Thanks and Kind Regards, Fredrik
