When compiling ARM/Thumb with -Os for size, I've seen some cases where GCC generates unnecessary move instructions. It seems that there is sometimes room to improve this by going from 2-operand to 3-operand instructions.

Some patterns I see are:

----------------
Generated code, Case 1:

  mov Ry, Rx
  ...
  add Ry, Ry, Rz
  ...
  mov Rx, Ry

------> can be transformed to

  add Rx, Rz
  mov Ry, Rx

----------------
Generated code, Case 2:

  mov Ry, Rx
  ...
  add Ry, Ry, Rx

------> can be transformed to

  add Ry, Ry, Ry

----------------
Generated code, Case 3:

  mov Ry, Rx
  ...
  add Rz, Ry, Rx
  ...
  mov Rx, Ry

------> can be transformed to

  add Rz, Rx, Rx
  mov Ry, Rx

----------------

I'm sure there are a lot more similar patterns; I guess 'add' could be 'sub' or other instructions.

It seems like the optimizers sometimes prefer the additional move. Maybe for performance it is equal, due to other instruction stalls etc., but when optimizing for size it's quite straightforward that you can gain bytes with these transformations, so where possible they should be preferred.

What I was wondering is whether it would be possible to add a more generic GCC pass that could check for such "transformations", like a 'merge_multi_operator_insn' pass, that could do these transformations for any target, not only 2-op to 3-op transforms. Or maybe this is a peephole2 type of pass.

The pass could maybe be run only when optimizing for size, where the cost is obvious (bytes generated). It could maybe be executed after reload, when all hard registers are set, but before the scheduling passes, like sched2. Proposed placement: in between "pass_cprop_hardreg" and "pass_fast_rtl_dce".

I'm new to these topics, so maybe I'm all wrong, but please comment on my ideas if you have the time =)

Thanks and Kind Regards,
Fredrik
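
P.S. To make Case 1 a bit more concrete, here is a toy sketch in Python. It is not GCC code and does not use any GCC API. The tuple encoding of instructions and the names `run` and `merge_case1` are made up for illustration. It assumes the three instructions are adjacent (the "..." in between is ignored) and that Ry differs from both Rx and Rz, since otherwise the rewrite would not be safe.

```python
def run(insns, regs):
    """Execute toy 'mov'/'add' instructions on a copy of regs (32-bit wraparound)."""
    regs = dict(regs)
    for insn in insns:
        if insn[0] == "mov":            # ("mov", dst, src)
            _, dst, src = insn
            regs[dst] = regs[src]
        elif insn[0] == "add":          # ("add", dst, a, b)  means dst = a + b
            _, dst, a, b = insn
            regs[dst] = (regs[a] + regs[b]) & 0xFFFFFFFF
        else:
            raise ValueError("unknown opcode: %r" % (insn[0],))
    return regs


def merge_case1(insns):
    """Rewrite  mov Ry,Rx ; add Ry,Ry,Rz ; mov Rx,Ry  ->  add Rx,Rx,Rz ; mov Ry,Rx.

    Hypothetical helper: only matches three adjacent instructions.
    """
    out = []
    i = 0
    while i < len(insns):
        w = insns[i:i + 3]
        if (len(w) == 3 and w[0][0] == "mov"
                and w[1][0] == "add" and w[2][0] == "mov"):
            _, ry, rx = w[0]
            _, dst, a, rz = w[1]
            # Ry must not alias Rx or Rz, otherwise the rewrite changes the result.
            if (dst == ry and a == ry and w[2] == ("mov", rx, ry)
                    and ry not in (rx, rz)):
                out += [("add", rx, rx, rz), ("mov", ry, rx)]
                i += 3
                continue
        out.append(insns[i])
        i += 1
    return out


before = [("mov", "r1", "r0"), ("add", "r1", "r1", "r2"), ("mov", "r0", "r1")]
after = merge_case1(before)
start = {"r0": 5, "r1": 9, "r2": 7}
# Both sequences should leave the registers in the same state.
assert run(before, start) == run(after, start)
```

A real pass would of course also have to check that nothing in the "..." part reads or writes the registers involved, and compare the encoded sizes before deciding.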