http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56102
--- Comment #1 from bin.cheng <amker.cheng at gmail dot com> 2013-01-25 03:46:59 UTC ---
I have investigated this issue.

GCC uses the function init_lower_subreg to initialize the costs of move insns in different modes, then uses this information to decide whether to decompose multi-word pseudo registers into individual word-sized registers. The problem is that the ARM backend returns the wrong rtx cost for a SET insn in a multi-word mode.

Specifically, if you define LOG_COSTS in lower-subreg.c, GCC will dump the rtx costs when compiling with:

arm-none-eabi-gcc -mthumb -mcpu=cortex-m0 -Os/-O2 ....

The dump is:

Size costs
==========

SI move: from zero cost 4, from reg cost 4
DI move: original cost 4, split cost 4 * 2
TI move: original cost 4, split cost 4 * 4
EI move: original cost 4, split cost 4 * 6
OI move: original cost 4, split cost 4 * 8
CI move: original cost 4, split cost 4 * 12
XI move: original cost 4, split cost 4 * 16
DQ move: original cost 4, split cost 4 * 2
TQ move: original cost 4, split cost 4 * 4
UDQ move: original cost 4, split cost 4 * 2
UTQ move: original cost 4, split cost 4 * 4
DA move: original cost 4, split cost 4 * 2
TA move: original cost 4, split cost 4 * 4
UDA move: original cost 4, split cost 4 * 2
UTA move: original cost 4, split cost 4 * 4
DF move: original cost 4, split cost 4 * 2
XF move: original cost 4, split cost 4 * 3
DD move: original cost 4, split cost 4 * 2
TD move: original cost 4, split cost 4 * 4
CSI move: original cost 4, split cost 4 * 2
CDI move: original cost 4, split cost 4 * 4
CTI move: original cost 4, split cost 4 * 8
CEI move: original cost 4, split cost 4 * 12
COI move: original cost 4, split cost 4 * 16
CCI move: original cost 4, split cost 4 * 24
CXI move: original cost 4, split cost 4 * 32
SC move: original cost 4, split cost 4 * 2
DC move: original cost 4, split cost 4 * 4
XC move: original cost 4, split cost 4 * 6
V8QI move: original cost 4, split cost 4 * 2
V4HI move: original cost 4, split cost 4 * 2
V2SI move: original cost 4, split cost 4 * 2
V16QI move: original cost 4, split cost 4 * 4
V8HI move: original cost 4, split cost 4 * 4
V4SI move: original cost 4, split cost 4 * 4
V2DI move: original cost 4, split cost 4 * 4
V4HF move: original cost 4, split cost 4 * 2
V2SF move: original cost 4, split cost 4 * 2
V8HF move: original cost 4, split cost 4 * 4
V4SF move: original cost 4, split cost 4 * 4
V2DF move: original cost 4, split cost 4 * 4

Speed costs
===========

SI move: from zero cost 4, from reg cost 4
DI move: original cost 4, split cost 4 * 2
TI move: original cost 4, split cost 4 * 4
EI move: original cost 4, split cost 4 * 6
OI move: original cost 4, split cost 4 * 8
CI move: original cost 4, split cost 4 * 12
XI move: original cost 4, split cost 4 * 16
DQ move: original cost 4, split cost 4 * 2
TQ move: original cost 4, split cost 4 * 4
UDQ move: original cost 4, split cost 4 * 2
UTQ move: original cost 4, split cost 4 * 4
DA move: original cost 4, split cost 4 * 2
TA move: original cost 4, split cost 4 * 4
UDA move: original cost 4, split cost 4 * 2
UTA move: original cost 4, split cost 4 * 4
DF move: original cost 4, split cost 4 * 2
XF move: original cost 4, split cost 4 * 3
DD move: original cost 4, split cost 4 * 2
TD move: original cost 4, split cost 4 * 4
CSI move: original cost 4, split cost 4 * 2
CDI move: original cost 4, split cost 4 * 4
CTI move: original cost 4, split cost 4 * 8
CEI move: original cost 4, split cost 4 * 12
COI move: original cost 4, split cost 4 * 16
CCI move: original cost 4, split cost 4 * 24
CXI move: original cost 4, split cost 4 * 32
SC move: original cost 4, split cost 4 * 2
DC move: original cost 4, split cost 4 * 4
XC move: original cost 4, split cost 4 * 6
V8QI move: original cost 4, split cost 4 * 2
V4HI move: original cost 4, split cost 4 * 2
V2SI move: original cost 4, split cost 4 * 2
V16QI move: original cost 4, split cost 4 * 4
V8HI move: original cost 4, split cost 4 * 4
V4SI move: original cost 4, split cost 4 * 4
V2DI move: original cost 4, split cost 4 * 4
V4HF move: original cost 4, split cost 4 * 2
V2SF move: original cost 4, split cost 4 * 2
V8HF move: original cost 4, split cost 4 * 4
V4SF move: original cost 4, split cost 4 * 4
V2DF move: original cost 4, split cost 4 * 4

In every multi-word mode the original move insn is reported as cheaper than the split insns (4 versus 4 * N), which prevents GCC from splitting.

The root cause is that thumb1_rtx_costs/thumb1_size_rtx_costs do not handle SET/ASHIFT/ASHIFTRT/LSHIFTRT/ROTATERT patterns in multi-word modes the way rtx_cost does.

I am working on this and will send a patch.
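
To make the effect of the dumped numbers concrete, here is a minimal standalone sketch of the comparison, not the actual lower-subreg.c code. The names should_split and word_scaled_cost are hypothetical, and the exact test (split when the whole-mode cost is at least the combined word cost) is an assumption based on the behaviour described above. word_scaled_cost only illustrates what a cost that grows with the number of words would look like; it is not the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the split decision; this is NOT GCC source.
   original_cost: cost the backend reports for one move in the wide mode.
   word_cost:     cost of one word-sized (SI) move.
   factor:        number of words in the wide mode (the N in "4 * N").
   Assumption: split when the wide move is at least as expensive as
   the word-sized moves that would replace it.  */
bool
should_split (int original_cost, int word_cost, int factor)
{
  assert (factor > 1);  /* only multi-word modes are candidates */
  return original_cost >= word_cost * factor;
}

/* Hypothetical cost that scales with the number of words, i.e. what
   the dump would show if a multi-word SET were costed as one word
   move per word.  */
int
word_scaled_cost (int word_cost, int factor)
{
  return word_cost * factor;
}
```

With the dumped values, should_split (4, 4, 2) is false for DI, so the move is left alone. If the reported cost were word_scaled_cost (4, 2) instead, the same call would return true and the DI move would be split.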