https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61578
--- Comment #26 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Fredrik Hederstierna from comment #23)
>
> Here's another small example I tested yesterday that also gives
> unnecessary moves, both for arm7tdmi, arm966e-s and cortex-m0 tested.
>
> extern void func(int data);
> char cdata[4];
> void test(void) {
>   int *idata = (int*)cdata;
>   func(*idata);
> }
>
> Compiles with GCC 4.8.5 (cortex-m0):
>
> 00000000 <test>:
>    0: b508        push {r3, lr}
>    2: 4b07        ldr r3, [pc, #28] ; (20 <test+0x20>)
>    4: 7858        ldrb r0, [r3, #1]
>    6: 781a        ldrb r2, [r3, #0]
>    8: 0200        lsls r0, r0, #8
>    a: 4310        orrs r0, r2
>    c: 789a        ldrb r2, [r3, #2]
>    e: 78db        ldrb r3, [r3, #3]
>   10: 0412        lsls r2, r2, #16
>   12: 4310        orrs r0, r2
>   14: 061b        lsls r3, r3, #24
>   16: 4318        orrs r0, r3
>   18: f7ff fffe   bl 0 <func>
>   1c: bd08        pop {r3, pc}
>   1e: 46c0        nop ; (mov r8, r8)
>   20: 00000000    .word 0x00000000
>
> With GCC 6 master with latest LRA patch (+4 bytes):
>
> 00000000 <test>:
>    0: b510        push {r4, lr}
>    2: 4c08        ldr r4, [pc, #32] ; (24 <test+0x24>)
>    4: 7863        ldrb r3, [r4, #1]
>    6: 7821        ldrb r1, [r4, #0]
>    8: 78a0        ldrb r0, [r4, #2]
>    a: 021b        lsls r3, r3, #8
>    c: 430b        orrs r3, r1
>    e: 0400        lsls r0, r0, #16
>   10: 001a        movs r2, r3 ??? MOVE
>   12: 0003        movs r3, r0 ??? MOVE
>   14: 78e0        ldrb r0, [r4, #3]
>   16: 4313        orrs r3, r2
>   18: 0600        lsls r0, r0, #24
>   1a: 4318        orrs r0, r3
>   1c: f7ff fffe   bl 0 <func>
>   20: bd10        pop {r4, pc}
>   22: 46c0        nop ; (mov r8, r8)
>   24: 00000000    .word 0x00000000
>
> Kind Regards, Fredrik

I found the root of the problem. We have

  insn 9: p115=p114|p112
  ...
  insn 12: p118=p117|p115
  ...

IRA assigns different regs to p112, p115, and p118

  ...
      Popping a0(r121,l0)  --  assign reg 0
      Popping a1(r118,l0)  --  assign reg 3
      Popping a5(r115,l0)  --  assign reg 2
      Popping a8(r112,l0)  --  assign reg 1
  ...

Therefore LRA generates redundant insn 22 for insn 9 and insn 23 for insn 12,
because an input operand and the output operand must be in the same register.

There are no conflicts preventing the same hard reg from being assigned to p112, p115, and p118, but IRA does not do this because its heuristics take the costs of other conflicting pseudos into account. So the solution is to change the heuristics somehow. Even if I manage to do this, the changes should be benchmarked thoroughly on other architectures. That means the PR will take a lot of time to fix, but I am going to work on it.
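
To make the cost of the assignment above concrete, here is a minimal sketch in C, not GCC code. It assumes a simplified model in which every insn whose output pseudo and tied input pseudo land in different hard registers costs exactly one extra move. The names `tied_insn`, `assignment`, `hard_reg_of`, and `count_tie_moves` are hypothetical, and the second input of each insn (p114, p117) is ignored. The `shared` table is an invented alternative for comparison, with register 3 picked arbitrarily.

```c
#include <stddef.h>

/* Illustrative model only; none of these names exist in GCC. */

/* A two-address insn: the output pseudo is tied to one input pseudo,
   so both must end up in the same hard register. */
struct tied_insn {
    int out_pseudo;
    int in_pseudo;
};

/* One pseudo-to-hard-register decision made by the allocator. */
struct assignment {
    int pseudo;
    int hard_reg;
};

/* Look up the hard register of a pseudo; returns -1 if it is not listed. */
int hard_reg_of(const struct assignment *map, size_t n_map, int pseudo)
{
    for (size_t i = 0; i < n_map; i++)
        if (map[i].pseudo == pseudo)
            return map[i].hard_reg;
    return -1;
}

/* Count the tied insns whose output and tied input sit in different hard
   registers.  In this simplified model each such insn costs one move. */
int count_tie_moves(const struct tied_insn *insns, size_t n_insns,
                    const struct assignment *map, size_t n_map)
{
    int moves = 0;
    for (size_t i = 0; i < n_insns; i++) {
        int out = hard_reg_of(map, n_map, insns[i].out_pseudo);
        int in = hard_reg_of(map, n_map, insns[i].in_pseudo);
        if (out != in)
            moves++;
    }
    return moves;
}

/* insn 9 (p115 = p114 | p112) and insn 12 (p118 = p117 | p115),
   reduced to the output/tied-input pairs named in the comment. */
const struct tied_insn chain[] = { { 115, 112 }, { 118, 115 } };

/* Hard registers from the IRA dump: r118 -> 3, r115 -> 2, r112 -> 1. */
const struct assignment ira_dump[] = { { 118, 3 }, { 115, 2 }, { 112, 1 } };

/* Hypothetical alternative: all three pseudos share one hard register
   (register 3 is an arbitrary choice for illustration). */
const struct assignment shared[] = { { 118, 3 }, { 115, 3 }, { 112, 3 } };
```

Under this model, `count_tie_moves(chain, 2, ira_dump, 3)` gives 2, which lines up with the two extra insns 22 and 23, while `count_tie_moves(chain, 2, shared, 3)` gives 0. The sketch says nothing about whether sharing one register is profitable overall; that is what the heuristics and the benchmarking would have to settle.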