Hi, consider the following C code:

    static __inline__ __attribute__((__always_inline__))
    _Fract rbits (const int i)
    {
        _Fract f;
        __builtin_memcpy (&f, &i, sizeof (_Fract));
        return f;
    }

    _Fract func (void)
    {
    #if B == 1
        return rbits (0x1234);
    #elif B == 2
        return 0.14222r;
    #endif
    }

Type-punning idioms like the one in rbits above are very common in libgcc, for example in fixed-bit.c.

In this example, both compilation variants are equivalent (provided int and _Fract are 16 bits wide). The problem with the B=1 variant is that it is inefficient:

    Variant B=1 needs 11 instructions, 9 of which are not needed at all.
    Variant B=2 needs 2 instructions.

The problem goes as follows: The memcpy is represented as a VIEW_CONVERT_EXPR<_Fract> and expanded to memory moves through the frame:

    (insn 5 4 6 (set (reg:HI 45)
            (const_int 4660 [0x1234])) bloat.c:5 -1
         (nil))

    (insn 6 5 7 (set (mem/c:HI (reg/f:HI 37 virtual-stack-vars) [2 S2 A8])
            (reg:HI 45)) bloat.c:5 -1
         (nil))

    (insn 7 6 8 (set (reg:HQ 46)
            (mem/c:HQ (reg/f:HI 37 virtual-stack-vars) [2 S2 A8])) bloat.c:12 -1
         (nil))

    (insn 8 7 9 (set (reg:HQ 43 [ <retval> ])
            (reg:HQ 46)) bloat.c:12 -1
         (nil))

Is there a specific reason why this is not expanded as a subreg, like this?

    (set (reg:HQ 46)
         (subreg:HQ [(reg:HI 45)] 0))

The insns are analyzed in .dfinit:

    ;; regs ever live 24[r24] 25[r25] 29[r29]

24/25 is the return register, 28/29 is the frame pointer:

    (insn 6 5 7 2 (set (mem/c:HI (plus:HI (reg/f:HI 28 r28)
                    (const_int 1 [0x1])) [2 S2 A8])
            (reg:HI 45)) bloat.c:5 82 {*movhi}

Is there a reason why R28 is not marked as live?

The memory accesses are optimized out in .fwprop2, so no frame pointer is needed any more. However, in the subsequent passes R29 is still reported under "regs ever live", which is not correct.
The "regs ever live" set does not change until .ira, where R28 springs to life again:

    ;; regs ever live 24[r24] 25[r25] 28[r28] 29[r29]

And in .reload:

    ;; regular block artificial uses 28 [r28] 32 [__SP_L__]
    ;; eh block artificial uses 28 [r28] 32 [__SP_L__] 34 [argL]
    ;; entry block defs 8 [r8] 9 [r9] 10 [r10] 11 [r11] 12 [r12] 13 [r13] 14 [r14] 15 [r15] 16 [r16] 17 [r17] 18 [r18] 19 [r19] 20 [r20] 21 [r21] 22 [r22] 23 [r23] 24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__]
    ;; exit block uses 24 [r24] 25 [r25] 28 [r28] 32 [__SP_L__]
    ;; regs ever live 24[r24] 25[r25] 28[r28] 29[r29]

The outcome is that the frame pointer is set up without need, which is very costly on avr.

The compiler is trunk from 2013-01-16, configured for avr:

    --target=avr --enable-languages=c,c++ --disable-nls --with-dwarf2

The example is compiled with

    avr-gcc bloat.c -S -dp -save-temps -Os -da -fdump-tree-optimized -DB=1

At what stage does the misoptimization happen? Is this worth a PR? (I.e. is there a chance that anybody cares?)

Thanks.