https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107957
Bug ID: 107957
Summary: Missed optimization in access to upper-half of a variable
Product: gcc
Version: 12.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: mrjjot at gmail dot com
Target Milestone: ---

Hello,

I think I've found an optimization opportunity for AVR GCC. This might be similar to bug 66511, but it also affects variables smaller than 64 bits.

Please consider the following C code:

#include <stdint.h>

uint64_t x;
uint32_t y;
uint16_t z;
uint8_t w;

void foo(void) { y = x >> 32; }
void bar(void) { z = y >> 16; }
void rawr(void) { w = z >> 8; }

As you can see, all three functions just assign the upper half of one variable to another. When compiled with avr-gcc using the -Wall -Wextra -O3 flags, the following assembly is produced:

foo():
        push r16
        lds r18,x
        lds r19,x+1
        lds r20,x+2
        lds r21,x+3
        lds r22,x+4
        lds r23,x+5
        lds r24,x+6
        lds r25,x+7
        ldi r16,lo8(32)
        rcall __lshrdi3
        sts y,r18
        sts y+1,r19
        sts y+2,r20
        sts y+3,r21
        pop r16
        ret
bar():
        lds r24,y
        lds r25,y+1
        lds r26,y+2
        lds r27,y+3
        sts z+1,r27
        sts z,r26
        ret
rawr():
        lds r24,z+1
        sts w,r24
        ret

I'm not a compiler expert, but I'd say that this is a missed optimization. In foo() and bar() there are twice as many lds operations as needed (8 instead of 4, and 4 instead of 2), and foo() additionally calls __lshrdi3 to do the shift. Only rawr() already loads just the byte it needs. For comparison, GCC for x86_64 generates code which performs a DWORD read in foo(), a WORD read in bar() and a BYTE read in rawr().

I've found that the following definitions generate identical assembly on x86_64 and better assembly on AVR for the first two functions (rawr2() is unchanged from rawr()):

void foo2(void) { y = ((uint32_t*)&x)[1]; }
void bar2(void) { z = ((uint16_t*)&y)[1]; }
void rawr2(void) { w = ((uint8_t*)&z)[1]; }

foo2():
        lds r24,x+4
        lds r25,x+4+1
        lds r26,x+4+2
        lds r27,x+4+3
        sts y,r24
        sts y+1,r25
        sts y+2,r26
        sts y+3,r27
        ret
bar2():
        lds r24,y+2
        lds r25,y+2+1
        sts z+1,r25
        sts z,r24
        ret
rawr2():
        lds r24,z+1
        sts w,r24
        ret

I've checked my local installation of AVR GCC 12.2.0 on Manjaro and different AVR GCC versions on Godbolt. They all seem to produce the same machine code.
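A note on the foo2()/bar2()/rawr2() workaround above: indexing through a cast pointer depends on the target being little-endian, and reading a uint64_t object through a uint32_t* (or a uint32_t through a uint16_t*) may conflict with the strict-aliasing rules. Below is a minimal sketch of the same idea expressed with memcpy, which avoids the aliasing question. It assumes a little-endian target (as on AVR and x86_64); the names foo3, bar3 and rawr3 are hypothetical, and what avr-gcc actually emits for them has not been verified here.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: little-endian target (true for AVR and x86_64), so the
   upper half of a variable lives at the higher byte offsets. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
#error "this sketch assumes a little-endian target"
#endif

uint64_t x;
uint32_t y;
uint16_t z;
uint8_t w;

/* Hypothetical variants of foo2()/bar2()/rawr2(): copy the upper half
   bytewise instead of reading it through a cast pointer. */
void foo3(void)  { memcpy(&y, (const unsigned char *)&x + 4, sizeof y); }
void bar3(void)  { memcpy(&z, (const unsigned char *)&y + 2, sizeof z); }
void rawr3(void) { memcpy(&w, (const unsigned char *)&z + 1, sizeof w); }
```

On a little-endian target these should store the same values as foo(), bar() and rawr(); whether the generated code is as short as for the pointer-cast versions would need to be checked per compiler version.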
$ avr-gcc -v
Using built-in specs.
Reading specs from /usr/lib/gcc/avr/12.2.0/device-specs/specs-avr2
COLLECT_GCC=avr-gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/avr/12.2.0/lto-wrapper
Target: avr
Configured with: /build/avr-gcc/src/gcc-12.2.0/configure --disable-install-libiberty --disable-libssp --disable-libstdcxx-pch --disable-libunwind-exceptions --disable-linker-build-id --disable-nls --disable-werror --disable-__cxa_atexit --enable-checking=release --enable-clocale=gnu --enable-gnu-unique-object --enable-gold --enable-languages=c,c++ --enable-ld=default --enable-lto --enable-plugin --enable-shared --infodir=/usr/share/info --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --prefix=/usr --target=avr --with-as=/usr/bin/avr-as --with-gnu-as --with-gnu-ld --with-ld=/usr/bin/avr-ld --with-plugin-ld=ld.gold --with-system-zlib --with-isl --enable-gnu-indirect-function
Thread model: single
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (GCC)

I'd appreciate it if you could look into this.

Thank you!