[Bug target/50304] poor code for accessing certain element of arrays
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50304

--- Comment #4 from Tamas Fenyvesi 2011-09-08 13:16:53 UTC ---
Please find sample code and its objdump-ed asm in the attachments. The command line is:

    arm-none-eabi-gcc -D__REDLIB__ -DDEBUG -D__USE_CMSIS -D__CODE_RED -I"../cmsis" -I"../config" -I"../driver" -O1 -g3 -Wall -c -fmessage-length=0 -fno-builtin -ffunction-sections -fdata-sections -mcpu=cortex-m0 -mthumb -MMD -MP -MF"src/bugreport.d" -MT"src/bugreport.d" -o"src/bugreport.o" "../src/bugreport.c"

Some comments:

- The result is the same from -O1 up to -O3. -O0 is worse.

- Access to an int array differs somewhat between the two forms, but both involve an addition (or a register-relative access). Access to a struct does not differ at all whether or not a *const is used.

- Every *const should exist as a materialized constant, rather than being replaced by the addition of two (or more) addends. It definitely costs more code space to store an offset and the code that adds it than to store a precomputed offset address (even if the base address is also stored elsewhere), not to mention the large amount of wasted run time. Several cascaded additions can appear for computing a single address. The code is larger and much slower, and it needs more registers, which in turn also increases code size and run time.

- Why does the compiler choose to compute the offset address itself instead of simply materializing the *const when the user explicitly asks for a *const? The code can hold any number of copies of a *const address (as it cannot change) if that is required, e.g. for pc-relative loading. There is simply no point in not materializing a *const directly.

- Precomputing any known addresses and adding only the variable parts, on the other hand, is good practice (and not uncommon in compilers other than gcc), even without any *const. (E.g. for "a[1][2][var]" it should precompute "a[1][2]" and add only "var".)
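The last point above ("a[1][2][var]" split into a precomputed "a[1][2]" plus "var") can be written out at the source level. This is only a sketch of the equivalence the comment relies on; the names row_1_2, load_direct and load_via_base are hypothetical and do not come from the attached testcase. Whether a given gcc version folds the base into a single literal-pool entry is exactly what this bug is about, so no particular code generation is claimed here.

```c
#include <assert.h>

int a[4][4][16];

/* Constant part of the address: &a[1][2][0] is an address constant,
   so it is valid as a file-scope initializer (hypothetical name). */
int *const row_1_2 = &a[1][2][0];

/* Reference form: the full three-level index expression. */
int load_direct(unsigned var)
{
    assert(var < 16);
    return a[1][2][var];
}

/* Split form: precomputed base plus only the variable index. */
int load_via_base(unsigned var)
{
    assert(var < 16);
    return row_1_2[var];
}
```

Both functions read the same element for any var in 0..15, so the two forms can be compared directly in the objdump output when compiled with the command line above.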
[Bug target/50304] poor code for accessing certain element of arrays
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50304

--- Comment #3 from Tamas Fenyvesi 2011-09-08 13:16:20 UTC ---
Created attachment 25226
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25226
-O1..-O3 compiled objdumped result
[Bug target/50304] poor code for accessing certain element of arrays
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50304

--- Comment #2 from Tamas Fenyvesi 2011-09-08 13:14:44 UTC ---
Created attachment 25225
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25225
some testcases
[Bug tree-optimization/50304] New: poor code for accessing certain element of arrays
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50304

             Bug #: 50304
           Summary: poor code for accessing certain element of arrays
    Classification: Unclassified
           Product: gcc
           Version: 4.5.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: t...@deltasystem.hu

Array elements at known (hardcoded) positions are accessed by inefficient code. The addresses are computed at run time, wasting both run time and code size.

For example:

    int a[4][4][16];

    void foo( void )
    {
        int c;
        c = a[2][3][4];
        ...
    }

compiles this way at -O1 for ARM Cortex-M0:

    0:  4b02    ldr   r3, [pc, #8]    ; (c )
    2:  22b4    movs  r2, #180        ; 0xb4
    4:  0092    lsls  r2, r2, #2
    6:  589a    ldr   r2, [r3, r2]

---
I tried to get this operation precomputed by forming a constant address:

    int a[4][4][16];
    int *const b=&a[2][3][4];

    void foo( void )
    {
        int c;
        c = *b;
        ...
    }

However, gcc ignores it, and the code compiles this way at -O1:

    0:  4b03    ldr   r3, [pc, #12]   ; (10 )
    2:  21b4    movs  r1, #180        ; 0xb4
    4:  0089    lsls  r1, r1, #2
    6:  185a    adds  r2, r3, r1
    8:  6812    ldr   r2, [r2, #0]

The actual code can vary, depending on the situation. For example:

    0:  4b02    ldr   r3, [pc, #8]    ; (c )
    2:  4a03    ldr   r2, [pc, #12]   ; (10 )
    4:  589a    ldr   r2, [r3, r2]

or

    0:  4b02    ldr   r3, [pc, #8]    ; (c )
    2:  4903    ldr   r1, [pc, #12]   ; (10 )
    4:  185a    adds  r2, r3, r1
        a[0][0][0]=c;
    6:  6812    ldr   r2, [r2, #0]

Sometimes (I don't yet know why or exactly when) it can be much worse still, introducing byte-wide access to e.g. an int32_t, disassembling and reassembling it. (It is definitely not an alignment problem.) This wastes about 40 instructions for a read-modify-write access.

--
It should look like this:

    0:  4b02    ldr   r3, [pc, #8]    ; (c )
    2:  681b    ldr   r3, [r3, #0]
    4:  681a    ldr   r2, [r3, #0]

where the constant (at 0x0c) loaded in the first row is a precalculated address.

--
With a variable (non-const) pointer, like this:

    int a[4][4][16];
    int *b=&a[2][3][4];
    ...

it works nicely. However, this is not really an equivalent solution on flash-based processors.
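For reference, the movs/lsls pair in the first listing builds the byte offset of a[2][3][4] at run time: 180 << 2 = 720. The arithmetic behind that constant can be sketched in C as follows, assuming the row-major layout that C requires for arrays and a 4-byte int as on the Cortex-M0. The helper names element_index, byte_offset and measured_offset are hypothetical and not part of the report.

```c
#include <assert.h>
#include <stddef.h>

int a[4][4][16];

/* Element index of a[i][j][k], counted from a[0][0][0] (row-major). */
size_t element_index(size_t i, size_t j, size_t k)
{
    assert(i < 4 && j < 4 && k < 16);
    return (i * 4 + j) * 16 + k;
}

/* Byte offset of a[i][j][k] from the start of the array. */
size_t byte_offset(size_t i, size_t j, size_t k)
{
    return element_index(i, j, k) * sizeof(int);
}

/* The same offset for a[2][3][4], measured from the actual addresses. */
size_t measured_offset(void)
{
    return (size_t)((char *)&a[2][3][4] - (char *)a);
}
```

For a[2][3][4] the element index is (2*4 + 3)*16 + 4 = 180, i.e. the 0xb4 immediate in the listing, and with a 4-byte int the left shift by 2 turns it into the 720-byte offset. Since every term is known at compile time, the sum of the array base and this offset is a link-time constant, which is what the report asks to have stored directly in the literal pool.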