[Bug target/50304] poor code for accessing certain element of arrays

2011-09-08 Thread tom at deltasystem dot hu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50304

--- Comment #4 from Tamas Fenyvesi  2011-09-08 13:16:53 UTC ---
Please find a sample code and its objdump-ed asm in the attachment.

The command line is:
arm-none-eabi-gcc -D__REDLIB__ -DDEBUG -D__USE_CMSIS -D__CODE_RED -I"../cmsis"
-I"../config" -I"../driver" -O1 -g3 -Wall -c -fmessage-length=0 -fno-builtin
-ffunction-sections -fdata-sections -mcpu=cortex-m0 -mthumb -MMD -MP
-MF"src/bugreport.d" -MT"src/bugreport.d" -o"src/bugreport.o"
"../src/bugreport.c"

Some comments:
-The result is the same from -O1 through -O3. -O0 is worse.

-Access to an int array differs somewhat between the two variants, but both
contain an add (or register-relative) operation. Access to a struct does not
differ at all, whether or not it goes through a *const.

-Every (* const) should have been materialized as such, rather than replaced
by the addition of two (or more) other addends. It definitely costs more code
space to hold an offset constant plus the adding code than to hold a
precomputed offset address (even if the base address is also stored
elsewhere), not to mention the wasted run time. Several cascaded additions can
appear just to compute a single address. The code is larger and much slower,
and it needs more registers, which in turn further increases code size and
run time.

-Why does the compiler choose to compute the offset address at run time
instead of simply materializing a *const when the user explicitly asks for a
*const? The code can hold any number of copies of a *const address (since it
cannot change) if that is needed, e.g. for pc-relative loads. There is simply
no point in not materializing a *const directly.

-Precomputing any known addresses and adding only the variable parts, on the
other hand, is good practice (and not uncommon in compilers other than gcc),
even without any *const. (E.g. for "a[1][2][var]" it should precompute
"a[1][2]" and add only "var".)
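
To make the last point concrete, here is a minimal sketch of the two forms of
"a[1][2][var]". The names row_1_2, load_direct and load_folded are
hypothetical and are not taken from the attached testcases. The sketch only
shows that the two forms are semantically equivalent; whether gcc actually
emits the shorter sequence for the folded form is exactly what this report
questions.

```c
#include <assert.h>
#include <stddef.h>

int a[4][4][16];

/* Constant part of the address: &a[1][2][0] is an address constant,
   so it can be resolved at link time (hypothetical name). */
static int *const row_1_2 = &a[1][2][0];

/* Form as written in the source: all three subscripts in one expression. */
static int load_direct(size_t var)
{
    return a[1][2][var];
}

/* Folded form: only the variable subscript is added at run time. */
static int load_folded(size_t var)
{
    return row_1_2[var];
}
```

Both functions read the same element for any var in 0..15, so a compiler is
free to turn the first form into the second.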


[Bug target/50304] poor code for accessing certain element of arrays

2011-09-08 Thread tom at deltasystem dot hu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50304

--- Comment #3 from Tamas Fenyvesi  2011-09-08 13:16:20 UTC ---
Created attachment 25226
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25226
-O1..-O3 compiled objdumped result


[Bug target/50304] poor code for accessing certain element of arrays

2011-09-08 Thread tom at deltasystem dot hu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50304

--- Comment #2 from Tamas Fenyvesi  2011-09-08 13:14:44 UTC ---
Created attachment 25225
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25225
some testcases


[Bug tree-optimization/50304] New: poor code for accessing certain element of arrays

2011-09-06 Thread tom at deltasystem dot hu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50304

         Bug #: 50304
       Summary: poor code for accessing certain element of arrays
Classification: Unclassified
       Product: gcc
       Version: 4.5.1
        Status: UNCONFIRMED
      Severity: normal
      Priority: P3
     Component: tree-optimization
    AssignedTo: unassig...@gcc.gnu.org
    ReportedBy: t...@deltasystem.hu


Array elements at known (hardcoded) positions are accessed by inefficient
code. The addresses are computed at run time, wasting both run time and code
size. For example:

int a[4][4][16];

void foo( void )
{
    int c;

    c = a[2][3][4];
    ...
}
compiles this way at -O1 for an ARM Cortex-M0:
   0:  4b02    ldr   r3, [pc, #8]    ; (c )
   2:  22b4    movs  r2, #180        ; 0xb4
   4:  0092    lsls  r2, r2, #2
   6:  589a    ldr   r2, [r3, r2]
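
The constant built by the movs/lsls pair can be checked by hand. With
row-major layout, a[2][3][4] is element (2*4 + 3)*16 + 4 = 180 of the array,
and with a 4-byte int (as on the Cortex-M0) that is 180 << 2 = 720 bytes from
the start of a. A minimal sketch of that arithmetic follows; the helper names
elem_index, elem_offset and offset_matches are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

int a[4][4][16];

/* Linear element index of a[i][j][k] in row-major order (hypothetical). */
static size_t elem_index(size_t i, size_t j, size_t k)
{
    return (i * 4 + j) * 16 + k;
}

/* Byte offset of a[i][j][k] from the start of a. */
static size_t elem_offset(size_t i, size_t j, size_t k)
{
    return elem_index(i, j, k) * sizeof(int);
}

/* Returns nonzero if the hand computation agrees with the address the
   compiler itself derives for a[i][j][k]. */
static int offset_matches(size_t i, size_t j, size_t k)
{
    return (size_t)((char *)&a[i][j][k] - (char *)a) == elem_offset(i, j, k);
}
```

Since every term here is known at compile time, the whole offset could in
principle be folded into the address stored in the literal pool.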
---
I tried to convert this operation to be precompiled by forming a constant
address.

int a[4][4][16];
int *const b=&a[2][3][4];

void foo( void )
{
    int c;

    c = *b;
    ...
}
However, it is ignored by gcc, and compiles this way at -O1:
   0:  4b03    ldr   r3, [pc, #12]   ; (10 )
   2:  21b4    movs  r1, #180        ; 0xb4
   4:  0089    lsls  r1, r1, #2
   6:  185a    adds  r2, r3, r1
   8:  6812    ldr   r2, [r2, #0]

The actual code can vary, depending on the situation. For example:
   0:  4b02    ldr   r3, [pc, #8]    ; (c )
   2:  4a03    ldr   r2, [pc, #12]   ; (10 )
   4:  589a    ldr   r2, [r3, r2]

or
   0:  4b02    ldr   r3, [pc, #8]    ; (c )
   2:  4903    ldr   r1, [pc, #12]   ; (10 )
   4:  185a    adds  r2, r3, r1      a[0][0][0]=c;
   6:  6812    ldr   r2, [r2, #0]

Sometimes (I don't know yet why and exactly when) it can be much worse still,
introducing bytewise access to e.g. an int32_t, disassembling and
reassembling it. (It's definitely not an alignment problem.) This wastes about
40 instructions for a read-modify-write access.
--

It should have looked like this:
   0:  4b02    ldr   r3, [pc, #8]    ; (c )
   2:  681b    ldr   r3, [r3, #0]
   4:  681a    ldr   r2, [r3, #0]
where the constant (at 0x0c) referenced in the first row is a precalculated
address.

--
With a variable address, like this:

int a[4][4][16];
int *b=&a[2][3][4];
...
it works nicely. However, this is not really an equivalent solution on
flash-based processors.