[Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated

2009-04-27 Thread alexvod at google dot com


--- Comment #5 from alexvod at google dot com  2009-04-27 09:06 ---
Vladimir, many thanks for your analysis! I will try to do the analysis myself and
compare on larger real-world examples next time. Lowering severity for
now.


-- 

alexvod at google dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |minor


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39836



[Bug rtl-optimization/39871] New: [4.3/4.4/4.5 regression] CSE doesn't work

2009-04-23 Thread alexvod at google dot com
The following code:

struct A
{
  int version;
  const char *name;
  void* group;
};
struct B
{
  const char *name;
  int ok;
};
void func(struct A*, int);

void test(struct B *p)
{
  struct A a;
  a.name = p->name;
  func(&a, p->ok);
}
options: -march=armv5te -mthumb -mthumb-interwork -fpic -Os

is compiled to 18 bytes by GCC 4.2.1 and to 20 bytes by GCC 4.3 (and later,
including 4.4).

Bisection shows that it is changed by
http://gcc.gnu.org/viewcvs?view=rev&revision=118475:

GCC rev118474:
push {lr}
sub sp, sp, #20
ldr r3, [r0]
ldr r1, [r0, #4]
add r0, sp, #4
str r3, [sp, #8]
bl  func
add sp, sp, #20
@ sp needed for prologue
pop {pc}

GCC rev118475:

test:
push {lr}
sub sp, sp, #20
add r2, sp, #4 // this could be stored directly in r0
ldr r3, [r0]
ldr r1, [r0, #4]
str r3, [r2, #4]
mov r0, r2  // this mov can be eliminated
bl  func
add sp, sp, #20
@ sp needed for prologue
pop {pc}

A lot of CSE pass code was removed in this change, so it is no surprise that
CSE got worse after it.


-- 
   Summary: [4.3/4.4/4.5 regression] CSE doesn't work
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: alexvod at google dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: arm-eabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39871



[Bug rtl-optimization/39837] [4.3/4.4/4.5 regression] unoptimal code generated

2009-04-23 Thread alexvod at google dot com


--- Comment #4 from alexvod at google dot com  2009-04-23 16:39 ---
A simpler example of this issue:

void func(int*);

void test()
{
  int a = 0;
  while (1) {
    func(&a);
    if (a > 12) break;
  }
}

GCC rev123918:
push {lr}
sub sp, sp, #12
mov r3, #0
str r3, [sp, #4]
.L2:
add r0, sp, #4
bl  func
ldr r3, [sp, #4]
cmp r3, #12
ble .L2
add sp, sp, #12
@ sp needed for prologue
pop {pc}

GCC rev123919:
test:
push {r4, lr}
sub sp, sp, #8
mov r3, #0
add r4, sp, #4
str r3, [sp, #4]
.L2:
mov r0, r4
bl  func
ldr r3, [sp, #4]
cmp r3, #12
ble .L2
add sp, sp, #8
@ sp needed for prologue
pop {r4, pc}
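
One way to read the difference between the two listings, as a sketch only:
the rev123919 output behaves as if the address of a had been hoisted into a
loop-invariant pointer that has to survive the call to func, which costs a
callee-saved register (r4). The body of func below is a hypothetical stand-in,
added only so that the loop terminates; test_direct, test_hoisted and
self_check are likewise made-up names.

```c
#include <assert.h>

/* Hypothetical stand-in for the external func(): bumps *p and counts calls. */
static int calls;
static void func(int *p) { (*p)++; calls++; }

/* Shape matching rev123918: the address is recomputed at each call site. */
int test_direct(void)
{
  int a = 0;
  calls = 0;
  while (1) {
    func(&a);
    if (a > 12) break;
  }
  return calls;
}

/* Shape matching rev123919: the address is computed once before the loop
   and kept live across every call. */
int test_hoisted(void)
{
  int a = 0;
  int *pa = &a;
  calls = 0;
  while (1) {
    func(pa);
    if (a > 12) break;
  }
  return calls;
}

/* Both shapes perform the same number of calls. */
void self_check(void)
{
  assert(test_direct() == test_hoisted());
}
```

Both functions do the same work; the only question raised in this report is
which shape is cheaper on Thumb, where add r0, sp, #4 is as small as
mov r0, r4 and needs no extra saved register.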


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39837



[Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated

2009-04-23 Thread alexvod at google dot com


--- Comment #3 from alexvod at google dot com  2009-04-23 16:49 ---
Another example of sub-optimal register allocation on ARM/thumb with IRA (not
sure if this is the same bug or a different one).

int func(char*);
void func2(const char*, int);

void test(char **pSignature)
{
  int clazz = 0;
  char *signature = *pSignature;
  if (*signature == '[') {
    char savedChar;
    savedChar = *++signature;
    clazz = func(*pSignature);
    *signature = savedChar;
  }
  if (clazz == 0) {
    func2("abc", 0);
  }
  *pSignature = signature;
}

It was changed by http://gcc.gnu.org/viewcvs?view=rev&revision=139590:

GCC rev139589:
test:
push {lr}
sub sp, sp, #12
mov r3, #0
str r3, [sp, #4]
.L2:
add r0, sp, #4
bl  func
ldr r3, [sp, #4]
cmp r3, #12
ble .L2
add sp, sp, #12
@ sp needed for prologue
pop {pc}

GCC rev139590:
test:
push {r4, lr}
sub sp, sp, #8
mov r3, #0
add r4, sp, #4   // why put sp+4 in r4 if we can use sp+4 directly?
str r3, [sp, #4]
.L2:
mov r0, r4
bl  func
ldr r3, [sp, #4]
cmp r3, #12
ble .L2
add sp, sp, #8
@ sp needed for prologue
pop {r4, pc}


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39836



[Bug tree-optimization/39874] New: [4.4 regression] missing DCE

2009-04-23 Thread alexvod at google dot com
The following code:
void func();
void test(char *signature)
{
  char ch = signature[0];
  if (ch == 15 || ch == 3)
  {
    if (ch == 15) func();
  }
}
is compiled in a suboptimal way by gcc 4.4. The check for ch==3 can be completely
eliminated since func is only called if ch==15. gcc 4.3 is able to properly
infer this and eliminate the unneeded check, but gcc 4.4 fails to do this.

Although originally found on ARM, this bug also reproduces on x86.
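
To make the expected simplification concrete, here is a small sketch that
compares the function as written with the reduced form the optimizer should
reach. The counting body of func, the names test_original, test_reduced,
count_mismatches and self_check, and the const parameter are all assumptions
made for this illustration; they are not part of the original test case.

```c
#include <assert.h>

/* Hypothetical stand-in for the external func(): counts its calls. */
static int calls;
static void func(void) { calls++; }

/* Shape from the report: outer test on 15 or 3, inner test on 15 only. */
static void test_original(const char *signature)
{
  char ch = signature[0];
  if (ch == 15 || ch == 3)
  {
    if (ch == 15) func();
  }
}

/* Reduced form: the ch == 3 comparison has no observable effect. */
static void test_reduced(const char *signature)
{
  char ch = signature[0];
  if (ch == 15) func();
}

/* Returns the number of byte values on which the two versions differ. */
int count_mismatches(void)
{
  int mismatches = 0;
  for (int v = 0; v < 256; v++) {
    char buf[1] = { (char) v };
    calls = 0; test_original(buf); int a = calls;
    calls = 0; test_reduced(buf);  int b = calls;
    if (a != b) mismatches++;
  }
  return mismatches;
}

/* The two versions agree for every possible first byte. */
void self_check(void)
{
  assert(count_mismatches() == 0);
}
```

Since the two versions agree on every input byte, the comparison with 3 in
the gcc 4.4.0 output below is dead code.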

gcc 4.3.1 (with -m32 -O2):
test:
  pushl   %ebp
  movl    %esp, %ebp
  movl    8(%ebp), %eax
  movzbl  (%eax), %eax
  cmpb    $15, %al   // %al compared only with 15
  jne     .L8
  popl    %ebp
  jmp     func
.L8:
  popl    %ebp
  ret

gcc 4.4.0:
test:
  pushl   %ebp
  movl    %esp, %ebp
  subl    $8, %esp
  movl    8(%ebp), %eax
  movzbl  (%eax), %eax
  cmpb    $15, %al
  sete    %dl
  cmpb    $3, %al   // compiler was not able to optimize ch==3
  je      .L4
  testb   %dl, %dl
  jne     .L8
.L4:
  leave
  ret
  .p2align 4,,7
  .p2align 3
.L8:
  leave
  .p2align 4,,8
  .p2align 3
  jmp func

Bisection shows that it was introduced by
http://gcc.gnu.org/viewcvs?view=rev&revision=140288


-- 
   Summary: [4.4 regression] missing DCE
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: alexvod at google dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86-unknown-linux-gnu, arm-eabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39874



[Bug regression/39836] New: [4.4 regression] unoptimal code generated

2009-04-21 Thread alexvod at google dot com
Very simple code:

int* func();
int func2(long long);
void test (int unused, int idx, char tag, long long value)
{
  int *p = func() + idx;
  switch (tag) {
    case 1:
      *p = (int) value;
    case 2:
      *p = func2(value);
  }
}

is compiled to 46 bytes by GCC 4.3.1 and to 48 bytes by GCC 4.4.0. Bisection
shows that it was changed by
http://gcc.gnu.org/viewcvs?view=rev&revision=139949:

Code generated by 139948:
test:
push {r3, r4, r5, r6, r7, lr}
mov r4, r1
mov r5, r2
ldr r6, [sp, #24]
ldr r7, [sp, #28]
bl  func
lsl r4, r4, #2
add r4, r0, r4
cmp r5, #1
beq .L3
cmp r5, #2
bne .L5
b   .L4
.L3:
str r6, [r4]
.L4:
mov r0, r6
mov r1, r7
bl  func2
str r0, [r4]
.L5:
@ sp needed for prologue
pop {r3, r4, r5, r6, r7, pc}

Code generated by 139949:
test:
push {r4, r5, r6, r7, lr}
sub sp, sp, #12
mov r5, r1
ldr r1, [sp, #36]
mov r6, r2
ldr r7, [sp, #32]
str r1, [sp, #4]
bl  func
lsl r4, r5, #2
add r4, r0, r4
ldr r1, [sp, #4]
cmp r6, #1
beq .L3
cmp r6, #2
bne .L5
b   .L4
.L3:
str r7, [r4]
.L4:
mov r0, r7
bl  func2
str r0, [r4]
.L5:
add sp, sp, #12
@ sp needed for prologue
pop {r4, r5, r6, r7, pc}

A temporary variable was spilled to the stack at [sp, #4].

BTW, this function is compiled by GCC 4.2.1 to 42 bytes (which is even better!).


-- 
   Summary: [4.4 regression] unoptimal code generated
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: regression
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: alexvod at google dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: arm-eabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39836



[Bug rtl-optimization/39836] [4.4/4.5 regression] unoptimal code generated

2009-04-21 Thread alexvod at google dot com


--- Comment #1 from alexvod at google dot com  2009-04-21 16:08 ---
Compilation options: -march=armv5te -fpic -mthumb-interwork -Os -mthumb


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39836



[Bug rtl-optimization/39837] New: [4.3/4.4/4.5 regression] unoptimal code generated

2009-04-21 Thread alexvod at google dot com
The following code

struct Glob
{
  int f1, f2;
  int x;
};
extern struct Glob g;
int func(int, int*, int);
void test()
{
  int a = 0;
  int* b = &g.x;
  do
    {
      a = *b;
    }
  while (func(a, b, a) != 0);
}
// compilation options: -march=armv5te -fpic -mthumb-interwork -mthumb -Os

is compiled to 36 bytes by GCC 4.2.1 and to 40 bytes by GCC 4.3.1 and
4.4.0.

GCC 4.2.1:
test:
push {r4, lr}
ldr r3, .L7
ldr r2, .L7+4
.LPIC0:
add r3, pc
ldr r4, [r3, r2]
.L2:
ldr r2, [r4, #8]
mov r1, r4
add r1, r1, #8
mov r0, r2
bl  func
cmp r0, #0
bne .L2
@ sp needed for prologue
pop {r4, pc}
.L8:
.align  2
.L7:
.word   _GLOBAL_OFFSET_TABLE_-(.LPIC0+4)
.word   g(GOT)

GCC 4.4.0:
test:
push {r4, r5, r6, lr}
ldr r3, .L6
ldr r2, .L6+4
.LPIC0:
add r3, pc
ldr r4, [r3, r2]
mov r5, r4
add r5, r5, #8
.L2:
ldr r2, [r4, #8]
mov r1, r5
mov r0, r2
bl  func
cmp r0, #0
bne .L2
@ sp needed for prologue
pop {r4, r5, r6, pc}
.L7:
.align  2
.L6:
.word   _GLOBAL_OFFSET_TABLE_-(.LPIC0+4)
.word   g(GOT)

Bisection shows that this was changed by
http://gcc.gnu.org/viewcvs?view=rev&revision=123919

When running with the -da -fdump-tree-all options, I see that the first changed
dump is regr.c.139r.loop2_invariant.


-- 
   Summary: [4.3/4.4/4.5 regression] unoptimal code generated
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: alexvod at google dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: arm-eabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39837



[Bug rtl-optimization/39837] [4.3/4.4/4.5 regression] unoptimal code generated

2009-04-21 Thread alexvod at google dot com


--- Comment #1 from alexvod at google dot com  2009-04-21 16:45 ---
Created an attachment (id=17664)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17664&action=view)
gcc-rev123918.regr.c.139r.loop2_invariant

A dump of loop2_invariant phase with gcc rev123918


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39837



[Bug rtl-optimization/39837] [4.3/4.4/4.5 regression] unoptimal code generated

2009-04-21 Thread alexvod at google dot com


--- Comment #2 from alexvod at google dot com  2009-04-21 16:47 ---
Created an attachment (id=17665)
 -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17665&action=view)
gcc-rev123919.regr.c.139r.loop2_invariant

A dump of loop2_invariant phase from gcc rev123919


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39837



[Bug regression/39838] New: [4.3/4.4/4.5 regression] unoptimal code for two simple loops

2009-04-21 Thread alexvod at google dot com
  4 popl
  4 pushl
 19 movl

12-19 movl's is not very good.


-- 
   Summary: [4.3/4.4/4.5 regression] unoptimal code for two simple
loops
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: regression
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: alexvod at google dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39838



[Bug regression/39839] New: [4.3/4.4/4.5 regression] loop invariant motion causes stack spill

2009-04-21 Thread alexvod at google dot com
The following code:
struct S
{
  int count;
  char *addr;
};

void func(const char*, const char*, int, const char*);

void test(struct S *p)
{
  int off = p->count;
  while (p->count >= 0)
    {
      const char *s = "xyz";
      if (*p->addr) s = "pqr";
      func("abcde", p->addr + off, off, s);
      p->count--;
    }
}

is compiled by GCC 4.2.1 to 64 bytes, and by GCC 4.4.0 to 76 bytes. Bisection
shows that the size increased in two steps:
123918 -> 123919: 64 -> 72 bytes
124041 -> 124042: 72 -> 76 bytes
I already filed a bug for 123919
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39837), so let's take a look at
http://gcc.gnu.org/viewcvs?view=rev&revision=124042

GCC rev124041 (with -march=armv5te -mthumb -mthumb-interwork -fpic -Os)
test:
push {r4, r5, r6, r7, lr}
ldr r3, .L9
ldr r2, .L9+4
.LPIC0:
add r3, pc
add r7, r3, r2
ldr r2, .L9+8
ldr r5, [r0]
sub sp, sp, #4
mov r4, r0
add r6, r3, r2
b   .L2
.L3:
ldr r0, [r4, #4]
ldrb r3, [r0]
cmp r3, #0
beq .L4
mov r2, r6
b   .L6
.L4:
mov r2, r7
.L6:
add r0, r0, r5
lsl r1, r5, #1
bl  func
ldr r3, [r4]
sub r3, r3, #1
str r3, [r4]
.L2:
ldr r3, [r4]
cmp r3, #0
bge .L3
add sp, sp, #4
@ sp needed for prologue
pop {r4, r5, r6, r7, pc}
.L10:

GCC rev124042:
test:
push {r4, r5, r6, r7, lr}
ldr r3, .L9
ldr r2, .L9+4
.LPIC0:
add r3, pc
add r2, r3, r2
sub sp, sp, #12
ldr r5, [r0]
str r2, [sp, #4]
ldr r2, .L9+8
mov r4, r0
lsl r6, r5, #1
add r7, r3, r2
b   .L2
.L3:
ldr r0, [r4, #4]
ldrb r3, [r0]
cmp r3, #0
beq .L4
mov r2, r7
b   .L6
.L4:
ldr r2, [sp, #4]
.L6:
add r0, r0, r5
mov r1, r6
bl  func
ldr r3, [r4]
sub r3, r3, #1
str r3, [r4]
.L2:
ldr r3, [r4]
cmp r3, #0
bge .L3
add sp, sp, #12
@ sp needed for prologue
pop {r4, r5, r6, r7, pc}

The first dump that differs is 090t.lim, which moves (off << 1) out of the loop.
But this extra variable causes an extra stack spill, so it is actually a loss,
not a win. Any ideas about what to tweak?
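
At the source level, the hoist described above can be pictured roughly as
follows. This is a sketch, not the reporter's test case: use(), sink,
loop_recompute, loop_hoisted and self_check are hypothetical names, and the
loop body is reduced to the single shifted operand.

```c
#include <assert.h>

/* Hypothetical stand-in for a call that consumes the shifted value. */
static long sink;
static void use(int v) { sink += v; }

/* Before LIM: off << 1 is recomputed in each iteration (one cheap shift). */
long loop_recompute(int off, int count)
{
  sink = 0;
  while (count >= 0) {
    use(off << 1);
    count--;
  }
  return sink;
}

/* After LIM: the shift is hoisted into a temporary that stays live across
   the whole loop, including the call, so it needs its own register or a
   stack slot. */
long loop_hoisted(int off, int count)
{
  int twice = off << 1;
  sink = 0;
  while (count >= 0) {
    use(twice);
    count--;
  }
  return sink;
}

/* Both forms compute the same result. */
void self_check(void)
{
  assert(loop_recompute(3, 4) == loop_hoisted(3, 4));
}
```

The hoisted form saves one shift per iteration but adds a value that must
stay live across the call. In the rev124042 listing that extra live value
coincides with a new store to [sp, #4] before the loop and a reload inside it.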


-- 
   Summary: [4.3/4.4/4.5 regression] loop invariant motion causes
stack spill
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: regression
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: alexvod at google dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: arm-eabi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39839



[Bug regression/39838] [4.3/4.4/4.5 regression] unoptimal code for two simple loops

2009-04-21 Thread alexvod at google dot com


--- Comment #2 from alexvod at google dot com  2009-04-21 18:37 ---
(In reply to comment #1)
> This is IV-opts messing way up as far as I can tell.  Pointer Plus just helped
> out PRE and code motion passes which confuses the hell out of IV-opts.
> 
I tried to use -fno-ivopts flag, but it doesn't have any effect on this.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39838



[Bug regression/39799] New: missing 'may be used uninitialized' warning

2009-04-17 Thread alexvod at google dot com
The following code:

inline int foo(int x)
{
  return x;
}
static void bar(int a, int *ptr)
{
  do
  {
int b;
if (b  40)
{
  ptr[0] = b;
}
b += 1;
ptr++;
  }
  while (--a != 0);
}
void foobar(int a, int *ptr)
{
  bar(foo(a), ptr);
}

generates the correct warning when compiled by gcc 4.2.4:
$ gcc -O3 -Wall -Werror -c 1.c
cc1: warnings being treated as errors
1.c: In function ‘foobar’:
1.c:9: warning: ‘b’ may be used uninitialized in this function
1.c:9: note: ‘b’ was declared here

But it compiles without any warning with gcc 4.4.0. The bug reproduces on gcc
4.3.1 as well.


-- 
   Summary: missing 'may be used uninitialized' warning
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: regression
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: alexvod at google dot com
 GCC build triplet: x86_64-unknown-linux-gnu
  GCC host triplet: x86_64-unknown-linux-gnu
GCC target triplet: x86_64-unknown-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39799